diction tasks. We introduce two distinct paradigms for
learning with constraints and show how they may be
used to supervise learning algorithms, particularly
modern methods based on deep neural networks. The
frameworks of the two paradigms are shown in figures
1 and 2. A special case of our framework is a new
approach to semisupervised learning. Our results are
based on work by Stewart and Ermon (2017).
Our work focuses on structured prediction problems,
and uses implicit generative models to learn con-
straints. In this section, we introduce structured pre-
diction and implicit models. The following sections
explore various forms of constraint learning.
Supervised Learning and Structured Prediction
In supervised learning, we are given a training set of
n examples, where each example includes an input x_i
(that belongs to X) and the corresponding label y_i
(that belongs to Y). We learn a function f that maps
inputs to labels by minimizing a loss ℓ within a
hypothesis class H. By restricting the space of possi-
ble functions to H, we are leveraging prior knowledge
about the specific problem we are trying to solve.
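The setup described so far (a training set, a per-example loss, and a restricted hypothesis class) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `empirical_risk` and `fit_in_class`, the squared loss, the toy data, and the finite class of lines through the origin. The models considered in this article are neural networks trained by gradient methods, not a finite grid.

```python
def squared_loss(pred, y):
    """Per-example loss: squared difference between prediction and label."""
    return (pred - y) ** 2


def empirical_risk(f, xs, ys, loss):
    """Sum of per-example losses of f over the training set."""
    return sum(loss(f(x), y) for x, y in zip(xs, ys))


def fit_in_class(hypotheses, xs, ys, loss):
    """Return the member of a finite hypothesis class with the smallest empirical risk."""
    return min(hypotheses, key=lambda f: empirical_risk(f, xs, ys, loss))


# Hypothetical hypothesis class H: lines through the origin, slopes on a grid.
# Restricting to this class encodes the prior belief that labels scale with inputs.
slopes = [k / 10 for k in range(-30, 31)]
H = [(lambda x, w=w: w * x) for w in slopes]

# Toy training set (assumed for illustration).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

f_best = fit_in_class(H, xs, ys, squared_loss)
```

Here `f_best` is the line with slope 2, the only member of the grid that fits the toy data exactly. A richer class would fit more data sets but would encode less prior knowledge.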
Alternatively, we may incorporate prior knowledge
by specifying an a priori preference for certain functions
in the hypothesis class via a regularization term
R(f). Together, these concepts give rise to the supervised learning objective
$$f^* = \arg\min_{f \in H} \sum_{i=1}^{n} \ell(f(x_i), y_i) + R(f).$$
In this work, we consider the class of functions H
parameterized by convolutional neural networks. We
are also interested in structured prediction problems
(Koller and Friedman 2009), where the outputs y_i are
complex vector-valued objects with strong correlation among their components. Examples of structured outputs include vectors, trees, sequences, and
graphs (Taskar et al. 2005; Daumé, Langford, and
Adversarial Training and Implicit Probabilistic Models
Adversarial training is a particular technique for fitting structured prediction models. It is most widely
used to train implicit probabilistic models (Mohamed
and Lakshminarayanan 2016). Implicit models are
defined as the result of a stochastic sampling procedure, rather than through an explicitly defined likelihood function. A prominent example is generative
adversarial networks (GAN), in which samples are
obtained by transforming Gaussian noise via a neural network G, called the generator.
In this work, we will be interested in placing constraints
on a probability distribution over the output
space Y. We are going to define this distribution
implicitly by a sampling procedure that samples an
input and generates a label using a neural network.
Note that evaluating the likelihood of the model
defined by this sampling procedure is typically
Adversarial training of implicit models has
two interpretations. The first is that of a mathematical game, in which the generator G tries to fool a discriminator D that attempts to distinguish generated samples
from real samples. This process results in a minimax
objective, which can be optimized through stochastic gradient descent. Alternatively, this process can be
interpreted as minimizing a distance or divergence
between distributions, such as an approximation of the Jensen-Shannon divergence (Goodfellow
et al. 2014), the Earth Mover’s distance (Arjovsky,
Chintala, and Bottou 2017; Gulrajani et al. 2017), or
differences in statistics between samples from two
distributions (Li, Swersky, and Zemel 2015).
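The game interpretation above can be made concrete with a small sketch that estimates the standard GAN value function, E[log D(x)] + E[log(1 - D(G(z)))], from samples. The affine generator, the logistic discriminator, the sample sizes, and all function names are hypothetical choices for illustration. The models discussed in this article are neural networks, and no training loop is shown.

```python
import math
import random


def gan_value(discriminator, real_samples, fake_samples):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))]."""
    real_term = sum(math.log(discriminator(x)) for x in real_samples) / len(real_samples)
    fake_term = sum(math.log(1.0 - discriminator(x)) for x in fake_samples) / len(fake_samples)
    return real_term + fake_term


def generator(z, scale=1.0, shift=0.0):
    """Hypothetical generator G: an affine map of Gaussian noise z."""
    return scale * z + shift


def logistic_discriminator(x, weight=1.0, bias=0.0):
    """Hypothetical discriminator D: estimated probability that x is a real sample."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


rng = random.Random(0)
real = [rng.gauss(2.0, 1.0) for _ in range(1000)]             # stand-in for real data
fake = [generator(rng.gauss(0.0, 1.0)) for _ in range(1000)]  # generated samples

# A discriminator that separates the two sample sets achieves a higher value
# than one that always answers 0.5, which yields exactly -log 4.
value_informative = gan_value(
    lambda x: logistic_discriminator(x, weight=2.0, bias=-2.0), real, fake
)
value_uninformative = gan_value(lambda x: 0.5, real, fake)
```

In the minimax game, the discriminator adjusts its parameters to increase this value while the generator adjusts its own to decrease it. If the generated samples matched the real ones, no discriminator could be expected to improve on the constant 0.5.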
Figure 1. Framework of the Paradigm for
Learning with Explicit Constraints.
Explicit constraints are algebraic or logical formulas that hold over the output space Y and are specified based on prior domain knowledge.
Figure 2. Framework of the Paradigm for
Learning Implicitly from the Data.
Constraints are learned implicitly from data by forcing f to produce outputs
that are indistinguishable from representative outputs Y by an auxiliary discriminator D.