probability model up to a total multiplicative perturbation of η. Acting conservatively confers robustness!
Idea 4: Robust Inference
In addition to robust optimization, robust learning,
and robust decision making, several researchers have
studied methods for robust inference.
One line of research is based on hierarchical
Bayesian models. The central idea underlying the vast
majority of contemporary work on the known
unknowns is to represent our uncertainty in terms of
a joint probability distribution. This can include
treating the parameters of the joint distribution as
hidden random variables and employing probability
distributions to represent uncertainty about their values. These hierarchical models can be represented as
standard probabilistic graphical models, although
exact inference is rarely feasible. Fortunately,
advances in Markov chain Monte Carlo methods now
provide practical ways of sampling from the posterior distribution that results from conditioning on
observations (Neal 1993; Gilks, Richardson, and
Spiegelhalter 1995; Betancourt 2017). Such samples
can then be used to make robust decisions (for
example, decisions based on conditional value at risk and other quantile-related measures).
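As a minimal sketch of how such samples can drive a robust choice (the action names and Gaussian loss distributions here are invented stand-ins for real posterior draws):

```python
import numpy as np

def cvar(losses, alpha=0.95):
    # Conditional value at risk: the mean of the worst
    # (1 - alpha) fraction of losses.
    losses = np.sort(losses)
    tail_start = int(np.ceil(alpha * len(losses)))
    return losses[tail_start:].mean()

# Posterior samples of each candidate action's loss, as an MCMC
# sampler might produce.
rng = np.random.default_rng(0)
posterior_losses = {
    "act_a": rng.normal(1.0, 0.5, size=10_000),  # higher mean, thin tail
    "act_b": rng.normal(0.8, 2.0, size=10_000),  # lower mean, heavy tail
}

robust_choice = min(posterior_losses, key=lambda a: cvar(posterior_losses[a]))
print(robust_choice)  # act_a: better CVaR despite the worse mean
```

A risk-neutral agent would prefer act_b for its lower mean loss; minimizing CVaR instead selects act_a, whose tail behavior is far better.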
A second line of research has studied extensions of
probabilistic graphical models to capture sets of probability distributions. For example, credal networks
(Cozman 1997; Cozman 2000) provide a compact
method of representing convex sets of probability
measures and then performing inference on them.
Exact inference is generally intractable, but for
restricted classes of credal networks, it is possible to
define an efficient variable elimination algorithm
(Antonucci and Zaffalon 2007).
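As a toy illustration of the underlying idea (this enumerates interval endpoints on a two-node network; it is not the variable elimination algorithm of Antonucci and Zaffalon):

```python
import itertools

# Probability intervals for a two-node network X -> Y. The query
# P(Y=1) is multilinear in the local parameters, so its extrema over
# these credal sets are attained at the interval endpoints, and
# enumerating the vertices yields exact bounds.
p_x1 = (0.3, 0.5)                # P(X=1) lies in [0.3, 0.5]
p_y1_given_x = {0: (0.1, 0.2),   # P(Y=1 | X=0) lies in [0.1, 0.2]
                1: (0.7, 0.9)}   # P(Y=1 | X=1) lies in [0.7, 0.9]

values = [(1 - px) * py0 + px * py1
          for px, py0, py1 in itertools.product(
              p_x1, p_y1_given_x[0], p_y1_given_x[1])]

print(f"P(Y=1) lies in [{min(values):.2f}, {max(values):.2f}]")  # [0.28, 0.55]
```

In larger networks the number of such vertices grows exponentially, which is one reason exact inference over credal sets is generally intractable.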
One important application of probabilistic reasoning is in diagnosis, where the diagnostic system must
iteratively decide which tests to perform in order to
arrive at a diagnosis as quickly and cheaply as possible. One standard heuristic is to compute the expected value of information (VOI) that will be gained
through each candidate test and perform the test that
maximizes the VOI. Adnan Darwiche and his collaborators have studied a robust version of this problem
where they perform the test that is most likely to
result in a diagnosis that is robust in the sense that
further tests will not change the diagnosis (Chen,
Choi, and Darwiche 2014, 2015).
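As a minimal sketch of the myopic VOI heuristic (not the robust criterion of Chen, Choi, and Darwiche), under the simplifying assumption that a belief state is worth the probability of its most likely diagnosis:

```python
import numpy as np

prior = np.array([0.6, 0.3, 0.1])        # P(d) over three diagnoses
tests = {                                # P(test positive | d), invented
    "t1": np.array([0.9, 0.2, 0.2]),
    "t2": np.array([0.5, 0.5, 0.5]),     # t2 carries no information
}

def voi(prior, pos_given_d):
    # Myopic VOI: expected confidence in the most likely diagnosis
    # after seeing the test outcome, minus the confidence we have now.
    expected = 0.0
    for lik in (pos_given_d, 1.0 - pos_given_d):  # positive, negative
        joint = prior * lik
        expected += joint.max()   # = P(outcome) * max posterior
    return expected - prior.max()

best_test = max(tests, key=lambda t: voi(prior, tests[t]))
print(best_test)  # t1; voi(prior, tests["t2"]) is exactly 0
```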
Robustness to the Unknown Unknowns
What ideas does the AI research community have for
creating AI systems that are robust to unmodeled
aspects of the world? In this section, I will discuss four
ideas that I am aware of. I expect there are others, and
I hope we can extend this list as we do more research
in this direction.
Idea 5: Detecting Model Failures
When an AI system’s model is inadequate, are there
ways to detect this prior to taking an action that
could result in a serious error?
In machine learning, the model can fail when the distribution of training objects P_train and the distribution of test objects P_test (on which the learned model will be applied to make predictions) are different. Learning theory only provides guarantees when P_train = P_test, and there are many ways that the training and testing distributions can differ; one simple check is sketched below.
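As a simple illustration of detecting such a mismatch (the feature values are synthetic), a two-sample Kolmogorov-Smirnov test can compare a feature's training distribution against the incoming test stream:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(0.0, 1.0, size=2000)  # feature as seen in training
test_feature = rng.normal(0.4, 1.0, size=500)    # incoming data, shifted mean

stat, p_value = ks_2samp(train_feature, test_feature)
if p_value < 0.01:
    print(f"shift detected: KS statistic {stat:.3f}, p = {p_value:.2g}")
```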
Perhaps the setting that best illustrates this problem is open category classification. Let me describe it with an example from my own work.
To monitor the health of freshwater streams, scientists survey the populations of insects that live in these streams. In the United States, the Environmental Protection Agency conducts an annual randomized survey of freshwater macroinvertebrates belonging to the orders of stoneflies, caddisflies, and mayflies.
mayflies. The specimens are collected using a kicknet
and brought back to a laboratory where each insect
must be identified to at least the level of genus. This
is a time-consuming and tedious process that requires
substantial expertise. Our research group at Oregon
State trained a computer vision system to recognize
the genus of a specimen from an image. We created a
training set of images of 54 taxonomic groups covering most of the stoneflies, caddisflies, and mayflies
found in the US Pacific Northwest. Figure 10 shows
images of some of these taxa as captured by our photographic apparatus.
Evaluations on a separate test set showed good predictive accuracy. However, when we considered
deploying this to process real kicknet samples, we
realized that those samples would contain lots of other things beyond the 54 categories that our vision system had learned to recognize. There are often leaves
and twigs, and there are also other species of bugs and
worms. Following standard machine-learning practice, we had trained our system using discriminative
training, which has been repeatedly shown to produce higher recognition accuracy than the alternative
method of generative training. Discriminative training divides the image space into 54 partitions separated by decision boundaries. The result is that any
possible image will fall into one of the 54 partitions
and be assigned to one of the 54 insect categories that
the system was trained to recognize. Hence, any
image containing a leaf, twig, or bug belonging to a
“novel” species would be guaranteed to be misclassified. One might hope that these novel items would
fall near the decision boundaries and, hence, result in lower-confidence predictions. This is sometimes true, but when we attempted to define a rejection rule (a rule for abstaining when the predictions have low confidence), the result was an equal error rate of more than 20 percent. That is, 20 percent of images from the 54 taxa were misclassified as novel, and 20 percent of the novel images were accepted and misclassified as belonging to one of the 54 taxa.
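As a sketch of this kind of rejection-rule evaluation (the max-softmax confidence scores below are synthetic, chosen only to mimic the overlap between known and novel items):

```python
import numpy as np

def equal_error_rate(known_conf, novel_conf):
    # Sweep a confidence threshold; predictions below it are rejected
    # as "novel." Return the threshold where the false-reject rate
    # (known items rejected) meets the false-accept rate (novel items
    # accepted), along with that error rate.
    for t in np.linspace(0.0, 1.0, 1001):
        false_reject = (known_conf < t).mean()
        false_accept = (novel_conf >= t).mean()
        if false_reject >= false_accept:
            return t, (false_reject + false_accept) / 2
    return 1.0, 1.0

rng = np.random.default_rng(1)
known = np.clip(rng.normal(0.85, 0.12, 5000), 0, 1)  # confident on knowns
novel = np.clip(rng.normal(0.65, 0.18, 5000), 0, 1)  # but novel overlaps
print(equal_error_rate(known, novel))  # crossing error rate exceeds 20 percent
```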