Because a model of every aspect of the world would
be extremely complex, this equation tells us that to
learn the parameters of such a model, we would need
an extremely large set of training data.
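The relation the text appeals to can be sketched in standard statistical-learning notation (an illustrative form, not necessarily the article's exact equation):

```latex
% With n training examples and a model class of complexity d
% (for example, its VC dimension), a typical generalization bound reads:
\[
  \underbrace{\text{test error}}_{\text{error rate}}
  \;\le\; \text{training error} \;+\; O\!\left(\sqrt{\tfrac{d}{n}}\right).
\]
% Holding the error rate fixed, the required data size n must grow
% roughly in proportion to the model complexity d.
```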
These arguments drive us to the conclusion that
every AI system will need to act without having a
complete and correct model of the world.
Digression One: Uncertainty
and the History of AI
Before we explore methods for achieving robust AI
systems, let’s pause for a moment to consider the role
of uncertainty in the history of AI. We can divide this
history into three periods. The period from 1958 to
1984 can be called the period of the known knowns.
AI research focused on reasoning and search: methods for theorem proving, planning in deterministic, fully observed worlds (the blocks world), and games of perfect information (checkers and chess). Such fully known worlds are not devoid of uncertainty, but the uncertainty can be resolved by deeper search and additional reasoning. The uncertainty is a consequence of incomplete computation rather than lack of knowledge (Dietterich 1986).
Beginning around 1980, AI researchers started
attacking applications, such as medical diagnosis, in
which observations (for example, symptoms, lab
tests) are processed to make uncertain inferences
about hidden variables (such as diseases). The field of
uncertainty in AI was founded, and Judea Pearl and
colleagues developed practical ways to deploy probability theory to represent uncertainty about the values of large sets of variables (Pearl 1988). This period,
from 1980 to the present, could be called the period
of the known unknowns. The dominant methodology is to identify those variables whose values are
uncertain, define a joint probability distribution over
them, and then make inferences by conditioning on
observations. A wave of textbooks have been published with titles such as Probabilistic Graphical Models (Koller and Friedman 2009), Probabilistic Robotics
(Thrun, Burgard, and Fox 2005), Machine Learning: A
Probabilistic Perspective (Murphy 2012), and, of course,
Artificial Intelligence: A Modern Approach (Russell and
Norvig 2009). Recently, probabilistic programming
languages have been developed to make it easy to
define and reason with highly complex probabilistic
models (Gordon et al. 2014, Pfeffer 2016).
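The known-unknowns methodology described above — identify the uncertain variables, define a joint distribution over them, and infer by conditioning on observations — can be illustrated with a minimal sketch. The disease/test numbers and function names below are invented for illustration:

```python
# A toy "known unknowns" model: Disease is hidden, Test is observed.
# All probabilities here are made up for the example.

p_disease = {True: 0.01, False: 0.99}        # prior P(Disease)
p_test_given_disease = {                     # likelihood P(Test | Disease)
    True:  {True: 0.95, False: 0.05},        # sensitivity 95%
    False: {True: 0.10, False: 0.90},        # false-positive rate 10%
}

def joint(disease, test):
    """Joint probability P(Disease = disease, Test = test)."""
    return p_disease[disease] * p_test_given_disease[disease][test]

def posterior_disease(test_observed):
    """Condition on the observation: P(Disease | Test = test_observed)."""
    evidence = sum(joint(d, test_observed) for d in (True, False))
    return {d: joint(d, test_observed) / evidence for d in (True, False)}

post = posterior_disease(test_observed=True)
print(round(post[True], 4))   # → 0.0876, P(disease | positive test)
```

Probabilistic graphical models and probabilistic programming languages scale exactly this recipe — a joint distribution plus conditioning — to thousands of variables.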
I believe we have now entered a third period of AI—the period of the unknown unknowns. In this period, we must develop algorithms and methodologies that enable AI systems to act robustly in the presence of unmodeled phenomena.
error rate ∝ model complexity / training data size
Digression Two: Robustness
In computer science, an important paradigm for analyzing and solving problems is to formulate them in terms of optimization. By stating the optimization objective, we gain clarity about what counts as a solution, and we can prove guarantees on the correctness of our systems. Examples abound. In machine learning, we often seek maximum likelihood estimates for the parameters in our probabilistic models. In perception, we wish to estimate the depth of each pixel in an image or the most likely sequence of words spoken by a person. In planning, we seek the optimal plan, that is, the plan that maximizes the expected cumulative discounted reward.
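The maximum-likelihood example can be made concrete with a minimal sketch: for a Gaussian model, the log-likelihood has a closed-form optimizer, the sample mean and (biased) sample variance. The data and function names are illustrative only:

```python
# Maximum-likelihood estimation for a univariate Gaussian.
import math

def gaussian_mle(xs):
    """Return (mu, sigma2) maximizing the Gaussian log-likelihood of xs."""
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n   # divides by n, not n - 1
    return mu, sigma2

def log_likelihood(xs, mu, sigma2):
    """Log-likelihood of xs under N(mu, sigma2)."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

data = [2.1, 1.9, 2.4, 2.0, 1.6]
mu, sigma2 = gaussian_mle(data)
# Perturbing the estimate away from the optimum lowers the likelihood:
assert log_likelihood(data, mu, sigma2) >= log_likelihood(data, mu + 0.1, sigma2)
```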
However, the optimization paradigm is not robust:
it assumes that the optimization objective is correct.
The optimum is often attained on the boundary of
the feasible region (such as in linear programming) —
precisely where the model is most likely to be incorrect. In machine learning, for example, maximizing
the likelihood is well known to cause overfitting and
result in poor predictive performance.
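The overfitting failure mode can be shown in a few lines. Under Gaussian noise, maximizing likelihood is equivalent to minimizing squared error, so a degree-9 polynomial fit to 10 noisy points attains near-zero training error while predicting held-out points poorly (a sketch; the specific data and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin  # true underlying function

x_train = np.linspace(0, 3, 10)
y_train = f(x_train) + 0.1 * rng.standard_normal(10)
x_test = np.linspace(0.1, 2.9, 50)
y_test = f(x_test) + 0.1 * rng.standard_normal(50)

# The "optimal" model under the training likelihood interpolates the noise:
coeffs = np.polyfit(x_train, y_train, deg=9)

def mse(x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_mse, test_mse = mse(x_train, y_train), mse(x_test, y_test)
assert train_mse < test_mse   # near-zero train error, worse held-out error
```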
In biological evolution, in contrast, natural selection favors organisms that survive threats from a complex and uncertain environment. The internal models that those organisms
might possess are certainly not complete and may not
even be particularly accurate, but the organisms are
robust. Evolution does not optimize an objective; it
does not necessarily lead to increases in complexity
or intelligence. Instead, it can be viewed as optimizing robustness (Kitano 2004, Whitacre 2012, Félix
and Barkoulas 2015).
Biology also relies on maintaining diverse populations of individuals. This can be viewed as a "portfolio" strategy for robustness, much along the lines that
Minsky suggested in my opening quotation. Even
within individuals, we often find redundancy. Many
organisms have multiple metabolic pathways for producing critical molecules. Each of us has two copies of
our genes, because (with the exception of the sex chromosomes of males) we carry two of each chromosome. This allows recessive alleles to be passed on
to future generations even though they are not
expressed in the current one.
Finally, biological organisms disperse spatially,
which confers robustness to spatially localized disturbances such as droughts, fires, landslides, and diseases.
Perhaps biology has lessons for us as we seek to create robust AI systems?
Approaches to Robust AI:
The Known Unknowns
The goal for the remainder of the article is to make an
inventory of ideas within the AI community for
improving the robustness of our systems. I will begin