“Does it look like an arch?” and also “Would I need to change hands to push something through it?” (which, if I’m evaluating the St. Louis Arch, would require me to imagine I am a giant). This redundancy makes us more robust to unusual arches.
Unfortunately, virtually all of our current AI systems understand things only one way. Consider, for example, the recent work on image captioning. Figure 15 shows the output of the Berkeley image-captioning system. The result seems impressive until you realize that the computer vision system has a very narrow understanding of cats, chairs, and sitting. It has developed a good model of the kinds of images that people will label with keywords such as cat and chair. This is an impressive accomplishment, because there is a high degree of variability in the appearance of these objects. However, this vision system has not learned to localize these objects within the image, so it knows nothing about the typical size of cats versus chairs, for example. It chooses to include the word sitting based on word co-occurrence statistics: when people write captions for images that contain both cats and chairs, they often use the word sitting.
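The caption-level co-occurrence statistics described above can be sketched in a few lines. This is a minimal illustration, not the Berkeley system's actual method; the toy caption corpus and stopword list here are invented for the example (a real system would be trained on a large caption set such as MS COCO).

```python
from collections import Counter

# Hypothetical toy caption corpus, invented for illustration.
captions = [
    "a cat sitting on a chair",
    "a cat sitting on a wooden chair",
    "a dog lying on the floor",
    "a cat sleeping on a sofa",
    "a chair next to a table",
]
stopwords = {"a", "the", "on", "to", "next"}

# For captions that mention both "cat" and "chair", count which other
# (non-stopword) words appear alongside them.
target = {"cat", "chair"}
co_occurring = Counter()
for caption in captions:
    words = set(caption.split()) - stopwords
    if target <= words:
        co_occurring.update(words - target)

print(co_occurring.most_common(2))  # [('sitting', 2), ('wooden', 1)]
```

A system relying on such counts will emit sitting whenever cat and chair are detected, whether or not the cat is actually sitting, which is exactly the brittleness the passage describes.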
Beyond the task of linguistic description, the system doesn’t know anything about the typical context in which a cat is sitting on a chair. It doesn’t know that there is a human who owns the cat and the chair. It doesn’t know that the cat is preventing the human from sitting on the chair and that the human is often annoyed by this because the cat also leaves hair on the chair. It therefore can’t predict that the cat will soon not be sitting on the chair.
In my view, an important priority for AI research is to find ways to give our computers multifaceted understanding of the world. To do this with machine learning, we need to give our computers experience performing tasks, achieving goals through natural language dialogue, and interacting with other agents. The greater the variety of tasks that the computer learns to perform, the larger the number of different facets it will acquire, and the more robust its knowledge will become.
Figure 14. Performance of SATzilla on the HANDMADE Problem Set (x-axis: runtime in CPU seconds). Originally published in Xu et al. (2008) (figure 8, p. 594). Reprinted with permission.