Speech and Natural Language Processing
Knowledge Representation, including knowledge
graph statistics and more
Again, these candidate areas are illustrative and
will surely evolve by the time the initial index is
defined. The index will also track progress in certain
key application domains that combine performance
on the more specific technical areas, such as self-driving cars, game playing, or conversational bots. Much
will depend on the availability of data and its charac-
teristics. This brings up serious methodological chal-
lenges, more on which will follow in the next section.
The third dimension is aimed at tracking the
impact of AI on society, be it on employment and
other economic aspects, finance, medicine, education, transportation, government, military, and
beyond. If the first two dimensions focus on the production side of AI, this third one focuses on the consumption side. Consumption is arguably the most
important aspect to understand, and the hardest. It’s
not clear what to track, the data is highly diffuse, and
the problem of “credit or blame assignment” (to what
degree AI is responsible for changes taking place)
looms large. We felt that at this stage tackling this
aspect would cause the project to grind to a halt and
decided to forgo it in the first versions of the AI
Index. At a minimum, delivering on the first two
components of the index will provide data for other
researchers to do more complex and grounded analyses of the societal implications. For now, the only nod
in this direction that we feel comfortable giving
is some gauge of public interest in AI, both
the level of interest (for example, as indicated by
Google Trends) and perhaps some measure of
“sentiment analysis” in the general media.
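Such a media-sentiment gauge could start very simply, for example with a lexicon-based score over headlines. The word lists and headlines below are purely hypothetical placeholders; a real pipeline would use a curated sentiment lexicon and a large media corpus.

```python
# Minimal lexicon-based sentiment gauge (hypothetical word lists).
POSITIVE = {"breakthrough", "improves", "beats", "helps"}
NEGATIVE = {"fails", "bias", "threat", "risk"}

def sentiment_score(text):
    """Return (#positive - #negative) / #words, in [-1, 1]."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

# Hypothetical headlines standing in for a media corpus.
headlines = [
    "AI breakthrough beats human champions",
    "Report warns of bias risk in hiring algorithms",
]
scores = [sentiment_score(h) for h in headlines]
```

Averaging such scores over a period would give one crude, but trackable, signal of media sentiment toward AI.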
Finally, we are considering including an element of
subjective, expert commentary. We imagine that the
Index will be published often (perhaps continually),
but that at some cadence (for example, annually) a
report will be issued that will present the findings for
the period. This could be an opportunity for a panel
of experts to provide commentary on it, add information not captured by the Index, and perhaps make
predictions on where AI is headed.
Designing an AI index is not a trivial intellectual task.
We’re not aware of a similar effort to track a scientific
or technological area. There are some measures of
specific aspects of an area, but not of the entire area.
For example, Moore’s law has been very influential,
but of course it hardly captures all aspects of progress
in hardware development. We obviously think attempting a broad index is a worthwhile effort, but
we’re not blind to the challenges. Some of them follow.
A first challenge is the availability of data. There are
some well-established benchmarks in certain areas,
such as vision, machine learning, satisfiability, and
planning. But there’s the “drunk and lamppost” danger of ignoring key areas in which such benchmarks
are lacking. Here again I note that our approach is
to aim for representativeness rather than exhaustiveness. We hope to end with a set of pillars that together span AI reasonably well. And if we (as a community — more on this below) feel that certain key areas
aren’t represented, we hope to help catalyze an effort to create
benchmarks in those areas.
A second challenge is the instability and discontinuity of the data. Financial indexes are an inspiration, but can be misleading if taken literally. In a fast-moving area the benchmarks are a moving target for
two reasons, one shallow and one deep.
In certain areas (such as the annual propositional
satisfiability problem [SAT] competition) the benchmarks are changed simply because they weren’t established with an eye toward tracking progress in a reliable, quantified way. Our philosophy here is twofold.
First, where a stable signal can be extracted from the
existing data, whether through insight into the domain or
algorithmically, we will help extract it. And second, where
this isn’t possible, we will work with the community to
establish a more stable benchmark.
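One simple algorithmic way to extract a stable signal from noisy year-over-year benchmark results is a trailing moving average. A minimal sketch, with made-up scores:

```python
def moving_average(values, window=3):
    """Smooth a noisy series with a trailing moving average.

    Early entries average over however many points are available.
    """
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical year-over-year benchmark scores (noisy).
raw = [0.62, 0.58, 0.71, 0.69, 0.80, 0.76]
smooth = moving_average(raw)
```

The smoothed series trades responsiveness for stability; the window size controls that trade-off.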
More fundamentally, new areas emerge that simply
didn’t exist previously, and existing methods and
data sets inevitably become obsolete as knowledge
and technology advance. Here we will need to continually revisit the components of the index, and
update them to reflect the current status of the field.
In a sense this isn’t that radical; the S&P 500 routinely swaps stocks in and out of the index. But this will
be trickier in the case of the AI Index. First, it’s likely
that more subjective judgment calls will be needed
than in the case of the S&P. But more deeply, we will
need to embrace discontinuity. We imagine a framework of “punctuated continuity” whereby for a period of time progress is tracked in a uniform, measurable way, and at some point the measure is replaced
by a new one, since the old one has been “solved
away.” Rather than be dismayed by this, we should
note and celebrate the achievement, and start tracking the new measure.
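This handover resembles chain-linking in economic indices: when a measure is retired, the new series is rescaled so that its value at the handover point matches the old index, keeping the overall track continuous. A minimal sketch with invented numbers:

```python
def chain_link(old_index, new_measure):
    """Splice new_measure onto old_index, matching at the handover point.

    Assumes the first value of new_measure was observed in the same
    period as the last value of old_index.
    """
    scale = old_index[-1] / new_measure[0]
    return old_index + [v * scale for v in new_measure[1:]]

# Old benchmark saturates at 100; a new, harder benchmark starts at 40.
old = [50.0, 80.0, 100.0]
new = [40.0, 50.0, 60.0]
index = chain_link(old, new)
# index continues smoothly: [50.0, 80.0, 100.0, 125.0, 150.0]
```

The spliced series preserves relative movements within each regime while avoiding an artificial jump at the switchover.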
Finally, there’s a challenge of creating a composite
index out of a heterogeneous set of data. To take a
(relatively) simple example, how do you turn the
number of conference attendees, venture capital
investment dollars, and the number of job openings
into a composite measure of level of activity? Or, how
do you roll up progress in different facets of machine
learning into a composite measure of machine-learning progress? There is a certain methodology of index