early AI work, mostly by Newell and Simon, and the
RAND Corporation publications. And it convinced me
that we could make a logic of discovery out of the paradigm of search, a constrained search. So that was the
focus within which I got to know Ed and came into …
So when Ed offered the opportunity to work on DENDRAL, it was just a godsend because here was an
opportunity — one of the early experiments in
computational philosophy [emphasis added] — to try to do
philosophy but with an empirical bent, namely writing programs that would actually produce something
that was testable. Then I started these discussions with
Carl Djerassi’s postdoc Alan Duffield and his reasoning
process about mass spectrometry and the interpretation of mass spectra was just exactly what I needed in
order to instantiate some of those ideas about capturing knowledge, about data interpretation, and then,
subsequently, theory formation.
I think you’ve got to contrast this work with
other work that was going on at the same time in
which people were acting as their own experts. I could
not, by any means, claim to be an expert in chemistry
or certainly not mass spectrometry. There were other
people though: like Joel Moses [and his colleagues] at
MIT, who was an expert in symbolic mathematics; and
Tony Hearn in symbolic algebra; Ken Colby in psychiatry; and Todd Wipke in chemistry. These people were also
doing knowledge elicitation but it was from their own
heads, so it was more like just introspection.
As Buchanan showed, modeling the expertise of
others as opposed to introspective self-modeling did
not fully distinguish the subfield of expert systems
from other areas of artificial intelligence work.
Rather, the development of expert systems relied on
mixtures of both kinds of modeling.
Whether from self-modeling or modeling of others, Buchanan and others created a particular kind of representation of the modeled knowledge known as production rules, a system of “If, Then” statements.
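The idea is simple enough to sketch in a few lines: each rule pairs a condition with a conclusion, and an interpreter repeatedly fires any rule whose condition is satisfied by working memory until nothing new can be added. The sketch below is a generic illustration of forward chaining, not DENDRAL's or MYCIN's actual machinery, and the toy rules in it are invented:

```python
# Minimal forward-chaining production-rule interpreter (illustrative sketch).
# Each rule is (if_facts, then_fact): when every fact in if_facts is
# present in working memory, then_fact is added.

def run_rules(rules, facts):
    facts = set(facts)
    changed = True
    while changed:                        # keep firing until quiescence
        changed = False
        for condition, conclusion in rules:
            if set(condition) <= facts and conclusion not in facts:
                facts.add(conclusion)     # the rule "fires"
                changed = True
    return facts

# Hypothetical toy rules, purely for illustration.
RULES = [
    (("fever", "rash"), "suspect-measles"),
    (("suspect-measles",), "recommend-isolation"),
]

print(run_rules(RULES, ["fever", "rash"]))
```

Note that the rules sit in a plain data structure outside the interpreter, which is what made this representation attractive: the knowledge could be inspected, edited, or learned without touching the program.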
There was a logician, Emil Post, who published a paper in 1943 using “production rules” as a complete logical system. That certainly has to be one of the precursors of our work on production systems. Although we weren’t following it directly, it was certainly there.
Art Samuel’s work on the checker player: Art had interviewed experts to understand … the feature vector and
then he did a good deal of reading about checkers.…
And the influential part about that was … his machine
learning component — that once you had the expertise in a first-order form, it could be improved …
automatically. That impressed me a great deal and I
always wanted to be able to do that.
So we subsequently developed a learning program we
called META-DENDRAL that did learn the rules of
mass spectrometry from empirical data. A footnote on
that. The data were very sparse. It took about one
graduate student one year to obtain and interpret one
mass spectrum, so we couldn’t ask for very much data.
This was not a big data problem. And we substituted
knowledge for data, and we continued to believe, I continue to believe, that that’s a good trade-off when you don’t have enough data for the big-data kind of learning.
So just three other things:
John McCarthy’s paper “Programs with Common
Sense” made a very strong case that whatever knowledge a program was using, it had to be in a form that could be changed from the outside … that was
something Art Samuel was doing with the feature vector weights, but something also we were doing with
the DENDRAL rules of mass spectrometry that made a
very big difference.
Now, Bob Floyd and Allen Newell developed a production rule compiler at CMU [Carnegie Mellon University] and that led to [Feigenbaum’s PhD student]
Don Waterman’s work on representing the knowledge
about poker play in a production system. Don’s work
was extremely influential in giving us the sense that
that was the way to do it.
And, finally, Georgia Sutherland had been working
with Joshua Lederberg on knowledge elicitation and
putting that knowledge into separate tables. They
were not rules, they were constraints for the chemical
structure generator, but they were referenced in a way
that they could be changed as data. Those were in my
mind the most important precursors.
This is not to say that Buchanan and others
believed that these production rules were the last
word in modeling human expert knowledge in a
computer. When asked if he believed that the representation of knowledge as a rule had limitations,
Buchanan replied, “We saw a lot.” He continued:
And our friends at MIT and elsewhere were quick to
point out others. We wanted to be testing the limits of
a very simple production rule architecture and we
knew it was limited, we just didn’t know quite where
it would break and why. So that was the nature of
many of the experiments that we subsequently published in the MYCIN book [Rule-Based Expert Systems
by Bruce Buchanan and Edward Shortliffe] and I
would encourage people to take a look.
But let me quote from that, “Our experience using
EMYCIN to build several expert systems has suggested
some negative aspects to using such a simple representation for all of the knowledge. The associations that
are encoded in rules are elemental and cannot be further examined except with,” some additional text that
we put into some extra ad hoc slots. So, continuing the
quote, “A reasoning program using only homogeneous
rules with no internal distinctions among them thus
fails to distinguish among several things: chance associations, statistical correlations, heuristics based on experience, causal associations, definitions, knowledge about structure, taxonomic knowledge,” all of
those were things that we were failing to capture in the
very simple, more or less flat organization.
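The complaint in the quoted passage, that a homogeneous rule base erases the difference between, say, a causal association and a mere statistical correlation, can be made concrete with a hypothetical sketch. Here each rule carries a tag naming the kind of knowledge it encodes, so a program could, for instance, restrict an explanation to causal rules; the rule contents and tag names are invented for illustration:

```python
# Hypothetical sketch: "If, Then" rules annotated with the kind of
# knowledge each encodes, addressing the homogeneity problem described
# in the quote. Rule contents are invented for illustration.

RULES = [
    # (condition, conclusion, kind of knowledge)
    (("smoke",), "fire", "causal"),
    (("umbrellas-open",), "rain", "statistical"),
    (("bachelor",), "unmarried", "definitional"),
]

def conclusions(facts, kinds=None):
    """Fire every matching rule, optionally restricted to some kinds."""
    facts = set(facts)
    return {then for (cond, then, kind) in RULES
            if set(cond) <= facts and (kinds is None or kind in kinds)}

print(conclusions(["smoke", "umbrellas-open"]))              # all rule kinds
print(conclusions(["smoke", "umbrellas-open"], {"causal"}))  # causal rules only
```

A flat rule base, by contrast, would fire both rules identically and could give no account of why one conclusion rests on firmer ground than the other.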
The modeling of human experts’ knowledge in
expert systems as production rules was provisional,
intended to reveal what kinds of performance they
could produce and what they could not.
From Buchanan’s involvement with knowledge
engineering into the middle 1980s, he drew three