humans have the passion and consistency to tag and
manage their own unstructured data ... look at your
own hard drive or your organization’s file shares if you
doubt it. This is one of the main reasons why so much
unstructured corporate data is “lost in the cloud.” It
may be there, but you are likely to struggle to find it if
you didn’t write it yourself. More than half of employees in companies surveyed worldwide express deep
dissatisfaction with the findability of corporate information. 4 In contrast to Internet content, today it is
rare to see search engine optimization applied to
But now armed with billions of crowdsourced
examples from the web, we have learned that data-driven, statistical methods are “unreasonably effective” in several domains. The statistics bring the ability to deal with noise and to cover problems where
humans either have difficulty explaining how they
do it, or where they don’t do it very well in the first
The bottom line is that machine learning is a way
around the knowledge-acquisition bottleneck in a
surprisingly broad number of domains, but two
caveats are worth considering:
Howie Shrobe made an observation that rings true. “...
when you look closer at successful statistical approaches, a lot of the success is in the choice of features to
attend to or other similar ways of conveying human
insight to the technique ...” (private communication).
Indeed, mitigating this problem is a focus of some
research on deep learning algorithms — to learn feature representations from unlabeled data. 5
There is a very long tail on the types of problems
encountered in the world. Developers will not have
millions of examples for all of them. In those cases,
some kind of reasoning is essential; for example, from
basic principles captured via case-based reasoning or
encoded in a rule-based system.
Apps Can Be Built with Components
That Reason from Different Starting Points.
In the early days of expert systems running on
machines with relatively little processing power and
memory, the standard starting point for delivering
domain and task-specific knowledge can be characterized by labels like slow, cognition, search, top-down,
Today, armed with the compute power, data, and
machine-learning algorithms now available to us, we
are much better equipped to build apps that reason
Figure 7. The Dipmeter Advisor System.