percent of the images of novel objects were misclassified as belonging to one of the 54 taxa. This is unacceptably high.
Several research groups have been studying the
problem of open category learning (Scheirer et al.
2013; Da, Yu, and Zhou 2014; Bendale and Boult
2015; Rudd et al. 2016; Steinhardt and Liang 2016).
At Oregon State we have been experimenting with
the architecture shown in figure 11 in which each
input query x is first analyzed by an anomaly detector to compute an anomaly score A(x). If A(x) is
greater than a specified threshold τ, the query is
judged to be anomalous relative to the training examples and rejected. If the anomaly score is smaller than
the threshold, then the trained classifier makes its
prediction.
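In code, the gating rule is only a few lines. The sketch below is illustrative rather than a description of our implementation: scikit-learn's IsolationForest and a logistic-regression classifier stand in for the actual detector and classifier, the rejection label is a placeholder, and the threshold τ is left as a parameter.

```python
# Illustrative sketch of the figure 11 architecture (assumed components):
# reject a query x as novel when its anomaly score A(x) exceeds tau,
# otherwise return the prediction of the trained classifier.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression


class AnomalyGatedClassifier:
    def __init__(self, tau):
        self.tau = tau                                        # rejection threshold
        self.detector = IsolationForest(random_state=0)       # placeholder anomaly detector
        self.classifier = LogisticRegression(max_iter=1000)   # placeholder closed-set classifier

    def fit(self, X, y):
        # Both components see only the known training classes.
        self.detector.fit(X)
        self.classifier.fit(X, y)
        return self

    def predict(self, X):
        # score_samples is higher for more normal points, so negate it
        # to obtain an anomaly score A(x).
        anomaly = -self.detector.score_samples(X)
        labels = self.classifier.predict(X).astype(str)
        return np.where(anomaly > self.tau, "REJECT", labels)
```

Training both components only on the known classes is what lets the detector flag queries that fall outside the training distribution.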
We evaluated this method on the Letter Recognition task from the University of California,
Irvine (UCI) machine-learning repository (Lichman
2013). We trained an isolation forest anomaly detector (Liu, Ting, and Zhou 2012) on the classes corresponding to the letters ‘A’ and ‘B’ and then measured
how well it could detect that new examples belonged
to these classes versus novel classes (the letters ‘C’
through ‘Z’). Figure 12 plots an ROC curve for this
problem. The dot corresponds to applying the
method of conformal prediction (Shafer and Vovk
2008) to assess the confidence in the classifications.
The ROC curve is significantly above and to the left
of the dot, which indicates that the anomaly detector
does a better job than conformal prediction at separating known from novel classes. However, note that in order
to achieve fewer than 5 percent missed alarms (novel
objects incorrectly classified as known), we must suffer a false alarm rate of 50 percent (known objects
rejected as being novel), so there is a lot of room for improvement.
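The following sketch reproduces the shape of this experiment; the OpenML copy of the UCI data, the 70/30 split of the known-class examples, and the use of the forest's score_samples output as A(x) are assumptions made for illustration, not the exact protocol reported here.

```python
# Illustrative sketch of the letter-recognition experiment (assumptions:
# the UCI data is fetched from OpenML as "letter", and a 70/30 split of
# the known classes is used for training versus evaluation).
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = fetch_openml("letter", version=1, return_X_y=True, as_frame=False)

known = np.isin(y, ["A", "B"])            # the two known classes
X_train, X_test_known = train_test_split(X[known], test_size=0.3, random_state=0)
X_novel = X[~known]                       # letters 'C' through 'Z'

detector = IsolationForest(random_state=0).fit(X_train)

# Higher score = more anomalous; label 1 marks a truly novel example.
scores = -detector.score_samples(np.vstack([X_test_known, X_novel]))
truth = np.concatenate([np.zeros(len(X_test_known)), np.ones(len(X_novel))])

fpr, tpr, thresholds = roc_curve(truth, scores)
print("area under the ROC curve:", roc_auc_score(truth, scores))
```

Sweeping a threshold τ over these scores traces out the ROC curve; each choice of τ fixes one trade-off between missed alarms (novel examples accepted as known) and false alarms (known examples rejected as novel).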
Figure 10. Images of Some Freshwater Macroinvertebrates.
One thing that makes open category classification