30 AI MAGAZINE
Given the importance of both identifying and
designing the right contextual cues for autonomous
system operation, a central design question for such
systems, particularly those that are safety-critical like
those for cars and manufacturing or surgical robots, is
how designers know which cues are the right ones to
emphasize in a display, and when. For example, how
does a designer of the display in figure 1 know that the
message at the bottom of the display, or an accompanying
aural warning, is unlikely to be effective?
Other critical related design questions include how
exogenous cues, such as road signage and other displays
in the vehicle like the large map display in the inset
of figure 1, influence the perception of intended design
cues. Although investigating how individuals interpret cues provides useful insight,
designers of mass consumer products need to understand potentially large population effects such as the
role of culture, age, and experience. Thus, there is a
need for a design evaluation method that can analyze large data sets and identify not only individual interaction
strategies but also the influence of potentially secondary variables such as demographic characteristics.
Currently, designers of interfaces for autonomous
systems evaluate how well their cue selections match
mental models and align with system needs by con-
ducting surveys, focus groups, individual usability
testing, and on occasion, statistical hypothesis test-
ing — which typically takes the form of A-versus-B
testing (that is, testing two competing versions of
a design to determine which display more often
produces the desired behavior). Although such meth-
ods contribute to a designer’s understanding, most
of these methods (other than inferential testing) are
highly subjective. While useful for understanding
preferences, subjective evaluations may not address
the true effectiveness of intended design cues.
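As an illustration of the A-versus-B testing described above, the following minimal Python sketch compares two hypothetical display versions with a pooled two-proportion z-test. The function name, the participant counts, and the choice of this particular test are assumptions made for illustration; they are not drawn from any study discussed here.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided, pooled two-proportion z-test.

    success_a / n_a: participants who showed the desired behavior with
    display A, out of all participants who saw display A (likewise for B).
    Returns (z statistic, two-sided p-value).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis that A and B do not differ.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: drivers who returned their hands to the wheel
    # after seeing display A (60 of 100) versus display B (45 of 100).
    z, p = two_proportion_z_test(60, 100, 45, 100)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up counts the difference falls below the conventional 0.05 threshold. The same response rates observed with only 20 participants per display (12 versus 9) would not, which hints at how strongly such tests depend on sample size.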
It has long been established that people are generally not effective at determining what cues influence their judgments (Andre and Wickens 1995).
As a result, designers who elect to use focus groups
and subjective surveys to assess their designs for
cue salience are likely to obtain inaccurate results.
Moreover, although hypothesis-driven tests like
A-versus-B testing provide objective results, they
are often costly to develop, and obtaining statistical
significance can be difficult without large sample
sizes. In addition, the hypotheses are typically very
narrow by design, and the ability to see the interaction of various factor effects is often lost as part
of the focus on minimizing model error to increase
Figure 1. Tesla Model S Instrument Display Reminding Driver to Put Hands on the Steering Wheel.
The instrument panel below is directly behind
the steering wheel.