by denoting a specific building or what structures
in the environment the team considers buildings.
The MMI is a gateway device to a shared context between robot and human teammates.
It provides flexible methods for a soldier to task a robot, ranging from high-level commands such as “Screen the back of the building” down to specific goal-driven semantic navigation such as “Go to the town square.” It also enables bidirectional communication: the robot can report its state on a continual basis, and teammates can use speech dialogues to request scene descriptions or explanations of its behavior (for example, “Where are you going?” or “What do you see?”).
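As a minimal sketch of this exchange (our illustration; the schema and names below are assumptions, not the MMI’s actual interface), the two directions of communication might be modeled as a pair of message types:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Command:
    """Soldier-to-robot tasking (hypothetical schema)."""
    utterance: str               # raw speech, e.g., "Screen the back of the building"
    goal: Optional[str] = None   # resolved semantic goal, e.g., "town_square"

@dataclass
class RobotReply:
    """Robot-to-soldier answer to a status or scene query."""
    query: str                   # e.g., "Where are you going?"
    answer: str                  # e.g., "Heading to the town square via the main road."

# Example exchange in both directions.
cmd = Command(utterance="Go to the town square", goal="town_square")
reply = RobotReply(query="What do you see?",
                   answer="One building ahead; two barrels to the left.")
```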
Beyond providing dialogue with the robot and a command input, the MMI further attempts to classify the human teammates’ states to contribute to the team’s shared environmental context. Information about what is in the environment informs the MMI about factors that may be changing the soldier’s decision-making behaviors. Hence, the
MMI is not just a portal into the robot for human
teammates but also a sensor about the humans for
the robot. This sensor facilitates the acquisition
of information for all three categories of context,
capturing what each soldier is doing, where they are, their physiologic and cognitive capacity, and what
information they are communicating to all supporting actors. These specific technical advances to
context-driven AI support natural language communication, world model development, and novel human-state sensing.
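To make the three categories of context concrete, the following is a hedged sketch of the kind of human-state record such an interface might maintain; every field name here is illustrative rather than drawn from the source:

```python
from dataclasses import dataclass, field

@dataclass
class SoldierState:
    """Illustrative human-state record spanning the three categories of context."""
    activity: str            # what the soldier is doing, e.g., "overwatch"
    location: tuple          # where the soldier is, e.g., map coordinates (x, y)
    physiologic_load: float  # 0-1 estimate of physiologic capacity in use
    cognitive_load: float    # 0-1 estimate of cognitive capacity in use
    recent_comms: list = field(default_factory=list)  # messages to supporting actors

state = SoldierState(activity="overwatch", location=(12.0, 4.0),
                     physiologic_load=0.4, cognitive_load=0.7,
                     recent_comms=["Screen the back of the building"])
```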
AI and Natural Language
One important component of the MMI is the development of natural language communication. Natural language is a capability critical to
facilitating direct human-to-robot mission-specific
communication. Speech is the most commonly used
method of interaction among human teammates.
When a team of agents (human and robot) is performing a shared task, the clarity of the communication and how the context is understood are crucial
for the team’s success. How language is understood
directly affects the development of the shared context,
that is, whether teammates interpret the task and
the environment in the same way such that they
can perform the task as one cohesive team. However, our work in developing AI for natural language processing brought to light a number of challenges in integrating robot perception with the associated cognition needed to interpret human-to-robot communication and perform actions that support the team leader’s intent. The key finding of this research was that visual descriptors alone do not provide enough contextual understanding to initiate an appropriate robot response (figure 2).
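This finding can be illustrated with a toy resolver (our own sketch, not the system evaluated in the article): matching candidates on the visual descriptor “barrel” alone leaves two equally good referents, while adding the spatial relation “behind the building” narrows the choice to one:

```python
# Two perceived objects share the visual descriptor "barrel".
candidates = [
    {"label": "barrel", "position": (12.0, 4.0)},
    {"label": "barrel", "position": (3.0, -2.0)},
]
building = {"position": (10.0, 0.0), "facing": (0.0, 1.0)}  # facing +y

def behind(obj, ref):
    """True if obj lies on the side of ref opposite its facing direction."""
    dx = obj["position"][0] - ref["position"][0]
    dy = obj["position"][1] - ref["position"][1]
    fx, fy = ref["facing"]
    return dx * fx + dy * fy < 0  # negative dot product: behind the facing side

by_descriptor = [c for c in candidates if c["label"] == "barrel"]
print(len(by_descriptor))  # 2 -> the visual descriptor alone is ambiguous

by_context = [c for c in by_descriptor if behind(c, building)]
print(by_context)  # 1 -> spatial context resolves the referent
```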
Natural language processing can also reduce ambiguity and improve shared situation awareness by leveraging teammates’ inferences of environmental or social context. This added capability can help overcome current technical limitations.
Figure 5. Assumptive Planning Approach.
This is an example illustration of the assumptive planning approach for a robot given the command, “Stay to the left of the building; then go to the barrel behind the building” (Oh et al. 2015b). Steps 1 and 2 show the camera and LIDAR sensor data. Steps 3 to 6 represent the
hypothesized space for robot reasoning about the spatial constraint behind the building needed to locate the hypothesized barrel as a
target goal and generate a plan. Steps 7 and 8 demonstrate the robot’s ability to continuously update its world model of the environment
and its subsequent plan as it perceives more information about the actual environment.
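The planning loop summarized in Figure 5 can be paraphrased as the following hedged Python sketch (our reconstruction of the idea, not the code of Oh et al. 2015b): the robot fills unobserved space with a hypothesized goal, plans against that assumption, and replans whenever perception updates the world model:

```python
def hypothesize_goal(world):
    """Use the observed barrel if available; otherwise assume one behind the building."""
    if world["barrel"] is not None:
        return world["barrel"]
    bx, by = world["building"]
    return (bx, by - 10.0)  # assumption: the barrel lies ~10 m behind the building

def plan_to(start, goal):
    """Stand-in for a real motion planner; returns a straight-line waypoint list."""
    return [start, goal]

world = {"building": (10.0, 0.0), "barrel": None}
pose = (0.0, 0.0)

# Simulated sensing: nothing new twice, then the actual barrel is observed.
for observation in [None, None, ("barrel", (9.0, -12.0))]:
    if observation is not None:
        name, position = observation
        world[name] = position  # new perception updates the world model
    path = plan_to(pose, hypothesize_goal(world))  # replan on the current model
    print(path)
```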