anyway. If an agent is the sole observer of a POI, it
gains the full value of the POI observation.
The difference reward under partial observability,
Di(PO), is calculated in the same manner as Di, but
with restrictions on what agent i can observe. Because
of these restrictions, two rovers may observe the same
POI from opposite sides without either realizing that
the POI is doubly observed. The second observation does
not increase the system performance, yet both rovers
will credit themselves for it. Likewise, a rover cannot
sense POIs located outside of
its observation radius. This is represented in figure 7.
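The double-crediting effect can be illustrated with a minimal sketch. It is not the reward model used in this work: it assumes a simplified system reward in which a POI contributes its full value whenever at least one rover is within a hypothetical `poi_range`, and it models partial observability with a hypothetical `obs_radius` that limits which rovers and POIs agent i knows about. All function and parameter names are illustrative assumptions.

```python
import math

def system_reward(rovers, pois, poi_range):
    """Simplified G (an assumption): sum the values of POIs that have
    at least one rover within poi_range."""
    total = 0.0
    for px, py, value in pois:
        if any(math.dist((px, py), r) <= poi_range for r in rovers):
            total += value
    return total

def difference_reward(i, rovers, pois, poi_range):
    """Di: system reward with rover i minus system reward without it."""
    others = rovers[:i] + rovers[i + 1:]
    return (system_reward(rovers, pois, poi_range)
            - system_reward(others, pois, poi_range))

def difference_reward_po(i, rovers, pois, poi_range, obs_radius):
    """Di(PO): the same difference, but computed only from the rovers
    and POIs that lie within obs_radius of rover i."""
    me = rovers[i]
    seen_others = [r for k, r in enumerate(rovers)
                   if k != i and math.dist(me, r) <= obs_radius]
    seen_pois = [p for p in pois if math.dist(me, p[:2]) <= obs_radius]
    return (system_reward([me] + seen_others, seen_pois, poi_range)
            - system_reward(seen_others, seen_pois, poi_range))

# Two rovers on opposite sides of one POI, too far apart to see each other.
rovers = [(-3.0, 0.0), (3.0, 0.0)]
pois = [(0.0, 0.0, 10.0)]  # (x, y, value)

full = [difference_reward(i, rovers, pois, 4.0) for i in range(2)]
partial = [difference_reward_po(i, rovers, pois, 4.0, 4.0) for i in range(2)]
print(full)     # each rover's Di with full observability
print(partial)  # each rover's Di(PO) with a limited observation radius
```

With full observability each rover sees that the other already covers the POI, so both differences are zero. With the limited radius neither rover sees the other, so each credits itself with the POI's full value.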
Visualization of Reward Structures
Visualization is an important part of understanding
the inner workings of many systems, but particularly
those of learning systems (Agogino, Martin, and
Ghosh 1999; Bishof, Pinz, and Kropatsch 1992;
Gallagher and Downs 1997; Hinton 1989; Hoen et al.
2004; Wejchert and Tesauro 1991). Especially in costly
space systems, we need additional validation that our
learning systems are likely to work. Performance
simulations can give us good performance bounds in
scenarios that we can anticipate ahead of time. However,
these simulations may not uniformly test the rovers in
all situations that they may encounter. Learning and
adaptation can allow rovers to adapt to unanticipated
scenarios, but their reward functions must still have
high sensitivity and alignment to work. The
visualization presented here can give us greater insight into the
behavior of our reward functions. Our visualizations
can answer important questions such as how often we
think our reward will be aligned with our overall goals
and how sensitive our rewards are to a rover’s actions.
Through visual inspection we can see if there are
important gaps in our coverage, and we can increase
our confidence that a given reward system will work.
The majority of the results presented in this work
show the relative sensitivity and alignment of each
of the reward structures. We have developed a unique
method for visualizing these, which is illustrated in
[Figure 5 labels: Points of Interest Sensor; Points of Interest]
Figure 5. Rover Sensing Diagram.
Each rover has eight sensors: four rover sensors and four POI sensors. They detect the relative congestion of rovers and POIs, respectively, in each of
the four quadrants, which rotate with the rover as it moves.
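The quadrant layout in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the sensor model used in this work: it assumes quadrant 0 covers the first 90 degrees counterclockwise from the rover's heading, and it uses a plain count of objects per quadrant as a stand-in for the congestion measure. The names `quadrant` and `sensor_readings` are hypothetical.

```python
import math

def quadrant(rover_xy, heading, target_xy):
    """Return the index (0-3) of the rover-relative quadrant containing
    target_xy. Quadrants are measured counterclockwise from the rover's
    heading (radians), so they rotate with the rover."""
    dx = target_xy[0] - rover_xy[0]
    dy = target_xy[1] - rover_xy[1]
    relative = (math.atan2(dy, dx) - heading) % (2 * math.pi)
    # min() guards against a rounding case where relative equals 2*pi.
    return min(int(relative // (math.pi / 2)), 3)

def sensor_readings(rover_xy, heading, other_rovers, pois):
    """Return eight readings: four rover counts, then four POI counts.
    Counting objects is an assumed stand-in for congestion."""
    rover_counts = [0, 0, 0, 0]
    poi_counts = [0, 0, 0, 0]
    for r in other_rovers:
        rover_counts[quadrant(rover_xy, heading, r)] += 1
    for p in pois:
        poi_counts[quadrant(rover_xy, heading, p)] += 1
    return rover_counts + poi_counts

# Rover at the origin facing along +x: two rovers ahead-left, one POI behind-right.
readings = sensor_readings((0.0, 0.0), 0.0,
                           [(1.0, 1.0), (2.0, 2.0)], [(-1.0, -1.0)])
print(readings)

# Turning the rover by 90 degrees moves the same target into another quadrant.
print(quadrant((0.0, 0.0), 0.0, (1.0, 1.0)),
      quadrant((0.0, 0.0), math.pi / 2, (1.0, 1.0)))
```

Because the angle is taken relative to the heading, the same world position falls into a different quadrant once the rover turns, which is the sense in which the quadrants rotate with the rover.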