Figure 9. Alignment and Sensitivity Visualization for the Four Reward Types, Projected Onto a Two-Dimensional Space Representative of the State Space. [Plot not reproduced; surviving axis label: "More POIs".]
Note that the perfectly learnable reward P_i has low alignment through most of the space, and the team reward T_i is highly insensitive through most of the space. In contrast, both instances of the difference reward score high on both metrics.