Figure 10. Alignment Visualization for the Perfectly Learnable Reward P_i and the Difference Reward Under Partial Observability, D_i(PO). Both rewards are projected onto the plane within which the rovers operate.
Figure 11. Final Performance Attained Versus Communication Radius for the Different Reward Structures.
Difference rewards maintain robust performance, whereas team rewards lose significant performance when communication is restricted.