artificial neural networks to learn the expected
reward for attacking or fleeing with a particular unit
in a given state (figure 4), and chose the action with
the highest expected reward when in-game. The system learned to beat the built-in StarCraft AI scripting on average only in small, three-unit skirmishes; none of the variations learned to beat the built-in scripting on average in six-unit skirmishes (Shantia, Begue, and Wiering 2011).
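The paper's core idea can be illustrated with a minimal sketch: one small feedforward network per action maps a state-feature vector to a scalar expected reward, and the agent greedily picks the action with the highest prediction. The feature names, network sizes, and weights below are hypothetical, not taken from Shantia, Begue, and Wiering (2011).

```python
import numpy as np

rng = np.random.default_rng(0)

class RewardNet:
    """Tiny feedforward network mapping a state vector to a scalar
    expected reward for one action (illustrative sizes and init)."""
    def __init__(self, n_in, n_hidden=8):
        self.w1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.w2 = rng.normal(scale=0.1, size=n_hidden)

    def predict(self, x):
        h = np.tanh(x @ self.w1)       # hidden layer
        return float(h @ self.w2)      # scalar expected reward

# One network per available action, as in the attack-or-flee setup.
nets = {"attack": RewardNet(3), "flee": RewardNet(3)}

def choose_action(unit_state):
    # Hypothetical features: own health, nearby enemies, enemy distance.
    x = np.asarray(unit_state, dtype=float)
    # Greedy selection: act on the highest predicted expected reward.
    return max(nets, key=lambda a: nets[a].predict(x))

action = choose_action([0.8, 2.0, 5.0])
```

In training, the predicted reward for the chosen action would be nudged toward the observed reward by backpropagation; the sketch shows only the in-game action-selection step.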
RL techniques have also been applied to other RTS
games. Sharma et al. (2007) and Molineaux, Aha, and
Moore (2008) combine case-based reasoning (CBR)
and RL for learning tactical-level unit control in
MadRTS (a description of CBR is presented later in this article). Sharma et al. (2007) were able to increase the learning speed of the RL agent by beginning learning in a simple situation and then gradually increasing its complexity. The resulting agent performed as well as, or better than, an agent trained only in the complex situation. Their system stores its knowledge in cases that pertain to situations it has encountered before, as in
CBR. However, each case stores the expected utility
for every possible action in that situation as well as
the contribution of that case to a reward value, allowing the system to learn desirable actions and situations. It remains to be seen how well it would work
in a more complex domain.
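The case structure described above can be sketched briefly: each case pairs a situation description with a learned expected utility per action, and acting means retrieving the most similar stored case and taking its best action. The feature vectors, similarity measure, and utility values here are hypothetical, not from Sharma et al. (2007).

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    """A stored situation plus the learned expected utility of each
    possible action in that situation (CBR case with RL values)."""
    features: tuple
    utilities: dict = field(default_factory=dict)  # action -> utility

def similarity(a, b):
    # Simple inverse-squared-distance similarity (illustrative choice).
    return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(case_base, features):
    # CBR retrieval: reuse the most similar previously seen situation.
    return max(case_base, key=lambda c: similarity(c.features, features))

def best_action(case):
    # Exploit the learned utilities stored in the retrieved case.
    return max(case.utilities, key=case.utilities.get)

base = [Case((1.0, 0.0), {"attack": 0.7, "flee": 0.2}),
        Case((0.0, 1.0), {"attack": 0.1, "flee": 0.9})]
case = retrieve(base, (0.1, 0.9))
```

After acting, the utilities in the retrieved case would be updated from the observed reward, which is what lets the system learn both desirable actions and desirable situations.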
Molineaux, Aha, and Moore (2008) describe a system for RL with nondiscrete actions. Their system
retrieves similar cases from past experience and estimates the result of applying each case’s actions to the
current state. It then uses a separate case base to estimate the value of each estimated resulting state, and
extrapolates around, or interpolates between, the
actions to choose one that is estimated to provide the
maximum value state. This technique yielded a significant performance increase over a version using discrete actions (Molineaux, Aha, and Moore 2008).
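One way to picture the interpolation step is as a value- and similarity-weighted blend of the continuous action vectors stored in retrieved cases. This is a minimal sketch of that idea under assumed data structures; it is not the actual CASSL algorithm of Molineaux, Aha, and Moore (2008).

```python
import numpy as np

def interpolate_action(cases, state):
    """Blend the stored continuous actions of retrieved cases,
    weighting by similarity to the current state and by the
    estimated value of each case's outcome (illustrative scheme)."""
    # cases: list of (state_vector, action_vector, estimated_value)
    dists = np.array([np.linalg.norm(state - s) for s, _, _ in cases])
    sims = 1.0 / (1.0 + dists)                  # similarity to current state
    vals = np.array([v for _, _, v in cases])
    weights = sims * np.maximum(vals, 1e-9)     # favour similar, high-value cases
    weights /= weights.sum()
    actions = np.array([a for _, a, _ in cases])
    return weights @ actions                    # interpolated continuous action

cases = [(np.array([0.0, 0.0]), np.array([1.0, 0.0]), 0.9),
         (np.array([1.0, 1.0]), np.array([0.0, 1.0]), 0.3)]
act = interpolate_action(cases, np.array([0.1, 0.1]))
```

The blended action stays within the convex hull of the retrieved actions, which is one reason interpolating between known-good continuous actions can outperform snapping to a coarse discrete action set.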
Human critique is added to RL by Judah et al.
(2010) in order to learn tactical decision making for
Figure 4. Game State Information Fed into a Neural Network to Produce an Expected Reward Value for a Particular Action.
Adapted from Shantia, Begue, and Wiering (2011).
Table 1. AI Techniques Used for StarCraft.
Tactical Decision Making | Strategic Decision Making and Plan Recognition
State Space Planning