Constructing Temporal Abstractions Autonomously in Reinforcement Learning

Pierre-Luc Bacon, Doina Precup

The idea of temporal abstraction, that is, learning, planning, and representing the world at multiple time scales, has been a constant thread in AI research, spanning subfields from classical planning and search to control and reinforcement learning. While temporal abstraction is a very natural concept, learning these abstractions without human input has proved quite daunting. In this paper, we present a general architecture called option-critic for learning temporal abstractions end to end from the agent’s experience. This approach allows for continual learning and provides interesting qualitative and quantitative results in several tasks.

Intelligent systems have the ability to adapt and generalize quickly in the presence of change and uncertainty in their environments. Fundamentally, the success of their adaptation and learning strategies hinges on the quality of their representations. Simon (1969, 132), in fact, argued that “[s]olving a problem simply means representing it so as to make the solution transparent.”
Building good representations is a long-standing challenge in artificial intelligence. In this article, we examine this problem in the context of reinforcement learning, the learning paradigm in which an agent interacts with its environment by making observations, choosing actions, and receiving feedback in the form of a numerical reward. The goal of the agent is to maximize the expected cumulative reward. Since the environment might be enormous (as in the case of the game of Go, for example), and the reward may be sparse, a good representation needs to generalize well not only over observations (or perceptions) but also over multiple time scales.
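To make this interaction loop concrete, the following minimal Python sketch runs one episode in a toy environment and accumulates the discounted return. The two-state ToyEnv, the random action choice, and the discount factor gamma are illustrative assumptions, not part of the architecture described in this article.

```python
import random


class ToyEnv:
    """A hypothetical two-state environment, used only to illustrate the loop."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Reward of 1 when the action matches the current state, 0 otherwise.
        reward = 1.0 if action == self.state else 0.0
        self.state = 1 - self.state      # deterministic state transition
        done = random.random() < 0.1     # episode terminates with probability 0.1
        return self.state, reward, done


def run_episode(env, gamma=0.99):
    """Interact with env until termination; return the discounted sum of rewards."""
    obs = env.reset()
    done, ret, discount = False, 0.0, 1.0
    while not done:
        action = random.choice([0, 1])   # placeholder for a learned policy
        obs, reward, done = env.step(action)
        ret += discount * reward         # accumulate gamma^t * r_t
        discount *= gamma
    return ret


if __name__ == "__main__":
    print(run_episode(ToyEnv()))
```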