Partitioning. In Proceedings of the 22nd International Conference on Machine Learning (ICML’05), 816–823. New York:
Association for Computing Machinery.
Singh, S. P. 1992. Reinforcement Learning with a Hierarchy
of Abstract Models. In Proceedings of the 10th National Conference on Artificial Intelligence, 202–207. Menlo Park, CA: AAAI Press.
Stolle, M., and Precup, D. 2002. Learning Options in Reinforcement Learning. In Abstraction, Reformulation and
Approximation, 5th International Symposium (SARA
2002). Lecture Notes in Computer Science 2371, 212–223.
Berlin: Springer. doi.org/10.1007/3-540-45622-8_16
Sutton, R. S. 1984. Temporal Credit Assignment in Reinforcement Learning. Ph.D. dissertation, University of Massachusetts Amherst.
Sutton, R. S. 1988. Learning to Predict by the Methods of
Temporal Differences. Machine Learning 3(1): 9–44.
Sutton, R. S. 1995. TD Models: Modeling the World at a Mixture of Time Scales. In Machine Learning, Proceedings of the
Twelfth International Conference on Machine Learning, 531–
539. Amsterdam, The Netherlands: Elsevier.
Sutton, R. S. 2012. Beyond Reward: The Problem of Knowledge and Data. In Inductive Logic Programming: 21st International Conference, 2–6. Berlin: Springer.
Sutton, R. S., and Barto, A. G. 1998. Introduction to Reinforcement Learning. Cambridge, MA: The MIT Press.
Sutton, R. S.; McAllester, D. A.; Singh, S. P.; and Mansour, Y.
1999. Policy Gradient Methods for Reinforcement Learning
with Function Approximation. In Advances in Neural Information Processing Systems 12, 1057–1063. Cambridge, MA:
The MIT Press.
Sutton, R. S.; Precup, D.; and Singh, S. P. 1998. Intra-Option
Learning About Temporally Abstract Actions. In Proceedings
of the Fifteenth International Conference on Machine Learning
(ICML’98), 556–564. San Francisco: Morgan Kaufmann.
Sutton, R. S.; Precup, D.; and Singh, S. P. 1999. Between
MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence 112(1–2): 181–211. doi.org/10.1016/S0004-3702(99)00052-1
Tanner, B.; Bulitko, V.; Koop, A.; and Paduraru, C. 2007.
Grounding Abstractions in Predictive State Representations.
In Proceedings of the 20th International Joint Conference on
Artificial Intelligence (IJCAI’07), 1077–1082. San Francisco,
CA: Morgan Kaufmann Publishers Inc.
Tesauro, G. 1995. Temporal Difference Learning and TD-Gammon. Communications of the ACM 38(3): 58–68.
Tesauro, G.; Gondek, D.; Lenchner, J.; Fan, J.; and Prager, J.
M. 2013. Analysis of Watson’s Strategies for Playing Jeopardy! Journal of Artificial Intelligence Research 47: 205–251.
Thrun, S., and Schwartz, A. 1995. Finding Structure in Reinforcement Learning. In Advances in Neural Information Processing Systems 7. Cambridge, MA: The MIT Press.
Watkins, C. 1989. Learning from Delayed Rewards. Ph.D. dissertation, King’s College, Cambridge, UK.
Pierre-Luc Bacon is a PhD candidate in the School of Computer Science at McGill University. His research interests are in reinforcement learning. He received the Outstanding Student Paper Award at AAAI 2017 for his work on the option-critic architecture.
Doina Precup shares her time between McGill University,
where she codirects the Reasoning and Learning Lab, and
DeepMind Montreal. Her research interests are in machine
learning, especially reinforcement learning, reasoning and
planning under uncertainty, and applications of these
methods. She became a senior member of AAAI in 2015, a
Canada Research Chair in 2016, and a senior fellow of CIFAR