“solvable” instances are used for evaluation.
Second, while replaying hints from human games
offers the great benefit of fully automated evaluation,
successive hints in the replay do not take previous
guesses into account, as they did when the original
game was played by human players.
Unless competent Describer agents are implemented (which would, however, make a Guesser's performance dependent on the quality of the Describer it plays with), the only solution to this problem is to introduce human-based evaluation, in which a Guesser plays against a human Describer who can take past guesses into account. This would undoubtedly also encourage the implementation of more interesting dialogue strategies in Guesser agents, which is something we would like to see.
We believe that, even if these two problems were
solved, the scenario would remain challenging in the
future — after all, the success of the commercial
board game version suggests that humans find it
challenging enough to get enjoyment and suspense
out of playing it repeatedly.
Future Plans
We aim to continue running the competition, and expect its next installment to take place in spring 2018, with results presented at the joint IJCAI-ECAI conference in Stockholm, Sweden, in July 2018. We are currently planning to add a Describer track alongside the current Guesser track, and to explore human-based evaluation as an additional way of assessing entries.
Using data from unsuccessful games is an avenue we wish to explore further. However, our experience with the difficulty of the task even when only successful human games are used suggests that this direction will only become relevant once submitted solutions achieve higher performance on the current, simpler task.
Acknowledgements
This research has been funded by the European Community's Seventh Framework Program (FP7/2007-2013) under grant agreement no. 607062, ESSENCE: Evolution of Shared Semantics in Computational Environments. We wish to thank the participants in the competition, and the volunteers who helped with the process of crowdsourcing Taboo words and human games.
Notes
1. See www.essence-network.com/challenge for further
details.
2. www.crowdflower.com.
3. The app is available from the Google Play Store at play.google.com/store/apps/details?id=com.guessence.iiia.essence.
References
Adrian, K.; Bilgin, A.; and Van Eecke, P. 2016. A Semantic Distance–Based Architecture for a Guesser Agent in ESSENCE's Location Taboo Challenge. Paper presented at the International Workshop on Diversity-Aware Artificial Intelligence (DIVERSITY 2016), The Hague, the Netherlands, July 26.
Clark, P., and Etzioni, O. 2016. My Computer Is an Honor Student — but How Intelligent Is It? Standardized Tests as a Measure of AI. AI Magazine 37(1): 5–12. doi.org/10.1609/aimag.v37i1.2636
Ferrucci, D.; Brown, E.; Chu-Carroll, J.; Fan, J.; Gondek, D.; Kalyanpur, A. A.; Lally, A.; Murdock, J. W.; Nyberg, E.; Prager, J.; Schlaefer, N.; and Welty, C. 2010. Building Watson: An Overview of the DeepQA Project. AI Magazine 31(3): 59–79. doi.org/10.1609/aimag.v31i3.2303
Levesque, H. J. 2011. The Winograd Schema Challenge. In Logical Formalizations of Commonsense Reasoning — Papers from the AAAI 2011 Spring Symposium. Technical Report SS-11-06.
Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; Dieleman, S.; Grewe, D.; Nham, J.; Kalchbrenner, N.; Sutskever, I.; Lillicrap, T.; Leach, M.; Kavukcuoglu, K.; Graepel, T.; and Hassabis, D. 2016. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 529(7587): 484–489. doi.org/10.1038/nature16961
Michael Rovatsos is a reader at the School of Informatics of the University of Edinburgh, where he has led the Agents Group since 2004. He has published over 90 papers on multiagent systems, on topics including agent communication, multiagent planning, multiagent learning, and argumentation. He is the overall coordinator of the 4-million-euro ESSENCE Marie Curie Initial Training Network, which conceived of and organized the Taboo Challenge Competition.
Dagmar Gromann is a postdoctoral researcher at the Artificial Intelligence Research Institute (IIIA) in Spain and an experienced researcher in the ESSENCE Network. Her research focuses on learning cognitive schemas and knowledge representations from multilingual texts using machine learning and distributional semantics approaches, as well as aligning domain-specific resources.
Gábor Bella is a research associate at the University of Edinburgh and at the University of Trento. He is a senior member of the ESSENCE Network. His main area of study is multilingualism in computer systems, with a current focus on cross-lingual and domain-aware semantic interoperability (for example, data integration, ontology matching) over structured data sets.