Matei Zaharia (Stanford University and Databricks)
presented MLflow, an open-source platform for managing
the machine-learning lifecycle that supports
experiment tracking, comparing experiments, reusable
workflows, and more. Matei argued that reproducibility actually matters more for practitioners than
for scientific researchers and discussed why developing
machine-learning systems is harder than traditional
software development. Odd Erik Gundersen presented
arguments for why a framework for measuring reproducibility is needed and suggested characteristics it should have.
Joel Grus (Allen Institute for AI) shared his not-so-secret dislike of notebooks,
which he argued make reproducibility harder. Many
well-reasoned arguments were made, and good examples of what to do instead were shown. Daniel
Garijo (University of Southern California) explained
the requirements of the scientific paper of the
future. Such papers will contain not
only text but also data, software, experimental setups,
and dependencies, while supporting open science
and digital scholarship. Examples of how this
can be achieved were given. Hugo Jair Escalante
(The ChaLearn Collaboration) presented work on
machine-learning challenges and how these can be
used to establish benchmarks and fair comparisons.
After the lunch break, which was far too short,
Peter Bull (DrivenData) talked about how we
should apply the lessons learned from 50 years of
software development to increase reproducibility.
Prabhat Nagarajan (University of Texas at Austin)
explained how they were able to achieve deterministic implementations of deep reinforcement
learning algorithms. Surprisingly, although the implementations were deterministic on an individual computer, the
parallelism of GPUs caused the results to differ between
computers, so they were not reproducible across machines. Yuandong
Tian rounded off the presentations by explaining
the work at Facebook AI Research on reproducing
AlphaZero on the ELF platform. The presentation discussed how superhuman performance was achieved
and presented a thorough ablation analysis of the system.
After the presentations, a lively panel discussion
was moderated by Yolanda Gil. The panel consisted
of Pascal Van Hentenryck, Ashok Goel, and
Odd Erik Gundersen, but the audience had many
questions and comments as well. The panel discussion
started with short introductions. Professor Van
Hentenryck presented statistics showing that papers
with supplemental material were more likely to be
accepted at AAAI 2019 than papers without. He also
argued for the controversial view that reproducibility
could be linked to publication. Ashok Goel
highlighted a reproducibility issue in
one-shot robot learning from demonstration caused
by variation among human instructors. Ashok also
discussed AI Magazine's commitment
to reproducibility, including a new column on
the topic. Among other things, the panel discussed
how the research community is reluctant to make
reproducibility a key concern even though it is a
cornerstone of science.
Finally, most of the workshop participants joined
in on a working session with the goal of making a
roadmap for how to increase the reproducibility of
results published by AAAI. The participants worked
in four different groups and proposed concrete
actions that the various actors in the research community could take. After discussing
and making a list of actions in the groups, the four
groups presented their proposals to the other participants. After an energetic
final session that ran an hour over time, the workshop concluded. The
workshop was organized by the cochairs Yolanda
Gil, Joelle Pineau, Satinder Singh, and Odd Erik
Gundersen. This report was written by Odd Erik Gundersen.
Guy Barash is affiliated with Western Digital.
Mauricio Castillo-Effen is a senior researcher at Lockheed
Niyati Chhaya is affiliated with Adobe Research.
Peter Clark is a senior research manager at the Allen Institute for Artificial Intelligence.
Huáscar Espinoza is a principal researcher at the Commissariat à
l'Énergie Atomique, France.
Eitan Farchi is affiliated with IBM Research, Haifa.
Christopher Geib is affiliated with SIFT LLC.
Odd Erik Gundersen is the chief AI officer at TrønderEnergi
AS and an adjunct associate professor at the Norwegian University of Science and Technology.
Seán Ó hÉigeartaigh is the executive director of the University of Cambridge’s Centre for the Study of Existential
Risk and program director at the Leverhulme Centre for the
Future of Intelligence.
José Hernández-Orallo is a professor at the Universitat
Politècnica de València, Spain.
Chiori Hori is a principal research scientist at Mitsubishi
Electric Research Laboratories (MERL), USA.
Xiaowei Huang is a lecturer at the Department of Computer
Science, University of Liverpool, UK.
Kokil Jaidka is affiliated with Nanyang Technological University.
Pavan Kapanipathi is a research staff member at IBM
Sarah Keren is affiliated with Harvard University.
Seokhwan Kim is a research scientist at Adobe Research,
Marc Lanctot is a research scientist at DeepMind Alberta,
Danny Lange is vice-president of AI and machine learning
at Unity Technologies.