Type of Paper    Method Code    Experiment Code    Hardware Specification    Software Dependencies
All              0.08 ± 0.03    0.06 ± 0.02        0.27 ± 0.05               0.16 ± 0.04
Academic         0.09 ± 0.03    0.06 ± 0.03        0.30 ± 0.06               0.18 ± 0.05
Collaboration    0.04 ± 0.06    0.04 ± 0.06        0.13 ± 0.10               0.04 ± 0.07
Industry         0.10 ± 0.20    0.10 ± 0.20        0.20 ± 0.26               0.20 ± 0.26
C + I            0.05 ± 0.06    0.05 ± 0.06        0.14 ± 0.09               0.07 ± 0.07
Table 4. The 95-Percent Confidence Interval for the Mean of All Variables of the Factor Experiment for the Different Types of Papers, where $\varepsilon = 1.96\,\hat{\sigma}_{\bar{x}}$ and $\hat{\sigma}_{\bar{x}} = \sigma_x / \sqrt{N}$.
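As a concrete illustration of how a cell in Table 4 is formed, the following sketch computes a mean and the 95-percent half-width ε from a set of per-paper scores. The score vector is hypothetical, standing in for the scores of one variable and one paper type; only the formula mirrors the caption above.

    import math

    def ci95(scores):
        """Return (mean, epsilon) with epsilon = 1.96 * sigma_x / sqrt(N)."""
        n = len(scores)
        mean = sum(scores) / n
        # Sample standard deviation of the per-paper scores; the paper
        # may have used the population form (dividing by n) instead.
        sigma = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
        return mean, 1.96 * sigma / math.sqrt(n)

    # Hypothetical binary scores (1 = documented, 0 = not) for one variable:
    scores = [1, 0, 0, 1, 0, 0, 0, 0]
    mean, eps = ci95(scores)
    print(f"{mean:.2f} ± {eps:.2f}")  # printed in the same form as Table 4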
[Figure: Individual papers' scores on the R1D-, R2D-, and R3D-reproducibility metrics, shown separately and together; the axes and the sizes of the dots encode the papers' scores.]
The results, although not statistically significant, paint a clear picture: the quality of the documentation shared by industry is lower than that of the documentation shared by academia. Given the assumption that poorly documented research results are harder to reproduce than well-documented ones, it should be easier to reproduce results from academia than from the C + I group. Out of the 16 variables that the survey covered, the academic papers score higher on 15 when compared with the C + I group; the variable Problem description has the same score for both. This means that academia scores better on 94 percent of the variables. Academia also scores better on all three factors, as well as on the mean of the reproducibility metrics. The median, however, is the same for academia and C + I on all the reproducibility metrics.
To be fair, the documentation quality of AI research accepted at the top conferences still leaves much to be desired, whether the research is presented by academic researchers, collaborations, or industry researchers.
Does academia share more of the data than industry? The answer is yes: academia scores higher than the C + I group on all four variables describing the Data factor. The results, however, are not statistically significant, except for the variable Training data. Still, the scores for data sharing are relatively high. Academia shares the training data in over 60 percent of the papers, while this is true for only 40 percent of the papers in the C + I group.
Academia shares more code than industry as well, both method code (9 percent versus 5 percent) and experiment code (6 percent versus 5 percent). Industry shares the same amount of code whether it is for setting up the experiment or for implementing the AI method; academia shares more AI method code than code for setting up the experiment.
One of the questions I asked in the introduction was whether one could expect industry to share code more easily than data. The premise was that data hold the most value, as data are used to generate machine learning models, and without the data the value of a model is low. The results refute this premise: the gap runs the other way, and the difference between data sharing and code sharing for industry is large (40 percent versus 5 percent). How can this be so? Does this indicate that industry values the code used for running the experiments more highly than the data? Is the code used when conducting the experiments the same code that will be used in production? That does not sound right.
Typically, experiment code is used for prototyping, and the code that is actually deployed has been through proper quality assurance, especially at large companies; startups might not follow this practice, for obvious reasons. Is there something else behind this? Could it be that industry is less willing than academia to spend time maintaining the code or answering questions about it? Does industry have higher expectations for code quality than academia and therefore does not want to share the code? Or could it be that the code specifies the hyperparameters and other experiment settings, and hence renders the complete experiment transparent? A minimal sketch of this last possibility follows.
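If experiment code does double as documentation, even a short script makes every setting explicit. The sketch below is purely illustrative; the hyperparameter names and values are assumptions of mine, not drawn from any surveyed paper.

    # Hypothetical experiment script. Every name and value below is an
    # illustrative assumption, not taken from any paper in the survey.
    CONFIG = {
        "learning_rate": 1e-3,  # optimizer step size
        "batch_size": 64,       # examples per gradient update
        "epochs": 50,           # passes over the training data
        "dropout": 0.1,         # regularization rate
        "seed": 42,             # fixed seed, so runs are repeatable
    }

    def run_experiment(config):
        """Train and evaluate using exactly the settings in config."""
        print("Experiment settings:")
        for name, value in sorted(config.items()):
            print(f"  {name} = {value}")
        # ... model construction, training, and evaluation go here ...

    if __name__ == "__main__":
        run_experiment(CONFIG)

The point is not the model itself but that the exact value of every setting travels with the code, whereas a prose description can silently omit them.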
Why are industry researchers eight times more willing to share data than code? Is the shared data not that valuable to industry? Does industry share data that are relevant for proving their methods but have little value to competitors? Does industry use open data shared by others to prove their methods and in this way share nothing at all, neither the code nor their own data? I have not investigated these questions in my study.
Hyperparameters could be documented both as
part of the experiment code and in the experiment
description where the setup is explained. Although
the experiment code is not shared to a large degree
(only 5 percent for C + I), the experiment setup is