Figure 3. How Academic Research Compares
to Industry and Academic-Industry Collaboration.
Of the 325 surveyed empirical papers, 265 were written
by researchers from academia only, 47 were collaborations between academia and
industry, and 10 had authors from industry alone.
surveyed empirical research, while figure 5 shows the
same for the combination (C + I) and academic research. When comparing the outline of the spider
plots for academia and all, one can see that they
have very similar forms. This is no surprise, as academic research comprises 81.5 percent of all papers.
Figure 5 shows that academic research has higher or
equal scores on all variables for the factors Method,
Data, and Experiment, as its plots fully envelop
those for the C + I research.
An observation is that most of the scores are quite
low. The only variables scoring higher than 50 percent are Pseudo code, Experiment setup, and Training data. Pseudo code is very good for conveying an
AI method in a concise way, so this is very positive.
The fact that 56 percent of the research papers share
the training data is also very positive. Experiment
setup is the highest-scoring variable with a score
of almost 70 percent. However, I have not checked
whether the experiment can be reproduced based on
the description of the experiment setup, so the descriptions of the experiments might not be complete.
Table 5 shows the mean and median for the three
factors, grouped by research affiliation. The mean
values indicate that the factor Experiment is documented at the same level as Data, and that Method
is documented significantly better for all the surveyed studies. However, the median values of the
factors differ widely with Experiment and Data on
one side and Method on the other, as the median
value for Method is 0.25 while it is 0.00 for the other
two. Hence, the distribution is positively skewed for
Experiment and Data and almost symmetric for
Method. It should be noted that the median values,
surprisingly, are the same for all groups. The factor
Method is, on average, the one best documented.
This observation is supported by both mean and median values. According to the mean values, academic
research is documented better than industry, collaborations, and the combined group of collaborations
and industry research. For the factor Experiment, the
difference between academic research and the combination of industry and collaborations is statistically significant.
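The reasoning from mean and median to skewness can be illustrated with a minimal sketch. The scores below are hypothetical toy values, not data from the survey, and the helper name summarize is an assumption introduced only for this illustration. When more than half of the papers score 0 on a factor, the median is 0.00 while a few well-documented papers pull the mean upward, which is the signature of a positively skewed distribution.

```python
from statistics import mean, median


def summarize(scores):
    """Return (mean, median) of a list of per-paper factor scores."""
    return mean(scores), median(scores)


# Hypothetical per-paper scores for one factor (NOT survey data):
# most papers document nothing, a few document some or all variables.
toy_scores = [0.0, 0.0, 0.0, 0.0, 0.25, 0.5, 1.0]

factor_mean_value, factor_median_value = summarize(toy_scores)

# A mean above the median suggests a long right tail (positive skew).
positively_skewed = factor_mean_value > factor_median_value

print(factor_mean_value, factor_median_value, positively_skewed)
```

A mean larger than the median is only a rough indicator of positive skew, but it is the same comparison the text makes for Experiment and Data.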
Figure 6 shows one bar chart for each of the three
factors. The y-axis of the bar charts is the frequency
and the x-axis represents the mean value of the variables for each of the factors. The bar charts are not
stacked, so the frequency count starts at 0 for all of
them. Let me explain how to interpret the bar charts
by looking at the bar chart for the factor Data.
The x-axis of the bar charts ranges from 0 to 1, and
this range has been divided into five equally sized
partitions, that is, one partition for each variable
that the factor comprises and one partition for those
papers that have documented none of the variables.
As part of the survey, every paper has been scored on
each of the four variables that comprise Data. This
means that a paper that has documented only one of
the four Data variables will have a mean for the factor Data of 0.25. Hence, it will be put into the
partition [0.2, 0.4), and thus increase the frequency
of this partition by 1. If a paper has documented all
of the variables, the mean for the factor will be 1 and
the paper will be put into the partition [0.8, 1.0]. The
bar charts allow us to understand the distribution of
the mean of the factors for all the papers that have
been surveyed. As can be seen, the distributions are
similar for all, academic, and C + I papers. A total
of 203 papers have not documented any of the variables for Experiment while 167 have not documented
any of the variables of Data. Only 18 papers have
not documented any of the variables of Method.
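The way a paper ends up in one of the five partitions can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce figure 6: the function names factor_mean and partition_index and the toy papers are hypothetical, and each paper is represented as a tuple of 0/1 flags, one per variable of the factor.

```python
from collections import Counter


def factor_mean(documented):
    """Fraction of a factor's variables that a paper documents (0/1 flags)."""
    return sum(documented) / len(documented)


def partition_index(score, n_partitions=5):
    """Index of the equal-width partition of [0, 1] that contains score.

    Partitions are half-open, [0.0, 0.2), [0.2, 0.4), ..., except the
    last one, [0.8, 1.0], which is closed so that a score of 1 fits.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return min(int(score * n_partitions), n_partitions - 1)


# Hypothetical papers scored on the four Data variables (NOT survey data).
toy_papers = [
    (1, 0, 0, 0),  # one variable documented  -> mean 0.25 -> [0.2, 0.4)
    (1, 1, 1, 1),  # all variables documented -> mean 1.00 -> [0.8, 1.0]
    (0, 0, 0, 0),  # nothing documented       -> mean 0.00 -> [0.0, 0.2)
    (0, 0, 0, 0),
]

# Frequency per partition index, i.e. the bar heights of one chart.
frequencies = Counter(partition_index(factor_mean(p)) for p in toy_papers)
print(sorted(frequencies.items()))
```

With four variables the possible means are 0, 0.25, 0.5, 0.75, and 1, so each of the five partitions receives exactly one of the possible values.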
Table 6 presents the mean and median scores for
each of the three reproducibility metrics, R1D, R2D,
and R3D. Academic research has the highest scores
for all three reproducibility metrics. Compared
with C + I and collaborations, industry scores higher
on R1D and R3D, but the confidence in the industry