presentations. They will also inquire about choosing
a license and about specifying a descriptive name and
authors for a submitted dataset. AAAI could, as a service, provide a list of recommended data repositories.
This list could be modeled on a service provided by
COPDESS, which is a large coalition for publishing
data in the earth and space sciences. 5 Universities
also offer general repositories, whether developed in-house or as installations of general infrastructure
such as Dataverse. University repositories are typically maintained by library departments, and always
offer DOIs, licenses, and citations.
We encourage maintainers of data repositories that
serve the AI community to adopt mechanisms for
assigning DOIs or persistent URLs (PURLs) to datasets
that they provide. The management of PURLs or DOIs
can be complex. We suggest consulting with organizations such as FORCE11 and the Research Data
Alliance, which have working groups with extensive
and detailed recommendations on this topic.
Basic metadata includes a descriptive title, the
dataset’s authors, and the creation date. Additional metadata helps others understand and reuse the dataset.
Licenses for Data
Recommended licenses for data are Creative Commons licenses, 9 preferably CC-BY (unlimited reuse as
long as there is attribution) or CC0 (unlimited reuse
Permanent Unique Identifiers for Data
Many authors make data available by providing a
URL to their personal or lab pages. These references
may not last long due to changes in sites and in
author affiliations (Klein et al. 2014). Instead, we
encourage authors to use persistent unique identifiers so that their data is always available. DOIs are
managed by data repositories and given to individual datasets or to collections (DeRisi et al. 2003).
Most data repositories provide DOIs, and to do so
they enter into an agreement with a DOI authority.
Another option that anyone can use is PURLs. PURLs
can be assigned by anyone to any web resource using
a trusted service such as the W3C’s w3id. 10 Data
repositories also have the option of using PURLs.
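The benefit of a persistent identifier over a personal URL can be illustrated with a minimal sketch. It models a resolver as an in-memory lookup table; the class name `PurlResolver`, the identifier, and the URLs are hypothetical and chosen only for illustration. Real services such as a DOI authority or w3id perform this redirection over the web rather than in a local dictionary.

```python
class PurlResolver:
    """Toy in-memory resolver mapping a stable identifier to a current URL."""

    def __init__(self):
        self._targets = {}

    def register(self, identifier, target_url):
        # An identifier is assigned once and never reused for another resource.
        if identifier in self._targets:
            raise ValueError(f"identifier already registered: {identifier}")
        self._targets[identifier] = target_url

    def move(self, identifier, new_target_url):
        # Only the target changes; the identifier cited in papers stays the same.
        if identifier not in self._targets:
            raise KeyError(identifier)
        self._targets[identifier] = new_target_url

    def resolve(self, identifier):
        # Return the location the identifier currently points to.
        return self._targets[identifier]


resolver = PurlResolver()
resolver.register(
    "example-lab/dataset-1",  # hypothetical identifier
    "https://old-university.example/~author/data.zip",
)
# The author changes affiliation, so the file moves to a new host.
resolver.move(
    "example-lab/dataset-1",
    "https://new-university.example/data/dataset-1.zip",
)
print(resolver.resolve("example-lab/dataset-1"))
```

A publication that cites the identifier rather than the original lab URL keeps working after the move, because only the resolver's table has to be updated.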
A data citation can be directly provided by a data
repository, or it can be constructed by hand. A citation
for a dataset consists of a descriptive name (or
title) for the dataset, its creators, the name of the
repository where it can be accessed, and the permanent
URL. For example, a citation for a dataset in Gil
et al. (2017) is:
Adusumilli, Ravali. (2016). Sample datasets used in
(Gil et al. 2017) for AAAI 2017 (Data set). Zenodo.
Note that by simply uploading the dataset to the
Zenodo repository, we obtained the DOI and the citation. Specifying the authors, the name, and the
license takes negligible effort. The author checklist
for data required little time to implement.
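For authors who construct a data citation by hand, the components listed above (creators, descriptive name, repository, and permanent URL) can be assembled along the following lines. This is a sketch under stated assumptions: the `DatasetCitation` class and its field names are hypothetical, the output layout simply mirrors the example citation above, and the permanent URL is a labeled placeholder rather than the dataset's actual DOI.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Hypothetical container for the parts of a data citation."""
    creators: List[str]
    year: int
    title: str
    repository: str
    permanent_url: str

    def format(self) -> str:
        # Refuse to build a citation that lacks any required component.
        if not self.creators:
            raise ValueError("a citation needs at least one creator")
        for name in ("title", "repository", "permanent_url"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing citation field: {name}")
        names = "; ".join(self.creators)
        return (
            f"{names}. ({self.year}). {self.title} (Data set). "
            f"{self.repository}. {self.permanent_url}"
        )


citation = DatasetCitation(
    creators=["Adusumilli, Ravali"],
    year=2016,
    title="Sample datasets used in (Gil et al. 2017) for AAAI 2017",
    repository="Zenodo",
    # Placeholder only: not the real identifier of this dataset.
    permanent_url="https://doi.org/10.0000/placeholder",
)
print(citation.format())
```

In practice, a repository that issues DOIs typically generates this string for the author, so the sketch mainly shows which fields must be supplied.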
Recommendations for Source Code
We use source code to mean the human-readable computer instructions written in plain text, and software
to mean computer programs that a computer can execute. Typically, source code is compiled into software
so that a computer can run it. Our recommendations for
source code are summarized in table 3.
Source Code Repositories
Source code repositories can be used by any scientist
to share code, and as such they are available to the AI
community. These code repositories include general
repositories such as GitHub and BitBucket, and language-specific repositories such as CRAN for R code
or File Exchange in MATLAB Central. General data
repositories such as those mentioned above accept
source code as an entry, and as with any dataset they
Recommendations
Source code used for implementing an AI method and executing an experiment should:
6. Be available in a shared community repository, so anyone can access it
7. Include basic metadata, so others can search and understand its contents
8. Include a license, so anyone can understand the conditions for use and extension of the
9. Have an associated digital object identifier (DOI) or persistent URL (PURL) for the version used in the associated publication so that the source code is permanently available
10. Be cited and referenced properly in the publication so that readers can identify the version unequivocally and its creators can receive credit for their work
Table 3. Author Checklist Part II.
Recommendations for source code implementing AI methods and experiments in publications.