A powerful strategy employed by materials scientists
is high-throughput materials discovery (Green,
Takeuchi, and Hattrick-Simpers 2013), the idea being
to rapidly synthesize thousands of different materials
and quickly screen them for desirable properties.
These materials are developed by depositing different
elements on a wafer in varying amounts, which is
analogous to atomic spray painting, where elements
are mixed in different proportions, yielding simple mixtures as well as enabling the emergence of new
materials, just as the mixture of primary colors results
in both obvious mixtures and secondary colors.
In this article, we address the phase-mapping problem, a central problem in high-throughput materials
discovery, which has critically lacked an efficient
solution method. At a fundamental level, the phase-mapping problem entails demixing data measurements in terms of a few simple components or crystal
structures, each describing a single material, subject
to intricate constraints on the solutions induced by
the physics of the underlying materials. A material’s
phase describes a range of elemental compositions and
other conditions over which its properties and structure (the arrangement of the constituent atoms)
change little. X-ray diffraction (XRD) is a ubiquitous
technique to characterize crystal phases, as it produces a signal containing a series of peaks that serve
as a fingerprint of the underlying atomic arrangement
or crystal structure. Using traditional methods, materials scientists can obtain and interpret 1 to 10 XRD
measurements per day, and with the recent development of automated, synchrotron-based XRD experiments, the measurement throughput has been accelerated to 10^3 to 10^5 measurements per day (Gregoire
et al. 2009; 2014). The creation of a phase-mapping
algorithm that generates phase diagrams from these
data remains an unsolved problem in materials science despite a series of advancements over the past
decade (Hattrick-Simpers, Gregoire, and Kusne 2016).
The problem is challenging because the XRD patterns often correspond to a mixture of
crystal structures, some of which are not necessarily sampled individually, requiring an algorithm that can
demix patterns while simultaneously identifying the
basis patterns. The most pertinent need is to generate
a physically meaningful phase diagram (one that generates materials science knowledge) for the materials
in a given library (a collection of co-deposited
materials on a substrate), which relies on the spectral
demixing of the 10^2–10^3 XRD patterns into a small set
of basis patterns (typically fewer than 10).
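In matrix terms, the demixing task described above can be sketched as a low-rank nonnegative factorization. The symbols A, W, H, L, N, and K are notation assumed here for illustration and are not taken from the text: column j of A holds the XRD pattern measured at sample point j on a grid of L diffraction-angle values, the K columns of W are the basis patterns, and H holds the nonnegative amount of each basis pattern present at each sample point.

```latex
A \approx W H, \qquad
A \in \mathbb{R}_{\ge 0}^{L \times N}, \quad
W \in \mathbb{R}_{\ge 0}^{L \times K}, \quad
H \in \mathbb{R}_{\ge 0}^{K \times N}, \qquad
K \ll N
```

With N on the order of 10^2 to 10^3 patterns and K typically below 10, the factorization is strongly low rank. This simplified form omits the physical constraints discussed in this article, which is why a factorization of this kind alone does not guarantee a physically meaningful phase diagram.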
The traditional analysis workflow relies on iterative manual analysis and heuristics, resulting in the
analysis of only a few systems a year. This quickly
becomes a bottleneck, as manual analysis cannot
keep up with the rate at which data are generated.
Automated analysis is therefore imperative to
handle the vast amount of data generated in
high-throughput experiments.
The need for automatic and scalable tools provides
unique opportunities to apply cutting-edge techniques in computer science and AI to accelerate the
materials discovery process. We developed Phase-Mapper, a comprehensive platform that tightly integrates XRD experimentation, AI problem solving,
and human intelligence (figure 1), to address this
computational challenge. In this platform, an AI solver
produces physically meaningful results for the phase-mapping problem within minutes, and these results
are examined and potentially further refined by
materials scientists interactively and in real time. In
addition, the results of Phase-Mapper can be used to
inform future experimental designs. The
demixing algorithm is a cornerstone of the Phase-Mapper platform. We have developed a novel solver
called AgileFD, which is based on convolutive nonnegative matrix factorization (cNMF). Nonnegative
matrix factorization (NMF) is commonly used in
applications such as computer vision and topic modeling (Lee and Seung 2001), and cNMF extends this
method to convolutive mixtures used in blind source
separation of audio signals and speech recognition
(Smaragdis 2004; Mørup and Schmidt 2006). AgileFD
features a computationally efficient, gradient-based
search method using lightweight iterative updates of
candidate solutions. In addition, AgileFD integrates
functionalities beyond cNMF. The extensions of
AgileFD, described here, include incorporation of
constraints to encode both human input, which capitalizes on a researcher’s knowledge of a particular
data set, and prior knowledge of the problem related
to the underlying physics of phase diagrams. This, as
demonstrated below, can be critical in obtaining
physically meaningful solutions. In developing the
Phase-Mapper platform, careful attention has been
given to delivering a rich suite of capabilities while
maintaining solver convergence times within minutes, which enables researchers to interact with the
solver to refine the solution.
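To make the idea of lightweight iterative updates concrete, the following is a minimal sketch of plain NMF using the multiplicative update rules of Lee and Seung (2001). It is not AgileFD: it omits the convolutive (cNMF) extension and every physical or user-supplied constraint, and the function names, the synthetic two-phase data, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def nmf_multiplicative(A, k, n_iter=1000, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix A (L x N) as W (L x k) @ H (k x N).

    Uses the Lee-Seung multiplicative updates for the Frobenius-norm
    objective. Illustrative only; this is not the AgileFD solver.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = A.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_rows, k)) + eps
    H = rng.random((k, n_cols)) + eps
    for _ in range(n_iter):
        # Each update rescales entries elementwise, so nonnegativity is preserved.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def gaussian_peak(q, center, width=0.5):
    """A single Gaussian peak, standing in for one diffraction peak."""
    return np.exp(-0.5 * ((q - center) / width) ** 2)


# Hypothetical data: two basis patterns with two peaks each, on 200 grid points.
q = np.linspace(0.0, 20.0, 200)
basis = np.stack(
    [
        gaussian_peak(q, 5.0) + 0.5 * gaussian_peak(q, 12.0),
        gaussian_peak(q, 8.0) + 0.7 * gaussian_peak(q, 15.0),
    ],
    axis=1,
)  # shape (200, 2)

# 50 sample points, each a random nonnegative mixture of the two patterns.
weights = np.random.default_rng(1).random((2, 50))
A = basis @ weights  # shape (200, 50)

W, H = nmf_multiplicative(A, k=2)
rel_error = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_error:.4f}")
```

Each iteration costs only a few small matrix products, which illustrates why update schemes of this kind can stay fast enough for interactive use. The recovered columns of W are determined only up to scaling and ordering, and a small reconstruction error by itself says nothing about physical validity, which is the gap that constraints of the kind described above are meant to close.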
We evaluate Phase-Mapper and several solvers that
were proposed in recent years. In general, we observe
that the solutions found by AgileFD and its variants
better match the ground truth. A vanilla NMF
approach performs poorly, as it fails to capture physical constraints. Conversely, constraint programming–based approaches are able to enforce some of
the physical constraints but scale poorly. We show
empirically that AgileFD with its extensions is able to
find solutions that are close to the physical reality.
We first encountered the phase-mapping problem
seven years ago as part of our computational sustainability (Gomes 2009) effort to address pressing problems in renewable energy. Phase-Mapper is the culmination of our work since then, in close
collaboration with experts in materials science. Over
the course of this collaboration, we have made
important contributions to the formal characterization of this problem, developed several synthetic