Expanding the Graph with Watson
In medical problem solving, experts reason with
chief complaints, findings, medical history, demographic information, and so on, to identify the
underlying causes for the patient’s problems.
Depending on the situation, they may then proceed
to propose a test whose results will allow them to distinguish between multiple possible problem causes,
or identify the best treatment for the identified cause,
and so on.
Motivated by the medical problem-solving paradigm, WatsonPaths first attempts to make a diagnosis
based on factors extracted from the scenario. The
graph is expanded to include new assertions about
the patient by asking questions of a version of the
Watson question-answering system adapted for the
medical domain (Ferrucci et al. 2013). WatsonPaths
takes a two-pronged approach to medical problem
solving by expanding the graph forward from the scenario in an attempt to make a diagnosis, and then
linking high-confidence diagnoses with the hypotheses. The latter step is typically done by identifying an
important relation expressed in the punch line question (for example, “What is the most appropriate
treatment for this patient?” or “What body part is most
likely affected?”). This approach is a logical extension
of the open-domain work of Prager, Chu-Carroll, and
Czuba (2004), where in order to build a profile of an
entity, questions were asked about properties of the entity and constraints between the answers were enforced
to establish internal consistency.
The graph expansion process of WatsonPaths
begins with automatically formulating questions
related to high-confidence assertions, which in our
graphs represent statements WatsonPaths believes to
be true to a certain degree of confidence about the
patient. These statements may be factors, as extracted and typed by our scenario analysis algorithm, or
combinations of those factors.
To determine what kinds of questions to ask, WatsonPaths can use a domain model that tells us what
relations form paths between the semantic type of a
high-confidence node and the semantic type of a
hypothesis like a diagnosis or treatment. For the
medical domain, we created a model that we called
the Emerald, which is shown in figure 4. (Notice the
resemblance to an emerald.) The Emerald is a small
model of entity types and relations that are crucial
for diagnosis and for formulating next steps.
We select from the Emerald all relations that link
the semantic type of a high-confidence source node
to a semantic type of interest. The relations and the
high-confidence nodes then form the basis of instantiating
the target nodes, thereby expanding the assertion
graph. To instantiate the target nodes, we issue
WatsonPaths subquestions to Watson. All answers
returned by Watson that score above a predetermined
threshold are posted as target nodes in the
inference graph. A relation edge is posted from the
source node to each new target node where the confidence
of the relation is Watson’s confidence in the
answer in the target node.
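
The forward expansion step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WatsonPaths implementation: the `DOMAIN_MODEL` triples, the `treatedBy` relation, the `Graph` class, the `ask` callback standing in for Watson, the threshold value, and the confidence scores are all hypothetical. Only the finding-to-disease link and the thresholding rule come from the text.

```python
from dataclasses import dataclass, field

# Hypothetical miniature domain model: (source type, relation, target type).
# Only the finding -> disease link is taken from the text.
DOMAIN_MODEL = [
    ("finding", "findingOf", "disease"),
    ("disease", "treatedBy", "treatment"),  # illustrative assumption
]

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # statement -> semantic type
    edges: list = field(default_factory=list)  # (source, relation, target, confidence)

def relations_for(source_type, types_of_interest):
    """Select relations linking the source type to a type of interest."""
    return [(rel, tgt) for src, rel, tgt in DOMAIN_MODEL
            if src == source_type and tgt in types_of_interest]

def expand(graph, source, ask, types_of_interest, threshold=0.2):
    """Post answers scoring above the threshold as target nodes, with one
    relation edge per answer carrying the answer's confidence."""
    source_type = graph.nodes[source]
    for relation, target_type in relations_for(source_type, types_of_interest):
        for answer, confidence in ask(source, relation):
            if confidence > threshold:
                graph.nodes.setdefault(answer, target_type)
                graph.edges.append((source, relation, answer, confidence))
    return graph

def fake_qa(source, relation):
    # Stand-in for the question-answering system; scores are made up.
    return [("Parkinson's disease", 0.8),
            ("cerebellar disease", 0.3),
            ("influenza", 0.05)]

g = Graph(nodes={"resting tremor": "finding"})
expand(g, "resting tremor", fake_qa, {"disease"})
```

With the made-up scores above, two disease nodes are posted and the low-scoring answer is discarded. Passing the question-answering system in as a callback keeps the graph logic independent of how answers are produced.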
In addition to asking questions from scenario factors, WatsonPaths may also expand backwards from
hypotheses. The purpose of this approach is to
explore how a hypothesis fits in with the rest of the
inference graph. If one hypothesis is found to have a
strong relationship with an existing node in the
assertion graph, then our probabilistic inference
mechanisms allow belief to flow from known factors
to that hypothesis, thus increasing the system’s confidence in that hypothesis.
Figure 5 illustrates the WatsonPaths graph expansion process. The top two rows of nodes and the
edges between them show a subset of the WatsonPaths assertion graph after scenario analysis, with the
second row of nodes representing some clinical factors extracted from the scenario sentences.
The graph expansion process identifies the most
confident assertions in the graph, which include the
four clinical factor nodes extracted from the scenario.
These four nodes are all typed as findings, so they are
aggregated into a single finding node for the purpose
of graph expansion. For a finding node, the Emerald
proposes a single findingOf relation that links it to a
disease. This results in the formulation of the subquestion “What disease causes resting tremor that
began 2 years ago, compromises the whole arm,
unexpressive face, and difficulty in walking?” whose
answers include Parkinson’s disease, Huntington’s
disease, cerebellar disease, and so on. These answer
nodes are added to the graph and some of them are
shown in the third row of nodes in figure 5.
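
To make the subquestion formulation concrete, the following sketch builds the question above from the four aggregated findings. The template table and the helper names are hypothetical. The text does not say how WatsonPaths generates question strings, so a simple template fill is assumed here.

```python
# Hypothetical question templates keyed by relation name.
TEMPLATES = {"findingOf": "What disease causes {factors}?"}

def join_factors(factors):
    """Join factor phrases into one English list with a serial comma."""
    if not factors:
        raise ValueError("at least one factor is required")
    if len(factors) == 1:
        return factors[0]
    if len(factors) == 2:
        return f"{factors[0]} and {factors[1]}"
    return ", ".join(factors[:-1]) + ", and " + factors[-1]

def formulate(relation, factors):
    """Fill the template for the relation with the aggregated factors."""
    return TEMPLATES[relation].format(factors=join_factors(factors))

findings = [
    "resting tremor that began 2 years ago",
    "compromises the whole arm",
    "unexpressive face",
    "difficulty in walking",
]
question = formulate("findingOf", findings)
print(question)
```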
In the reverse direction, WatsonPaths explores relationships between hypotheses and nodes in the existing graph based on the punch line question in the
scenario, which in this case is “What part of his nervous system is most likely affected?” Assuming each
hypothesis to be true, the system formulates subquestions to link it to the assertion graph. Consider
the hypothesis, substantia nigra. WatsonPaths can ask
“In what disease is substantia nigra most likely affected?” A subset of the answers to this question, including Parkinson’s disease and diffuse Lewy body disease,
are shown in the fourth row of nodes in figure 5.
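
The backward step can be sketched in the same spirit. The function names, the question template, the threshold, the confidence scores, and the edge direction (from disease to hypothesis, so that belief can flow toward the hypothesis) are assumptions made for illustration. The sketch separates answers that coincide with nodes already in the graph from answers that would become new nodes.

```python
def backward_question(hypothesis):
    # Template assumed from the example in the text.
    return f"In what disease is {hypothesis} most likely affected?"

def link_hypothesis(hypothesis, answers, graph_nodes, threshold=0.2):
    """Split above-threshold answers into edges that reach existing
    nodes (shared) and edges that introduce new nodes (new)."""
    shared, new = [], []
    for disease, confidence in answers:
        if confidence <= threshold:
            continue
        edge = (disease, hypothesis, confidence)
        (shared if disease in graph_nodes else new).append(edge)
    return shared, new

# Disease nodes produced by the forward expansion in the running example.
forward_nodes = {"Parkinson's disease", "Huntington's disease", "cerebellar disease"}

# Made-up answers and scores for the backward subquestion.
answers = [("Parkinson's disease", 0.7), ("diffuse Lewy body disease", 0.4)]
shared, new = link_hypothesis("substantia nigra", answers, forward_nodes)
```

Here Parkinson's disease lands in `shared`, which is the case of interest: it connects the hypothesis to a node that is already supported by the scenario factors.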
Matching Graph Nodes
When a new node is added to the WatsonPaths assertion graph, we compare the assertion in the new
node to those in existing nodes to ensure that equivalence relations between nodes are properly identified. This is done by comparing the statements in
those assertions: for unstructured statements,
whether the statements are lexically equivalent, and
for structured statements, whether the predicates and
their arguments are the same. A more complex operation is to identify when nodes contain assertions
that may be equivalent to the new assertion.
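
The basic comparison described above, lexical equivalence for unstructured statements and identical predicates and arguments for structured ones, might look like the sketch below. The `Structured` type and the normalization rule (lowercasing, dropping punctuation, collapsing whitespace) are assumptions. The text does not define what counts as lexically equivalent.

```python
import re
from typing import NamedTuple, Tuple

class Structured(NamedTuple):
    """Hypothetical representation of a structured statement."""
    predicate: str
    arguments: Tuple[str, ...]

def normalize(text):
    # Assumed notion of lexical equivalence: ignore case,
    # punctuation, and extra whitespace.
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())

def equivalent(a, b):
    """Return True when two statements match under the rules above."""
    if isinstance(a, Structured) and isinstance(b, Structured):
        return a.predicate == b.predicate and a.arguments == b.arguments
    if isinstance(a, str) and isinstance(b, str):
        return normalize(a) == normalize(b)
    return False  # a structured statement never matches an unstructured one here
```

Under this sketch, argument order matters for structured statements, and a structured statement is never matched against free text.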