Text Summarization
Extracting or generating key information from documents for efficient retrieval and response generation. Several teams adopted this technique to summarize articles or candidate responses, making response generation more efficient.
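As a rough illustration, a frequency-based extractive summarizer can condense an article into a few response-sized sentences. The sketch below is a minimal example of that idea; the stopword list and scoring are illustrative and not any particular team's method.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "to",
             "and", "in", "that", "it", "for", "on", "with"}

def summarize(text, num_sentences=3):
    """Score sentences by average content-word frequency; keep the top few."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z]+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS)

    def score(sentence):
        tokens = [t for t in re.findall(r"[a-z]+", sentence.lower())
                  if t not in STOPWORDS]
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    return " ".join(s for s in sentences if s in top)  # preserve original order
```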
Sentiment Detection
Identifying user sentiment. Some teams developed sentiment detection modules to help generate engaging responses; the same signal also helped them better understand a user's intent and respond appropriately.
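At its simplest, such a module can be a lexicon lookup, as in the sketch below; the word lists are toy examples, and teams more likely trained classifiers on labeled data.

```python
# Toy lexicons; a production module would use a trained classifier.
POSITIVE = {"love", "great", "fun", "awesome", "cool", "thanks", "happy"}
NEGATIVE = {"hate", "boring", "stupid", "bad", "awful", "stop", "annoying"}

def detect_sentiment(utterance):
    tokens = utterance.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# A dialogue manager can branch on the label, for example switching
# topics when the user sounds bored or frustrated.
print(detect_sentiment("this topic is boring"))  # -> "negative"
```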
Knowledge Ingestion and
Common Sense Reasoning
Currently, available conversational data is limited to datasets produced from online forums (for example, Reddit), social media interactions (for example, Twitter), and movie subtitles (for example, OpenSubtitles, the Cornell Movie-Dialogs Corpus). While these datasets are useful for capturing the syntactic and semantic elements of conversational interactions, they also have many issues: low data quality, objectionable content (profanity, offensive language), difficulty tracking query-response pairs and context, multiple users interacting without a specific order, and short, ephemeral conversations. In the absence of better alternatives, teams still used these datasets. To address profanity and offensive content, teams built classifiers to detect such content.
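Such a classifier can be assembled from standard components. The sketch below, for example, trains a TF-IDF logistic-regression filter with scikit-learn on toy data; a real filter would be trained on a large annotated corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data for illustration only (1 = offensive, 0 = clean).
texts = [
    "you are an idiot",
    "shut up you moron",
    "tell me about the weather",
    "what movies are playing",
]
labels = [1, 1, 0, 0]

profanity_filter = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
profanity_filter.fit(texts, labels)

def is_offensive(candidate, threshold=0.5):
    """Drop a candidate response if its offensive probability is high."""
    return profanity_filter.predict_proba([candidate])[0][1] >= threshold
```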
Furthermore, we shared Washington Post (WaPo) live
comments, which are conversational in nature and
also highly topical. Several teams made use of these
comments.
The teams also used various knowledge bases, including Amazon's Evi, Freebase, and Wikidata, for retrieving general knowledge, facts, and news, and
for general question answering. Some teams also
used these sources for entity linking, sentence completion, and topic detection. Ideally, a socialbot
should be able to ingest and update its knowledge
base automatically; however, this is an unsolved
problem and an active area of research. Finally, teams
also ingested information from news sources such as
the Washington Post and CNN to keep current with
news events that users may want to chat about.
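Of these sources, Wikidata exposes a public SPARQL endpoint, which makes a small fact-retrieval sketch easy to show; the query and helper below are illustrative rather than any team's actual integration.

```python
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def lookup_facts(entity_label, limit=5):
    """Fetch a few one-hop facts about an entity from Wikidata."""
    query = """
    SELECT ?propLabel ?valueLabel WHERE {
      ?entity rdfs:label "%s"@en .
      ?entity ?p ?value .
      ?prop wikibase:directClaim ?p .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    } LIMIT %d""" % (entity_label, limit)
    resp = requests.get(
        WIKIDATA_SPARQL,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "socialbot-sketch/0.1"},  # Wikimedia asks for one
    )
    resp.raise_for_status()
    rows = resp.json()["results"]["bindings"]
    return [(r["propLabel"]["value"], r["valueLabel"]["value"]) for r in rows]

print(lookup_facts("Barack Obama"))
```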
For commonsense reasoning, several teams built modules to understand user intent. Some teams preprocessed open-source and Alexa Prize datasets, extracted information about trending topics and opinions on popular topics, and integrated it within their dialogue managers to make responses seem as natural as possible. To complement commonsense reasoning, some of the top teams added user satisfaction modules to improve both engagement and conversational coherence.
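One simple way to surface trending topics, sketched below, is to rank topics whose recent frequency outpaces a long-run baseline; the capitalization heuristic is a stand-in for a real topic or entity extractor.

```python
import re
from collections import Counter

def extract_topics(utterance):
    """Crude stand-in for a topic/entity extractor: capitalized tokens."""
    return re.findall(r"\b[A-Z][a-z]+\b", utterance)

def trending_topics(recent_utterances, background_counts, top_k=5):
    """Rank topics whose recent frequency outpaces their long-run rate."""
    recent = Counter(t for u in recent_utterances for t in extract_topics(u))
    scores = {topic: count / (1.0 + background_counts.get(topic, 0))
              for topic, count in recent.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

print(trending_topics(["Did you watch the Oscars?", "The Oscars were fun"],
                      background_counts={"Oscars": 1}))
```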
To make sure that teams were leveraging relevant datasets and knowledge bases, we emphasized making live user interactions available to the socialbots early.
Dialogue and Context Modeling
A key component of any conversational agent is a robust system for handling dialogues effectively. The system should accomplish two main tasks: break down the complexity of the open-domain problem into a manageable set of interaction modes, and scale as the diversity and breadth of topics expand. A common dialogue strategy among teams was a hierarchical architecture with a main dialogue manager (DM) and multiple smaller DMs corresponding to specific tasks, topics, or contexts.
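In skeletal form, the main DM routes each turn to the first sub-DM that claims it. The classes below are a minimal sketch of this routing pattern, not any particular team's design; the keyword triggers are toy placeholders.

```python
class MovieDM:
    """Sub-DM for movie chat (the trigger is a toy keyword check)."""
    def can_handle(self, utterance, context):
        return "movie" in utterance.lower()
    def respond(self, utterance, context):
        return "What was the last movie you enjoyed?"

class NewsDM:
    """Sub-DM for news chat."""
    def can_handle(self, utterance, context):
        return "news" in utterance.lower()
    def respond(self, utterance, context):
        return "I read an interesting story today. Want to hear about it?"

class FallbackDM:
    """Always claims the turn, so the bot never goes silent."""
    def can_handle(self, utterance, context):
        return True
    def respond(self, utterance, context):
        return "Tell me more about that."

class MainDM:
    """Main DM: delegates each turn to the first sub-DM that claims it."""
    def __init__(self, sub_dms):
        self.sub_dms = sub_dms
    def respond(self, utterance, context):
        for dm in self.sub_dms:
            if dm.can_handle(utterance, context):
                return dm.respond(utterance, context)

bot = MainDM([MovieDM(), NewsDM(), FallbackDM()])
print(bot.respond("let's talk about movies", context={}))
```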
Some teams, such as Sounding Board, used such a hierarchical architecture and added modules such as an error handler for cases like low-confidence ASR output or low-confidence response candidates (Fang et al. 2017). Other teams, such as Alquist (Pichl et al. 2017), used a structured topic-based dialogue manager, with components organized by topic, alongside intent-based dialogue modules organized by intent. Teams generally also incorporated special-purpose modules, such as a profanity and offensive-content module to filter a range of inappropriate responses, and modules to provide feedback and acknowledgement and to request clarification or rephrasing from users. Teams experimented with approaches to tracking context and dialogue states, and the corresponding transitions, to maintain dialogue flow. For example, Alquist and Slugbot (Bowden et al. 2017) modeled dialogue flow as a state graph. These and other techniques helped socialbots produce coherent responses in an ongoing multiturn conversation and guided the direction of the conversation as needed. A few teams, such as Magnus (Prabhumoye et al. 2017), built finite-state machines (FSMs) (Wright 2005) for specific modules such as movies and sports. One challenge in using this technique for dynamic components is scaling and context switching; however, for small and static modules, FSMs can be useful.
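For such a small, static module, the FSM reduces to a transition table keyed on the current state and the recognized intent, as in the hypothetical movies module sketched below.

```python
# Transition table for a hypothetical movies module; intents are assumed
# to come from an upstream NLU step.
TRANSITIONS = {
    ("start", "yes"): "ask_genre",
    ("start", "no"): "exit",
    ("ask_genre", "comedy"): "recommend",
    ("ask_genre", "drama"): "recommend",
    ("recommend", "another"): "recommend",
    ("recommend", "done"): "exit",
}

PROMPTS = {
    "start": "Do you want to talk about movies?",
    "ask_genre": "What kind of movies do you like?",
    "recommend": "You might enjoy this one.",
    "exit": "Okay, let's talk about something else.",
}

def step(state, intent):
    """Advance the FSM; unrecognized intents keep the current state."""
    return TRANSITIONS.get((state, intent), state)

state = "start"
state = step(state, "yes")   # -> "ask_genre"
print(PROMPTS[state])        # "What kind of movies do you like?"
```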
The top teams focused not only on response generation but also on customer experience, experimenting with conversational strategies to increase engagement, as discussed in the next section.
Conversational User Experience
Participating teams built several conversational user experience (CUX) modules covering engagement, personalization, and other user experience–related aspects. CUX modules are relatively easy to build, yet they can yield significant gains in ratings and conversation duration. CUX is an essential component: teams that focused most of their efforts on NLU and DM, with less emphasis on CUX, were not rated as top performers by Alexa users. Following are the five main components built by various teams.