criteria to evaluate the socialbots. Therefore, as part of future work, we will also train the model with user-level features.
The Alexa Prize Finals
Following the conclusion of the semifinals, three finalists were identified: Alquist (Czech Technical University), Alana (Heriot-Watt University), and Sounding Board (University of Washington). These finalists remained online for the remainder of the competition. They entered the last phase of the competition with an average rating of 2.77. After two months of additional interaction with Alexa users, they went into the finals with an average rating of 3.48, an improvement of 26 percent over their earlier ratings. The
Alexa Prize finals were conducted in November as a
closed-door event with three rounds of conversations
with each socialbot. We invited professional judges
well versed in conversation: talk show hosts, authors,
speakers, and AI experts. Three judges were “conversationalists,” asked to engage in conversations with the socialbots; five judges were “evaluators,” asked to listen to and rate the conversations. The
winner was Sounding Board from the University of
Washington with an average score of 3.17 on a 5-point scale and an average conversation duration of 10 minutes and 22 seconds.
The objective of the finals was to evaluate socialbots on their ability to conduct coherent and engaging voice conversations with humans and to chain
relevant entities into a dialogue. Highlights of these
conversations are available in a short film.6 The best
interactions combined many of the techniques discussed in this article.
For example, a conversation that started off with a
request to discuss the book The Dark Tower evolved in
the following manner: The Dark Tower (book) → The
Dark Tower (movie) → Matthew McConaughey’s performance in the movie → Other movies that he’s acted in and that the user may find interesting → a topical humorous suggestion, “Someone needs to make
a mashup of Interstellar and Contact where Matthew
McConaughey tries to prevent Matthew McConaughey from going into space.” Such relevant
transitions and natural injection of humor give us a
glimpse into the future of conversational AI.
Another example combines entity resolution and reference disambiguation in the context of an introductory ice-breaker experience:
Socialbot: Hi. This is an Alexa Prize socialbot. How’s it going?
User: It’s going great, how about you?
Socialbot: Awesome. I’m glad you’re doing well today.
I’m having a good day, thanks for asking. Ready to
enjoy our chat. If you like, you can tell me your name?
User: My friends call me John, but you can call me Mr. Pickles.
Socialbot: Excellent, I’m happy to meet you Mr. Pickles.
There were some examples of failure points as well
(such as ASR failures). The field still has a long way to go, but the accomplishments of this first year show definite promise and improvement, and we look forward to seeing further gains through the Alexa Prize.
The finalist socialbots will continue to be a part of
the “let’s chat” experience until the Alexa Prize 2018
socialbots become available to the public in May
2018. To speak with the socialbots yourself, simply
say “Alexa, let’s chat” to any Alexa-enabled device.
The Alexa Prize was designed as a framework to support research on conversational AI at scale in a real-world setting. The scientific advances described above (and detailed in individual team papers) resulted in significant improvements in socialbot quality and a high level of user engagement.
Customer engagement remained high throughout
the competition. Alexa Prize ranked in the top 10
Alexa skills by usage, with over 40,000 hours of conversations spanning millions of utterances by the
end of the finals. Customers chatted on a wide range
of popular and current topics with movies/TV, music,
politics, celebs, business, and scitech being the highest frequency (most popular) topics. The most popular topics from the post-semifinals feedback phase
were movies/TV (with an average rating of 3. 48),
scitech ( 3. 60), travel/Geo ( 3. 51), and business ( 3. 48).
Based on user ratings, the three lowest rated topics
were arts (with an average rating of 2. 14), shopping
( 2. 63), and education ( 3.03).
It is still early in the Alexa Prize journey toward natural human conversation, but the high level of engagement and feedback (over 130,000 hours of conversation to date) demonstrates that users are interested in chatting with socialbots and supporting this research.
Over the course of the competition, socialbots showed a significant improvement in customer experience. The three finalists improved their ratings by 29.6 percent (from 2.77 to 3.59) over the duration of the competition. All 15 socialbots had an average customer rating of 2.87, with a median conversation duration of 1:35 minutes and a 90th percentile of 5:43 minutes by the end of the semifinal phase. The conversation duration of finalists across the entire competition was 1:41 minutes (median) and 8:02 minutes (90th percentile), improving 19.4 percent and 58.26 percent respectively from the start of the competition, with 10 turns (median) per conversation.
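As a quick sanity check on the rating gains reported above, the percentages follow directly from the average ratings. The sketch below is ours, not part of the competition's evaluation pipeline; the variable names are illustrative assumptions.

```python
def pct_gain(start, end):
    """Percentage improvement from a starting to an ending average rating."""
    return (end - start) / start * 100

# Average finalist ratings reported in the text.
semifinals_exit = 2.77   # entering the post-semifinals feedback phase
finals_entry = 3.48      # entering the finals, two months later
competition_end = 3.59   # by the end of the competition

print(round(pct_gain(semifinals_exit, finals_entry)))        # ~26 percent
print(round(pct_gain(semifinals_exit, competition_end), 1))  # 29.6 percent
```

Both figures match the improvements cited in the text (26 percent into the finals, 29.6 percent over the full competition).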
We measured response error rate (RER) through the