terms and with terms learned in the early iterations
of bootstrapping. Candidates that are more similar to
recently learned terms than to earlier terms are likely to have drifted.
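The drift test described above can be sketched as a comparison of two average similarities. This is a minimal illustration, not the cited authors' implementation. It assumes three things: each term is a sparse distributional vector (context feature to weight), similarity is cosine, and the window sizes `n_early`/`n_recent` and the `threshold` are hypothetical parameters. A positive score means the candidate looks more like the latest additions than like the seed-era terms.

```python
import math
from typing import Dict, List

# Assumption: a term is a sparse distributional vector (context feature -> weight).
Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity(candidate: Vector, terms: List[Vector]) -> float:
    """Average cosine similarity between a candidate and a group of terms."""
    if not terms:
        return 0.0
    return sum(cosine(candidate, t) for t in terms) / len(terms)


def drift_score(candidate: Vector, lexicon: List[Vector],
                n_early: int, n_recent: int) -> float:
    """Similarity to recently learned terms minus similarity to early terms.

    `lexicon` is ordered by the iteration in which each term was learned;
    the recent window is clipped so it never overlaps the early window.
    """
    early = lexicon[:n_early]
    recent = lexicon[max(n_early, len(lexicon) - n_recent):]
    return mean_similarity(candidate, recent) - mean_similarity(candidate, early)


def likely_drifted(candidate: Vector, lexicon: List[Vector],
                   n_early: int, n_recent: int, threshold: float = 0.0) -> bool:
    """Flag a candidate that is closer to recent terms than to early ones.

    `threshold` is a hypothetical tuning parameter.
    """
    return drift_score(candidate, lexicon, n_early, n_recent) > threshold


# Toy lexicon for an "animal" category that has started drifting toward vehicles.
lexicon = [
    {"fur": 1.0, "legs": 1.0},       # learned early
    {"fur": 1.0, "tail": 1.0},       # learned early
    {"engine": 1.0, "wheels": 1.0},  # learned recently (drifted)
    {"engine": 1.0, "fuel": 1.0},    # learned recently (drifted)
]
print(likely_drifted({"engine": 1.0, "wheels": 1.0}, lexicon, 2, 2))  # True
print(likely_drifted({"fur": 1.0, "legs": 1.0}, lexicon, 2, 2))       # False
```

Using a difference of means keeps the test symmetric. A candidate equally similar to both windows scores zero and is not flagged under the default threshold.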
Subsequent research has also recognized the value
of learning multiple categories during bootstrapping
as a source of negative feedback. McIntosh and Curran (2010) developed a method to automatically discover new semantic categories that can be beneficial
as negative examples during learning. Vyas and Pantel (2009) also recognized the importance of identifying negative classes in their work on set expansion.
A human manually identified errors, and the system then
automatically removed additional words that had
high distributional similarity with those errors, under
the hypothesis that they were likely to belong to the
same negative class. Learning multiple categories
simultaneously has also been shown to be useful as
counter-training for pattern learning (Yangarber
2003) and for cross-category learning in a bootstrapped contextual semantic tagger (Huang and
Riloff 2010).
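The error-propagation idea attributed to Vyas and Pantel (2009) can be illustrated with a small filter. This is a hedged sketch under stated assumptions, not their actual system. It assumes each expanded word has a sparse distributional vector, that similarity is cosine, and that `prune_negative_class` and its `threshold` are hypothetical names chosen for illustration. The human supplies the error set, and every remaining word whose similarity to any error meets the threshold is removed along with it.

```python
import math
from typing import Dict, List, Set

# Assumption: each word is a sparse distributional vector (context feature -> weight).
Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def prune_negative_class(expanded: Dict[str, Vector], errors: Set[str],
                         threshold: float) -> List[str]:
    """Drop human-flagged errors plus any word whose similarity to an error
    is at least `threshold` (hypothetical), on the assumption that such words
    belong to the same negative class. Preserves the input order."""
    error_vectors = [expanded[e] for e in errors if e in expanded]
    kept = []
    for word, vec in expanded.items():
        if word in errors:
            continue  # removed by the human annotator
        if any(cosine(vec, ev) >= threshold for ev in error_vectors):
            continue  # removed automatically as a likely member of the error class
        kept.append(word)
    return kept


# Toy expansion of a "city" set; a human flags "apple" as an error.
expanded = {
    "paris": {"city": 1.0, "france": 1.0},
    "london": {"city": 1.0, "uk": 1.0},
    "apple": {"fruit": 1.0, "tree": 1.0},
    "pear": {"fruit": 1.0, "tree": 1.0, "sweet": 1.0},
}
print(prune_negative_class(expanded, {"apple"}, threshold=0.5))  # ['paris', 'london']
```

The threshold controls the trade-off. In the toy data, "pear" has a cosine of about 0.82 with "apple", so it is removed at 0.5 but would survive a stricter threshold of 0.9.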
Bootstrapping New Types of Lexicons
Natural language processing systems need many
types of knowledge, and the same bootstrapped
learning mechanisms used to create semantic
dictionaries have proven useful for creating
other types of dictionaries as well.
In the years since the Basilisk algorithm was
developed for semantic lexicon induction, Basilisk
has been used to generate several novel types of lexicons. Given a seed list of subjective nouns, which
represent private states and opinionated language,
Basilisk learned to identify many new subjective
nouns to produce a subjectivity lexicon (Riloff,
Wiebe, and Wilson 2003; Wilson et al. 2005). Examples of the subjective nouns learned are barbarian,
atrocities, and exaggeration. For event extraction,
Basilisk learned to identify role-identifying nouns,
which are nouns whose semantics reveal the role of
an entity with respect to an event (Phillips and
Riloff 2007). For example, words such as assassin,
burglar, and sniper refer to people who participated
as the agent of an event, while casualty, fatality, and
victim refer to people who represent the patient of
an event.
For work on generating plot unit representations
(Goyal, Riloff, and Daumé III 2010, 2013), Basilisk
was used to identify patient polarity verbs (PPVs),
Figure 2: The Basilisk Algorithm. (Diagram components: Lexicon; Patterns and Extracted Nouns; best patterns; Pattern Pool; pattern extractions; Candidate Pool; 5 best candidate nouns.)