which are verbs that affect their patients in positive
or negative ways. For example, being fed, paid, or
adopted are typically desirable events for the entities
that are acted upon, but being eaten, chased, or
hospitalized are usually undesirable events. Basilisk
learned to identify many PPVs by running separate bootstrapping processes for positive and negative
PPVs, each seeded with 10 manually defined examples.
The contextual patterns, however, were quite different from the lexico-syntactic patterns used in previous Basilisk work. For the task of learning PPVs, conjunction patterns were defined to exploit a previous
observation from sentiment analysis that conjunctions usually join items with the same polarity
(Hatzivassiloglou and McKeown 1997). For example,
if rescued is known to be a positive verb, then seeing
the conjunction rescued and adopted should lead us to
believe that adopted is probably also a positive verb.
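To make the heuristic concrete, the following Python sketch (with made-up seed verbs and sentences; not the actual Basilisk implementation, which operates over parsed conjunction patterns) propagates polarity from known verbs to unknown verbs joined by and:

```python
import re

def propagate_polarity(sentences, seed_verbs):
    """Assign polarity to new verbs that appear in an 'X and Y'
    conjunction with a verb of known polarity."""
    learned = dict(seed_verbs)  # verb -> '+' or '-'
    conj = re.compile(r"\b(\w+) and (\w+)\b")
    for sent in sentences:
        for left, right in conj.findall(sent.lower()):
            # Conjunctions usually join items of the same polarity
            # (Hatzivassiloglou and McKeown 1997).
            if left in learned and right not in learned:
                learned[right] = learned[left]
            elif right in learned and left not in learned:
                learned[left] = learned[right]
    return {v: p for v, p in learned.items() if v not in seed_verbs}

# Illustrative toy corpus and seeds
seeds = {"rescued": "+", "abandoned": "-"}
corpus = ["The puppy was rescued and adopted.",
          "The cat was abandoned and neglected."]
print(propagate_polarity(corpus, seeds))  # {'adopted': '+', 'neglected': '-'}
```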
Research on sarcasm recognition by Riloff et al.
(2013) demonstrated that different types of lexicons
can be learned simultaneously if they co-occur in a
shared structure. This work used just one seed word,
love, as an example of a positive sentiment, and a collection of tweets that contain a #sarcasm hashtag.
The key idea behind this work is that sarcasm often
emerges from the juxtaposition of a positive sentiment and a negative situation (for example, I love
being ignored or I just adore waiting for the doctor). Given a sarcastic tweet that has a positive sentiment (for
example, love), we can infer that the target of the sentiment is probably a negative situation. Conversely,
given a sarcastic tweet that mentions a negative situation (for example, being ignored), we can infer that
the sentiment is probably positive. Using these dual
sources of knowledge in an alternating bootstrapping
cycle, the system learned lists of positive sentiments
and negative situations simultaneously.
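A minimal sketch of this alternating cycle appears below, assuming whitespace tokenization and treating the two words that follow a sentiment word as the candidate situation phrase (a deliberate simplification of the syntactic extraction patterns used by Riloff et al.):

```python
def bootstrap_sarcasm(tweets, pos_sentiments, iterations=3):
    """Alternately grow positive-sentiment and negative-situation
    lists from sarcastic tweets (simplified sketch only)."""
    pos = set(pos_sentiments)   # for example, {"love"}
    neg = set()                 # negative situation phrases
    for _ in range(iterations):
        for tweet in tweets:
            words = tweet.lower().split()
            for i, w in enumerate(words):
                # A positive sentiment word suggests the phrase that
                # follows it is a negative situation...
                if w in pos and i + 1 < len(words):
                    neg.add(" ".join(words[i + 1:i + 3]))
                # ...and a known negative situation suggests the word
                # just before it is a positive sentiment.
                if " ".join(words[i + 1:i + 3]) in neg and w not in neg:
                    pos.add(w)
    return pos, neg

pos, neg = bootstrap_sarcasm(
    ["i love being ignored", "i adore being ignored"], {"love"})
# pos now also contains "adore"; neg contains "being ignored"
```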
These research efforts illustrate the flexibility and
generality of bootstrapped lexicon induction. Given
different types of seed words, patterns, and corpora,
this paradigm can be used to learn many different
kinds of knowledge.
Bootstrapped Pattern Learning
Bootstrapping has also been used to learn patterns
in several novel ways. The Ex-Disco system (Yangarber et al. 2000) added a bootstrapping mechanism
around a pattern learner modeled after AutoSlog-TS,
which used relevant and irrelevant texts for training. Ex-Disco used a small set of manually defined
seed patterns to heuristically partition a collection
of unannotated texts into relevant and irrelevant
sets. Patterns were then ranked based on their asso-
ciation with the relevant texts, the best pattern(s)
were added to the pattern set, and the corpus was
repartitioned into new relevant and irrelevant sets
for the next iteration. Stevenson and Greenwood
(2005) also began with seed patterns and used
semantic similarity measures to iteratively rank and
select new candidate patterns based on their similarity to the seeds.
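The loop itself is simple to express. The sketch below (in Python, using a crude relative-frequency score rather than the actual ranking metric of Yangarber et al.) shows the repartition-rank-accept cycle, where each document is represented as the set of candidate patterns it contains:

```python
def exdisco_loop(documents, seed_patterns, iterations=10):
    """Ex-Disco-style bootstrapping sketch over a corpus in which each
    document is a set of the candidate patterns it contains."""
    accepted = set(seed_patterns)
    for _ in range(iterations):
        # 1. Partition: a document is 'relevant' if it contains
        #    any accepted pattern.
        relevant = [d for d in documents if d & accepted]
        irrelevant = [d for d in documents if not (d & accepted)]
        if not relevant:
            break
        # 2. Score remaining patterns by their association with the
        #    relevant set (here: counts in relevant minus irrelevant).
        scores = {}
        for d in relevant:
            for p in d - accepted:
                scores[p] = scores.get(p, 0) + 1
        for d in irrelevant:
            for p in d - accepted:
                scores[p] = scores.get(p, 0) - 1
        if not scores:
            break
        # 3. Accept the best-scoring pattern, then repartition
        #    the corpus on the next iteration.
        best = max(scores, key=scores.get)
        if scores[best] <= 0:
            break
        accepted.add(best)
    return accepted
```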
Riloff and Wiebe (2003) used a bootstrapping
mechanism to learn lexico-syntactic patterns that
represent subjective expressions. A novel aspect of
that work was the use of high-precision (but low-recall) classifiers as the basis for seeding. Lukin and
Walker (2013) adopted a similar bootstrapping
approach to learn patterns associated with sarcasm
in dialogue. Recently, Gupta and Manning (2014)
used several types of unsupervised class predictors,
such as distributional similarity, to better estimate
the likelihood that an unlabeled term belongs to a
negative class during pattern scoring. They also
showed that incorporating distributed word representations to enhance the training sets during learning can improve results (Gupta and Manning 2015).
Lessons Learned
Bootstrapped learning can seem like a black art to
people who do not yet have experience with it. But
many lessons have been learned by researchers who
work in this paradigm. In this section, we discuss
some of the most important (but often unspoken)
principles behind bootstrapped learning techniques,
extrapolating both from the literature and from our
own personal experiences.
Seeding Principles
In NLP, the paradigm of bootstrapped learning takes
human input in the form of seeding heuristics and
applies those heuristics to unannotated texts to produce labeled examples. The heuristically labeled data
is then used to train a learning algorithm, which
kicks off an iterative process.
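Schematically, the process looks like the skeleton below; the arguments are placeholders for whatever heuristics, learner, and self-labeling step a particular system uses, so this is not any specific published algorithm:

```python
def bootstrap(unlabeled_texts, seed_heuristics, train, self_label, iterations=5):
    """Generic bootstrapped-learning skeleton.

    seed_heuristics: callables mapping a text to a label or None.
    train:           builds a model from (text, label) pairs.
    self_label:      applies the model to unlabeled texts and returns
                     new high-confidence (text, label) pairs.
    """
    # Seeding: heuristically label whatever examples the heuristics can reach.
    labeled = []
    for text in unlabeled_texts:
        for heuristic in seed_heuristics:
            guess = heuristic(text)
            if guess is not None:
                labeled.append((text, guess))
                break
    # Iteration: retrain on the growing pool of heuristically labeled data.
    model = None
    for _ in range(iterations):
        model = train(labeled)
        labeled.extend(self_label(model, unlabeled_texts))
    return model, labeled
```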
Since the seeding heuristics are a proxy for manually labeled data, it is critical that they be able to
assign labels with reasonably high accuracy. Heuristics will rarely be as accurate as human annotators,
but they can be applied automatically to large volumes of unannotated text, yielding a potentially
enormous amount of labeled training data. The
expectation is that large volumes of slightly noisy
training data will be nearly as good as, or potentially
better than, presumably smaller amounts of “perfectly” labeled training data (which is typically available only in limited quantities). With this in mind,
we lay out three general principles for identifying
effective seeding heuristics: high frequency, high precision, and diversity.
High Frequency
Simply put, you want the seeding heuristics to be
able to label as many examples as possible. The more
instances they label, the more training data the learning algorithm gets. Many people have observed that
using different seeds can produce dramatically different results (for example, McIntosh and Curran
2009). One of the reasons is that different seeds can