getting expert annotations on posts directly, a
method that does not scale well to large datasets, we
obtained them on the outputs of topic models. This
strategy allowed us to scale our inference framework
to a large corpus of Instagram posts, where we developed a semisupervised approach to map the topic-level labels
to users' posts. Examples of high-MIS
content range from expressions of negative self-perception to disordered thoughts about eating to
graphic illustrations of acts that could lead to physical
and emotional harm or death. Combining
human feedback as gold-standard information with
AI-based analytical data enabled deep explorations
of how MIS manifests on the Instagram platform. We found that users who share pro-eating disorder content on Instagram exhibit a trend of
increasing MIS in their content over time.
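As a rough illustration of how topic-level expert labels might be carried down to individual posts, the sketch below scores a post as the weighted average of the ratings of its topics. This is a deliberately simple stand-in and not the semisupervised method used in the work; the function name `post_score`, the topic identifiers, and the numeric rating scale are all hypothetical.

```python
def post_score(topic_weights, topic_labels):
    """Weighted average of expert topic ratings for one post.

    topic_weights: dict mapping topic id -> share of the post assigned to
                   that topic (for example, from a topic model).
    topic_labels:  dict mapping topic id -> expert rating of that topic
                   (hypothetical numeric scale).
    Topics without an expert rating are ignored. Returns None when the
    post has no weight on any rated topic.
    """
    rated = {t: w for t, w in topic_weights.items() if t in topic_labels}
    total = sum(rated.values())
    if total == 0:
        return None
    return sum(w * topic_labels[t] for t, w in rated.items()) / total


# Hypothetical example: a post split evenly between two rated topics.
labels = {"topic_a": 1, "topic_b": 3}
print(post_score({"topic_a": 0.5, "topic_b": 0.5}, labels))
```

Because experts rate a small number of topics rather than every post, the annotation cost in a scheme like this stays fixed as the corpus grows.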
Interpreting Large-Scale Analysis
As we noted, AI approaches can be complemented
with human feedback for interpreting the outcomes
of an analysis or a computational model. Another
way to combine AI methods with human intelligence
is to have experts contextualize the AI findings in
existing theory or theoretical/conceptual frameworks.
By integrating knowledge from existing theories
and frameworks, we can test our understanding
of underlying mechanisms and our assumptions on
unobserved factors that might be affecting our con-
In prior joint work (De Choudhury et al. 2016), the
authors developed a causal inference framework
(Pearl 2009) to assess the likelihood that an individual will transition to discussions of suicidal ideation,
given a history of mental health discourse on social
media. This framework was developed on a large
dataset of 880 users who shared more than 12K posts
and 100K comments on the social media site Reddit.
The output of the framework included words and
phrases whose use in a post indicated the likelihood of future suicidal ideation. Specifically,
we applied a high-dimensional stratified propensity
score method (Rosenbaum and Rubin 1983). This
approach attempts to isolate the effects of a particular treatment from the effects of covariates by dividing the treatment (those who use a particular
Figure 2. Facebook Data and Self-Reported Information to Predict Risk of Postpartum Depression.
Our prior work examined social media–based postpartum changes in activity, socialization, affect, and interpersonal attention of new
mothers (De Choudhury et al. 2014). The heatmaps show individual-level changes in the postnatal period, compared to the prenatal phase.
For 15 percent of mothers, these changes (for example, increase in NA and activation) are considerably higher following childbirth.
[Heatmap panels: Volume, Replies, NA, Activation, First pp.; horizontal axis: time (in days).]
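The stratified propensity score method mentioned above can be illustrated with a minimal sketch. It assumes that each unit already has an estimated propensity score (the probability of receiving the treatment given its covariates), a treatment flag (here, hypothetically, whether a given word or phrase was used), and a binary outcome. Units are sorted by score and split into equal-size strata, and the treated-versus-control difference in outcome rates is averaged across strata, weighted by stratum size. The function name `stratified_effect`, the number of strata, and the example data are assumptions for illustration and do not reproduce the high-dimensional procedure of the original study.

```python
from statistics import mean


def stratified_effect(records, n_strata=5):
    """Estimate a treatment effect by propensity-score stratification.

    records: list of (propensity, treated, outcome) tuples, where
             propensity is a precomputed estimate of P(treated | covariates),
             treated is a bool, and outcome is 0 or 1.
    Returns the stratum-size-weighted mean difference in outcome rates
    between treated and control units.
    """
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    total_weight = 0
    weighted_diff = 0.0
    for k in range(n_strata):
        # Equal-size strata over the sorted propensity scores.
        stratum = ordered[k * n // n_strata:(k + 1) * n // n_strata]
        treated = [y for _, t, y in stratum if t]
        control = [y for _, t, y in stratum if not t]
        if not treated or not control:
            continue  # no treated/control overlap in this stratum; skip it
        weighted_diff += len(stratum) * (mean(treated) - mean(control))
        total_weight += len(stratum)
    if total_weight == 0:
        raise ValueError("no stratum contains both treated and control units")
    return weighted_diff / total_weight


# Hypothetical data: (propensity, used_phrase, later_outcome)
example = [(0.1, True, 1), (0.2, False, 0), (0.8, True, 1), (0.9, False, 1)]
print(stratified_effect(example, n_strata=2))  # prints 0.5
```

Comparing treated and control units only within a stratum means they are compared against others with a similar estimated likelihood of treatment, which is how the approach tries to separate the treatment's effect from that of the covariates.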