The advent of social media presents a promising new opportunity for early detection and intervention in psychiatric disorders. Predictive screening methods have successfully analyzed online media to detect a number of harmful health conditions [1–11]. All of these studies relied on text analysis, however, and none have yet harnessed the wealth of psychological data encoded in visual social media, such as photographs posted to Instagram. In this report, we introduce a methodology for analyzing photographic data from Instagram to predictively screen for depression.

There is good reason to prioritize research into Instagram analysis for health screening. Instagram members currently contribute almost 100 million new posts per day [12], and Instagram’s rate of new-user growth has recently outpaced that of Twitter, YouTube, LinkedIn, and even Facebook [13]. A nascent literature on depression and Instagram use has so far yielded results that are either too general or too labor-intensive to be of practical significance for predictive analytics [14, 15]. In particular, Lup et al. [14] only attempted to correlate overall Instagram use with depressive symptoms, and Andalibi et al. [15] employed a time-consuming qualitative coding method which the authors acknowledged made it ‘impossible to qualitatively analyze’ Instagram data at scale (p. 4). In our research, we incorporated an ensemble of computational methods from machine learning, image processing, and other data-scientific disciplines to extract useful psychological indicators from photographic data. Our goal was to identify and predict markers of depression in Instagram users’ posted photographs.

Hypothesis 1

Instagram posts made by individuals diagnosed with depression can be reliably distinguished from posts made by healthy controls, using only measures extracted computationally from posted photos and associated metadata.

Photographic markers of depression

Photographs posted to Instagram offer a vast array of features that might be analyzed for psychological insight. The content of photographs can be coded for any number of characteristics: Are there people present? Is the setting in nature or indoors? Is it night or day? Image statistical properties can also be evaluated at a per-pixel level, including values for average color and brightness. Instagram metadata offers additional information: Did the photo receive any comments? How many ‘Likes’ did it get? Finally, platform activity measures, such as usage and posting frequency, may also yield clues as to an Instagram user’s mental state. We incorporated only a narrow subset of possible features into our predictive models, motivated in part by prior research into the relationship between mood and visual preferences.

In studies associating mood, color, and mental health, healthy individuals identified darker, grayer colors with negative mood, and generally preferred brighter, more vivid colors [16–19]. By contrast, depressed individuals were found to prefer darker, grayer colors [17]. In addition, Barrick, Taylor, & Correa [19] found a positive correlation between self-identification with depression and a tendency to perceive one’s surroundings as gray or lacking in color. These findings motivated us to include measures of hue, saturation, and brightness in our analysis. We also tracked the use of Instagram filters, which allow users to modify the color and tint of a photograph.
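The text above does not specify an implementation, but the per-pixel hue, saturation, and brightness measures can be sketched as follows. This is a minimal illustration, assuming photos have already been decoded into RGB pixel tuples; the function name `mean_hsv` is ours, not the study’s:

```python
import colorsys

def mean_hsv(pixels):
    """Average hue, saturation, and value (brightness) over a photo's pixels.

    `pixels` is an iterable of (r, g, b) tuples with 0-255 channels.
    Each pixel is converted to HSV and the per-channel means are returned.
    Note: hue is a circular quantity, so a naive arithmetic mean is only
    a rough sketch; it is adequate when hues cluster away from the 0/1 wrap.
    """
    n = 0
    h_sum = s_sum = v_sum = 0.0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        h_sum += h
        s_sum += s
        v_sum += v
        n += 1
    return (h_sum / n, s_sum / n, v_sum / n)

# A uniformly red image: hue 0.0, full saturation, full brightness.
print(mean_hsv([(255, 0, 0)] * 4))  # → (0.0, 1.0, 1.0)
```

On this encoding, the ‘darker, grayer’ photos associated with depressed mood would register as lower mean value (brightness) and lower mean saturation.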

Depression is strongly associated with reduced social activity [20, 21]. As Instagram is used to share personal experiences, it is reasonable to infer that posted photos with people in them may capture aspects of a user’s social life. On this premise, we used a face detection algorithm to analyze Instagram posts for the presence and number of human faces in each photograph. We also counted the number of comments and likes each post received as measures of community engagement, and used posting frequency as a metric for user engagement.

Early screening applications

Hypothesis 1 is a necessary first step, as it addresses an unanswered basic question: Is depression detectable in Instagram posts? On finding support for Hypothesis 1, a natural question arises: Is depression detectable in Instagram posts, before the date of first diagnosis? After receiving a depression diagnosis, individuals may come to identify with their diagnosis [22, 23]. Individuals’ self-portrayal on social media may then be influenced by this identification. It is possible that a successful predictive model, trained on the entirety of depressed Instagram users’ posting histories, might not actually detect depressive signals, per se, but rather purposeful content choices intended to convey a depressive condition. Training a model using only posts made by depressed participants prior to the date of first diagnosis addresses this potential confounding factor.

Hypothesis 2

Instagram posts made by depressed individuals prior to the date of first clinical diagnosis can be reliably distinguished from posts made by healthy controls.

If support is found for Hypothesis 2, this would not only demonstrate a methodological advance for researchers, but also serve as a proof-of-concept for future healthcare applications. As such, we benchmarked the accuracy of our model against the ability of general practitioners to correctly diagnose depression as shown in a meta-analysis by Mitchell, Vaze, and Rao [24]. The authors analyzed 118 studies that evaluated general practitioners’ abilities to correctly diagnose depression in their patients, without assistance from scales, questionnaires, or other measurement instruments. Out of 50,371 patient outcomes included across the pooled studies, 21.9% were actually depressed, as evaluated separately by psychiatrists or validated interview-based measures conducted by researchers. General practitioners were able to correctly rule out depression in non-depressed patients 81% of the time, but only diagnosed depressed patients correctly 42% of the time. We refer to these meta-analysis findings [24] as a comparison point to evaluate the usefulness of our models.
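As a concrete illustration of what this benchmark implies, the sensitivity (42%), specificity (81%), and prevalence (21.9%) reported in [24] determine the positive and negative predictive values via Bayes’ rule. The helper function below is our own illustrative sketch, not part of the cited meta-analysis:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value via Bayes' rule."""
    tp = sensitivity * prevalence            # true positive mass
    fp = (1 - specificity) * (1 - prevalence)  # false positive mass
    tn = specificity * (1 - prevalence)      # true negative mass
    fn = (1 - sensitivity) * prevalence      # false negative mass
    return tp / (tp + fp), tn / (tn + fn)

# Figures from Mitchell, Vaze, and Rao [24]: sensitivity 42%,
# specificity 81%, depression prevalence 21.9%.
ppv, npv = predictive_values(0.42, 0.81, 0.219)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # → PPV = 0.38, NPV = 0.83
```

In other words, under these figures only about 38% of the patients a general practitioner flags as depressed actually are, which is the unassisted baseline any screening model should be weighed against.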

A major strength of our proposed models is that their features are generated by entirely computational means (pixel analysis, face detection, and metadata parsing), which can be applied at scale without additional human input. It is natural to ask whether these machine-extracted features pick up on signals similar to those humans might use to identify mood and psychological condition, or whether they attend to wholly different information. A computer may be able to analyze the average saturation value of a million pixels, but can it pick out a happy selfie from a sad one? Understanding whether machine learning and human opinion are sensitive to the same indicators of depression may be valuable for future research and applications. Furthermore, insight into these issues may help to frame our results in the larger discussion of human versus machine learning, which occupies a central role in the contemporary academic landscape.

To address these questions, we solicited human assessments of the Instagram photographs we collected. We asked new participants to evaluate photos on four simple metrics: happiness, sadness, interestingness, and likability. These ratings categories were intended to capture human impressions that were both intuitive and quantifiable, and which had some relationship to established depression indicators. DSM-IV [20] criteria for Major Depressive Disorder include feeling sad as a primary criterion, so sadness (and its anti-correlate, happiness) seemed obvious candidates as ratings categories. Epstein et al. [25] found depressed individuals ‘had difficulty reconciling a self-image as an “outgoing likeable person”’, which prompted likability as an informative metric. We hypothesized that human raters should find photographs posted by depressed individuals to be sadder, less happy, and less likable, on average. Finally, we considered interestingness as a novel factor, without a clear directional hypothesis.

Hypothesis 3a

Human ratings of Instagram posts on common semantic categories can distinguish between posts made by depressed and healthy individuals.

Hypothesis 3b

Human ratings are positively correlated with computationally-extracted features.

If human and machine predictors show positive correlation, we can infer that each set of features tracks similar signals of depression. In this case, the strength of the human model simply suggests whether it is better or worse than the machine model. On the other hand, if machine and human features show little or no correlation, then regardless of human model performance, we would know that the machine features are capable of screening for depression, but use information signals different from those captured by the affective ratings categories.
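The correlation test behind Hypothesis 3b can be run as an ordinary Pearson correlation between each human rating category and each machine-extracted feature. Below is a minimal, library-free sketch; the rating and brightness values are synthetic, for illustration only:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative per-photo values: human 'happiness' ratings vs.
# computationally extracted mean pixel brightness (both made up here).
happiness = [3.0, 4.5, 2.0, 5.0, 1.5]
brightness = [0.40, 0.62, 0.31, 0.70, 0.25]
print(round(pearson_r(happiness, brightness), 3))
```

A strongly positive coefficient across feature pairs would support Hypothesis 3b; coefficients near zero would indicate the two feature sets exploit different signals.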