Daniel, right, along with Greg and April of the Insight team

Daniel Saunders participated in the Insight Health Data Science program in the Fall of 2016, and currently works as a Data Scientist at Wayfair. Previously, Daniel was a postdoctoral fellow at the Center for Mind/Brain Sciences of the University of Trento, and received his PhD in Psychology from Queen’s University. While at Insight, Daniel built an NLP-driven engine to generate stress impact scores for newspaper front pages, trained on the reactions of Facebook users to news story headlines. In this blog post, he describes his creative process in developing this project.

Introduction

For my Insight Health Data Science project, I wanted to tackle a problem related to mental health, since my PhD is in Psychology and my father is a mental health advocate in British Columbia. In particular, I decided to focus on the immediate impact that the media we consume has on our stress and happiness levels.

Many news website front pages are a pure blast of stress hormones, something researchers have suggested adds anxiety and sadness to our daily routines and may even shorten our lifespans. So much so that a writer for The Guardian wrote "News is bad for you - and giving up reading it will make you happier" (and he works for a newspaper!). News sites are optimized to get as many eyeballs as possible, and to grab and hold our attention and emotions. Consumers who want to find out what's going on in the world while also managing their stress levels should have something optimized for them.

With this in mind, I used a combination of web scraping, natural language processing (NLP) and machine learning to create my app, MyChillNews.co (GitHub link), which guides readers towards conscious consumption of stressful daily news. The idea is not to replace daily news with puppies and kittens, but to provide a portal to real news sources while alerting the user to the stress-inducing qualities of each one. You might still choose to go to that upsetting front page, but at least you will be prepared.

A False Start with Twitter Data

In the first two weeks of the project, I worked with Twitter data, using Tweepy to collect all tweets that contained links to top news stories. I then applied sentiment analysis to the words remaining in each tweet after the link was stripped out, to determine whether the user had a positive or negative reaction.

I collected over 200,000 tweets that contained links to front page news stories. I then trained a logistic regression classifier on Stanford's Sentiment140 training set, which contains tweets that were hand-labelled as having positive or negative sentiment. Each tweet in my collection was represented as the mean of its word2vec word embeddings, and that vector was then classified as positive or negative.
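For readers curious about the mechanics, a minimal sketch of that step is below. The pre-trained GoogleNews vectors, the `df` DataFrame of Sentiment140 tweets, and the variable names are illustrative assumptions rather than my exact pipeline.

```python
import numpy as np
from gensim.models import KeyedVectors
from sklearn.linear_model import LogisticRegression

# Assumed inputs: a pre-trained word2vec model on disk, and a Sentiment140-style
# DataFrame `df` with a "text" column and a 0/1 "polarity" label.
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

def tweet_vector(text):
    """Represent a tweet as the mean of its word2vec word embeddings."""
    vectors = [w2v[word] for word in text.lower().split() if word in w2v]
    if not vectors:
        return np.zeros(w2v.vector_size)
    return np.mean(vectors, axis=0)

X = np.vstack([tweet_vector(t) for t in df["text"]])
y = df["polarity"].values

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)  # hand-labelled Sentiment140 tweets as ground truth

# Score a new tweet (hypothetical example) for positive sentiment.
new = tweet_vector("gave up reading the news and feel great")
print(clf.predict_proba(new.reshape(1, -1))[:, 1])
```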

Although I was able to score my tweets this way along a positive/negative sentiment axis, I was unsatisfied with the results. It seemed that most tweets containing a link added no commentary whatsoever. Even when there was commentary, there was only room for a few words in the 140 characters after the URL was included. When I looked at the results, they didn’t always make intuitive sense to me: stories about bombings were sometimes rated as having much less stressful content than stories about mild political conflict. My problem, I realized, was that I had no way either to measure the accuracy of my stressfulness predictions, or to interrogate what was driving a particular rating.

Switching to Facebook Reaction Data

I tried a lot of things to verify my stressfulness ratings — I even put some time into the idea of using a heart rate monitor on a test reader, to see how my ratings compared to their physical reaction to selected stories!

Then I realized that the Facebook Graph API would allow me to access emotional reactions to news stories, as they were posted on the official Facebook pages of news organizations and then passed around the internet. If you’ve used Facebook in the last year or two you’ve seen that there is now a menu of possible emotional reactions beyond “Like” — including “Angry” and “Sad”.

Example of a typical news story on Facebook, and the reactions that individual users can select for it.

Perfect! Now I had objective data to show that at least some people found a news story enraging or saddening, exactly what I meant when I thought of a story being stressful.
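In case it's helpful, here is roughly what one of those Graph API calls looks like. The field syntax is from memory and can differ across API versions, and the access token and post ID are placeholders.

```python
import requests

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"   # placeholder
POST_ID = "PAGEID_POSTID"            # placeholder for a news organization's post

# Request per-type reaction totals in one call. The field aliasing below follows
# the Graph API's syntax as I remember it; check the current API docs before use.
fields = (
    "reactions.type(ANGRY).limit(0).summary(total_count).as(angry),"
    "reactions.type(SAD).limit(0).summary(total_count).as(sad),"
    "reactions.limit(0).summary(total_count).as(total)"
)
resp = requests.get(
    "https://graph.facebook.com/v2.8/" + POST_ID,
    params={"fields": fields, "access_token": ACCESS_TOKEN},
).json()

angry = resp["angry"]["summary"]["total_count"]
sad = resp["sad"]["summary"]["total_count"]
total = resp["total"]["summary"]["total_count"]
```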

However, I no longer had the element of timeliness that Twitter gave me, since stories are often posted later in the day on Facebook and take time to circulate and gather reactions.

I decided to pivot the project and build an engine to predict sad and angry reactions to a story based solely on its headline, trained on historical data. That way I could generate a stress impact score for a story before a single person read it. (If any current Insight fellows are reading this, although it worked for me, I don’t recommend this size of pivot in week 4!)

Data Collection and Model Training

I collected over 12,000 stories posted on Facebook by the 10 news websites I was tracking over a period of several weeks, including the headline text and counts of the angry and sad reactions to each one.

I used the words in each headline as predictive features for reactions. For the dependent measure, I computed a metric for each story that I called the Stress Impact Score: first, I computed the ratios of sad and angry reactions to the total number of reactions to the story, to adjust for the fact that some stories get many more reactions than others, and took their mean; I then took the square root of this value, because the distribution of stories along the stress axis was highly skewed, that is, a few stories made some people very mad and sad.
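Concretely, the score for a single story can be computed like this. This is a small sketch of my reading of the definition above; the function name and example counts are made up.

```python
import math

def stress_impact_score(angry, sad, total):
    """Mean of the angry/total and sad/total ratios, square-rooted to tame the skew."""
    if total == 0:
        return 0.0
    return math.sqrt((angry / total + sad / total) / 2)

print(stress_impact_score(angry=120, sad=60, total=900))  # illustrative counts -> 0.316...
```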

Since I now had a ground truth to train against — the Facebook reactions — I could evaluate my first-pass model, which was a Random Forest regression implemented in scikit-learn. This had some predictive power, although it was a little disappointing, so I decided to return to it later and optimize it.

My training set consisted of the 12,122 stories I had collected from Facebook, with the words in each headline as predictive features and the reactions to it as the target. Each morning, my app scrapes the first 10 headlines from each of the 10 news webpages, and uses the model to rank the sites by their predicted proportion of angry/sad reactions.
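The first-pass model was along these lines; this is a hedged sketch assuming a simple bag-of-words featurization with scikit-learn, and `headlines` and `scores` are hypothetical stand-ins for the training data described above.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical inputs: `headlines` is a list of headline strings and `scores`
# the corresponding Stress Impact Scores.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(headlines)

X_train, X_test, y_train, y_test = train_test_split(
    X, scores, test_size=0.2, random_state=42
)

rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print("Headline-level R^2:", rf.score(X_test, y_test))
```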

The Stress Impact of a Front Page

Now that I had a model with which to score individual news headlines for their stress impact, I began to think about how to summarize entire front pages, which I defined as the top 10 stories on each news website.

I decided to do this by taking a simple arithmetic average of the Stress Impact Score across the top 10 stories from each news source, which I then converted into a percentile across all headlines in the database. This way, I could evaluate today’s front page headlines not in isolation — since I expected some days to have more stressful front pages than others, such as when some traumatic national-level event happens — but relative to historical days overall. I represented these scores to the user as a colour ranging from blue (least stressful) to red (most stressful), since the scale of the numbers would be meaningless.

For each news source, I took the mean of the Stress Impact Score for its top 10 stories. The news sources were then ranked by the percentile in which its mean Stress Impact Score fell across all headlines in the database. The colours ranged from blue (least stressful) to red (most stressful).
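In code, the front-page roll-up might look something like the following. The function names and the blue-to-red mapping are illustrative, not the app's exact implementation.

```python
import numpy as np
from scipy.stats import percentileofscore

def front_page_percentile(top_story_scores, all_headline_scores):
    """Average a source's top 10 Stress Impact Scores and place that mean
    within the distribution of all headlines in the database."""
    mean_score = np.mean(top_story_scores)
    return percentileofscore(all_headline_scores, mean_score)

def percentile_to_colour(pct):
    """Map a percentile to a hex colour from blue (least stressful) to red (most)."""
    red = int(round(255 * pct / 100))
    return "#{:02x}00{:02x}".format(red, 255 - red)

print(percentile_to_colour(90))  # a stressful front page -> mostly red
```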

Building the MyChillNews App

I conceived of MyChillNews as a site to visit every day instead of going directly to news sites. My Python scripts, running on AWS, download the HTML of the front pages for all 10 sites twice per day, at 6:30 a.m. and 11 a.m., and also grab an image of each front page with PhantomJS and Selenium, which I then resize to a thumbnail. I extract the top headlines for each news source using BeautifulSoup. This was one of the most time-consuming parts of the project, since I had to write a custom scraper for each of the 10 sites, with CSS selectors to extract only the major headlines and nothing else.
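Each per-site scraper boiled down to a few lines like these; the URL and CSS selector here are invented for illustration, since every real site needed its own hand-tuned selector.

```python
import requests
from bs4 import BeautifulSoup

def top_headlines(url, css_selector, n=10):
    """Return the text of the first n headline links matching a site's selector."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select(css_selector)[:n]]

# Invented example; each of the 10 sites had its own selector in practice.
print(top_headlines("https://www.example-news.com", "h2.headline a"))
```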

Once I retrieve the morning’s headlines, I process the text and then assign each story a Stress Impact Score, using the model that was trained on Facebook headlines and reactions. I only retrained and reuploaded this model every couple of weeks, since many of the same stressful words and phrases will stay relevant day after day.
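The daily scoring step then amounts to loading the most recently trained model and running it over the scraped headlines; the artifact file names below are hypothetical.

```python
import joblib

# Hypothetical artifact names; the vectorizer and regression model are retrained
# offline every couple of weeks and shipped with the app.
vectorizer = joblib.load("headline_vectorizer.pkl")
model = joblib.load("stress_model.pkl")

def score_headlines(headlines):
    """Predicted Stress Impact Score for each of the morning's headlines."""
    return model.predict(vectorizer.transform(headlines))
```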

It was important to me that MyChillNews be simple and appealing to use, and give users the feeling of piloting their news consumption rather than having their choices limited. I made a slider to let users pick their maximum level of stress exposure for the day, and — with the help of my brother, who is a JavaScript whiz — added a drag-and-drop menu to pick the order of preference for news sources. When either of these options is changed, a recommendation for a news source to suit the day's level of stress tolerance appears, along with the thumbnail preview of its front page, to give you another hint about what you might be in for. Clicking on the thumbnail takes you to that front page.

An example screenshot of the MyChillNews app.

Optimizing the Model

Late in the timeline, I found a few hours to iterate on my basic Random Forest regression model, and found that I could improve performance significantly by trying different algorithms and optimizing various parameters, including text pre-processing choices as well as algorithm settings. In the end, I found that a ridge regression on a bag-of-words representation of each headline, using both single words and two-word combinations (unigrams and bigrams), with stop words and very common words removed but no tf-idf weighting or stemming, produced the best predictive model, with an R² of 0.43 on the withheld test set.
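That winning configuration corresponds roughly to the scikit-learn pipeline below; `headlines` and `scores` are hypothetical inputs as before, and the max_df cutoff standing in for "common words removed" and the Ridge alpha are illustrative choices rather than the tuned values.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Unigrams and bigrams, English stop words dropped, very common terms capped via
# max_df, no tf-idf weighting and no stemming.
pipeline = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), stop_words="english", max_df=0.5),
    Ridge(alpha=1.0),
)

X_train, X_test, y_train, y_test = train_test_split(
    headlines, scores, test_size=0.2, random_state=42
)
pipeline.fit(X_train, y_train)
print("Headline-level R^2:", pipeline.score(X_test, y_test))
```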

I included the news source as a feature in my model, after some internal debate. Although it greatly improved the predictive power of my model, it also made the ratings less sensitive to changes in a news source's stress level from one day to the next. In addition, the exact same headline would be predicted to draw more angry/sad reactions if it appeared on the Fox News website than if it appeared in the LA Times. I'm still not sure if this was the right call.

The real measure of my model's success is its ability to predict the stress impact of a front page as a whole, so I also evaluated it on the test set on that basis, averaging my predictions for each source's top 10 stories and comparing those averages to the actual ones. After the optimization, my model achieved an R² of 0.80 at the front-page level.
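Evaluating at the front-page level just means grouping the test-set predictions by source and day before computing R²; here `test_df` is a hypothetical pandas DataFrame holding the held-out headlines.

```python
from sklearn.metrics import r2_score

# Hypothetical pandas DataFrame with one row per held-out headline:
# columns "source", "date", the true "stress_score", and the model's "predicted_score".
front_pages = test_df.groupby(["source", "date"]).agg(
    true_mean=("stress_score", "mean"),
    pred_mean=("predicted_score", "mean"),
)
print("Front-page R^2:", r2_score(front_pages["true_mean"], front_pages["pred_mean"]))
```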

Even more than the test set scores, I was reassured by looking at the words and phrases that carried the largest coefficients in the regression model. For example, here are word clouds produced from the most and least stressful headlines from October 2016.

Word clouds produced from the least (left) and most (right) stressful headlines, from October 2016.
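For anyone who wants to inspect their own model the same way, the terms with the largest coefficients can be pulled straight out of a fitted pipeline like the ridge sketch above. (The word clouds in the figure were built from the most and least stressful headlines themselves; this snippet is just an illustrative way to look at the coefficients.)

```python
import numpy as np

# Assumes the fitted CountVectorizer + Ridge pipeline from the earlier sketch.
vectorizer = pipeline.named_steps["countvectorizer"]
ridge = pipeline.named_steps["ridge"]

terms = np.array(vectorizer.get_feature_names_out())
order = np.argsort(ridge.coef_)

print("Least stressful terms:", terms[order[:20]])   # most negative coefficients
print("Most stressful terms:", terms[order[-20:]])   # most positive coefficients
```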

Conclusion

Ironically, the worst part of this project was having to expose myself to large quantities of high-stress news. I'm even less of a fan now! The Daily Mail in particular, which is always red in my app, has headlines designed to push our worst buttons of fear and xenophobia (I made an Easter egg for my app that compiles the most horrifying Daily Mail headline of each day, but be warned, it's rough stuff).

I was excited to bring together some of my existing technical skills, from Python to web scraping to machine learning, with new challenges such as Flask, AWS, and JavaScript, in a project that demanded creative judgment calls and directly addressed the topic of everyday mental health.