As a technologist, I’m fascinated by how people use social media. It’s such a vast space, but people find places where they can make connections around any number of topics. Social media has fostered revolutions and saved lives, but it has also taken them. It enables freedom of expression but also allows an unprecedented level of hatred. Like a hammer, social media is a tool, and it’s up to humanity to use it to build or to destroy.

I read an article that described a language analysis done on comments from Reddit. Reddit is a community website that aggregates content. It also allows members to share, rate, and discuss the content. I thought it would be interesting to see how people on Reddit talked about epilepsy.

Why does it matter?

If you’re reading this post, you may have been led to it by Twitter, Facebook, or Medium. Maybe you subscribed to the blog. In any case, you are using technology and the Internet to consume information. And there is a lot of information out there…some good, some bad, some supportive, some not. These types of analyses aren’t perfect, but they can provide some interesting insights.

I’m old enough to be able to navigate these platforms and decide what to take and what to leave. While my son is not of Internet age yet, he will be soon. And he’ll be more likely to look to social media for support. The more I know about the different systems, the better able I’ll be to guide him as he explores them.

More generally, though, these types of analyses can help show which aspects of epilepsy people are talking about, and which they are not talking about enough.

What data did I look at?

For this project, I grabbed comments from March 2017 that contained the word “epilepsy”. That gave me 3,046 comments out of about 79 million (0.0038%). It’s a drop in the bucket, but enough for a simple analysis.
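For readers who want to try this step themselves, here is a minimal sketch of the filtering. It assumes each comment is a dictionary with a `body` field and that matching is case-insensitive on the whole word; both are assumptions for illustration, not a description of the query actually used for this post. The 79 million total is the rounded figure quoted above, so the computed share only approximates the percentage in the text.

```python
import re

# Assumption: match "epilepsy" as a whole word, ignoring case.
EPILEPSY = re.compile(r"\bepilepsy\b", re.IGNORECASE)

def filter_comments(comments):
    """Return only the comments whose body mentions the word 'epilepsy'."""
    return [c for c in comments if EPILEPSY.search(c["body"])]

def share_percent(matched, total):
    """Percentage of all comments that matched the filter."""
    return 100.0 * matched / total

# Made-up sample comments, for illustration only.
sample = [
    {"body": "My son was diagnosed with epilepsy last year."},
    {"body": "Anyone watching the game tonight?"},
    {"body": "Epilepsy meds make me tired."},
]

matches = filter_comments(sample)
print(len(matches), "of", len(sample), "sample comments mention epilepsy")

# Counts taken from the post; the total is rounded ("about 79 million").
print(f"{share_percent(3046, 79_000_000):.4f}%")
```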

Number of comments by day in March 2017

Here is how the epilepsy-related comments were distributed throughout March.

The big spike on March 22 was partly due to a question in AskReddit. AskReddit is where posters ask and answer “questions that elicit thought-provoking discussions”. The spike was the result of responses to the question “What are you sick and tired of having to explain to people?” I can imagine people living with epilepsy having an opinion on that question.
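A daily breakdown like this one can be sketched in a few lines. The example below assumes each comment carries a `created_utc` timestamp in seconds since the Unix epoch and buckets days in UTC; the sample data and the `ts` helper are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

def comments_per_day(comments):
    """Count comments per calendar day (UTC), returned in date order."""
    counts = Counter()
    for comment in comments:
        # Assumption: created_utc is seconds since the Unix epoch.
        day = datetime.fromtimestamp(comment["created_utc"], tz=timezone.utc).date()
        counts[day.isoformat()] += 1
    return dict(sorted(counts.items()))

def ts(day, hour):
    """Hypothetical helper: a UTC timestamp in March 2017 for the sample data."""
    return datetime(2017, 3, day, hour, tzinfo=timezone.utc).timestamp()

# Made-up sample: one comment on March 21, two on March 22.
sample = [
    {"created_utc": ts(21, 23)},
    {"created_utc": ts(22, 1)},
    {"created_utc": ts(22, 18)},
]

daily = comments_per_day(sample)
print(daily)
```

Bucketing by UTC is a design choice; a different time zone would shift comments posted near midnight into a neighboring day.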

Which subreddits are the most active?

Next, I wanted to break down the comments by the group they were posted in. On Reddit, the groups are called “subreddits”. The March 22 discussions helped the AskReddit subreddit lead the comment count for epilepsy-related posts. The subreddit dedicated to discussions about epilepsy came in second.
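Ranking subreddits by comment count is a small grouping step. This sketch assumes each comment has a `subreddit` field; the sample rows are invented and do not reflect the real counts.

```python
from collections import Counter

def top_subreddits(comments, n=10):
    """Return the n subreddits with the most comments, largest first."""
    counts = Counter(comment["subreddit"] for comment in comments)
    return counts.most_common(n)

# Made-up sample data, for illustration only.
sample = (
    [{"subreddit": "AskReddit"}] * 3
    + [{"subreddit": "Epilepsy"}] * 2
    + [{"subreddit": "science"}]
)

ranking = top_subreddits(sample, n=2)
print(ranking)
```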

What adjectives do people use when they talk about epilepsy?

Besides looking at simple numbers, I wanted to analyze the comments themselves. I ran them through Google’s Natural Language (NLP) API to see what I could learn. The API takes a sample of text, breaks it down into parts of speech, and estimates its sentiment.

First, I looked at the parts of speech. Here are the adjectives most often used in conjunction with the word “epilepsy.”
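Counting adjectives comes down to keeping the tokens tagged as adjectives and tallying them. The sketch below does not call the API. It assumes responses have already been saved as dictionaries shaped like the syntax-analysis JSON, where each token has `text.content` and `partOfSpeech.tag` and adjectives are tagged `ADJ`. The sample responses are hand-written, and lowercasing the words is my own choice.

```python
from collections import Counter

def top_adjectives(syntax_responses, n=10):
    """Tally tokens tagged ADJ across all responses, most common first."""
    counts = Counter()
    for response in syntax_responses:
        for token in response.get("tokens", []):
            if token["partOfSpeech"]["tag"] == "ADJ":
                # Lowercase so "Severe" and "severe" count as one word.
                counts[token["text"]["content"].lower()] += 1
    return counts.most_common(n)

def tok(word, tag):
    """Hypothetical helper that builds one token in the assumed shape."""
    return {"text": {"content": word}, "partOfSpeech": {"tag": tag}}

# Hand-written stand-ins for saved API responses.
sample = [
    {"tokens": [tok("Severe", "ADJ"), tok("epilepsy", "NOUN")]},
    {"tokens": [tok("severe", "ADJ"), tok("good", "ADJ"), tok("days", "NOUN")]},
]

adjectives = top_adjectives(sample)
print(adjectives)
```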

What is the sentiment of the comments about epilepsy?

Next, I wanted to add the sentiment piece. The API looked at each comment and tried to infer whether it represented a positive or negative sentiment. “I won’t let epilepsy get me down” is an example of a positive sentiment. “I have epilepsy and am depressed” expresses a negative sentiment. I wondered if the adjectives used changed depending on the sentiment of the comment, and they did.

For comments characterized as positive, words like “good”, “great”, and “best” were included.

For negative comments, “bad”, “different”, and “severe” made the list.

I also wanted to look at the sentiment across the different groups. The chart below shows the average sentiment of the epilepsy-related comments by subreddit.

Again, a positive score reflects an overall positive sentiment of the comments. Interestingly, the big negative score on the chart is for the subreddit “KotakuInAction.” The group relates to the “GamerGate” controversy and other gaming and Internet issues. The thread that contained the epilepsy comments related to the Eichenwald case, in which a journalist with epilepsy was sent a seizure-inducing Twitter message.
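Averaging sentiment by subreddit can be sketched as follows, assuming each comment has already been given a single score between -1.0 (negative) and 1.0 (positive). The sample scores are invented, and a plain unweighted mean is an assumption; the chart may have been built differently.

```python
from collections import defaultdict
from statistics import mean

def average_sentiment(scored_comments):
    """Map each subreddit to the unweighted mean of its comment scores."""
    by_subreddit = defaultdict(list)
    for subreddit, score in scored_comments:
        by_subreddit[subreddit].append(score)
    return {name: mean(scores) for name, scores in by_subreddit.items()}

# Made-up (subreddit, score) pairs, for illustration only.
sample = [
    ("Epilepsy", 0.5),
    ("Epilepsy", -0.25),
    ("AskReddit", -0.75),
    ("AskReddit", 0.25),
]

averages = average_sentiment(sample)
print(averages)
```

A plain mean treats a one-line remark and a long emotional post equally, and a subreddit with only a handful of comments can swing to an extreme average, which is worth keeping in mind when reading a chart like this.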

What else do people talk about when they talk about epilepsy?

Finally, Google’s algorithm also provides other topics (entities) that are discussed in text. Here are the most common entities mentioned in conjunction with epilepsy on Reddit.

Let’s look at January through March…

Since the data was available, I ran a few of the reports for the first three months of 2017, as well, to see if anything changed.

First, here is the number of comments for January through March.

I also wanted to see how different the entities report was over the three months. There was a lot of overlap with the March chart, suggesting that conversations about those entities are typical rather than specific to March.

Finally, I also looked at the occurrences of specific references to a handful of positive and negative terms that often come up when speaking about epilepsy.

The two charts show that medication, side effects, and depression came up often in the comments on Reddit.
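A term count like this can be approximated with a simple phrase match. The sketch below counts how many comments mention each term at least once, matching whole words and ignoring case. The three terms come from the text above; the full term lists behind the charts are not reproduced here, and the sample comments are made up.

```python
import re

def term_mentions(comments, terms):
    """Count how many comments mention each term at least once."""
    counts = {}
    for term in terms:
        # re.escape keeps multi-word phrases such as "side effects" literal.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = sum(1 for text in comments if pattern.search(text))
    return counts

# Made-up sample comments, for illustration only.
sample = [
    "The medication helps but the side effects are rough.",
    "Depression hit me hard after my diagnosis.",
    "New medication, fewer side effects, and less depression.",
    "Thanks for sharing your story.",
]

mentions = term_mentions(sample, ["medication", "side effects", "depression"])
print(mentions)
```

Counting comments rather than total occurrences keeps one long comment that repeats a word from dominating the result. Exact matching also misses variants such as “medications” or “depressed”, so the numbers are a lower bound.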

What’s next?

This project was a first look at using natural language processing techniques to analyze social media posts about epilepsy. There are a number of applications for such technology, and it will be interesting to explore more sites and to use different algorithms and techniques. If you have any thoughts or suggestions on other ways to look at the data, please leave a comment below.

If you’re interested in doing your own analysis, you can find the source code and other information on my GitHub page. A shout-out to Sara Robinson for her article, which was a guide and huge inspiration.

Also published on Medium.