The Download

Any sort of text processing needs data, so getting some was the sensible first step. The folks over at Hacker News have a great, well-documented API that is perfect for this job. Yay! So, here’s a script I whipped up in a couple of minutes to get the data:

Script for downloading posts
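For readers who want to follow along, here is a minimal sketch of what such a download script could look like. It is not the original script. It assumes the public Hacker News API item endpoint (`https://hacker-news.firebaseio.com/v0/item/<id>.json`). The names `item_url`, `fetch_json` and `download_items` are made up for illustration, as is the one-JSON-file-per-item layout.

```python
import json
import urllib.request
from pathlib import Path

# Root of the public Hacker News API.
API_ROOT = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id):
    """Build the URL of a single item (story, comment, ...)."""
    return f"{API_ROOT}/item/{item_id}.json"


def fetch_json(url, timeout=10):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def download_items(first_id, last_id, out_dir, fetch=fetch_json):
    """Save items first_id..last_id (inclusive), one JSON file per item.

    `fetch` is injectable so the loop can be tried without network access.
    Returns the number of files written during this call.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = 0
    for item_id in range(first_id, last_id + 1):
        target = out_path / f"{item_id}.json"
        if target.exists():
            continue  # already downloaded on a previous run
        item = fetch(item_url(item_id))
        if item is None:
            continue  # nothing stored under this id
        target.write_text(json.dumps(item))
        saved += 1
    return saved


# Hypothetical usage (the id range and directory name are placeholders):
# download_items(1, 1000, "hn_items")
```

Skipping files that already exist makes the loop safe to restart, which is handy on a small machine that may be interrupted partway through a run lasting hours.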

Running my script on a Raspberry Pi 3 for a couple of hours, I downloaded a little over a hundred and thirty thousand individual files, which I assumed would be a good start for a weekend hack. (More on this later!)

Data!

The Judgment

Armed with the data, it was time to make sense of it. For the purpose of sentiment analysis, I decided to give TextBlob a try. Analysing text sentiment with TextBlob is a breeze. Here’s what you do:

>>> from textblob import TextBlob
>>> text = TextBlob("Textblob is amazingly simple to use. What great fun!")
>>> text.sentiment
Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)
>>> text.sentiment.polarity
0.39166666666666666
>>> text.sentiment.subjectivity
0.4357142857142857

The only thing that’s left to do is to run this on the entire dataset. That’s what we do next:
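As a rough idea of what that pass involves, here is a minimal sketch that scores each downloaded comment and tallies the labels per weekday. It is not the original analysis script. It assumes one JSON file per item with the Hacker News API fields `type`, `text` and `time` (Unix seconds), groups by weekday in UTC, and treats polarity above zero as positive and below zero as negative. Those cut-offs and all function names are assumptions made for illustration.

```python
import json
from collections import Counter, defaultdict
from datetime import datetime, timezone
from pathlib import Path

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def textblob_polarity(text):
    """Score text with TextBlob; polarity ranges from -1.0 to 1.0."""
    # Imported lazily so the tallying logic can be tried without TextBlob.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def label(polarity):
    """Map a polarity score to a label (assumed cut-offs at zero)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def load_items(data_dir):
    """Yield every item stored as a JSON file in data_dir."""
    for path in Path(data_dir).glob("*.json"):
        yield json.loads(path.read_text())


def tally_by_weekday(items, polarity_of=textblob_polarity):
    """Count positive/negative/neutral comments for each weekday (UTC)."""
    counts = defaultdict(Counter)
    for item in items:
        # Skip stories, polls, and deleted comments that carry no text.
        if item.get("type") != "comment" or not item.get("text"):
            continue
        posted = datetime.fromtimestamp(item["time"], tz=timezone.utc)
        day = WEEKDAYS[posted.weekday()]
        counts[day][label(polarity_of(item["text"]))] += 1
    return counts


# Hypothetical usage (the directory name is a placeholder):
# counts = tally_by_weekday(load_items("hn_items"))
```

Comment text from the API is HTML, so a more careful version would strip tags and unescape entities before scoring. The resulting per-day counters are the kind of numbers a bar chart per weekday can be drawn from.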

The full pass over the dataset took about thirty seconds on my 2015 MacBook Pro. The plot it churned out:

Comment sentiment over weekdays

What we observe is that the total comment volume increases as we progress through the week. By sheer numbers, Friday sees the highest number of comments with negative sentiment, but the more interesting trend is that it also sees the second-highest count of positive sentiment!

Now, analysing the numbers as a percentage of the total comments on that day reveals an interesting trend. Let’s have a look:

Comment trends expressed as percentage

Looks like the percentages don’t vary much over the course of the week. Let’s zoom in then? Yes!

Comment trends expressed as percentage — Zoomed in

It turns out the percentages do vary, but not appreciably. Also, notice that by percentage, Saturday is the clear winner with a 21% share of negativity. This might be a result of the low volume of comments during the weekends.
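The percentage view comes down to dividing each label's count by that day's total. Here is a minimal sketch of that step. The function name is made up for illustration, and the numbers in the usage line are toy values, not figures from the dataset.

```python
def percentage_shares(day_counts):
    """Convert one day's label counts into percentage shares of that day."""
    total = sum(day_counts.values())
    if total == 0:
        return {}  # no comments that day, so no shares to report
    return {name: 100.0 * n / total for name, n in day_counts.items()}


# Toy numbers, purely illustrative:
# percentage_shares({"positive": 3, "negative": 1})
# -> {"positive": 75.0, "negative": 25.0}
```

Because each day is divided by its own total, a low-volume day can show a larger share from relatively few comments, which fits the guess about weekends above.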

Remember that I said I had downloaded over a hundred thousand posts to start with? Well, it turns out these posts were from the year 2017 alone! To understand the trends better, it would be great to run the analysis over the entire dataset, which is a task for another weekend hack!