Extracting meaning from unstructured data is difficult. Sometimes, if you’re lucky, the data has telling characteristics that provide an interesting angle into the text. One such angle is web addresses. Web addresses are semi-structured data that can be extracted to give some proxy of meaning to a body of text.

I took this approach to find trends and meaning in Reddit comments. The corpus of all Reddit comments from January 2015 to June 2017 lives in Google BigQuery datasets. I extracted all links that contained amazon.com and then confirmed that each one pointed to a product by scraping the Amazon page. What I found was fun and interesting. Based on this premise, I built a site called ThingsOnReddit, which organizes products by subreddit. The diversity of people on Reddit leads to all sorts of products being mentioned. So, what were the interesting things found on Reddit? The graph below shows which subreddits link to the most products.
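The link-extraction step described above could be sketched roughly as follows. This is a minimal illustration, not the code behind ThingsOnReddit: the function name `extract_amazon_asins` is hypothetical, and instead of scraping each Amazon page it uses a cheaper heuristic, assuming that product URLs carry a 10-character product ID after `/dp/` or `/gp/product/`.

```python
import re
from urllib.parse import urlparse

# Rough URL matcher; stops at whitespace and common closing delimiters.
URL_RE = re.compile(r"https?://[^\s\)\]>\"']+")
# Assumption: product pages have a 10-character ID after /dp/ or /gp/product/.
ASIN_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def extract_amazon_asins(comment_body):
    """Return the product IDs of amazon.com product links found in a comment."""
    asins = []
    for url in URL_RE.findall(comment_body):
        url = url.rstrip(".,;:!?")  # drop sentence punctuation stuck to the URL
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if host == "amazon.com" or host.endswith(".amazon.com"):
            match = ASIN_RE.search(parsed.path)
            if match:
                asins.append(match.group(1))
    return asins


# Made-up comment text and placeholder product IDs, for illustration only.
comment = (
    "Try [this grinder](https://www.amazon.com/Some-Grinder/dp/B000000001/ref=x). "
    "Not a product: https://www.amazon.com/s?k=coffee"
)
print(extract_amazon_asins(comment))
```

A URL pattern like this only filters out obvious non-product links such as search pages; checking the page itself, as described above, is the stricter test.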

The top subreddits are popular subreddits or subreddits you would expect to have a large number of products posted to them (for example, femalefashionadvice). Another way to slice the data is to graph the number of items posted over time.

This graph shows the number of items posted to Reddit, bucketed by week. The first obvious trend is that, on average, people are posting more Amazon links over time. This could be because Reddit is simply receiving more comments.

The peaks in this graph are quite fascinating. Christmas 2016 exhibited a huge spike in products being linked to. Curiously, no such spike appeared in 2015. The products in the 2016 spike were evenly distributed among many subreddits.
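The weekly bucketing behind a graph like this could be sketched as below. It assumes each product link comes with the Unix timestamp of its comment (the Reddit comment dumps expose this as `created_utc`, though that field name is an assumption here), and the function names are hypothetical. The second function normalizes by total comment volume, which is one way to check whether the upward trend is just Reddit growing.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def week_start(created_utc):
    """Return the Monday (as a UTC date) of the week containing a Unix timestamp."""
    day = datetime.fromtimestamp(created_utc, tz=timezone.utc).date()
    return day - timedelta(days=day.weekday())


def links_per_week(timestamps):
    """Count product links per week, keyed by the week's Monday."""
    return Counter(week_start(ts) for ts in timestamps)


def links_per_thousand_comments(link_counts, comment_counts):
    """Normalize weekly link counts by the total number of comments that week."""
    return {
        week: 1000 * link_counts[week] / comment_counts[week]
        for week in link_counts
        if comment_counts.get(week)  # skip weeks with no known comment total
    }
```

If the normalized series is flat while the raw counts rise, the growth in links is probably explained by overall comment volume.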

The importance of a product can also be measured by how many times it has been linked to. These items were linked to the most on all of Reddit:

I had no idea what Passion Lubes were when I first started this exercise. Here’s an image of the most popular item linked to on Reddit:

From Amazon, it’s “The Ultimate Lube Keg.” Perhaps that says something about Reddit?

Passion Lubes had been linked to in many subreddits but not many times within one subreddit. It’s possible that items that have been linked to many times within one subreddit are more interesting because subreddits represent communities. One very telling item that was linked to 17 times in /r/Coffee was the Hario Skerton Ceramic Coffee Mill (100g). If you read the comments on it, it is one of the most recommended items.
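The distinction drawn here, between a product linked across many subreddits and one linked repeatedly within a single subreddit, could be computed along these lines. This is a sketch under assumptions: the input is taken to be one `(subreddit, product_id)` pair per link, and the function name and output keys are made up for illustration.

```python
from collections import Counter, defaultdict


def link_breadth_and_depth(mentions):
    """Summarize each product's links.

    mentions: iterable of (subreddit, product_id) pairs, one per link.
    Returns, per product, the total link count, the number of distinct
    subreddits linking to it, and its most-linking subreddit with that count.
    """
    per_product = defaultdict(Counter)
    for subreddit, product in mentions:
        per_product[product][subreddit] += 1

    summary = {}
    for product, counts in per_product.items():
        top_subreddit, top_count = counts.most_common(1)[0]
        summary[product] = {
            "total": sum(counts.values()),
            "subreddits": len(counts),
            "top_subreddit": top_subreddit,
            "top_count": top_count,
        }
    return summary
```

Sorting by `top_count` rather than `total` would surface items like the coffee mill, which a single community keeps recommending, ahead of items that are scattered thinly across the whole site.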

Some of the products in subreddits provide an interesting insight into a culture. An example is /r/AsianBeauty. The subreddit mainly links to skin and sun care products.

Here you can see that almost 80% of all products posted on AsianBeauty are Beauty & Personal Care. And here are the top products mentioned:

There are also a few subreddits that are genuinely useful for finding great products. If you’re looking for the best knives on Reddit, then /r/knifeclub is a great subreddit to browse. Or, if you need a quality watch, /r/Watches might be a good bet.

The quality of comments is another fascinating aspect. While most comments are just witty one-liners, there are many comments that read more like essays, giving very in-depth product reviews. The longest one is over 10,000 characters. Here’s one Herculean post reviewing dozens of boots.

There are certain subreddits that lend themselves better to product recommendations. You wouldn’t expect to find great products in /r/gifs even if a lot of products are posted. A proxy indicator for the quality of products being posted could be the length of a comment. Long comments tend to be more detailed responses, often with reviews. The graph below shows the subreddits with the highest median comment length among subreddits with over 50 products posted.

WarCollege takes the top spot. WarCollege is a military history subreddit, and like AskHistorians, it has many high-quality comments with great book recommendations. The latterdaysaints subreddit was a surprising number two. Most of the products posted were books.
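The median-length ranking could be reproduced with something like the sketch below. It assumes the input is one `(subreddit, comment_body)` pair per product-linking comment and that "over 50 products" means more than 50 such comments; both the function name and that reading of the threshold are assumptions.

```python
from collections import defaultdict
from statistics import median


def median_comment_length(comments, min_products=50):
    """Rank subreddits by the median length of their product-linking comments.

    comments: iterable of (subreddit, comment_body) pairs.
    Only subreddits with more than `min_products` such comments are kept.
    Returns (subreddit, median_length) pairs, longest first.
    """
    lengths = defaultdict(list)
    for subreddit, body in comments:
        lengths[subreddit].append(len(body))

    ranked = [
        (subreddit, median(values))
        for subreddit, values in lengths.items()
        if len(values) > min_products
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

The median is used rather than the mean so that a single 10,000-character review does not dominate a subreddit made up mostly of one-liners.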

Diving into Reddit comments gives a surprising amount of insight into the different communities on Reddit. While text analysis is difficult and pulling out Amazon links is certainly not the best way to analyze text, it does provide a different lens for looking at all the unstructured content.