When we launched Soylent, we were flooded with comments on Hacker News, Twitter, and Reddit. There was so much feedback, both positive and negative, that we couldn’t read through every comment. Faced with the problem of having too much customer feedback (an amazing problem to have), I built a solution that let us highlight the issues that mattered most to both our supporters and our detractors. The analysis generalizes pretty well, so I’m going to outline how I designed the system, and then use Elon Musk’s Hyperloop announcement and the discussion associated with it as an example.

How the NYTimes visualization actually works:

A comparison of how often speakers at the two presidential nominating conventions used different words and phrases, based on an analysis of transcripts from the Federal News Service.

Basically, the bigger the bubble in the NYT visualization, the more times that word was mentioned in speeches, just like a traditional word cloud. The NYT visualization is different, though, because it gives you a useful perspective on where an issue sits between the two parties. Normal word clouds just highlight popular topics and don’t really teach you anything.

Example of a useless word cloud:

This word cloud really teaches you nothing about a topic except the buzzwords that are broadly associated with it. The different colors don’t actually correlate to any information. There are many great articles about the pitfalls of standard word clouds and how they can be misleading. I’m partial to this one, in which a New York Times senior software architect describes them as the “mullets of the Internet.”

How a “weighted” word cloud is different:

The weighted word cloud shown in the NYT example is different because it shows you not only which concepts are popular topics, but also where they sit between the two sides of an issue (in that case, political stance). This is relevant to customer feedback analysis because the feedback you receive about your product comes on a continuum similar to the political spectrum. Just as Democrats and Republicans highlight different policy topics in their convention speeches, supporters and detractors will highlight different product features in their comments. When launching a new product, especially one that is receiving a mixed response or has sparked controversy, it is incredibly important to treat the feedback of your supporters differently from that of your detractors, and a weighted word cloud helps you do just that. Using machine learning / natural language processing, we can automatically classify comments as positive, negative, or neutral with sentiment analysis and then find exactly what issue is at the heart of each comment using entity resolution. If you’d like to learn more about sentiment analysis and how it works, Walaa Medhat, Ahmed Hassan, and Hoda Korashy wrote a great overview of the field as it stood in 2014.

Building an automated system to do this analysis:

I thought this would be a cool way to look at the data available online about various products, so I built an automation around this type of analysis using the HackerNews API, Google’s Natural Language API, and D3.js. All of this is wired together using Python. I would like to build a web interface to this and make it publicly available, but I’m a bit worried about the cost of all the API calls if people start using it frequently.

The HackerNews API is hosted on Firebase and documented on GitHub. It allows us to pull all the comments about a story very easily. Once we have these comments, we can use Google’s Cloud Natural Language API for entity resolution and sentiment analysis. Once we have aggregated all of the most frequently mentioned entities and their associated sentiment scores, plugging that data into a weighted word-cloud is trivial.
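To make the first step concrete, here is a minimal sketch of pulling a single item from the HackerNews API and turning its body into plain text. The endpoint is the one given in the public API documentation; the helper names (`fetch_item`, `comment_text`) and the regex-based tag stripping are my own assumptions for illustration, not the code behind this post.

```python
import html
import json
import re
import urllib.request

# Item endpoint from the public Hacker News API documentation.
HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{item_id}.json"


def fetch_item(item_id):
    """Download one story or comment as a dict (requires network access)."""
    with urllib.request.urlopen(HN_ITEM_URL.format(item_id=item_id)) as resp:
        return json.load(resp)


def comment_text(item):
    """Return a comment's body as plain text, or '' if there is nothing usable.

    Comment bodies arrive as HTML, so tags are dropped and character
    entities are unescaped before the text is sent on for analysis.
    """
    if not item or item.get("deleted") or item.get("dead"):
        return ""
    raw = item.get("text", "")
    raw = re.sub(r"<p>", "\n", raw)     # paragraph breaks become newlines
    raw = re.sub(r"<[^>]+>", "", raw)   # strip remaining tags (links, italics)
    return html.unescape(raw).strip()
```

Cleaning the text first is a design choice: it keeps markup out of the text that gets analyzed, which should help avoid stray tags or URLs being picked up as entities.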

The code essentially does the following:

1. Use the HackerNews ID to get comment IDs from the HackerNews API.
2. Traverse the comment threads to get all of the related story comments.
3. Submit each comment to the Natural Language API and store the results.
4. Transform the results into a JSON object for visualization in D3.
5. Use D3 to draw a colored bubble for each entity according to sentiment.
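The thread traversal is the only part of these steps that needs any real logic, so here is a rough sketch of it. It assumes each item lists its replies under a `kids` field, as the HackerNews API documentation describes. The function names and the breadth-first order are illustrative choices of mine, and the fetcher is passed in as a parameter so the traversal can be tried without network access.

```python
import json
import urllib.request
from collections import deque

HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{item_id}.json"


def fetch_item(item_id):
    """Default fetcher: one HTTP request per item (requires network access)."""
    with urllib.request.urlopen(HN_ITEM_URL.format(item_id=item_id)) as resp:
        return json.load(resp)


def collect_comments(story_id, fetch=fetch_item):
    """Return every live comment under a story, in breadth-first order.

    `fetch` is any callable mapping an item ID to its dict (or None).
    """
    story = fetch(story_id) or {}
    queue = deque(story.get("kids", []))    # top-level comment IDs
    comments = []
    while queue:
        item = fetch(queue.popleft())
        if not item:                         # missing items come back as null
            continue
        if not (item.get("deleted") or item.get("dead")):
            comments.append(item)
        queue.extend(item.get("kids", []))   # replies may sit under removed parents
    return comments
```

Because every comment costs one request, a large thread means hundreds of sequential calls. Caching the fetched items to disk before sending anything to the paid API would be a sensible addition.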

One of the most critical steps here is the sentiment classification of each comment as either positive or negative. This is where the machine learning actually comes into play and what makes it possible to separate out comments into two categories (supporters and detractors) that map well onto the weighted word cloud. In the screenshot, you can see some example results from Google’s Natural Language API. Using this API essentially removes the need to dive into all the complexities that come with training and using a new machine learning model. Google’s model is highly accurate, reliable, and most importantly, available at affordable rates via a simple API call. In the API example, the entire document is classified as expressing positive sentiment (with a score of 0.3 on a range of -1.0 to 1.0). Additionally, nine different entities are identified, three of which have sentiments associated with them.
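As a hedged sketch of this step, the snippet below builds a request for the API's `annotateText` REST method, asking for document sentiment and entity sentiment together, and reduces a response to the fields the word cloud needs. Authenticating with an API key, the 0.1 cutoff for calling a comment neutral, and all function names are assumptions made for illustration; the official client library would work just as well.

```python
import json
import urllib.request

# REST endpoint of the Cloud Natural Language API (v1).
ANNOTATE_URL = "https://language.googleapis.com/v1/documents:annotateText"


def build_request(text):
    """Request body asking for document sentiment plus per-entity sentiment."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "features": {
            "extractDocumentSentiment": True,
            "extractEntitySentiment": True,
        },
        "encodingType": "UTF8",
    }


def annotate(text, api_key):
    """Send one comment to the API (requires network access and a valid key)."""
    req = urllib.request.Request(
        f"{ANNOTATE_URL}?key={api_key}",
        data=json.dumps(build_request(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def classify(score, threshold=0.1):
    """Map a score in [-1.0, 1.0] to a label; the threshold is an assumption."""
    if score >= threshold:
        return "supporter"
    if score <= -threshold:
        return "detractor"
    return "neutral"


def summarize(response):
    """Keep only the document score, its label, and the entity names."""
    score = response.get("documentSentiment", {}).get("score", 0.0)
    return {
        "score": score,
        "label": classify(score),
        "entities": [e["name"] for e in response.get("entities", [])],
    }
```

With these assumptions, a document score of 0.3 like the one in the API example would be labeled as coming from a supporter.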

As each comment is passed to the Natural Language API, the document sentiment score along with the entities identified within it are stored. These form the basis of our weighted word cloud. Entities that are mentioned most frequently will have larger bubbles, and the bubbles will be shaded according to their sentiment. Instead of the red for Republican and blue for Democrat color scheme, the sentiment analysis weighted word cloud uses red for detractors and green for supporters. Lastly, this visualization is drawn using the excellent JavaScript library D3.js, using the force-directed graph layout.
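The aggregation can be sketched roughly as follows. It assumes each stored result is a dict holding the comment's document-level `score` and a list of `entities`. Attributing the comment's score to every entity it mentions, averaging those scores, and the output field names are simplifications I chose for illustration, not necessarily how the original pipeline weighs things.

```python
import json
from collections import defaultdict


def build_bubbles(analyzed_comments, top_n=50):
    """Aggregate per-comment results into one record per entity."""
    counts = defaultdict(int)
    score_sums = defaultdict(float)
    for comment in analyzed_comments:
        # Count each entity once per comment, ignoring case.
        for name in {e.lower() for e in comment["entities"]}:
            counts[name] += 1
            score_sums[name] += comment["score"]

    bubbles = []
    for name, count in counts.items():
        mean = score_sums[name] / count
        bubbles.append({
            "name": name,
            "count": count,                 # drives bubble size
            "sentiment": round(mean, 3),    # drives bubble shade
            "color": "green" if mean >= 0 else "red",
        })
    # Most-mentioned entities first; ties broken alphabetically.
    bubbles.sort(key=lambda b: (-b["count"], b["name"]))
    return bubbles[:top_n]


def to_d3_json(bubbles):
    """Serialize the records for the D3 script to load."""
    return json.dumps({"nodes": bubbles}, indent=2)
```

On the D3 side, `count` would map to each bubble's radius and `sentiment` to its shade of red or green. Capping the output with `top_n` keeps the force layout readable when a thread mentions hundreds of distinct entities.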

Using this analysis on the Hyperloop launch announcement:

Concept imagery from Hyperloop Alpha

I thought it would be apt to revisit the Hyperloop launch announcement with a case study using this type of analysis for a few reasons. First, Elon Musk just announced on Tuesday that his newest venture, The Boring Company, will be building a fully-functional Hyperloop. Second, the Hyperloop manifesto is perhaps the most consummate pre-launch teaser I can think of in recent memory. Venture capitalists have been funding pre-launch companies for decades, providing the most rudimentary of market signals (“does a VC think this is a good idea”). Recently, Kickstarter and crowdfunding more broadly have created a wave of companies that have benefited from being able to test their theories with potential users before investing heavily in development of the actual product, but the Hyperloop announcement took that idea to a new level. By announcing the idea of a product (high-speed trains in vacuum tubes) four years before announcing an actual plan to build that product, Elon was able to popularize the idea and learn what was most important to potential consumers with very little upfront investment.

Here is what HackerNews comments on the original Hyperloop launch look like when visualized using a sentiment-weighted word cloud.