Patrick Lestrange was an Insight Data Science Fellow (Winter 2018) from Insight’s second Seattle session. In his first four weeks at Insight, he built a Chrome extension for Amazon customers to easily identify useful topics in customer reviews. Previously, Patrick was a postdoctoral researcher at the University of Washington, where he also received his PhD in Chemistry in May 2017. He is now a Data Scientist at Boeing, working on pilot-less airplanes.

Like over 75% of Americans, I do most of my shopping online with Amazon in part because their wealth of product reviews allows me to make smart choices as a consumer. These product reviews are a great way to qualitatively assess different features of a product. For example, I found this 3-star review for a television:

“I would be in love with this TV if it didn’t have issues with black color. It blotches large areas of black a good amount of the time. Love every other feature but this is very annoying.”

If that feature is important to me then I would know not to get this TV, but if I don’t care about that then this could be a good find. But for some products, the sheer number of reviews overwhelms my ability to clearly identify how other customers felt about different features.

Amazon’s “Read reviews that mention” feature is their attempt to summarize topics from reviews, but it has a few shortcomings: its topics are redundant and the reviewers’ overall opinions for each topic are unclear (more in the next section). To improve Amazon’s “Read reviews that mention” feature, I developed the Chrome extension Gefilter Fish during my Insight Fellowship using Natural Language Processing techniques. This extension offers a concise overview of topics and reviewers’ sentiment.

Amazon’s “Read reviews that mention” feature

Amazon’s “Read reviews that mention” feature contains a series of buttons that filter reviews by the most used words. For example, for this TV wall mount Amazon suggests seventeen words from customers’ reviews:

Amazon’s “Read reviews that mention” feature with seventeen buttons for a TV wall mount.

Amazon’s buttons are redundant

Many of the buttons in this feature are redundant! For example, “install”, “installation”, and “installed”; as well as “mounting”, “mounted”, and “mounts”. This is because reviewers are discussing, respectively, the installation process and the wall mount in different ways. Some words pulled from reviews also describe hardware for the wall mount: “screws”, “stud”, “bolts”, “hardware”, and “plate”. That’s eleven words that could be condensed to just three!

Highlighting redundancies in Amazon’s feature. Red (installation), yellow (mounting) and blue (hardware) buttons.

Amazon’s buttons lack qualitative sentiment

Amazon’s feature also only offers topics of interest; it doesn’t give you an overall idea of whether the reviewers had positive or negative things to say about these aspects of the product. Clicking on the “install” button returns over five hundred reviews!

It would be nice to know if the installation process is easy or difficult just at a glance, rather than needing to scan through several reviews to get that information!

Gefilter Fish creates distinct & qualitative buttons

I created Gefilter Fish, a Chrome extension that replaces Amazon’s “Read reviews that mention” section with something more useful to help filter customer reviews:

Gefilter Fish replaces Amazon’s “Read reviews that mention” with more concise buttons that include sentiment for a TV wall mount.

The topics are much more concise (6 instead of the 17 in the original feature), and for each topic an emoji conveys the customers’ sentiment about that aspect of the product. Gefilter Fish’s three buttons, “install easy”, “mount wall tv”, and “stud plate screw”, easily replace eleven of Amazon’s!

Let’s talk about how I built this…

The Gefilter Fish Recipe

Fishing for data

The first step in any data science project is to get data, and for Gefilter Fish I needed lots of reviews for Amazon products. In my first iteration, I scraped reviews directly from Amazon product pages with Selenium. Unfortunately, Amazon was quick to notice when I started scraping their pages for reviews, so the next best option was to use a static database of Amazon products with reviews.

To create a static database of Amazon product reviews for Gefilter Fish, I leveraged Julian McAuley’s database of customer reviews. This database includes Amazon product identification numbers (ASINs), the text of each review, the customer’s overall rating, and a few other pieces of information. I transformed the raw JSON files into a pandas DataFrame for his 5-core electronics reviews, where all users and products have at least 5 reviews. The database was clean with well-defined features, so I only had to remove a few reviews that were missing text.
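Loading the raw JSON-lines files into pandas is straightforward. Here is a minimal sketch; the file name follows McAuley’s naming convention, and `reviewText` is the review field in his schema:

```python
import gzip
import json

import pandas as pd

def load_reviews(path):
    """Read a gzipped JSON-lines file of reviews into a pandas DataFrame."""
    with gzip.open(path, "rt") as f:
        records = [json.loads(line) for line in f]
    return pd.DataFrame(records)

# reviews = load_reviews("reviews_Electronics_5.json.gz")
# reviews = reviews.dropna(subset=["reviewText"])  # drop reviews missing text
```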

Because it relies on McAuley’s static database, Gefilter Fish only works for the products and reviews within that database. It nevertheless provides the data necessary to build a proof of concept showing how data science can vastly improve the user experience for product reviews.

Deboning reviews

How does a computer understand something as complex as human language? The answer is to transform that language into a form that computers can interpret and understand, like numbers!

To embed the product reviews into a numerical space, I used spaCy and the natural language toolkit (NLTK). First, I made all the words lowercase and then removed contractions, punctuation, and English stop words.
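The cleaning step can be sketched in plain Python. A tiny stand-in stop-word set is used here in place of NLTK’s full English list:

```python
import re

# Small stand-in stop-word set; the real pipeline used NLTK's full English list
STOP_WORDS = {"i", "a", "an", "and", "be", "in", "is", "it", "the", "this", "to", "with"}

def clean(text):
    """Lowercase, strip contractions and punctuation, and drop stop words."""
    text = text.lower()
    text = re.sub(r"'\w*", "", text)       # drop contraction tails: "don't" -> "don"
    text = re.sub(r"[^a-z\s]", " ", text)  # remove punctuation and digits
    return [w for w in text.split() if w not in STOP_WORDS]

print(clean("I would be in LOVE with this TV!"))  # ['would', 'love', 'tv']
```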

Because my objective is to ultimately reduce the redundancy in Amazon’s “Read reviews that mention” feature, I made sure to identify when different words have the same root. I used two techniques to do this: lemmatization and stemming. Lemmatization handles words that have an odd plural form like “geese”, “mice”, etc. Stemming trims off the ends of words and recognizes that “installation” and “installed” really have the same root.

For example,

- Lemmatization turns “geese” into “goose”
- Stemming turns “installation” and “installed” into “instal”

Both of these techniques helped me find common roots for words, but unfortunately stemming often returns strings of letters that are not real words, e.g., “instal”. Because I want to show topics to customers, I needed to transform stemmed words back into real words. I created a lookup table that matches stems to real words and displays those on the buttons for the Chrome extension.
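A minimal sketch of that lookup table, using NLTK’s Porter stemmer (lemmatization would run first in the full pipeline); each stem maps back to its most frequent surface form for display:

```python
from collections import Counter, defaultdict

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def build_stem_lookup(words):
    """Map each stem back to its most common surface form for display."""
    counts = Counter(words)
    by_stem = defaultdict(Counter)
    for word, n in counts.items():
        by_stem[stemmer.stem(word)][word] += n
    return {stem: forms.most_common(1)[0][0] for stem, forms in by_stem.items()}

words = ["installation", "installed", "install", "install", "mounting", "mounted"]
lookup = build_stem_lookup(words)
# lookup["instal"] -> "install", so the button shows a real word
```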

Reeling in topics

I used topic modeling to find different aspects of the products discussed in the cleaned-up reviews. Because the themes discussed for an HDMI cable are very different from a camera battery, I identified topics within each product’s reviews, rather than topics across all product reviews.

Topic modeling techniques are generally clustering algorithms that thematically group reviews. Of the many different ways to cluster data for topic modeling, I chose non-negative matrix factorization (NMF) and latent Dirichlet allocation (LDA), because both approaches return only positive weights for the topics. Other clustering algorithms like SVD sometimes return negative weights, which is not as interpretable in this case — what does it mean for a review to contain a negative amount of a certain topic?

Each review is tokenized by splitting into individual words (and bigrams). Stop words are removed and then each word is lemmatized and stemmed. Then the words are turned into numerical TF-IDF values.

For LDA, I used the raw count of the word in the review (like they do in the original paper), and for NMF I used term frequency–inverse document frequency (TF-IDF) values. TF-IDF scales a word by how frequently it appears across all the reviews for that product. If a word appears a lot in one review but is not common across all the reviews, its significance is increased. Conversely, if a word appears across all the reviews, its significance is decreased.

I used either the word counts or the TF-IDF values to construct feature matrices for each set of reviews. Each column represents one review and each row an individual feature; in my case, a “feature” was a single lemmatized/stemmed word or a bigram (two adjacent words). Using bigrams also ended up giving me topics that were a little more descriptive.

This is an unsupervised learning problem, so there’s no ground truth to compare against, and deciding between NMF and LDA was a subjective choice. I found that the topics from NMF tended to make more sense to me, so that’s what I went with for the final model.

Was it good or bad?

NMF identifies the topics that come through loud and clear in the reviews, but not how customers feel about them. To provide a contextual understanding of how reviewers feel overall about certain aspects of a product, e.g., whether installation was easy or hard, I performed sentiment analysis for each topic.

NMF returns a set of words that define a topic, e.g., “bolt”, “stud” and “screw” if customers are discussing hardware. For each of these words, I found the sentences in a review that discuss only that topic, and then analyzed their sentiment. This provides a clearer signal on how a customer feels about a particular aspect of the product and has more information than just their overall rating.

Find sentences containing the top words for a topic. Use those sentences only to identify how the customer feels about that topic.

To analyze sentiment, I used the lexicon and rule-based sentiment analysis tool VADER, which is known to work well for Amazon product reviews. I considered values below -0.5 as negative (👎), above 0.5 as positive (👍), and anything in between as neutral (😐).

Emojis summarize overall sentiment for each topic. Customers can quickly glean that a product might be easy to install, but could also have bent studs or arrive broken.

Validation

With Gefilter Fish, I aimed to build a product that would reduce the number of topics from what Amazon’s “Read reviews that mention” feature provides. However, as an unsupervised learning problem, there’s no ground truth to compare against, nor an error metric for validation. So to validate my model, I had to be creative, particularly in the absence of A/B testing.

To validate my model, I measured redundancy in the topics I generated and compared it to Amazon’s current feature. I had scraped words from Amazon’s “Read reviews that mention” feature for 400 products before Amazon started checking whether I was a robot. I used the equation for relative redundancy as it’s described in information theory:

Equation for relative redundancy, R(X), as a function of information entropy, H(X). X represents the set of words output from Gefilter Fish and Amazon’s “Read reviews that mention” feature. Information entropy, H(X) depends on the probability of a certain word appearing in the set p(x). The redundancy function determines how redundant the set is relative to the maximum entropy. A value of 0 means that all words are unique and a value of 1 means that all words are the same (redundant).
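In symbols, R(X) = 1 − H(X)/H_max, with H(X) = −Σ p(x) log p(x) and H_max = log |X| (base 2 is used below; any base gives the same ratio). A minimal sketch of the metric, with illustrative pre-stemmed word lists:

```python
import math
from collections import Counter

def redundancy(words):
    """Relative redundancy R(X) = 1 - H(X) / log2(len(words))."""
    total = len(words)
    probs = [n / total for n in Counter(words).values()]
    h = -sum(p * math.log2(p) for p in probs)  # information entropy H(X)
    h_max = math.log2(total)                   # entropy if every word were unique
    return 1 - h / h_max if h_max > 0 else 1.0

# After stemming, Amazon's buttons collapse to repeated stems (illustrative lists)
amazon = ["instal"] * 3 + ["mount"] * 3 + ["screw", "stud", "bolt"]
gefilter = ["instal", "easi", "mount", "wall", "tv", "stud", "plate", "screw"]
print(redundancy(amazon), redundancy(gefilter))  # Gefilter's list is less redundant
```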

I combined the words from Gefilter Fish’s topics into a single list and measured how redundant that list was. I did the same for Amazon’s feature and then compared how well the two approaches performed across different products.

Box plot distribution of the difference in redundancy between the topics from Amazon’s “Read reviews that mention” feature and Gefilter Fish. Positive values indicate Gefilter Fish’s topics are less redundant than Amazon’s feature.

On average, Gefilter Fish reduced the redundancy by about 18%! Amazon’s feature was more redundant than Gefilter Fish for most of the products (55%; positive values in the figure above).

Instances where Gefilter Fish performed worse than Amazon (45%; negative values) can be explained by the fact that the redundancy metric in the above equation is really a lower bound for how well Gefilter Fish performs. That’s because there are other sources of redundancy that the redundancy function does not account for.

For instance, let’s say there are two topics: “work great” and “great price”. In order to use the redundancy function, I first combine the topics into a single string: “work great great price”. Instead of the word “great” describing two different aspects of the product, it now looks like a redundant word and is treated as such by the redundancy function. It’s not a perfect metric, but it does give a rough idea of how well Gefilter Fish is doing.

Conclusion

I created Gefilter Fish as a Chrome extension to address problems I noticed in Amazon’s current feature for summarizing topics in reviews. Gefilter Fish reduces topic redundancy by about 18% compared to Amazon’s original feature, while also providing a quick overview of customer sentiment.

Gefilter Fish is something that Amazon could easily deploy in place of their existing feature. This small addition has the potential to vastly improve the user experience, reducing the amount of time spent rifling through reviews!