As a regular user of sites like Amazon and Yelp, I am fairly influenced by the star rating system when making my choices. I have always been interested in analyzing the ratings given by users and the possible motivation behind them. Questions such as “Can a user actually differentiate between 1 and 2 stars?” and “Why do certain items or restaurants have polarized ratings (is someone gaming the system, or is it a politically charged item)?” have been on my mind for some time. So I decided to do some “fishing expedition” research and analyze the ratings and their characteristics on a larger set of data. The easiest data set that I could glean off the internet using Matlab was the Amazon best seller list, specifically for books, since it provides the 100 best selling books on Amazon for each of the past 19 years (beginning in 1995). Since the books were already best sellers, I did not have to worry about them having too few ratings (with a few exceptions from the early years), which helps avoid falling victim to the law of small numbers. The conclusions are offered as mere observations, and any causal analysis is done keeping the Texas sharpshooter fallacy in mind!

Analysis Highlights

Users gave close to 2.8 million ratings to the best sellers. As expected, best sellers have mostly 5-star ratings, and the average rating is ~4.15 stars. On average, users are least likely to give 2-star ratings: a person disappointed enough to give 2 stars would be more than willing to give 1 star, given the nature of internet feedback!

The above graph is the result of averaging the normalized votes given for each best selling book. It is a first-order approximation of the fraction of votes that each star rating gets for a typical book in the best seller list. Also visualized below is the variation in average rating over the years. The graph indicates that the distribution of ratings is fairly uniform across the years. The only exception is the early years, in which the 2-star and 1-star totals are closer to each other.
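The averaging of normalized votes described above can be sketched in a few lines of Python. This is only an illustration, not the original Matlab analysis: the function names and the sample vote counts are made up, and skipping books with zero votes is an assumption of this sketch.

```python
def average_rating_distribution(books):
    """Average the per-book normalized vote fractions.

    `books` is a list of 5-element lists holding the vote counts
    for 1-star through 5-star ratings (hypothetical input layout).
    """
    fraction_sums = [0.0] * 5
    counted = 0
    for votes in books:
        total = sum(votes)
        if total == 0:
            # Assumption of this sketch: books without votes are skipped.
            continue
        for i, count in enumerate(votes):
            # Normalize so every book contributes equally,
            # regardless of how many ratings it received.
            fraction_sums[i] += count / total
        counted += 1
    if counted == 0:
        raise ValueError("no book with at least one vote")
    return [s / counted for s in fraction_sums]


def mean_star_rating(distribution):
    """Expected star value of a 1-star..5-star fraction list."""
    return sum((i + 1) * frac for i, frac in enumerate(distribution))


# Made-up vote counts for two books, ordered 1-star to 5-star.
sample_books = [
    [10, 5, 10, 25, 50],
    [20, 10, 10, 20, 40],
]
distribution = average_rating_distribution(sample_books)
print(distribution)
print(mean_star_rating(distribution))
```

Normalizing before averaging keeps a book with tens of thousands of ratings from drowning out a book with a few hundred, which is why the result describes a "typical" best seller rather than the pooled vote totals.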

One can also analyze the total number of ratings given for the books in each year's list as a function of the year. This metric serves as a proxy for the number of people buying books over the internet. A mostly monotonic increase is observed, except for a post-9/11 reading lull. The dip seen in 2013 should correct itself in time, since the newer books have not yet been reviewed enough. This can be inferred from the earlier data snapshots shown below, gathered in October 2013 and January 2014.

Lastly, shown below are the distribution of the total number of votes and the distribution of the star ratings for the best selling books. Of the 1900 (19*100) listed books (including some repetitions), ~230 have fewer than 100 ratings, and most of the best sellers have 100-5000 votes. As observed earlier, the star-rating distribution is skewed towards 5 stars, since these are already best sellers.

With these overall insights in hand, one can look at the specifics.

Analysis specifics or name calling

For the purpose of analyzing the specifics, I considered only those books which have at least ten ratings, to avoid extreme outliers. My initial motivation was to look at the controversial books, but I later expanded the analysis to include the best and the worst rated best sellers. The table below shows the worst rated best selling book for each year starting from 1995.

Unsurprisingly, in most years novels end up being the worst rated best sellers rather than politically charged books, since politically charged books seem to have strong supporters (irrespective of book quality) to balance the strong detractors. Fans of novelists, on the other hand, do not have the religious zeal of political supporters, and thus don’t give 5-star ratings just to support their favorite author. Novelists Tom Clancy (The Bear and the Dragon, Red Rabbit), Patricia Cornwell (Blow Fly, Trace), John Grisham (The Appeal, The Associate), Michael Crichton (Next), and J.K. Rowling (The Casual Vacancy), and political pundit Bill O'Reilly all have the dubious honor of authoring a worst rated best seller. Along similar lines, one can look at the best rated best selling books given in the table below. There are quite a few repetitions here, with the best rated best sellers remaining the same across many years. Most of these are children's books (Dr. Seuss’s Oh, the Places You’ll Go!, Brown Bear, Brown Bear, What Do You See?), young adult books (the Percy Jackson series), or religious books (Jesus Calling: Enjoying Peace in His Presence).

To quantify the controversial nature of books, I created a simplified controversy index (C.I) given by C.I = (number of 5-star votes / number of 1-star votes) + (number of 1-star votes / number of 5-star votes). Anyone familiar with minimization can quickly see that the C.I is of the form x + 1/x, which attains its minimum value of 2 at x = 1 for positive x, that is, when the 5-star and 1-star counts are equal. Values closer to 2 therefore indicate a more evenly split, and hence more controversial, book. The C.I results and the corresponding most controversial book for each year are shown in the table below.
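As a rough illustration of the controversy index, here is a minimal Python sketch. The function names, the input layout, and the sample counts are hypothetical; reading "most controversial" as "smallest C.I" and rejecting zero counts are assumptions of this sketch, not necessarily how the original analysis handled them.

```python
def controversy_index(five_star, one_star):
    """C.I = x + 1/x with x = (5-star votes) / (1-star votes)."""
    if five_star <= 0 or one_star <= 0:
        # Assumption of this sketch: both counts must be positive,
        # otherwise the ratio (or its inverse) is undefined.
        raise ValueError("both vote counts must be positive")
    x = five_star / one_star
    return x + 1 / x


def most_controversial(books):
    """Return (title, C.I) for the book whose C.I is closest to 2.

    `books` maps a title to a (five_star, one_star) tuple.
    Since C.I >= 2 for positive counts, the smallest C.I is the
    one closest to an even split between 5-star and 1-star votes.
    """
    scored = {
        title: controversy_index(five, one)
        for title, (five, one) in books.items()
    }
    title = min(scored, key=scored.get)
    return title, scored[title]


# Made-up counts: (5-star votes, 1-star votes).
sample = {
    "Book A": (900, 30),   # loved by almost everyone
    "Book B": (400, 380),  # nearly even split
    "Book C": (50, 200),   # mostly disliked
}
print(most_controversial(sample))
```

Because x + 1/x is symmetric under swapping x and 1/x, a book that is overwhelmingly loved and one that is overwhelmingly hated by the same ratio get the same index; only a near-even split pulls the value down towards 2.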

Initially, my guess was that during election years politically charged books would become the most controversial ones. Although this hunch turned out to be true for 2002 and 2004 (with Sean Hannity’s Let Freedom Ring: Winning the War of Liberty over Liberalism and Ann Coulter’s How to Talk to a Liberal), the subsequent years don’t show such coincidences. In many cases the worst rated best selling novel is also the most controversial book, which indicates some sort of fan following for the novelists and counters my earlier assertion. Another interesting (and obvious) observation is that the early web catered mostly to software professionals: all three tables show books related to the profession for 1995-1997.

Analysis caveats: 1) The analysis was done on data extracted on May 2, 2014. 2) Some books appear on the best seller list for multiple years. 3) In fewer than 5 cases with no votes, a single vote was assigned to simplify the analysis. 4) Ratings are often given after the year in which a book entered the top 100 list, so the graph of total number of ratings per year must be taken with a grain of salt. 5) Critical ratings for some of the books may be directed at the vendor, the Kindle edition, or the service rather than the book itself.

Bonus Table: