“Overrated” and “underrated” are slippery terms to try to quantify. An interesting way of looking at this, I thought, would be to compare the reviews of film critics with those of Joe Public, reasoning that a film which is roundly lauded by the Hollywood press but proves disappointing for the real audience would be “overrated”, and vice versa.

To get some data for this I turned to the most prominent review aggregator: Rotten Tomatoes. All this analysis was done in the R programming language, and full code to reproduce it will be attached at the end.

Rotten Tomatoes API

This API is nicely documented, easy to access and permissive with rate limits, but cripplingly restrictive in what data it presents. Want a list of all films in the database? Nope. Most reviewed? Top rated? Highest box-office takings? Nope.

The related forum is full of what seem like simple requests that should be available through the API but aren’t: top 100 lists? Search using multiple IDs at once? Get audience reviews? All are unanswered or not currently implemented.

So the starting point (a big list of films) is actually kinda hard to get at. The Rube Goldbergian method I eventually used was this:

1. Get the “Top Rentals” list of movie details (max: 50)
2. Search each one for “Similar films” (max: 5)
3. Get the unique film IDs from step 2 and iterate

(N.B. This wasn’t my idea but one from a post in the API forums; unfortunately I didn’t save the link.)

In theory this grows your set of films at a reasonable pace, but in reality the number of unique films being returned was significantly lower (shown below). I guess this was due to pulling in “walled gardens” to my dataset, e.g. if a Harry Potter film was hit, each further round would pull in the 5 other films as most similar.
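The crawl described above can be sketched roughly as follows. This is not the code used for the post: `get_similar()` is a hypothetical stand-in for the “Similar films” API call, and the lookup table is toy data so that the sketch runs offline. It also shows how a cluster of mutually similar films (the “walled garden”) makes the count of unique films level off.

```r
# Toy "similar films" table (made-up IDs, NOT real Rotten Tomatoes data).
# hp1, hp2 and hp3 only point at each other, like a film franchise.
toy_similar <- list(
  hp1 = c("hp2", "hp3"),
  hp2 = c("hp1", "hp3"),
  hp3 = c("hp1", "hp2"),
  a   = c("b", "hp1"),
  b   = c("a", "c"),
  c   = c("b")
)

# Hypothetical stand-in for the API request: one film ID in,
# a character vector of similar film IDs out (empty if none known).
get_similar <- function(id) {
  hits <- toy_similar[[id]]
  if (is.null(hits)) character(0) else hits
}

# Repeatedly look up similar films for the newly found IDs only,
# recording the size of the unique set after each round.
crawl_films <- function(seed_ids, get_similar, rounds = 3) {
  seen <- unique(seed_ids)
  frontier <- seen
  sizes <- length(seen)
  for (i in seq_len(rounds)) {
    if (length(frontier) == 0) break
    found <- unique(unlist(lapply(frontier, get_similar)))
    frontier <- setdiff(found, seen)  # only unseen films are queried next
    seen <- c(seen, frontier)
    sizes <- c(sizes, length(seen))
  }
  list(ids = seen, sizes = sizes)
}

res <- crawl_films(c("hp1", "a"), get_similar, rounds = 3)
res$sizes  # unique films after the seed list and after each round
```

With a real API the seed IDs would come from the “Top Rentals” list, and `get_similar()` would need to handle request errors and rate limits.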

Results

Here’s an overview of the critic and audience scores I collected through the Rotten Tomatoes API, with some outliers labelled.

On the whole it should be noted that critics and audience agree most of the time, as shown by the Pearson correlation coefficient between the two scores (0.71 across >1200 films).
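As a minimal sketch of how such a coefficient can be computed in R, assuming the scores sit in a data frame with columns named `critics` and `audience` (the column names and the numbers below are made up for illustration, not taken from the dataset):

```r
# Illustrative scores only; the real dataset has >1200 films.
films <- data.frame(
  title    = c("Film A", "Film B", "Film C", "Film D", "Film E"),
  critics  = c(90, 35, 60, 15, 78),
  audience = c(85, 60, 55, 40, 70),
  stringsAsFactors = FALSE
)

# Pearson correlation between the two scores; "complete.obs" drops
# any film where either score is missing (NA).
r <- cor(films$critics, films$audience,
         method = "pearson", use = "complete.obs")

# Number of films that actually contributed to the estimate.
n_films <- sum(complete.cases(films[, c("critics", "audience")]))
```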

Update:

I’ve put together an interactive version of the same plot here using the rCharts R package. It’ll show film title and review scores when you hover over a point so you know what you’re looking at. Also I’ve more than doubled the size of the film dataset by repeating the above method for a couple more iterations — take a look!

Most underrated films

Using our earlier definition it’s easy to build a table of those films where the audience ended up really liking a film that was panned by critics.
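A table like this can be built by taking the difference between the two scores and sorting on it. The sketch below assumes a data frame with `critics` and `audience` columns; the names and values are placeholders rather than the real data.

```r
# Placeholder data standing in for the collected scores.
films <- data.frame(
  title    = c("Film A", "Film B", "Film C", "Film D"),
  critics  = c(90, 20, 55, 70),
  audience = c(60, 80, 50, 72),
  stringsAsFactors = FALSE
)

# Positive difference: audiences liked it more than critics did.
films$diff <- films$audience - films$critics

# Largest positive differences first ("underrated" by our definition).
underrated <- films[order(films$diff, decreasing = TRUE), ]

# Sorting the other way gives the "overrated" end of the table.
overrated <- films[order(films$diff), ]

head(underrated, 3)
```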

Somewhat surprisingly, the top of the table is Facing the Giants (2006), an evangelical Christian film. I guess non-Christians might have stayed away, and presumably it struck a chord within its target demographic — but after watching the trailer, I’d probably agree with the critics on this one.

This showed that some weighting of the difference might be needed, at the very least weighting by number of reviews, but the Rotten Tomatoes API doesn’t provide that data.

In addition, the Rotten Tomatoes page for the film shows a “want to see” percentage rather than an audience score. This came up a few times and I’ve seen no explanation for it. Presumably the “want to see” rating is for unreleased films, but the API also returns a separate (and undisclosed?) audience score for these films.

Looking over the rest of the table, it seems the public is more fond of gross-out or slapstick comedies (such as Diary of a Mad Black Woman (2005), Grandma’s Boy (2006)) than the critics are. Again, not films I’d jump to defend as underrated. Bad Boys II however…

Most overrated films

Here we’re looking at those films which the critics loved but which left paying audiences less enthused.

Strangely, the top 15 (by difference) contains both the original Spy Kids (2001) and its sequel Spy Kids 2: The Island of Lost Dreams (2002). What did critics see in these films that the public didn’t? One possibility is bias in the audience reviews collected: the target audience for these films is young children, who are probably underrepresented amongst Rotten Tomatoes reviewers. Maybe there’s even an enrichment for disgruntled parent chaperones.

Thankfully, though, this table also contains the type of film we might more readily associate with being “overrated” by critics. Momma’s Man (2008) is an indie drama that debuted at the 26th Torino Film Festival. Essential Killing is a 2010 drama and political thriller from Polish director and screenwriter Jerzy Skolimowski.

There’s also a smattering of Rom-Coms (Friends with Money (2006), Splash (1984)) — if the API returned genre information it would be interesting to look for overall trends but, alas. Additional interesting variables to consider might be budget, the lead, reviews of producer’s previous films… There’s a lot of scope for interesting analysis here but it’s currently just not possible with the Rotten Tomatoes API.

Caveats / Extensions

The full code will be posted below, so if you want to do a better job with this analysis, please do so and send me a link! 🙂

Difference is too simple a metric. A better measure might be weighted by (e.g.) critic ranking. A film that critics give 95% but audiences give 75% might be more interesting than the same 20-point difference on a 60/40 rated film.
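One possible weighting, offered only as an assumption to make the idea concrete, is to scale the raw difference by the critic score itself, so that a gap on a highly rated film counts for more. The function name `weighted_diff()` and the choice of weight are illustrative and not part of the original analysis.

```r
# Scale the critic-minus-audience gap by the critic score (as a fraction),
# so the same gap counts for more on a film critics rated highly.
# This is one arbitrary choice of weight among many.
weighted_diff <- function(critics, audience) {
  (critics - audience) * (critics / 100)
}

# The two films from the example: same 20-point gap, different critic scores.
scores <- weighted_diff(critics = c(95, 60), audience = c(75, 40))
scores
```

Under this weighting the 95/75 film scores about 19 and the 60/40 film about 12, so the former ranks as the more “overrated” of the two.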

There’s something akin to a “founder effect” from my initially chosen films that makes it hard to diversify the dataset, especially to films from previous decades and classics.

The Rotten Tomatoes API provides an IMDb ID for cross-referencing; maybe that’s a path to getting more data and building a better film list.

Full code to reproduce analysis

Note: If you’re viewing this on r-bloggers, you may need to visit the Benomics version to see the attached gist.