You may have noticed that here on Plot and Theme I never attach a grade to my reviews. Distilling an entire film into a single number or letter has always rubbed me the wrong way, as it inherently removes critical nuance from the discourse. But I am aware that most reviews do provide a grade in summation, and these can help gauge the overall quality of the film. More recently, with the rise of review aggregators like Rotten Tomatoes and Metacritic, scores of these reviews are condensed down into a single number. The result is a peculiar derivative of a derivative: thoughts and words transformed into a number, then that number lost in a sea of others. The purpose of this piece is to explain that process in more detail, and ultimately to determine whether any of these procedures answers a simple question: Is <Insert Film> any good?

Let’s get this out of the way from the get-go: no grade, score, or full-fledged review of a movie can compare to seeing it for yourself. Well-reviewed movies can leave you unimpressed, and critically panned ones can land right in your wheelhouse. This is not a discussion of the merits of movie critics or reviews as a whole. People like (or hate!) reading reviews for different reasons, but suffice it to say that I think critics can play an important role in evaluating the merit of a creative work. This piece is not about any of that – it is about aggregating multiple reviews into a digestible number for the purposes of “summing up” the quality of a movie.

Why is this even an issue to discuss? Rotten Tomatoes scores and the like are becoming the de facto way to package a movie for public consumption. I have seen “98% on Rotten Tomatoes!!!” used in TV spots in the past year, and have many friends who make their movie choices based on the score on the Tomatometer. This is especially true for smaller films that will never be able to boast “#1 movie in the nation” after a top-grossing opening weekend. Furthermore, Wikipedia pages for films report both the Rotten Tomatoes and the Metacritic scores prominently (usually in the “Reception” section). Moreover, Google results for a film post these same metrics at the very top of the results. At the very least, it is important to understand what these scores mean.

So, what goes into the Tomatometer? According to the website, “The Tomatometer™ rating represents the percentage of professional critic reviews that are positive for a given film or television show.” Essentially, Rotten Tomatoes curates a pool of critics who review films, either in print or online. Some of the more established critics get labeled a “top critic”, which doesn’t really factor into the final calculation, but when you look at individual reviews you can see who those critics are. Each review is characterized as “positive” or “negative”, based either on the actual grade given in the review, or on a general reading of the review. It appears as though a review has to score better than 6/10 to be counted as “positive”. For example, scores of 3/5 (exactly 6/10) or C- on an A-F scale are counted as “negative” reviews for The Revenant, as you can see on its Rotten Tomatoes page. If 60% or more of the total reviews are “positive”, then the movie is given a “fresh” rating. Otherwise, it is given a “rotten” rating. There are some refinements which lead to “certified fresh”, but these are not important to the discussion at hand.
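For the programmers in the audience, here is a minimal sketch of that calculation in Python. It is my reading of the process, not Rotten Tomatoes' actual code: the function names are made up, the "strictly better than 6/10" cutoff is the one inferred above rather than a published rule, and letter grades and ungraded reviews are left out entirely.

```python
def is_positive(score, scale):
    """True if score/scale is strictly better than 6/10.

    Assumption: the cutoff is inferred from observed reviews (3/5 counted
    as negative); it is not an official Rotten Tomatoes rule.
    Cross-multiplying avoids any floating-point division.
    """
    return score * 10 > 6 * scale


def tomatometer(reviews):
    """Percentage of positive reviews; reviews is a list of (score, scale) pairs."""
    positive = sum(1 for score, scale in reviews if is_positive(score, scale))
    return 100 * positive / len(reviews)


def label(percent):
    """'fresh' at 60% or more positive reviews, 'rotten' otherwise."""
    return "fresh" if percent >= 60 else "rotten"


# Five made-up reviews on mixed scales: 3/5, 4/5, 8/10, 9/10, 2/4.
sample_reviews = [(3, 5), (4, 5), (8, 10), (9, 10), (2, 4)]
sample_percent = tomatometer(sample_reviews)
print(sample_percent, label(sample_percent))
```

Note that the 3/5 review lands on the negative side here, just like the ones for The Revenant mentioned above.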

This is a far simpler system than the one used by Metacritic. Like Rotten Tomatoes, Metacritic takes a number of professional reviews, but instead of simply classifying them as “positive” or “negative”, it assigns an actual score to each review on a 100-point scale. On the Metacritic website, you can see some of the conversions that they do to change a C+ into a 58/100, or 3.5 out of 4 stars into an 88/100, for example. Then, with all of these converted scores, Metacritic averages them together, assigning greater weight to the critics and publications it judges to be of higher quality. Hence, the score given by A.O. Scott of the New York Times counts for slightly more in the resulting average than Jimmy the Critic on his blog. This is why Metacritic refers to its final score as a “weighted average”.
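A weighted average is easy enough to sketch in Python. To be clear about what is assumed: the two converted scores (88 and 58) are the examples quoted above, but the weights are invented purely for illustration, and the function name is my own. I do not know the actual weights Metacritic uses.

```python
def weighted_average(reviews):
    """Weighted mean of (score_out_of_100, weight) pairs."""
    total_weight = sum(weight for _, weight in reviews)
    return sum(score * weight for score, weight in reviews) / total_weight


# Hypothetical weights: the established critic counts 1.5x the blogger.
reviews = [
    (88, 1.5),  # 3.5 out of 4 stars, converted to 88/100
    (58, 1.0),  # C+, converted to 58/100
]

weighted = weighted_average(reviews)
unweighted = weighted_average([(score, 1.0) for score, _ in reviews])
print(weighted, unweighted)
```

With these made-up weights, the heavier critic pulls the result toward the 88, so the weighted figure sits above the plain average of the two scores.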

Right away, we can see that these two scores, both of which are expressed as X/100, actually measure completely different things. The RT score is basically, “what percentage of reviews are positive” (whatever that means), whereas the MC score is more of an average rating. It doesn’t take a great deal of imagination to see how these two systems could fail a movie fan. For the sake of argument, let’s look at two movies, which we will call Movie A and Movie B (still better titles than “Attack of the Clones”). Movie A receives 100 reviews on both RT and MC, half of which are 6/10 and half of which are 7/10. Movie B also receives 100 reviews, but they are all 6.5/10. It should be evident to all of you maths aficionados out there that Movie A and B both receive an MC score of 65 since that is how averages work. But, over at RT, Movie A ends up with 50% on the Tomatometer and Movie B ends up at 100%. Wait, what?

See, as far as RT is concerned, all of the reviews for Movie B are “positive”, so it is happy to grant the 100%. By contrast, half of the reviews for Movie A are only 6/10, which is “negative”, so this film would earn the “Rotten” moniker comfortably. Obviously things are never this clear-cut in the real world, but this particular example should give you pause. It is clear that the Rotten Tomatoes system should not be viewed as a quantitative system, but as a qualitative one. When two different above-average films end up with wildly different scores, it becomes difficult to use the Tomatometer as anything but a general guide: X% of the critics thought this movie was above-average.
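If you want to check the arithmetic for Movie A and Movie B yourself, here is a small Python sketch. It assumes every critic carries equal weight (so the Metacritic-style number is a plain average) and that "positive" means strictly better than 6/10, as inferred earlier; the function names are my own.

```python
def tomatometer(scores_out_of_10):
    """Percentage of reviews strictly better than 6/10 (assumed cutoff)."""
    positive = sum(1 for score in scores_out_of_10 if score > 6)
    return 100 * positive / len(scores_out_of_10)


def average_score(scores_out_of_10):
    """Plain, equal-weight average rescaled to a 100-point scale."""
    return 10 * sum(scores_out_of_10) / len(scores_out_of_10)


movie_a = [6] * 50 + [7] * 50  # half 6/10, half 7/10
movie_b = [6.5] * 100          # every review 6.5/10

for name, scores in (("Movie A", movie_a), ("Movie B", movie_b)):
    print(name, tomatometer(scores), average_score(scores))
```

Both movies come out with the same average, while the percentage of positive reviews differs by fifty points.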

Recognize also that this kind of system rewards mediocre films. This should be clear with another simple example: Movie C receives 100 reviews as before, all of them 6.5/10, whereas Movie D is one of the best films ever made, and every one of its 100 critics gives it a 9.5/10. Rotten Tomatoes has the exact same grade for both of these movies: 100%. Perhaps the best way to sum this all up: the Rotten Tomatoes score cannot tell you how good a movie is, only whether most critics think it is good or not.

For more information on how good a film is, we need to turn to a more robust scale like Metacritic. Since this metric is essentially an average, we can be quite confident that the higher the MC score, the more universally beloved the film is. If there is a weakness to this system, it is that it artificially lowers the score of a film, because lower scores weigh down the average. This is why the people at Metacritic claim that any film which scores above 80 should be considered “Universally Acclaimed”, and films between 61 and 80 are characterized as “Generally Favorable”.

But we have to be careful about both of these systems for a simple reason: they are both in bed with particular production companies! Metacritic is owned by CBS, which is owned by National Amusements, which owns Paramount. Paramount Pictures produces a number of films like Terminator Genisys and Mission: Impossible – Rogue Nation, and it benefits Paramount if Metacritic praises films like these. Similarly, Rotten Tomatoes is owned by Flixster, which is owned by Time Warner, which owns Warner Bros. Hence, higher RT scores are good for Time Warner movies.

So what’s the answer for those out there who are trying to decide what movie to see this weekend? On the surface level, you can get a good sense of how critics generally feel with the RT score. A movie mired in the low teens has a pretty good chance of being not-so-good, but may have some redeeming qualities that get lost in the aggregation process. By contrast, even a movie in the high 80s could have some flaws, and you’ll probably not be able to use Rotten Tomatoes to decide between an 87% and a 95% – too much of the nuance has gotten lost. Hopping over to Metacritic may help slightly, but you’ll likely be confronted with a similar problem thanks to the inherent nature of averages: it may be hard to tell if you would enjoy the film with a score of 75 or the one with a score of 68. And the reason is simple: these review aggregators normalize out any and all specifics, until you are reacting to nothing more than a number.

A number cannot explain the majesty of 2001: A Space Odyssey or the horror of Alien. It will never tell you just how funny Ghostbusters is, nor express the quotability of The Big Lebowski. As lovers of movies and rational consumers, we do a great disservice to ourselves by trying to distill a film into a nice, easy, numerical summation. Instead, get out there and read reviews from critics who intrigue you (note: not necessarily those you agree with!). Watch YouTube videos where the creators explain specific moments in the film and why those moments affected them so.

For in the end, Rotten Tomatoes, Metacritic, IMDb user grades, and anything else that seeks to establish a “consensus” grade for the quality of a film all run contrary to how humans actually experience a film. One does not jot each moment and aspect of a film into a great ledger, add up the red and black ink, and come to an encompassing conclusion. Our jaws drop and our hearts skip at moments, and our eyes roll when those moments are unearned or poorly executed. Find critics who convey which films possess the moments you enjoy most, and then go support those films.

(Note: originally, the third paragraph of this piece claimed that the Wikipedia page of a film listed these scores near the top of the page. This is not true, and I meant that these scores were reproduced next to the Wikipedia entry on the Google search results, but clearly failed to make this point clear. Thanks to /u/CuddlePirate420 on reddit and Scott Keith in the comments section for pointing out this mistake. I have since corrected the paragraph to its current form.)