An interspecies friendship. A mother desperate to find her daughter’s killer. A teenager coming of age. These are just some of the stories competing for Best Picture at the 90th Academy Awards. With so many compelling narratives and no single film dominating the precursors, some have declared this year’s race wide open.

So which film should you pick if you’re one of the many Americans participating in an Oscar pool this year? Rather than relying on gut instinct, we’ve used the power of data science to help you select the film that is most likely going home with the famous gold statue on March 4th.

Improving the Business of Predicting Oscars

If you’ve started to research the contenders, you’ve probably encountered the litany of articles published by Oscar pundits. From Vanity Fair to The New Yorker, nearly every major publication covering Hollywood releases some sort of Oscar Predictions. For example, just last week, esteemed critic Richard Roeper predicted that Three Billboards Outside Ebbing Missouri would take home the coveted prize. His evidence… ”I’m sticking with the best film of 2017.”

Just like our unique approach to tech education, we believed that the business of predicting Oscars was also ripe for disruption.

To understand our methodology, one must first understand a concept called “supervised learning.” Supervised learning is a machine learning concept that allows data scientists to understand the relationship between one output and a lot of inputs. In this case, it helps us understand past outcomes (who won Best Picture previously and why) so we can better predict future outcomes (who will win this year and why).

We began our journey to predicting this year’s best picture winner by collecting and cleaning lots of data. From critic ratings to performance at precursors, we looked for any and all publicly available information about the films that had been nominated for Best Picture over the past X years. This data would help inform our algorithm which we would build using SciKit Learn, one of the most popular learning toolkits in the world.

Through evaluating multiple models, we determined that random forest classification provided the most accurate prediction of previous Oscar winners. Random forest classification is a machine learning method that determines the relationships between variables through the creation and evaluation of decision trees. In the example below, you can see how decision trees ask a series of Yes No questions to arrive at an answer.

The Random forest classifier algorithm recognizes that as decision trees become more complex, they have a tendency to pick up on nuance and creates rules on randomness in a process known as capturing noise or overfitting. Therefore, rather than creating large complex trees, random forest makes many small trees with slight variations, allowing us to find higher level, generalizable rules. When applied to Oscar winners and losers over the past 38 years, this approach made correct predictions in all but 1 year, 2017. (We guess PWC aren’t the only folks that need a mulligan).

The Envelope, Please.

After running this year’s best contenders through the random forest classifier algorithm, we believe that...

Wait for it...

SHAPE OF WATER will win best picture this year.

Here's how the rest of the nominees faired.

Share

If you end up winning your pool as a result of our research, please let us know by tweeting at us. And if you’re interested in learning how to make similar predictions to help you win your Bachelor pool, Fantasy football draft or something else of the sorts, we encourage you to check this out.