Predicting the World Cup with the Google Cloud Platform

This notebook builds a machine learning model that can be used to predict professional soccer games. The notebook was created for the "Predicting the Future with the Google Cloud Platform" talk at Google I/O 2014 by Jordan Tigani and Felipe Hoffa. A link to the presentation is here: https://www.youtube.com/watch?v=YyvvxFeADh8

Once the machine learning model is built, we use it to predict outcomes in the World Cup. If you are seeing this after the World Cup is over, you can use it to predict hypothetical matchups (how would the 2010 World Cup winners do against the current champions?). You can also see how different strategies would affect prediction outcomes. Maybe you'd like to add player salary data and see how that affects predictions (it will likely help a lot). Maybe you'd like to try Poisson Regression instead of Logistic Regression. Or maybe you'd like to try data transformation techniques like whitening or PCA.

The model uses Logistic Regression, built from touch-by-touch data from three different soccer leagues (English Premier League, Spanish La Liga, and American Major League Soccer) over multiple seasons. Because the data is licensed, only the aggregated statistics about those games are available. (If you have ideas for other statistics you'd like to see, create a new issue in the https://github.com/GoogleCloudPlatform/ipython-soccer-predictions GitHub repo and we'll see what we can do.) The match_stats.py file shows the raw queries that were used to generate the stats.
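To make the idea of Logistic Regression over aggregated match statistics concrete, here is a minimal, self-contained sketch. It is not the code this notebook uses: the two features (pass-completion difference and shots-on-target difference), the toy numbers, and the helper names `fit_logistic` and `predict_proba` are all made up for illustration, and the fit is plain batch gradient descent rather than a library routine.

```python
import math

def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
            err = sigmoid(z) - y          # derivative of log loss w.r.t. z
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        # Step against the average gradient.
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def predict_proba(weights, x):
    """Estimated probability that the team described by x wins."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(z)

# Hypothetical toy data: (pass-completion difference, shots-on-target difference)
# for one team relative to its opponent; label 1 means that team won.
rows = [(0.10, 3.0), (0.05, 1.0), (0.08, 2.0),
        (-0.07, -2.0), (-0.12, -4.0), (-0.03, -1.0)]
labels = [1, 1, 1, 0, 0, 0]

weights = fit_logistic(rows, labels)
print(round(predict_proba(weights, (0.06, 2.0)), 3))
```

The output is a probability rather than a hard win/loss call, which is what makes this kind of model convenient for ranking hypothetical matchups.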

This notebook uses four Python files, which must be on the Python path. These are: