Photo by Thomas Serer on Unsplash

One of the best ways to expand your coding skills is to get stuck in with a hands-on project on a subject you’re interested in. Just before the 2019 Rugby World Cup started in September, I decided to practise my Python and data science skills by trying to predict the outcomes of the tournament matches.

Below I’ll walk through the high-level process I went through to try and predict both the winners and the scores of the Rugby World Cup group stages. Hopefully it encourages you to try it yourself on your favourite sport!

For this project I used Python, Jupyter notebooks, PyCharm and Power BI (for quick data viz).

Step 1: Get the data

ESPN Scrum has an extensive database of international rugby matches. I wanted all international rugby matches from 2003 onwards. I chose this date because the World Rugby Rankings were introduced just before the 2003 World Cup (more on this in the Features section).

ESPN Scrum match result database

But this data was split over 175 pages, so copying and pasting was not a practical option. BeautifulSoup to the rescue. BeautifulSoup is a Python library for pulling data out of HTML and XML files. Below you can see the raw scraped data containing information about a Wales match on 7 September 2019:

Result of BeautifulSoup scraping for match results

I noticed that the ESPN URL included a page number, so I iterated through each page and stored the results. With a little bit of work I had 16 years’ worth of data in my dataset: over 8,700 rugby matches.

DataFrame of match results
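As a rough illustration, the page-by-page scraping described above could look something like the sketch below. The URL pattern, the function names and the assumption that results sit in the first table on each page are all hypothetical, since the real ESPN Scrum URL and page layout are not reproduced in this post.

```python
from bs4 import BeautifulSoup

# Hypothetical URL pattern: the real ESPN Scrum query string is not given in this post.
URL_TEMPLATE = "https://example.com/results?page={page}"


def parse_results_page(html):
    """Return one list of cell strings per data row of the first table in the page."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows use <th>, so they yield no cells and are skipped
            rows.append(cells)
    return rows


def scrape_all_pages(fetch, n_pages):
    """Call fetch(url) for pages 1..n_pages and collect every parsed row."""
    all_rows = []
    for page in range(1, n_pages + 1):
        html = fetch(URL_TEMPLATE.format(page=page))
        all_rows.extend(parse_results_page(html))
    return all_rows
```

Here `fetch` is any function that takes a URL and returns the page HTML as a string, for example a thin wrapper around an HTTP client. Passing it in keeps the parsing logic easy to try out on a saved page before hitting the live site.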

I wanted to add the World Rugby Rankings as a feature so I performed similar scraping to get each team’s ranking since 2003.

World Rugby Rankings

Step 2: Data preparation

The scraped data didn’t arrive in a state ready for machine learning, so I needed to clean up a few minor things. These included removing the ‘v’ prefix in the Opposition field and ensuring that a country’s name was the same in both the Team and Opposition fields (I noticed the USA was USA in one and United States of America in the other). Pandas makes this relatively straightforward; see apply and map, for example. I also performed a few other preparation steps, such as accounting for errors and blanks in the raw data and formatting dates correctly.
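To make the clean-up steps concrete, here is a minimal sketch on a couple of made-up rows. The column names, the date format and the `NAME_FIXES` mapping are assumptions for illustration, not the exact layout of the scraped data.

```python
import pandas as pd

# Made-up rows standing in for the scraped data; the column names are assumptions.
raw = pd.DataFrame({
    "Team": ["USA", "Wales"],
    "Opposition": ["v Wales", "v United States of America"],
    "Match Date": ["7 Sep 2019", "14 Sep 2019"],
})

# Map every spelling variant onto a single canonical name.
NAME_FIXES = {"United States of America": "USA"}


def clean_matches(df):
    """Return a cleaned copy of the match table, leaving the input untouched."""
    out = df.copy()
    # Drop the leading "v " from the opposition column.
    out["Opposition"] = out["Opposition"].str.replace(r"^v\s+", "", regex=True)
    for col in ("Team", "Opposition"):
        out[col] = out[col].str.strip().replace(NAME_FIXES)
    # Unparseable dates become NaT instead of raising, so they can be inspected later.
    out["Match Date"] = pd.to_datetime(
        out["Match Date"], format="%d %b %Y", errors="coerce"
    )
    return out


clean = clean_matches(raw)
```

After this, both columns use the same spelling for every country, which matters later when rankings and skill scores are joined on team name.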

Step 3: Features

The World Rugby Rankings were merged onto the main dataset so that each team in each match was assigned its ranking from the date closest to the match date.
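One way to do a closest-date join like this in pandas is `merge_asof` with `direction="nearest"`. The sketch below uses made-up teams, dates and rankings, and the column names are assumptions.

```python
import pandas as pd

# Made-up data; column names and values are illustrative only.
matches = pd.DataFrame({
    "team": ["A", "B", "A"],
    "match_date": pd.to_datetime(["2019-01-10", "2019-01-10", "2019-03-01"]),
})
rankings = pd.DataFrame({
    "team": ["A", "A", "B"],
    "ranking_date": pd.to_datetime(["2019-01-07", "2019-02-25", "2019-01-14"]),
    "world_rank": [3, 2, 5],
})

# merge_asof needs both frames sorted by their date key.
merged = pd.merge_asof(
    matches.sort_values("match_date"),
    rankings.sort_values("ranking_date"),
    left_on="match_date",
    right_on="ranking_date",
    by="team",            # only compare rankings belonging to the same team
    direction="nearest",  # closest ranking date, whether before or after the match
)
```

Note that `direction="nearest"` can pick a ranking published just after a match. If that kind of look-ahead is a concern, `direction="backward"` restricts the join to rankings dated on or before the match.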

While I had the World Rugby Rankings, I still wanted to calculate a separate, simpler skill score for each team. I calculated relative skill using a generalisation of the Elo rating system. In this system, a team’s rating changes depending on the match result and the difference in rating between the two teams. In a simple example, if a team ranked 10th beats a team ranked 2nd, the 10th-ranked team will receive more rating points than if they beat a 15th-ranked (weaker) team. Similarly, the losing 2nd-ranked team will lose more rating points for losing against a weaker opponent than if they lose to, say, the top-ranked team. The algorithm is therefore self-correcting and over time ‘learns’ the skill of each team.
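The behaviour described above can be seen in the standard Elo update, sketched below. This is the textbook formula rather than the exact generalisation used for this project, and the K-factor of 32 and scale of 400 are conventional defaults chosen for illustration.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Expected result for A against B under the standard Elo model (0 to 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_ratings(winner, loser, k=32.0):
    """Return the new (winner, loser) ratings after a decisive result.

    Draws are not handled in this sketch.
    """
    # The less expected the win, the larger the transfer of points.
    gain = k * (1.0 - expected_score(winner, loser))
    return winner + gain, loser - gain


# An underdog rated 1400 beating a 1600-rated team...
upset_winner, upset_loser = update_ratings(1400, 1600)
# ...versus the 1600-rated favourite beating the 1400-rated team.
fav_winner, fav_loser = update_ratings(1600, 1400)
```

With these defaults the underdog gains about 24 points for the upset, while the favourite gains only about 8 for the expected win. The loser always gives up exactly what the winner gains, so the total rating in the system stays constant.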

The World Rugby Rankings take various match features into account such as match status (more points for a World Cup final), home advantage, and points scored in the match. The calculated skill score ignores all of that and simply looks at match result and relative skill between the two teams at the time of the match.

Let’s see how the rankings compare at a snapshot in time:

Official World Rugby rankings (23rd September) versus the calculated rankings

Some interesting results here: Australia is ranked higher than expected, and Ireland and Wales are ranked lower than in the official rankings. I suspect Australia’s high ranking is heavily influenced by their high starting point (they were ranked the 3rd best team in the world in 2003). Below you can see how the skill scores of some teams have changed over time as the algorithm learns. Notice the steady rise of England, Ireland and Wales from about 2014 onwards.

Relative team skill score over time

In summary, the data for each historical match included: match date, match result, points for, points against, World Rugby rankings for both teams and a relative skill score for both teams.

Step 4: Train and test

For score prediction I used Keras, a high-level neural networks API. The simplest model in Keras is the Sequential model.

I performed hyperparameter tuning (check out Jason Brownlee’s great tutorial), and experimented with both wide (one layer with a lot of neurons) and deep (more layers but fewer neurons per layer) networks. I ended up using a first layer of 15 neurons and a second layer of 8 neurons, both with the Rectified Linear Unit (ReLU) activation. We don’t use an activation function for the output layer as we don’t want to transform the output values.

Neural network topology
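A network with this shape can be sketched in Keras as follows. The layer sizes, ReLU activations, linear output and Mean Squared Error loss come from the description above. The number of input features, the two output values (one score per team) and the Adam optimiser are assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(n_features, n_outputs=2):
    """Build a 15-8 fully connected regression network.

    n_features and n_outputs are assumptions: set them to match your own
    feature columns and prediction targets.
    """
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(15, activation="relu"),
        layers.Dense(8, activation="relu"),
        layers.Dense(n_outputs),  # no activation: raw, untransformed score values
    ])
    # Mean Squared Error loss; the optimiser choice here is an assumption.
    model.compile(optimizer="adam", loss="mse")
    return model
```

Calling `build_model(n_features=4)` returns a compiled model that can be trained with `model.fit(X, y, epochs=..., validation_split=...)`. The history object returned by `fit` holds the per-epoch loss values needed for a loss plot.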

Below you can see a plot of the loss (Mean Squared Error) over epochs.

Neural network model loss against number of epochs

I also tried the fan favourite XGBoost, but obtained slightly better results with the neural network, at the expense of a longer training time.

Step 5: Predict!

For model input I created a dataset of the upcoming World Cup group stage matches and assigned each team their latest skill score and ranking.

So how did the predictions perform? Much better than I expected (at least for the first 15 games)! Of the 30 games it predicted, it correctly predicted the winner in 27 of them (predicting the Japan versus Ireland upset would have been astounding).

The model performed particularly well in the first 15 games: in 10 of them, the predicted score difference was less than 5 points away from the actual score difference.

Model prediction results versus actual results
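The two measures quoted above (correct winner, and predicted margin within 5 points of the actual margin) are straightforward to compute. The sketch below uses made-up scores, and the function name and input layout are hypothetical.

```python
def evaluate(predicted, actual, margin_tolerance=5):
    """Count correct winners and close margins.

    predicted and actual are equal-length lists of
    (team_points, opponent_points) tuples, one per match.
    """
    correct_winner = 0
    close_margin = 0
    for (pred_for, pred_against), (act_for, act_against) in zip(predicted, actual):
        pred_margin = pred_for - pred_against
        true_margin = act_for - act_against
        # The winner is right when both margins have the same sign
        # (a predicted draw only counts if the match was also drawn).
        same_sign = (pred_margin > 0) == (true_margin > 0) and (
            pred_margin < 0
        ) == (true_margin < 0)
        if same_sign:
            correct_winner += 1
        # "Close" means the two margins differ by less than the tolerance.
        if abs(pred_margin - true_margin) < margin_tolerance:
            close_margin += 1
    return correct_winner, close_margin


# Made-up scores for three matches, not real tournament results.
predicted_scores = [(30, 10), (20, 25), (15, 12)]
actual_scores = [(28, 12), (27, 20), (40, 3)]
winners_right, margins_close = evaluate(predicted_scores, actual_scores)
```

Tracking both numbers is useful because a model can pick the right winner while being far off on the margin, as in the third made-up match here.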

Note that I didn’t retrain the model during the group stages; the results above were obtained right at the beginning of the World Cup. It was interesting to note that once most teams had played at least one game, the predictions seemed to worsen. This makes sense, as the relative rankings and skill scores would have changed. The predictions might well have improved if I had updated these features and retrained the model after the first round of games (which I didn’t do, as I was too caught up in watching the actual games!).

Either way, not bad results for a couple of evenings of work! Machine learning is a fascinating field. Do let me know if you decide to try this yourself: there is plenty to improve on, and there are interesting features that could be added. Hope you enjoyed the walkthrough!

Here are some useful resources to get you started:

Web scraping with BeautifulSoup

Manipulating data with Pandas

Data science

I highly recommend Jake VanderPlas’s excellent Python Data Science Handbook (includes example notebooks)

Getting started with Keras and neural networks