Introduction

Gaussian Process Regression

1. There is a functional form to temperature change as a function of time, for example:

$temp(t) = a+b\cos(ct)$

$temp(t) = at^2+bt+c$

$temp(t) = at^4+bt^3+ct^2+dt+e$

2. The correlation between temperatures at various time points can be quantified.

We will need data that represents all the regions of the prediction space.
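For readers who want the formulas behind these two assumptions, here is a sketch of the standard kriging equations. The notation is my own and is an assumption, not something taken from the code behind this post: $t_1, \dots, t_n$ are the times at which temperatures were recorded, $y$ is the vector of recorded temperatures, $m(t)$ is the functional form from assumption 1 (treated as known here for simplicity), $\mathbf{m}$ is the vector with entries $m(t_i)$, $k$ is the correlation function from assumption 2, $K$ is the matrix with entries $K_{ij} = k(t_i, t_j)$, and $k_*$ is the vector with entries $k(t_*, t_i)$ for a new time $t_*$.

$\mu(t_*) = m(t_*) + k_*^\top K^{-1} (y - \mathbf{m})$

$\sigma^2(t_*) = k(t_*, t_*) - k_*^\top K^{-1} k_*$

If $t_*$ is one of the recorded times $t_i$, then $k_*$ is the $i$-th column of $K$, so $\mu(t_i) = y_i$ and $\sigma^2(t_i) = 0$. If $t_*$ is far from every recorded time and $k$ decays with distance, then $k_* \approx 0$ and $\sigma^2(t_*)$ climbs back towards $k(t_*, t_*)$. That is why the data has to cover the region we want to predict over.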

Data Collection and RUCK Construction

First we need data to build the regression. Because we need data throughout the space we want to predict over, I am going to have to make some restrictions. This is because games in international rugby are scheduled with an eye towards parity; nobody wants to see New Zealand beat the tar out of Kenya. Since we ultimately want to predict the Rugby Championship (RC), we are going to need to look at Tier 1 and high Tier 2 countries, with a focus on the countries that play in the RC: Argentina, Australia, South Africa, and New Zealand. With this in mind, I focused on games in which both teams had a points ranking above 70. I then selected 30 or so games with the aim of capturing RC teams, filling out the space I wanted to predict over, and being recent enough to reflect each team's current player composition. For each game, I recorded the points ranking of each team (accounting for home advantage), the score, and the number of tries for each team. I needed the tries in order to properly assign points in a tournament context.





With the data collected, we are now ready to build the Rank Utilizing Competition Kernel (RUCK). This was done in Python and followed the temperature analogy: each game is a point in a two-dimensional plane, which we will call the rank space:





Associated with each point is a 4-dimensional vector: home score, home tries, visitor score, visitor tries. We then use a polynomial form and fit it to the data. The final output is a set of 8 functions whose domain is the rank space. They give the predicted score of each team, the variance in the predicted score for each team, the predicted number of tries for each team, and the variance in the predicted number of tries for each team.
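To make the idea concrete, here is a minimal sketch of kriging over the rank space for one of the four outputs (the home score). It is not the actual RUCK code. As assumptions of mine, it swaps the polynomial form for a constant mean, uses a squared-exponential correlation function with made-up `length` and `scale` settings, and runs on four invented games. The function names `rbf`, `solve`, and `gp_predict` are hypothetical.

```python
import math

def rbf(p, q, length=5.0, scale=100.0):
    """Squared-exponential covariance between two points in rank space.
    `length` and `scale` are assumed values, not fitted ones."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return scale * math.exp(-0.5 * d2 / length ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predict(points, values, query, noise=1e-6):
    """Return (mean, variance) of the prediction at `query`.
    The sample mean stands in for the fitted polynomial trend."""
    mu = sum(values) / len(values)
    n = len(points)
    K = [[rbf(points[i], points[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(p, query) for p in points]
    alpha = solve(K, [v - mu for v in values])
    mean = mu + sum(k * a for k, a in zip(k_star, alpha))
    w = solve(K, k_star)
    var = rbf(query, query) - sum(k * wi for k, wi in zip(k_star, w))
    return mean, max(var, 0.0)

# Invented games: (home rank, visitor rank) and the home score.
points = [(88.0, 80.0), (82.0, 85.0), (90.0, 75.0), (78.0, 79.0)]
home_scores = [30.0, 20.0, 41.0, 23.0]

mean_at_data, var_at_data = gp_predict(points, home_scores, points[0])
mean_far, var_far = gp_predict(points, home_scores, (60.0, 60.0))
print(mean_at_data, var_at_data)  # close to the observed score, variance near zero
print(mean_far, var_far)          # falls back to the average, variance near `scale`
```

Running the same fit once per output (home score, home tries, visitor score, visitor tries) would give the eight functions described above: four means and four variances.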





Making Predictions with RUCK

Now, there are two ways that we can make predictions with RUCK. First, we can make a point estimate of the scores and tries of each team and use this as a single prediction for the game's outcome. However, this alone fails to capture the stochastic nature of sport. Indeed, I do not think that anybody believes that in competitive settings the outcome of the game is predetermined. Second, we can estimate a win probability. We will be using both: we will generate an expected outcome and a win probability. To compute the win probability, we will use the Bayesian perspective of kriging and say that each point in the rank space corresponds to a distribution whose mean and variance are set by the outputs of the regression. We will then sample 1000 times from that distribution (that is, simulate the game 1000 times) to compute the probability of one side winning.
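The sampling step can be sketched in a few lines. This is an illustration under my own assumptions rather than the exact RUCK procedure: the two scores are drawn from independent normal distributions, rounded to whole points, and clamped at zero. The function name `simulate_win_probability` and the variances in the example call are made up.

```python
import random

def simulate_win_probability(home_mean, home_var, away_mean, away_var,
                             n_sims=1000, seed=None):
    """Simulate a game `n_sims` times and return the share of home wins,
    away wins, and draws. Independence of the two scores is an assumption."""
    rng = random.Random(seed)
    home_sd = home_var ** 0.5
    away_sd = away_var ** 0.5
    counts = {"home": 0, "away": 0, "draw": 0}
    for _ in range(n_sims):
        # Round to whole points and forbid negative scores (assumed handling).
        home = max(0, round(rng.gauss(home_mean, home_sd)))
        away = max(0, round(rng.gauss(away_mean, away_sd)))
        if home > away:
            counts["home"] += 1
        elif away > home:
            counts["away"] += 1
        else:
            counts["draw"] += 1
    return {side: n / n_sims for side, n in counts.items()}

# Example call with invented means and variances.
result = simulate_win_probability(30.0, 100.0, 20.0, 100.0, n_sims=1000, seed=0)
print(result)
```

Keeping the simulated scores, rather than only the counts, would also give the cloud of outcomes needed for a scatter plot of simulated games.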









Validation of RUCK

England vs. Japan

I have selected two games to demonstrate and validate the capabilities of RUCK. The first is England vs. Japan on November 17, 2018. One might say that no prediction is needed in this case, but we want to confirm that RUCK can handle the obvious situations.

The expected outcome of this game, according to RUCK, was 62-23 in favor of England, with England notching 8 tries to Japan's 2. The actual outcome of the game was 35-15 with 4 tries to England and 2 to Japan. In terms of predicting the winner, RUCK has passed the test, but it appears not to have done very well on the score. However, we can investigate the win probability to see the expected distribution of scores.



We find that in 1000 simulated games, England won 997 of them, giving them an estimated win probability of 99.7%. Below is a plot of the scores:



The orange dot represents the actual outcome of the match. In our simulations the actual outcome is some distance away from the main cluster, but it is not an altogether lonely point. In terms of distance from the center of the predictions, the difference is not statistically significant at the .01 level, though it may be at the .05 level. This suggests that the actual outcome of the game is not really outside the range of simulated outcomes. However, in an ideal world we would like to see something closer.

This leads to an interesting discussion about what is "going wrong" here. It could be that our methodology is flawed or has been implemented inappropriately. Alternatively, it could be that the points rankings are a poor predictor of match outcomes (particularly match scores). Similarly, it could be that England simply underperformed given their ranking. Further analysis would be required to start to understand this, but from a statistical perspective, we can consider this to be a correct prediction on the part of RUCK.

Scotland vs. South Africa

The next game took place on the same day and saw South Africa take on Scotland at Murrayfield. RUCK predicted an outcome of 22-18 in favor of Scotland. However, the actual outcome was 20-26 with South Africa taking the win. But before we go throwing RUCK in the virtual garbage, we should investigate the win probability and the distribution of predicted scores.

RUCK predicted a win probability of 76.8% in favor of Scotland. Below you will find the outcomes of the simulated games:

We find that the actual outcome (orange dot) is well within the range of the predicted outcomes, suggesting that RUCK has accurately estimated the range of possible outcomes for this match.

Based on these validations, we can feel confident in RUCK's ability to predict the possible range of outcomes of games.

A Hypothetical Game: Wales vs New Zealand

A key step at this point is to demonstrate and reiterate the limitations of this process. Indeed, anyone familiar with the general process of regression will likely already understand some of the practices that should be avoided, or at least approached carefully. One of these is the choice of points at which we feel comfortable predicting outcomes. The general rule is that we should not use a regression model to make predictions outside the range of the data used to train it. We illustrate this here with a specific example.



The #1 versus #2 matchup is probably the best-loved and best-selling game in any sport. We love games where the two best, most skilled teams duke it out for supremacy. The storylines write themselves. To sate that need, we will now investigate what RUCK says about a match between the current best teams in World Rugby: Wales and New Zealand (NZ). The first question we need to consider is whether we should look at the game occurring in Wales or in NZ. Looking at the current rankings and applying the 3 point home team advantage, we see that Wales would be at 91.96 and NZ would be at 92.54. Looking back at the data we collected to build RUCK, we see that we actually do not have any data on two 90+ teams going at it. This means that there will be a large variance in the predictions, which decreases the ability of RUCK to produce a meaningful prediction. This is borne out in the figure below: RUCK gives a slight edge to Wales with a win probability of 51.1%.

Playing in NZ, and so giving the 3 point advantage to NZ, puts us a bit closer to an existing data point, suggesting that we may get a tighter set of predictions: NZ is given a win probability of 85.6%.

So, how do we improve our ability to make predictions for these teams? More data! This example highlights the need to add more data points to that upper corner of the rank space. But we must be careful what data we add; the added games need to be representative of the type of game that Wales and NZ would play today.

Finale: The Rugby Championship

We close this post by giving the results of 10,000 predictions of the outcome of the Rugby Championship (RC). To accomplish this, for each round of the tournament a single match result was produced for each of the two games in the round, rankings were updated according to the World Rugby formula, and a single realization was produced for the next round. This was done for the three rounds of the RC. There are usually six, but since it is a World Cup year, the schedule is shortened.
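The round-by-round chaining can be sketched as follows. This is not the code behind the results in this post. The ranking update follows the World Rugby points exchange as I understand it: the gap between the teams, after adding 3 points for the home side, is capped at 10, and the exchange is multiplied by 1.5 for wins by more than 15 points. Everything else is an assumption made for illustration: `toy_model` is a hypothetical stand-in for RUCK, the ratings and fixtures are placeholders, and table points are 4 for a win and 2 for a draw, with bonus points left out to keep the sketch short.

```python
import random

HOME_ADVANTAGE = 3.0

def rating_exchange(home_rating, away_rating, home_score, away_score):
    """Rating points gained by the home side (negative means lost)."""
    gap = max(-10.0, min(10.0, home_rating + HOME_ADVANTAGE - away_rating))
    if home_score > away_score:
        change = 1.0 - gap / 10.0
    elif home_score < away_score:
        change = -(1.0 + gap / 10.0)
    else:
        change = -gap / 10.0
    if abs(home_score - away_score) > 15:
        change *= 1.5  # larger exchange for wins by more than 15 points
    return change

def toy_model(home_rating, away_rating):
    """Hypothetical stand-in for RUCK: (home mean, away mean, shared sd)."""
    diff = home_rating + HOME_ADVANTAGE - away_rating
    return 24.0 + diff, 24.0 - diff, 10.0

def simulate_tournament(ratings, fixtures, rng):
    """Play every round once, updating ratings after each game."""
    ratings = dict(ratings)  # copy so each simulation starts fresh
    table = {team: 0 for team in ratings}
    for round_games in fixtures:
        for home, away in round_games:
            home_mean, away_mean, sd = toy_model(ratings[home], ratings[away])
            home_score = max(0, round(rng.gauss(home_mean, sd)))
            away_score = max(0, round(rng.gauss(away_mean, sd)))
            if home_score > away_score:
                table[home] += 4
            elif home_score < away_score:
                table[away] += 4
            else:
                table[home] += 2
                table[away] += 2
            change = rating_exchange(ratings[home], ratings[away],
                                     home_score, away_score)
            ratings[home] += change
            ratings[away] -= change
    return table

def title_chances(ratings, fixtures, n_sims=1000, seed=0):
    """Share of simulations in which each team tops the table.
    Ties on table points go to the team listed first (a simplification)."""
    rng = random.Random(seed)
    wins = {team: 0 for team in ratings}
    for _ in range(n_sims):
        table = simulate_tournament(ratings, fixtures, rng)
        wins[max(table, key=table.get)] += 1
    return {team: count / n_sims for team, count in wins.items()}

# Placeholder ratings and fixtures, not the real rankings or schedule.
ratings = {"New Zealand": 92.0, "South Africa": 87.0,
           "Australia": 85.0, "Argentina": 80.0}
fixtures = [
    [("South Africa", "Australia"), ("Argentina", "New Zealand")],
    [("New Zealand", "South Africa"), ("Argentina", "Australia")],
    [("Australia", "New Zealand"), ("Argentina", "South Africa")],
]
chances = title_chances(ratings, fixtures, n_sims=1000, seed=0)
print(chances)
```

Recording each team's final table position in every simulation, instead of only the champion, would give the data for a histogram of finishing positions.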



Here is the expected final table:



Unsurprisingly, we have New Zealand winning handily. However, NZ is not assured of winning in every simulation. Here are each team's chances of winning the RC:

New Zealand 79.07%
Australia 14.53%
Argentina 4.41%
South Africa 2.01%

It may seem odd that SA has the least chance of winning the RC, but you have to remember the data we used to train the regression. In the 2018 RC, SA did very well against NZ, but also split with Argentina and Australia. In our training data, the rank space occupied by SA shows a lot of variability in outcome, which means that in the simulations they are generally giving up a game or two to a lower team. It is not that SA is bad; it's just that they are inconsistent in the data.

To get a better sense of this, I plotted histograms of each country's final position in the RC. Interestingly, we see that SA is essentially as likely to finish second as fourth. Australia seems destined for the middle of the table.

Epilogue: The World Cup and Conclusions

It would seem then that simulating the World Cup is the next logical step. However, the World Cup features some games between very disparate teams, which would make predicting the pool stages difficult. We could simulate the knockout stages after making assumptions about who would win their pool. Perhaps as the time approaches I may settle in and do it, after enhancing the data set a little bit.

So, are the rankings a good predictor of match outcomes? Maybe. There is some evidence here to suggest that, depending on the data used to train the regression, they can capture the uncertainty of outcomes in limited regions of the rank space. However, there are many things unaccounted for. For example, teams in a slump can have a misleading ranking as they work their way down. To avoid this, instead of using only the most recent rankings, we could train the algorithm with the last five rankings of each team. This would help capture trends that could potentially improve our ability to predict outcomes.

For the prediction of tournaments, we will chain a series of single game predictions together. So, for the first round we will sample a single outcome for each match in the round, tally points, and recompute the rankings, and then sample an outcome for each match of the second round, and so on until the end of the tournament. Repeating this 1000 times will give us a sense of the distributions of winners and runners-up and so on.

Now, this seems like a pretty good assumption for temperature change over time, since we do not generally expect large changes over short periods: if two time points are close together, we expect their temperatures to be similar, and the further apart they are, the more different the temperatures can become. Mathematically, we are insisting that the temperatures at any finite set of time points have a joint normal distribution with a correlation matrix derived from a correlation function for the Gaussian process.

With these two assumptions in hand we can get two key outputs: the expected temperature at all points, and the variance in our prediction at these time points.

The green line in the above plot represents the predicted temperature and the dashed blue line is a measure of the variance in that prediction. That is to say, the green line is the expected temperature, and the blue line quantifies how much we could expect the actual temperature to deviate from the expected value.

Note that the variance in our predictions at the times where we collected data is zero. This is because our prediction matches the actual temperature there. Also note that the variance is larger in areas that are far away from the data. This means that in order to make quality predictions, we need data that represents all the regions of the prediction space.

Now, you aren't here to read about temperature; you want to know about rugby. So let's move on to the interesting stuff.