Back in 2016, I introduced a new metric called Deserved Goals. This was an attempt to quantify the underlying skill of Premier League teams, and develop better predictions than the existing metrics.

I was pretty happy with it, and I have had some success using the metric to predict the Premier League, especially when combining it with other metrics. However, 3 years later I think I can make some improvements.

The original Deserved Goals used the number of shots taken by a team and their conversion rate of shots into goals, regressed towards the average. For shots taken, I kept 80% of the variance from the average, and for conversion rate I kept 46% of the variance.

Deserved Goals = ( Average Shots + 80% of the variance ) x ( Average Conversion + 46% of the variance )

I calculated the average number of shots taken by a team in a season to be 453, and the average conversion rate to be 11.09%.

Deserved Goals = (453 + 80% x (Shots – 453)) x (11.09% + 46% x (Conversion Rate – 11.09%))

So a team which took 500 shots in a season and scored 70 goals, which is a 14% conversion rate, would have a Deserved Goals score of 61 goals.

Deserved Goals = (453 + 80% x (500-453)) x (11.09% + 46% x (14%-11.09%))

Deserved Goals = (453 + 80% x 47) x (11.09% + 46% x 2.91%)

Deserved Goals = (453 + 37.6) x (11.09% + 1.34%)

Deserved Goals = 490.6 x 12.43%

Deserved Goals = 61 goals

We would therefore expect 61 goals per season to be a better reflection of this team’s underlying attacking strength than the 70 goals they actually scored.
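The calculation above is easy to script. Here is a minimal Python sketch of Deserved Goals 1.0 for the goals-for side, using the averages and percentages quoted above. The function names `regress` and `deserved_goals_v1` are hypothetical, chosen only for illustration.

```python
# Minimal sketch of Deserved Goals 1.0 (goals-for side only).
# The averages and "kept" percentages are the ones quoted in the text;
# the function names are hypothetical.

AVG_SHOTS = 453          # average shots per team per season
AVG_CONVERSION = 0.1109  # average conversion rate (goals / shots)


def regress(value: float, average: float, keep: float) -> float:
    """Keep only the fraction `keep` of the difference from the average."""
    return average + keep * (value - average)


def deserved_goals_v1(shots: float, goals: float,
                      keep_shots: float = 0.80,
                      keep_conversion: float = 0.46) -> float:
    """Regressed shots multiplied by regressed conversion rate."""
    conversion = goals / shots
    return (regress(shots, AVG_SHOTS, keep_shots)
            * regress(conversion, AVG_CONVERSION, keep_conversion))


if __name__ == "__main__":
    # Worked example from the text: 500 shots, 70 goals (14% conversion).
    print(round(deserved_goals_v1(500, 70), 1))  # prints 61.0
```

Without rounding 1.34% along the way the result is about 60.97, which still rounds to 61 goals.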

You can do the same calculation for goals against, work out a ratio, and use this as a metric.

Before we can start improving it, we need to quantify how good the original metric was. Using data from the 16/17, 17/18 and 18/19 Premier League seasons, we can see how well various metrics do at predicting future performance within a season.

Note: As the data is a bit messy, I have plotted a 5 point centred moving average to make things easier to interpret on all of the following charts. Also, higher is better on all charts.
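For anyone wanting to reproduce the smoothing, here is a minimal sketch of a 5-point centred moving average in Python. It assumes that points near the start and end of the series, which lack a full window, are simply left out (returned as `None`); how the original charts handle those edges isn't stated, and the function name is hypothetical.

```python
from typing import List, Optional


def centred_moving_average(values: List[float],
                           window: int = 5) -> List[Optional[float]]:
    """Average each point with its neighbours on both sides.

    Points too close to either end to have a full window are returned
    as None (an assumption; the original edge handling isn't stated).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed: List[Optional[float]] = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            smoothed.append(None)
        else:
            smoothed.append(sum(values[i - half:i + half + 1]) / window)
    return smoothed


# Example: a 7-point series keeps only its 3 central points.
print(centred_moving_average([1, 2, 3, 4, 5, 6, 7]))
# prints [None, None, 3.0, 4.0, 5.0, None, None]
```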

Here are the results for the average errors (MAE) between predicted and actual future points per game (PPG) for each metric.
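As a sketch, the MAE for a metric is the mean of the absolute differences between the PPG it predicts and the PPG actually achieved. The function name and the sample figures below are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float],
                        actual: Sequence[float]) -> float:
    """Average size of the gap between predicted and actual PPG."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total / len(predicted)


# Hypothetical predicted vs actual future PPG for three teams.
predicted_ppg = [1.5, 2.0, 0.8]
actual_ppg = [1.2, 2.3, 1.0]
print(round(mean_absolute_error(predicted_ppg, actual_ppg), 3))  # prints 0.267
```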

So, Deserved Goals 1.0 was pretty good. It picked up a signal quickly, and outperformed the other metrics (including Expected Goals) for the majority of the season.

Since I wrote my original blog post, a number of things have changed. Firstly, data for Expected Goals (xG) is now freely available from a number of sources. I have used FiveThirtyEight’s data for the above chart. This data was not available a few years ago.

Secondly, another form of xG has been developed, called non-shot xG. Rather than using shots, it gives an xG value to each period of possession, meaning you get more meaningful data points more quickly than with shot-based xG. Theoretically, this should give better predictions earlier in the season.

Indeed, this is what we see when we plot the non-shot xG on the chart.

Non-Shot xG is a much better predictor than any other metric early in the season, although it is still not as good as Deserved Goals 1.0 in the latter two-thirds of the season.

Combining the 2 versions of xG is even more powerful. Simply taking an average of the Shot-based and Non-Shot xG figures improves performance, as seen below. This will be referred to as Combined xG.

OK, so now we have a benchmark to beat. I want Deserved Goals 2.0 to be as powerful in the early season as Combined xG, and I want to keep the strong performance in the second half of the season.

Here’s my thought process. The original formula was as follows:

Deserved Goals 1.0 = ( Average Shots + A% of the variance ) x ( Average Conversion + B% of the variance )

I still want to use shots as the starting point, and so the initial part of the formula remains unchanged. This gives us an estimate of how good a team is at creating shooting opportunities.

( Average Shots + A% of the variance )

I want to improve early-season performance by using Combined xG, so next up is an adjustment to account for how good these shots are predicted to be. For this, let’s use Combined xG divided by the number of shots. Because a well-calibrated xG model should, on average, produce as many expected goals as actual goals, the average of this figure will be the same as the average conversion rate, 11.09%. As with all parts of the formula, we will only keep a percentage of the variance from the average. This gives us an estimate of how good a team is at ensuring their shots are taken from good locations:

( Average Combined xG per shot + B% of the variance )

We then have the old conversion rate, but rather than using shots we are using Shot-based xG, so this becomes the conversion of Expected Goals into goals, which on average should be 100%. This gives us an estimate of how good a team is at converting shots into goals, controlled for the quality of the chance. You might call this finishing skill:

( Average Goals per Shot-based xG + C% of the variance )

The formula is therefore:

Deserved Goals 2.0 = (( Average Shots + A% of the variance ) x ( Average Combined xG per Shot + B% of the variance )) x ( Average Goals per Shot-based xG + C% of the variance )

I need to select values for A, B and C. These should be a good approximation of the extent to which the 3 components are skill rather than luck. In other words, how much of the variance from the average is signal rather than noise. We would expect the ability to create shots to be mostly signal, whereas finishing skill is notoriously “noisy”, so we would expect a low percentage for it.

To get a rough idea of what these should be, I have calculated how much each of the 3 components reverts to the mean between seasons, using Pearson’s r (the CORREL function in Excel).
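Outside Excel, the same correlation can be computed with a few lines of Python. This is a minimal sketch that should match CORREL; the function name is hypothetical. The r for each component is what then serves as the first guess for A, B and C.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient, equivalent to Excel's CORREL."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

To measure season-to-season persistence, each list would hold one value per team (for example, shots taken), with the teams in the same order for both seasons.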

Here are the results:

So, just using these figures would mean A=74%, B=65% and C=13%. That’s a good starting point; however, looking at season-to-season correlations is a bit misleading. Teams often change personnel between seasons, so I would expect the correlations to be higher within a season, when personnel stays mostly the same.

Let’s increase each figure a bit, to A=90%, B=75%, and C=25%, and see how the metric performs.

The final formula is therefore:

Deserved Goals 2.0 = (( Average Shots + 90% of the variance ) x ( Average Combined xG per Shot + 75% of the variance )) x ( Average Goals per Shot-based xG + 25% of the variance )

or:

Deserved Goals 2.0 = ((453 + 90% x (Shots – 453)) x (11.09% + 75% x (Combined xG per Shot – 11.09%))) x (100% + 25% x (Goals per Shot-based xG – 100%))
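Here is a minimal Python sketch of the Deserved Goals 2.0 formula. It assumes all inputs are season totals for a team, treats Combined xG as the simple average of shot-based and non-shot xG (as described earlier), and uses 11.09% as the average Combined xG per shot. The function names and the example team are hypothetical.

```python
# Minimal sketch of Deserved Goals 2.0, assuming season totals as inputs.
# Function and argument names are hypothetical.

AVG_SHOTS = 453           # average shots per team per season
AVG_XG_PER_SHOT = 0.1109  # taken to equal the average conversion rate
AVG_GOALS_PER_XG = 1.0    # goals per shot-based xG should average 100%


def regress(value: float, average: float, keep: float) -> float:
    """Keep only the fraction `keep` of the difference from the average."""
    return average + keep * (value - average)


def deserved_goals_v2(shots: float, goals: float,
                      shot_xg: float, non_shot_xg: float,
                      a: float = 0.90, b: float = 0.75,
                      c: float = 0.25) -> float:
    combined_xg = (shot_xg + non_shot_xg) / 2  # Combined xG: simple average
    xg_per_shot = combined_xg / shots          # chance quality
    goals_per_xg = goals / shot_xg             # finishing
    return (regress(shots, AVG_SHOTS, a)
            * regress(xg_per_shot, AVG_XG_PER_SHOT, b)
            * regress(goals_per_xg, AVG_GOALS_PER_XG, c))


if __name__ == "__main__":
    # Hypothetical team: 500 shots, 70 goals, 58 shot-based xG, 62 non-shot xG.
    print(f"{deserved_goals_v2(500, 70, 58, 62):.1f}")  # prints 61.3
```

Because A, B and C are parameters (`a`, `b`, `c`), it is easy to compare the chosen values with the raw season-to-season figures (74%, 65% and 13%).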

OK, so let’s see how this metric gets on.

Previously the best 2 metrics were Combined xG and Deserved Goals 1.0. Here’s how Deserved Goals 2.0 compares to those:

I’m classing that as a success. Deserved Goals 2.0 is much better than the original version in the early part of the season, and is on a par with Combined xG. In the latter stages it outperforms Combined xG, and is almost as good as the original version. Overall, it is the best of all the metrics I have tested so far.

I could probably tweak the values of A, B and C to improve the results, but I think there would be a risk of over-fitting to the data.

Another way of measuring the performance of predictive metrics is to use r² instead of average errors. This produces similar results:
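As a sketch, and assuming r² here means the square of Pearson’s r between predicted and actual PPG (what Excel’s RSQ function returns), it can be computed like this. The function name is hypothetical. Unlike MAE, this r² ignores any constant bias in the predictions, so the two measures can rank metrics differently.

```python
from typing import Sequence


def r_squared(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Square of Pearson's r between predictions and outcomes (like Excel's RSQ)."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    # cov^2 / (var_p * var_a) is r squared, without taking a square root.
    return cov * cov / (var_p * var_a)
```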

If you enjoyed this post, please see part 2 here, where I develop this further.

If you have any questions, comments or suggestions, please let me know. I am on Twitter @8Yards8Feet.

Data from: http://www.football-data.co.uk/ and https://projects.fivethirtyeight.com/soccer-predictions/