Nowadays it’s easy to forget that data science is not all about machine learning and deep learning.

While AI is awesome, data science is mostly a practice that exists to better understand real-world phenomena.

Besides being a data scientist, I am also a sports fan.

One thing that drives me crazy is the misuse of data and statistics in sports.

Very often you see conclusions drawn from irrelevant facts, and players or teams compared on very weak statistics.

For a while now, I have wanted to create a measure for comparing goals in a soccer match.

Counting who has the most goals is just plain wrong.

A goal scored in the 90th minute with the score at 1–1 is worth far more than a goal scored in the same minute when leading 4–0.

I have put a lot of time and effort into coming up with a way to measure the significance of a goal, and finally established what I call Relative Goal Value v1.0 (referred to from now on as RGV1).

The elements RGV1 takes into consideration are:

1. Time the goal was scored

2. Team the goal was scored against

3. Home / away goal

4. Current score of the game

I have chosen not to treat penalties differently from other goals.

In this post, I’ll explain the RGV1 scoring system and use it to compare Lionel Messi to Cristiano Ronaldo, and then to the top 50 scorers (by RGV1) in the 5 major leagues.

RGV1 Scoring System (TL;DR)

Before we use RGV1 to compare players’ goal scoring, let’s understand what it is about.

This is the TL;DR version. Assuming most people reading this will not want to go into the equations, this part explains the essence of the scoring system; the full equations are at the end of the post.

**Disclaimer: While RGV1 is proportional to the points won for the team, it is not directly tied to them. RGV1 DOES NOT measure how many points a player won for the team; rather, it calculates a sophisticated value for each goal.

The scoring is built in the following manner:

The most important and most complex element is the game state value.

The range of the game state value depends on the current score and the time left to play.

When the game is tied, the value of a goal rises exponentially from 1 to 3, according to the minute of the game.

When leading, the value of a goal drops exponentially as time advances, and the range depends on how large the team’s lead is.

When trailing, the value behaves as it does when leading, but on a smaller scale.

The logic behind the game state value is that:

— Goal scored on tie > goal scored when behind > goal scored when leading

— On a tie, the later the goal, the higher the value (a goal scored on a tie in the 20th minute is worth less than a goal scored on a tie in the 90th minute)

— When leading, increasing the lead earlier is better

— When trailing, decreasing the opponent’s lead earlier is better.

Before deciding on these 4 points and how they rank relative to one another, I consulted many friends, some of them experts in the field, in order to be as accurate as possible.
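To make the tied-game rule concrete, here is a minimal sketch of one curve that rises exponentially from 1 to 3 over the match. The exact formula is an assumption for illustration (the real equations are at the end of the post), and so are the function name `tied_state_value` and the choice to cap stoppage-time minutes at 90.

```python
def tied_state_value(minute: float, full_time: int = 90) -> float:
    """Assumed curve for the value of a goal scored while the game is tied.

    Grows exponentially from 1 at kickoff to 3 at full time.
    Stoppage-time minutes are capped at full_time (an assumption).
    """
    capped = min(max(minute, 0), full_time)
    return 3 ** (capped / full_time)


# Print the assumed value at a few points in the match.
for minute in (0, 20, 45, 90, 92):
    print(minute, round(tied_state_value(minute), 3))
```

Under this assumed shape, a tying-game goal in the 20th minute is worth less than one in the 90th, which matches the second point in the list above.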

Below is a plot of the game state value:

Then, the game state value is multiplied by the team quality multiplier, which ranges from roughly 0.68 to 1, depending on the opponent’s position in the standings at the end of the season (a measure of team quality).

Finally, this is multiplied by 1 for an away goal or 0.9 for a home goal.

A perfect score of 3 is achieved by scoring a winning goal in the 90th minute of an away game against the team that finished the season in the top spot.

The lowest possible score is achieved by scoring a goal when leading by 3 or more in the 90th minute against the team that finished the season last.
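As a rough sketch of how the three factors combine, the snippet below multiplies a game state value by a team quality multiplier and a venue factor. The linear mapping from league position to the 0.68 to 1 range is an assumption (the post only gives the range), and the names `team_quality_multiplier` and `rgv1` are hypothetical.

```python
def team_quality_multiplier(opponent_rank: int, n_teams: int = 20) -> float:
    """Assumed linear scale: 1.0 for the champion down to 0.68 for last place."""
    lowest = 0.68  # approximate lower bound quoted in the post
    share = (opponent_rank - 1) / (n_teams - 1)
    return 1.0 - share * (1.0 - lowest)


def rgv1(game_state_value: float, opponent_rank: int, is_away: bool,
         n_teams: int = 20) -> float:
    """Combine the three factors: game state x team quality x venue."""
    venue = 1.0 if is_away else 0.9  # away goals keep full value, home goals get 0.9
    return game_state_value * team_quality_multiplier(opponent_rank, n_teams) * venue


# Winning goal in the 90th minute (game state value 3), away, against the champion.
best = rgv1(3.0, opponent_rank=1, is_away=True)
print(best)
```

The game state value is passed in as a plain number here, so the sketch does not depend on the exact curves used for leading or trailing.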

Before we go on to the comparison, some examples of scores:

1. In La Liga, 2016–2017 season, the goal with the highest score is Lionel Messi’s goal at the Bernabeu, when the game was tied 2–2 in the 92nd minute (a perfect score of 3)

2. In La Liga, 2016–2017 season, the goal with the lowest score is Tiago’s goal for Atletico Madrid against Granada at home when leading 6–1, in the 87th minute (a score of 0.231)

Examining 2009–2016 in La Liga, below is the distribution of RGV1 scores for all players:

Messi vs Ronaldo

Now let’s get to the interesting part.

A lot has been said about these two, and while in other areas of the game it is quite clear who is better in each, their goal scoring is constantly compared.

The comparison uses only La Liga goals, starting from 2009 (when Ronaldo arrived at Real Madrid).

First, let’s see what their overall RGV1 distributions look like:

Well, not so surprising… In numbers, this plot is (Messi / Ronaldo):

Mean: 0.950 / 0.943 (higher is better)

Standard deviation: 0.547 / 0.485

25th percentile: 0.461 / 0.578

50th percentile: 0.854 / 0.861

75th percentile: 1.232 / 1.246

Minimum: 0.226 / 0.233

Maximum: 3.000 / 2.855

Looking at Ronaldo’s and Messi’s most important goals (maximum RGV1), interestingly, both happened in April, one year apart.

Messi: the winning goal in the 92nd minute at the Bernabeu, with the game tied 2–2 against Real Madrid, who won the league title that season.

Ronaldo: the winning goal in the 85th minute at the Camp Nou, with the game tied 1–1 against Barcelona, who won the league title that season.

Moving forward, let’s see what their overall contribution was, meaning the sum of all RGV1 from 2009 to 2016:

Messi has scored a total of 271.629 RGV1 in 266 appearances, and Ronaldo a total of 260.228 in 254 appearances. That makes Messi’s average RGV1 per appearance 1.021 and Ronaldo’s 1.024.

Let’s now look at the RGV1 per season, starting with the total RGV1 per season.

It is interesting to see in the graph that the yearly lead splits evenly between them, with each one taking the top spot for 4 seasons.

It is tempting to look next at the average RGV1 per goal in each season.

But the truth is that this is a bad metric: if the two had scored exactly the same goals, and one of them then scored an extra goal with a low value, he would have a worse average even though he performed better.

Instead, we’ll look at a ‘fixed average’: each player’s total RGV1 divided by the average of the two players’ goal counts in the same season.
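A small sketch shows why the plain average misleads and how the fixed average behaves. The per-goal values below are made up for illustration, not real data, and the helper name `fixed_average` is hypothetical.

```python
from statistics import mean

# Hypothetical per-goal RGV1 values for one season (not real data).
player_a = [1.0, 2.0]
player_b = [1.0, 2.0, 0.3]  # same goals plus one extra low-value goal


def fixed_average(own_goals, other_goals):
    """Total RGV1 divided by the two players' average goal count."""
    shared_goal_count = (len(own_goals) + len(other_goals)) / 2
    return sum(own_goals) / shared_goal_count


plain_a, plain_b = mean(player_a), mean(player_b)
fixed_a = fixed_average(player_a, player_b)
fixed_b = fixed_average(player_b, player_a)

print("plain average:", plain_a, plain_b)
print("fixed average:", fixed_a, fixed_b)
```

Because both players are divided by the same goal count, the extra goal can only raise player B’s fixed average, whereas it lowers his plain average.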

Here, too, the lead changes hands evenly. Ronaldo shows better stability throughout the years, while Messi’s peak performance exceeds Ronaldo’s.

Since the most critical aspect of RGV1 scoring is the game state value, let’s see how each player’s goals are distributed across the different game states and across the minutes of the game.

First, by the scoreboard status:

It is simply amazing to see that, across 8 seasons, Messi and Cristiano have an equal number of goals scored when 1 behind and when the game is tied.

Notice that they both score more when the game is tied than in any other score situation, which says a lot about their contribution to their teams at the most important point of the game.

Now let’s look at how their goals are distributed across minutes:

Here we can see that Ronaldo’s distribution is quite uniform, while Messi prefers the second half.

I have to say that when I started this project, I knew they were both phenomenal goal scorers, but I had hoped to see one of them stand out.

As the data tells us, there is not much difference between the two, and the mystery of who is the better goal scorer is left unsolved…

But how do they stack up against the rest of the goal scorers?

Messi and Ronaldo Against the World

Without further ado, let’s look at the totals of the top 50 RGV1-ranked scorers for the period from 2009–2010 to 2016–2017: