It has been almost two years now since Jürgen Klopp introduced himself as ‘the Normal One’ at Liverpool, and looking back at his first full Premier League season, one thing seems clear: he lied.

When describing Liverpool’s 2016/2017 season, “normal” isn’t the first word that comes to mind.

Undefeated against all of the other top clubs, they somehow managed to lose six games against the rest. Liverpool was the only team to perform better against elite clubs than against teams in the lower part of the table.

It’s an unusual pattern that is probably best captured by Constantinos Chappas:

This graph shows the performance of every team (measured in points per game) against the top seven and the bottom 13 teams of the Premier League. Liverpool is the only team that crosses the ‘Equal PPG’ line.

So… what is going on at Liverpool? The internet is full of takes on what might have caused this bipolar performance. A quick Google search suggests that:

Gegenpressing isn’t useful against teams that play with a low defensive block

Liverpool doesn’t have the right mentality (of course)

Klopp doesn’t have a plan B

There is not enough depth on the bench

…and many other reasons that explain (some better than others) why exactly Liverpool underperformed against lower-ranked teams. I even found a blog post complaining that Liverpool’s attacking players are too short compared to the rather huge defenders of smaller Premier League clubs, which I thought was kind of funny, especially considering their new signing Mohamed Salah, who must be about 60cm tall once he has his football boots on (rough estimation on my part).

So, adding to this vast pool of opinions on Liverpool’s bipolar performance, here is my take:

There is nothing to explain.

(At least not yet.)

Incidence rates of kidney cancer in the U.S.

In his book Thinking, Fast and Slow, Nobel Prize winner Daniel Kahneman cites a study on the incidence of kidney cancer:

Out of the 3,141 counties of the United States the counties with the lowest incidence of kidney cancer were mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South, and the West.

If you want, you can stop here for a moment and think about what might have caused these low rates (or skip to the next part if you already know this example).

Kahneman goes on to name a few explanations people usually come up with: people living in rural areas have access to fresh food without additives, and there is no air pollution and no water pollution, which makes for healthier living.

These are all explanations that sound plausible. And maybe you came up with some others?

“Now consider the counties in which the incidence of kidney cancer is highest. These ailing counties tend to be mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South, and the West.”

Hm. Okay.

The descriptions of the counties with the highest and lowest rates are identical. This is not a trick question, by the way: the explanation for both the high and the low rates is given in the description; it’s just not that obvious.

Again, the lifestyle that comes with living in a rural area offers some explanations as to why the incidence rates are so high: no access to good medical care, a high-fat diet, too much alcohol or too much tobacco.

But the rural lifestyle can’t explain both the highest and the lowest incidence rates. Something must be wrong with the logic behind this reasoning.

What does cause the high and low incidence rates (or rather, what allows them to deviate so much from the mean) is the fact that these counties are sparsely populated.

To understand what is going on, imagine a large urn filled with marbles: half of them are red, half are white. Two people take turns drawing marbles out of this urn.

Person A draws four marbles every turn and Person B draws seven. Both record every time their marbles are either all white or all red. After enough samples have been drawn, they will observe that A’s samples yield these extreme results 12.5% of the time, while B’s do so only 1.56% of the time.
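The two percentages follow from treating every draw as independent (a large urn, or drawing with replacement; this is an assumption of the sketch): the chance that n draws are all one colour is 0.5^n + 0.5^n. A small Python sketch that computes this exactly and checks it with a simulation (the function names are illustrative):

```python
import random


def p_extreme_exact(n: int, p_red: float = 0.5) -> float:
    """Probability that n independent draws are all red or all white."""
    return p_red ** n + (1 - p_red) ** n


def p_extreme_simulated(n: int, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate: each trial draws n marbles, each red with probability 0.5."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        reds = sum(rng.random() < 0.5 for _ in range(n))
        if reds in (0, n):  # all white or all red
            extreme += 1
    return extreme / trials


for n in (4, 7):  # Person A draws 4 marbles, Person B draws 7
    print(f"{n} marbles: exact {p_extreme_exact(n):.4%}, simulated {p_extreme_simulated(n):.4%}")
```

The exact values are 12.5% for Person A and 1.5625% for Person B; the simulated shares land close to them.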

We all “know” that large samples are generally more precise than small samples. This is because, as with the marbles, small samples yield extreme results more often than large ones. But we somehow seem to overlook this when the facts drawn from small samples fit into a coherent story.

In the same way, counties with small populations are more likely to yield an extremely high or extremely low kidney cancer incidence rate compared to counties with large populations.

This interactive Tableau graphic by Ben Jones illustrates the relationship between county population size and incidence rates very well:

The fact that the counties are sparsely populated neither causes nor prevents cancer. A small sample size only allows the incidence rates to be extremely high (or low). Any fact that is correlated with a small population (like the rural lifestyle) may make for a good and coherent story, but it only distracts from the deeper truth that there is nothing to explain.
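The same effect can be reproduced with a small simulation in which, by construction, there is nothing to explain: every county gets the same true incidence rate. The rate (15 per 100,000), the log-uniform spread of populations between 1,000 and 1,000,000, and the Poisson approximation for case counts are all illustrative assumptions, not data from the study. The counties with the most extreme observed rates should still turn out to be the small ones.

```python
import math
import random
import statistics

# Illustrative assumptions (not from the study): every county shares the same
# true incidence rate; populations are log-uniform between 1,000 and 1,000,000.
TRUE_RATE = 15 / 100_000
N_COUNTIES = 3141


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counties(seed: int = 42) -> list[tuple[float, int]]:
    """Return (observed rate per 100,000, population) for each simulated county."""
    rng = random.Random(seed)
    counties = []
    for _ in range(N_COUNTIES):
        population = int(10 ** rng.uniform(3, 6))
        cases = poisson(population * TRUE_RATE, rng)  # Poisson approximation to binomial
        counties.append((cases / population * 100_000, population))
    return counties


counties = sorted(simulate_counties())  # ascending by observed rate
lowest, highest = counties[:50], counties[-50:]
overall_median = statistics.median(p for _, p in counties)
print("median population, all counties:    ", overall_median)
print("median population, 50 lowest rates: ", statistics.median(p for _, p in lowest))
print("median population, 50 highest rates:", statistics.median(p for _, p in highest))
```

With these settings, the median population of both the 50 lowest-rate and the 50 highest-rate counties should come out far below the median of all counties, even though the underlying rate is identical everywhere.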

Again, Kahneman:

“[…] the main lesson to be learned is not about epidemiology, it is about the difficult relationship between our mind and statistics. […] When told about the high-incidence counties, you immediately assumed that these counties are different from other counties for a reason, that there must be a cause that explains this difference.”

Kahneman argues that we pay more attention to the content of a fact than to information about its reliability. As a result, we are often left with a view of the world that is simpler (and more coherent) than it actually is.

Okay, now back to Liverpool

This study is a good example of our insensitivity to sample size. It confronted us with a fact (the counties with the lowest kidney cancer incidence rates), and we constructed a story to explain it. Then it confronted us with another fact (the highest rates), which turned our story upside down and exposed it for what it was: a false causal explanation for something that was due to chance alone.

But “the real world”, for the most part, isn’t as tidy as a study taken from a book intended for statistics teachers.

At the beginning of this post I listed some causal explanations for the fact that Liverpool scored more points per game against elite clubs than against lower-ranked teams.

Those explanations all sound plausible and some might point to real weaknesses in Liverpool’s squad or tactics.

But there is also the possibility that these explanations are just part of a narrative, making sense out of a fact that was due to chance alone.

But how can we distinguish between the two? That’s a tough question, and there is probably no clear right or wrong answer.

We can, however, try to estimate the probability of any team ‘doing a Liverpool’ by chance alone. To get these probabilities, I simulated the 2016/2017 league table thousands of times using Pinnacle’s closing line odds. Closing line odds are the last odds a bookmaker offers before a game starts. In theory, they should be the closest estimates of the true underlying probabilities for any game, since they incorporate all information available up to kickoff.
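A minimal Python sketch of the first step of such a simulation: turning one game’s closing odds into outcome probabilities and drawing a result. It assumes decimal odds and the simplest way of removing the bookmaker’s margin (proportional normalisation); the post does not say which method was used, and the function names and example odds are hypothetical.

```python
from __future__ import annotations

import random

# Points awarded to (home, away) for each outcome: home win, draw, away win.
POINTS = {"H": (3, 0), "D": (1, 1), "A": (0, 3)}


def implied_probabilities(home: float, draw: float, away: float) -> tuple[float, ...]:
    """Convert decimal odds to probabilities by proportionally removing the margin."""
    raw = (1 / home, 1 / draw, 1 / away)
    overround = sum(raw)  # above 1 because of the bookmaker's margin
    return tuple(p / overround for p in raw)


def simulate_match(odds: tuple[float, float, float], rng: random.Random) -> tuple[int, int]:
    """Draw one result from the odds-implied probabilities and return (home, away) points."""
    outcome = rng.choices("HDA", weights=implied_probabilities(*odds))[0]
    return POINTS[outcome]


rng = random.Random(2017)
hypothetical_odds = (1.40, 5.00, 9.50)  # made-up decimal odds, not real Pinnacle prices
print(implied_probabilities(*hypothetical_odds))
print(simulate_match(hypothetical_odds, rng))
```

Repeating `simulate_match` for every fixture of a season gives one simulated league table. Other margin-removal methods exist and would give slightly different probabilities.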

We now know about our bias toward causal explanations over statistical ones, and attaching a probability estimate to ‘a Liverpool happening’ will at least give us some insight into how far we can trust causal explanations in this case.

But before we get to the results of this simulation, we should answer two questions:

1. Was Liverpool’s bipolar performance already incorporated in the closing line odds?

2. How do we measure a bipolar performance?

Question 1.)

The first thing to do is to check whether the closing line odds already indicated that Liverpool might perform worse than expected against smaller teams.

Gracenote distributed a graphic that, I thought, painted a nice and clear picture of what to expect from Liverpool in matches against big clubs and in all other matches. It compared Liverpool in big games to a team like Chelsea, with a Euro Club Index of about 3500, and in all other games to a team like West Bromwich, with a Euro Club Index of about 2450.

So, did the closing line odds reflect this in any way?

You may remember Liverpool losing 2:3 at home to Swansea in January 2017. Liverpool was the only team in the top six that lost at home to Swansea that season.

But can you guess which set of closing line odds belonged to that game?

These are the odds for every Swansea game against West Bromwich and the top six Premier League clubs (Arsenal, Spurs, Chelsea, both Manchester clubs and Liverpool).

It’s probably easy to see which odds belong to the West Bromwich games but we can’t really tell the other games apart from each other.

So, before the games started, the betting markets didn’t see Liverpool as a team comparable to West Bromwich at all:

Liverpool wasn’t even expected to be the worst-well-performing team (is that a thing?) out of the top six teams. You can download the closing line odds for all games of 16/17 and other seasons here.

None of the reasons for Liverpool’s bipolar performance mentioned in the introduction seemed to have much of an impact on the betting markets. This may be a first indication that these reasons don’t have much explanatory power, but are part of a narrative that makes sense of a weird random event in hindsight.

It could also mean that the betting markets were wrong.

At the very least, though, it means that we can use the closing line odds to estimate how probable it is for a team to “do a Liverpool” by chance alone.

Question 2.)

So, what constitutes a team “doing a Liverpool”? At the beginning of this post I used a graph that showed Liverpool’s points per game against the top seven and against the rest of the league. But why split the league after seventh place? Why not after eighth or ninth?

For this blogpost, I will divide the Premier League into the top six clubs (Arsenal, Chelsea, Liverpool, Man City, Man United and Tottenham) and the rest of the league.

The reason is simple: if you ask a group of people to name the top six clubs at the beginning of a season, you will get the usual answers. But if you ask for the seven, eight or nine best clubs, the answers will differ quite a lot.

For example, in my prediction at the beginning of last season I had Everton in 9th place (they finished 7th).

This is what the graph from the introduction looks like if we divide the Premier League into these two categories:

Liverpool is still clearly set apart from the other teams, as it earned equal points per game against both groups.

And this is how we will measure a bipolar performance: it occurs whenever a team scores at least as many points per game against the top six clubs as it does against the rest of the league.
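This definition is easy to turn into a check. The sketch below assumes a hypothetical data layout (one `Result` record per match, seen from the team’s perspective); the team names outside the top six and the four results are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

TOP_SIX = {"Arsenal", "Chelsea", "Liverpool", "Man City", "Man United", "Tottenham"}


@dataclass
class Result:
    team: str
    opponent: str
    points: int  # 3 for a win, 1 for a draw, 0 for a loss


def ppg_split(results: list[Result], team: str, top_six: set[str] = TOP_SIX) -> tuple[float, float]:
    """Points per game for `team` against the top six and against everyone else."""
    top = [r.points for r in results if r.team == team and r.opponent in top_six]
    rest = [r.points for r in results if r.team == team and r.opponent not in top_six]
    return sum(top) / len(top), sum(rest) / len(rest)


def is_bipolar(results: list[Result], team: str, top_six: set[str] = TOP_SIX) -> bool:
    """Bipolar: at least as many points per game against the top six as against the rest."""
    ppg_top, ppg_rest = ppg_split(results, team, top_six)
    return ppg_top >= ppg_rest


# Hypothetical four-game excerpt for a made-up "Team X", not real data.
results = [
    Result("Team X", "Arsenal", 3),
    Result("Team X", "Chelsea", 1),
    Result("Team X", "Club A", 0),
    Result("Team X", "Club B", 3),
]
print(ppg_split(results, "Team X"), is_bipolar(results, "Team X"))
```

Here Team X averages 2.0 points per game against the top six and 1.5 against the rest, so the check flags it as bipolar. Because the comparison is per game, the different number of fixtures against each group (ten or twelve against the top six in a real season) does not distort it.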

The Conclusion

I simulated the 16/17 season 10,000 times based on Pinnacle’s closing line odds. In 4,050 of those simulated seasons, at least one Premier League team (not necessarily Liverpool) had what we can call a “bipolar season”.
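A compressed sketch of what such a season-level simulation could look like, run on a hypothetical six-team toy league rather than the real 2016/17 fixtures. The strength ratings, the 26% draw rate and the home boost in `match_probs` are made-up stand-ins for odds-implied probabilities, and `T1`/`T2` play the role of the pre-season elite. The sketch also computes the binomial standard error of a simulated share, which indicates how much Monte Carlo noise sits in a figure like 4,050 out of 10,000.

```python
from __future__ import annotations

import itertools
import math
import random

# Hypothetical toy league: made-up strength ratings, NOT real 2016/17 data.
STRENGTH = {"T1": 1.6, "T2": 1.5, "T3": 1.0, "T4": 0.9, "T5": 0.8, "T6": 0.7}
ELITE = {"T1", "T2"}  # plays the role of the pre-season "top six"
POINTS = {"H": (3, 0), "D": (1, 1), "A": (0, 3)}  # (home, away) points per outcome


def match_probs(home: str, away: str, p_draw: float = 0.26) -> tuple[float, float, float]:
    """Stand-in for odds-implied probabilities (home win, draw, away win)."""
    h, a = STRENGTH[home] * 1.2, STRENGTH[away]  # 1.2 = assumed home advantage
    p_home = (1 - p_draw) * h / (h + a)
    return p_home, p_draw, 1 - p_draw - p_home


def simulate_season(rng: random.Random) -> dict[str, dict[str, list[int]]]:
    """Double round robin; log each team's points vs the elite group and vs the rest."""
    split = {team: {"elite": [], "rest": []} for team in STRENGTH}
    for home, away in itertools.permutations(STRENGTH, 2):
        outcome = rng.choices("HDA", weights=match_probs(home, away))[0]
        pts_home, pts_away = POINTS[outcome]
        split[home]["elite" if away in ELITE else "rest"].append(pts_home)
        split[away]["elite" if home in ELITE else "rest"].append(pts_away)
    return split


def any_bipolar(split: dict[str, dict[str, list[int]]]) -> bool:
    """True if at least one team's PPG vs the elite is >= its PPG vs the rest."""
    return any(
        sum(s["elite"]) / len(s["elite"]) >= sum(s["rest"]) / len(s["rest"])
        for s in split.values()
    )


def estimate(n_seasons: int = 10_000, seed: int = 2017) -> tuple[int, float]:
    """Simulate many seasons; return (hits, share) of seasons with a bipolar team."""
    rng = random.Random(seed)
    hits = sum(any_bipolar(simulate_season(rng)) for _ in range(n_seasons))
    return hits, hits / n_seasons


def monte_carlo_se(hits: int, n: int) -> float:
    """Binomial standard error of a simulated proportion."""
    p = hits / n
    return math.sqrt(p * (1 - p) / n)


n = 10_000
hits, share = estimate(n)
print(f"toy league: {hits}/{n} seasons with a bipolar team ({share:.3f} ± {monte_carlo_se(hits, n):.3f})")
```

Plugging in the numbers from the text, `monte_carlo_se(4050, 10_000)` comes out at roughly 0.0049, so with 10,000 seasons the simulation noise around the ~40% estimate is only about one percentage point either way (95% confidence, normal approximation).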

I think it is fair to say that this season there was roughly a 40% chance[1] of observing a bipolar pattern by chance alone. It just happened to be Liverpool.

So, nothing to worry about, Jürgen.

(At least not yet.)

1st EDIT:

[1] Several people pointed out that the 40% in this article seems very high. The 4,050 seasons also include simulations in which not all of the “top six” clubs finished in places 1-6 of the league table. Based on this season’s closing line odds, there was a ~43% chance that the “top six teams” would actually finish as the top six teams in the league table.

Adding this condition to our definition of a “bipolar season” (the top six teams also have to finish as the top six teams in the league table), the overall probability of observing at least one team with a bipolar performance is ~17%.
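The stricter definition adds a second filter to each simulated season. A sketch, assuming each simulated season is stored as its final table (team names in finishing order) plus the set of teams that had a bipolar season; this layout and the four toy seasons below are hypothetical, not simulation output.

```python
from __future__ import annotations

TOP_SIX = {"Arsenal", "Chelsea", "Liverpool", "Man City", "Man United", "Tottenham"}


def top_six_held(final_table: list[str]) -> bool:
    """True if the six pre-season favourites occupy places 1-6 in the final table."""
    return set(final_table[:6]) == TOP_SIX


def bipolar_rates(seasons: list[tuple[list[str], set[str]]]) -> tuple[float, float]:
    """Share of seasons with at least one bipolar team: (loose, strict).

    The strict version also requires the top six to finish as the top six.
    """
    n = len(seasons)
    loose = sum(1 for _, bipolar in seasons if bipolar)
    strict = sum(1 for table, bipolar in seasons if bipolar and top_six_held(table))
    return loose / n, strict / n


# Hypothetical simulated tables, truncated to seven places for brevity.
table_held = ["Chelsea", "Tottenham", "Man City", "Liverpool", "Arsenal", "Man United", "Everton"]
table_broken = ["Chelsea", "Tottenham", "Everton", "Liverpool", "Arsenal", "Man United", "Man City"]
seasons = [
    (table_held, {"Liverpool"}),  # bipolar team, top six held
    (table_broken, {"Arsenal"}),  # bipolar team, but Everton broke into the top six
    (table_held, set()),          # no bipolar team
    (table_broken, set()),        # no bipolar team
]
print(bipolar_rates(seasons))
```

For these four toy seasons the function returns (0.5, 0.25): two seasons have a bipolar team, but only one of them also has the usual top six finishing in places 1-6.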

2nd EDIT:

Now that the 17/18 season has ended, we can look at the points per game every team got against the top six and the bottom 14 and compare the results to Liverpool’s bipolar 16/17 season:

Liverpool got 1 point per game against the top six and ~2.3 points per game against the rest. In the bottom left corner you can see West Bromwich with ~0.83 points per game against the top six and ~0.81 against the rest.

We are only looking at their performance in the Premier League, though, and it should be noted that Liverpool reached the Champions League final (beating Porto, Man City and AS Roma along the way).