Isn’t it strange that the Chinese aren’t world champions in every single team sport? Here’s why it’s strange: China has 19% of the world’s population. For individual sports that may not be a huge deal: if tennis ability and opportunity are distributed equally around the world, there would be only a 19% chance that the best tennis player hails from China and an 81% chance that he is Swiss, Serbian, Spanish, Scottish or from any other country. It is somewhat surprising to see the top 5 superior servers and strikers of soft springy spheres with swings of stringed racquets all come from sovereign states that start with “S”, but that’s a separate story.

In team sports that should be different. If soccer talent were spread equally, China should have on average 19 of the top 100 players in each generation, and almost never fewer than 11. Countries like Spain, Germany and France, on the other hand, could expect to have 1 player in the top 100, maybe 2 or 3 if they’re lucky. That would be no match for the loaded Chinese squad. Even a top 3 player can’t dominate all by himself in a team-based sport like soccer, as evidenced by the picture of sad Ronaldo below.
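A quick way to sanity-check “almost never fewer than 11” is to treat each of the top 100 spots as an independent draw that lands on a Chinese player with probability 0.19. That independence is an idealized assumption, not real data; under it, the binomial tail comes out at roughly 1%:

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Idealized assumption: each of the top 100 slots independently goes to a
# Chinese player with probability 0.19 (China's share of world population).
N_TOP = 100
P_CHINA = 0.19

expected_chinese = N_TOP * P_CHINA
p_fewer_than_11 = prob_at_most(10, N_TOP, P_CHINA)

print(f"expected Chinese players in the top 100: {expected_chinese:.1f}")
print(f"chance of fewer than 11: {p_fewer_than_11:.2%}")
```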

And yet, the Chinese team is not good at soccer, and that’s putting it more mildly than some would. The Chinese men’s national soccer team is ranked 84th in the world, a few spots below Antigua and Barbuda, a nation with a population of 90,000. That’s roughly equal to a single neighborhood in Shanghai. Motivation is often brought up as an explanation: perhaps the Chinese have the talent and opportunity to play soccer, but all 1.3 billion of them choose not to. Perhaps instead of playing soccer they choose to study. Those who play soccer the least and study the most can go into medicine, and those who study hardest of all and have no room for soccer make it into top medical schools in the US.

Surely we don’t expect those Chinese to play soccer at all, and yet below is a group photo of the Emory University medical school soccer club. The summer I was there we played at least 4 hours a week. You can easily find me in the photo: I’m one of three non-Chinese people on the team.

The success of a national soccer team should depend on two factors: the pool of available players (population) and some combination of natural talent, infrastructure and opportunity that determines roughly how successful an average person in that country can be at soccer. I’ll call that second factor national soccer affinity, and will immediately note that it’s a huge simplification to throw so many disparate things into a single factor. My goal is to separate out the effect of population, so affinity is basically everything that’s independent of a country’s total size. I am making no guesses regarding the components of soccer affinity (maybe it’s all about having enough sunny days for kids to play outdoors); I’m only interested in comparing it between countries. The question I want to investigate is:

Relative to their population, which countries are the best and worst at soccer? And why?

If we imagine that soccer ability within each country is normally distributed, a country’s population is the size of the bell curve and its national affinity is how far to the right on the ability axis the center of the bell curve sits. The level of a country’s national team is how far along the ability axis its best 11 men (or women) are. Clearly, having a larger bell curve (more people at every level of play) and shifting the curve to the right (better players on average) should both contribute to boosting the level of the national team. The fact that there are roughly 15,000 Chinese for each Antiguan, and yet the two soccer teams are comparable in level, presents the following puzzle:
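The model in the paragraph above can be sketched in a few lines. The helper `top_k_level` is hypothetical, and it uses a rough approximation: the k-th best person in a country sits where a fraction k / population of the bell curve lies above them. Both countries get the same affinity (0), ability is in SD units, and the populations are the whole-country figures from the text:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def top_k_level(population: float, k: int = 11, affinity: float = 0.0) -> float:
    """Rough ability (in SD units) of the k-th best person in a country.

    Approximation: the k-th best person sits where a fraction k / population
    of the bell curve lies above them. `affinity` shifts the whole curve.
    """
    # The lower-tail quantile, negated, equals the upper-tail quantile and
    # avoids the precision loss of computing 1 - (tiny number).
    return affinity - STD_NORMAL.inv_cdf(k / population)


# Hypothetical inputs: whole populations, equal affinity for both countries.
china = top_k_level(1.3e9)
antigua = top_k_level(90_000)
gap = china - antigua

print(f"China's 11th best:   {china:.2f} SD above the mean")
print(f"Antigua's 11th best: {antigua:.2f} SD above the mean")
print(f"population advantage at the top: {gap:.2f} SD")
```

Under these toy assumptions, a roughly 15,000-fold population edge is worth only about 2 SD at the very top, so an affinity deficit of about that size would cancel it out entirely.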

Why does it seem that national team level depends on affinity much more than on population?

The answer to that puzzle is: Because the tails of a normal distribution fall much faster than you think.

In plain(er) English: every point on a bell curve is some distance away from the middle (the mean). The further from the mean you go, the fewer points there are (the curve is lower). These distances are often measured in standard deviations, or SD, shown by the vertical red lines in the picture. On a standard bell curve, just over 68% of the points lie less than 1 SD from the mean in either direction.
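The “just over 68%” figure, along with the shares within 2 and 3 SD, can be read straight off the cumulative distribution of a standard normal curve. A minimal check using Python’s built-in `statistics.NormalDist`:

```python
from statistics import NormalDist

std = NormalDist()  # standard bell curve: mean 0, SD 1

# Share of points lying within k SD of the mean, in either direction.
within = {k: std.cdf(k) - std.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} SD: {share:.2%}")
```

This prints 68.27%, 95.45% and 99.73%.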

Looking naively at the familiar bell picture, it seems that the curve drops sharply over the first 2 or 3 SD to either side and then levels off around 0 when you move further away. That’s extremely misleading: the relative height of the curve actually drops faster the further out you go. It’s invisible on the chart because beyond 3 SD the line is squished very close to 0. The curve at 1 SD is 4.5 times higher than at 2 SD. The curve at 5 SD is almost 250 times higher than at 6 SD, and it keeps getting steeper and steeper.
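The accelerating drop has a simple closed form: for a standard normal curve, the height at k SD divided by the height at k + 1 SD is exp(((k + 1)² − k²) / 2) = exp(k + 1/2), so each extra SD costs more than the last. A short check of the ratios quoted above:

```python
from math import exp
from statistics import NormalDist

std = NormalDist()

# How many times higher the curve is at k SD than at k + 1 SD.
ratios = {k: std.pdf(k) / std.pdf(k + 1) for k in range(1, 6)}
for k, r in ratios.items():
    # Closed form: pdf(k) / pdf(k + 1) = exp(((k + 1)**2 - k**2) / 2) = exp(k + 0.5)
    print(f"{k} SD vs {k + 1} SD: {r:,.1f}x  (exp({k} + 0.5) = {exp(k + 0.5):,.1f})")
```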

The best male soccer player in China (Zheng Zhi?) is almost literally one in a billion, which means that he’s almost 6 standard deviations better than the average Chinese. If the population of China doubled (they’re working on it!), there would be 2 players as good as Zheng. However, if the population of China became just one standard deviation better at soccer, there would be over 200 players at least as good, and a few dozen who are much better.
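The comparison between doubling the population (2× as many top players) and shifting the whole curve by 1 SD can be checked with the upper tail of a standard normal curve. This sketch assumes Zheng sits at exactly 6 SD; after a 1 SD shift, a player at that level only needs to be 5 SD above the new mean:

```python
from statistics import NormalDist

std = NormalDist()


def share_above(z: float) -> float:
    """Fraction of a standard bell curve lying more than z SD above the mean."""
    return std.cdf(-z)  # mirror of the upper tail, avoids 1 - cdf rounding


# "One in a billion" expressed in standard deviations.
z_one_in_a_billion = -std.inv_cdf(1e-9)

# Shifting everyone up by 1 SD: a player who used to need 6 SD now needs only 5.
boost = share_above(5) / share_above(6)

print(f"one in a billion is {z_one_in_a_billion:.2f} SD above the mean")
print(f"+1 SD of affinity multiplies the number of 6-SD players by {boost:.0f}")
```

The multiplier comes out near 290, which is where “over 200 players” comes from.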

It could be that a normally distributed soccer skill model is wholly wrong, but it does seem to explain some of what we see in reality. For anything that’s distributed roughly like a bell curve, the quality of the best people in a large enough group (like a country) depends much more on small differences in the average level than on large differences in total population.

For illustration, let’s use the one trait that we can all agree is close to normally distributed and varies among nations: human height. The average Indian dude (sorry for the androcentrism, ladies, there’s just better data on male heights and male soccer teams) is 165 cm (5′ 5″) and there are roughly 630 million of them. The average Norwegian dude is 180 cm (5′ 11″) and there are 2.5 million. The standard deviation of male height is about 6 cm worldwide. If heights followed perfect normal bell curves with those parameters, they would look like this:

As we plot them side by side, the Indian curve completely dwarfs the Norwegian one, even for pretty tall dudes. With these parameters there are about 11 Indians who are exactly 180 cm (5′ 11″) tall for every Norwegian. 5′ 11″ is tall, but not super tall. The higher mean only pays off for the real outliers, so let’s zoom in on the really tall end of the plot above.
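The “Indians per Norwegian at 180 cm” figure is just the ratio of the two population-weighted bell curves at that height. A sketch using the parameters from the text (mean heights 165 and 180 cm, a shared 6 cm SD, 630 and 2.5 million men); the helper name `men_per_cm` is hypothetical:

```python
from statistics import NormalDist

SD_CM = 6.0  # assumed worldwide SD of male height
INDIA = (630e6, NormalDist(165, SD_CM))   # (number of men, height distribution)
NORWAY = (2.5e6, NormalDist(180, SD_CM))


def men_per_cm(country, height_cm: float) -> float:
    """Approximate number of men within a 1 cm band around height_cm."""
    population, dist = country
    return population * dist.pdf(height_cm)


ratio_at_180 = men_per_cm(INDIA, 180) / men_per_cm(NORWAY, 180)
print(f"Indians per Norwegian at 180 cm: {ratio_at_180:.1f}")
```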

Here, the picture reverses completely. There are over 80 times as many Norwegians above 195 cm (6′ 5″) as there are Indians. Under a normal distribution assumption, the tallest Indian, at about 6′ 7″, would only match roughly the 800th tallest Norwegian.
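The tail comparisons come from the upper tails of the same two curves. This sketch keeps the parameters from the text and assumes the tallest man in a country stands at the height above which we expect exactly one man; the helpers `men_above` and `tallest` are hypothetical names:

```python
from statistics import NormalDist

SD_CM = 6.0  # assumed worldwide SD of male height
INDIA = (630e6, NormalDist(165, SD_CM))   # (number of men, height distribution)
NORWAY = (2.5e6, NormalDist(180, SD_CM))
STD = NormalDist()


def men_above(country, height_cm: float) -> float:
    """Expected number of men taller than height_cm."""
    population, dist = country
    z = (height_cm - dist.mean) / dist.stdev
    return population * STD.cdf(-z)  # upper tail via symmetry


def tallest(country) -> float:
    """Height above which we expect exactly one man."""
    population, dist = country
    return dist.mean - dist.stdev * STD.inv_cdf(1 / population)


ratio_above_195 = men_above(NORWAY, 195) / men_above(INDIA, 195)
tallest_indian = tallest(INDIA)
norwegian_rank = men_above(NORWAY, tallest_indian)

print(f"Norwegians per Indian above 195 cm: {ratio_above_195:.0f}")
print(f"tallest Indian: {tallest_indian:.1f} cm")
print(f"Norwegians at least that tall: {norwegian_rank:.0f}")
```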

It’s important to remember that a normal bell curve is a very simplistic model; real life is messy, and Dharmendra Singh is 8′ 1″. Even within the realm of mathematics, a normal distribution has thinner tails (the curve drops faster as you move away from the mean) than most other widely used distributions that look sorta like a bell curve (like Student’s t or the gamma distribution). A normal model therefore underestimates the number of outliers and overstates the importance of shifting the mean.

With that said, my main point stands: it should not surprise anyone that the achievement of extreme performers depends only weakly on the population of a country but strongly on its average. There doesn’t have to be something horribly wrong with China to account for its disappointing soccer team; they could be just a little bit to the left of other countries on national soccer affinity. We still don’t know what makes up soccer affinity, just that small differences in it are enough to explain the disconnect between population and team performance. With the math lesson behind us comes the fun part: in the next posts we’ll rank the world’s countries by average soccer affinity, throw a bunch of data at it to see what it correlates with, and see if we can get any insight into what makes countries good or bad at soccer.