“I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the law of frequency of error. The law would have been personified by the Greeks if they had known of it. It reigns with serenity and complete self-effacement amidst the wildest confusion. The larger the mob, the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of unreason.” — Sir Francis Galton on the Central Limit Theorem

The Central Limit Theorem (CLT) is important. It’s the reason you can use statistical tests on populations that aren’t normally distributed; it’s why researchers want large sample sizes; and — I would argue — it’s one of the most beautiful concepts in statistics.

The CLT says that if you repeatedly draw a sample from some population, take the mean of each sample, and plot those means on a graph, you will eventually get a normal distribution of sample means centered around the true population mean. And the larger each sample is, the faster the distribution of means you’re creating will start to look normal (how fast also depends on the shape of the distribution you’re sampling from).

The cool thing is that this happens no matter what the population you’re drawing from looks like*. Skewed, bimodal, uniform, normal… it all ends up looking normal once you start taking sample means.
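You can watch this happen with a quick simulation. This sketch uses only Python’s standard library; the exponential population here is just an illustrative stand-in for any skewed distribution.

```python
import random
import statistics

random.seed(0)

# Draw many samples of size n from a heavily skewed population
# (exponential, true mean 1.0) and record each sample's mean.
def sample_means(n, trials=10_000):
    return [statistics.mean(random.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]

means = sample_means(n=30)

# The sample means cluster tightly around the true population mean,
# even though the population itself is nothing like a bell curve.
print(f"average of sample means: {statistics.mean(means):.2f}")
print(f"spread of sample means:  {statistics.stdev(means):.2f}")
```

Plot a histogram of `means` and you’ll see the familiar bell shape emerge, centered near 1.0.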

Annual Income (skewed)

This makes sense. Pretend you are taking samples of 10 people and averaging their incomes. The distribution of annual income is heavily skewed: most people make somewhere between $17k and $170k a year, and a few make a lot more:

Maybe something like this. Ceci n’est pas un chapeau.

But choose your 10 people and their average income is very unlikely to be extremely high. What are the chances of picking Bill Gates, Elon Musk, Warren Buffett, Oprah, and six other members of the 1% in one sample? As you keep taking samples, their means will tend to be moderate, because it’s more likely that you’ll get a mix of people than all the wealthiest or all the poorest people in one sample of 10. This makes extreme values less likely and moderate values more likely.

Now imagine you take a sample of 100 instead. It’s even harder to get extreme values, because now you’d need 100 of the wealthiest people in the US in your sample to land out in that extreme tail.
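The income thought experiment can be simulated too. The lognormal population below is a made-up stand-in for a skewed income distribution (the parameters are illustrative, not real income data); the point is only to compare how spread out the sample means are at n = 10 versus n = 100.

```python
import random
import statistics

random.seed(1)

# Hypothetical skewed "income" population: lognormal with a long right
# tail and a median around $49k. These numbers are for illustration only.
def income():
    return random.lognormvariate(mu=10.8, sigma=0.9)

def mean_of_sample(n):
    return statistics.mean(income() for _ in range(n))

means_10  = [mean_of_sample(10)  for _ in range(5_000)]
means_100 = [mean_of_sample(100) for _ in range(5_000)]

# Larger samples -> less spread -> extreme averages become rarer.
print(f"spread of means, n=10:  ${statistics.stdev(means_10):,.0f}")
print(f"spread of means, n=100: ${statistics.stdev(means_100):,.0f}")
```

The n = 100 means huddle much closer to the true average: with ten times the sample size, the spread shrinks by roughly a factor of √10.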

Rolling Dice (uniform)

This even works with uniform distributions. Try it by rolling a die. If you roll just once, you’re equally likely to get a 1 as a 2 as a 3, and so on:

But when we take the mean of 3 rolls, we’re less likely to get all 6s or all 1s than a set of rolls whose mean is 4 (e.g., [6, 4, 2], [4, 4, 4], [3, 4, 5]). The larger your sample, the less likely you are to get extreme values. Getting three 6s in a row is not too unlikely, but how about 1,000 in a row? 1 million?
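A small simulation makes the dice argument concrete. Here “extreme” is taken (arbitrarily, for illustration) to mean an average roll of 5 or more; the estimated probability drops fast as the number of rolls grows.

```python
import random

random.seed(2)

# Estimate how often the average of n die rolls is "extreme" -- here,
# an average of 5 or more. As n grows, an extreme average requires
# nearly every roll to be high, so the probability shrinks fast.
def p_extreme_mean(n, trials=20_000):
    hits = sum(
        1 for _ in range(trials)
        if sum(random.randint(1, 6) for _ in range(n)) / n >= 5
    )
    return hits / trials

for n in (1, 3, 10):
    print(f"n={n}: P(mean >= 5) ~ {p_extreme_mean(n):.4f}")
```

One roll lands at 5 or 6 a third of the time; by n = 10 an average that high is already a rare event.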

Researchers use this to their advantage by collecting as much data as they can. The larger their sample size, the less likely it is to consist of mostly or entirely extreme values. Eventually it all looks like this:

the beautiful normal distribution

Most statistical tests assume that you have a normal-ish distribution. But in real life, not all data look like this, so how can we do statistics at all? The Central Limit Theorem to the rescue. When we take samples and compare means, we’re really doing tests on the distribution of sample means (all the possible means you could get from sampling n people from a population), and as you saw, those distributions tend to be normal no matter what the original population looked like.
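One way to check that claim numerically is to measure skewness, which is near 0 for a symmetric, normal-ish shape. This sketch (again with an illustrative exponential population and a hand-rolled skewness estimator) compares the raw population against the distribution of its sample means:

```python
import random
import statistics

random.seed(3)

def skewness(xs):
    # Standardized third moment: near 0 for a symmetric, normal-ish shape.
    m, s = statistics.mean(xs), statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

# A strongly right-skewed population (exponential, chosen for illustration)...
population = [random.expovariate(1.0) for _ in range(10_000)]
skew_pop = skewness(population)

# ...versus the distribution of its sample means (n = 30 per sample).
means = [statistics.mean(random.expovariate(1.0) for _ in range(30))
         for _ in range(10_000)]
skew_means = skewness(means)

print(f"population skewness:  {skew_pop:.2f}")   # heavily skewed
print(f"sample-mean skewness: {skew_means:.2f}")  # much closer to 0
```

The lopsidedness of the original population mostly washes out once you look at means rather than individuals, which is exactly why tests built on normality still work.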