So, how does one go about solving this kind of problem?

Well, first of all, let’s interview our first applicant. No matter their score, we definitely cannot pick them: whether it is 100, 1,000, or 10,000, a score we cannot compare to anyone else’s is meaningless. This is a forced rejection.

Then, we get to the second applicant: their score can now be compared to the previous applicant’s, so we at least know who’s better and who’s worse. However, it still doesn’t matter how much better the second applicant is (ten times the first one? a hundred times?): we have no idea whether the gap is there because the first one was incredibly bad or because the second one REALLY is that good. So this is a forced rejection, too.

With the third applicant, we can start understanding how the scoring works: sure, all of the applicants so far could be outliers, much better or much worse than the average, but the more applicants we interview, the less likely it is that all of them are outliers.

This sparks an idea: what if we interviewed a ton of them to estimate what the average applicant looks like, and then hired someone who’s better than average? This feels like a great idea.

So, many of us know what “average” means: if we sum the scores of all the applicants and then divide by the number of applicants, what we get is the average score. To be precise, if we interviewed EVERY secretary all over the world, including those who didn’t apply, we’d get the true average score on our test for all the secretaries. However, the applicants are only a subset of all the secretaries in the world, so what we get is an empirical estimate of the average score. This is a mouthful, so we’ll just say “average”.
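As a minimal sketch of that computation, here is the average worked out both by hand and with Python’s standard library. The scores are made up for illustration; they are not data from the interviews in this article.

```python
from statistics import mean

# Hypothetical interview scores, made up for illustration.
scores = [38, 41, 40, 43, 39]

# Sum of the scores divided by the number of applicants.
average_by_hand = sum(scores) / len(scores)

# The standard library computes the same quantity.
average = mean(scores)

print(average_by_hand, average)
```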

Now it’s time for “what-if”s: what if every secretary scores between 39 and 41 with an average of 40? Is that different from scoring between 20 and 60 with an average of 40?

Well, let’s say you got the first scenario. If someone showed up with a score of 42 they’d be your best applicant yet and you’d hire them with your eyes closed, right?

In the second scenario, however, many of those you already interviewed are a lot better (some scored close to 60), so you’d really be hiring a mediocre, “average” secretary.

What is the difference between those scenarios? Intuitively, you know that the average for the two cases is the same, but the spread is different, so being 2 points better than average is much less impressive in the second case than in the first.
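To make the two scenarios concrete, here is a small sketch with made-up scores. Both lists average exactly 40, but a newcomer scoring 42 beats everyone in the narrow list and only some of the wide one. The helper `count_better` is a hypothetical name introduced just for this example.

```python
from statistics import mean, stdev

# Made-up scores: both lists average 40, but the second is far more spread out.
narrow = [39, 40, 41, 40, 39, 41, 40]
wide = [20, 60, 35, 45, 25, 55, 40]

def count_better(scores, candidate):
    """Count how many already-interviewed applicants beat the candidate."""
    return sum(1 for s in scores if s > candidate)

for name, scores in (("narrow", narrow), ("wide", wide)):
    print(name, mean(scores), round(stdev(scores), 2), count_better(scores, 42))
```

Same average, very different answer to “is 42 any good?”, which is exactly why the average alone is not enough.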

Enter Gaussian Distributions.

Gaussian Distributions (or Normal Distributions) are a really useful model that can be used to approximate many statistical distributions. There are great articles around the internet that explain Gaussian Distributions, but, roughly speaking: a Gaussian Distribution is a bell-shaped probability distribution described by just two numbers, the average value of the samples and how far they usually are from the average.

I’m sure some examples will explain it much better:

Here, I “interviewed” 100 people. The average (mu in the title of the plot) is a bit higher than 40, but close enough. std is the standard deviation, and it answers the question “how far from the average is it normal to be?”. Again, it’s better explained with an example:

This time, the standard deviation is 9.48, which means that the values are more “spread out” than in the previous case. If those were the results of a test and you scored 50, that’d be better than average, but not actually impressive. If you scored 50 in the previous example, you’d be way better than everyone who took the exam before you, so that’s great!

So, if you have N applicants and you interview M of them to estimate a Gaussian Distribution (by finding mu and std), you can immediately know how good a new applicant is in relation to the others. On the other hand, if M is too small, the same test can give very different values of mu and std.
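One common way to express “how good a new applicant is in relation to the others” is to count how many standard deviations their score sits above the estimated average. The sketch below assumes the sample standard deviation (`statistics.stdev`); the text does not say which variant produced its std values, and `relative_score` is a hypothetical helper name.

```python
from statistics import mean, stdev

def relative_score(interviewed, new_score):
    """How many standard deviations new_score is above the estimated average."""
    mu = mean(interviewed)
    std = stdev(interviewed)  # assumption: sample standard deviation
    return (new_score - mu) / std

# Made-up scores for the M applicants interviewed so far.
narrow = [39, 40, 41, 40, 39, 41, 40]
wide = [20, 60, 35, 45, 25, 55, 40]

# The same score of 50 is far more exceptional when the spread is small.
print(relative_score(narrow, 50))
print(relative_score(wide, 50))
```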

For example, these two tests are the same test (generated with mu = 40 and std = 2.5), but the estimates were quite different (mu = 42, std = 2 in one run, as opposed to mu = 40, std = 3.15 in the other).
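This effect is easy to reproduce. The sketch below draws samples from the same Gaussian (mu = 40, std = 2.5) many times and measures how far apart the estimated averages land for different values of M. The function names, the number of trials, and the seed are arbitrary choices made for this example.

```python
import random
from statistics import mean, stdev

def estimate(m, rng, mu=40.0, std=2.5):
    """Interview m simulated applicants and estimate mu and std from them."""
    sample = [rng.gauss(mu, std) for _ in range(m)]
    return mean(sample), stdev(sample)

def spread_of_mu(m, trials=200, seed=0):
    """Gap between the largest and smallest estimated mu over many repeats."""
    rng = random.Random(seed)
    mus = [estimate(m, rng)[0] for _ in range(trials)]
    return max(mus) - min(mus)

for m in (5, 50, 500):
    print(m, round(spread_of_mu(m), 2))
```

The smaller M is, the more the estimate of mu jumps around from one run to the next.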

So, the real question is “how do I pick M?”.

Ideally, M = N would give you the best approximation you can afford. However, if you did that, you’d have rejected all the applicants and hired no one, possibly getting yourself fired.

For small values of M, instead, your estimates of mu and std are unreliable, so you may pick sub-optimal applicants.
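To see both failure modes in action, here is a rough sketch of one possible hiring rule. The rule itself is an assumption made for illustration, not something stated above: reject the first M applicants, estimate mu and std from them, then hire the first later applicant who scores above mu + k * std. The names `hire_index` and `k` are hypothetical, and the scores are made up.

```python
from statistics import mean, stdev

def hire_index(scores, m, k=1.0):
    """Return the index of the hired applicant, or None if nobody is hired.

    Assumed rule (not from the article): reject the first m applicants,
    estimate mu and std from them, then hire the first later applicant
    whose score exceeds mu + k * std.
    """
    observed = scores[:m]
    if len(observed) < 2:
        return None  # stdev needs at least two scores
    threshold = mean(observed) + k * stdev(observed)
    for i in range(m, len(scores)):
        if scores[i] > threshold:
            return i
    return None

# Made-up scores, in interview order.
scores = [40, 38, 42, 41, 50, 60]
print(hire_index(scores, m=3))            # small M
print(hire_index(scores, m=len(scores)))  # M = N
```

With M = 3 this rule hires the applicant who scored 50, even though a 60 was still waiting in line; with M = N it hires nobody at all.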