In a general sense, hypothesis tests are just what they sound like! You have an idea — a hypothesis — and you’re going out into the world, gathering information and using it to see how well your hypothesis fits reality. Scientific Method 101.

If you’re going to evaluate your hypothesis statistically, then you’ll probably use some sort of hypothesis test like a z-test, t-test, or F-test (with an ANOVA, for example). These all have their details and nuances, but it comes down to this: how much variation does your hypothesis/model account for versus a model where groups are just random? (A great metaphor is that the null hypothesis is like a random number generator that just spits out data from a population where there is truly no difference.*) For z, t, and F statistics we look at a ratio with the variation our groups account for on top and random variation on the bottom. That way, the larger the number, the more variation our groups can explain compared to chance.

General form of z, t, and F values
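That general form can be sketched as an equation (illustrative only; each test fills in the numerator and denominator differently, and the two-sample t on the right is just one concrete case):

```latex
\text{statistic} = \frac{\text{variation explained by the groups}}{\text{variation expected by chance}},
\qquad \text{e.g.}\quad
t = \frac{\bar{x}_1 - \bar{x}_2}{\operatorname{SE}(\bar{x}_1 - \bar{x}_2)}
```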

Cough Medicine

Say I give some people Cough-Be-Gone and some people get Hack-A-Way. I think that people who take CBG will cough less per hour than people who take HAW. So I measure the coughs per hour of each group. People who took CBG coughed 25 times an hour, and people who took HAW coughed 37 times per hour. So the difference we observed (HAW − CBG) is 12.
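To make the arithmetic concrete, here’s a quick sketch in Python. The individual cough counts are made up (the example only reports the two group means), invented so the averages come out to 25 and 37:

```python
from statistics import mean, stdev
import math

# Hypothetical coughs-per-hour for each person, invented so the group
# means match the example: CBG averages 25, HAW averages 37.
cbg = [23, 27, 24, 26, 25, 25]
haw = [35, 39, 36, 38, 37, 37]

diff = mean(haw) - mean(cbg)       # observed difference (HAW - CBG)
print(diff)                        # 12.0

# Signal-to-noise ratio: the difference our groups explain on top,
# the variation we'd expect from chance (standard error) on the bottom.
se = math.sqrt(stdev(cbg) ** 2 / len(cbg) + stdev(haw) ** 2 / len(haw))
t = diff / se
print(round(t, 2))
```

The bigger that last number, the more the group difference stands out against random noise.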

The null hypothesis here is that there’d be no difference in the number of coughs between groups with CBG and HAW, and if this were true we’d expect a difference of zero since the two groups would not be different in the population. But we didn’t measure the whole population, we just took a sample, so we may — by chance — get a sample where the two groups have a different mean even though in the population they do not. However, we expect the distribution of all possible differences between groups to be approximately normal due to the Central Limit Theorem.
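You can watch the Central Limit Theorem do this with a small simulation: draw two samples from one population (so the null is true by construction) over and over, and the differences between sample means pile up into a roughly normal curve centered at zero. The population numbers below are arbitrary:

```python
import random
from statistics import mean, stdev

random.seed(0)

# Under the null hypothesis both groups come from the same population,
# so any difference in sample means is pure sampling noise.
pop_mean, pop_sd, n = 30, 5, 20

diffs = []
for _ in range(5000):              # 5000 simulated experiments
    a = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
    b = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
    diffs.append(mean(a) - mean(b))

# The differences cluster around 0 and look roughly normal (CLT),
# with spread close to sqrt(2) * pop_sd / sqrt(n), about 1.58 here.
print(round(mean(diffs), 2))
print(round(stdev(diffs), 2))
```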

So is the difference we saw due to a true difference or just random chance? What we really, truly want to know is: what is the probability that there is no difference between the groups, given the data I collected? In more math-y notation we write this as P(H0|data).

Unfortunately, that’s not what these tests tell you. They pretend that the null hypothesis is true and then give you a p-value that tells you how likely you would be to get data this extreme — or even more extreme: P(data|H0). We give evidence for our hypothesis (there is a difference) by assuming the opposite and showing that, if it were true, our data would be pretty weird.
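Here’s that pretend-the-null-is-true logic as a simulation, using the random-number-generator metaphor from earlier. All the numbers are invented for illustration; the point is only the shape of the calculation:

```python
import random
from statistics import mean

random.seed(1)

# Pretend H0 is true: both medicines draw coughs-per-hour from one
# shared population (numbers invented for illustration).
null_mean, null_sd, n = 31, 8, 10
observed_diff = 12                 # the difference we actually saw

extreme = 0
trials = 10_000
for _ in range(trials):
    a = [random.gauss(null_mean, null_sd) for _ in range(n)]
    b = [random.gauss(null_mean, null_sd) for _ in range(n)]
    if abs(mean(a) - mean(b)) >= observed_diff:
        extreme += 1

# P(data | H0): how often the null world produces data this extreme.
p_value = extreme / trials
print(p_value)
```

Note what this number is and isn’t: it’s the chance of data like ours in a world where the null is true, not the chance the null is true given our data.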

Null Distribution

So, we set a limit for our p-value** (aka how weird would our data need to be under the null hypothesis before we go “eh, it’s probably not from the null distribution”?).

The idea is that if the data is pretty rare in the null distribution (right of the vertical line), we think it’s safe to say it probably comes from the alternative distribution (Ha) which is usually our hypothesis.

One thing we should note is that this limit is completely arbitrary; we could pick any number we want, but in most social sciences we use α = 0.05 as our cutoff. This means that if there is only a 5% chance of getting a difference between the two groups’ coughs as extreme (= 12) or more extreme (> 12) as ours when the null is true, then we feel that’s sufficiently rare, we figure the data probably didn’t come from the null distribution, and we reject the null hypothesis. If the p-value is larger than α = 0.05, then we fail to reject the null.
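One simple way to get such a p-value without any distribution tables is a permutation test: if the null were true, the CBG/HAW labels would be meaningless, so shuffling them over and over generates the null distribution directly. A sketch with hypothetical cough counts matching the example’s group means:

```python
import random
from statistics import mean

random.seed(2)

# Hypothetical cough counts matching the example's group means.
cbg = [23, 27, 24, 26, 25, 25]
haw = [35, 39, 36, 38, 37, 37]
observed = mean(haw) - mean(cbg)   # 12.0

# If H0 is true the labels are arbitrary, so shuffle them and see how
# often chance alone produces a difference as extreme as 12.
pooled = cbg + haw
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    fake_diff = mean(pooled[6:]) - mean(pooled[:6])
    if abs(fake_diff) >= observed:
        extreme += 1

p_value = extreme / trials
alpha = 0.05
print(p_value)
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With these made-up, cleanly separated numbers the shuffled differences almost never reach 12, so the p-value lands far below 0.05 and we reject the null.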