You're An Alpaca Shepherd Now

You've finally achieved your lifelong dream: you're an alpaca shepherd. And as any alpaca shepherd will tell you, your foremost concern is the wool quality of your herd.*

*This may or may not be true.



Word on the street in Cusco is that a popular new shampoo increases the wool quality of your alpaca. But you're no sucker - you're going to find out for sure. You're going to test the difference with statistics.



In statistical testing, we structure experiments in terms of null & alternative hypotheses. Our test will have the following hypothesis schema:



$H_0: \mu_{\text{treatment}} \le \mu_{\text{control}}$

$H_A: \mu_{\text{treatment}} > \mu_{\text{control}}$



Our null hypothesis claims that the new shampoo does not increase wool quality. The alternative hypothesis claims the opposite: the new shampoo yields superior wool quality.





Randomization

As a first step, we randomly assign half of our sampled alpaca to the new shampoo, and half to the old.



We say that the alpaca receiving the new shampoo belong to the treatment group, and the others to the control group. The assignment of an alpaca to a given shampoo is known as its treatment assignment.



Randomization of treatment assignment is very important. It guards our results against bias and confounding, and it provides the basis for the theory underpinning our statistical test.
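Random assignment can be sketched in a few lines of Python. This is only an illustration: the function name `assign_treatments`, the herd of 12 numbered alpaca, and the seed are all assumptions made for the example, not part of the original experiment.

```python
import random

def assign_treatments(unit_ids, seed=None):
    """Randomly split units into two equal-sized groups.

    Returns a dict mapping each unit id to "treatment" or "control".
    """
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)          # random order decides who gets what
    half = len(shuffled) // 2
    return {
        uid: ("treatment" if position < half else "control")
        for position, uid in enumerate(shuffled)
    }

# Hypothetical herd of 12 alpaca, identified by number.
assignment = assign_treatments(range(12), seed=42)
n_treated = sum(1 for group in assignment.values() if group == "treatment")
```

Because the shuffle, and not the shepherd, decides which alpaca gets the new shampoo, no characteristic of the animals can systematically line up with the treatment.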

Response Values

After giving each alpaca its designated shampoo, we determine if the new shampoo has any effect on wool quality.



In statistics jargon, every experimental unit has a response value. For us, each alpaca is an experimental unit, and its measure of wool quality after using its assigned shampoo is its response value.



We can eyeball these values ourselves and get a feel for any perceived differences between the two shampoos. However, we'll need a more rigorous method to determine if the differences are statistically significant.

Test Statistic

To determine whether or not the new shampoo really is effective, we need a way to quantify the difference between our two groups in a single number that we can judge against the null hypothesis.



Luckily for us, such a numerical summary exists: the test statistic.



A benefit of the permutation test is that it allows us to use any numerical value that we want for our test statistic.* Because our analysis is fairly straightforward, we'll simply use the difference in mean response values between the two shampoos:

*Many other tests require complex, specific test statistics.



$\text{Test Statistic} = \mu_{\text{treatment}} - \mu_{\text{control}}$



To obtain our initial test statistic, we simply subtract the mean wool quality of the alpaca that did not use the new shampoo (control group) from the mean wool quality of the alpaca that did use it (treatment group).
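As a minimal sketch of that calculation, the snippet below computes the treatment mean minus the control mean. The helper name `mean_difference` and the six wool-quality scores are made up for illustration.

```python
from statistics import mean

def mean_difference(responses, labels):
    """Mean response of the treatment group minus that of the control group."""
    treatment = [r for r, lab in zip(responses, labels) if lab == "treatment"]
    control = [r for r, lab in zip(responses, labels) if lab == "control"]
    return mean(treatment) - mean(control)

# Hypothetical wool-quality scores, one per alpaca.
responses = [7.0, 8.0, 9.0, 5.0, 6.0, 7.0]
labels = ["treatment"] * 3 + ["control"] * 3

observed = mean_difference(responses, labels)  # 8.0 - 6.0 = 2.0
```

A positive value means the treated alpaca scored higher on average, which is the direction our alternative hypothesis cares about.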

The 'P' in 'Permutation'

Enter the most important step of the permutation test, as well as its namesake.*

*The permutation test is also called the 'randomization test'.



While keeping the same response values we received earlier, we permute (shuffle) the treatment assignments of our alpaca, and re-calculate our test statistic.



We do this because we analyze the results of our experiment relative to the null hypothesis, which posits that the new shampoo does not improve wool quality.



While this may seem a bit odd, the logic is quite simple: if the new shampoo truly doesn't improve wool quality, then the shampoo labels carry no information. Shuffling the labels of our alpaca and recalculating our test statistic won't matter - we'll obtain similar mean wool quality for both groups, and so a test statistic much like the one we started with.
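A single shuffle can be sketched as follows. The responses stay attached to their alpaca; only the labels move. The function names, the toy data, and the seed are assumptions for the example.

```python
import random
from statistics import mean

def mean_difference(responses, labels):
    """Mean response of the treatment group minus that of the control group."""
    treatment = [r for r, lab in zip(responses, labels) if lab == "treatment"]
    control = [r for r, lab in zip(responses, labels) if lab == "control"]
    return mean(treatment) - mean(control)

def permuted_statistic(responses, labels, rng):
    """Shuffle the labels (not the responses) and recompute the statistic."""
    shuffled = list(labels)    # copy, so the real assignment is left untouched
    rng.shuffle(shuffled)
    return mean_difference(responses, shuffled)

# Hypothetical wool-quality scores and their real treatment assignments.
responses = [7.0, 8.0, 9.0, 5.0, 6.0, 7.0]
labels = ["treatment"] * 3 + ["control"] * 3

rng = random.Random(0)
one_permuted_stat = permuted_statistic(responses, labels, rng)
```

Shuffling keeps the group sizes fixed (three and three here), so every permuted statistic is computed on groups of the same size as in the real experiment.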



More Permutations

We repeat this process, permuting our data over and over again, and recalculate a test statistic at each iteration.



Ideally, we'd calculate a test statistic for every possible permutation of shampoo assignment among our alpaca. This would create an exact distribution of all possible test statistics under our null hypothesis.
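For a very small herd, the exact distribution really can be enumerated. The sketch below does so for six hypothetical alpaca with three in each group. The function name and the scores are illustrative assumptions.

```python
from itertools import combinations
from statistics import mean

def exact_null_distribution(responses, n_treatment):
    """Test statistic for every way of choosing which units are treated."""
    indices = range(len(responses))
    stats = []
    for treated in combinations(indices, n_treatment):
        treated_set = set(treated)
        treatment = [responses[i] for i in treated]
        control = [responses[i] for i in indices if i not in treated_set]
        stats.append(mean(treatment) - mean(control))
    return stats

# Hypothetical wool-quality scores for six alpaca.
responses = [7.0, 8.0, 9.0, 5.0, 6.0, 7.0]
null_stats = exact_null_distribution(responses, n_treatment=3)
n_assignments = len(null_stats)  # 6 choose 3 = 20 possible assignments
```

Each assignment is counted once as a choice of which alpaca are treated, since reordering alpaca within a group does not change the group means.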



Unfortunately, the number of possible permutations is often far too large to calculate them all in practice. No worries! Instead we'll randomly sample enough permutations to build an approximation to our distribution, which works nearly as well when the sample is large enough.
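To get a feel for the scale, and for the sampling shortcut, here is a rough sketch. The herd size of 40 is an assumption chosen only to show how quickly the count grows, and the function name, toy scores, and seed are likewise hypothetical.

```python
import math
import random
from statistics import mean

def sample_null_distribution(responses, labels, n_permutations, seed=None):
    """Approximate the null distribution from randomly shuffled labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    stats = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # a fresh random relabelling each iteration
        treatment = [r for r, lab in zip(responses, shuffled) if lab == "treatment"]
        control = [r for r, lab in zip(responses, shuffled) if lab == "control"]
        stats.append(mean(treatment) - mean(control))
    return stats

# A hypothetical herd of 40 alpaca split 20/20 already has more than
# 137 billion distinct treatment assignments.
n_assignments = math.comb(40, 20)

# So we sample instead: 200 random permutations of a toy data set.
responses = [7.0, 8.0, 9.0, 5.0, 6.0, 7.0]
labels = ["treatment"] * 3 + ["control"] * 3
null_stats = sample_null_distribution(responses, labels, n_permutations=200, seed=1)
```

More permutations give a smoother approximation at the cost of more computation; 200 matches the count used in this walkthrough.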





Test Statistic Distribution

Eventually, after some sufficient number of permutations, we create the approximate test statistic distribution.



This distribution approximates the spread of test statistic values we could have seen under the null hypothesis. We can then use this distribution to obtain probabilities associated with different mean-difference values,* where we assume that wool quality does not increase with the new shampoo.

*Or whatever calculation you used for your test statistic.



By observing where our initial test statistic falls within this distribution, we obtain the final piece for our test: the magical p-value.

The P-Value

A p-value represents the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. For us, it's the probability of obtaining a difference in wool quality at least as large as the one we did, assuming the new shampoo did not increase wool quality.



To determine the outcome of our test, we compare our p-value to a significance level. This should be determined a priori, but we'll just say ours is 10%. If the p-value is less than or equal to the significance level, we reject the null hypothesis; the outcome is said to be statistically significant.



For us, a low p-value signals that, assuming the null hypothesis is true, a difference in wool quality as large as the one we initially observed is unlikely. A high p-value signals the opposite: such an outcome is likely under the null hypothesis.





Our Results

To calculate the p-value for a permutation test, we simply count the number of test statistics at least as extreme as our initial test statistic, and divide that number by the total number of test statistics we calculated.
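The counting rule above can be sketched as below. Because our alternative hypothesis is one-sided (treatment greater than control), "at least as extreme" is taken to mean "greater than or equal to". The function name and the eight toy statistics are assumptions for the example.

```python
def permutation_p_value(observed, null_stats):
    """Share of permuted statistics at least as large as the observed one."""
    n_extreme = sum(1 for s in null_stats if s >= observed)
    return n_extreme / len(null_stats)

# Hypothetical permuted test statistics and an observed value.
null_stats = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
observed = 2.0

p_value = permutation_p_value(observed, null_stats)  # 2 of 8 -> 0.25

alpha = 0.10                       # significance level, fixed in advance
reject_null = p_value <= alpha     # False for this toy data
```

For a two-sided alternative, one would compare absolute values instead; that is a different test from the one set up here.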



In our case, only sixteen out of our two hundred test statistics were at least as extreme as our initial test statistic.



Thus, our p-value is:



$\text{P-Value} = 16 / 200 = 0.08 = 8\%$



In other words, if it's truly the case that the new shampoo doesn't improve wool quality, then a difference in wool quality at least as large as the one we initially observed occurs with a probability of only 8%.



That's a fairly low probability. In fact, at our 10% level of significance, we reject our null hypothesis in favor of our alternative: the new shampoo does appear to be increasing wool quality. Time to buy some more!