On Variance

Joe is cheery after an intense semester at his college. He is meeting Devine today for a casual conversation. We all know that their casual conversation always turns into something interesting. Are we in for a new concept today?

Devine: So, how did you fare in your exams?

Joe: Hmm, I did okay, but, interestingly, you are asking me about my performance in exams and not what I learned in my classes.

Devine: Well, Joe, these days, the college prepares you to be a good test taker. Learning is a thing of the past. I am glad you are still learning in your classes.

Joe: That is true to a large extent. We have exams after exams after exams, and our minds are compartmentalized to regurgitate one module after the other — no time to sit back and absorb what we see in classes.

By the way, I heard of an intriguing phenomenon from one of my professors. It might be of interest to you.

Devine: What is it?

Joe: In his eons of teaching, he has observed that the standard deviation of his class’s performance is 16.5 points. He told me that over the years, this had fed back into his ways of preparing exams. It seems that he subconsciously designs exams where the students’ grades will have a standard deviation of 16.5.

Devine: That is indeed an interesting phenomenon. Do you want to verify his hypothesis?

Joe: How can we do that?

Devine: Assuming that his test scores are normally distributed, we can conduct a hypothesis test on the variance of the distribution —

Joe: Using a hypothesis testing framework?

Devine: Yes. Let’s first outline the null and alternate hypotheses. Since your professor is claiming that his exams are subconsciously designed for a standard deviation of 16.5, we will establish that this is the null hypothesis.

$$H_0: \sigma^2 = 16.5^2$$

We can falsify this claim if the standard deviation is greater than or less than 16.5, i.e.,

$$H_A: \sigma^2 \neq 16.5^2$$

The alternate hypothesis is two-sided. Deviation in either direction (less than or greater than) will reject the null hypothesis.

Would you be able to get some data on his recent exam scores?

Joe: I think I can ask some of my friends and get a sample of up to ten scores. Let me make some calls.

Here is a sample of ten exam scores from our most recent test with him.

60, 41, 70, 61, 69, 95, 33, 34, 82, 82

Devine: Fantastic. We can compute the standard deviation/variance from this sample and verify our hypothesis — whether this data provides evidence for the rejection of the null hypothesis.
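As a quick aside, the sample variance and standard deviation Devine mentions can be computed directly from Joe's ten scores. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the lesson.

```python
import statistics

# Ten exam scores collected by Joe (taken from the text above).
scores = [60, 41, 70, 61, 69, 95, 33, 34, 82, 82]

n = len(scores)
mean = statistics.mean(scores)

# statistics.variance and statistics.stdev use the n - 1 denominator,
# i.e. they return the sample variance and sample standard deviation.
s2 = statistics.variance(scores)
s = statistics.stdev(scores)

print(f"n = {n}, mean = {mean:.1f}")
print(f"sample variance = {s2:.2f}, sample standard deviation = {s:.2f}")
```

For this sample, the standard deviation comes out to roughly 21.26, which is higher than the claimed 16.5. Whether that difference is large enough to reject the null hypothesis is what the test has to decide.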

Joe: Over the past few weeks, I was learning that we call it a parametric hypothesis test if we know the limiting form of the null distribution. I already know that we are doing a one-sample hypothesis test, but how do we know the type of the null distribution?

Devine: The sample variance ($s^2$) is a random variable that can be described using a probability distribution. Several weeks ago, in lesson 73 where we derived the T-distribution, and in lesson 75 where we derived the confidence interval of the variance, we learned that $\frac{(n-1)s^2}{\sigma^2}$ follows a Chi-square distribution with $(n-1)$ degrees of freedom.

Since it was more than ten lessons ago, let’s go through the derivation once again. Ofttimes, repetition helps reinforce the ideas.

Joe: I think I remember it vaguely. Let me take a shot at the derivation 🙂

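I will start with the equation of the sample variance $s^2$.

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

I will move the $n-1$ term over to the left-hand side and do some algebra.

$$(n-1)s^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2$$

$$(n-1)s^2 = \sum_{i=1}^{n}\left((x_i - \mu) - (\bar{x} - \mu)\right)^2$$

$$(n-1)s^2 = \sum_{i=1}^{n}(x_i - \mu)^2 - 2(\bar{x} - \mu)\sum_{i=1}^{n}(x_i - \mu) + n(\bar{x} - \mu)^2$$

Since $\sum_{i=1}^{n}(x_i - \mu) = n(\bar{x} - \mu)$, this simplifies to

$$(n-1)s^2 = \sum_{i=1}^{n}(x_i - \mu)^2 - n(\bar{x} - \mu)^2$$

Let me divide both sides of the equation by $\sigma^2$.

$$\frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2 - \left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right)^2$$

The right-hand side is now made up of squared standard normal random variables, assuming $x_i$ are draws from a normal distribution.

$$\frac{(n-1)s^2}{\sigma^2} = Z_1^2 + Z_2^2 + \dots + Z_n^2 - Z^2$$

This behaves like the sum of squares of $(n-1)$ independent standard normal random variables.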

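We learned in lesson 53 that if there are $n$ independent standard normal random variables, $Z_1, Z_2, \dots, Z_n$, their sum of squares follows a Chi-square distribution with $n$ degrees of freedom. Its probability density function is $f(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{\frac{n}{2}-1} e^{-x/2}$ for $x > 0$ and 0 otherwise.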

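Since we have shown that $\frac{(n-1)s^2}{\sigma^2}$ behaves like a sum of squares of $(n-1)$ standard normal random variables, it follows a Chi-square distribution with $(n-1)$ degrees of freedom,

$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$

with a probability density function

$$f(x) = \frac{1}{2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)}\, x^{\frac{n-1}{2}-1} e^{-x/2}, \quad x > 0$$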
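The result of this derivation can be checked numerically. The sketch below is a simulation under stated assumptions, not part of the original lesson: it draws many samples of size 10 from a normal distribution, computes $\frac{(n-1)s^2}{\sigma^2}$ for each, and compares the simulated mean and variance with the values a Chi-square distribution with $n-1$ degrees of freedom should have ($n-1$ and $2(n-1)$). The population mean of 60, the seed, and the number of trials are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

n = 10          # sample size, matching Joe's ten scores
mu = 60.0       # hypothetical population mean (illustrative assumption)
sigma = 16.5    # population standard deviation claimed by the professor
trials = 20000  # number of simulated samples (illustrative choice)

values = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)        # sample variance, n - 1 denominator
    values.append((n - 1) * s2 / sigma**2)  # the quantity from the derivation

sim_mean = statistics.mean(values)
sim_var = statistics.variance(values)

# A Chi-square distribution with k degrees of freedom has mean k and variance 2k.
print(f"simulated mean     = {sim_mean:.2f} (theory: {n - 1})")
print(f"simulated variance = {sim_var:.2f} (theory: {2 * (n - 1)})")
```

The simulated values should land close to 9 and 18, which is consistent with $n-1 = 9$ degrees of freedom rather than $n = 10$.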

The shape of the distribution of $\frac{(n-1)s^2}{\sigma^2}$ depends on the degrees of freedom.

Smaller sample sizes imply lower degrees of freedom. The distribution will then be highly skewed and asymmetric.

With larger sample sizes, or higher degrees of freedom, the distribution tends toward symmetry.
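To put numbers on this change of shape, here is a small sketch that evaluates the Chi-square density and reports the mean, mode, and skewness for a few degrees of freedom. The function names are hypothetical helpers written for this illustration. The skewness formula $\sqrt{8/k}$ and the mode $\max(k-2, 0)$ are standard properties of the Chi-square distribution, not results derived in this lesson.

```python
import math

def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom; 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def chi2_skewness(k):
    """Skewness of a Chi-square distribution with k degrees of freedom."""
    return math.sqrt(8 / k)

# Compare a few degrees of freedom: as k grows, the skewness shrinks toward 0
# and the mode moves closer to the mean in relative terms.
for k in (3, 9, 30, 100):
    mode = max(k - 2, 0)
    print(f"df = {k:>3}: mean = {k}, mode = {mode}, skewness = {chi2_skewness(k):.2f}")

# Density of the distribution relevant to Joe's sample (9 degrees of freedom)
# evaluated at a few points.
for x in (2, 7, 9, 20):
    print(f"f({x}; df=9) = {chi2_pdf(x, 9):.4f}")
```

The skewness falls as the degrees of freedom rise, which matches the description above: small samples give a lopsided distribution, large samples a nearly symmetric one.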

Devine: Excellent job, Joe. As you have shown, $\frac{(n-1)s^2}{\sigma^2}$ is our test statistic, $\chi^2_0$, which we will verify against a Chi-square distribution with $(n-1)$ degrees of freedom.

Have you already decided on a rejection rate $\alpha$?

Joe: I will go with a 5% Type I error. If my professor’s assumption is indeed true, I am willing to commit a 5% error in my decision-making as I may get a sample from my friends that drives me to reject his null hypothesis.
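Joe's statement about the 5% Type I error can be illustrated with a simulation. The sketch below assumes the professor's claim is true, repeatedly draws ten scores from a normal distribution with a standard deviation of 16.5, and counts how often a two-sided test would wrongly reject. The cut-offs 2.700 and 19.023 are standard Chi-square table values for 9 degrees of freedom at 2.5% and 97.5%. They are brought in for this illustration and are not stated in the text above. The population mean, seed, and number of trials are also arbitrary assumptions.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

n = 10          # sample size
sigma0 = 16.5   # standard deviation under the null hypothesis
mu = 60.0       # hypothetical population mean (illustrative assumption)
trials = 20000  # number of simulated samples (illustrative choice)

# Chi-square table values for 9 degrees of freedom (2.5% in each tail).
lower, upper = 2.700, 19.023

rejections = 0
for _ in range(trials):
    # Draw a sample from a world where the null hypothesis is true.
    sample = [random.gauss(mu, sigma0) for _ in range(n)]
    stat = (n - 1) * statistics.variance(sample) / sigma0**2
    if stat < lower or stat > upper:
        rejections += 1  # a false rejection, i.e. a Type I error

type1_rate = rejections / trials
print(f"simulated Type I error rate = {type1_rate:.3f}")
```

The simulated rate should sit near 0.05: even when the professor is right, about one sample in twenty would lead Joe to reject the claim.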

Devine: Okay. Let’s then compute the test statistic.
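A minimal sketch of that computation, applying the formula $\chi^2_0 = \frac{(n-1)s^2}{\sigma^2}$ to Joe's ten scores with $\sigma = 16.5$ from the null hypothesis. The variable names are illustrative.

```python
# Ten exam scores collected by Joe (taken from the text above).
scores = [60, 41, 70, 61, 69, 95, 33, 34, 82, 82]
sigma0 = 16.5  # standard deviation under the null hypothesis

n = len(scores)
xbar = sum(scores) / n

# Sum of squared deviations from the sample mean, which equals (n - 1) * s^2.
sum_sq = sum((x - xbar) ** 2 for x in scores)
s2 = sum_sq / (n - 1)

# Test statistic: (n - 1) * s^2 / sigma0^2
chi2_0 = (n - 1) * s2 / sigma0**2

print(f"sample variance s^2 = {s2:.2f}")
print(f"test statistic chi2_0 = {chi2_0:.2f} with {n - 1} degrees of freedom")
```

For this sample the statistic is about 14.94, to be compared against a Chi-square distribution with 9 degrees of freedom.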