On the fifteenth day of July 2018, Jenny and Joe discussed the confidence interval for the mean of the population.

The $100(1-\alpha)\%$ confidence interval for the true mean $\mu$ is $\bar{x} \pm Z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}$.

$\bar{x}$ is the mean of a random sample of size n. The assumption is that the sample is drawn from a population with a true mean $\mu$ and true standard deviation $\sigma$.
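As a quick illustration, here is a minimal Python sketch of this interval. The numbers are made up for the example (a hypothetical sample mean of 50, $\sigma = 10$, n = 48); only the formula itself comes from the lesson.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    # (1 - alpha) confidence interval for the true mean when sigma is known:
    # xbar +/- z_{alpha/2} * sigma / sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = z_confidence_interval(xbar=50.0, sigma=10.0, n=48)
print(round(lo, 2), round(hi, 2))  # 47.17 52.83
```

The interval narrows as n grows, since the half-width shrinks like $1/\sqrt{n}$.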

The end-points are called the lower and the upper confidence limits.

While developing the confidence interval for the mean water quality at the beach with a sample of 48, Jenny pointed out that the value of the true standard deviation, $\sigma$, is not known.

Joe collected a much larger sample (all data points available for the Rockaway beach) and computed an estimate for the true standard deviation. Based on the principle of consistency, he suggested that this estimate is close to the truth, so it can be used as $\sigma$.

Jenny rightly pointed out that we will not always have such a large sample. Most often, the data is limited.

Joe’s suggestion was to use the sample standard deviation, s, i.e., the standard deviation estimated from the limited sample, in place of $\sigma$. However, Jenny was concerned that this would introduce more error into the estimation of the intervals.

March 1908

This was a concern for W. S. Gosset (aka “Student”) in 1908. He points out that one way of dealing with this difficulty is to run the experiment many times, i.e., collect a large enough sample so that the standard deviation can be computed once and for all and used for subsequent similar experiments.

He further points out that there are numerous experiments that cannot easily be repeated often. In the situation where a large sample cannot be obtained, one has to rely on the results from the small sample.

The standard deviation of the population ($\sigma$) is not known a priori.

The confidence interval Joe and Jenny derived is based on the result that the distribution of the sample mean ($\bar{x}$) tends to a normal distribution, $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$. If the sample size is large enough, it would be reasonable to substitute the sample standard deviation s in place of the population standard deviation $\sigma$. But it is not clear how “large” the sample size should be.

The sample standard deviation can be computed as $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$. While s is a perfectly good estimate of $\sigma$, it is not equal to $\sigma$.
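A small Python sketch, using hypothetical water-quality readings, confirms that the formula above matches the standard library's sample standard deviation:

```python
import math
import statistics

data = [4.2, 3.9, 5.1, 4.7, 4.0, 4.4]  # hypothetical readings for illustration
n = len(data)
xbar = sum(data) / n

# s = sqrt( (1/(n-1)) * sum((x_i - xbar)^2) )
s_manual = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
print(round(s_manual, 4))  # 0.4535, same as statistics.stdev(data)
```

Note the (n − 1) divisor; `statistics.stdev` uses the same convention.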

“Student” pointed out that when we substitute s for $\sigma$, we cannot just assume that $\frac{\bar{x}-\mu}{s/\sqrt{n}}$ will tend to a normal distribution just like $\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$ does.

He derived the frequency distribution of $\frac{\bar{x}-\mu}{s/\sqrt{n}}$ in his landmark paper “The Probable Error of a Mean.”

This distribution came to be known as the Student’s t-distribution.

In his paper, he did not use the notation t though. He referred to it as quantity z, obtained by dividing the distance between the mean of a sample and the mean of the population by the standard deviation of the sample.

He derived the distribution of the sample variance and the sample standard deviation, showed that there is no dependence between s and $\bar{x}$, and used this property to derive the joint distribution of $\frac{\bar{x}-\mu}{s/\sqrt{n}}$.

Today, we will go through the process of deriving the probability distributions for the sample variance $s^2$, the sample standard deviation s, and the quantity t.

It is important that we know these distributions. They will recur from now on, and we will use them in many applications.

I am presenting these derivations using standard techniques. There may be simpler ways to derive them, but I found this step by step thought and derivation process enriching.

During this phase, I will refer back to “Student’s” paper several times. I will also use the explanations given by R. A. Fisher in his papers on “Student.”

You can follow along these steps using pen and paper, or you can just focus on the thought process and skip the derivations. Either way, it is fun to learn from “Student.” Trust me.

To know the distribution of t, we should know the distributions of $\bar{x}$ and s.

We already know that $\bar{x}$ tends to a normal distribution, $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$; but what is the limiting distribution of s, i.e., what is the probability distribution function f(s) of the sample standard deviation?

To know this, we should first know the limiting distribution of the sample variance $s^2$, from which we can derive the distribution of s.

What is the frequency distribution of the sample variance?

Expressing variance as the sum of squares of normal random variables.

Let’s take the equation of the sample variance and see if there is a pattern in it.

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

Move the $(n-1)$ over to the left-hand side and do some algebra.

$(n-1)\,s^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2$

Let’s divide both sides of the equation by $\sigma^2$.

$\frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sigma}\right)^2$

The right-hand side now looks like the sum of squared standard normal random variables.

Sum of squares of (n − 1) standard normal random variables. (There are n terms in the sum, but because the deviations from the sample mean must add to zero, only n − 1 of them are free to vary.)
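The algebraic identity above can be verified numerically on simulated data; the parameters below are arbitrary choices for the illustration.

```python
import random
import statistics

# Check that (n-1)s^2/sigma^2 equals the sum of squared standardized deviations.
random.seed(3)
mu, sigma, n = 10.0, 2.0, 12
x = [random.gauss(mu, sigma) for _ in range(n)]

xbar = statistics.mean(x)
s2 = statistics.variance(x)  # sample variance with the (n-1) divisor
lhs = (n - 1) * s2 / sigma ** 2
rhs = sum(((xi - xbar) / sigma) ** 2 for xi in x)
print(abs(lhs - rhs) < 1e-9)  # True
```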

Does that ring a bell? Sum of squares is the language of the Chi-square distribution. We learned this in lesson 53.

If there are n standard normal random variables $z_1, z_2, \ldots, z_n$, their sum of squares $\chi^2 = \sum_{i=1}^{n} z_i^2$ is a Chi-square distribution with n degrees of freedom. Its probability density function is

$f(\chi^2) = \frac{1}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}\,(\chi^2)^{\frac{n}{2}-1}\,e^{-\frac{\chi^2}{2}}$ for $\chi^2 > 0$ and 0 otherwise.
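Here is a minimal Python sketch of this density, with two sanity checks: the density should integrate to about 1, and the sum of squares of k standard normals should have mean about k (the Chi-square mean). The choice k = 5 is arbitrary.

```python
import math
import random

def chi2_pdf(x, k):
    # Chi-square density with k degrees of freedom
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

k = 5

# numeric integration over (0, 60] -- the tail beyond 60 is negligible for k = 5
area = sum(chi2_pdf(i * 0.01, k) * 0.01 for i in range(1, 6001))

# Monte Carlo: sum of squares of k standard normals
random.seed(1)
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(20000)]
mean_est = sum(draws) / len(draws)
print(round(area, 2), round(mean_est, 1))  # area ~ 1, mean ~ k
```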

Since we have

$\frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sigma}\right)^2$

$\frac{(n-1)s^2}{\sigma^2}$ follows a Chi-square distribution with (n − 1) degrees of freedom,

with a probability distribution function

$f(\chi^2) = \frac{1}{2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)}\,(\chi^2)^{\frac{n-3}{2}}\,e^{-\frac{\chi^2}{2}}$


The frequency distribution of the sample variance.

It turns out that, with some modification, this equation is the frequency distribution of the sample variance.

The above equation can be viewed as $f(\chi^2)$, where $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$, and hence as a function of $s^2$.

These few steps will clarify this further.

Let $F$ be the cumulative distribution function of the Chi-square variable $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$, so that

$P(s^2 \le v) = P\left(\chi^2 \le \frac{(n-1)v}{\sigma^2}\right) = F\left(\frac{(n-1)v}{\sigma^2}\right)$

Applying the fundamental theorem of calculus and the chain rule together, we get,

$f(s^2) = \frac{d}{dv}\,F\left(\frac{(n-1)v}{\sigma^2}\right)\bigg|_{v=s^2} = \frac{n-1}{\sigma^2}\,f_{\chi^2}\left(\frac{(n-1)s^2}{\sigma^2}\right)$

Hence, the frequency distribution of $s^2$ is

$f(s^2) = \frac{n-1}{\sigma^2\,2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)}\left(\frac{(n-1)s^2}{\sigma^2}\right)^{\frac{n-3}{2}} e^{-\frac{(n-1)s^2}{2\sigma^2}}$
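As a sanity check, this density can be coded up and integrated numerically; n = 10 and $\sigma = 2$ below are arbitrary choices for the illustration.

```python
import math

def f_s2(v, n, sigma):
    # density of the sample variance s^2 for a normal sample of size n,
    # obtained by a change of variables from the chi-square(n-1) density
    if v <= 0:
        return 0.0
    k = n - 1
    y = k * v / sigma ** 2  # the chi-square variable
    chi2 = y ** (k / 2 - 1) * math.exp(-y / 2) / (2 ** (k / 2) * math.gamma(k / 2))
    return (k / sigma ** 2) * chi2

# the density should integrate to ~1 (upper limit 40 covers the mass here)
n, sigma, dv = 10, 2.0, 0.001
area = sum(f_s2(i * dv, n, sigma) * dv for i in range(1, 40001))
print(round(area, 2))  # ~1.0
```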

From $f(s^2)$, we can derive the probability distribution of s, i.e., $f(s)$.

WHAT IS THE FREQUENCY DISTRIBUTION OF THE SAMPLE STANDARD DEVIATION?

Here, I refer you to this elegant approach by “Student.”

“The distribution of s may be found from this ($f(s^2)$), since the frequency of s is equal to that of $s^2$ and all that we must do is to compress the base line suitably.”

Let $f(s^2)$ be the distribution of $s^2$.

Let $f(s)$ be the distribution of s.

Since the frequency of $s^2$ is equal to that of s, we can assume,

$f(s^2)\,d(s^2) = f(s)\,ds$

In other words, the probability of finding $s^2$ in an infinitesimal interval $d(s^2)$ is equal to the probability of finding s in an infinitesimal interval $ds$.

We can reduce this using the substitution $d(s^2) = 2s\,ds$ as

$f(s)\,ds = f(s^2)\cdot 2s\,ds$, or $f(s) = 2s\,f(s^2)$

The frequency distribution of s is equal to 2s multiplied by the frequency distribution of .

Hence, the frequency distribution of s is

$f(s) = \frac{2(n-1)\,s}{\sigma^2\,2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)}\left(\frac{(n-1)s^2}{\sigma^2}\right)^{\frac{n-3}{2}} e^{-\frac{(n-1)s^2}{2\sigma^2}}$
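The same numerical check works here; this sketch builds $f(s)$ as 2s times the variance density (n = 10 and $\sigma = 2$ are again arbitrary choices).

```python
import math

def f_s2(v, n, sigma):
    # density of the sample variance s^2 (change of variables from chi-square(n-1))
    if v <= 0:
        return 0.0
    k = n - 1
    y = k * v / sigma ** 2
    chi2 = y ** (k / 2 - 1) * math.exp(-y / 2) / (2 ** (k / 2) * math.gamma(k / 2))
    return (k / sigma ** 2) * chi2

def f_s(s, n, sigma):
    # density of the sample standard deviation: f(s) = 2s * f(s^2)
    return 2 * s * f_s2(s * s, n, sigma)

# the density should integrate to ~1 (upper limit 10 covers the mass here)
n, sigma, ds = 10, 2.0, 0.001
area = sum(f_s(i * ds, n, sigma) * ds for i in range(1, 10001))
print(round(area, 2))  # ~1.0
```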

WHAT IS THE FREQUENCY DISTRIBUTION OF t?

We are now ready to derive the frequency distribution of t.

Some of the following explanation can be found in R. A. Fisher’s 1939 paper. I broke it down step by step for our classroom.

For a given value of s, the frequency distribution of $\bar{x}$ is

$f(\bar{x}) = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\,e^{-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}}$

Remember, $t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$.

Substituting $\bar{x} - \mu = \frac{ts}{\sqrt{n}}$, the distribution becomes

$\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\,e^{-\frac{t^2 s^2}{2\sigma^2}}$

For a given value of s, the probability that $\bar{x}$ will be in $d\bar{x}$ is

$\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\,e^{-\frac{t^2 s^2}{2\sigma^2}}\,d\bar{x}$

Substituting $d\bar{x} = \frac{s}{\sqrt{n}}\,dt$, we can get

$\frac{s}{\sigma\sqrt{2\pi}}\,e^{-\frac{t^2 s^2}{2\sigma^2}}\,dt$

Fisher points out that to account for all values of s, we can substitute the frequency distribution of s and integrate over the interval 0 to $\infty$. This can be done because $\bar{x}$ and s are independent.
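The independence of $\bar{x}$ and s for normal samples can be illustrated with a quick simulation: across many samples, the correlation between the two is near zero. (A near-zero correlation is consistent with independence, though it does not by itself prove it.) The sample sizes below are arbitrary.

```python
import random
import statistics

random.seed(7)
N, n = 5000, 10
xbars, sds = [], []
for _ in range(N):
    sample = [random.gauss(0, 1) for _ in range(n)]
    xbars.append(statistics.mean(sample))
    sds.append(statistics.stdev(sample))

# sample correlation between the mean and the standard deviation
mx, ms = sum(xbars) / N, sum(sds) / N
cov = sum((a - mx) * (b - ms) for a, b in zip(xbars, sds)) / N
corr = cov / (statistics.pstdev(xbars) * statistics.pstdev(sds))
print(abs(corr) < 0.06)  # True: essentially uncorrelated
```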

So the joint distribution becomes

$f(t)\,dt = dt \int_{0}^{\infty} \frac{s}{\sigma\sqrt{2\pi}}\, e^{-\frac{t^2 s^2}{2\sigma^2}} \cdot \frac{2(n-1)\,s}{\sigma^2\,2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)}\left(\frac{(n-1)s^2}{\sigma^2}\right)^{\frac{n-3}{2}} e^{-\frac{(n-1)s^2}{2\sigma^2}}\,ds$

In the next few steps, I will rearrange some terms to convert the integral into a recognizable form.

Hang in there. We will need some more concentration!

Let $a = \frac{n-1+t^2}{2\sigma^2}$, so that the two exponential terms combine into $e^{-a s^2}$.

The equation becomes

$f(t) = \frac{2\,(n-1)^{\frac{n-1}{2}}}{\sqrt{2\pi}\;2^{\frac{n-1}{2}}\;\Gamma\left(\frac{n-1}{2}\right)\;\sigma^{n}} \int_{0}^{\infty} s^{n-1}\, e^{-a s^2}\,ds$

Some more substitutions.

Let $u = a s^2$.

Then $du = 2as\,ds$, i.e., $ds = \frac{du}{2\sqrt{au}}$, and $s = \left(\frac{u}{a}\right)^{\frac{1}{2}}$.

The equation can be reduced as

$f(t) = \frac{2\,(n-1)^{\frac{n-1}{2}}}{\sqrt{2\pi}\;2^{\frac{n-1}{2}}\;\Gamma\left(\frac{n-1}{2}\right)\;\sigma^{n}} \int_{0}^{\infty} \left(\frac{u}{a}\right)^{\frac{n-1}{2}} e^{-u}\,\frac{du}{2\sqrt{au}}$

Since $\left(\frac{u}{a}\right)^{\frac{n-1}{2}}\frac{1}{2\sqrt{au}} = \frac{u^{\frac{n}{2}-1}}{2\,a^{\frac{n}{2}}}$,

The equation becomes,

$f(t) = \frac{(n-1)^{\frac{n-1}{2}}}{\sqrt{2\pi}\;2^{\frac{n-1}{2}}\;\Gamma\left(\frac{n-1}{2}\right)\;\sigma^{n}\;a^{\frac{n}{2}}} \int_{0}^{\infty} u^{\frac{n}{2}-1}\, e^{-u}\,du$

Replacing $a = \frac{n-1+t^2}{2\sigma^2}$,

The equation becomes

$f(t) = \frac{(n-1)^{\frac{n-1}{2}}\,(2\sigma^2)^{\frac{n}{2}}}{\sqrt{2\pi}\;2^{\frac{n-1}{2}}\;\Gamma\left(\frac{n-1}{2}\right)\;\sigma^{n}\;(n-1+t^2)^{\frac{n}{2}}} \int_{0}^{\infty} u^{\frac{n}{2}-1}\, e^{-u}\,du$

Some more reduction: since $\frac{(2\sigma^2)^{\frac{n}{2}}}{\sqrt{2\pi}\,2^{\frac{n-1}{2}}\,\sigma^{n}} = \frac{1}{\sqrt{\pi}}$, we get

$f(t) = \frac{(n-1)^{\frac{n-1}{2}}}{\sqrt{\pi}\;\Gamma\left(\frac{n-1}{2}\right)\;(n-1+t^2)^{\frac{n}{2}}} \int_{0}^{\infty} u^{\frac{n}{2}-1}\, e^{-u}\,du$

The integral you see on the right is a Gamma function, $\int_{0}^{\infty} u^{\frac{n}{2}-1}\,e^{-u}\,du = \Gamma\left(\frac{n}{2}\right)$, which is equal to $\left(\frac{n}{2}-1\right)!$ when $\frac{n}{2}$ is a positive integer.

There we go, the t-distribution emerges

The probability distribution of t is

$f(t) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{(n-1)\pi}\;\Gamma\left(\frac{n-1}{2}\right)}\left(1 + \frac{t^2}{n-1}\right)^{-\frac{n}{2}}$

It is known as the t-distribution with (n − 1) degrees of freedom. As you can see, the function contains only n as a parameter. The probability of t within any limits is fully known if we know n, the sample size of the experiment.
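The final density can be coded directly. This sketch checks numerically that it is a proper density (it integrates to about 1) and that its value at t = 0 matches a hand computation; n = 6 is an arbitrary choice. `lgamma` is used instead of `gamma` to stay safe for large n.

```python
import math

def t_pdf(t, n):
    # Student's t density with (n - 1) degrees of freedom, written in terms of n
    k = n - 1
    c = math.exp(math.lgamma(n / 2) - math.lgamma(k / 2)) / math.sqrt(k * math.pi)
    return c * (1 + t * t / k) ** (-n / 2)

# numeric integration over [-30, 30]; the tails beyond are negligible
n, dt = 6, 0.001
area = sum(t_pdf(-30 + i * dt, n) * dt for i in range(60001))
print(round(area, 3))
```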

“Student” also derived the moments of this new distribution: the mean is $E[t] = 0$, and the variance is $V[t] = \frac{n-1}{n-3}$ (defined for $n > 3$).
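As a quick numerical check of the variance formula, we can integrate $t^2 f(t)$ over a wide range; with n = 8 (an arbitrary choice) the theory gives $\frac{n-1}{n-3} = \frac{7}{5} = 1.4$.

```python
import math

def t_pdf(t, n):
    # Student's t density with (n - 1) degrees of freedom
    k = n - 1
    c = math.exp(math.lgamma(n / 2) - math.lgamma(k / 2)) / math.sqrt(k * math.pi)
    return c * (1 + t * t / k) ** (-n / 2)

# second moment by numeric integration (the mean is 0 by symmetry)
n, dt = 8, 0.001
var = 0.0
x = -50.0
while x < 50.0:
    var += x * x * t_pdf(x, n) * dt
    x += dt
print(round(var, 2))  # ~1.4
```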

The function is symmetric and resembles the standard normal distribution Z.

The t-distribution has heavier tails, i.e., it has more probability in the tails than the normal distribution. As the sample size increases (i.e., as $n \to \infty$), the t-distribution approaches Z.
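Both points can be verified numerically; this sketch compares the t density against the standard normal density at t = 3, for a small n and a large n (the specific values are arbitrary).

```python
import math
from statistics import NormalDist

def t_pdf(t, n):
    # Student's t density with (n - 1) degrees of freedom
    k = n - 1
    c = math.exp(math.lgamma(n / 2) - math.lgamma(k / 2)) / math.sqrt(k * math.pi)
    return c * (1 + t * t / k) ** (-n / 2)

z_pdf = NormalDist().pdf
print(t_pdf(3.0, 6) > z_pdf(3.0))                 # True: heavier tail for n = 6
print(abs(t_pdf(3.0, 1000) - z_pdf(3.0)) < 1e-3)  # True: close to Z for large n
```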

You can look at the equation and check how it converges to Z in the limit when $n \to \infty$.

These points are illustrated in this animation.

I am showing the standard Z in red. The grey functions are the t-distributions for different values of n. For small sample sizes, the t-distribution has fatter tails; as the sample size increases, there is little difference between t and Z.

😫😫😫

I am pretty sure you do not have the energy to go any further this week. Neither do I.

So here is where we stand.

The frequency distribution of the sample variance:

$f(s^2) = \frac{n-1}{\sigma^2\,2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)}\left(\frac{(n-1)s^2}{\sigma^2}\right)^{\frac{n-3}{2}} e^{-\frac{(n-1)s^2}{2\sigma^2}}$

The frequency distribution of the sample standard deviation:

$f(s) = \frac{2(n-1)\,s}{\sigma^2\,2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)}\left(\frac{(n-1)s^2}{\sigma^2}\right)^{\frac{n-3}{2}} e^{-\frac{(n-1)s^2}{2\sigma^2}}$

and finally, the probability distribution of t:

$f(t) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{(n-1)\pi}\;\Gamma\left(\frac{n-1}{2}\right)}\left(1 + \frac{t^2}{n-1}\right)^{-\frac{n}{2}}$

See you next week.

If you find this useful, please like, share and subscribe.

You can also follow me on Twitter @realDevineni for updates on new lessons.