On Proportion

Our journey to the abyss of hypothesis tests begins today. In lesson 85, Joe and Devine, in their casual conversation about the success rate of a memory-boosting mocha, introduced us to the elements of hypothesis testing. Their conversation presented a statistical hypothesis test on proportion — whether the percentage of people who would benefit from the memory-booster coffee is higher than the percentage who would claim benefit randomly.

In this lesson, using a similar example on proportion, we will dig deeper into the elements of hypothesis testing.

To reiterate the central concept, we wish to test our assumption (null hypothesis $H_0$) against an alternate assumption (alternate hypothesis $H_A$). The purpose of a hypothesis test, then, is to verify whether empirical data supports the rejection of the null hypothesis.

Let’s assume that there is a vacation planner company in Palm Beach, Florida. They are finishing up their new Paradise Resorts and advertise that its main attraction is five out of seven bright and sunny days!

.

.

.

I know what you are thinking. It’s Florida, and five out of seven bright and sunny? What about the muggy thunderstorms and summer hurricanes?

Let’s keep that skepticism and consider their claim as a proposition, a hypothesis.

Since they claim that their resorts will have five out of seven bright and sunny days, we can assume a null hypothesis ($H_0$) that $p = 5/7$. We can pit this against an alternate hypothesis ($H_A$) that $p < 5/7$ and use observational (experimental or empirical) data to verify whether $H_0$ can be rejected.

We can go down to Palm Beach and observe the weather for a few days. Or, we may have been to Palm Beach often enough that we can bring that empirical data out of our memory. Suppose we observe or remember that seven out of 15 days were bright and sunny.

With this information, we are ready to investigate Paradise Resorts’ claim.

Let’s refresh our memory on the essential steps for any hypothesis test.

1. Choose the appropriate test; one-sample or two-sample and parametric or nonparametric.



2. Establish the null and alternate hypothesis.



3. Decide on an acceptable rate of error or rejection rate ($\alpha$).



4. Compute the test statistic and its corresponding p-value from the observed data.



5. Make the decision; Reject the null hypothesis if the p-value is less than the acceptable rate of error, $\alpha$.

Choose the appropriate test; one-sample or two-sample and parametric or nonparametric



We are verifying a statement about the parameter (proportion, p) of the population: whether or not $p = 5/7$. So it is a one-sample hypothesis test. Since we are testing for proportion, we can assume a binomial distribution to derive the probabilities. So it is a parametric hypothesis test.

Establish the null and alternate hypothesis



Paradise Resorts’ claim is the null hypothesis — five out of seven bright and sunny days. The alternate hypothesis is that the proportion is less than what they claim.

We are considering a one-sided alternative since departures in one direction (less than) are sufficient to reject the null hypothesis.

Decide on an acceptable rate of error or rejection rate

Our decision on the acceptable rate of rejection is the risk we take for rejecting the truth. If we select 10% for $\alpha$, it implies that we will reject the null hypothesis 10% of the time when it is in fact true. If the null hypothesis is true, by rejecting it, we are committing an error: a Type I error.

A simple thought exercise will make this concept clearer. Suppose Paradise Resorts’ claim is true, i.e., the proportion of bright and sunny days is $p = 5/7$. But our observation provided a sample out of the population where we ended up seeing very few bright and sunny days. In this case, we have to reject the null hypothesis, and we have committed an error in our decision. By selecting $\alpha$, we are choosing the acceptable rate of error. We are accepting that we might reject the null hypothesis (when it is true) a fraction $\alpha$ of the time.

The next step is to create the null distribution.

At the beginning of the test, we agreed that we observed seven out of 15 days to be bright and sunny. We collected a sample of 15 days out of which seven days were bright and sunny. The null distribution is the probability distribution of observing any number of days being bright and sunny, i.e., out of the 15 days, we could have had 0, 1, 2, 3, …, 14, 15 days to be bright and sunny. The null distribution is the distribution of the probability of observing these outcomes. In a Binomial null distribution with n=15 and p = 5/7, what is the probability of getting 0, 1, 2, …, 15?
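If you want to reproduce these probabilities yourself, here is a minimal Python sketch. It uses only the standard library; the function name `binomial_pmf` and the variable names are my own choices for illustration, not something the lesson prescribes.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 15      # number of days in the sample
p0 = 5 / 7  # proportion claimed under the null hypothesis

# Null distribution: P(X = k) for k = 0, 1, ..., 15 bright and sunny days.
null_distribution = [binomial_pmf(k, n, p0) for k in range(n + 1)]

for k, prob in enumerate(null_distribution):
    print(f"{k:2d} bright and sunny days: {prob:.4f}")
```

The sixteen probabilities add up to one, and most of the mass sits around 10 to 12 days, which is what we would expect if the claim of five in seven were true.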

.

.

.



It will look like this.

On this null distribution, you also see the region of rejection as defined by the selected rejection rate $\alpha$. Here, $\alpha = 0.1$. In this null distribution, the quantile corresponding to $\alpha = 0.1$ is 8 days. Hence, if we observe more than eight bright and sunny days, we are not in the rejection region, and if we observe eight or fewer bright and sunny days, we are in the rejection region.
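The cutoff of 8 days can be checked with a short sketch. I am assuming a rejection rate of 0.10 (the 10% used in this lesson) and defining the quantile as the smallest count whose cumulative probability reaches that rate; the helper names are hypothetical.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def lower_quantile(alpha, n, p):
    """Smallest k such that P(X <= k) >= alpha."""
    for k in range(n + 1):
        if binomial_cdf(k, n, p) >= alpha:
            return k
    return n


n, p0, alpha = 15, 5 / 7, 0.10  # alpha = 0.10 is the assumed rejection rate

cutoff = lower_quantile(alpha, n, p0)
print("cutoff:", cutoff)
print(f"P(X <= {cutoff}) = {binomial_cdf(cutoff, n, p0):.3f}")
print(f"P(X <= {cutoff - 1}) = {binomial_cdf(cutoff - 1, n, p0):.3f}")
```

Because the distribution is discrete, the probability of eight or fewer days is about 0.106, a little above the nominal 10%, while seven or fewer has a probability of about 0.038.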

Compute the test statistic and its corresponding p-value from the observed data

Next, the question we ask is this.

In a Binomial null distribution with n = 15 and p = 5/7, what is the probability of getting a value as small as 7, i.e., seven or fewer bright and sunny days? If this probability is sufficiently low, we cannot say that the observation occurred by chance.

This probability is called the p-value. It is the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one computed from the data. The smaller the p-value, the less likely the observed statistic is under the null hypothesis, and the stronger the evidence for rejecting the null.
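Written out for this example, with $X$ (a symbol I am introducing here) denoting the number of bright and sunny days in the sample of 15, the lower-tail p-value works out as:

$$
\text{p-value} = P(X \le 7 \mid n = 15,\ p = 5/7) = \sum_{k=0}^{7} \binom{15}{k} \left(\frac{5}{7}\right)^{k} \left(\frac{2}{7}\right)^{15-k} \approx 0.038
$$

The sum runs from 0 to 7 because the alternate hypothesis is one-sided (less than), so only small counts are evidence against the claim.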

You can see this probability in the figure below. The grey shade within the pink shade is the p-value.

Make the decision; Reject the null hypothesis if the p-value is less than the acceptable rate of error

It is evident at this point. Since the p-value (0.04) is less than our selected rate of error (0.1), we reject the null hypothesis, i.e., we reject Paradise Resorts’ claim that there will be five out of seven bright and sunny days.
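Here is a minimal sketch of the last two steps put together. The function name is hypothetical, and I have set the rejection rate to 0.10, the 10% discussed in this lesson; the decision would be the same for any rate above roughly 0.038.

```python
from math import comb


def lower_tail_p_value(observed, n, p0):
    """P(X <= observed) when X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(observed + 1)
    )


n = 15        # days in the sample
observed = 7  # bright and sunny days seen
p0 = 5 / 7    # proportion claimed under the null hypothesis
alpha = 0.10  # assumed acceptable rate of error

p_value = lower_tail_p_value(observed, n, p0)
reject_null = p_value < alpha

print(f"p-value = {p_value:.3f}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```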

This decision is based on the assumption that the null hypothesis is correct. Under this assumption, since we selected $\alpha = 0.1$, we will reject the true null hypothesis 10% of the time. At the same time, we will fail to reject the null hypothesis 90% of the time. In other words, 90% of the time, our decision to not reject the null hypothesis will be correct.
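One way to convince yourself of these percentages is to simulate many 15-day samples from a world where the claim is true and count how often we would reject it. This is only a sketch: the number of trials, the seed, and the function name are arbitrary choices of mine, and the rule "reject at eight or fewer days" is the cutoff used in this lesson.

```python
import random


def simulate_rejection_rate(true_p, n=15, cutoff=8, trials=20_000, seed=1):
    """Fraction of simulated samples that land in the rejection region."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Each day is bright and sunny with probability true_p.
        sunny_days = sum(rng.random() < true_p for _ in range(n))
        if sunny_days <= cutoff:
            rejections += 1
    return rejections / trials


# The null hypothesis is true here, so every rejection is a Type I error.
type_one_rate = simulate_rejection_rate(true_p=5 / 7)
print(f"Simulated Type I error rate: {type_one_rate:.3f}")
```

The simulated rate should land close to 10% (the exact value for this cutoff is about 0.106, since a discrete distribution cannot hit 10% exactly).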

Now, suppose Paradise Resorts’ hypothesis is false, i.e., they mistakenly think that there are five out of seven bright and sunny days. However, it is not five in seven, but four in seven. What would be the consequence of their false null hypothesis?

.

.

.

Let’s think this through again.

Our testing framework is based on the assumption that $H_0: p = 5/7$ is true.

For this test, we select $\alpha = 0.1$ and make decisions based on the observed outcomes.

Accordingly, if we observe eight or fewer bright and sunny days, we will reject the hypothesis, and if we see more than eight bright and sunny days, we will fail to reject the null hypothesis. Based on $\alpha = 0.1$ and the assumed hypothesis that $p = 5/7$, we fix eight as our cutoff point.

Paradise also thinks that $p = 5/7$. If they are under a false assumption and we tested it based on this assumption, we might also commit an error: not rejecting the null hypothesis when it is false. This error is called a Type II error, or a lack of power in the test.

Look at this image. It has the null distribution under our original assumption ($p = 5/7$), the selected $\alpha$, and its corresponding quantile (8 days). In the same image, we also see the null distribution if $p = 4/7$. On this null distribution, there is a grey shaded region, which is the probability of not rejecting it based on $\alpha$ and the quantile of 8 days. We assign the symbol $\beta$ for this probability.

What is more interesting is its complement, $1 - \beta$, which is the probability of rejecting the null hypothesis when it is false. Based on our original assumption (which is false), we selected eight days or fewer as our rejection region. At this cutoff, if there was another null distribution, $1 - \beta$ is the probability of rejecting it. The key is the choice of $\alpha$ or its corresponding quantile. At a chosen $\alpha$, $1 - \beta$ measures the ability of the test to reject a false hypothesis. $1 - \beta$ is called the power of the test.

In this example, if the original hypothesis is true, i.e., if $H_0$ is true, we will reject it 10% of the time and will not reject it 90% of the time. However, if the hypothesis is false (and $p = 4/7$), we will reject it 48% of the time and will not reject it 52% of the time.
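These percentages can be verified with a small sketch. It assumes the rejection rule from this lesson (reject at eight or fewer bright and sunny days out of 15); the variable names are mine.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, cutoff = 15, 8  # reject when we see eight or fewer bright and sunny days

# If the claim is true (p = 5/7), a rejection is a Type I error.
type_one = binomial_cdf(cutoff, n, 5 / 7)

# If the truth is p = 4/7, a rejection is the correct call (power),
# and a failure to reject is a Type II error.
power = binomial_cdf(cutoff, n, 4 / 7)
type_two = 1 - power

print(f"P(reject | p = 5/7) = {type_one:.3f}")
print(f"P(reject | p = 4/7) = {power:.3f}")
print(f"P(not reject | p = 4/7) = {type_two:.3f}")
```

The three numbers come out at roughly 0.106, 0.480, and 0.520, which match the 10%, 48%, and 52% quoted above after rounding.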

For smaller p, the power of the test increases. In other words, if the proportion of bright and sunny days is smaller compared to the original assumption of 5/7, the probability of rejecting it increases.
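To see this numerically, we can keep the same rejection rule (eight or fewer days out of 15) and evaluate the probability of rejection for a few candidate values of the true proportion. The candidate values below are illustrative picks of mine, not values from the lesson.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, cutoff = 15, 8

# Hypothetical true proportions, from "claim is true" to "claim is badly off".
candidate_ps = [5 / 7, 4 / 7, 3 / 7, 2 / 7]

powers = [binomial_cdf(cutoff, n, p) for p in candidate_ps]

for p, power in zip(candidate_ps, powers):
    print(f"true p = {p:.3f} -> probability of rejecting the claim = {power:.3f}")
```

The probability of rejection climbs steadily as the true proportion moves further below 5/7.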

Keep in mind that we will not know the actual value of p.

The thinking is that as the difference becomes larger, the original hypothesis becomes more and more false, and power ($1 - \beta$) measures the probability of rejecting this false hypothesis given our choice of $\alpha$.

Look at this summary table. It provides a summary of our discussion of the error framework.
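For reference, the four outcomes can be laid out in the usual two-by-two form. This layout is my own sketch of the framework, built only from the quantities defined in this lesson ($\alpha$, $\beta$, and the power $1 - \beta$):

$$
\begin{array}{l|c|c}
 & H_0 \text{ is true} & H_0 \text{ is false} \\ \hline
\text{Reject } H_0 & \text{Type I error } (\alpha) & \text{Correct decision, power } (1 - \beta) \\ \hline
\text{Do not reject } H_0 & \text{Correct decision } (1 - \alpha) & \text{Type II error } (\beta)
\end{array}
$$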

Type I and Type II errors are inversely related.

If we decrease $\alpha$, and if the null hypothesis is false, the probability of not rejecting it ($\beta$) will increase.

You can intuitively see that from the image that has the original (false) null distribution and possible true null distribution. If we move the quantile to the left (lower the rejection rate $\alpha$), the grey shaded region (the probability of not rejecting a false null hypothesis, $\beta$) increases.
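The same trade-off shows up in numbers if we slide the cutoff to the left and recompute both error rates. This sketch assumes the original claim of 5/7 and a true proportion of 4/7, as in the example above; the range of cutoffs is an arbitrary choice for illustration.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n = 15
p_null, p_true = 5 / 7, 4 / 7  # claimed proportion and the assumed truth

rows = []
for cutoff in (9, 8, 7, 6):  # moving the cutoff to the left
    type_one = binomial_cdf(cutoff, n, p_null)      # reject a true claim
    type_two = 1 - binomial_cdf(cutoff, n, p_true)  # keep a false claim
    rows.append((cutoff, type_one, type_two))
    print(f"cutoff = {cutoff}: Type I = {type_one:.3f}, Type II = {type_two:.3f}")
```

Each step to the left shrinks the Type I error and inflates the Type II error.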

At this point, you must be wondering whether all of this holds only for a sample of 15 days. What if we had more or fewer samples from the population?

The easiest way to understand the effect of sample size is to run the analysis for different n and different falsities (i.e., the difference from original p) and visualize it.

Here is one such analysis for three different sample sizes. The cutoff that the $\alpha$ level fixes under the original hypothesis also varies by the sample size.

What we are seeing is the power function on the y-axis and the degree of falsity on the x-axis.

A higher degree of falsity implies that the null hypothesis is false by a greater magnitude. The first point on the x-axis is the fact that the null hypothesis is true. You can see that at this point, the power, i.e., the probability of rejecting the hypothesis, is 10%. At this point, we are just looking at $\alpha$, the Type I error. As the degree of falsity increases, for that $\alpha$ level, the power (i.e., the probability of rejecting a false hypothesis) increases.

For a smaller sample size, the power increases slowly. For larger sample sizes, the power increases rapidly.
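A rough sketch of this kind of analysis is below. The sample sizes (15, 50, and 100) and the list of true proportions are hypothetical choices of mine for illustration. For each sample size, the cutoff is taken as the smallest count whose cumulative probability under the claim of 5/7 reaches a rejection rate of 0.10.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def lower_quantile(alpha, n, p):
    """Smallest k such that P(X <= k) >= alpha."""
    total = 0.0
    for k in range(n + 1):
        total += comb(n, k) * p**k * (1 - p) ** (n - k)
        if total >= alpha:
            return k
    return n


p0, alpha = 5 / 7, 0.10
true_ps = [5 / 7, 4 / 7, 3 / 7]  # increasing degree of falsity (hypothetical)
sample_sizes = [15, 50, 100]     # hypothetical sample sizes

cutoffs = {}
power_table = {}
for n in sample_sizes:
    cutoffs[n] = lower_quantile(alpha, n, p0)
    power_table[n] = [binomial_cdf(cutoffs[n], n, p) for p in true_ps]
    formatted = ", ".join(f"{power:.3f}" for power in power_table[n])
    print(f"n = {n:3d}, cutoff = {cutoffs[n]:2d}, power at each true p: {formatted}")
```

Reading across a row shows the power rising with the degree of falsity; reading down a column (for a false hypothesis) shows it rising with the sample size.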

Of course, selecting the optimal sample size for the experiment based on low Type I and Type II errors is doable.

I am sure there are plenty of concepts here that will need some time to process, especially Type I and Type II errors. This week, we focused our energy on the hypothesis test for proportion. The next time we meet, we will unfold the knots of the hypothesis test on the mean.

Till then, happy learning.

If you are still unsure about Type I and Type II errors, this analogy will help.

If the null hypothesis for a judicial system is that the defendant is innocent, Type I error occurs when the jury convicts an innocent person; Type II error occurs when the jury sets a guilty person free.

You can also follow me on Twitter @realDevineni for updates on new lessons.