What is ANOVA and when to use it

This article explains the concept of ANOVA and its different types in a simple way, with examples. You will also learn how to run ANOVA in Minitab and interpret the output. ANOVA is an important and versatile technique for analyzing relationships in data. It is used when X is categorical and Y is continuous.

Definition : ANOVA (analysis of variance) is an analysis of the variation present in an experiment. It is used to examine differences in the mean values of the dependent variable associated with the effect of the independent variables. Essentially, ANOVA is a test of means for two or more populations.

The tests in an ANOVA are based on the F-ratio: the variation due to an experimental treatment or effect divided by the variation due to experimental error.

Before we move ahead, we need to understand the following four terms clearly:

Dependent Variable – Analysis of variance must have a dependent variable that is continuous. In our example this is “Y”, total sales; its value depends on the levels of “X” (or the “Xs”) in our experiment or analysis.

Independent Variable – ANOVA must have one or more categorical independent variables, such as Sales promotion. These variables are also called Factors.

Null hypothesis – All means are equal.

Factor level – Each Factor can have multiple levels; for example, Heavy, Medium and Low are three levels of Sales promotion.

Different forms of ANOVA

There are three types of ANOVA, chosen based on the number and type of independent variables (Xs). The dependent variable (Y) always remains continuous.

Fig 1 explains the types of ANOVA with an example. In this example, “Y” is the total sales of a general store in $, a continuous variable that is common to all three examples.

Eta square : The strength of the effects of X on Y is measured by Eta square. The value of Eta square varies between 0 and 1.

F Statistic : The null hypothesis that the category means are equal in the population is tested by an F statistic based on the ratio of mean square related to X and mean square related to error.

Mean square : The mean square is the sum of squares divided by the appropriate degrees of freedom.

SS(between/x) : This is the variation in Y related to the variation in the means of the categories of X. It represents the variation between the categories of X, or the portion of the sum of squares in Y related to X.

SS(within/error) : Also referred to as SS(error), this is the variation in Y due to the variation within each of the categories of X. This variation is not accounted for by X.

SS(y) : The total variation in Y.

The total variation in Y, denoted by SSy can be decomposed into two components:

SSy = SSx + SSerror
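The decomposition can be checked numerically. The following is a minimal Python sketch, assuming made-up sales figures for the three promotion levels (the numbers and the `sum_of_squares` helper are hypothetical, chosen only for illustration). It computes each sum of squares from its definition so the three values can be compared.

```python
from statistics import mean

# Hypothetical sales figures per promotion level (made up for illustration).
sales = {
    "Heavy": [9, 10, 8, 9],
    "Medium": [6, 7, 5, 6],
    "Low": [3, 4, 2, 3],
}


def sum_of_squares(groups):
    """Return (SSx, SSerror, SSy) for a dict mapping category -> list of Y values."""
    all_values = [y for values in groups.values() for y in values]
    grand_mean = mean(all_values)

    # Between-category variation: category means around the grand mean,
    # weighted by the number of observations in each category.
    ss_x = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())

    # Within-category variation: each observation around its own category mean.
    ss_error = sum((y - mean(v)) ** 2 for v in groups.values() for y in v)

    # Total variation: each observation around the grand mean.
    ss_y = sum((y - grand_mean) ** 2 for y in all_values)
    return ss_x, ss_error, ss_y


ss_x, ss_error, ss_y = sum_of_squares(sales)
print("SSx =", ss_x, "SSerror =", ss_error, "SSy =", ss_y)
```

For these illustrative numbers, SSx is 72, SSerror is 6 and SSy is 78, so the between and within components add up to the total.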

Test the Significance in ANOVA

In one-way ANOVA, the interest lies in testing the null hypothesis that the category means are equal in the population.

H0 : Mean1 = Mean2 = Mean3 = … = Meank, where k is the number of categories of X

Under the null hypothesis, SSx and SSerror come from the same source of variation. In that case, the population variance of Y can be estimated from either the between-category or the within-category variation.

Recall that X has multiple levels, or categories; refer back to the definitions of Factor and Factor level above if this is unclear.

The null hypothesis is tested by the F statistic based on the ratio between the two estimates of Mean square due to X(between) and Mean square due to error(within):

F = MSx/MSerror

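This statistic follows the F distribution with (k − 1) and (N − k) degrees of freedom, where k is the number of categories and N is the total sample size. The F distribution is the probability distribution of the ratio of two sample variances.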
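As a rough illustration of the ratio, here is a short Python sketch that computes F by hand. It assumes made-up sales data for three promotion levels, and the `f_statistic` helper is hypothetical. Each mean square uses the usual one-way ANOVA degrees of freedom: number of categories minus 1 for X, and total sample size minus number of categories for error. Converting F to a p-value is left to a statistics package.

```python
from statistics import mean

# Hypothetical sales figures per promotion level (made up for illustration).
sales = {
    "Heavy": [9, 10, 8, 9],
    "Medium": [6, 7, 5, 6],
    "Low": [3, 4, 2, 3],
}


def f_statistic(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [y for values in groups.values() for y in values]
    grand_mean = mean(all_values)

    ss_x = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_error = sum((y - mean(v)) ** 2 for v in groups.values() for y in v)

    df_between = len(groups) - 1               # categories minus 1
    df_within = len(all_values) - len(groups)  # observations minus categories

    ms_x = ss_x / df_between        # mean square due to X (between)
    ms_error = ss_error / df_within  # mean square due to error (within)
    return ms_x / ms_error, df_between, df_within


f_value, df_between, df_within = f_statistic(sales)
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

With these illustrative numbers, MSx is 36 and MSerror is about 0.67, giving an F of 54 on 2 and 9 degrees of freedom. A value this large relative to 1 suggests the between-category variation is far greater than the within-category variation.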

Interpretation of ANOVA test

If the null hypothesis of equal category means is not rejected, then the independent variable doesn’t have a significant effect on the dependent variable. On the other hand, if the null hypothesis is rejected, the effect of the independent variable is significant.

In that case, the mean value of the dependent variable differs across at least some categories of the independent variable X.

How to measure strength of effect of X on Y : Eta square

The effect of X on Y is measured by SSx, which is related to the variation in the means of the categories of X. The relative magnitude of SSx increases as the differences among the category means increase, and also as the variation in Y within the categories of X decreases.

The strength of the effect of X on Y is measured as:

Eta square = SSx / SSy

The value of Eta square varies between 0 and 1. It takes a value of 0 when all the category means are equal, indicating that X has no effect on Y. The higher the value of Eta square, and the closer it is to 1, the more of the variation in Y is explained by the independent variable X.
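The two ends of that range can be illustrated with a small Python sketch. The data and the `eta_squared` helper below are hypothetical, made up for illustration only: one data set has well-separated category means, the other has identical category means.

```python
from statistics import mean


def eta_squared(groups):
    """Return Eta square = SSx / SSy for a dict mapping category -> list of Y values."""
    all_values = [y for values in groups.values() for y in values]
    grand_mean = mean(all_values)
    ss_x = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_y = sum((y - grand_mean) ** 2 for y in all_values)
    return ss_x / ss_y


# Hypothetical sales with clearly different category means.
sales = {
    "Heavy": [9, 10, 8, 9],
    "Medium": [6, 7, 5, 6],
    "Low": [3, 4, 2, 3],
}

# Hypothetical data where both categories have the same mean.
no_effect = {
    "A": [1, 2, 3],
    "B": [3, 2, 1],
}

eta_sq = eta_squared(sales)
eta_sq_no_effect = eta_squared(no_effect)
print(f"Eta square (different means) = {eta_sq:.3f}")
print(f"Eta square (equal means) = {eta_sq_no_effect:.3f}")
```

In the first data set about 92% of the variation in Y is associated with X (72 / 78), while in the second the category means coincide, so SSx and Eta square are both 0.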

Objective: To test the effect of cause X on the CTQ (critical-to-quality measure) Y

Usage: When cause X is Categorical (grouped) & CTQ Y is Continuous Data

A project was undertaken to reduce the processing time.

One of the suspected causes was lack of experience.

The following data on processing time was collected for 3 levels of experience. Analyze the data and verify whether lack of experience is a cause of high processing time.

As you can see below, we have divided our staff into three categories based on their experience. Employees in their first month on the job are in the “0” month category. Employees with more than 1 month but less than 6 months of experience are grouped under the “6” month category. Employees with more than 6 months of experience are grouped under the “12” month category.

We have taken samples of the processing times of employees in each category. The samples and sample sizes are for illustration only, so the sample counts are kept low.

We are using MINITAB for the calculations, though you can use other software such as SPSS or R. The results will look broadly similar and can be interpreted in the same way.