In predictive modeling, multicollinearity statistics measure the strength of linear relationships among a set of variables. Multicollinearity is a phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one independent variable can be closely approximated by a linear combination of the other independent variables.

Moderate multicollinearity may not be problematic. Severe multicollinearity, however, is a problem because it increases the variance of the coefficient estimates and makes them very sensitive to minor changes in the model. The resulting coefficients are unstable and difficult to interpret. Hence, multicollinearity can weaken the statistical power of the regression analysis.

Practice dataset:

You can register and start with the existing courses and simulations, or upload your own data and use the built-in analytics algorithms in a point-and-click manner. Register here.

Input & Output:

To run the “Multi-Collinearity Check” function in Analyttica TreasureHunt, select the independent variables and specify the target variable. All variables must be numeric.

The function will return two tables: one for Condition Index and one for VIF of the predictor variables.

Application & Interpretation:

The Variance Inflation Factor (VIF) can be used to detect the presence of multicollinearity. It measures how much the variance of an estimated regression coefficient is inflated compared with the case in which the predictor variables are not linearly related.

It is obtained by regressing each independent variable, say X, on the remaining independent variables (say Y and Z) and checking how much of the variation in X is explained by them. If R² is the coefficient of determination of that regression, then VIF = 1 / (1 - R²).

From the formula, it is clear that the higher the R², the higher the VIF, which means that X is collinear with Y and Z. If all the variables are completely orthogonal, R² will be 0, resulting in a VIF of 1.
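The regression-based definition above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the implementation used by the “Multi-Collinearity Check” function: the helper name vif, the synthetic columns x, y and z, and the choice to include an intercept in each auxiliary regression are all assumptions made for the example.

```python
import numpy as np

def vif(data):
    """Return the VIF of each column of `data` (rows = observations).

    Hypothetical helper: each column is regressed on the remaining
    columns (plus an intercept) and VIF = 1 / (1 - R^2) is returned.
    """
    data = np.asarray(data, dtype=float)
    n_rows, n_cols = data.shape
    result = np.empty(n_cols)
    for j in range(n_cols):
        target = data[:, j]
        others = np.delete(data, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])
        # Ordinary least squares fit of the j-th variable on the others
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = residual @ residual
        ss_tot = ((target - target.mean()) ** 2).sum()
        r_squared = 1.0 - ss_res / ss_tot
        result[j] = 1.0 / (1.0 - r_squared)
    return result

# Synthetic data (assumption): z is almost exactly x + y,
# so all three variables take part in a near-linear dependency.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.normal(size=500)
z = x + y + 0.1 * rng.normal(size=500)

vifs = vif(np.column_stack([x, y, z]))
print(vifs)
```

Because z is nearly a linear combination of x and y, the auxiliary regressions have R² close to 1 and the VIFs come out far above 1, whereas perfectly orthogonal columns would give a VIF of exactly 1.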

Condition Index (CI) is another measure used to check the presence of multicollinearity.

CI = square root of (largest eigenvalue / individual eigenvalue).

The higher the condition index, the stronger the multicollinearity.
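As a rough sketch of the condition index formula, the snippet below computes one index per eigenvalue. The text does not say which matrix the eigenvalues come from, so using the correlation matrix of the predictors is an assumption here (another common choice is the scaled cross-product matrix of the predictors); the helper name condition_indices and the synthetic data are also illustrative only.

```python
import numpy as np

def condition_indices(data):
    """Return sqrt(largest eigenvalue / each eigenvalue).

    Assumption: eigenvalues are taken from the correlation matrix of
    the predictors; the tool described in the text may use a
    different matrix.
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    # eigvalsh is meant for symmetric matrices and returns real
    # eigenvalues in ascending order.
    eigenvalues = np.linalg.eigvalsh(corr)
    # Note: with exactly collinear columns the smallest eigenvalue is
    # zero (or slightly negative from rounding), so the ratio blows up.
    return np.sqrt(eigenvalues.max() / eigenvalues)

# Synthetic data (assumption): z is almost exactly x + y.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.normal(size=500)
z = x + y + 0.1 * rng.normal(size=500)

ci = condition_indices(np.column_stack([x, y, z]))
print(ci)
```

The index belonging to the largest eigenvalue is always 1; a near-linear dependency shows up as a very small eigenvalue and therefore a large condition index.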

Some important points to note when detecting multicollinearity:

a) Correlation between two variables does not necessarily imply high multicollinearity.

b) However, very high correlation does imply multicollinearity.

c) A correlation check on the raw variables helps identify highly correlated pairs, so that dropping or combining them can significantly reduce multicollinearity among the variables.
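A correlation check of the kind mentioned in the points above might look like the following sketch. The helper name high_correlation_pairs, the column names a, b and c, the small data set, and the 0.8 cutoff are all assumptions chosen for illustration; the text does not prescribe a specific threshold.

```python
import numpy as np

def high_correlation_pairs(data, names, threshold=0.8):
    """List variable pairs whose absolute Pearson correlation is at
    least `threshold` (hypothetical helper; the 0.8 default is an
    illustrative cutoff, not a rule from the text).
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Small made-up data set: b closely tracks a, c is unrelated to both.
data = np.array([
    [1.0, 1.1, 1.0],
    [2.0, 1.9, -1.0],
    [3.0, 3.2, 0.0],
    [4.0, 3.9, -1.0],
    [5.0, 5.0, 1.0],
])

pairs = high_correlation_pairs(data, ["a", "b", "c"])
print(pairs)
```

Only the pair (a, b) is flagged here. A flagged pair is a candidate for dropping or combining one of its variables, after which VIF and the condition index can be rechecked.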

See Also:

Variance Inflation Factor, Condition Index.