In my last post, “Coefficients are not the same as variable influence”, I argued that coefficients in a linear regression model are useful but limited in answering the question, “which variables are most influential in model predictions?” One manifestation of the difference is that a variable with a relatively small coefficient, and therefore relatively small influence on predictions on average, may still have significant influence on predictions within sub-ranges of its values, and may sometimes even become the most important variable within those sub-ranges. This effect can occur when the input variables do not comply with the assumptions of the algorithm, most notably with linear regression models.

In this post, I’ll take this one step further and show another way to estimate the influence of a variable on a predictive model, one that does not require decomposing the predictions into terms as I did last week, an approach that assumed a linear regression model. The method described this week is based on randomization experiments, which estimate influence without having to assume or understand the distributions that the input variables approximate.
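To make the idea of a randomization experiment concrete, here is a minimal sketch in Python. It assumes one common form of the experiment: shuffle a single input column, re-score the data with the already-fitted model, and measure how much the predictions move. The function name `permutation_influence`, the synthetic data, and the choice of mean absolute change as the influence measure are all illustrative assumptions, not necessarily the exact procedure used in this post.

```python
import numpy as np

def permutation_influence(predict, X, n_repeats=20, seed=0):
    """Hypothetical helper: mean absolute change in predictions
    when one input column at a time is randomly shuffled."""
    rng = np.random.default_rng(seed)
    baseline = predict(X)
    influence = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        changes = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Shuffling breaks the link between column j and the prediction
            # while leaving that column's own distribution unchanged.
            X_shuffled[:, j] = rng.permutation(X[:, j])
            changes.append(np.mean(np.abs(predict(X_shuffled) - baseline)))
        influence[j] = np.mean(changes)
    return influence

# Synthetic data (an assumption for illustration): the target depends
# strongly on column 0, weakly on column 1, and not at all on column 2.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Fit ordinary least squares with an intercept term.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(X_new):
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

scores = permutation_influence(predict, X)
print(scores)  # one influence score per input column
```

The sketch only needs a `predict` function, so the same experiment could in principle be run against any fitted model, not just a linear regression. Nothing in it refers to coefficients or to the shape of the input distributions.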