Interpretable Machine Learning Part 1 David Josephs

In this blog post, we are going to talk about a few tools for interpretable machine learning. I believe the importance of the subject is rather clear: we can use it not only to help mitigate bias and avoid deploying biased models, but also to help us remain relevant should autoML take off in the next 10 years. We will go over the tools in order of (in my opinion) increasing utility, starting with simple permutation importance, then PDPs and ICE plots, and finally ALE.

Permutation Importance: Math and Intuition

Everyone loves tree-based models. Gradient boosting, random forests, and friends are wonderful, flexible tools. One side benefit of these models, because of their tree-ness, is that we can actually see how "important" each variable is in the decisions the model is making. This is also one of the many reasons why we love linear models: we can see and quantify the strength of each feature in our model. But why limit ourselves to just linear and tree-based models?
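To make this concrete, here is a quick sketch (assuming scikit-learn and a toy synthetic dataset) of the built-in importances these two model families expose: tree ensembles provide `feature_importances_`, and linear models provide `coef_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Toy data: feature 0 is strong, feature 1 is weak, feature 2 is pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(random_state=0).fit(X, y)
linear = LinearRegression().fit(X, y)

print(forest.feature_importances_)  # impurity-based importance per feature
print(linear.coef_)                 # signed strength of each feature
```

Both attributes are model-specific, though: the question below is how to get something similar for *any* model.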

Let's try to think of a new approach to get variable importance. When I was first really getting into ML, I remember asking one of my professors: "How much time do you spend on feature engineering?". I will never forget his answer. He told me: "Feature engineering [is] the most crucial part to improve both accuracy and model generation. If you have an unnecessary feature in the model, you are in essence fitting noise.". This has stuck with me for a long time, and it is a useful thing to keep in mind while discussing permutation importance.

If unnecessary features just provide noise, which decreases model accuracy and generalization, what happens if we replace a good feature with noise? Our model should be inherently worse, no? This leads us to the key question:

If I replace a feature with noise, how much worse does the model perform?

This is the key idea of permutation-based variable importance. All we need to calculate it is four simple steps:

1. Calculate prediction loss
2. Replace a feature with noise
3. Recalculate prediction loss
4. Compare
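The steps above can be sketched from scratch in a few lines. This is a minimal illustration, not a production implementation: it assumes a fitted model with a `.predict` method, uses mean squared error as the loss, and "replaces a feature with noise" by shuffling that column (the usual permutation trick, which preserves the feature's marginal distribution).

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Increase in MSE when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((model.predict(X) - y) ** 2)   # step 1: loss on intact data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        losses = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])                 # step 2: scramble feature j
            losses.append(np.mean((model.predict(X_perm) - y) ** 2))  # step 3
        importances[j] = np.mean(losses) - baseline   # step 4: compare to baseline
    return importances
```

A large positive value means the model leaned heavily on that feature; a value near zero means the feature was (at least for this model) dispensable.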