I presented a keynote at PyData Warsaw on moving toward interpretable reliable models. The talk was inspired by some of the work I admire in the field as well as a fear that if we do not address interpretable models as a community, we will be factors in our own demise. In my talk, I addressed some of the main reasons I believe interpretability is important for the data science and machine learning community.

Why Care About Interpretability?

If we become so removed from the average person's understanding that we see it as a burden and nuisance to even address their concerns, we will find ourselves the target of a cultural, political or regulatory backlash. If we build interpretability, we allow area experts and our end users to give us realistic feedback and help improve our models overall. They can help us diagnose noise, see correlations and find better labels. GDPR and other regulations are pushing for more transparency.

If we fear or run from transparency, then we might want to ask ourselves WHY. Is it because we fear the gap between our users' understanding of models and our own explanation? If so, is it just a matter of some technical literacy? OR, is it because we aren't proud of the way we are using their data, and perhaps our models are extensions of unethical or immoral decisions made in the preprocessing, training or use case?

Models can become racist or sexist and can display other issues that are present in the data (often found in language data and crowdsourced data). If you are interested in reading more, I have a whole talk on this as well, or just start with the amazing article on stereotypes in word vectors by Aylin Caliskan-Islam et al.

Now you are convinced interpretability might be useful, yes? So, where do we go from here? For better or worse, this is still a very open and broad area of research. I'll summarize a few libraries and papers you can use to get started immediately, as well as some problems in the space which are still areas of active research.

What Can I Do Now?

There are several interesting open-source libraries which you can use to get started with interpretability. I highlighted a few in my talk, but there are many more. I will try to outline a few of the interesting ones I found including some I didn't have time to outline in my talk.

Classification Explanations

This is currently the space that has the most open-source tools available; so if you are working on classifiers, the good news is there is more than one tool you can use.

LIME (Local Interpretable Model-Agnostic Explanations): GitHub and Paper -- Find subsets of your data which can explain the model at a local level.

eli5 (explain like I'm five): GitHub -- Open-source library with great documentation that lets you build visual explanations of classifiers and regression models.

Sklearn-ExpertSys: GitHub -- Decision and Rule-based sets for Classifiers. I personally haven't had a chance to use this yet, but plan to do so as part of a longer blog series.
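
To make the idea of a local explanation a bit more concrete, here is a minimal sketch of the general approach behind tools like LIME: perturb a single instance, ask the opaque model for predictions on the perturbed points, and fit a distance-weighted linear model whose slopes serve as the explanation. This is NOT the lime library's API. The names black_box, solve and explain_locally, the Gaussian perturbations and the kernel width are all assumptions I made up for illustration, and the sketch uses only the Python standard library.

```python
import math
import random


def black_box(x):
    """Hypothetical opaque model: a nonlinear score built from two features."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] - 2.0 * x[1] ** 2)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def explain_locally(predict, instance, n_samples=500, scale=0.1, seed=0):
    """Fit a weighted linear surrogate around `instance`.

    Returns one slope per feature: how the prediction changes locally
    when that feature increases (an assumed, simplified explanation format).
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # Sample a neighbour of the instance with Gaussian noise.
        z = [v + rng.gauss(0.0, scale) for v in instance]
        offsets = [a - b for a, b in zip(z, instance)]
        dist2 = sum(o * o for o in offsets)
        rows.append([1.0] + offsets)  # intercept + centred features
        targets.append(predict(z))
        # Closer neighbours get more weight (Gaussian kernel).
        weights.append(math.exp(-dist2 / (2.0 * scale ** 2)))
    k = len(instance) + 1
    # Weighted normal equations: (X^T W X) beta = X^T W y
    A = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
          for j in range(k)] for i in range(k)]
    b = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
         for i in range(k)]
    beta = solve(A, b)
    return beta[1:]  # drop the intercept, keep per-feature slopes


if __name__ == "__main__":
    slopes = explain_locally(black_box, [0.0, 1.0])
    for name, slope in zip(["feature_0", "feature_1"], slopes):
        print(name, round(slope, 3))
```

Around the point [0.0, 1.0], the surrogate should report a positive slope for the first feature and a negative one for the second, which matches how this toy model is built. The real libraries handle text, images, categorical features and feature selection, so treat this only as a way to build intuition before reading their documentation.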

Neural Network Architectures

YellowBrick: GitHub -- Data Visualization library aimed at making visual explanations easier. I have so far only played around with this for data exploration, not for explaining models, but I am curious to hear your experience!

MMD-critic: GitHub -- A meaningful approach to sampling! Google Brain resident Been Kim also wrote an accompanying paper which explains how this library works to help you sample.

Ian Ozsvald's Notebook using eli5: Ian and I have been chatting about these libraries, and I asked him to continue to update and elaborate his own use of tools like eli5. Updates will come as well, so check back!

Bayesian Belief Networks: Probabilistic Programming is cool again! (or always was... probably?) This is one of many libraries you can use for building Bayesian networks. Although this may not fit your definition of interpretability (if you have to expose this to the end-client they may not be able to make sense of it), it is worth exploring for your own probabilistic models.

There are many more, which I hope to write about over the coming weeks in a series of blog posts and notebooks as I explore what I call: reverse engineering for model interpretability AND MVM: Minimal Viable Models. (more on this to come so check back or follow me on Twitter... 😉)

What is Still Unsolved?

Plenty. If you are a graduate student or you work in a research lab or you work with unlimited access to TPUs (ahem..), please help this area of research. Here are a few things that are still very difficult.

Interpretable views of neural networks: I don't mean the one part of ImageNet where you can see a face. I mean actual interpretation of neural networks in a meaningful and statistically significant way.

Multidimensional Projections: Finding ways to explain models or clusters using 2-D or 3-D visualizations of multi-dimensional space is difficult at best and error-prone at worst. Watch Matti Lyra's PyData Talk on Topic Modeling for some insight. Or follow up with research from the fields of multi-dimensional distance metrics as well as unsupervised learning.

Kagglefication: Ensembles are killing us, with some sort of averaged metric I wish I could explain... 😝 But honestly, if we gamify machine learning, do we run the risk of turning our own work in the field into an optimization game where the only metric is our F1 score? I hope not, but it makes me fearful sometimes... I fear we too often find ways to boost or over-engineer our features to the point that we can no longer interpret the metrics and measurements we have created. This is a problem.

Finding representative samples and ensuring our labels are useful: It's difficult enough to explain models that you know were trained on meticulously documented labels. This becomes much more difficult in the "real world" where tags or labels might at times be high-quality or in other moments be garbage (or entirely absent...).

Measuring Interpretability: Until there is a built-in sklearn.metrics.interpret, I'm not certain how widely we will see interpretability metrics adopted or used. Even defining how we might calculate such a metric is difficult. Although we can build upon probabilistic models and cognitive science theory, how can we easily compare the interpretability of a text explanation with that of a regression model? Research is clear that this is not impossible to do, so I hope we can find a solution which allows us to optimize for a metric like interpretability...
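
The multidimensional projection problem listed above can be seen in a toy case. The sketch below is my own illustration with made-up points, and it uses a deliberately naive projection (dropping the third coordinate) as a stand-in for real methods, which distort distances in subtler ways. The point is only that the nearest neighbor of a point can change once you flatten the space.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def project_2d(point):
    """Naive projection (an assumption for illustration): keep x and y only."""
    return point[:2]


def nearest(query, candidates, transform=lambda p: p):
    """Return the name of the candidate closest to `query` after `transform`."""
    return min(
        candidates,
        key=lambda name: euclidean(transform(query), transform(candidates[name])),
    )


# Hypothetical 3-D points: "b" is far from the query along the third axis.
query = (0.0, 0.0, 0.0)
points = {"b": (0.1, 0.0, 5.0), "c": (1.0, 1.0, 0.0)}

if __name__ == "__main__":
    print("nearest in 3-D:", nearest(query, points))
    print("nearest in 2-D:", nearest(query, points, project_2d))
```

In the full space the query is closest to "c", but in the flattened view "b" appears closest, so a plot of the 2-D view would suggest a cluster that does not exist in the original data.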

There are likely many more areas of research and concern, but these are the ones that, for me, struck a chord and seemed obvious areas we, as an open-source community, can work on. If you know of papers or research in the area, I am all ears! I hope this small post has at least inspired you to have more conversations with peers or colleagues around the subject of interpretability, which is a good start.

My Slides / Talk

If you are curious about my slides, I have posted them below.

The video is available here:

Please continue the conversation in the comments below, or feel free to reach out on Twitter (@kjam).
