For some time, I’ve been interested in the verification of software libraries for eXplainable Artificial Intelligence (XAI), not only in terms of the number of implemented algorithms but also in terms of actual usability for the end user. I found that it is challenging to gather useful feedback from end users because it is not easy to find a predictive model that a wide group of users really cares about. One positive example is the FICO challenge, where the task was to build and explain a predictive model for risk scoring.

Maybe it is time for a more important challenge?

Partial dependence plot for age in a gradient boosting model that predicts survival of persons with COVID-19. Note that this plot comes from a simple predictive model built on the limited data available. It is by no means final or very precise.
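For readers who have not met partial dependence before, here is a minimal sketch of how such a profile is computed: fix the variable of interest at each grid value for every row in the data and average the model predictions. Everything in it is an assumption made for illustration. `toy_model`, its coefficients and the four data rows are hypothetical stand-ins, not the model or data behind the plot.

```python
import math


def toy_model(row):
    """Hypothetical stand-in for a fitted model; returns a survival probability.

    The coefficients are made up for illustration only.
    """
    score = 4.0 - 0.05 * row["age"] - 0.3 * (row["gender"] == "male")
    return 1.0 / (1.0 + math.exp(-score))


def partial_dependence(predict, data, variable, grid):
    """Average prediction over the data with `variable` forced to each grid value."""
    profile = []
    for value in grid:
        # Replace the variable in every row, keep all other columns as observed.
        predictions = [predict({**row, variable: value}) for row in data]
        profile.append(sum(predictions) / len(predictions))
    return profile


# Tiny made-up dataset, only to make the sketch runnable.
data = [
    {"age": 25, "gender": "female"},
    {"age": 50, "gender": "male"},
    {"age": 70, "gender": "male"},
    {"age": 80, "gender": "female"},
]
grid = [20, 40, 60, 80]
pd_age = partial_dependence(toy_model, data, "age", grid)

for age, value in zip(grid, pd_age):
    print(f"age={age}: average predicted survival {value:.3f}")
```

With a real model, `predict` would be the model's prediction function and `data` a sample of the training data.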

The outbreak of COVID-19, the disease caused by SARS-CoV-2, is severe. Various data related to this outbreak are shared publicly. Selected individual-level data for infected persons (country, age, gender, date of infection, possible recovery or death) can be downloaded from this spreadsheet or this Kaggle dataset, or for selected countries from other databases. These data make it possible to train a predictive model for survival and to try different XAI methods that can explain the model predictions.

Using DALEX, I built a simple baseline solution. I trained a simple gradient boosting model that estimates the odds of recovery based on gender, country and age (with forced monotonicity constraints). The model is then explained with the modelStudio interactive dashboard. Take a look and play with the model yourself: https://pbiecek.github.io/explainCOVID19/

Break down plot for a 50-year-old male from China who has COVID-19. Chances of survival are pretty high (0.971), mostly due to moderate age. Note that this plot comes from a simple predictive model built on the limited data available. It is by no means final or very precise.
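The idea behind a break down plot can be sketched in a few lines: start from the average prediction over the data, then fix the variables one by one at the values of the explained person and record how the average prediction changes at each step. The sketch below assumes a fixed variable order, whereas real implementations choose the order with a heuristic. `toy_model`, its coefficients and the data rows are hypothetical and will not reproduce the numbers in the plot.

```python
import math


def toy_model(row):
    """Hypothetical stand-in for a fitted model; returns a survival probability."""
    score = 4.5 - 0.05 * row["age"] - 0.3 * (row["gender"] == "male")
    score += 0.4 * (row["country"] == "China")  # made-up country effect
    return 1.0 / (1.0 + math.exp(-score))


def mean_prediction(predict, rows):
    return sum(predict(r) for r in rows) / len(rows)


def break_down(predict, data, observation, order):
    """Sequentially fix variables at the observation's values.

    Returns the baseline (average prediction) and one contribution per variable.
    """
    current = [dict(row) for row in data]
    baseline = mean_prediction(predict, current)
    previous = baseline
    contributions = {}
    for variable in order:
        # Force this variable to the observation's value in every row.
        current = [{**row, variable: observation[variable]} for row in current]
        value = mean_prediction(predict, current)
        contributions[variable] = value - previous
        previous = value
    return baseline, contributions


# Tiny made-up dataset, only to make the sketch runnable.
data = [
    {"age": 25, "gender": "female", "country": "Korea"},
    {"age": 50, "gender": "male", "country": "China"},
    {"age": 70, "gender": "male", "country": "Japan"},
    {"age": 80, "gender": "female", "country": "China"},
]
person = {"age": 50, "gender": "male", "country": "China"}

baseline, contributions = break_down(
    toy_model, data, person, order=["age", "gender", "country"]
)
print(f"average prediction: {baseline:.3f}")
for variable, delta in contributions.items():
    print(f"{variable} = {person[variable]}: {delta:+.3f}")
```

Once every variable is fixed, all rows equal the explained person, so the baseline plus the contributions adds up to the model's prediction for that person.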

XAI tools usually highlight imperfections in the model or in the training data, and that is the case here. Building a more complex model was difficult because the individual-level data are incomplete. But even a simple model with three variables can be an interesting tool for a fresh look at the problem of model explainability.

If you have better data, a better model or better explanations, please let me know. #explainCovid19