Behind the Coronavirus Mortality Rate

A closer look at the mortality rate. What does it tell us?

Coronavirus Confirmed Case Map from Ding Xiang Yuan on Feb 21, 2020

In a previous article, I introduced a Python toolbox to gather and analyze the coronavirus epidemic data. In this article, I will use the Python toolbox to dig into one measure of the epidemic — the mortality rate. I am going to focus on the following questions:

What is the regional variability of the mortality rate?

Is the current mortality rate likely to be an underestimate or an overestimate?

Now let’s start.

Sanity Check

Just like any data analysis task, we should always perform a sanity check prior to the real work. So let’s independently verify the World Health Organization's ~2% mortality rate estimate on January 29, 2020¹:

We can see that the mortality rate is mostly within 2%~3%, well in line with the WHO official estimate.

1. Investigate the Mortality Rate Regional Variability

For simplicity, let’s define the “mortality rate” (MR) as the following:

MR(T) = cumulative deaths at date T / cumulative confirmed at date T

This calculation is a little “naive”, but for the discussion of regional variability, it is a good proxy. And we will revisit this calculation later in this article.
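As a minimal sketch of this definition, assuming the data sits in a pandas DataFrame with one row per day and two hypothetical columns, `cum_confirmed` and `cum_dead` (these names are illustrative and may not match the toolbox's actual schema), the calculation could look like this:

```python
import pandas as pd


def naive_mortality_rate(df: pd.DataFrame) -> pd.Series:
    """MR(T) = cumulative deaths at date T / cumulative confirmed at date T."""
    # Mask days with zero confirmed cases so the ratio is NaN instead of a
    # division by zero.
    confirmed = df["cum_confirmed"].where(df["cum_confirmed"] > 0)
    return df["cum_dead"] / confirmed


# Toy numbers for illustration only; this is NOT real epidemic data.
toy = pd.DataFrame(
    {"cum_confirmed": [0, 100, 200, 400], "cum_dead": [0, 2, 5, 10]},
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)
mr = naive_mortality_rate(toy)
print(mr)
```

The first day comes out as NaN because there are no confirmed cases yet, which keeps it out of any plot rather than showing a misleading 0%.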

Since the epidemic started in the city of Wuhan, and most of the cases are concentrated in Hubei Province, we naturally want to split the data into three regions:

The city of Wuhan

Hubei Province except for Wuhan

China except for Hubei Province
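The split above can be sketched as follows. This assumes city-level rows with hypothetical columns `date`, `province`, `city`, `cum_confirmed` and `cum_dead`; the column names, the helper functions and the toy rows are illustrative assumptions, not the toolbox's actual API or real data.

```python
import pandas as pd


def label_region(row: pd.Series) -> str:
    """Map one city-level row to one of the three regions."""
    if row["province"] == "Hubei":
        return "Wuhan" if row["city"] == "Wuhan" else "Hubei except Wuhan"
    return "China except Hubei"


def aggregate_by_region(df: pd.DataFrame) -> pd.DataFrame:
    """Sum the cumulative counts per date and region."""
    labeled = df.assign(region=df.apply(label_region, axis=1))
    return labeled.groupby(["date", "region"], as_index=False)[
        ["cum_confirmed", "cum_dead"]
    ].sum()


# Toy rows for illustration only; this is NOT real epidemic data.
toy = pd.DataFrame(
    {
        "date": ["2020-02-01"] * 5,
        "province": ["Hubei", "Hubei", "Hubei", "Guangdong", "Zhejiang"],
        "city": ["Wuhan", "Xiaogan", "Huanggang", "Shenzhen", "Wenzhou"],
        "cum_confirmed": [100, 40, 60, 20, 30],
        "cum_dead": [5, 1, 1, 0, 0],
    }
)
regional = aggregate_by_region(toy)
print(regional)
```

Labeling each row first and then grouping keeps the region logic in one small function, so changing the split later only touches `label_region`.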

Below is the daily new confirmed count in these three regions, which confirms that this is a reasonable split. Plot.ly is a great tool for building interactive plots, so I will use it instead of the more traditional Matplotlib so that readers can drill down into the data on their own.

There is a huge spike in new confirmed cases on Feb 13, 2020. This is because Hubei Province loosened the definition of “confirmed” on that date to make it consistent with the reporting of other provinces². The new definition added clinical diagnosis to the criteria and thus included many patients who were previously left out.

We can easily compare the mortality rate of the three regions as well as the national average in the following plot:

It is clear that the mortality rate in Wuhan is much higher than the rest of the Hubei Province, which is in turn much higher than the rest of China. This result is in line with the report from the National Health Commission of China³. And according to Johns Hopkins CSSE, as of Feb 20, 2020, there are 634 confirmed cases with 3 deaths outside of China, so the international mortality rate is roughly in line with that of China outside of Hubei Province. Therefore, depending on where you are, the mortality rate difference could be 10x or more.
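For reference, the international figure quoted above works out as follows, applying the same naive formula to the Johns Hopkins CSSE counts cited in the text (the variable names are just for illustration):

```python
# Counts outside of China as of Feb 20, 2020, as quoted in the text.
deaths_outside_china = 3
confirmed_outside_china = 634

international_mr = deaths_outside_china / confirmed_outside_china
print(f"{international_mr:.2%}")  # prints 0.47%
```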

At the time of writing (Feb 21, 2020), there is no evidence that the virus has mutated. Then why does the mortality rate differ so much across regions? One explanation is that this virus is very contagious and can infect a large number of people in a short time if uncontrolled. As a result, the hospitals in Wuhan and Hubei Province were quickly saturated, and many patients died due to insufficient resources. In contrast, the virus spread to other provinces relatively late, when tight national controls were already in place. Because the number of patients there grew much more slowly relative to the available healthcare resources, the mortality rate in other provinces is much lower.

We must realize that China is unique in its ability to quickly mobilize a huge amount of resources and take unprecedented measures to stifle the spread of the disease. If this virus spills over to other countries that lack the ability to contain it, the result could be far more disastrous, with a much higher mortality rate.

2. Mortality Rate Estimates

Now let’s come back to the value of the mortality rate. Is its current value 2~3% likely to be an underestimate or an overestimate?

As pointed out previously, the simple formula for the mortality rate is slightly flawed: it is accurate only once the epidemic has ended. During an epidemic, the death count at time T results only from the cases confirmed a few days earlier, at T-t. More precisely, it depends on the patients’ survival probability distribution, which is difficult to estimate during an outbreak.

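Nevertheless, while case counts are still rising, the cumulative confirmed count at date T is larger than it was at T-t, so the denominator in our “naive” formula is too large. We can therefore conclude that the current estimate of the mortality rate is likely to be an underestimate.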

To get a sense of the magnitude of the underestimation, we can plot the mortality rate using different lags t. According to some early studies, the average period from confirmation to death is about 7 days⁴. Therefore, I plotted the mortality rate with no lag, a 4-day lag, and an 8-day lag.

The calculation is straightforward:
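A minimal sketch of the lagged calculation, not necessarily the exact code in the notebook: it divides the cumulative deaths at date T by the cumulative confirmed count at date T-t. It assumes a DataFrame with exactly one row per consecutive day and the hypothetical columns `cum_confirmed` and `cum_dead`; the toy numbers are made up for illustration.

```python
import pandas as pd


def lagged_mortality_rate(df: pd.DataFrame, lag_days: int) -> pd.Series:
    """MR_lag(T) = cumulative deaths at T / cumulative confirmed at T - lag_days.

    Assumes one row per consecutive day, so shifting by rows is the same as
    shifting by days.
    """
    lagged_confirmed = df["cum_confirmed"].shift(lag_days)
    # Leave the ratio undefined (NaN) where the lagged count is missing or zero.
    lagged_confirmed = lagged_confirmed.where(lagged_confirmed > 0)
    return df["cum_dead"] / lagged_confirmed


# Toy numbers for illustration only; this is NOT real epidemic data.
toy = pd.DataFrame(
    {
        "cum_confirmed": [100, 200, 400, 700, 1000, 1300, 1500, 1600, 1650, 1680],
        "cum_dead": [1, 3, 7, 14, 22, 31, 39, 45, 49, 52],
    },
    index=pd.date_range("2020-02-01", periods=10, freq="D"),
)

# Same lags as in the article: no lag, 4-day lag, 8-day lag.
rates = pd.DataFrame(
    {f"lag_{t}d": lagged_mortality_rate(toy, t) for t in (0, 4, 8)}
)
print(rates)
```

Because the cumulative confirmed count never decreases, a longer lag always gives a smaller denominator and therefore a rate at least as high as the naive one.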

But the plotting is a little more involved:
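As a rough sketch of the plotting step (again, not the notebook's actual code), the figure can be described in Plotly's plain dict form, with one line trace per lag. The function name `build_mr_figure` and the toy rates below are hypothetical and for illustration only.

```python
def build_mr_figure(dates, rates_by_lag, title):
    """Build a Plotly figure spec (plain dict) with one line per lag.

    rates_by_lag maps a lag in days to a list of mortality rates aligned
    with dates; None marks days where the rate is undefined.
    """
    traces = [
        {
            "type": "scatter",
            "mode": "lines+markers",
            "name": f"{lag}-day lag",
            "x": list(dates),
            "y": list(values),
        }
        for lag, values in sorted(rates_by_lag.items())
    ]
    layout = {
        "title": {"text": title},
        "xaxis": {"title": {"text": "Date"}},
        # Show rates such as 0.025 as 2.5% on the axis.
        "yaxis": {"title": {"text": "Mortality rate"}, "tickformat": ".1%"},
    }
    return {"data": traces, "layout": layout}


# Toy rates for illustration only; this is NOT real epidemic data.
dates = ["2020-02-19", "2020-02-20", "2020-02-21"]
toy_rates = {0: [0.020, 0.021, 0.022], 4: [None, 0.030, 0.029]}
spec = build_mr_figure(dates, toy_rates, "Mortality rate by lag (toy data)")
```

With plotly installed, passing the dict to `plotly.graph_objects.Figure(spec)` and calling `.show()` on the result should render the interactive chart. Keeping the spec as a plain dict means the block runs and can be inspected even without plotly.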

As you can see, the lagged mortality rates are higher, as expected, but not by much. And the recent convergence indicates that the epidemic is stabilizing or cooling down. Therefore, if there is no further outbreak, we can reasonably estimate that the mortality rate in Wuhan will be in the 3%~6% range, the rest of Hubei Province in the 2.5%~3% range, and the rest of China in the 0.6%~0.9% range.

Update (3/8/2020):

The mortality rate in these three regions has stabilized at:

Wuhan: 4.8%

Hubei Province except Wuhan: 3.5%

The rest of China except Hubei Province: 0.7%

These numbers approximately match the predictions I made above on 2/21/2020.

Final Words

Most of the plots in this article are interactive, so readers can zoom in and read the precise numbers. For those who want to play with the data themselves, the Python notebook that reproduces all these plots is in the GitHub repo and can be run on Google Colab.

(Update on Feb 24, 2020: my plot.ly account only allows 1,000 views per day. To avoid the “404 error”, I have replaced all interactive charts with static pictures. The plot.ly code still works, and you can still explore the interactive charts in Google Colab or on your own machine.)

Acknowledgment

I want to thank my friend David Tian, a Machine Learning engineer, for his generous help on the Google Colab setup, and his valuable suggestions on this article. Check out his fun self-driving *DeepPiCar* blog.