Edit 2 07/05/2020 — this blog is now a preprint on medrxiv with help from Dr. Lea Merone! Have a look here:

https://www.medrxiv.org/content/10.1101/2020.05.03.20089854v1

Edit 1 28/04/2020 — updated the meta-analysis with the infection-fatality rate for the Diamond Princess study by Russell et al. Previous meta-analysis used the case-fatality rate from this study rather than the IFR. Also updated the number of deaths expected for the UK, this was underestimated due to a miscalculation.

Edit 2 29/04/2020 — it’s been pointed out that I compared the IFR of COVID19 to the upper bound of influenza, added a range to reflect the uncertainty.

Everything is changing very quickly. Things published a week ago are no longer valid, because new evidence comes to light all the time. So let’s make this a living document — tell me if there are changes or issues with the protocol, or if new studies have come to light, and I’ll incorporate them into the analysis. Let’s make this data as good as possible for us all, and as useful.

During the COVID-19 outbreak, the one truth that has remained steady is that we have more questions than answers. When will quarantine end? What will the economic damage be? Can I get 5kg of gummy bears delivered overnight and if so will there be any left tomorrow?

So many questions. Not many answers.

Except to the last question: turns out the answers are “yes” and “no”, respectively. Source: Pexels

One of the biggest questions that has been asked around the world is simple but incredibly hard to answer: “What percentage of people who get COVID-19 will die of the disease?”. Or, to put it another way, “How likely am I to die if I get COVID-19?”. It seems like such an easy question that it’s difficult at first to see why we don’t know. Ultimately, the fatality rate is just the number of people who die from the disease divided by the number of people who have the disease, after all.

The problem is, neither of those numbers is easy to really get at. We can quite quickly get very good estimates of the case-fatality rate, which is the rate of death in people who have tested positive for coronavirus, but the one thing we are very sure of now is that we aren’t catching every case of the disease. It’s also unlikely that most places are capturing the true figure of people who have died from COVID-19, which means that both our denominator and numerator are suspect.

It’s very hard to draw good estimates from bad data.

All of this makes the infection-fatality rate very hard to know. This is the rate at which people die when they are infected with the disease, including all of those mild and/or asymptomatic cases that you may have heard about. Some people have prominently claimed that this number is likely to be similar to the rate of death due to influenza — about 0.1% — while others have said that it’s probably 10 or 20 times that.
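To make the difference between the two rates concrete, here is a minimal Python sketch. The numbers in it are invented purely for illustration (they come from no study), and the function name is my own.

```python
def fatality_rate(deaths, people_with_disease):
    """Deaths divided by the number of people counted as having the disease."""
    if people_with_disease <= 0:
        raise ValueError("denominator must be positive")
    return deaths / people_with_disease


# Hypothetical outbreak, made-up numbers for illustration only.
deaths = 100
confirmed_cases = 2_000     # people who tested positive
true_infections = 20_000    # includes mild and asymptomatic infections nobody tested

cfr = fatality_rate(deaths, confirmed_cases)   # case-fatality rate
ifr = fatality_rate(deaths, true_infections)   # infection-fatality rate

print(f"CFR: {cfr:.2%}")
print(f"IFR: {ifr:.2%}")
```

Same deaths, different denominator: the more infections we miss, the further the case-fatality rate sits above the infection-fatality rate.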

It’s like the TV show Numbers, except instead of actors playing at being mathematicians it’s literally everybody with an opinion

Thankfully, there is quite a bit of published research already looking at this difficult question. Dozens of papers are already published or in pre-print, trying to estimate the ‘true’ infection-fatality rate of COVID-19 from various datasets across the globe. And so, being a nerd, I spent my weekend collating all of these estimates into one number so that I could have a realistic estimate of the infection-fatality rate to share with you all.

It was a great way to spend most of a Saturday.

And so, without further ado, let’s look at the evidence for the infection-fatality rate in COVID-19, and what the infection-fatality rate is likely to be.

Getting The Numbers

The first part of understanding the evidence is to read it, and the first part of that is finding it all. In this case, I decided to run a fairly simple systematic review and meta-analysis, which is a type of scientific study that collates all the research on a topic into one estimate.

This can be a bit of work, but ultimately it’s not that hard: you run a search for the thing you’re looking for in scientific databases and collate all the results that come up. Then, you exclude all of the studies that are duplicates or irrelevant, and pool the remaining studies into one combined estimate using a statistical model and your own insights.

Like this, but with more computer screens. Source: Pexels

The next part is very scienc-y, so I’ve italicized it in case you don’t want to read about my methods and simply want to skip ahead to the juicy numbers.

My methodology was simple: I searched Pubmed (published research) and Medrxiv (pre-print server) using the search terms: (infection fatality rate OR ifr) AND (COVID-19 OR SARS-CoV-2). This led to a total of 66 studies on Pubmed and 43 on Medrxiv. I included any study that produced a percentage or numerical estimate of infection-fatality rate, and was written in English, which narrowed it down to 11 studies. I then had a look online through Google Scholar and Twitter to see if I could identify any other “grey literature” (government reports, mostly) that estimated the infection-fatality rate in a population. That eventually gave me 13 separate point-estimates and confidence intervals to combine into a single estimate.

A PRISMA flow diagram of the search methods

I used Stata 15.1 and the command metan, with the point-estimates and lower/upper-bounds of the confidence intervals, to combine these into one number*. Five of the studies didn’t provide a confidence interval, so I computed one based on the numbers given in the report. I used a random-effects model using the DerSimonian and Laird method. I used the I² statistic to get an idea of statistical heterogeneity.
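If you’d like to see roughly what that pooling step does under the hood, here is a minimal Python sketch of a DerSimonian and Laird random-effects model with the I² statistic. This is not the Stata code used for the analysis; the function names are my own, the three “studies” at the bottom are made-up placeholders rather than the 13 included estimates, and it assumes each confidence interval is a symmetric 95% interval on the same scale as its point-estimate.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def dersimonian_laird(estimates, std_errors, z=1.96):
    """Pool study estimates with a DerSimonian-Laird random-effects model."""
    k = len(estimates)
    if k == 0 or k != len(std_errors):
        raise ValueError("need matching, non-empty estimates and std_errors")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the method-of-moments between-study variance (tau^2).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's own variance.
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))

    # I^2: share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "se": se_pooled,
        "ci_low": pooled - z * se_pooled,
        "ci_high": pooled + z * se_pooled,
        "tau2": tau2,
        "i2": i2,
    }


# Made-up placeholder studies: (IFR %, lower bound %, upper bound %).
hypothetical_studies = [(0.4, 0.2, 0.6), (0.9, 0.6, 1.2), (0.7, 0.5, 0.9)]
ys = [est for est, _, _ in hypothetical_studies]
ses = [se_from_ci(lo, hi) for _, lo, hi in hypothetical_studies]
result = dersimonian_laird(ys, ses)
print(f"Pooled IFR: {result['pooled']:.2f}% "
      f"({result['ci_low']:.2f} to {result['ci_high']:.2f}), "
      f"I2 = {result['i2']:.0f}%")
```

The random-effects weights are what make this different from a simple weighted average: when the studies disagree with each other more than their own error bars would suggest, tau² grows and the big studies dominate less.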

I also divided the studies up into three groups for the analysis: observational research, where scientists have tried to calculate an infection-fatality rate directly from the rate of infections and deaths in a population; modelling studies, where scientists have estimated an infection-fatality rate using a variety of factors; and pre-prints, which are a combination of the above two but not peer-reviewed and therefore more prone to error.

All of this wonderful science brings us here, to the place you’ve all been waiting for.

The Results

From the 13 studies — including 4 models, 4 observational studies, and 5 pre-prints of one kind or another — there was an overall estimate of 0.75% infection-fatality rate, with the 95% confidence interval ranging from 0.49% to 1.01%.

In other words, across all of these 13 studies and pieces of data, including serology studies testing everyone who is uninfected from the US, estimates of fatality from France and Italy, and a number of studies from China, the best guess of the proportion of people who die from COVID-19 infections seems to be about 8 in 1,000. That’s roughly 4 times more lethal than measles, and 8–20 times more lethal than your regular influenza infection.

Forest plot of the meta-analysis — you can see the overall estimate of 0.75% (0.49–1.01%) down at the bottom

Is this a hard and fast figure? Absolutely not. If you have a look at the plot above, you can see that I split it up into different types of studies — the models, observational studies, and pre-prints. All three of these come to quite different conclusions regarding the true infection-fatality rate, which makes sense given the very wide differences in methodology.

Depending on which type of study you trust the most, it looks like the infection-fatality rate is somewhere between 0.22% and 1.3%, with the most robust estimate putting it somewhere in between 0.49% and 1.01%. That’s still a HUGE range, but it does give us some idea of what the plausible reality is likely to be.

Forest plot of meta-analysis by country

I also had a look at the numbers when you analyze by country. The biggest group of studies came from Chinese data, while the rest were a mix from all over the world. If you look at that mix vs China, you see very little difference in the IFR, but what you do see is that the Chinese studies have very low heterogeneity — they are statistically very similar. This does lend a bit more weight to the estimate using Chinese data, as it may be more reasonable to combine these studies statistically than using all those very different studies from around the world.

The last thing I did was look at the estimates by month. If you have a look at the plot below, you’ll see an interesting phenomenon — earlier research had much lower (on average) estimates of the infection-fatality rate than the studies published more recently. This is probably because our understanding of COVID-19 is still evolving. What we know now isn’t hard-and-fast truth, but the best estimates based on current data that we have.

Forest plot of the meta-analysis by month

The other thing to bear in mind is that infection-fatality rate is almost certainly going to vary depending on the age breakdown of a region. We have very good evidence that older people are more likely to die from COVID-19 — this means that populations that have a lower average age will almost certainly see fewer deaths due to the disease. It’s not unlikely that some places may see as few as the lower bound of the lowest estimate — 0.22% — while others could see as high a value as the very upper bound.

What This All Means

Which brings us to the conclusions of this little piece of epidemiological research. Firstly, this isn’t a formal systematic review, and it’s very unlikely that I’ve captured every estimate out there. I can only read English and French, and there are at least a few papers that I found published in other languages that looked like they might speak to infection-fatality rate. There is also a vast amount of “grey” data out there — published estimates on government websites that are hard to get at unless you know exactly where on the web they live.

However, what this does give us is some idea of the likely infection-fatality rate of COVID-19 based on research so far. It seems, for example, that the rate reported by Stanford researchers in a study in Santa Clara of 0.12% is extremely unlikely to be true. We can also say with some certainty that the very high estimates that some have produced of nearly 2% are probably wrong as well. It’s very likely that the average infection-fatality rate will end up somewhere between 49 and 101 deaths per 10,000 infections, with a rough guess of 75 as our point estimate.

What does this mean for you? Well, if we take that estimate and apply it to the U.S. in an unmitigated epidemic, assuming that 60% of the population would likely be infected before herd immunity set in and the epidemic halted, the likely number of deaths would be somewhere between 1 and 2 million from COVID-19. In the UK, it would be 200,000 to 420,000.
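The arithmetic behind those figures is just population × share infected × infection-fatality rate. Here is a minimal Python sketch; the population figures are my own round assumptions (roughly 328 million for the U.S. and 66.8 million for the UK), so the output will land in the same ballpark as the numbers above rather than matching them exactly.

```python
def expected_deaths(population, share_infected, ifr):
    """Deaths if `share_infected` of the population is infected at a given IFR."""
    return population * share_infected * ifr


# Assumed round population figures, not taken from the analysis itself.
populations = {"U.S.": 328_000_000, "UK": 66_800_000}

SHARE_INFECTED = 0.60                 # 60% infected before herd immunity
IFR_LOW, IFR_HIGH = 0.0049, 0.0101    # 0.49% and 1.01%, the pooled interval

for country, population in populations.items():
    low = expected_deaths(population, SHARE_INFECTED, IFR_LOW)
    high = expected_deaths(population, SHARE_INFECTED, IFR_HIGH)
    print(f"{country}: {low:,.0f} to {high:,.0f} deaths")
```

Swap in a different population or share infected and the range scales in direct proportion, which is why the assumptions matter as much as the infection-fatality rate itself.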

At a minimum.

This doesn’t mean that your personal risk of death is going to be 0.75% if you get COVID-19 (this is an aggregate measure that doesn’t take into account personal characteristics like age and comorbidities), but it does give you some idea of how many people in a given population are likely to pass away if they catch COVID-19.

New evidence is emerging all the time, so this is certainly not the last word, but given the data so far it seems unlikely that this number will change enormously. There are some countries out there with very young populations that might skew this downwards, but there are also many places with less well-developed healthcare systems that have yet to be really hit by the disease.

Eventually, we’ll have an exact breakdown of the death rate everywhere, but until then this information gives us a reasonable idea of what to expect.

*If you want to do this yourself, send me an email and I’ll send you the Stata code and data file I’ve compiled.

Included studies