Status of the Outbreak

Just over two weeks ago I wrote that there wasn’t enough data yet to say what the likely outcome of the virus would be. Last week I wrote that the outlook had worsened as more data from outside of mainland China became available, but that it was still early in the epidemic. Today, the four reported deaths in Iran (and the two exported international cases) make it extremely likely that there are hundreds of cases in the country. The Iranian health ministry says that:

Based on existing reports, the spread of coronavirus started in Qom [and] has now reached several cities … including Tehran, Babol, Arak, Isfahan, Rasht and other cities and it’s possible that it exists in all cities in Iran. [emphasis added]

Combine that with the 17 cases and one death that have been reported today from Italy, and it is likely that the WHO criteria for a pandemic have already been met. It seems increasingly unlikely that the outbreak will be contained within China, although we can still hope for that outcome.

Given the current spread of the outbreak, it is worthwhile to try to project what a worldwide spread of the virus might look like.

Note — if you want to skip the explanation of how the inputs were calculated, just scroll down until you see the SIRD model with the projections for the United States.

Case Fatality Rate (CFR)

I think the most important unknown factor at this point is the true CFR of the virus. There are several estimates that have been published in pre-print studies, but they largely rely on the data available from mainland China. As skepticism has grown about the reliability of China’s data, I believe it’s worth attempting to calculate a preliminary CFR that excludes the Chinese data.

A more accurate calculation can be made by relying only on data from countries that scored at least 50 (out of 100) in the 2019 Global Health Security Index’s measure of their ability to detect and report emerging epidemics. This threshold also excludes the fatalities from Iran from our calculations, which I think makes logical sense.

Methodology

Having decided to restrict our dataset, we still need to determine a methodology to calculate the CFR. In “Methods for Estimating the Case Fatality Ratio for a Novel, Emerging Infectious Disease,” several methodologies are compared. Fortuitously, these methodologies are compared by how they performed during the 2003 SARS epidemic which, given SARS-CoV-2’s close genetic relationship to SARS, makes the calculations especially relevant.

Among the various methodologies, I believe there is a clear choice.

(1) e_2(s) = D(s) / [D(s) + R(s)]

where D(s) and R(s) denote the cumulative number of deaths and recoveries.

This formula performed well compared to the other methodologies during the various stages of the SARS outbreak. It was “reasonable at most points in the epidemic.” For copyright reasons I won’t show the graph here, but I urge you to check out Figure 3 from the paper, which compares the success of the various methodologies across the timeline of the epidemic.

This methodology also has the added benefit that we do not need to know the date when symptoms began for the fatal cases — information which is not readily available for all cases at this time.

Across the countries that met our standards for inclusion, there have been 10 deaths and 180 recoveries. Using this methodology:

(2) CFR = 10/(180+10) = 5.26%
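The arithmetic above can be sketched in a few lines of Python. The function name `cfr_resolved_cases` is made up for illustration; the inputs are the 10 deaths and 180 recoveries quoted above.

```python
def cfr_resolved_cases(deaths: int, recoveries: int) -> float:
    """Estimate the CFR as deaths / (deaths + recoveries).

    Only resolved cases (died or recovered) enter the denominator, so
    cases that are still active do not pull the estimate downward.
    """
    resolved = deaths + recoveries
    if resolved == 0:
        raise ValueError("need at least one resolved case")
    return deaths / resolved


# Counts quoted in the text for the countries that met the inclusion threshold.
cfr = cfr_resolved_cases(deaths=10, recoveries=180)
print(f"CFR = {cfr:.2%}")
```

Because the denominator only counts resolved cases, the estimate can move a lot early on, when either count is small.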

Given this limited sample size, we can calculate a margin of error for the data at a 95% confidence level. The margin of error is calculated from:

(3) n = [z^2 * p * (1 - p) / e^2] / [1 + (z^2 * p * (1 - p) / (e^2 * N))]

Where: n is the sample size, z is the z-score associated with a level of confidence, p is the sample proportion, expressed as a decimal, e is the margin of error, expressed as a decimal, N is the population size.

This gives us a margin of error of 7.3%. Applied as a relative margin around the point estimate, our estimated CFR is 5.26% (95% CI: 4.88%-5.65%).
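As a rough sketch of how formula (3) can be used in practice, it can be solved for e, which gives e = z * sqrt(p * (1 - p) * (1/n - 1/N)). The function names below are illustrative. The inputs in the usage lines (n = 190 resolved cases, p = 10/190, an unbounded population N, z = 1.96) are assumptions rather than values confirmed by the text, so the output need not reproduce the figure quoted above.

```python
import math


def margin_of_error(n: float, p: float, population: float = math.inf,
                    z: float = 1.96) -> float:
    """Absolute margin of error e obtained by solving formula (3) for e.

    n: sample size, p: sample proportion (decimal),
    population: population size N (math.inf drops the finite correction),
    z: z-score for the confidence level (1.96 for 95%).
    """
    return z * math.sqrt(p * (1.0 - p) * (1.0 / n - 1.0 / population))


def sample_size(e: float, p: float, population: float = math.inf,
                z: float = 1.96) -> float:
    """Formula (3) as written: the sample size needed for a margin e."""
    base = z ** 2 * p * (1.0 - p) / e ** 2
    return base / (1.0 + base / population)


# Assumed inputs: every resolved case counts as one observation.
p_hat = 10 / 190
e = margin_of_error(n=190, p=p_hat)
print(f"margin of error: {e:.4f} (absolute, as a decimal)")
print(f"interval: {max(0.0, p_hat - e):.2%} to {p_hat + e:.2%}")
```

Note that e here is an absolute margin on the proportion itself, not a percentage of the point estimate.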

R-nought (R0)

To get an estimate for the R0 value for the virus (what is an R0 value?), I’ve taken the average of the R0 estimates from pre-print studies, with each study’s weight reduced by 2% for each day its data ends earlier than that of the latest published study. E.g. a study based on data up to January 22nd would be weighted 30% less than a study with data up to February 5th (15 days later). Studies used here can be found in the sources and calculations section on this page.

This weighted average gives us an R0 of 3.66.
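A minimal sketch of this kind of recency-weighted average is below. The two studies in the usage lines are hypothetical placeholders, not the pre-prints actually used. The sketch also assumes a linear weight of 1 - 0.02 * days, floored at zero, measured from the latest data cutoff date.

```python
from datetime import date


def recency_weighted_r0(studies, decay_per_day=0.02):
    """Weighted mean of R0 estimates, down-weighting older data.

    studies: list of (data_cutoff_date, r0_estimate) pairs.
    Each study loses `decay_per_day` of its weight for every day its
    data ends before the most recent cutoff (assumed linear, min 0).
    """
    latest = max(cutoff for cutoff, _ in studies)
    weighted_sum = 0.0
    total_weight = 0.0
    for cutoff, r0 in studies:
        days_earlier = (latest - cutoff).days
        weight = max(0.0, 1.0 - decay_per_day * days_earlier)
        weighted_sum += weight * r0
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical inputs for illustration only.
example_studies = [
    (date(2020, 1, 26), 2.0),  # data ends 10 days earlier -> weight 0.8
    (date(2020, 2, 5), 4.0),   # most recent data -> weight 1.0
]
print(f"weighted R0: {recency_weighted_r0(example_studies):.2f}")
```

A linear decay means any study more than 50 days older than the newest one drops out entirely. An exponential decay would be another reasonable choice.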

SIRD Model

With those two important factors calculated, we can create an SIRD model of the outbreak in the United States. (Several other inputs, such as the average latent period (days) and the average duration of infectiousness (days), are also needed and have been taken from pre-print studies. A more detailed description of the model inputs and their sources can be found here.)

To make the model realistic, I have made the additional assumption that containment measures would be put in place which would decrease the R0 of the virus. It has been calculated (Sanche et al.) that the measures the Chinese have taken have reduced the R0 of the virus in China by up to 59%. Using this same percentage reduction on our estimated R0, and assuming that these measures are put in place when 1% of the population is infected, gives us the SIRD model shown below.

This suggests that, if not contained, the outbreak would peak on July 4th, and ultimately 59.77% of the country would be infected at some point. The death toll is projected to be ~11 million, although there are very strong indications that the fatalities are heavily skewed towards the elderly.
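For readers who want to experiment, here is a simplified sketch of an SIRD model with the R0 reduction described above. It is not the model behind the projections and will not reproduce them exactly. Assumptions in this sketch: a population of 330 million, a 10-day infectious period, no latent period, a single initial case, simple Euler time steps, and measures that switch on once cumulative infections reach 1% of the population. Only the R0 of 3.66, the CFR of 5.26% and the 59% reduction come from the text.

```python
def simulate_sird(population, r0, cfr, infectious_days=10.0,
                  reduction=0.59, trigger_fraction=0.01,
                  initial_infected=1.0, days=730, dt=0.1):
    """Euler-integrated SIRD model with a one-time drop in R0.

    Returns a dict with the final S, I, R, D counts and the day on
    which the number of currently infected people peaked.
    """
    gamma = 1.0 / infectious_days  # rate at which infected people resolve
    s = population - initial_infected
    i, r, d = initial_infected, 0.0, 0.0
    measures_on = False
    peak_i, peak_day = i, 0.0
    for step in range(int(days / dt)):
        # Assumed trigger: cumulative infections reach 1% of the population.
        if not measures_on and (population - s) >= trigger_fraction * population:
            measures_on = True
        effective_r0 = r0 * (1.0 - reduction) if measures_on else r0
        beta = effective_r0 * gamma
        new_infections = beta * s * i / population * dt
        resolved = gamma * i * dt
        s -= new_infections
        i += new_infections - resolved
        r += (1.0 - cfr) * resolved  # survivors
        d += cfr * resolved          # deaths
        if i > peak_i:
            peak_i, peak_day = i, (step + 1) * dt
    return {"S": s, "I": i, "R": r, "D": d, "peak_day": peak_day}


result = simulate_sird(population=330e6, r0=3.66, cfr=0.0526)
attack_rate = 1.0 - result["S"] / 330e6
print(f"ever infected: {attack_rate:.1%}, deaths: {result['D']:,.0f}, "
      f"peak on day {result['peak_day']:.0f}")
```

The peak date and totals are sensitive to the infectious period and to when the measures switch on, so treat any single run as illustrative.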

This model will be updated on this page fairly regularly as data comes in.

Changes to the CFR and the R0 values are updated at least daily here.

On Twitter: @joshuafkon