The Pearson correlation matrix and time series plots above suggest that the disease spreads in an eerily similar way around the world once a country passes 500 confirmed cases. In nearly all instances we see slow initial growth in the number of infections, promptly followed by exponential growth. Given the data we have up to this point, it’s clear that the estimation method for this problem doesn’t need to be complex or elaborate.

COVID-19 Summary Statistics

We can take the interquartile range of the data above to create a reference table that outlines the number of new cases to be expected in the days after a country crosses 500 confirmed infections. This is likely the simplest, but also the most naive, approach we can use to approximate how infection counts may grow across countries. However, because it’s not specific to any single country, anyone reading this can apply it to their own country to model how the disease may grow within their own borders.
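As a rough illustration of how such a reference table could be assembled, here is a minimal sketch using pandas. It is not the code behind the table in this article. The function names, the column labels, and the toy numbers are all made up for the example, and it assumes the input is a table of cumulative confirmed cases with one row per date and one column per country.

```python
import pandas as pd


def days_since_threshold(cumulative: pd.DataFrame, threshold: int = 500) -> pd.DataFrame:
    """Re-index each country's series by days since it first reached `threshold`.

    Assumes `cumulative` has one row per date and one column per country.
    Countries that never reach the threshold are dropped.
    """
    aligned = {}
    for country, series in cumulative.items():
        crossed = (series >= threshold).to_numpy()
        if crossed.any():
            start = crossed.argmax()  # position of the first day at or above the threshold
            aligned[country] = series.iloc[start:].reset_index(drop=True)
    # Series of different lengths are padded with NaN.
    return pd.DataFrame(aligned)


def iqr_reference_table(aligned: pd.DataFrame) -> pd.DataFrame:
    """Per-day 25th, 50th, and 75th percentiles across countries (NaN is ignored)."""
    table = aligned.quantile([0.25, 0.5, 0.75], axis=1).T
    table.columns = ["q25", "median", "q75"]
    # Track how many countries back each row, since coverage thins out over time.
    table["n_countries"] = aligned.notna().sum(axis=1)
    table.index.name = "days_since_threshold"
    return table


# Toy data, invented purely for illustration (not real case counts).
toy = pd.DataFrame(
    {
        "A": [100, 300, 600, 900, 1500],
        "B": [400, 520, 800, 1300, 2100],
        "C": [50, 120, 260, 510, 700],
    },
    index=pd.date_range("2020-03-01", periods=5),
)

aligned = days_since_threshold(toy)
table = iqr_reference_table(aligned)
print(table)
```

The `n_countries` column is a design choice worth keeping in any real version: it makes visible how few countries support the later rows of the table, which matters when reading the projected ranges.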

From the 8th through the 10th of March, the number of confirmed cases in the United States grew in a manner consistent with the data from other countries with over 500 confirmed infections. If the disease continues to spread in the US in a similar manner, then by the 20th of March there could be between 6,975 and 14,270 confirmed coronavirus cases, and by the 25th of March there could be between 8,125 and 33,463 confirmed cases. That said, the number of observations decreases the further out we go: China, Japan, Italy, and South Korea are the only countries in the data with more than 15 days of data, so the expected error increases the further we project into the future.

There is one additional drawback: the table above assumes that testing is equivalent across countries, which we know is not true. Given the lack of testing, official statistics in the United States are likely chronically underreported, so we need something more specific to model confirmed coronavirus infections here in the States.

Parameter Estimation

In technical terms, we’ve built an autoregressive exogenous model (AR-X(1)), where we estimate the mean number of new cases on each additional day from the previous day’s value as well as the input from an exogenous but statistically associated trend. In this instance, our exogenous variable is the median number of confirmed cases for the other countries in the set. I computed the model by (1) taking all countries with at least 100 confirmed infections and (2) using the median number of cases per day across the sample to (3) estimate the expected number of cases in the United States up to the present day. The model was fit with Python’s statsmodels package (|AIC|: 3.70, |BIC|: 3.66, RSS: 0.43), and looks as follows: