

Update: Part 2 here: Update on Santa Clara CV19

Santa Clara County has been doing a great job at providing daily updates on the status of COVID-19 (henceforth CV19). Let’s see what we can do with that!

We are gonna do some cool stuff in this blog! We will:

Come up with a day over day “growth rate” of infections using an exponential regression

Do multiple scenario analysis to see the range of possibilities for # infected and # dead in the county (based on our actions)

Look at the age distribution of those deaths

Compare the growth rate against the Flu (which IMO is way handier than comparing R0), and

Talk about where we go from here

So let’s get to it!

First, the data so far as of March 9th, starting from Feb 28th, when we really start seeing new case reports daily:

Note that I was working on my phone for most of this, so apologies for ugly phone screenshots. I wanted to show you my work here – if you love it let me know, and if you think it is overkill let me know. I want everyone to be able to follow along!

Also note that this entire blog post is super back-of-napkin. I am a Data Scientist, not a Statistician nor an Epidemiologist. I’ll try to make my assumptions clear when I make them. Let me know any feedback or suggestions you have regarding my modeling in the comments!

Last, something to keep in mind for you non-Californians, Santa Clara County includes cities like San Jose, Mountain View, Palo Alto, Campbell, and naturally Santa Clara the city.

Fitting an Exponential Curve to Santa Clara County Data:

Let’s make a simple exponential curve and fit it to this data!

Exponential models are a common way to describe the early spread of a virus. Over a longer period of time, virus spreads tend to follow logistic curves of some form. These consist of essentially an “exponential” phase, then a “linear peak”, before tapering off.



We can see this if we look at the progression of CV19 in China right now:



Yup! China looks like a Logistic Curve! It goes from exponential growth to flattening out around the start of February!

Assumption: Based on the timeline of CV19 growth, we will assume that China started taking this seriously around December 31st, when they notified the WHO about the illness. Based on this, it looks like it took about 4 weeks for China to end the “exponential growth” phase and enter the “linear” phase. How do you think we will do? I will present some different cases below.

First, let’s fit an exponential curve to our local Santa Clara County data, using Feb 28th as our starting point:



R-squared – a measure of how much variance the model explained – looks great at 98.6%! That means it looks like Santa Clara County cases so far do indeed look like they are following exponential growth. Basically, I’m just saying “look at how well that curve goes through those points!” – exponential looks like a great fit.
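For anyone who wants to try this kind of fit themselves, here is a minimal Python sketch. It assumes the curve is fitted by ordinary least squares on the log of the case counts, which is one common way to fit an exponential; the post does not say which tool or method produced the chart. The function name `fit_exponential` is made up for illustration, and the series below is synthetic placeholder data, not the county’s numbers.

```python
import math

def fit_exponential(days, counts):
    """Fit counts ~ a * exp(b * day) via least squares on log(counts).

    Returns (a, b, r_squared), with R-squared measured in log space.
    """
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    b = sxy / sxx                      # slope of the log-linear fit
    log_a = mean_y - b * mean_x        # intercept of the log-linear fit
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(days, logs))
    ss_tot = sum((y - mean_y) ** 2 for y in logs)
    r_squared = 1 - ss_res / ss_tot
    return math.exp(log_a), b, r_squared

# Synthetic placeholder series (NOT real case counts): starts at 2 and
# grows by exactly 25% per day, so the fit should recover those values.
days = list(range(10))
counts = [2.0 * 1.25 ** d for d in days]

a, b, r_squared = fit_exponential(days, counts)
daily_multiple = math.exp(b)  # ratio between any two consecutive days
print(f"a = {a:.3f}, daily multiple = {daily_multiple:.3f}, R^2 = {r_squared:.4f}")
```

Swapping in real daily counts for `counts` gives the fitted day-over-day multiple directly as `exp(b)`.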

Understanding What This Means in Human Terms

Exponential growth is cool because it gives you a very human level understanding of what is going on day by day. What it means is that you can generally predict the number of cases on any day to be a fixed multiple of the day before it! This plot suggests that multiple is 1.255 – about a 25.5% local growth rate in Santa Clara County, day by day. Here’s how I got that (basically just take the fitted curve’s values on any two consecutive days and divide the later one by the earlier one, I’m too lazy to do Calculus):



A 25.5% growth rate!

This 25.5% number is on the high side, given the global estimate is in the 15% to 25% range, but it is quite possible that because Santa Clara is densely populated, its growth is faster than the current global growth overall.

Food for thought, think of this like investing money. If you are earning 25.5% interest, your money is doubling roughly every 2.72 years. So in virus terms on a daily scale, a 25.5% growth rate means the cases in Santa Clara County will double every ~2.72 days! If instead we follow a more conservative estimate using the international numbers of 15%, that corresponds to a doubling every 4.6 days. (These use the quick ln(2) / rate shortcut; compounding exactly, ln(2) / ln(1 + rate), gives about 3.05 and 4.96 days, so the shortcut runs slightly fast.) Here’s my math:
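As a stand-in for that arithmetic, here is a small Python sketch of the doubling-time calculation. It is a reconstruction under the assumption that the quoted figures come from dividing ln(2) by the growth rate; the function names are made up for illustration. It also shows the exact version for a rate that compounds once per day.

```python
import math

def doubling_time_shortcut(rate):
    """Quick approximation: ln(2) / rate (works best for small rates)."""
    return math.log(2) / rate

def doubling_time_exact(rate):
    """Exact doubling time when growth compounds once per period."""
    return math.log(2) / math.log(1 + rate)

for rate in (0.255, 0.15):
    print(
        f"{rate:.1%} growth: shortcut {doubling_time_shortcut(rate):.2f}, "
        f"exact {doubling_time_exact(rate):.2f} periods to double"
    )
```

The two versions agree closely for small rates and drift apart as the rate grows, which is why the gap is more visible at 25.5% than at 15%.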

How do we forecast with this?

How do we use this for forecasting the number of positive tests on a given day?

Let’s say for example that we want to predict March 10th using data we have today at March 9th. We have two options. One is to use the actual recorded number of cases today as a starting point for forecasting, the other is to use what the model predicted that today should have been as a starting point.

I prefer using today as a starting point, as I believe that the newest information is always the most relevant. Specifically, as of today, March 9th, there are 43 reported cases in Santa Clara County. For this, we can forecast the number of cases tomorrow as:

43 * .255 = 10.965 new cases March 10th

43 * 1.255 = 53.97 estimated total cases March 10th

So if exponential growth is holding, we would expect to see 11 new cases tomorrow, March 10th, for a total of about 54! If we consistently fall short of predictions, we are likely through the exponential phase already, which would be a good thing.
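The same one-day-ahead forecast can be written as a tiny Python helper. This is just a sketch of the arithmetic above, using the 43 reported cases and the 25.5% daily growth rate from this post; the function name is hypothetical.

```python
def forecast_next_day(current_total, daily_growth):
    """Return (expected new cases, expected total) for the next day."""
    new_cases = current_total * daily_growth
    return new_cases, current_total + new_cases

new_cases, total = forecast_next_day(43, 0.255)
print(f"Expected new cases: {new_cases:.3f}")
print(f"Expected total cases: {total:.3f}")
```

To forecast several days ahead, feed each day’s expected total back in as the next `current_total`.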

So how long will we be in the exponential growth phase?

Well, honestly – it could be tomorrow, or it could be a while! But let’s look at what happened in other countries for guidance.

Remember we have a reasonable way of forecasting the number of cases on subsequent days as long as we are in the “exponential phase” of virus growth. Also remember it took China about four weeks to exit exponential growth.

This is similar for South Korea, which had its first influx of cases around Jan 20th and managed to flatten its exponential growth out in early March – so maybe 5-6 weeks. I get this estimate by looking for the first period where # new cases per day starts to decline!

So, let’s play a game! It looks like the time to beat is 4 weeks to stop exponential growth. Do you think the USA can stop it faster than China and South Korea? Or will it continue to spread as in Italy? Let’s take a look at some different cases!

Assumption: The California government started its race on Feb 28th. This is when we started doing real testing, and when real cases started flowing in like in other countries.
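To keep the dates straight, here is a short Python sketch that turns the Feb 28th starting point into the end dates of the 4, 6, and 8 week windows used in the cases that follow. The year 2020 is assumed, and the names `START` and `window_end` are made up for illustration.

```python
from datetime import date, timedelta

# Assumed start of the "race": Feb 28th, 2020 (a leap year).
START = date(2020, 2, 28)

def window_end(weeks):
    """Date on which an exponential window of the given length ends."""
    return START + timedelta(weeks=weeks)

for weeks in (4, 6, 8):
    end = window_end(weeks)
    print(f"{weeks} weeks -> {end.isoformat()} ({(end - START).days} days)")
```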

Case 1: We control exponential growth as fast as China – 4 weeks



In theory, we have a pretty big head start on everyone. If we can stop exponential growth in four weeks from Feb 28th (aka by March 27), check out what that looks like on a chart:



By the end of the exponential growth phase, looks like about 1500 total people will ultimately test positive in Santa Clara County! Not great, but with a population of roughly 2 million, the vast majority of individuals will probably be safe at least for this initial “first CV19 season”. Even then, most of those 1500 who do get infected will not die either.

Case 2: We do it as fast as South Korea – 6 weeks

Now, let’s check out what a 6 week forecast would look like if we do not exit exponential growth:



Now, it looks like total cases in Santa Clara County hit 25,000 by April 10th if we do not stop exponential growth by then. That is getting worse for sure! But let’s take a look at one last scenario:



Case 3: The USA Totally Blows It – 8 weeks



If exponential growth continues for two months, essentially into April 24th, around when the temperature starts really warming up, this is what we are looking at:

Yikes! Over 300-thousand infected! That’s a lot – over 15% of the population. After this point, I have to imagine there just isn’t a high enough density of healthy people for exponential growth to possibly continue. But I’ll leave that to the Epidemiologists to tell you for sure.

Aside: I will treat this as a worst case scenario for the purposes of this blog, but note that some experts I saw in news articles somewhere estimated 40-70% of the population. I haven’t looked into that deeply enough to know how they justify their estimates however. Point being there are still some people who essentially think this growth period could be worse! Though admittedly they may be referring to a hypothetical (but likely) winter CV19 flare up.

I leave this aside here because I could be wrong, and exponential growth could indeed continue after 8 weeks. That scenario is a much worse one – and I will address it in the future if it looks like we are coming to that point!

Okay, so what does this mean in terms of how many die?

Estimates for Case Fatality Rate (CFR) – basically the standing proportion of infected who die – vary vastly by country. We usually use CFR to estimate the odds of you dying given you test positive for CV19. The WHO has a fairly high estimate at 3.4%. However, there does seem to be a consensus that currently 1% may be more realistic. This holds when taking into account more recent cases in China, and has also been endorsed by the NIH, which I consider a reputable source. On the brighter side, South Korea reports closer to a .5-.6% CFR, though admittedly the affected populations there are disproportionately young, and may not be representative of other countries. We will touch more on the effects of age in a bit.

Here is a full scenario analysis of what different “exponential windows” mean:

Wow! That’s quite a range of possibilities! The uncertainty is part of what makes this virus so uncomfortable for many. But we will learn a lot in the coming days, and we can incorporate that information over time to steadily reduce our uncertainty – and hopefully our fear.
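A rough version of that scenario grid can be rebuilt in a few lines of Python. This sketch multiplies the rounded case totals from the three cases above (1,500, 25,000, and 300,000) by the three CFR values mentioned in this post (.5%, 1%, and 3.4%). It is an approximation of the analysis, not a copy of the original table, and the variable names are made up for illustration.

```python
# Rounded case totals from the three exponential-window scenarios.
CASES = {"4 weeks": 1_500, "6 weeks": 25_000, "8 weeks": 300_000}

# CFR values discussed in the post, expressed as fractions.
CFRS = {"0.5%": 0.005, "1%": 0.01, "3.4%": 0.034}

def expected_deaths(cases, cfr):
    """Expected deaths = number of cases times case fatality rate."""
    return cases * cfr

table = {
    window: {label: expected_deaths(n, cfr) for label, cfr in CFRS.items()}
    for window, n in CASES.items()
}

for window, row in table.items():
    cells = ", ".join(f"CFR {label}: {deaths:,.1f}" for label, deaths in row.items())
    print(f"{window}: {cells}")
```

The best corner of the grid is the 4 week window at a .5% CFR, and the worst is the 8 week window at 3.4%.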

Who is most likely to die?

A lot of you are likely familiar with this figure:





This is out of the China CDC. Pretty strong evidence that Coronavirus is vastly more dangerous than the Flu. That said, we need to take this with a grain of salt.

Some back of napkin normalization: Remember that their estimate at the time of release of this figure was based on the 3.4% CFR. So in practice, you can divide the right hand side numbers by 3.4 for the 1% case, or divide roughly by 7 to get the rates if the .5% CFR is correct.
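Here is that normalization as a small Python sketch. It assumes, as stated above, that the published age-specific rates correspond to an overall CFR of 3.4%, and simply scales them in proportion. The function name and the 10% example rate are placeholders for illustration, not values from the figure.

```python
BASELINE_CFR = 3.4  # overall CFR (in percent) the figure is assumed to reflect

def rescale(rate_percent, target_overall_cfr):
    """Scale an age-specific rate to a different overall CFR (both in percent)."""
    return rate_percent * target_overall_cfr / BASELINE_CFR

placeholder_rate = 10.0  # hypothetical age-specific rate, in percent
for target in (1.0, 0.5):
    divisor = BASELINE_CFR / target
    scaled = rescale(placeholder_rate, target)
    print(f"Overall CFR {target}%: divide by {divisor:.1f} -> {scaled:.2f}%")
```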

That’s good for informational purposes, but what I really want to know is not just how many people died, but who those people are.

To figure this out, I weighted each age group’s death probability by the size of that age group, summed those up to get a total, and then divided each group’s contribution by that total. This gives me the “conditional probability” – essentially given a death, what age group is it most likely to fall under.

(Note, I reworked this part several times thanks to a comment I received – I believe I have a much more nuanced set of calculations now)

Using survey data, we can come up with population proportions by age (see [9]):

To calculate the conditional probability of a particular death falling into an age range, we have to do a couple steps. We first weight the CFR by age group – essentially in the table below multiply the Fraction by the CFR.

This is cool because it gives us a “Total weighted CFR estimate” based on the population proportions of Santa Clara! This essentially corresponds to the overall CFR we would see if we took the Chinese CDC numbers and adjusted them for the age distribution of our population in Santa Clara:

We get a weighted CFR of 1.2%!

Then, we can just see how much each age group’s weighted CFR contributed to the overall weighted CFR to get a % Fatalities by age group.

Over 80: 19.2%

Over 70: 50.4%

Over 60: 75.2%

Over 50: 87.4%

Over 40: 92.0%

Over 30: 95.6%

Over 20: 98.2%

Under 20: 1.8%

So we can calculate that over 87 percent of the people who die will be over 50! That’s not good! This means that the older population has to take much stricter precautions.

The 87% over 50 death rate essentially lines up with the news reports that I have seen at least. So be sure to convey the severity of the situation we have ahead of us to your parents and grandparents!
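For anyone who wants to redo the age weighting with their own numbers, the steps can be sketched in Python as follows. The age bands, population fractions, and CFR values below are toy placeholders, not the survey data or the China CDC figures, so the output will not match the percentages listed above. The sketch also assumes, like the calculation in this post, that every age group is equally likely to be infected.

```python
def fatality_shares(pop_fraction, cfr):
    """Return (overall weighted CFR, share of deaths per age group).

    pop_fraction: age group -> fraction of the population
    cfr:          age group -> case fatality rate for that group
    """
    weighted = {group: pop_fraction[group] * cfr[group] for group in pop_fraction}
    total = sum(weighted.values())          # overall weighted CFR
    shares = {group: w / total for group, w in weighted.items()}
    return total, shares

# Toy placeholder inputs (hypothetical, for illustration only).
pop_fraction = {"under 50": 0.70, "50-69": 0.20, "70+": 0.10}
cfr = {"under 50": 0.002, "50-69": 0.02, "70+": 0.10}

total_cfr, shares = fatality_shares(pop_fraction, cfr)
print(f"Weighted CFR: {total_cfr:.2%}")
for group, share in shares.items():
    print(f"{group}: {share:.1%} of deaths")
```

Each share answers the question “given a death, how likely is it to fall in this age group?”, and the shares always sum to 100%.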

How does CV19 growth compare to the Flu?

Let’s take a look at the growth rate for the 2019 Flu season, which is slightly worse than usual – to be conservative. There are a lot of numbers floating around the internet called “R0”s – the number of people each person is assumed to infect – but these are more useful for Epidemiological purposes than human understanding. So let’s compare the much easier to understand growth rates instead!



Here is where I am sourcing my data:





We actually have weekly estimates of infection! This is the rate of hospitalization, but roughly speaking the growth rate of hospitalization most likely follows the growth rate of the virus as a whole. I will stop at the end of 2019, where it looks like the curve just starts arcing down. I’m eyeballing it, but I would say it’s a pretty close guess to the actual end of the growth period. We can check by fitting an exponential curve to it!

Here is the raw data:

Here is the fitted curve:







Again, a really solid fit – so it looks like we’ve captured the exponential portion of the 2019 Flu effectively. Calculating the growth rate based on those parameters, I end up with a growth multiple of 1.05966. About 6%! Compare that to the most conservative CV19 estimate of 15%, or to my local estimate of 25.5%!



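Remember that a 25.5% growth rate implies a doubling every 2.72 or so days. A 6% growth rate means a doubling in ~11.5 days. That’s a huge difference! (Both figures use the quick ln(2) / rate shortcut; with exact compounding they come out closer to 3 and 12 days, which tells the same story.)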
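Another way to feel the gap is to compound each rate over the same stretch of time. The Python sketch below does that for a two week horizon, using the 25.5%, 15%, and roughly 6% daily rates quoted in this post. The 14 day horizon and the function name are arbitrary choices for illustration.

```python
def fold_increase(daily_rate, days):
    """How many times larger the count is after compounding for `days` days."""
    return (1 + daily_rate) ** days

HORIZON_DAYS = 14  # arbitrary illustration window

for label, rate in (("CV19 local", 0.255), ("CV19 global low", 0.15), ("Flu", 0.06)):
    print(f"{label}: x{fold_increase(rate, HORIZON_DAYS):.1f} after {HORIZON_DAYS} days")
```

Over two weeks the 6% rate a bit more than doubles the count, while the 25.5% rate multiplies it by roughly 24.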

What Can We Do?

So this CV19 outbreak in Santa Clara County looks like it will spread quickly, and will likely wreak havoc on the elderly if it reaches them. So what can we do?

The real question going forward is “What can our medical infrastructure handle?”. It’s hard to put an exact value on this. Without medical care, the best case .5% CFR will likely drift towards the 3.4% CFR that the WHO adopted based on early information in China, before a lot of people were being treated correctly. Across the scenarios above, this makes the difference between about 7 deaths (a .5% CFR on 1,500 cases) and 10,200 deaths (a 3.4% CFR on 300,000 cases) in Santa Clara County alone!



Practically speaking, what we are trying to do is this:







As long as we can slow down the rate at which CV19 spreads, we can prevent the hospitals from getting overwhelmed! We need to shorten our time in the “exponential growth phase” as much as possible, and need to take strong and decisive action to do so.



We need more tests, more hospital staffing, more medical supplies, more transparency, and more of a great deal of things! Individually we can all contribute by exercising extreme individual caution, and putting pressure on our local institutions and companies to prevent events and activities that bring a lot of people together! For now, our eyes are on the spread of CV19 in our communities, so as individuals one of the most important things we can do is be informed, and be safe!



I hope you learned something in this blog, and if you made it this far I really appreciate you. Please let me know any feedback, good or bad, in the comments. If you think it is worth it for me to keep you guys updated on this data, let me know!

Stay smart, stay safe!