This is a follow-up to my post from two days ago: A Detailed Analysis and Simple Model of Santa Clara County COVID-19 Cases

Boy, just in the last two days since I wrote the first post, I am seeing some really interesting new data. Specifically, the number of positive tests is short of what we would expect given the exponential growth we see elsewhere. I am still working on collecting more information, but I wanted to put down some initial thoughts here!

Importantly, a lot of you responded – correctly – saying that the model I used in the last post doesn’t separate the spread of the virus itself from growth in the amount of testing. Generally speaking, however, if the number of tests per day is at least increasing, the number of positive tests should be somewhat proportional to the number of infections.

Essentially, what that means is that if the number of positive tests declines, one of three things is true. Either one, the number of infections is decreasing. Or two, the number of tests is decreasing! Or possibly three, that the last few days were just anomalous. This is totally possible when you have small amounts of data!

I want to dive into the new information I found and try to consider it from multiple standpoints this time!

First, two cases were announced yesterday. Two! Followed by three cases just today, the 11th. Exponential growth – at least by the model I fit in my last post – would have predicted roughly five times that.

Why could this be?

Let’s dive into the three options now.

Option 1:

Testing is growing, and Santa Clara County has it under control

Personally, I would prefer this option to be true!

Take a look at what happens if I try to fit another exponential curve to the data:

The exponential curve had an R squared of 0.986 yesterday, but 0.952 today. A worse fit than it was before! This is because we had a run of relatively flat days before yesterday’s flat-out drop.
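To make the R squared comparison concrete, here is a minimal sketch of one common way to score an exponential fit: regress the log of the cumulative count on the day number and report R squared on the log scale. This is not necessarily how the numbers above were computed, and the counts in the code are synthetic (generated from a formula), not the county's data. The function name fit_exponential is made up for illustration.

```python
import math

def fit_exponential(counts):
    """Least-squares fit of log(count) = a + b * day. Returns (a, b, r_squared)."""
    days = list(range(len(counts)))
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(days, logs))
    ss_tot = sum((y - mean_y) ** 2 for y in logs)
    return a, b, 1.0 - ss_res / ss_tot

# Synthetic cumulative counts, NOT real data: ten days of pure exponential growth...
growing = [2 * math.exp(0.3 * d) for d in range(10)]
# ...followed by three days with only one new case each.
flattened = growing + [growing[-1] + k for k in (1, 2, 3)]

_, _, r2_growing = fit_exponential(growing)
_, _, r2_flattened = fit_exponential(flattened)
print(f"R^2 while growing:   {r2_growing:.3f}")
print(f"R^2 after flat days: {r2_flattened:.3f}")
```

The second R squared comes out lower than the first. That is the same effect described above: a few flat days pull the data away from a straight line on the log scale, so the exponential fit gets worse.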

To me, based on that picture, it looks like we are flattening out and looking more logistic. Compare with the China plot again:

Aside from the sudden jump allegedly caused by a change in methodology by the Chinese government, we can see that the Chinese case counts do appear to be logistic:

There isn’t a hard and fast rule for when you can say you are no longer exhibiting exponential behavior, but certainly the more days where we hear low numbers of new cases the better!

If we get another day or two like this, I’m gonna start fitting general logistic models to this instead of exponential ones!

With so little testing being done, though, it’s hard to be certain either way.

Of course, this only matters if testing is expanding:

For the number of positive tests to increase exponentially, our testing needs to be able to keep up!

At the very least, we need to keep doing a fixed, nondeclining number of tests per day in order to demonstrate this. Think about it: if you test a fixed thousand people a day, and illness is spreading exponentially, you would expect each day's sample of 1000 people to contain exponentially more positives over time. But naturally if both tests per day AND spread are increasing, we would expect an even faster exponential increase in positive tests!
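Here is a minimal sketch of that reasoning. It assumes the people tested are a random sample of the population, which is a big simplification since real testing targets people with symptoms. The prevalence and growth rates are hypothetical numbers picked for illustration.

```python
def expected_positives(tests_per_day, prevalence):
    """Expected positive tests per day if tested people are a random sample."""
    return [t * p for t, p in zip(tests_per_day, prevalence)]

days = 10
# Hypothetical infected share of the population, growing 25% per day.
prevalence = [0.001 * 1.25 ** d for d in range(days)]

fixed_tests = [1000] * days                              # a fixed thousand tests a day
growing_tests = [1000 * 1.10 ** d for d in range(days)]  # testing up 10% per day

fixed = expected_positives(fixed_tests, prevalence)
growing = expected_positives(growing_tests, prevalence)

# Day-over-day growth factor of positives under each testing scenario.
fixed_factor = fixed[1] / fixed[0]
growing_factor = growing[1] / growing[0]
print(f"fixed testing:   positives grow x{fixed_factor:.3f} per day")
print(f"growing testing: positives grow x{growing_factor:.3f} per day")
```

With fixed testing, positives grow at the same rate as the infection. With growing testing, the two growth factors multiply, so positives grow faster than the infection itself.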

Is there any evidence of this testing growth? There is a great effort by Jeffery Hammerbacher – another data scientist – and a bunch of volunteers to collect data on the number of tests performed in each state.

We don’t have county level data for Santa Clara County, so let’s look at California data as a proxy.

Looking at the latest numbers as of March 11th:

Really, I think it is only fair to consider days where new numbers of negatives are released, so that we have a ratio of positives to negatives that we know matches the truth. Thus, we can look at three key dates here, March 4th, 9th, and 11th, with cumulative test counts of 515, 804, and 1073 respectively.

Looking at it this way and treating the March 4th count of 515 as our starting point, there were 289 tests done from the 4th at 2pm to the 9th at 2pm, or 57.8 per day. From the 9th to the 11th, we pick up another 269 tests, or 134.5 per day.

By these numbers, the rate of tests in California has been accelerating!

And finally – for what it’s worth – this (admittedly now five day old) article says that California had the highest testing capacity of any state:

The Strongest Evidence Yet That America is Botching Coronavirus Testing

Though if true, we aren’t using it. By the Hammerbacher numbers, California is in second place, with about half as many tests actually performed as Washington state. Tests performed is a different thing from capacity. Strange. Anyways…

There is some evidence testing rates are increasing:

So quite possibly the latest data isn’t the product of a testing issue. At the very least, provided that each day we run more tests than the day before, the number of positive tests should be somewhat proportional to the number of infections in the public at large. Certainly if there is both exponentially more illness and more testing, we would not expect the number of new cases to be declining!

This could reflect several things if true – most importantly that CV19 actually is containable in American cities – provided you take early and drastic action. The Bay Area was lucky in that a lot of the major event cancellations and corporate remote work policies went into effect before things got out of hand, instead of during. This could have made all the difference!

If this is true, we still need to be super careful. Arguably even more so: if our efforts are working and we get complacent, they might not keep working for long! The possibility that an outbreak occurs if we attempt to return to normalcy is very real, and we should not underestimate it. If we managed to stem the exponential growth phase here we should count our blessings – not push our luck.

Last, if this pattern holds then other states that are in the very early stages of an outbreak need to pay attention now more than ever! States and localities where containment measures work are just as important to learn from as the ones where they do not.

–

That said, Option 1 assumes testing capacity and availability are expanding! But some of your responses pointed out that my belief that testing was increasing wasn’t necessarily true – and that there may be deeper issues with testing. So, I wanted to take a moment to consider that option and see if I could make an argument for it!

Option 2:

Testing is declining, meaning something really strange is happening with testing reporting, supply or capacity

Update: Based on reports of a reagent shortage (see here: https://www.latimes.com/california/story/2020-03-11/coronavirus-testing-kits-lack-key-ingredient-causing-confusion?_amp=true), Option 2 is unfortunately looking increasingly likely. RT-PCR test kits are usually reusable, but each test requires a set of consumables to run. If those are running short, the estimates from the first post might well hold up.

The “test kits” shipped by the CDC are essentially the hardware component. They also need a set of reagents to run, and it would appear this could well be the bottleneck.

Incidentally, this is what reports mean by “test capacity” – they have the hardware to perform the tests. Not the samples and/or the consumables!

– original section below –

Normally you would expect the amount of testing being done in the midst of an outbreak to go up as labs get up and running and test capacity increases. As long as this were true, you would expect the number of cases reported to be at least a little bit proportional to the true number of cases.

If testing suddenly went sharply down however, this would mean that – in fact – they would NOT be proportional. Decreases in testing would totally mean decreases in number of cases reported.
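To put a rough number on this, here is a small sketch under the same simplification that reported positives are roughly tests times prevalence. The growth rates are hypothetical, and both function names are made up for the example.

```python
def positives_growth_factor(infection_growth, testing_growth):
    """Daily growth factor of reported positives, assuming positives = tests x prevalence."""
    return infection_growth * testing_growth

def break_even_testing_factor(infection_growth):
    """Daily testing factor at which reported positives stay exactly flat."""
    return 1.0 / infection_growth

# Hypothetical: infections up 25% per day while testing falls 30% per day.
factor = positives_growth_factor(1.25, 0.70)
break_even = break_even_testing_factor(1.25)
print(f"reported positives change x{factor:.3f} per day")
print(f"testing factor that would keep positives flat: {break_even:.2f}")
```

In this made-up scenario the reported positives shrink every day even though infections are still growing 25% per day. Any testing decline steeper than the break-even factor would hide the growth entirely.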

That said, it seems unlikely there would be a sharp drop in testing, right?

But take a look at this CDC website:

Note that the period after March 5th is noted as “pending” – I am unsure what that means.

Are we doing less testing?

It seems unlikely, but stranger things have happened. (Update: perhaps not so unlikely now)

I think it is more likely that it is something to do with the “pending” aspect – but the fact that there are partial counts included there is interesting.

Keep an eye on that webpage though, because if the amount of testing we are doing over time actually IS declining, then it would totally mess with our forecasting of CV19 spread. And raise a lot of questions! If you see those results get finalized and they still show a decline in tests performed, we might be in trouble.

One interesting thing is that this is just data for public institutions! I wonder how private testing is coming along.

Option 3:

The last several days are anomalous, and exponential growth actually IS continuing

Whenever you are working with small amounts of data, you always run the risk of seeing signal in data where there is only noise – a problem known as overfitting. As such, we have to be careful whenever we see small variations, as it is easy to see a pattern where one might not exist.
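A quick simulation gives a feel for how much small daily counts can bounce around by chance alone. This is a sketch that assumes daily counts follow a Poisson distribution around the model's expected value. That is a modeling assumption, and real case counts are usually noisier than that. The expected value of 10 and the threshold of 3 are hypothetical.

```python
import math
import random

def poisson(rng, mean):
    """Draw one Poisson-distributed count (Knuth's method, fine for small means)."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def chance_of_low_day(expected_cases, threshold, trials=20_000, seed=0):
    """Estimate P(daily count <= threshold) when the true mean is expected_cases."""
    rng = random.Random(seed)
    low = sum(poisson(rng, expected_cases) <= threshold for _ in range(trials))
    return low / trials

# Hypothetical: a model expects 10 new cases, and we observe 3 or fewer.
p = chance_of_low_day(expected_cases=10, threshold=3)
print(f"estimated chance of seeing <= 3 cases: {p:.3f}")
```

Under these assumptions a single low day is uncommon but far from impossible. With only a handful of days to look at, an unlucky stretch can look a lot like a trend.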

So far, we have a lot of evidence from other countries that the number of positive tests consistently seems to follow exponential growth in the early phase of spread. It is quite possible that the last several days of Santa Clara County data are simply anomalous, and that we will soon see vastly larger numbers of positive tests reported daily.

This would mean we are very much still in exponential growth!

I wanted to take a moment to highlight the work by Logical Cliff, who not only put together a Google sheet for exponential extrapolations – but also is keeping them updated for Washington and New York to boot! If we are still in exponential growth, Cliff’s short term forecasts will be useful for knowing what the coming days will be like.

Check them out here: https://logicalcliff.wordpress.com/2020/03/11/predicting-the-covid-19-increase-using-exponential-extrapolation-3-11/

Naturally, my original blog post is a great place to start to understand the exponential forecasting done there and why it is important if you haven’t seen it: A Detailed Analysis and Simple Model of Santa Clara County COVID-19 Cases

What do you think?

Some really interesting data, wouldn’t you say?

I think the stronger evidence supports the view that the number of tests per day is increasing. If so, the fact that the number of positive tests is declining in spite of that is a promising sign.

It could well be that our efforts are paying off, and that we are doing a good job at flattening the curve. If so, whatever we have been doing is working, and we need to double down!

That said, I can’t rule out the possibility entirely that there are deeper issues with testing. If this is true, then it will throw a wrench in our forecasting efforts. Predictive models are unfortunately “garbage in, garbage out” most of the time, and are only as high quality as the data we give them.

At the very least I have presented two sources of information that can help us understand to what extent the original exponential model was reflecting testing growth vs virus spread.

Last, it is quite possible that our early efforts in Santa Clara may have stopped exponential growth temporarily – and the short term data is anomalous! In this case, exponential growth will continue, and we will need to double down on our efforts to test and prevent the spread.

I am super curious what we will see over the next few days. I am not yet quite sure what to make of this data between the three options, but I am trying to look into it more. I will let you know what I find!

If you have an opinion on which option is true, I’d also love to hear which one and why. Let me know in the comments!