The New York Times is Verifiably Wrong, Georgia is Doing Great on COVID-19

BJ Campbell · Apr 24

A tale of bad data scrapers and worse sampling bias.

Let me get out in front and say I don’t necessarily blame Nathaniel Lash and Gus Wezerek of the New York Times for making so many errors, because they’re basically doing what everyone else in the country is doing: wondering how the hell the state of Georgia is able to come off of lockdown first. They’re doing this because they don’t live here, and regional prejudices make it fun to make fun of Georgians, and this is an acceptable thing to the rest of the country. They did their best to justify the prevailing narrative with charts and graphs, and they assuredly got paid more for their article than I got paid for this one. But unfortunately for them, and for Georgians, their article is fundamentally wrong on several important points. And peeling those layers apart, we can see that not only is Georgia in great shape to leave lockdown, there may be several other states that are in similarly good shape. Here’s their article:

Let’s move through it.

Testing Rates

They open with an analysis on testing rates.

Georgia has one of the lowest testing rates in the nation Less than 1 percent of Georgians have been tested, compared to almost 4 percent of residents in New York and Louisiana.

This is a true statement, but they forgot to mention why we have low testing rates. We have low testing rates because the CDC does not recommend anyone get tested if they don’t have symptoms. It’s literally in the national guidelines. New York and Louisiana have high testing rates because they have more infected people than we have. So this graph indicates nothing about the nature of the infection in Georgia:

from NYT

This graph actually indicates something completely different: that our “confirmed case count” may not be as reliable as other states’ confirmed case counts are. And fine, we’ll come back to that.

Bad Data

One graph in their article synthesizes their case.

White House guidelines recommend that state officials wait for a “downward trajectory” over 14 days in either the number of new cases or the share of all tests for the virus that come back positive before they lift business restrictions. Georgia fails the first test. The number of new cases that its health department has announced each day has trended up over the past two weeks.

From NYT

Setting completely aside the fact that their trend line is hand drawn and doesn’t match the 7 day average at all, the statement itself is a lie. This graph is attributed to the Georgia Department of Public Health, but that is not true. We can know it’s not true by doing something truly wild and outrageous, such as going to the Georgia Department of Public Health website, and looking at their graphs. This is the exact same graph from the state government of Georgia:

From GA DPH, 4/24/2020

How can these two graphs be so different?

They are made from different data sets. The data set of the New York Times article didn’t come from the Georgia Department of Public Health at all. Instead, it came from the same data repository that their first graph came from, the COVID Tracking Project.

I like the COVID Tracking Project. It’s a fun website, and it’s a great central repository for nerds wanting to run graphs about COVID. Being a self-identified nerd who wants to run graphs about COVID, I go there a lot. But in order to understand the origins of the data, what we might call the metadata, we have to pay attention to how they’re getting it.

The Georgia Department of Public Health does not phone anything in to the COVID Tracking Project. The COVID Tracking Project does nothing more than scrape data regularly from other public websites. That’s all. They run some bot, which goes to the Georgia Department of Public Health website a couple times a day and looks for a box on a webpage that looks like this:

...and the bot takes some numbers out of that box, sticks them in a database file, and makes a note of the day. They get instantaneous test and death data by subtracting yesterday’s number from today’s number.
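The subtraction step can be sketched in a few lines of Python. This is only an illustration of the idea, not the COVID Tracking Project's actual code: the `snapshots` list, the `daily_increments` function, and all the numbers are made up for the example.

```python
from datetime import date

# Hypothetical cumulative case totals, one per day, as a scraper might
# record them. These numbers are invented for illustration.
snapshots = [
    (date(2020, 4, 20), 100),
    (date(2020, 4, 21), 130),
    (date(2020, 4, 22), 145),
]

def daily_increments(snapshots):
    """Turn cumulative totals into per-day counts: today's total minus yesterday's."""
    out = []
    for (_, prev_total), (day, total) in zip(snapshots, snapshots[1:]):
        out.append((day, total - prev_total))
    return out

increments = daily_increments(snapshots)
for day, new_cases in increments:
    print(day.isoformat(), new_cases)
```

Note that a count built this way assigns every newly published case to the day the total changed on the webpage, whatever day the test actually belongs to.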

This is not at all what the Georgia Department of Public Health is doing.

The Georgia Department of Public Health is taking information in from hospitals and other healthcare agencies as it’s reported, and then assigning the reported data back to the day it actually applies to. So if a hospital in Macon, Georgia faxes or emails some report from three days ago indicating X tests and Y deaths, GA DPH applies those to the case and death count from three days ago in their data pool.

This creates two divergences in the relative data pools of these graphs. For one, the “today” numbers in the GA DPH graph are going to look lower than they actually are, because not all agencies in the state have reported in today. But also, some data in the NY Times graph is too late, because that report from Macon three days ago gets added to today’s total for the COVID Tracking Project, and therefore NYT, data set.
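To make the two divergences concrete, here is a minimal sketch that counts the same set of reports two ways: by the day a report arrived (what a scraper effectively sees) and by the day the report applies to (the backdating described above). The `reports` data and both function names are hypothetical, and the numbers are invented.

```python
from collections import Counter

# Hypothetical reports: (day_received, day_the_data_applies_to, new_cases).
reports = [
    (1, 1, 10),
    (2, 2, 12),
    (4, 1, 5),   # arrives three days late, like the Macon example
    (4, 4, 8),
]

def by_received_day(reports):
    """Count each report on the day it showed up (scraper-style view)."""
    counts = Counter()
    for received, _, cases in reports:
        counts[received] += cases
    return dict(counts)

def by_event_day(reports):
    """Count each report on the day it applies to (backdated view)."""
    counts = Counter()
    for _, event, cases in reports:
        counts[event] += cases
    return dict(counts)

scraped_view = by_received_day(reports)
backdated_view = by_event_day(reports)
print(scraped_view)    # {1: 10, 2: 12, 4: 13}
print(backdated_view)  # {1: 15, 2: 12, 4: 8}
```

In this toy data the late report inflates day 4 in the scraped view, while the backdated view credits it to day 1 and leaves day 4 looking low until more reports come in.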

Both of those divergences might cancel out a bit and make the NYT data be a better representation of today, but the GA DPH data set is an obviously better depiction of the peak, which they’re calling April 14th. April 14th is 13 days prior to the day Georgia has announced a soft opening of most businesses, under excessive social distancing rules.

But that’s not even a fair estimate, because both graphs are a “seven day running average” of cases. In this calculation, each data point is the arithmetic mean of the prior seven days, which means if the slopes are relatively flat going both directions, the peak is going to lag about three or four days.
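The lag is easy to check on a toy series. The sketch below assumes a trailing window that includes the current day plus the six days before it, and uses a made-up symmetric peak; `trailing_mean` and the `daily` values are illustrative, not anyone's real data.

```python
def trailing_mean(values, window=7):
    """Mean of each day and the (window - 1) days before it.

    Returns None for days that do not yet have a full window of history.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

# Made-up daily counts: a symmetric rise and fall that peaks on day 10.
daily = [10 - abs(i - 10) for i in range(21)]
smoothed = trailing_mean(daily)

raw_peak_day = daily.index(max(daily))
valid = [(v, i) for i, v in enumerate(smoothed) if v is not None]
smoothed_peak_day = max(valid)[1]
lag = smoothed_peak_day - raw_peak_day
print(raw_peak_day, smoothed_peak_day, lag)  # 10 13 3
```

If the window instead covers only the seven days before the current day, the same peak shows up one day later still, which is where "three or four days" comes from.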

The Georgia Department of Public Health dataset, which is the better dataset, is actually calling the instantaneous peak of confirmed cases April 11th. That is 16 days before Georgia fully enacts Phase 1 of the reopening plan.
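The date arithmetic behind those numbers, as a quick sketch. The variable names are illustrative, the three-day adjustment is the low end of the "three or four days" estimate, and the reopening date is derived from the "13 days" figure rather than looked up.

```python
from datetime import date, timedelta

averaged_peak = date(2020, 4, 14)                 # peak of the 7-day average on the GA DPH chart
reopening = averaged_peak + timedelta(days=13)    # "13 days prior" puts the soft opening on April 27
smoothing_lag = timedelta(days=3)                 # assumed lag introduced by the running average

estimated_peak = averaged_peak - smoothing_lag
days_before_reopening = (reopening - estimated_peak).days

print(estimated_peak.isoformat())   # 2020-04-11
print(days_before_reopening)        # 16
```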

But The Peak Was Even Earlier

The problem with “confirmed cases” data sets, however, is deeper and more insidious. Because of the FDA’s intentional obstructionism and the CDC sending out their first batch of tests literally tainted with COVID-19 germs, the United States is way behind most other first world countries on testing. And that means that the “confirmed cases” aren’t a good indicator of actual cases at all. The climb of confirmed cases early on was more indicative of the ramp up of testing, not the ramp up of the disease.

The only good data we have to work with, unfortunately, are confirmed deaths. Using confirmed deaths, and an assumed Case Fatality Rate (CFR), and an assumed average duration between infection and death, we can back-calculate the actual case rate and the actual peak. Here is the “accurate” graph, from GA DPH:

From GA DPH, 4/24/2020

The true CFR for COVID-19 is still hard to determine, but whatever it turns out to be won’t matter for locating the peak, because dividing every day’s deaths by the same rate changes the height of the curve and not its timing. All that matters is the average duration between infection and death. Different sources say different things, generally 14 to 21 days from infection to death. If we pick the low end to be conservative, and throw in the 3 day adjustment to account for the seven day running average, the actual peak of COVID-19 infections in Georgia was very likely to be March 22nd. Over a month ago.
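A rough sketch of that back-calculation follows. Two things in it are assumptions rather than facts from the chart: the April 8 date is simply what the March 22nd result implies when worked backwards (22 March plus 14 plus 3 days), and the `deaths_by_day` numbers are invented to show that the choice of CFR does not move the peak. The function names are illustrative.

```python
from datetime import date, timedelta

def infection_peak(averaged_death_peak, infection_to_death_days=14, smoothing_lag_days=3):
    """Shift the peak of the averaged death curve back by both delays."""
    return averaged_death_peak - timedelta(days=infection_to_death_days + smoothing_lag_days)

# Implied by the arithmetic in the text, not read off the chart.
implied_death_peak = date(2020, 3, 22) + timedelta(days=14 + 3)
low_end = infection_peak(implied_death_peak, infection_to_death_days=14)
high_end = infection_peak(implied_death_peak, infection_to_death_days=21)

def implied_infections(deaths_by_day, cfr, infection_to_death_days):
    """Back-date each day's deaths and scale by 1 / CFR."""
    shift = timedelta(days=infection_to_death_days)
    return {day - shift: deaths / cfr for day, deaths in deaths_by_day.items()}

# Made-up death counts, only to show that CFR rescales the curve
# without changing which day is the maximum.
deaths_by_day = {date(2020, 4, 4): 20, date(2020, 4, 5): 35, date(2020, 4, 6): 25}
peaks = set()
for cfr in (0.005, 0.01, 0.02):
    curve = implied_infections(deaths_by_day, cfr, 14)
    peaks.add(max(curve, key=curve.get))

print(low_end.isoformat(), high_end.isoformat())  # 2020-03-22 2020-03-15
print(peaks)  # a single date, regardless of CFR
```

Using the 21 day end of the range instead of 14 pushes the estimate a week earlier, so the low end is the more cautious choice for this argument.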

This was the day before the Georgia ban on large gatherings, and eleven days before the Georgia order to shelter in place.

I live in Georgia. On March 22nd, we were doing a lot of very voluntary social distancing already, school districts were voluntarily closing, and many counties had moved towards taking their own measures. This truer projection of the COVID-19 infection rate peak seems to indicate, at least for Georgia, that the things we were doing on our own without State Government mandate at all were enough to control the spread of the disease.

And this also seems to match my own experiences. I know five Georgians who contracted COVID-19, but all of them were in that band in March.

Based on this analysis, it seems very likely that Georgia’s move to Phase 1 reopening of the economy is not only responsible, it is very likely to lead to very positive outcomes, because the restrictions in place for Phase 1 are going to be more significant than the restrictions we were voluntarily doing when we curbed the peak in March, all on our own. We just didn’t know we curbed it then, because the testing in our country is so poor.

And this makes me wonder how many other states should probably open up soon. I bet a lot.

Georgia has a PPE problem, and may have a “robust testing for healthcare workers” problem, but we’re doing a tremendous job of flattening the curve.