The lack of data is not necessary. It is a matter of prioritizing data collection, being willing to share data, and then doing the right kind of analytical modelling.

Only a week ago, President Trump announced a ban on European flights to reduce the spread of coronavirus. In the seven days that followed, the media have reported garden-variety statistics, and decisions have been based on infographics rather than reliable, nationally representative data. It is time to redo the math with accurate, unbiased data.

So, what data has been reported in the past seven days and what do we know? Here’s a short overview:

Number of reported/confirmed COVID-19 cases. Per city, state, country, region, and worldwide.

Reported deaths from COVID-19. Per city, state, country, region, and worldwide.

Number of countries reporting cases.

Data from China on infection rates and deaths.

Images of overwhelmed hospitals in northern Italy.

The defining graphic of the COVID-19 pandemic, “Flatten the Curve.”

The first three bullets can be summarized by the words “reported/confirmed.” In the last seven days, we’ve seen a snowball of news messages pop up reporting things like “200 new cases,” “total 1,900 cases,” and “two more die of coronavirus.”

We also see popups like “State x could have 16 more infections than officially reported,” “Coronavirus death rate still uncertain as mild cases go unreported,” “Limited data on coronavirus may be skewing assumptions,” and “Country x may be underreporting coronavirus numbers.”

Yet how does the number of cases compare to current influenza cases, for example? We don’t know. Are numbers adjusted for population to make comparisons easier and meaningful? The media simply reports new cases without context and warns the public to stay home to stop the spread of the virus.
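To make the point about population adjustment concrete, here is a minimal sketch in Python. The region names and counts are invented for illustration and are not real figures; the helper `cases_per_100k` is likewise hypothetical.

```python
def cases_per_100k(cases: int, population: int) -> float:
    """Return reported cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000

# Hypothetical regions: (reported cases, population). Not real data.
regions = {
    "Region A": (1_900, 19_000_000),
    "Region B": (200, 500_000),
}

rates = {name: cases_per_100k(c, p) for name, (c, p) in regions.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} cases per 100,000")
```

In this made-up example, the region with the larger headline count has the lower rate once population is taken into account, which is the kind of context a raw case number cannot convey.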

Our Current Narrative Lacks Crucial Data

We do have data, to some extent, from China and Italy. There are many questions about the accuracy of the data from China, particularly given the Chinese government’s initial denial of the illness. Yet data from China has been provided and was swiftly reported in academic journals.

Papers that normally take months to be carefully vetted, accepted, and published are now being published with remarkable speed. Less than a year ago, BMC Public Health published an article concluding that because several methods are used to calculate the key measures, comparisons between studies and countries are difficult. Yet statisticians are still making precisely the comparisons warned against, because in the COVID-19 era everything seems permitted when the baseline understanding of the disease is so limited.

In Italy, it appears that the crisis (for now) is localized in northern Italy, mostly in the Lombardy region. Why? Why aren’t patients being transferred to other regions? Are other regions in Italy also in crisis?

There have been explanations for why Lombardy has been hit harder than other regions, but these have not been the focus of reporting. In Italy, the media reports heartrending stories about providers forced to condemn patients to die because of a lack of intensive care beds, but little else beyond raw numbers of cases and deaths.

It’s this image of people dying that drives decisions. Northern Italy’s “hospital meltdown” is shocking and upsetting, but should not necessarily be alarming for the rest of the world. Northern Italy has been hit hard, but the rest of Italy is largely waiting at home like the rest of us.

The Idea of ‘Flatten The Curve’

The curve that went viral last week is based on an idea, not data. The idea is simple and intuitive: “Washing your hands or staying home if you’re sick can slow down new cases of illness, so the finite resources of our health-care system can handle a more steady flow of sick patients rather than a sudden deluge.” So we do it.

Then suddenly we find ourselves at home following news reports about “stocks tumbled nearly 13 percent on Monday,” the “downturn now set to be deeper than the financial crisis,” and “This Is How the Coronavirus Will Destroy the Economy.” Simultaneously, the media sends videos of “overwhelmed hospitals in Italy” into our family rooms, creating a multiplier effect for the “Stay home—it could save lives” movement, which leaves us thinking we are doing the right thing.

Simulation Modeling

The New York Times reports that in the United States and United Kingdom, national quarantine decisions were strongly influenced by a report from Imperial College that simulated the possible courses of the pandemic and the impact on each country’s health-care system. The simulation was based on the best available data, from Italy and China, but it required many assumptions because there are so many unknowns.

We counted approximately 20 such assumptions that have yet to be proven. The key unknown factors were:

How infectious the disease will be in the United States and United Kingdom.

The length of the incubation time.

Whether individuals are immune to re-infection in the short term.

The number of people who will require critical care.

The proportion of people in critical care who will die of the disease.

This modelling exercise assumes the rates from China and Italy are a) reported accurately and b) applicable to the United States. But it is unknown how country-specific factors like population density, use of public transportation, waste treatment, smoking rates, population age distribution, and others affect the applicability of the data to the United States.

Even if reported accurately, the rates from Italy and China will not precisely reflect what will happen in the United States. But are they close enough that our current decisions based on that data are correct?

That is an empirical question that can and should be quickly answered. Indeed, recent evidence from Germany suggests key model inputs may not be applicable in Germany, as Germany has a lower death rate than other countries. Does the virus behave differently in Germany? No. More likely, the denominator of the death rate is different because Germany has been testing more aggressively.
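The denominator effect can be shown with a small worked example. This is a sketch under stated assumptions: the outbreak size, death count, and detection shares below are invented for illustration, and `naive_fatality_rate` is a hypothetical helper, not a standard epidemiological function.

```python
def naive_fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """Deaths divided by confirmed cases, the figure usually reported."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return deaths / confirmed_cases

# Hypothetical scenario, not real data: two countries with an identical outbreak.
true_infections = 10_000
deaths = 100

# Assumed share of infections that testing actually detects in each country.
detection_share = {"narrow testing": 0.10, "broad testing": 0.50}

rates = {}
for label, share in detection_share.items():
    confirmed = round(true_infections * share)
    rates[label] = naive_fatality_rate(deaths, confirmed)
    print(f"{label}: {confirmed} confirmed, rate {rates[label]:.1%}")

# Rate among all infections, which neither country observes directly.
true_rate = deaths / true_infections
print(f"rate among all infections: {true_rate:.1%}")
```

With the same virus and the same number of deaths, the country that tests more widely reports a death rate several times lower, simply because it counts more of its mild cases.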

Huge Need for More Accurate Data

So, what data do we need in the next seven days? If the past seven and more days were the “do the math” era, the next seven and more days should be about “redo the math.” Is it possible? Yes, it is. Epidemiology, statistics, and health economics are all based on data. With the right data we can make better decisions.

Better data would give us:

Number of actual COVID-19 cases, per city, state, country, region, and worldwide.

Actual deaths from COVID-19, per city, state, country, region, and worldwide.

Curves and graphics based on actual data—how much does the curve need to be flattened to avoid Northern Italy’s fate?

This does not require massive country-wide testing, as Germany has done. In the United States we could either quickly draw new samples from across the nation or use one of the many large representative samples of the population covered in ongoing surveys like the Current Population Survey (CPS) or the Medical Expenditure Panel Survey.

It should be possible to use these standing panels and do COVID-19 testing with representative samples of the population. Researchers can stratify or oversample in metropolitan areas or specific regions if needed.

The Centers for Disease Control and Prevention could run tests with representative samples and repeat them every three days in the coming weeks. The relatively small number of tests diverted to gather data wouldn’t meaningfully detract from ongoing treatment. Given the trillion-dollar relief package proposed in Congress, it is reasonable to spend some tax dollars on testing to gather valuable data.
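How test results from a stratified sample could be turned into a population estimate can be sketched as follows. The strata and counts are invented for illustration, the function `stratified_prevalence` is hypothetical, and the interval uses a simple normal approximation; a real survey analysis would use the survey’s own weights and design adjustments.

```python
import math

def stratified_prevalence(strata):
    """Estimate overall prevalence from a stratified random sample.

    Each stratum is a dict with keys:
      population: number of people in the stratum
      tested:     number of people sampled and tested
      positive:   number of positive tests
    Returns (estimate, standard_error), weighting each stratum by its
    population share so that oversampled strata do not bias the result.
    """
    total_pop = sum(s["population"] for s in strata)
    estimate = 0.0
    variance = 0.0
    for s in strata:
        weight = s["population"] / total_pop
        p = s["positive"] / s["tested"]
        estimate += weight * p
        # Binomial variance of the stratum proportion; ignores the finite
        # population correction, which is negligible for small samples.
        variance += weight ** 2 * p * (1 - p) / s["tested"]
    return estimate, math.sqrt(variance)

# Hypothetical strata, not real data. The first (metro) stratum is oversampled.
strata = [
    {"population": 8_000_000, "tested": 2_000, "positive": 60},
    {"population": 32_000_000, "tested": 2_000, "positive": 20},
]

est, se = stratified_prevalence(strata)
low, high = est - 1.96 * se, est + 1.96 * se
print(f"estimated prevalence: {est:.2%} (95% CI {low:.2%} to {high:.2%})")

# Pooling the raw counts without weights would overstate prevalence here.
unweighted = sum(s["positive"] for s in strata) / sum(s["tested"] for s in strata)
print(f"unweighted pooled share: {unweighted:.2%}")
```

The example also shows why weighting matters when metropolitan areas are oversampled: the pooled share of positive tests is higher than the population-weighted estimate.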

We Need to Redo the Math

With better data, it will be easy to redo the math accurately. We could draw curves based on actual data, calculate incidence rates on actual data, and most of all it would be data drawn from a representative sample of the U.S. population.

Therefore, we could better analyze risk factors and individual characteristics based on people who live in our country and use our health-care system. If we are smart enough to use the same methodology in different countries and compare notes across the world, we may actually get to a place where better decision-making is possible, and quickly. European countries have equivalents of the Centers for Disease Control, and most of them have ongoing surveys with nationally representative samples.

The lack of data is not necessary. It is a matter of prioritizing data collection, being willing to share data, and then doing the right kind of analytical modelling. Finally, we need the media to report based on data rather than the narrative.

“Without data you’re just another person with an opinion.” These are the words of W. Edwards Deming, who helped develop the sampling techniques still used by the U.S. Census Bureau and the Bureau of Labor Statistics.

His words emphasize the kind of damage that societal naiveté and opinions can do. It is time for statisticians, health economists, data scientists, epidemiologists, and other scientists to unite and say “It’s time for better data so we can redo the math.”