When it comes to the novel coronavirus, few data points are as reliable as they might appear. Whether it’s the total number of people infected, the fatality rate, or even which drugs might show therapeutic promise — uncertainty confounds our understanding. Things are seldom as straightforward as we would like them to be.

The latest example of this dynamic came on Tuesday, when the Centers for Disease Control and Prevention released new data about how underlying health conditions can affect individuals’ response to COVID-19. At first glance, the report suggested that people with those health conditions — such as diabetes, chronic lung disease and heart disease — were at a higher risk for severe disease or death from COVID-19. That finding is consistent with more anecdotal studies of COVID-19 patients in both China and Italy.

But as you go deeper into the data, the limitations of the finding become clearer. As of March 28, the CDC had collected data for 7,162 COVID-19 positive patients that included information on whether they had underlying health conditions or other known risk factors. That is just 5.8 percent of the 122,653 total U.S. COVID-19 cases reported to the CDC as of that time. But the size of the data set isn’t the biggest concern. The biggest concern is that we don’t know whether that subset is representative of the population of infected individuals. Without knowing that, we can’t tell whether the findings hold for all patients, either.
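For readers who want to check the arithmetic, here is a minimal sketch of that calculation. The two counts come from the CDC figures cited above; the variable names are our own.

```python
# Counts as cited from the CDC report (data as of March 28).
cases_with_condition_data = 7_162   # patients with information on underlying conditions
total_reported_cases = 122_653      # all U.S. COVID-19 cases reported to the CDC

# Share of reported cases that made it into the analyzed subset.
share_with_data = cases_with_condition_data / total_reported_cases
# Everyone else: cases with no usable data on underlying conditions.
share_without_data = 1 - share_with_data

print(f"Cases with data on underlying conditions: {share_with_data:.1%}")
print(f"Cases without that data: {share_without_data:.1%}")
```

The second figure is the one that matters most: it is the portion of patients about whom the report can say nothing directly.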

In that subset of 7,162 cases, 37.6 percent of people had one or more underlying conditions. The CDC looked at three outcomes (hospitalization, admission to an intensive care unit and death) and calculated what percentage of patients with each outcome had comorbidities. (There were also 525 people for whom hospitalization status was unknown.) They found that 71 percent of those hospitalized, and 78 percent of ICU patients, had one or more underlying conditions, compared with only 27 percent of those who weren’t hospitalized. They also found that of the 184 deaths in the sample, 173 (94 percent) had at least one underlying condition.

Those are highly suggestive numbers when taken out of context. And, sure enough, a few headlines did just that: “More than 70% of Americans hospitalized with COVID-19 had at least 1 underlying health condition,” wrote Business Insider. “Nearly 80% of US intensive-care cases have underlying conditions,” wrote Nature, one of the world’s premier scientific journals.

A more accurate headline might be: “Nearly 80 percent of a likely unrepresentative group of COVID-19 patients who went into intensive care had underlying conditions.”

Not quite the same effect, is it?

The CDC is transparent about the limitations of its data set, including the potential selection bias. Because the sample of patients the CDC looked at wasn’t randomly selected, the 94.2 percent of COVID-19 patients with no underlying health data can’t be assumed to be similar to those for whom that data exists. And because this group may not be representative of the whole set of COVID-19 patients, we can’t assume that any of the statistics based on the 5.8 percent sample hold true for all patients. For example, these 5.8 percent may be more likely to be severe cases, and might be both more likely to have underlying conditions in the first place and more likely to have outcomes such as hospitalization, intensive care or death. The authors also note that reporting occurred too early to fully capture all of the outcomes, meaning that some patients, both with and without comorbidities, might still experience worsening outcomes, which would also change the conclusions. We won’t know for sure until we have better data.
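To see how selection bias of this kind could distort a headline number, consider a toy calculation. Every figure in it is invented for illustration and none comes from the CDC report: it assumes a hypothetical group of ICU patients, half of whom have an underlying condition, and assumes that health-history data is more likely to be recorded for patients who have one.

```python
# All numbers here are hypothetical, chosen only to illustrate the mechanism.

# Imagined full population of ICU patients.
icu_with_condition = 500
icu_without_condition = 500

# Assumed chance that a patient's health-history data gets recorded.
# The gap between these two rates is the selection bias.
p_recorded_with = 0.10
p_recorded_without = 0.03

# Expected number of patients from each group who land in the sample.
sampled_with = icu_with_condition * p_recorded_with
sampled_without = icu_without_condition * p_recorded_without

# Share with a condition: in the full population vs. in the sample.
true_share = icu_with_condition / (icu_with_condition + icu_without_condition)
observed_share = sampled_with / (sampled_with + sampled_without)

print(f"True share of ICU patients with a condition: {true_share:.0%}")
print(f"Share seen in the non-random sample: {observed_share:.0%}")
```

Under these made-up assumptions, a sample would show roughly three-quarters of ICU patients with underlying conditions even though the true share is one-half. Nothing here says the CDC data is skewed this way or by this much. The point is only that, without knowing how the sample was selected, a figure like this can’t be generalized with confidence.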

We reached out to the CDC researchers to ask why data on underlying conditions and comorbidities were so hard to get, but they did not immediately respond to requests for comment. It appears that at least some of the missing data can be explained by a lack of time. The authors indicate that the “reporting burden associated with rapidly rising case counts and delays in completion of information requiring medical chart review” are most to blame.

The limitations of this data don’t change what we should all know by now: Those with underlying conditions should still be especially careful to avoid contracting the disease. And just because this data set isn’t perfect doesn’t mean it has nothing to say: Especially since it agrees with previous studies, it does suggest people with underlying conditions are at increased risk. It’s just that we still don’t know how much more danger they’re in, despite headlines suggesting otherwise.