Over the years, Rosenfeld’s team has fine-tuned each of its methods to predict the trajectory of the flu with near-perfect accuracy. At the end of each flu season, the CDC always retroactively updates final numbers, giving the CMU lab a chance to see how their projections stack up. The researchers are now adapting all the techniques for Covid-19, but each will pose distinct challenges.

For the machine-learning-based nowcast, many of the data sources will be the same, but the prediction model will be different. The algorithms will need to learn new correlations between the signals in the data and the ground truth. One reason: there’s far greater panic around coronavirus, which causes a completely different pattern of online activity. People will look for coronavirus-related information at much higher rates, even if they feel fine, making it more difficult to tell who may already have symptoms.
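The core idea behind such a nowcast can be sketched in a few lines: fit a model mapping an online-activity signal to reported case rates, then apply it to the current week's signal, for which no official count yet exists. This is a minimal illustration, not the lab's actual system; the signal, the numbers, and the single-predictor linear model are all assumptions made for the example.

```python
# Minimal sketch of a signal-based nowcast: learn the correlation between
# an online-activity signal and reported case rates, then estimate the
# current (not-yet-reported) week. All values are hypothetical.

def fit_linear(xs, ys):
    """Ordinary least squares for one predictor: y ≈ a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Historical weeks: hypothetical search volume vs. confirmed case rate.
search_volume = [1.0, 2.0, 3.0, 4.0]
case_rate = [10.0, 21.0, 29.0, 41.0]

a, b = fit_linear(search_volume, case_rate)
nowcast = a * 5.0 + b  # estimate for the current week's unreported cases
```

The point of the article's caveat is visible here: if panic drives search volume up independently of illness, the learned coefficients no longer hold, and the model must be refit against new ground truth.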

In a pandemic situation, there is also very little historical data, which will affect both forecasts. The flu happens on a highly regular cycle each year, while pandemics are erratic and rare. The last pandemic—H1N1 in 2009—also had very different characteristics, primarily affecting younger rather than elderly populations. The Covid-19 outbreak has been precisely the opposite, with older patients facing the highest risk. On top of that, the surveillance systems for tracking cases weren’t fully developed back then.

“That’s the part that I think is going to be the most challenging,” says Rosenfeld, “because machine-learning systems, in their nature, learn from examples.” He’s hopeful that the crowdsourcing method may be more resilient. On the one hand, little is known about how it will fare in pandemic forecasting. “On the other hand, people are actually quite good at adjusting to novel circumstances,” he says.

Rosenfeld’s team is now actively working on ways to make these predictions as good as possible. Flu-testing labs are already beginning to transition to Covid-19 testing and reporting results to the CDC. The CMU lab is also reaching out to other organizations to get as much rich and accurate data as possible—things like anonymized, aggregated statistics from electronic health records and purchasing patterns for anti-fever medication—to find sharper signals to train its algorithms.

To compensate for the lack of historical data from previous pandemics, the team is relying on older data from the current pandemic. It’s looking to incorporate data from countries that were hit earlier and will update its machine-learning models as more accurate data is retroactively posted. At the end of every week, the lab will get a report from the CDC with the most up-to-date trajectory of cases in the US, including revisions on numbers from previous weeks. The lab will then revise its models to close the gaps between the original predictions and the rolling statistics.
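The weekly revision cycle described above amounts to a simple bookkeeping loop: overwrite past weeks with corrected counts as they arrive, then retrain against the corrected series. The sketch below shows only that bookkeeping step, with invented numbers; the actual retraining machinery is omitted.

```python
# Sketch of folding in weekly revisions: keep the counts as first reported,
# overwrite past weeks with corrected values, and measure the gap the model
# must close. All numbers are invented for illustration.

reported = {1: 100, 2: 150, 3: 200}   # weekly counts as first reported
revisions = {1: 110, 2: 160}          # later corrections for weeks 1 and 2

def apply_revisions(history, updates):
    """Return a corrected copy of the weekly counts."""
    corrected = dict(history)
    corrected.update(updates)
    return corrected

corrected = apply_revisions(reported, revisions)

# The gap between original and revised numbers is what retraining targets.
gap = {week: corrected[week] - reported[week] for week in reported}
```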

Rosenfeld worries about the limitations of these forecasts. There is far more uncertainty than he’s usually comfortable with: for every prediction the lab provides to the CDC, it will include a range of possibilities. “We’re not going to tell you what’s going to happen,” he says. “What we tell you is what are the things that can happen and how likely is each one of them.”
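One common way to express a forecast as a set of possible outcomes with likelihoods is to run many simulations and report quantiles of the results. The sketch below assumes a deliberately toy growth model with made-up parameters; it illustrates the reporting format, not the lab's method.

```python
# Sketch of a probabilistic forecast: simulate many possible next weeks,
# then summarize the spread with quantiles instead of a single number.
# The growth model and its parameters are purely illustrative.
import random

def simulate_weekly_cases(current=1000, growth=1.2, noise=0.3, seed=None):
    """One hypothetical draw of next week's case count."""
    rng = random.Random(seed)
    return current * growth * (1 + rng.uniform(-noise, noise))

draws = sorted(simulate_weekly_cases(seed=i) for i in range(1000))

def quantile(sorted_vals, q):
    """Nearest-rank quantile of an already-sorted list."""
    return sorted_vals[int(q * (len(sorted_vals) - 1))]

forecast = {
    "10%": quantile(draws, 0.10),  # optimistic end of the range
    "50%": quantile(draws, 0.50),  # median outcome
    "90%": quantile(draws, 0.90),  # pessimistic end of the range
}
```

The CDC can then read the forecast as "these are the outcomes that can happen, and how likely each one is," rather than as a single trajectory.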

Even after the pandemic is over, the uncertainty won’t go away. “It will be very difficult to tell how good our methods are,” he says. “You could be accurate for the wrong reasons. You could be inaccurate for the wrong reasons. Because you have only one season to test it on, you can’t really draw any strong, robust conclusions about your methodology.”

But in spite of all these challenges, Rosenfeld believes the work will be worthwhile in informing the CDC and improving the agency’s preparation. “I can do the best I can now,” he says. “It’s better than not having anything.”