In late 2018, a paper in the journal Nature Microbiology proposed a hypothetical scenario: A young man comes into a Miami hospital with flu-like symptoms and dies of a mysterious pneumonia within days. As more people, including medical workers, fall ill, scientists sequence the virus’ genomes and trace the outbreak to a strain of coronavirus in wild ducks near Palm Beach. A combination of traditional and genomic epidemiological tools reveals that the virus is transmitted through the air and via individual “superspreaders.” In the end, the disease peaks at around 2,000 cases in a handful of countries, thanks in no small part to what scientists learned when they looked inside the virus’ genome.

The story sounds eerily familiar, even if it doesn’t quite parallel the ongoing fight against the COVID-19 pandemic. Within weeks of first diagnosing a mysterious pneumonia in the city of Wuhan, Chinese researchers had already sequenced the virus’ genome and released it to the public. This was supposed to be the first step in containing the disease: Tracking the virus’ genome from the moment it began to spread could tell us how fast it was circulating, and possibly how to stop it. Virologists across the world began to explore the data and mapped the coronavirus’ spread in near real time.

But this is where the hypothetical scenario falls apart. The novel coronavirus is more infectious, and harder to detect, than the one described in the 2018 paper, said Nathan Grubaugh, its lead author. And a sluggish response compounded the issue. Even the most revolutionary science needs data as fuel, and in the United States, that didn’t come until too late. The relative lack of early testing in the U.S. meant that epidemiologists and local governments didn’t know how many cases there were, or where new ones might be coming from. Genomic tools might have helped public health officials contain a small, well-monitored outbreak. Instead, they’ve highlighted how far testing has lagged behind the disease.

“We assumed preparedness,” said Grubaugh, an epidemiologist at Yale’s School of Public Health. “Which is, looking back on it, incredibly naive.”

The first sign that viral genomics might transform epidemiology came in the wake of the 2013-16 Ebola epidemic in West Africa. For the first time, on-the-ground, same-day genetic sequencing allowed epidemiologists to examine the genome of the virus as it spread.

In one data visualization, a series of black lines curled out across a map of Guinea, Sierra Leone and Liberia, the hardest-hit countries. The black lines jumped from city to city, then country to country, tracking the transmission of the deadly fever, like something out of a movie about a pandemic. Next to the map, though, was something you don’t usually see: a family tree, showing the Ebola genome as it mutated from host to host.

Biostatisticians had used viral genomes to piece together the family tree of the Ebola outbreak, and then used that family tree to map the Ebola virus as it spread. The detail and scope of the map were unprecedented: researchers had examined the genomes of 5 percent of the nearly 30,000 known cases.

“The West African Ebola outbreak was a watershed moment,” said Duncan MacCannell, the chief science officer for the Advanced Molecular Detection program, an initiative at the Centers for Disease Control and Prevention that brings together advanced technological research and responses to infectious disease. Papers at the time argued that genome-driven technologies would revolutionize global responses to outbreaks.

As one review in Nature noted, on-the-ground sequencing and genomic analysis rolled out several months into the Ebola epidemic. Though these methods changed the public health response in real time, by showing that sexual contact and survivors were spreading the disease, their full potential was unclear. “Had [such tools] been deployed earlier, we can only speculate as to their potential impact,” researchers wrote.

In the past, the “detective work” of outbreak response was a purely “shoe-leather” affair, said Verity Hill, a doctoral student at the University of Edinburgh who produced an early model of the novel coronavirus’ spread in China. In that traditional, field-based approach, researchers interview infected people and map contact networks to track transmissions. Now, scientists can see those kinds of connections in a virus’ genetic code.

As a virus jumps from host to host, it mutates slightly, a handful of nucleotides at a time. Future infections will inherit some of those mutations. The mutations rarely change the pathogen’s virulence, but they give researchers a way to place the virus on a family tree, just as a biologist builds a tree of closely related animals.

An epidemic’s family tree then lets researchers sketch a portrait of how the virus spreads from person to person. Your virus’ genome is likely to be very similar to that of whoever infected you. And if it’s more similar to an infection from Washington state than from China, you have a clue as to how it may have gotten to you in the first place.
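The comparison described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the sequences are made up and far shorter than a real viral genome, the function names are hypothetical, and a simple count of differing letters stands in for the statistical tree-building methods researchers actually use.

```python
def count_differences(genome_a: str, genome_b: str) -> int:
    """Count the positions where two equal-length sequences differ."""
    if len(genome_a) != len(genome_b):
        raise ValueError("sequences must be the same length to compare")
    return sum(a != b for a, b in zip(genome_a, genome_b))


def closest_source(patient: str, references: dict[str, str]) -> str:
    """Return the name of the reference sequence most similar to the patient's."""
    return min(references, key=lambda name: count_differences(patient, references[name]))


# Made-up toy sequences, for illustration only.
references = {
    "Washington state": "ACGTTGCAAGCT",
    "China": "ACGTAGCAAGGT",
}
patient = "ACGTTGCAAGCA"

for name, sequence in references.items():
    print(name, count_differences(patient, sequence))

print("Most similar:", closest_source(patient, references))
```

In this toy case the patient’s sequence differs from the Washington state sample at one position and from the China sample at three, so the Washington sample is the closer relative. As the article goes on to note, a clue like this still has to be checked against travel histories and dates of illness.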

In a fast-spreading outbreak, these findings can help uncover new paths of transmission. Public health officials can use viral genomics to figure out how new cases are arriving, and can then screen travelers and isolate suspected cases before they spread locally. And unlike contact tracing, the genomic picture of the outbreak can connect the dots between patients who don’t know they’ve interacted, and it doesn’t rely on a patient’s memory. But that’s most useful when there are few local cases of the disease, said Gytis Dudas, an evolutionary biologist who created the Ebola visualization.

There are limitations, though. Researchers still need shoe-leather epidemiology to know, for example, where patients have traveled and when they fell ill, which adds clarity to potentially murky genomic data. The richest insights come from the pairing of traditional and genomic research.

In the years between the Ebola and COVID-19 outbreaks, scientists built the infrastructure necessary to harness those genomic tools. Today, there are open-access platforms for sharing genome sequences and open-source tools for analyzing that data. Genomics has become integral in responding to and tracking food-borne illnesses and the flu.

So by the time the novel coronavirus emerged in late 2019, the global health community was ready to dig into its genes.

At least one researcher was disturbed by what he saw. On Jan. 31, Trevor Bedford, a computational biologist at Seattle’s Fred Hutchinson Cancer Research Center, wrote a blog post in which he described many closely related strains of the virus coming out of China and southeast Asia in mid-January: evidence that the virus was spreading quickly between patients without much time to mutate. “[The extent of that spread] couldn’t have been uncovered without immediate testing of hundreds of thousands of people soon after detection,” said Dudas.

At the time, the World Health Organization had yet to definitively state that widespread person-to-person transmission was driving the epidemic, telling STAT News that “scattered” transmission might still account for most cases. “I spent the week of Jan. 20 alerting every public health official I know,” Bedford wrote.

Although it’s now obvious that the novel coronavirus spreads quickly between humans, that information was critical for guiding an early response. If the disease was transmitted repeatedly by animals — as in the case of the mosquito-borne Zika or MERS, another deadly illness linked to a coronavirus — the answer would have been to limit contact with the host animals. Otherwise, more dramatic quarantine measures would be needed.

According to Dudas, governments around the world had all the signals they needed from early analyses of viral family trees. “The moment there was undetected transmission [in east Asia], it was a clear sign that bringing [the virus] under control is guaranteed to be difficult,” he said. But governments, including that of the United States, waited too long to act. Rather than stockpiling testing equipment and screening travelers at airports as the portrait painted by genomics grew more dire, they sat “hypnotized.”

“The first step in responding to an outbreak like this is really understanding where your cases are, and how many you have,” MacCannell said. In other words, the first step is to run lots of tests. In the first weeks of the American outbreak, the CDC was able to sequence every virus it received within two days — but it received fewer than 100 tests in the first place.

He added later that the virus is particularly difficult to track from its genome, as it mutates more slowly than other diseases. And because it spreads so quickly, even a few days of lag time in testing and sequencing can be too slow to contain an outbreak.

By the time genomic epidemiologists had access to a large number of American cases, the virus was already quietly spreading across the country. In late February, Bedford’s lab analyzed a COVID-19 case found in the Seattle suburbs. Their findings suggested that the virus had been in the Seattle area for over a month, and that between 80 and 1,500 people were infected without knowing it.

As testing has become more widespread and the CDC rolls out a national sequencing partnership, studies have begun to show missed opportunities to contain the virus. Research released this month found that the virus arrived in New York City from Europe weeks before anyone in the city tested positive — and close to a month before travel restrictions were placed on European flights.

Meanwhile, Grubaugh’s lab has a paper in progress that suggests seven of nine early cases in southern Connecticut were closely related to domestic outbreaks, mostly in Washington state, while one case was linked to a recent import from China and another to one from Europe. (The team also tested the hypothesis using air travel data, which showed that more passengers were coming to the region from domestic hotspots than from abroad.) “We need to be more worried about what’s happening within the United States,” he said. “This is a domestic problem.”

Eventually, these findings will help bring future waves of the coronavirus under control, MacCannell said. “The one thing that I think is going to be different about this epidemic is the amount of rich genomic data that’s going to be out there in the public domain. Understanding where this virus is going, and where it’s been — we’re going to have an incredibly rich tapestry.”

And that picture can change how officials mount a response. “In areas where shelter in place has been implemented, genomic data can help us understand where new cases are coming from,” MacCannell said. “As the wave ‘passes’ and the pandemic decelerates, genomic data can also become vitally important in stamping out flare-ups as they occur.” It will also reveal which state quarantines and stay-at-home orders were effective and which came too late.

But responding to the data is also a political problem. “To anyone who follows this epidemic closely, this should come as zero surprise,” said Grubaugh. “It should allow people within the task force to be able to convince the president and the vice president that we need to be changing how we address this. We shouldn’t be calling it a foreign virus. But whether or not that happens is … I mean, I think I’d be pretty naive to think that all of a sudden that would start changing.”