Mining coronavirus genomes for clues to the outbreak’s origins

attaaaggtt tataccttcc caggtaacaa accaaccaac tttcgatctc ttgtagatct …

That string of apparent gibberish is anything but: It’s a snippet of a DNA sequence from the viral pathogen, dubbed 2019 novel coronavirus (2019-nCoV), that is overwhelming China and frightening the entire world. Scientists are publicly sharing an ever-growing number of full sequences of the virus from patients—53 at last count in the Global Initiative on Sharing All Influenza Data database. These viral genomes are being intensely studied to try to understand the origin of 2019-nCoV and how it fits on the family tree of related viruses found in bats and other species. They have also given glimpses into what this newly discovered virus physically looks like, how it’s changing, and how it might be stopped.

“One of the biggest takeaway messages [from the viral sequences] is that there was a single introduction into humans and then human-to-human spread,” says Trevor Bedford, a bioinformatics specialist at the University of Washington and Fred Hutchinson Cancer Research Center. The role of the Huanan Seafood Wholesale Market in Wuhan, China, in spreading 2019-nCoV remains murky, though such sequencing, combined with sampling the market’s environment for the presence of the virus, is clarifying that it indeed had an important early role in amplifying the outbreak. The viral sequences, most researchers say, also knock down the idea that the pathogen came from a virology institute in Wuhan.

In all, 2019-nCoV has nearly 29,000 nucleotide bases that hold the genetic instruction book to produce the virus. Although it’s one of the many viruses whose genes are in the form of RNA, scientists convert the viral genome into DNA, with bases known in shorthand as A, T, C, and G, to make it easier to study. Many analyses of 2019-nCoV’s sequences have already appeared on virological.org, nextstrain.org, preprint servers like bioRxiv, and even in peer-reviewed journals. The sharing of the sequences by Chinese researchers allowed public health labs around the world to develop their own diagnostics for the virus, which now has been found in 18 other countries.
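On paper, the RNA-to-DNA shorthand amounts to writing U (uracil) as T (thymine). The short Python sketch below illustrates only that notational step. The function name is hypothetical, the input is the opening snippet quoted at the top of this article rewritten with U, and the laboratory conversion itself (reverse transcription) is not modeled.

```python
def rna_to_dna_shorthand(rna: str) -> str:
    """Rewrite an RNA sequence in DNA letters by swapping U for T (case preserved)."""
    return rna.replace("U", "T").replace("u", "t")


# First 20 bases of the snippet quoted above, written here in RNA letters.
rna_snippet = "auuaaagguu uauaccuucc"
print(rna_to_dna_shorthand(rna_snippet))  # attaaaggtt tataccttcc
```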

When the first 2019-nCoV sequence became available, researchers placed it on a family tree of known coronaviruses—which are abundant and infect many species—and found that it was most closely related to relatives found in bats. A team led by Shi Zheng-Li, a coronavirus specialist at the Wuhan Institute of Virology, reported on 23 January on bioRxiv that 2019-nCoV’s sequence was 96.2% similar to a bat virus and had 79.5% similarity to the coronavirus that causes severe acute respiratory syndrome (SARS), a disease whose initial outbreak was also in China more than 15 years ago. But the SARS coronavirus has a similarly close relationship to bat viruses, and sequence data make a powerful case that it jumped into people from a coronavirus in civets that differed from human SARS viruses by as few as 10 nucleotides. That’s one reason why many scientists suspect there’s an “intermediary” host species—or several—between bats and 2019-nCoV.

According to Bedford’s analysis, the bat coronavirus sequence that Shi Zheng-Li’s team highlighted, dubbed RaTG13, differs from 2019-nCoV by nearly 1100 nucleotides. On nextstrain.org, a site he co-founded, Bedford has created coronavirus family trees (example below) that include bat, civet, SARS, and 2019-nCoV sequences. (The trees are interactive—by dragging a computer mouse over them, it’s easy to see the differences and similarities between the sequences.)
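The two figures quoted for RaTG13 are consistent with each other: roughly 1100 differing positions out of about 29,000 works out to about 96.2% identity. The Python sketch below shows that arithmetic alongside a simple percent-identity function for two sequences that are already aligned. The function name and the toy sequences are illustrative assumptions, and real comparisons depend on alignment choices that this does not capture.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of matching positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Toy example (made-up sequences): 1 mismatch in 10 positions.
print(percent_identity("attaaaggtt", "attaaaggta"))  # 90.0

# Figures quoted in the article, both approximate.
genome_length = 29_000
differences = 1_100
print(round(100.0 * (1 - differences / genome_length), 1))  # 96.2
```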

Bedford’s analyses of RaTG13 and 2019-nCoV suggest that the two viruses shared a common ancestor 25 to 65 years ago, an estimate he arrived at by combining the difference in nucleotides between the viruses with the presumed rates of mutation in other coronaviruses. So it likely took decades for RaTG13-like viruses to mutate into 2019-nCoV.
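The reasoning behind such an estimate can be sketched with a strict molecular clock. If two lineages each accumulate substitutions at a rate of mu per site per year, differences between them build up at 2 × mu per site per year, so the time since the common ancestor is roughly d / (2 × mu × L) for d differences across a genome of length L. The Python below applies that formula to the article's approximate figures. The two rates are illustrative assumptions chosen to show how a range of this kind can arise; they are not Bedford's actual inputs, and his analysis may differ in method.

```python
def years_to_common_ancestor(differences: int, genome_length: int,
                             rate_per_site_per_year: float) -> float:
    """Strict-clock estimate; both lineages accumulate changes, hence the factor of 2."""
    return differences / (2 * rate_per_site_per_year * genome_length)


differences = 1_100      # "nearly 1100 nucleotides" (approximate)
genome_length = 29_000   # "nearly 29,000" bases (approximate)

# Hypothetical substitution rates per site per year, for illustration only.
for rate in (3e-4, 8e-4):
    years = years_to_common_ancestor(differences, genome_length, rate)
    print(f"rate {rate:.0e}: about {years:.0f} years")
```

With these assumed rates the estimate spans roughly 24 to 63 years, in line with the 25-to-65-year range reported above. A faster assumed rate shortens the estimate, and a slower one lengthens it.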

Middle East respiratory syndrome (MERS), another human disease caused by a coronavirus, similarly has a link to bat viruses. But studies have built a compelling case it jumped to humans from camels. And the phylogenetic tree from Shi’s bioRxiv paper (below) makes the camel-MERS link easy to see.

The longer a virus circulates in a human population, the more time it has to develop mutations that differentiate strains in infected people. Given that the 2019-nCoV sequences analyzed to date differ from each other by seven nucleotides at most, the virus appears to have jumped into humans very recently. But it remains a mystery which animal spread the virus to humans. “There’s a very large gray area between viruses detected in bats and the virus now isolated in humans,” says Vincent Munster, a virologist at the U.S. National Institute of Allergy and Infectious Diseases who studies coronaviruses in bats, camels, and other species.
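A figure such as “seven nucleotides at most” is a maximum taken over every pair of sequenced genomes. The minimal Python sketch below shows that comparison. It uses made-up toy sequences rather than real genomes, assumes the sequences are already aligned to the same length, and uses hypothetical function names.

```python
from itertools import combinations


def count_differences(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def max_pairwise_differences(sequences: list[str]) -> int:
    """Largest difference count over all pairs of sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences to compare")
    return max(count_differences(a, b) for a, b in combinations(sequences, 2))


# Toy data: three made-up 10-base sequences, not real viral genomes.
toy_sequences = ["attaaaggtt", "attaaaggta", "atcaaaggtt"]
print(max_pairwise_differences(toy_sequences))  # 2
```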

Strong evidence suggests the marketplace played an early role in spreading 2019-nCoV, but whether it was the origin of the outbreak remains uncertain. Many of the initially confirmed 2019-nCoV cases—27 of the first 41 in one report, 26 of 47 in another—were connected to the Wuhan market, but up to 45%, including the earliest handful, were not. This raises the possibility that the initial jump into people happened elsewhere.
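The “up to 45%” figure follows from the counts in the two reports. A short Python check of that arithmetic, using only the numbers quoted above (the function name is illustrative):

```python
def percent_not_linked(linked: int, total: int) -> float:
    """Share of cases with no market connection, as a percentage rounded to one decimal."""
    return round(100.0 * (total - linked) / total, 1)


print(percent_not_linked(27, 41))  # 34.1
print(percent_not_linked(26, 47))  # 44.7
```

The second report's 21 unlinked cases out of 47 is the source of the upper figure.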

According to Xinhua, the state-run news agency, “environmental sampling” of the Wuhan seafood market has found evidence of 2019-nCoV. Of the 585 samples tested, 33 were positive for 2019-nCoV and all were in the huge market’s western portion, which is where wildlife were sold. “The positive tests from the wet market are hugely important,” says Edward Holmes, an evolutionary biologist at the University of Sydney who collaborated with the first group to publicly release a 2019-nCoV sequence. “Such a high rate of positive tests would strongly imply that animals in the market played a key role in the emergence of the virus.”

Yet there have been no preprints or official scientific reports on the sampling, so it’s not clear which, if any, animals tested positive. “Until you consistently isolate the virus out of a single species, it’s really, really difficult to try and determine what the natural host is,” says Kristian Andersen, an evolutionary biologist at Scripps Research.

One possible explanation for the confusion about where the virus first entered humans is that a batch of recently infected animals was sold at different marketplaces. Or an infected animal trader could have transmitted the virus to different people at different markets. Or, Bedford suggests, those early cases could have been infected by viruses that didn’t easily transmit and sputtered out. “It would be hugely helpful to have just a sequence or two from the marketplace [environmental sampling] that could illuminate how many zoonoses occurred and when they occurred,” Bedford says.

In the absence of clear conclusions about the outbreak’s origin, theories thrive, and some have been scientifically shaky. A sequence analysis led by Wei Ji of Peking University and published online by the Journal of Medical Virology received substantial press coverage when it suggested that “snake is the most probable wildlife animal reservoir for the 2019‐nCoV.” Sequence specialists, however, pilloried it.

Conspiracy theories also abound. A CBC News report about the Canadian government deporting Chinese scientists who worked in a Winnipeg lab that studies dangerous pathogens was distorted on social media to suggest that they were spies who had smuggled out coronaviruses. The Wuhan Institute of Virology, which is the premier lab in China that studies bat and human coronaviruses, has also come under fire. “Experts debunk fringe theory linking China’s coronavirus to weapons research,” read a headline on a story in The Washington Post that focused on the facility.

Concerns about the institute predate this outbreak. Nature ran a story in 2017 about it building a new biosafety level 4 lab and included molecular biologist Richard Ebright of Rutgers University, Piscataway, expressing concerns about accidental infections, which he noted repeatedly happened with lab workers handling SARS in Beijing. Ebright, who has a long history of raising red flags about studies with dangerous pathogens, also in 2015 criticized an experiment in which modifications were made to a SARS-like virus circulating in Chinese bats to see whether it had the potential to cause disease in humans. Earlier this week, Ebright questioned the accuracy of Bedford’s calculation that there are at least 25 years of evolutionary distance between RaTG13—the virus held in the Wuhan virology institute—and 2019-nCoV, arguing that the mutation rate may have been different as it passed through different hosts before humans. Ebright tells Science Insider that the 2019-nCoV data are “consistent with entry into the human population as either a natural accident or a laboratory accident.”

Shi did not reply to emails from Science, but her longtime collaborator, disease ecologist Peter Daszak of the EcoHealth Alliance, dismissed Ebright’s conjecture. “Every time there’s an emerging disease, a new virus, the same story comes out: This is a spillover or the release of an agent or a bioengineered virus,” Daszak says. “It’s just a shame. It seems humans can’t resist controversy and these myths, yet it’s staring us right in the face. There’s this incredible diversity of viruses in wildlife and we’ve just scratched the surface. Within that diversity, there will be some that can infect people and within that group will be some that cause illness.”

Daszak and Shi’s group have for 8 years been trapping bats in caves around China to sample their feces and blood for viruses. He says they have sampled more than 10,000 bats and 2000 other species. They have found some 500 novel coronaviruses, about 50 of which fall relatively close to the SARS virus on the family tree, including RaTG13—it was fished out of a bat fecal sample they collected in 2013 from a cave in Mojiang in Yunnan province. “We cannot assume that just because this virus from Yunnan has high sequence identity with the new one that that’s the origin,” Daszak says, noting that only a tiny fraction of coronaviruses that infect bats have been discovered. “I expect that once we’ve sampled and sampled and sampled across southern China and central China that we’re going to find many other viruses and some of them will be closer [to 2019-nCoV].”

It’s not just a “curious interest” to figure out what sparked the current outbreak, Daszak says. “If we don’t find the origin, it could still be a raging infection at a farm somewhere, and once this outbreak dies, there could be a continued spillover that’s really hard to stop. But the jury is still out on what the real origins of this are.”