Ever since the outbreak of COVID-19 (the disease caused by the SARS-CoV-2 coronavirus), scientists have been scrambling to identify the species of origin and to understand how the new coronavirus first leapt from its animal hosts to humans, causing the current pandemic, which has infected more than a million people worldwide.

Scientists have been looking for an intermediate animal host that could link bats, which are known to harbor many coronaviruses, to the first introduction of SARS-CoV-2 into humans.

Many animals, beginning with snakes and most recently pangolins, have been put forth as the likely intermediate, but the viruses isolated from them are too divergent from SARS-CoV-2, suggesting a common ancestor that lived too far back in time, around the 1960s or even earlier[1].

Now, University of Ottawa biology professor Xuhua Xia, tracing coronavirus signatures across different species, has proposed that stray dogs, specifically dog intestines, may have been the origin of the current SARS-CoV-2 pandemic.

“Our observations have allowed the formation of a new hypothesis for the origin and initial transmission of SARS-CoV-2,” said Xia. “The ancestor of SARS-CoV-2 and its nearest relative, a bat coronavirus, infected the intestine of canids, most likely resulting in a rapid evolution of the virus in canids and its jump into humans. This suggests the importance of monitoring SARS-like coronaviruses in feral dogs in the fight against SARS-CoV-2.”

The findings appear in the Advance Access online edition of the journal Molecular Biology and Evolution.

Xia has long studied the molecular signatures of viruses in different hosts. When viruses invade a host, their genomes often bear battle scars: changes and adaptations acquired while fighting off and evading the host’s immune system.

Humans and other mammals have a key antiviral sentinel protein, called ZAP (zinc-finger antiviral protein), which can stop a virus in its tracks by preventing its multiplication in the host and degrading its genome. ZAP’s target is a pair of chemical letters, called CpG dinucleotides, within the viral RNA genome. CpG dinucleotides act as a signpost that the immune system uses to seek and destroy a virus. ZAP patrols human lungs and is made in large amounts in the bone marrow and lymph nodes, where the immune system first primes its attack.

But viruses have been shown to punch back. Single-stranded RNA coronaviruses such as SARS-CoV can evade ZAP by reducing these CpG signposts, rendering ZAP powerless. A similar examination of HIV, another RNA virus, shows that it has also exploited this evolutionary trick, losing CpG in response to human antiviral defenses. One implication is that the CpG dinucleotides remaining in a viral genome are likely functionally important for the virus and could serve as targets of modification to attenuate virulence in vaccine development.
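To make the idea of CpG "signposts" concrete, here is a minimal Python sketch that lists every position where a C is immediately followed by a G. The function name `cpg_sites` and the toy sequences are hypothetical illustrations, not Xia's tool. The example also shows one way a CpG can be lost without changing the encoded protein: the serine codons UCG and UCC differ only at the third position, and only UCG contains a CpG.

```python
def cpg_sites(seq: str) -> list[int]:
    """Return 0-based positions of every C immediately followed by G.

    RNA input is accepted: U is treated as T, and case is ignored.
    """
    s = seq.upper().replace("U", "T")
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


if __name__ == "__main__":
    # Hypothetical toy open reading frame: AUG (Met), UCG (Ser), AAA (Lys).
    original = "AUGUCGAAA"
    # UCC still encodes serine, but the CpG inside the codon is gone.
    recoded = "AUGUCCAAA"
    print("original CpG sites:", cpg_sites(original))
    print("recoded CpG sites:", cpg_sites(recoded))
```

Going the other way, deliberately adding CpG sites through synonymous changes is the kind of modification the paragraph above mentions for attenuating a virus.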

“Think of a decreased amount of CpG in a viral pathogen as an increased threat to public health, while an increased amount of CpG decreases the threat of such viral pathogens,” said Xia. “A virus with an increased amount of CpG would be better targeted by the host immune system, and result in reduced virulence, which would be akin to natural vaccines.”

To perform the study, Xia examined all 1,252 full-length betacoronavirus genomes deposited in GenBank to date. He found that SARS-CoV-2 and its closest known relative, a bat coronavirus (BatCoV RaTG13), have the lowest CpG content among their close coronavirus relatives.

“The most striking pattern is an isolated but dramatic downward shift in viral genomic CpG in the lineage leading to BatCoV RaTG13, which was reported to be sampled from a bat (Rhinolophus affinis) in Yunnan Province in 2013 but only sequenced by the Wuhan Institute of Virology after the outbreak of SARS-CoV-2 infection in late 2019,” said Xia. “This bat CoV genome is the closest phylogenetic relative of SARS-CoV-2, sharing 96% sequence similarity.”

“In this context, it is unfortunate that BatCoV RaTG13 was not sequenced in 2013, otherwise the downshifting in CpG might have served as a warning due to two highly significant implications,” said Xia. “First, the virus likely evolved in a tissue with high ZAP expression which favors viral genomes with a low CpG. Second and more importantly, survival of the virus indicates that it has successfully evaded ZAP-mediated antiviral defense. In other words, the virus has become stealthy and dangerous to humans.”

Xia applied his CpG tool to reexamine the camel origin of MERS and found that viruses infecting the camel digestive system also had lower genomic CpG than those infecting the camel respiratory system.

When he examined the data from dogs, he found, first, that only genomes of canine coronaviruses (CCoVs), which cause a highly contagious intestinal disease in dogs worldwide, have genomic CpG values similar to those observed in SARS-CoV-2 and BatCoV RaTG13. Second, canids, like camels, have coronaviruses infecting their digestive system with lower CpG than those infecting their respiratory system (canine respiratory coronavirus, or CRCoV, a betacoronavirus).

In addition, the known cellular receptor for SARS-CoV-2 entry into the cell is ACE2 (angiotensin I converting enzyme 2). ACE2 is made in the human digestive system, at the highest levels in the small intestine and duodenum, with relatively low expression in the lung. This suggests that mammalian digestive systems are likely a key target of coronavirus infection.

“This is consistent with the interpretation that the low CpG in SARS-CoV-2 was acquired by the ancestor of SARS-CoV-2 evolving in mammalian digestive systems, and this interpretation is further corroborated by a recent report that a high proportion of COVID-19 patients also suffer from digestive discomfort,” said Xia. “In fact, 48.5% presented with digestive symptoms as their chief complaint.”

Humans are the only other host species in which Xia observed coronavirus genomes with low genomic CpG values. In a comprehensive study of the first 12 COVID-19 patients in the U.S., one patient reported diarrhea as the initial symptom before developing fever and cough, and stool samples from 7 of 10 patients tested positive for SARS-CoV-2, including 3 patients with diarrhea.

Canids are often observed to lick their anal and genital regions, not only during mating but also in other circumstances. Such behavior would facilitate viral transmission from the digestive system to the respiratory system, allowing a gastrointestinal pathogen to become a respiratory tract and lung pathogen, and vice versa.

“In this context, it is significant that the bat coronavirus (BatCoV RaTG13), as documented in its genomic sequence in GenBank (MN996532), was isolated from a fecal swab,” said Xia. “These observations are consistent with the hypothesis that SARS-CoV-2 has evolved in the mammalian intestine or tissues associated with the intestine.”

Another finding of Xia’s study involves viruses recently isolated from pangolins. Nine SARS-CoV-2-like genomes have been isolated and sequenced from pangolins and deposited in the GISAID database (gisaid.org). “The one with the highest sequence coverage (GISAID ID: EPI_ISL_410721) has an ICpG value of 0.3929, close to the extreme low end of the CpG values observed among available SARS-CoV-2 genomes,” said Xia. “Thus, SARS-CoV-2, BatCoV RaTG13 and those from pangolin may either have a common ancestor with a low CpG or have convergently evolved low CpG values.”
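An ICpG value such as the 0.3929 quoted above summarizes how depleted a genome is in CpG relative to what its base composition alone would predict. The sketch below assumes the common observed/expected definition, ICpG = P(CpG) / (P(C) * P(G)), where P(CpG) is the fraction of adjacent base pairs that are CpG and P(C), P(G) are nucleotide frequencies; values well below 1 indicate CpG deficiency. The function names and toy sequences are hypothetical, and this is not presented as the exact implementation used in the study.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def icpg(seq: str) -> float:
    """Observed/expected CpG ratio: P(CpG) / (P(C) * P(G)).

    Assumption: the standard observed/expected definition. U is treated as T,
    and ambiguous symbols (N, R, Y, ...) are ignored.
    """
    s = seq.upper().replace("U", "T")
    bases = [b for b in s if b in NUCLEOTIDES]
    counts = Counter(bases)
    # Dinucleotides come from adjacent positions where both bases are unambiguous.
    pairs = [a + b for a, b in zip(s, s[1:]) if a in NUCLEOTIDES and b in NUCLEOTIDES]
    if not pairs or counts["C"] == 0 or counts["G"] == 0:
        raise ValueError("ICpG is undefined without C, G and at least one dinucleotide")
    p_c = counts["C"] / len(bases)
    p_g = counts["G"] / len(bases)
    p_cg = pairs.count("CG") / len(pairs)
    return p_cg / (p_c * p_g)


def rank_by_icpg(genomes: dict[str, str]) -> list[tuple[str, float]]:
    """Return (name, ICpG) pairs sorted from the most CpG-depleted upward."""
    scored = ((name, icpg(seq)) for name, seq in genomes.items())
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy sequences, not real coronavirus genomes.
    toy = {"enriched": "ACGTCGACGTCG", "depleted": "ACCAGGTTCAGA"}
    for name, value in rank_by_icpg(toy):
        print(f"{name}: ICpG = {value:.3f}")
```

Ranking many genomes this way, from lowest to highest ICpG, mirrors the kind of comparison described for the GenBank betacoronavirus genomes, where SARS-CoV-2 and BatCoV RaTG13 fall at the low end.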

Based on his results, Xia presents a scenario in which the coronavirus first spread from bats to stray dogs eating bat meat. Next, presumably strong selection against CpG in the viral RNA genome in canid intestines drove rapid evolution of the virus toward reduced genomic CpG. Finally, the reduced viral genomic CpG allowed the virus to evade the human ZAP-mediated immune response and become a severe human pathogen.

“While the specific origins of SARS-CoV-2 are of vital interest in the current world health crisis, this study more broadly suggests that important evidence of viral evolution can be revealed by consideration of the interaction of host defenses with viral genomes, including selective pressure exerted by host tissues on viral genome composition,” said Xia.

Reference: “Extreme genomic CpG deficiency in SARS-CoV-2 and evasion of host antiviral defense” by Xuhua Xia, 14 April 2020, Molecular Biology and Evolution.

DOI: 10.1093/molbev/msaa094

Footnotes

[1] Here is Dr. Xia’s explanation of why SARS-CoV-2 couldn’t have jumped directly from bats or pangolins into humans.

If we contrast early SARS-CoV-2 genomes collected from December 24, 2019, to January 5, 2020, with late ones collected from March 1 to 13, 2020, with an average of 66.5844 days between the early and late groups, and use the synonymous substitution rate as an approximation of the substitution rate, then the substitution rate is 0.0278 substitutions per genome per day. The average distance between BatCoV RaTG13 and SARS-CoV-2 is 0.0365 substitutions per site, or 1073.8158 substitutions per genome (for an aligned length of 29409 sites). Because substitutions accumulate along both lineages after they split, the distance is halved before dividing by the rate: the time to the common ancestor of SARS-CoV-2 and RaTG13 is 19296.2808 days (= 1073.8158/2/0.0278), or about 53 years. So their common ancestor lived around 1966. The same method would date the common ancestor of SARS-CoV-2 and pangolin/Guangdong/1 to around 1882.
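The footnote's dating is a strict molecular-clock calculation: the pairwise distance D between two genomes corresponds to 2 * r * T, so the time since their common ancestor is T = D / (2r). The minimal Python sketch below re-runs that arithmetic with the footnote's numbers. The function name is hypothetical, and the 365.25-day year used for the conversion is an assumption. Because 0.0278 is a rounded rate, the sketch lands at roughly 19,300 days rather than the footnote's 19296.2808, which still corresponds to about 53 years.

```python
DAYS_PER_YEAR = 365.25  # assumption: Julian year, used only to convert days to years


def days_to_common_ancestor(subs_per_genome: float, rate_per_genome_per_day: float) -> float:
    """Strict molecular-clock estimate of the time since two lineages split.

    Substitutions accumulate independently on both lineages after the split,
    so a pairwise distance D corresponds to 2 * r * T, i.e. T = D / (2 * r).
    """
    return subs_per_genome / 2 / rate_per_genome_per_day


if __name__ == "__main__":
    rate = 0.0278          # substitutions per genome per day (rounded, from the footnote)
    aligned_sites = 29409  # aligned length, from the footnote
    subs = 1073.8158       # RaTG13 vs SARS-CoV-2, substitutions per genome

    print(f"per-site distance: {subs / aligned_sites:.4f}")
    days = days_to_common_ancestor(subs, rate)
    print(f"time to common ancestor: {days:.0f} days, about {days / DAYS_PER_YEAR:.1f} years")
```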