An outbreak of serious pneumonia was reported in Wuhan, China, on 30 December 2019. The causative agent was soon identified as a novel coronavirus [1], which was later named SARS-CoV-2. Case numbers grew rapidly from 27 in December 2019 to 3,090,445 globally as of 30 April 2020 [3], leading to the declaration of a public health emergency, and later a pandemic, by the World Health Organization (WHO). Many of the early cases were linked to the Huanan seafood market in Wuhan city, Hubei province, where the probable zoonotic source is speculated to have originated [2]. Currently, only environmental samples taken from the market have been reported to be positive for SARS-CoV-2 by the Chinese Center for Disease Control and Prevention [4]. However, as similar wet markets were implicated in the SARS outbreak of 2002–2003 [5], it seems likely that wild animals were also involved in the emergence of SARS-CoV-2. Indeed, a number of mammalian species were available for purchase in the Huanan seafood market before the outbreak [4]. Unfortunately, because the market was cleared soon after the outbreak began, determining the source virus in the animal population from the market is challenging. Although a coronavirus that is closely related to SARS-CoV-2, which was sampled from a Rhinolophus affinis bat in Yunnan in 2013, has now been identified [6], similar viruses have not yet been detected in other wildlife species. Here we identified SARS-CoV-2-related viruses in pangolins smuggled into southern China.

We investigated the virome composition of pangolins (mammalian order Pholidota). These animals are of growing importance and interest because they are one of the most illegally trafficked mammal species: they are used as a food source and their scales are used in traditional Chinese medicine. A number of pangolin species are now regarded as critically endangered on the International Union for Conservation of Nature Red List of Threatened Species. We received frozen tissue samples (lungs, intestine and blood) collected from 18 Malayan pangolins (Manis javanica) during August 2017–January 2018. These pangolins were obtained during anti-smuggling operations performed by Guangxi Customs officers. Notably, high-throughput sequencing of the RNA of these samples revealed the presence of coronaviruses in 6 out of 43 samples (2 lung samples, 2 intestinal samples, 1 lung–intestine mixed sample and 1 blood sample from 5 individual pangolins; Extended Data Table 1). With the sequence read data, and by filling gaps with amplicon sequencing, we were able to obtain six complete or near complete genome sequences—denoted GX/P1E, GX/P2V, GX/P3B, GX/P4L, GX/P5E and GX/P5L—that fall into the SARS-CoV-2 lineage (within the genus Betacoronavirus of the Coronaviridae) in a phylogenetic analysis (Fig. 1b). The genome sequence of the virus isolate (GX/P2V) has a very high similarity (99.83–99.92%) to the five sequences that were obtained through the metagenomic sequencing of the raw samples, and all samples have similar genomic organizations to SARS-CoV-2, with eleven predicted open-reading frames (ORFs) (Fig. 1a and Extended Data Table 2; two ORFs overlap). We were also able to successfully isolate the virus using the Vero E6 cell line (Extended Data Fig. 1). On the basis of these genome sequences, we designed primers for quantitative PCR (qPCR) detection to confirm that the raw samples were positive for coronavirus. 
We conducted further qPCR testing on another batch of archived pangolin samples collected between May and July 2018. Among the 19 samples (9 intestine tissues, 10 lung tissues) tested from 12 animals, 3 lung tissue samples from 3 individual pangolins were positive for coronavirus.

Fig. 1: Evolutionary relationships among sequences of human SARS-CoV-2, pangolin coronaviruses and the other reference coronaviruses. a, Genome organization of coronaviruses including the pangolin coronaviruses obtained in this study, with the predicted ORFs shown in different colours (ORF1a is omitted for clarity). The pangolin coronavirus strain GX/P2V is shown with its sequence length. For comparison, the human sequences NC_045512.2 and NC_004718.3, and bat sequences MG772933.1, GQ153541.1 and KC881006.1 are included (see Extended Data Table 6 for sources). b, Phylogeny of the subgenus Sarbecovirus (genus Betacoronavirus; n = 53) estimated from the concatenated ORF1ab, S, E, M and N genes. Red circles indicate the pangolin coronavirus sequences generated in this study (Extended Data Table 1). GD/P1L is the consensus sequence re-assembled from previously published raw data [7]. Phylogenies were estimated using a maximum likelihood approach that used the GTRGAMMA nucleotide substitution model and 1,000 bootstrap replicates. Scientific names of the bat hosts are indicated at the end of the sequence names, and abbreviated as follows: C. plicata, Chaerephon plicata; R. affinis, Rhinolophus affinis; R. blasii, Rhinolophus blasii; R. ferrumequinum, Rhinolophus ferrumequinum; R. monoceros, Rhinolophus monoceros; R. macrotis, Rhinolophus macrotis; R. pearsoni, Rhinolophus pearsoni; R. pusillus, Rhinolophus pusillus; R. sinicus, Rhinolophus sinicus. Palm civet (P. larvata, Paguma larvata; species unspecified for Civet007 and PC4-13 sequences) and human (H. sapiens, Homo sapiens) sequences are also shown.

In addition to the animals from Guangxi, after the start of the SARS-CoV-2 outbreak, researchers of the Guangzhou Customs Technology Center re-examined five archived pangolin samples (two skin swabs, two unknown tissue samples and one scale) obtained in anti-smuggling operations performed in March 2019. Following high-throughput sequencing, the scale sample was found to contain coronavirus reads, and from these data we assembled a partial genome sequence of 21,505 bp (denoted as GD/P2S), representing approximately 72% of the SARS-CoV-2 genome. Notably, this virus sequence, obtained from a pangolin scale sample, may in fact be derived from contaminants of other infected tissues. Another study of diseased pangolins in Guangdong performed in 2019 also identified viral contigs from lung samples that were similarly related to SARS-CoV-2 [7]. Different assembly methods and manual curation were performed to generate a partial genome sequence that comprised 86.3% of the full-length virus genome (denoted as GD/P1L in the phylogeny shown in Fig. 1b).
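The coverage figure quoted for GD/P2S can be checked with a short calculation. The sketch below assumes that the denominator is the length of the human reference genome NC_045512.2 (29,903 nucleotides); the text does not state which reference length was used, and the function name `genome_coverage` is introduced here purely for illustration.

```python
# Length of the SARS-CoV-2 reference genome NC_045512.2 in nucleotides.
# Assumption: this is the denominator behind the "approximately 72%" figure.
REFERENCE_LENGTH_BP = 29_903


def genome_coverage(assembled_bp: int, reference_bp: int = REFERENCE_LENGTH_BP) -> float:
    """Return the assembled length as a percentage of the reference length."""
    if reference_bp <= 0:
        raise ValueError("reference length must be positive")
    if assembled_bp < 0:
        raise ValueError("assembled length cannot be negative")
    return 100.0 * assembled_bp / reference_bp


# Partial genome GD/P2S assembled from the pangolin scale sample (21,505 bp).
print(f"GD/P2S coverage: {genome_coverage(21_505):.1f}%")  # GD/P2S coverage: 71.9%
```

Under this assumption the result, 71.9%, agrees with the approximate value of 72% given in the text.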

These pangolin coronavirus genomes have 85.5% to 92.4% sequence similarity to SARS-CoV-2, and represent two sub-lineages of SARS-CoV-2-related viruses in the phylogenetic tree, one of which (comprising GD/P1L and GD/P2S) is very closely related to SARS-CoV-2 (Fig. 1b). It has previously been noted that members of the subgenus Sarbecovirus have experienced widespread recombination [8]. In support of this, a recombination analysis (Fig. 2) revealed that bat coronaviruses ZC45 and ZXC21 are probably recombinants, containing genome fragments derived from multiple SARS-CoV-related lineages (genome regions 2, 5 and 7) as well as SARS-CoV-2-related lineages, including segments from pangolin coronaviruses (regions 1, 3, 4, 6 and 8).
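Sequence similarity values of this kind are typically computed from a pairwise alignment. The following is a minimal sketch of one common definition (percentage of identical positions among aligned, non-gap columns); the exact procedure used for the values above is not described in this section, so the gap handling and the function name `percent_identity` are assumptions, and the sequences are toy data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical residues between two aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped
    (an assumption; other conventions count gaps as mismatches).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (not real viral sequences): 8 of 9 ungapped columns match.
toy_a = "ATGTTTGTTT"
toy_b = "ATGTTCGT-T"
print(f"{percent_identity(toy_a, toy_b):.1f}%")  # 88.9%
```

The same function applies to aligned amino acid sequences, although amino acid "similarity" is sometimes scored with a substitution matrix rather than strict identity.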

Fig. 2: Recombination analysis. a, Sliding window analysis of changing patterns of sequence similarity between human SARS-CoV-2, pangolin and bat coronaviruses. The potential recombination breakpoints are shown as pink dashed lines, and regions separated by the breakpoints are alternately shaded in yellow. These potential breakpoints subdivide the genomes into eight regions (regions with fewer than 200 bp were omitted), indicated by the red bars at the bottom of the analysis boxes. The names of the query sequences are shown vertically to the right of the analysis boxes. The similarities to different reference sequences are indicated by different colours. Guangdong pangolin-CoV GD/P1L and pangolin-CoV GD/P2S were merged for this analysis. The blue arrows at the top indicate the position of the ORFs in the alignment. b, Phylogenetic trees of different genomic regions. SARS-CoV- and SARS-CoV-2-related lineages are shown in blue and red tree branches, respectively. Branch supports obtained from 1,000 bootstrap replicates are shown. Branch scale bars are shown as 0.1 substitutions per site.
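The idea behind a sliding window similarity analysis such as that in Fig. 2a can be sketched in a few lines. This is an illustrative sketch and not the software used for the figure: the window size, step size, gap handling and the function name `sliding_window_similarity` are all assumptions, and the sequences are toy data.

```python
from typing import List, Tuple


def sliding_window_similarity(
    query: str, reference: str, window: int = 500, step: int = 100
) -> List[Tuple[int, float]]:
    """Percent identity of query vs reference in successive alignment windows.

    Returns (window midpoint, percent identity) pairs. Gapped columns are
    skipped, and windows with no ungapped columns are omitted.
    """
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    query, reference = query.upper(), reference.upper()
    results = []
    for start in range(0, len(query) - window + 1, step):
        compared = 0
        matches = 0
        for q, r in zip(query[start:start + window], reference[start:start + window]):
            if q == "-" or r == "-":
                continue  # ignore gapped columns
            compared += 1
            if q == r:
                matches += 1
        if compared:
            results.append((start + window // 2, 100.0 * matches / compared))
    return results


# Toy alignment: the first half is identical, the second half differs completely.
toy_query = "A" * 10 + "C" * 10
toy_reference = "A" * 10 + "G" * 10
print(sliding_window_similarity(toy_query, toy_reference, window=10, step=5))
# [(5, 100.0), (10, 50.0), (15, 0.0)]
```

Repeating the scan against several reference sequences and plotting the curves together shows where the closest reference changes along the genome; such crossover points are candidate recombination breakpoints, which still need to be confirmed by region-specific phylogenies as in Fig. 2b.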

More notable, however, was the observation of putative recombination signals between the pangolin coronaviruses, bat coronavirus RaTG13 and human SARS-CoV-2 (Fig. 2). In particular, SARS-CoV-2 exhibits very high sequence similarity to the Guangdong pangolin coronaviruses in the receptor-binding domain (RBD) (97.4% amino acid similarity, indicated by red arrow in Fig. 2a; the alignment is shown in Fig. 3a), even though it is most closely related to bat coronavirus RaTG13 in the remainder of the viral genome. Indeed, the Guangdong pangolin coronaviruses and SARS-CoV-2 possess identical amino acids at the five critical residues of the RBD, whereas RaTG13 shares only one amino acid with SARS-CoV-2 (residue 442, according to numbering of the human SARS-CoV [9]) and these latter two viruses have only 89.2% amino acid similarity in the RBD. Notably, a phylogenetic analysis of synonymous sites only from the RBD revealed that the topological position of the Guangdong pangolin coronaviruses is consistent with that of the remainder of the viral genome, rather than being the closest relative of SARS-CoV-2 (Fig. 3b). Therefore, it is possible that the amino acid similarity between the RBD of the Guangdong pangolin coronaviruses and SARS-CoV-2 is due to selectively mediated convergent evolution rather than recombination, although it is difficult to differentiate between these scenarios on the basis of the current data. This observation is consistent with the fact that the sequence similarity of ACE2 is higher between humans and pangolins (84.8%) than between humans and bats (80.8–81.4% for Rhinolophus sp.) (Extended Data Table 3). The occurrence of recombination and/or convergent evolution further highlights the role that intermediate animal hosts have in the emergence of viruses that can infect humans.
However, all of the pangolin coronaviruses identified to date lack the insertion of a polybasic (furin-like) S1/S2 cleavage site in the spike protein that distinguishes human SARS-CoV-2 from related betacoronaviruses (including RaTG13) [10] and that may have helped to facilitate the emergence and rapid spread of SARS-CoV-2 through human populations.

Fig. 3: Analysis of the RBD sequence. a, Sequence alignment showing the RBD in human, pangolin and bat coronaviruses. The five critical residues for binding between SARS-CoV RBD and human ACE2 protein are indicated in red boxes, and ACE2-contacting residues are indicated by yellow boxes as previously described [9]. In the Guangdong pangolin-CoV sequence, the codon positions encoding the amino acids Pro337, Asn420, Pro499 and Asn519 have ambiguous nucleotide compositions, resulting in possible alternative amino acids at these sites (threonine, glycine, threonine and lysine, respectively). Sequence gaps are indicated with dashes. The short black lines at the top indicate the positions of every 10 residues. GD, Guangdong; GX, Guangxi. b, Phylogenetic trees of the SARS-CoV-2-related lineage estimated from the entire RBD region (top) and synonymous sites only (bottom). Branch supports obtained from 1,000 bootstrap replicates are shown. Branch scale bars are shown as 0.1 substitutions per site.
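The way in which an ambiguous nucleotide composition leads to alternative amino acids, as noted for four codons in Fig. 3a, can be illustrated by expanding IUPAC ambiguity codes under the standard genetic code. The sketch below is illustrative only: the example codons are hypothetical and are not taken from the pangolin sequences, and the function name `possible_amino_acids` is introduced here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G
# with the third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def possible_amino_acids(codon: str) -> set:
    """Return every amino acid (one-letter code) that an ambiguous codon can encode."""
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3:
        raise ValueError("a codon must contain exactly three nucleotides")
    options = [IUPAC[base] for base in codon]  # KeyError on an invalid symbol
    return {CODON_TABLE["".join(bases)] for bases in product(*options)}


# Hypothetical codons: 'M' (A or C) at the first position gives proline or
# threonine; 'N' (any base) at the third position of AAN gives asparagine or lysine.
print(sorted(possible_amino_acids("MCT")))  # ['P', 'T']
print(sorted(possible_amino_acids("AAN")))  # ['K', 'N']
```

An ambiguity at a synonymous position, such as AAY, leaves the encoded amino acid unchanged, so only some ambiguous codons give rise to alternative residues.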

To our knowledge, pangolins are the only mammals in addition to bats that have been documented to be infected by a SARS-CoV-2-related coronavirus. It is notable that two related lineages of coronaviruses are found in pangolins that were independently sampled in different Chinese provinces and that both are also related to SARS-CoV-2. This suggests that these animals may be important hosts for these viruses, which is surprising as pangolins are solitary animals that have relatively small population sizes, reflecting their endangered status [11]. Indeed, on the basis of the current data it cannot be excluded that pangolins acquired their SARS-CoV-2-related viruses independently from bats or another animal host. Therefore, their role in the emergence of human SARS-CoV-2 remains to be confirmed. In this context, it is noteworthy that both lineages of pangolin coronaviruses were obtained from trafficked Malayan pangolins, which originated from Southeast Asia, and that there is a marked lack of knowledge of the viral diversity maintained by this species in regions in which it is indigenous. Furthermore, the extent of virus transmission in pangolin populations should be investigated further. However, the repeated occurrence of infections with SARS-CoV-2-related coronaviruses in Guangxi and Guangdong provinces suggests that this animal may have an important role in the community ecology of coronaviruses.

Coronaviruses, including those related to SARS-CoV-2, are present in many wild mammals in Asia [5,6,7,12]. Although the epidemiology, pathogenicity, interspecies infectivity and transmissibility of coronaviruses in pangolins remain to be studied, the data presented here strongly suggest that handling these animals requires considerable caution and that their sale in wet markets should be strictly prohibited. Further surveillance of pangolins in their natural environment in China and Southeast Asia is necessary to understand their role in the emergence of coronaviruses and the risk of future zoonotic transmissions.