Researchers studying coronaviruses—a family of enveloped positive-strand RNA viruses infecting vertebrates8—have been confronted several times with the need to define whether a newly emerged virus causing a severe or even life-threatening disease in humans belongs to an existing or a new (yet-to-be-established) species. This happened with SARS9,10,11,12 and with Middle East respiratory syndrome (MERS)13,14 a few years later. Each time, the virus was placed in the taxonomy using information derived from a sequence-based family classification15,16.

The current classification of coronaviruses recognizes 39 species in 27 subgenera, five genera and two subfamilies that belong to the family Coronaviridae, suborder Cornidovirineae, order Nidovirales and realm Riboviria17,18,19 (Fig. 1). The family classification and taxonomy are developed by the Coronaviridae Study Group (CSG), a working group of the ICTV20. The CSG is responsible for assessing the place of new viruses through their relation to known viruses in established taxa, including placements relating to the species Severe acute respiratory syndrome-related coronavirus. In the classification of nidoviruses, species are considered biological entities demarcated by a genetics-based method21, while generally virus species are perceived as man-made constructs22. To appreciate the difference between a nidoviral species and the viruses grouped therein, it may be instructive to look at their relationship in the context of the full taxonomy structure of several coronaviruses. Although these viruses were isolated at different times and locations from different human and animal hosts (with and without causing clinical disease), they all belong to the species Severe acute respiratory syndrome-related coronavirus, and their relationship parallels that between human individuals and the species Homo sapiens (Fig. 1).

Fig. 1: Taxonomy of selected coronaviruses. Shown is the full taxonomy of selected coronaviruses in comparison with the taxonomy of humans (the founders of virology and other eminent scientists represent individual human beings for the sake of this comparison), which is given only for categories (ranks) that are shared with the virus taxonomy. Note that these two taxonomies were independently developed using completely different criteria. Although no equivalence is implied, the species of coronaviruses is interpreted sensu stricto as accepted for the species of humans.

Even without knowing anything about the species concept, every human recognizes another human as a member of the same species. However, for assigning individual living organisms to most other species, specialized knowledge and tools for assessing inter-individual differences are required. The CSG uses a computational framework of comparative genomics23, which is shared by several ICTV Study Groups responsible for the classification and nomenclature of the order Nidovirales and coordinated by the ICTV Nidovirales Study Group (NSG)24 (Box 3). The Study Groups quantify and partition the variation in the most conserved replicative proteins encoded in open reading frames 1a and 1b (ORF1a/1b) of the coronavirus genome (Fig. 2a) to identify thresholds on pair-wise patristic distances (PPDs) that demarcate virus clusters at different ranks.
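Box 3 describes DEmARC as using weighted linkage clustering in the PPD space. The following is a minimal sketch of that one ingredient only, and it is not the DEmARC software: the function name `wpgma_clusters`, the input layout and the threshold value are hypothetical, and the threshold is simply supplied by the caller rather than derived from clustering cost minima.

```python
from itertools import combinations


def wpgma_clusters(names, dist, threshold):
    """Group items by weighted linkage (WPGMA) up to a distance threshold.

    names: list of item labels (hypothetical virus names).
    dist: dict mapping frozenset({a, b}) to the distance between a and b.
    threshold: merging stops once the closest clusters are farther apart.
    """
    clusters = {i: [name] for i, name in enumerate(names)}  # id -> members
    d = {
        frozenset((i, j)): dist[frozenset((names[i], names[j]))]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        pair, best = min(d.items(), key=lambda kv: kv[1])
        if best > threshold:
            break
        i, j = tuple(pair)
        merged = clusters.pop(i) + clusters.pop(j)
        # Weighted linkage: the new distance is the plain average of the
        # two merged clusters' distances, regardless of cluster sizes.
        for k in clusters:
            d[frozenset((next_id, k))] = (
                d.pop(frozenset((i, k))) + d.pop(frozenset((j, k)))
            ) / 2
        del d[pair]
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Made-up distances for four hypothetical viruses A-D (values are illustrative).
example = {
    frozenset("AB"): 0.1, frozenset("CD"): 0.2,
    frozenset("AC"): 1.0, frozenset("AD"): 1.2,
    frozenset("BC"): 1.1, frozenset("BD"): 1.3,
}
groups = wpgma_clusters(["A", "B", "C", "D"], example, threshold=0.5)
```

With these made-up distances and a threshold of 0.5, A and B form one cluster and C and D another. Applying several thresholds in turn to the same distances would give nested clusters, loosely mirroring how different ranks are demarcated.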

Fig. 2: Phylogeny of coronaviruses. a, Concatenated multiple sequence alignments (MSAs) of the protein domain combination44 used for phylogenetic and DEmARC analyses of the family Coronaviridae. Shown are the locations of the replicative domains conserved in the order Nidovirales in relation to several other ORF1a/b-encoded domains and other major ORFs in the SARS-CoV genome. 5d, 5 domains: nsp5A-3CLpro, two beta-barrel domains of the 3C-like protease; nsp12-NiRAN, nidovirus RdRp-associated nucleotidyltransferase; nsp12-RdRp, RNA-dependent RNA polymerase; nsp13-HEL1 core, superfamily 1 helicase with upstream Zn-binding domain (nsp13-ZBD); nt, nucleotide. b, The maximum-likelihood tree of SARS-CoV was reconstructed by IQ‑TREE v.1.6.1 (ref. 45) using 83 sequences with the best fitting evolutionary model. Subsequently, the tree was purged from the most similar sequences and midpoint-rooted. Branch support was estimated using the Shimodaira–Hasegawa (SH)-like approximate likelihood ratio test with 1,000 replicates. GenBank IDs for all viruses except four are shown; SARS-CoV, AY274119.3; SARS-CoV-2, MN908947.3; SARSr-CoV_BtKY72, KY352407.1; SARS-CoV_PC4-227, AY613950.1. c, Shown is an IQ‑TREE maximum-likelihood tree of single virus representatives of thirteen species and five representatives of the species Severe acute respiratory syndrome-related coronavirus of the genus Betacoronavirus. The tree is rooted with HCoV-NL63 and HCoV-229E, representing two species of the genus Alphacoronavirus. Purple text highlights zoonotic viruses with varying pathogenicity in humans; orange text highlights common respiratory viruses that circulate in humans. Asterisks indicate two coronavirus species whose demarcations and names are pending approval from the ICTV and, thus, these names are not italicized.

Consistent with previous reports, SARS-CoV-2 clusters with SARS-CoVs in trees of the species Severe acute respiratory syndrome-related coronavirus (Fig. 2b) and the genus Betacoronavirus (Fig. 2c)25,26,27. Distance estimates between SARS-CoV-2 and the most closely related coronaviruses vary among different studies depending on the choice of measure (nucleotide or amino acid) and genome region. Accordingly, there is no agreement yet on the exact taxonomic position of SARS-CoV-2 within the subgenus Sarbecovirus. When we included SARS-CoV-2 in the dataset used for the most recent update (May 2019) of the coronavirus taxonomy currently being considered by ICTV19, which includes 2,505 coronaviruses, the species composition was not affected and the virus was assigned to the species Severe acute respiratory syndrome-related coronavirus, as detailed in Box 4.

With respect to novelty, SARS-CoV-2 differs from the two other zoonotic coronaviruses, SARS-CoV and MERS-CoV, introduced to humans earlier in the twenty-first century. Previously, the CSG established that each of these two viruses prototypes a new species in a new informal subgroup of the genus Betacoronavirus15,16. These two informal subgroups were recently recognized as subgenera Sarbecovirus and Merbecovirus18,28,29 when the subgenus rank was established in the virus taxonomy30. Because each of the two viruses was the first identified representative of a new species, unique names were introduced for the viruses and their taxa in line with the common practice and state of virus taxonomy at the respective times of isolation. The situation with SARS-CoV-2 is fundamentally different because this virus is assigned to an existing species that contains hundreds of known viruses predominantly isolated from humans and diverse bats. All these viruses have names derived from SARS-CoV, although only the human isolates collected during the 2002–2003 outbreak have been confirmed to cause SARS in infected individuals. Thus, the reference to SARS in all these virus names (combined with the use of specific prefixes, suffixes and/or genome sequence IDs in public databases) acknowledges the phylogenetic (rather than clinical disease-based) grouping of the respective virus with the prototypic virus in that species (SARS-CoV). The CSG chose the name SARS-CoV-2 based on the established practice for naming viruses in this species and the relatively distant relationship of this virus to the prototype SARS-CoV in a species tree and the distance space (Fig. 2b and the figure in Box 4).

The available yet limited epidemiological and clinical data for SARS-CoV-2 suggest that the disease spectrum and transmission efficiency of this virus31,32,33,34,35 differ from those reported for SARS-CoV9. To accommodate the wide spectrum of clinical presentations and outcomes of infections caused by SARS-CoV-2 (ranging from asymptomatic to severe or even fatal in some cases)31, the WHO recently introduced a rather unspecific name (coronavirus disease 19, also known as COVID-19 (ref. 36)) to denote this disease. Also, the diagnostic methods used to confirm SARS-CoV-2 infections are not identical to those used for SARS-CoV. This is reflected by the specific recommendations for public health practitioners, healthcare workers and laboratory diagnostic staff for SARS-CoV-2 (for example, the WHO guidelines for SARS-CoV-2 (ref. 37)). By uncoupling the naming conventions used for coronaviruses and the diseases that some of them cause in humans and animals, we wish to support the WHO in its efforts to establish disease names in the most appropriate way (for further information, see the WHO’s guidelines for disease naming38). The further advancement of naming conventions is also important because the ongoing discovery of new human and animal viruses by next-generation sequencing technologies can be expected to produce an increasing number of viruses that do not (easily) fit the virus–disease model that was widely used in the pre-genomic era (Box 1). Having now established different names for the causative virus (SARS-CoV-2) and the disease (COVID-19), the CSG hopes that this will raise awareness in both the general public and public health authorities regarding the difference between these two entities. The CSG promotes this clear distinction because it will help improve outbreak management and also reduce the risk of confusing virus and disease, as has been the case over many years with SARS-CoV (the virus) and SARS (the disease).

To facilitate good practice and scientific exchange, the CSG recommends that researchers describing new viruses (that is, isolates) in this species adopt a standardized format for public databases and publications that closely resembles the formats used for isolates of avian coronaviruses39, filoviruses40 and influenza virus1. The proposed naming convention includes a reference to the host organism that the virus was isolated from, the place of isolation (geographic location), an isolate or strain number, and the time of isolation (year or more detailed) in the format virus/host/location/isolate/date; for example, SARS-CoV-2/human/Wuhan/X1/2019. This complete designation along with additional and important characteristics, such as pathogenic potential in humans or other hosts, should be included in the submission of each isolate genome sequence to public databases such as GenBank. In publications, this name could be further extended with a sequence database ID—for example, SARS-CoV-2/human/Wuhan/X1/2019_XYZ12345 (fictional example)—when first mentioned in the text. We believe that this format will provide critical metadata on the major characteristics of each particular virus isolate (genome sequence) required for subsequent epidemiological and other studies, as well as for control measures.
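As an illustration of the virus/host/location/isolate/date format, the following minimal sketch builds and parses such designations. The class and function names are hypothetical, and two points are assumptions that the text does not specify: the database ID is separated from the date by the last field's first underscore, and none of the fields contains a slash.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IsolateName:
    virus: str
    host: str
    location: str
    isolate: str
    date: str                        # year or a more detailed date, kept as text
    accession: Optional[str] = None  # optional sequence database ID

    def format(self) -> str:
        """Return virus/host/location/isolate/date, plus _ID if one is set."""
        name = "/".join(
            [self.virus, self.host, self.location, self.isolate, self.date]
        )
        return f"{name}_{self.accession}" if self.accession else name


def parse_isolate_name(text: str) -> IsolateName:
    """Split a designation into its five fields and an optional database ID."""
    parts = text.strip().split("/")
    if len(parts) != 5 or not all(parts):
        raise ValueError(
            f"expected virus/host/location/isolate/date, got {text!r}"
        )
    virus, host, location, isolate, last = parts
    # Assumption: an underscore in the last field introduces the database ID.
    date, _, accession = last.partition("_")
    if not date:
        raise ValueError(f"missing date in {text!r}")
    return IsolateName(virus, host, location, isolate, date, accession or None)


# The fictional example from the text, with and without a database ID.
plain = parse_isolate_name("SARS-CoV-2/human/Wuhan/X1/2019")
extended = parse_isolate_name("SARS-CoV-2/human/Wuhan/X1/2019_XYZ12345")
```

Keeping the date as text rather than parsing it into a date type allows both a bare year and a more detailed date, as the convention permits.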

Box 3: Classifying coronaviruses
Initially, the classification of coronaviruses was largely based on serological (cross-) reactivities to the viral spike protein, but is now based on comparative sequence analyses of replicative proteins. The choice of proteins and the methods used to analyse them have gradually evolved since the start of this century20,28,29,51. The CSG currently analyses 3CLpro, NiRAN, RdRp, ZBD and HEL1 (ref. 52) (Fig. 2a), two domains fewer than were used in the analyses conducted between 2009 and 2015 (refs. 16,18). According to our current knowledge, these five essential domains are the only ones conserved in all viruses of the order Nidovirales52. They are thus used for the classification by all ICTV nidovirus study groups (coordinated by the NSG). Since 2011, the classification of coronaviruses and other nidoviruses has been assisted by the DivErsity pArtitioning by hieRarchical Clustering (DEmARC) software, which defines taxa and ranks23,24. Importantly, the involvement of all coronavirus genome sequences available at the time of analysis allows family-wide designations of demarcation criteria for all ranks, including species, regardless of the taxa sampling size, be it a single virus or hundreds of viruses. DEmARC delineates monophyletic clusters (taxa) of viruses using weighted linkage clustering in the PPD space, with ranks defined through clustering cost (CC) minima presented as PPD thresholds. PPD accounts for multiple substitutions at all sequence positions and thus may exceed 1.0, which is the limit for conventional pair-wise distances (PDs). In the DEmARC framework, the persistence of thresholds in the face of increasing virus sampling is interpreted to reflect biological forces and environmental factors21.
Homologous recombination, which is common in coronaviruses53,54,55, is believed to be restricted in genome regions encoding the most essential proteins, such as those used for classification, and to members of the same virus species. This restriction promotes intra-species diversity and contributes to inter-species separation. To facilitate the use of rank thresholds outside of the DEmARC framework, they are converted into PD and expressed as a percentage, which researchers commonly use to arrive at a tentative assignment of a given virus within the coronavirus taxonomy following conventional phylogenetic analysis of selected viruses.
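To illustrate how a PD threshold expressed as a percentage might be used for a tentative assignment, the sketch below computes the percentage of differing positions between two aligned protein sequences and compares it with a caller-supplied threshold. The function names, the toy sequences, the gap handling and the use of "less than or equal" are assumptions made for illustration; no actual rank thresholds are given in this text, so none are hard-coded.

```python
def percent_pairwise_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of differing residues between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differing / compared


def tentatively_same_taxon(seq_a: str, seq_b: str, threshold_percent: float) -> bool:
    """True if the distance does not exceed the supplied rank threshold."""
    return percent_pairwise_distance(seq_a, seq_b) <= threshold_percent


# Toy aligned fragments (made up); one of five positions differs.
distance = percent_pairwise_distance("MKTAY", "MKSAY")
```

A result of this kind is only tentative: the percentage is a conventional distance on selected sequences, not the PPD computed within the DEmARC framework.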