In the case of such an epidemic, it is important to make as reliable as possible an estimate of the basic reproductive number (R 0 , the number of cases generated from a single infected person) and the dynamics of transmission. The aim of this study was to investigate the temporal origin, rate of viral evolution and population dynamics of SARS‐CoV‐2 using 52 full genomes of viral strains sampled in different countries on known sampling dates available at the moment when the study was performed.

Belonging to the β‐coronavirus genus of the Coronaviridae family, SARS‐CoV‐2 is closely related to SARS‐CoV as there is more than 70% nucleotide similarity in their approximately 30 kb long genomes. 1 A recent study has supported the view that, like other β‐coronaviruses causing human infections such as SARS‐CoV and MERS‐CoV, SARS‐CoV‐2 originated from bats, and reported 96% genomic identity with a previously detected SARS‐like bat coronavirus. 2 , 3 However, it remains unclear whether the spillover also involved a different intermediary animal host.

On 30 January 2020, the World Health Organization declared that the outbreak of an infection due to a novel‐coronavirus (SARS‐CoV‐2) was a “Public Health Emergency of International Concern” ( https://www.who.int/news‐room/detail/30‐01‐2020‐statement‐on‐the‐second‐meeting‐of‐the‐international‐health‐regulations‐(2005)‐emergency‐committee‐regarding‐the‐outbreak‐of‐novel‐coronavirus‐(2019‐nCoV) ). Emerging as a human pathogen in the Chinese city of Wuhan, SARS‐CoV‐2 ( https://www.who.int/docs/default‐source/coronaviruse/situation‐reports/20200121‐sitrep‐1‐2019‐ncov.pdf?sfvrsn=20a99c10_4 ) has caused a widespread outbreak of febrile respiratory illness and, as of 13 February 2020, there were 60 349 confirmed cases (including 527 outside mainland China) and a total of 1360 fatalities ( https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6 ).

For the birth‐death analysis, one and two intervals and a lognormal prior to R e , with a mean ( M ) of 0.0 and a variance ( S ) of 1.0 were chosen, which allows the R e values to change between less than 1 (0.193) and more than 5. A normal prior with M = 48.7 and S = 15 (corresponding to a 95% interval from 24.0 to 73.4) was used for the rate of becoming uninfectious. These values are expressed as units per year and reflect the inverse of the time of infectiousness (5.3‐19 days; mean, 7.5) according to the serial interval estimated by Li et al. 10 Sampling probability ( ρ ) was estimated assuming a prior β (α = 1.0 and β = 999), corresponding to a minority of the sampled cases (between 10 −5 and 10 −3 ). The origin of the epidemic was estimated using a normal prior with M = 0.1 and S = 0.05 in units per year.

The analyses were based on the previously selected HKY substitution model and the evolutionary rate was set to the value of 8.0 × 10 −4 subs/site/year, which corresponds to the mean substitution rate estimated using a relaxed clock under the exponential coalescent model as transformed into units per year.

The birth‐death skyline model implemented in Beast 2.48 was used to infer changes in the effective reproductive number ( R e ), and other epidemiological parameters such as the death/recovery rate ( δ ), the transmission rate ( λ ), the origin of the epidemic, and the sampling proportion ( ρ ). 9 Given that the samples were collected during a short period of time, a “birth‐death contemporary” model was used.

The basic reproductive number ( R 0 ) was calculated on the basis of the exponential growth rate ( r ) using the equation R 0 = rD + 1, where D is the average duration of infectiousness estimated as described below. 8 The doubling time of the epidemic was directly estimated setting the tree before the coalescent exponential growth analysis with doubling time parameterization.

The final trees were summarized by selecting the tree with the maximum product of posterior probabilities (pp) (maximum clade credibility) after a 10% burn‐in using Tree Annotator v.1.8.4 (included in the BEAST package) and were visualized using FigTree v.1.4.2 ( http://tree.bio.ed.ac.uk/software/figtree/ ).

The MCMC analysis was run until convergence with sampling every 100 000 generations. Convergence was assessed by estimating the effective sampling size (ESS) after 10% burn‐in using Tracer v.1.7 software ( http://tree.bio.ed.ac.uk/software/tracer/ ), and accepting ESS values of 300 or more. The uncertainty of the estimates is indicated by 95% highest marginal likelihoods estimated 6 by path sampling (PS)/stepping stone (SS) methods. 7

Different coalescent priors and molecular clock models (constant population size, exponential growth, and a Bayesian skyline plot [BSP]) were tested using strict and relaxed molecular clock models. Given the large credibility interval (CI) and high level of uncertainty due to very close sampling dates, all the estimates were made using days as the unit of time and a normal prior with substitution rates obtained from our preliminary estimates (mean rate 2.2 × 10 −6 subs/site/day, with a standard deviation of 1.1 × 10 −6 ).

Table 1 shows the parameters estimated using the birth‐death skyline plot. The epidemic originated an estimated mean of 3.7 months (CI, 3‐4) before the present (BP), corresponding to October to November 2019, before the root tree (3.6 months BP). The estimated recovery rate (the time to becoming noninfectious) was 7.3 days (CI, 4.7‐16.5 days), whereas the transmission rate ( λ ) increased from 40.5 to 112.4 in units per year in December 2019. On the basis of these values, the growth rate in the second period is r = 0.17 (0.16‐0.19), corresponding to a mean doubling time of 4.1 days (range, 3.9‐4.3).

The estimated growth rate under the exponential growth model was 0.218 days −1 , corresponding to an R 0 estimation of 2.6 (CI, 2.1‐5.1). The direct estimation of the doubling time of the epidemic gave a mean of 3.6 days (varying from 1.0 to 7.7). Figure 1B shows the Bayesian birth‐death skyline plot of the R e estimates with 95% HPD and indicates that R e increased from less than 1 (mean, 0.8; 95% HPD, 0.3‐1.3) to a mean value of 2.4 (95% HPD, 1.5‐3.5) in December 2019, and has since remained at this value. The estimation allowing a single R e gave a mean value of 1.85 (95% HPD, 1.37‐2.4).

A, Bayesian skyline plot of the SARS‐CoV‐2 outbreak. The y ‐axis indicates Ne and x ‐axis shows the time in year units (0 = 30 January; 18.2 days; 36.5 days; 54.7 days; and 73 days before). The thick solid line represents the median value of the estimates, and the gray area the 95% HPD. B, Birth‐death skyline plot of the SARS‐CoV‐2 outbreak allowing two R e intervals. The curve and the orange area show the mean R e values and their 95% confidence intervals. The y and x ‐axes, respectively, represent R values and time in days. HPD, highest posterior density

The Bayesian tree showed three main significant clades. The largest clade (pp = .84) encompassed 10 sequences and consisted of two significant subclades (pp = .9 and pp = 1). Overall, this cluster included fewer recent isolates than the other two clusters, and dated back to 47.5 days ago (95% HPD, 25.5‐76.6), corresponding to 13 December 2019. The second (pp = .99) and third significant clusters (pp = .95) dated back to 29.2 (95% HPD, 0.7‐47.45) and 21.9 (95% HPD 3.6‐54.7) days ago, corresponding to 1 to 8 January 2020.

The sequence analyses under a relaxed (uncorrelated lognormal) or strict molecular clock showed that the former performed better as assessed by using BF with PS and SS (strict vs relaxed molecular clock BF(PS) = −8.66 and BF(SS) = −10.7 for relaxed clock). Comparison of the different demographic models showed that the BSP model best fitted the data (BSP vs exponential growth BF(PS) = 7.3 and BF(SS) = 8.78 for BSP; BSP vs constant population size BF(PS)= 8.3; and BF(SS) = 10.7 for BSP).

4 DISCUSSION

The SARS‐CoV‐2 epidemic is unique in the history of human infectious diseases not only because it is caused by a novel virus, but also because of the immediate availability of epidemiological and genomic data (the first entire genome was published on 24 December 2019). The prompt availability of research data on internet platforms such as the GISAID has allowed us and other research groups to make a phylogenetic reconstruction of the origin of SARS‐CoV‐2 and to share these findings with other scientists.

The temporal reconstruction of the SARS‐CoV‐2 phylogeny obtained in the present study is in line with previous estimates and suggests that the epidemic originated between October and November 2019, several weeks before the first cases were described. This was confirmed by means of coalescent analysis and the birth‐death method of estimating the origin of the epidemic. The estimated evolutionary rate is also in line with that of SARS and MERS viruses,12, 13 and the recent estimates concerning SARS‐CoV‐2 (http://virological.org/t/phylodynamic‐analysis‐67‐genomes‐08‐feb‐2020/356).

One of the most important epidemiological parameters when monitoring an epidemic is R 0 (ie, the number of secondary cases induced by a single infected individual in a totally susceptible population) because it is fundamental to assess the potential spread of a microorganism. Its value changes during an epidemic being called the effective reproduction number (R e ). R 0 is usually estimated on the basis of the growth rate of the number of cases. The available epidemiological estimates of SARS‐CoV‐2 R 0 range from 2.2 to 2.9, although they changed from 1.4 to more than 7 during the first phases of the epidemic.10, 14

Recently developed evolutionary models have made it possible to estimate epidemiological parameters on the basis of phylogenesis,9, 15 and a coalescent and birth‐death methods were used to estimate R 0 and the changes in the R e of the SARS‐CoV‐2 epidemic during a short period of time. This has allowed us to make a preliminary estimate that mean R 0 from the beginning of the epidemic to the first days of February 2020 was 2.2 (range, 3.6‐5.8), and the birth‐death skyline analysis showed an increase in R e from less than 1 to 2.4 (CI, 1.5‐3.5) during December 2019. This agrees with the BSP analysis showing an increase in the number of infections in the same period of time. Commonly, the R e decreases during an epidemic because the decrease in the number of susceptible individuals. However, an increase in R e could be due to an increase in the transmissibility of the virus or in the contact rates within the population.16 It is, therefore, possible to hypothesize, on the basis of our data, that the first passage of the virus from animal to human occurred through rather inefficient and still unknown transmission modes causing relatively few cases in the early times (before December). In December, the virus acquired a more efficient mode of human‐to‐human transmission (ie, through droplets), causing exponential growth also detected by the skyline.

On the same basis, the estimated epidemic doubling time was 3.6 days with a CI between 1 and 7 days. We also tried to calculate it on the basis of the transmission (λ) and recovery rate (δ) estimated using the birth‐death model, which lead to an estimated mean doubling time of 4.1 days, with the most probable values falling between 3.9 and 4.3 days. Previous studies have suggested that the doubling time during the early phases of the epidemic was approximately 7.4 days.10 The difference in the estimate here obtained, may be due to the increased epidemic growth rate observed during the last days of January, or the initial delay in recognizing and reporting new cases.

This preliminary study has some limitations. The R values and doubling times were estimated phylogenetically using all of the whole genomes available in a public database at the time the study was carried out (https://www.gisaid.org/). Given the small number of sequences and the relatively short sampling period, the CIs are wide and limit the precision of the estimates. Moreover, the analysis included isolates collected outside mainland China as it is assumed that they all belong to the same epidemic originating in Wuhan.

Serial intervals were used to estimate the duration of infectiousness, although we do not yet have any information concerning the possible existence and duration of a latent (preinfectious) period that would contribute to the serial interval.

More detailed and accurate analyses can be made when a larger number of genomes and more precise data on the infectious period become available. However, although the R 0 calculated on the basis of the direct observation of the number of infected individuals may be affected by omissions or delayed notifications of cases,17 a phylogenetic estimate of the same parameter may be more reliable.

This became particularly evident recently (on 12 February 2020) when the change in diagnosis classification led to a sudden increase in the reported cases by Hubei, China (https://myemail.constantcontact.com/COVID‐19‐Updates‐‐‐Feb‐12.html?soid=1107826135286&aid=Kdg8a0rBTAk).