On 9 January 2020, the novel coronavirus SARS-CoV-2 was officially identified as the cause of the COVID-19 outbreak in Wuhan, China. One of the most critical clinical and public health questions during the emergence of a completely novel pathogen, especially one that could cause a global pandemic, pertains to the spectrum of illness presentation or severity profile. For the patient and clinician, this affects triage and diagnostic decision-making, especially in settings without ready access to laboratory testing or when surge capacity has been exceeded. It also influences therapeutic choice and prognostic expectations. For managers of health services, it is important for rapid forward planning in terms of procurement of supplies, readiness of human resources to staff beds at different intensities of care and generally ensuring the sustainability of the health system through the peak and duration of the epidemic.

At the population level, determining the shape and size of the ‘clinical iceberg’2,3, both above and below the observed threshold (in turn determined by symptomatology, care-seeking behavior and clinical access), is key to understanding the transmission dynamics and interpreting epidemic trajectories. Specifically, delineating the proportion of infections that are clinically unobserved under different circumstances is critical to refining model parameterization. In turn, estimates of both the observed and unobserved infections are essential for informing the development and evaluation of public health strategies, which need to be traded off against economic, social and personal freedom costs. For example, drastic social distancing and mobility restrictions, such as school closures and travel advisories/bans, should only be considered if an accurate estimation of case fatality risk warrants these interventions, which seriously disrupt social and economic stability.

For a completely novel pathogen, especially one with a high (say, >2) basic reproductive number (the expected number of secondary cases generated by a primary case in a completely susceptible population) relative to other recently emergent and seasonal directly transmissible respiratory pathogens4, assuming homogeneous mixing and mass action dynamics, the majority of the population will be infected eventually unless drastic public health interventions are applied over prolonged periods and/or vaccines become available sufficiently quickly. Even under more realistic assumptions about mixing informed by observed clustering of infections within households and the increasingly apparent role of superspreading events (for example, the Diamond Princess cruise ship, Chinese prisons and the church in Daegu, South Korea)5,6, at least one-quarter to one-half of the population will very likely become infected, absent drastic control measures or a vaccine. Therefore, the number of severe outcomes or deaths in the population is most strongly dependent on how ill an infected person is likely to become, and this question should be the focus of attention.
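The claim that most of the population is eventually infected under homogeneous mixing can be illustrated with the classical final-size relation for a simple SIR epidemic, z = 1 − exp(−R0·z), where z is the fraction ever infected. The sketch below is not part of the transmission model used in this study. The function name `final_size` and the example R0 values are illustrative assumptions.

```python
import math

def final_size(r0: float, tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Solve z = 1 - exp(-r0 * z) for the final attack rate z.

    Assumes a closed, fully susceptible, homogeneously mixing population
    and no interventions (textbook SIR assumptions, not the paper's model).
    """
    if r0 <= 1.0:
        return 0.0  # below the epidemic threshold there is no large outbreak
    z = 0.5  # starting guess; the iteration converges to the positive root
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

if __name__ == "__main__":
    # Illustrative values of the basic reproductive number only.
    for r0 in (1.5, 2.0, 2.5):
        print(f"R0 = {r0}: final attack rate ~ {final_size(r0):.2f}")
```

With R0 = 2 this relation gives an attack rate of roughly 80%, which is why the more conservative figures quoted for heterogeneous mixing are still large.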

We therefore extended our previously published transmission dynamics model4, updated with real-time input data and enriched with additional new data sources, to infer a preliminary set of clinical severity estimates that could guide clinical and public health decision-making as the epidemic continues to spread globally. Estimation of true case numbers—necessary to determine the severity per case—is challenging in the setting of an overwhelmed healthcare system that cannot ascertain cases effectively. Therefore, as in our prior work4, our approach has been to use a range of publicly available and recently published data sources (numbered 1 to 8 below) to build a picture of the full number of cases and deaths by age group. Briefly, because the healthcare structure has been overwhelmed in Wuhan and milder cases were unlikely to have been tested, we used the prevalence of infection in travelers (both on commercial flights before 19 January and on charter flights from 29 January to 4 February) to estimate the true prevalence of infection in Wuhan; we also used the Wuhan case numbers from only the first 425 cases to estimate the growth rate of the epidemic (assuming that the ascertainment proportion was constant between 10 December 2019 and 3 January 2020) (Fig. 1).

Fig. 1: Data used in the inference. a, The daily number of confirmed cases in Wuhan (with no epidemiologic links to Huanan Seafood Wholesale Market, i.e., cases due to human-to-human (H2H) transmission) between 1 December 2019 and 3 January 2020 (blue), the daily number of cases exported from Wuhan to cities outside mainland China via air travel between 25 December 2019 and 19 January 2020 (orange) and the proportion of expatriates on charter flights between 29 January and 4 February 2020 who were laboratory-confirmed to be infected (green). The numbers of passengers and confirmed cases who returned to their countries from Wuhan on chartered flights are provided in Supplementary Table 3. Bars indicate the 95% confidence intervals (CIs) of the proportion. b, The daily number of deaths in Wuhan reported between 1 December 2019 and 28 February 2020.

Specifically, we inferred the epidemiologic parameters listed in Extended Data Fig. 1 by fitting an age-structured transmission model to the following data:

1. The epidemic curve of confirmed cases of COVID-19 in Wuhan with no epidemiologic links to Huanan Seafood Wholesale Market (which was postulated to be the index zoonotic source of the COVID-19 epidemic) between 10 December 2019 and 3 January 2020 (Fig. 1 and Supplementary Table 1)7.
2. The number of confirmed cases who departed from the Wuhan international airport to cities outside mainland China via air travel on each day between 25 December 2019 and 19 January 2020 (Fig. 1 and Supplementary Table 2)4.
3. The number of expatriates and visitors who returned to their countries from Wuhan on charter flights between 29 January and 4 February 2020 and the proportion of passengers on each flight who had laboratory-confirmed infection with COVID-19 (by polymerase chain reaction with reverse transcription, RT-PCR) on arrival (Fig. 1 and Supplementary Table 3).
4. The age distribution of all confirmed cases of COVID-19 in Wuhan as of 11 February 2020 (ref. 8) (Supplementary Table 4).
5. The age distribution of all death cases of COVID-19 in mainland China as of 11 February 2020 (ref. 8) (Supplementary Table 5).
6. The cumulative number of deaths among confirmed cases of COVID-19 infection in Wuhan as of 25 February 2020 (ref. 9) (Supplementary Table 6).
7. The time between onset and death or the time between admission and death for 41 death cases of COVID-19 in Wuhan10,11,12 (Supplementary Table 7).
8. The time between the onset dates (that is, serial intervals) of 43 infector–infectee pairs (Supplementary Table 8).
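For the charter-flight data in item 3, each flight contributes a proportion of passengers testing positive together with a 95% confidence interval. The source does not state which interval method was used. The sketch below uses the Wilson score interval as one common choice, and the counts in the example are hypothetical rather than taken from Supplementary Table 3.

```python
import math

def wilson_interval(positives: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 gives an approximate 95% interval. The choice of the Wilson
    method is an assumption; the paper does not specify its CI method.
    """
    if n <= 0 or not 0 <= positives <= n:
        raise ValueError("need n > 0 and 0 <= positives <= n")
    p = positives / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)

if __name__ == "__main__":
    # Hypothetical flight: 5 RT-PCR-positive passengers out of 500 tested.
    lo, hi = wilson_interval(5, 500)
    print(f"prevalence = {5 / 500:.3%}, 95% CI ~ ({lo:.3%}, {hi:.3%})")
```

The Wilson interval behaves sensibly when the number of positives is small or zero, which is the typical situation for a single flight.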

The clinical severity of infectious diseases is typically measured in terms of infection fatality risk (IFR), symptomatic case fatality risk (sCFR) and hospitalization fatality risk (HFR). The case definitions underlying these severity measures are as follows:

1. IFR defines a case as a person who would, if tested, be counted as infected and rendered (at least temporarily) immune, as usually demonstrated by seroconversion or other immune response13. Such cases may or may not be symptomatic.
2. sCFR defines a case as someone who is infected and shows certain symptoms.
3. HFR defines a case as someone who is infected and hospitalized. It is typically assumed in such estimates that the hospitalization is for treatment rather than isolation purposes.

Figure 2 summarizes our estimates of age-specific sCFRs and susceptibility to symptomatic infection. Both parameters increase substantially with age. If the probability of developing symptoms after infection, P_sym, is 0.5, the sCFR values are 0.3% (0.1–0.7%), 0.5% (0.3–0.8%) and 2.6% (1.7–3.9%) for those aged <30 years, 30–59 years and >59 years, respectively. The overall sCFR is 1.4% (0.9–2.1%). Relative to those aged 30–59 years, those aged <30 years and >59 years are 0.16 (0.15–0.17) and 2.0 (1.95–2.08) times as susceptible to symptomatic infection, respectively. Our estimates of sCFRs would be lower if P_sym were higher than the baseline value of 0.5; for example, the overall sCFR is 1.3% (0.8–2.3%) and 1.2% (0.7–1.9%) if P_sym is 0.75 and 0.95, respectively. Our estimates of age-specific susceptibility are not sensitive to P_sym.
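To show how the assumed probability of developing symptoms links sCFR to the infection fatality risk, the sketch below applies the simple identity IFR = P(symptoms | infection) × sCFR. This identity holds only under the added assumption that infections which never become symptomatic do not result in death. The IFR values it prints are illustrative arithmetic on the point estimates above, not estimates reported in this study.

```python
def infection_fatality_risk(p_sym: float, scfr: float) -> float:
    """IFR implied by a symptomatic case fatality risk.

    Assumes deaths occur only among symptomatic infections, so that
    IFR = p_sym * sCFR. This is a simplifying assumption for illustration.
    """
    if not (0.0 <= p_sym <= 1.0 and 0.0 <= scfr <= 1.0):
        raise ValueError("p_sym and scfr must be probabilities in [0, 1]")
    return p_sym * scfr

if __name__ == "__main__":
    # Overall sCFR point estimates quoted in the text for each scenario.
    scenarios = {0.50: 0.014, 0.75: 0.013, 0.95: 0.012}
    for p_sym, scfr in scenarios.items():
        ifr = infection_fatality_risk(p_sym, scfr)
        print(f"P_sym = {p_sym:.2f}, sCFR = {scfr:.1%} -> implied IFR ~ {ifr:.2%}")
```

Under this assumption the implied IFR rises with the symptomatic fraction even though the sCFR itself falls slightly across the three scenarios.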

Fig. 2: Estimates of age-specific sCFR and susceptibility to symptomatic infection for COVID-19 in Wuhan. a, Estimates of age-specific sCFRs assuming P_sym is 0.50 (red), 0.75 (green) and 0.95 (blue). b, Estimates of relative susceptibility to symptomatic infection by age assuming P_sym is 0.50 (red), 0.75 (green) and 0.95 (blue). The markers in both panels show the posterior means and the bars show 95% credible intervals (CrIs).

Figure 3 summarizes our estimates of the key epidemiologic parameters of COVID-19 in Wuhan. In the baseline scenario (P_sym = 0.5), the basic reproductive number is 1.94 (1.83–2.06). The mean serial interval is 7.0 (5.8–8.1) days, with a standard deviation of 4.5 (3.5–5.5) days. The mean time from onset to death is 20 (17–24) days, with a standard deviation of 10 (7–14) days. The epidemic doubling time (the time it takes for daily incidence to double) was 5.2 (4.6–6.1) days before Wuhan was quarantined, and public health interventions implemented within Wuhan reduced transmissibility by 48% (24–71%). We estimate that only 1.8% (0.9–3.3%) of symptomatic cases that occurred between 10 December 2019 and 3 January 2020 were ascertained. Figure 3 suggests that our estimates of the basic reproductive number, mean generation time and intervention effectiveness would be slightly lower if P_sym were higher than the baseline value of 0.5, whereas our estimates of the other parameters are largely insensitive to P_sym.
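Two of these quantities can be related by simple arithmetic. A doubling time T_d corresponds to an exponential growth rate r = ln(2)/T_d, and a proportional reduction in transmissibility scales the reproductive number multiplicatively. The sketch below applies both to the point estimates above. Multiplying posterior means ignores their joint uncertainty, so the result is an illustration rather than an estimate from the fitted model.

```python
import math

def growth_rate_from_doubling_time(doubling_time_days: float) -> float:
    """Exponential growth rate per day implied by a doubling time."""
    if doubling_time_days <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_days

def reproductive_number_after_intervention(r0: float, reduction: float) -> float:
    """Reproductive number after a proportional cut in transmissibility."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must lie in [0, 1]")
    return r0 * (1.0 - reduction)

if __name__ == "__main__":
    r = growth_rate_from_doubling_time(5.2)                   # point estimate from the text
    r_after = reproductive_number_after_intervention(1.94, 0.48)
    print(f"growth rate ~ {r:.3f} per day")
    print(f"reproductive number after interventions ~ {r_after:.2f}")
```

On these point estimates the post-intervention reproductive number sits very close to 1, the threshold below which incidence declines.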

Fig. 3: Estimates of key epidemiologic parameters of the COVID-19 epidemic in Wuhan. Estimates of basic reproductive number, mean serial interval, initial doubling time, intervention effectiveness, ascertainment rate and the mean time from onset to death, assuming P_sym is 0.50 (red), 0.75 (green) and 0.95 (blue). The markers show the posterior means and the bars show 95% CrIs.

There is a clear and considerable age dependency in symptomatic infection (susceptibility) and outcome (fatality) risks, by multiple folds in each case. Given that we have parameterized the model using death rates inferred from projected case numbers (from traveler data) and observed death numbers in Wuhan, the precise fatality risk estimates may not be generalizable to those outside the original epicenter, especially during subsequent phases of the epidemic. The experience gained from managing those initial patients and the increasing availability of newer, and potentially better, treatment modalities to more patients would presumably lead to fewer deaths, all else being equal. Public health control measures widely imposed in China since the Wuhan alert have also kept case numbers down elsewhere, so that their health systems are not nearly as overwhelmed beyond surge capacity, thus again perhaps leading to better outcomes6,8. Indeed, so far, the death-to-case ratio in Wuhan has been consistently much higher than that among all the other mainland Chinese cities (Extended Data Fig. 2). Given the intensive efforts of case finding and the sharp drop in community transmission of COVID-19 in Chinese cities outside Hubei over the past few weeks, the ascertainment rates in these cities were probably very high. As such, we postulate that confirmed case fatality risk in these cities should be in some ways comparable to our sCFR estimates for Wuhan, which attempt to account for under-ascertainment of cases in Wuhan. Nonetheless, crude case fatality risks estimated from cities outside Wuhan should be, and are, lower than our sCFR estimates for Wuhan, because the former do not account for the delay between onset and death (thus being artefactually lower) and because healthcare outside Hubei is less overwhelmed (thus allowing a truly lower CFR). 
Indeed, as of 29 February 2020, the crude case fatality risk in areas outside Hubei was 0.85%, which is ~30–40% lower than our sCFR estimates of 1.2–1.4% for Wuhan9.
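The artefactual lowering of crude case fatality risk by the onset-to-death delay can be quantified in an idealized setting. If cumulative cases grow exponentially at rate r and the delay from onset to death is gamma distributed, the ratio of cumulative deaths to cumulative cases equals the true fatality risk multiplied by (1 + r·scale)^(−shape). The sketch below evaluates this factor using the onset-to-death mean and standard deviation reported above and the pre-quarantine Wuhan doubling time. The exponential-growth and gamma-delay assumptions are ours, and the factor overstates the bias wherever incidence has already slowed, as in cities outside Hubei by late February.

```python
import math

def crude_to_true_cfr_ratio(growth_rate: float, delay_mean: float, delay_sd: float) -> float:
    """Ratio of crude CFR (deaths to date / cases to date) to the true CFR.

    Assumes pure exponential growth in cases and a gamma-distributed
    onset-to-death delay; both are simplifying assumptions.
    """
    if delay_mean <= 0 or delay_sd <= 0:
        raise ValueError("delay mean and sd must be positive")
    shape = (delay_mean / delay_sd) ** 2   # gamma shape parameter
    scale = delay_sd ** 2 / delay_mean     # gamma scale parameter (days)
    return (1.0 + growth_rate * scale) ** (-shape)

if __name__ == "__main__":
    r = math.log(2) / 5.2                  # growth rate from a 5.2-day doubling time
    ratio = crude_to_true_cfr_ratio(r, delay_mean=20.0, delay_sd=10.0)
    print(f"crude CFR ~ {ratio:.0%} of the true CFR during exponential growth")
```

When the growth rate is zero the factor equals 1, so the delay bias disappears once all cases have had time to reach their outcome.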

Considering the risk estimates in context, Extended Data Fig. 3 compares infection, case and hospitalization fatality risks for pandemic influenza in 1918 and 2009, SARS and MERS. SARS causes moderate to severe disease requiring hospitalization, so the infection fatality risk and case fatality risk are essentially the same as the hospitalization fatality risk. The hospitalization fatality risk for MERS is well documented, although the shape and depth of the clinical iceberg remains less well defined. In contrast, because (1) the majority of COVID-19 infections do not cause severe disease8 and (2) hospitals in Wuhan have been overwhelmed, presumably having led to prioritized admission of more serious cases, the sCFR will be substantially lower than the HFR. However, despite a lower sCFR, COVID-19 is likely to infect many more (given emerging evidence of presymptomatic transmission14,15 and growing evidence of extensive community spread in numerous countries16), thus ultimately causing many more deaths than SARS and MERS. Compared with the 1918 and 2009 influenza pandemics, our estimates are intermediate but substantially higher than 2009, which was generally regarded as a low-severity pandemic. We find that sCFR is highest in the oldest age group. Unlike any previously reported pandemic or seasonal influenza, we find that risk of symptomatic infection also increases with age, although this may be in part due to preferential ascertainment of older and thus more severe cases. One largely unknown factor at present is the number of asymptomatic, undiagnosed infections. These do not enter our estimates of sCFR, but if such asymptomatic or clinically very mild cases existed and were not detected, the infection fatality risk would be lower than sCFR. Further clarifying this requires new data sources that are not yet available, specifically including age-stratified serologic studies.

Our inferences were based on a variety of sources, and have a number of caveats that are highlighted below, but considering the totality of the findings they nevertheless indicate that COVID-19 transmission is difficult to control. With a basic reproductive number of around two, we might expect at least half of the population to be infected, even with aggressive use of community mitigation measures. Perhaps the most important target of mitigation measures would be to ‘flatten out’ the epidemic curve, reducing the peak demand on healthcare services and buying time for better treatment pathways to be developed. In due course, but almost certainly after the first global wave of infections, vaccines may also be available to protect against infection or severe disease. Although our estimates of sCFR are concerning, these could be reduced if effective antivirals were identified and widely adopted for the treatment of severe cases. Timely data from clinical trials of remdesivir, lopinavir/ritonavir and other potential chemotherapies, as well as supportive care modalities, would be extremely informative.
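The effect of 'flattening' the curve can be illustrated with the closed-form peak prevalence of a simple SIR epidemic, I_max = 1 − (1 + ln R)/R, valid for R > 1 when almost everyone is initially susceptible. The sketch below is a textbook approximation rather than the age-structured model fitted here, and the 25% reduction in transmissibility is a hypothetical value chosen only for illustration.

```python
import math

def sir_peak_prevalence(r: float) -> float:
    """Peak fraction simultaneously infectious in a simple SIR epidemic.

    Assumes homogeneous mixing, a fully susceptible population at the start
    and a constant reproductive number r (textbook assumptions).
    """
    if r <= 1.0:
        return 0.0  # no epidemic peak when r is at or below the threshold
    return 1.0 - (1.0 + math.log(r)) / r

if __name__ == "__main__":
    baseline = 1.94                     # basic reproductive number from the text
    mitigated = baseline * (1 - 0.25)   # hypothetical 25% cut in transmissibility
    print(f"peak prevalence, no mitigation:   {sir_peak_prevalence(baseline):.1%}")
    print(f"peak prevalence, with mitigation: {sir_peak_prevalence(mitigated):.1%}")
```

In this idealized setting a modest reduction in transmissibility lowers the peak burden much more than proportionally, which is the rationale for reducing peak demand on healthcare services.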

Several important caveats are worth mentioning, as follows. First, and most importantly, our modeled estimates have necessarily relied on numerous strong assumptions, given the paucity of definitive data elements such as serosurveys, serial viral shedding studies and robust ascertainment of sufficient transmission chains, as well as the incomplete testing of travelers and returnees from Wuhan. All of these data need to be underpinned by systematic unbiased sampling of the underlying population and of important age and other sub-groups.

Our estimates of sCFR are inevitably affected by under-ascertainment of cases and deaths of COVID-19. On the one hand, overstretched and overwhelmed healthcare surge capacity in Wuhan could result in sCFRs that are higher than they would be in a less stressed healthcare setting, as presumably the sicker patients would have been prioritized for admission while leaving the milder cases untested and thus unconfirmed. Our prevalence estimates relying on travelers are based on those well enough to travel, so may slightly underestimate prevalence in Wuhan by not including those who are already in a serious condition and perhaps hospitalized. We have accounted for the possibility that travelers may underestimate the prevalence of infection in Wuhan17 by using our best estimate, from a separate analysis, of the probability of detection for international travelers (38% (22–64%))17. On the other hand, the numerator (the number of deaths) could also have been undercounted, although this is much less likely than undercounting of the denominator, whether for the same surge capacity reason or because of imperfect test sensitivity, especially during the first month of the outbreak18. If deaths in Wuhan were under-ascertained, this would bias our severity estimates downward.
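The simplest form of an adjustment for imperfect detection among travelers divides the observed count by the probability of detection. The sketch below applies the 38% (22–64%) detection probability quoted above to a hypothetical number of detected exported cases. The count of 19 is invented for illustration, and the fitted model may incorporate detection in a more elaborate way than this scaling.

```python
def adjust_for_detection(observed_cases: float, detection_probability: float) -> float:
    """Expected true number of cases given the number detected.

    Assumes each true case is detected independently with the same
    probability; this is a simplification for illustration.
    """
    if not 0.0 < detection_probability <= 1.0:
        raise ValueError("detection probability must lie in (0, 1]")
    return observed_cases / detection_probability

if __name__ == "__main__":
    observed = 19  # hypothetical number of detected exported cases
    central = adjust_for_detection(observed, 0.38)
    low = adjust_for_detection(observed, 0.64)   # higher detection, fewer missed cases
    high = adjust_for_detection(observed, 0.22)  # lower detection, more missed cases
    print(f"implied true cases ~ {central:.0f} (range {low:.0f} to {high:.0f})")
```

Because the detection probability sits in the denominator, uncertainty in it translates into a wide and asymmetric range for the implied true number of cases.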

Another caveat concerns one of our key inputs—the infection prevalence among returnees airlifted out of Wuhan on charter flights. Their point prevalence might well be lower than that among local residents, because of a generally more advantaged socioeconomic background, and the sensitivity for detecting infected individuals among them might not be 100%, as assumed. As such, this would be a lower bound of the cross-sectional disease prevalence. If this were the case, then we would have overestimated the reduction in transmissibility conferred by public health interventions in Wuhan and overestimated the severity. Based on only publicly available data, there is necessarily substantial uncertainty in our estimates of the effectiveness of intra-Wuhan public health interventions in reducing transmissibility. Calculating the instantaneous reproductive number from a set of line lists that are updated daily would be the most reliable method for detecting changes in transmissibility associated with interventions.

There has been refinement of case definitions at both national and provincial levels, such as excluding RT-PCR-test-positive asymptomatics (perhaps, in fact, very mildly symptomatics) from being labeled an officially ‘confirmed’ case19 or including test-naïve clinically diagnosed cases with clear epidemiologic links as ‘confirmed’20. Although these should not affect our estimation given our data sources from the earlier phase of the epidemic, such changes in the reporting criteria may influence the interpretation of future data. Finally, given that Wuhan is no longer the only (albeit the first) location with sustained local spread, it would be important to assess and take into account the experience from elsewhere, both domestically in mainland China and overseas. These secondary epicenters, having learned from the early phase of the Wuhan epidemic, might have had a systematically different epidemiology and response that could impact the parameters estimated here21,22,23,24,25,26,27,28,29,30,31.