The case definition was initially narrow and was gradually broadened to allow detection of more cases as knowledge increased, particularly milder cases and those without epidemiological links to Wuhan, China, or other known cases. These changes should be taken into account when making inferences on epidemic growth rates and doubling times, and therefore on the reproductive number, to avoid bias.

From Jan 15 to March 3, 2020, seven versions of the case definition for COVID-19 were issued by the National Health Commission in China. We estimated that when the case definitions were changed, the proportion of infections being detected as cases increased by 7·1 times (95% credible interval [CrI] 4·8–10·9) from version 1 to 2, 2·8 times (1·9–4·2) from version 2 to 4, and 4·2 times (2·6–7·3) from version 4 to 5. If the fifth version of the case definition had been applied throughout the outbreak with sufficient testing capacity, we estimated that by Feb 20, 2020, there would have been 232 000 (95% CrI 161 000–359 000) confirmed cases in China as opposed to the 55 508 confirmed cases reported.

We examined changes in the case definition for COVID-19 in mainland China during the first epidemic wave. We used exponential growth models to estimate how changes in the case definitions affected the number of cases reported each day. We then inferred how the epidemic curve would have appeared if the same case definition had been used throughout the epidemic.

When a new infectious disease emerges, appropriate case definitions are important for clinical diagnosis and for public health surveillance. Tracking case numbers over time is important to establish the speed of spread and the effectiveness of interventions. We aimed to assess whether changes in case definitions affected inferences on the transmission dynamics of coronavirus disease 2019 (COVID-19) in China.

Here, we review the various COVID-19 case definitions that have been used in mainland China as of March 13, 2020, and examine the implications of changes in case definitions on the epidemiology of COVID-19, aiming to quantify the effect of changes in the case definition on inferences about transmission parameters based on the epidemic curve.

Coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The novel virus was first identified in a cluster of patients with atypical pneumonia in Wuhan, China, in December, 2019. At the end of January, 2020, it became clear that infection was spreading efficiently from person to person, and also that there was a broader clinical spectrum of infections. As a consequence of the evolving information on the epidemiological and clinical spectrum of infections, there have been several revisions to the case definition for COVID-19 in mainland China.

When a newly emerging infectious disease is first identified, specifying appropriate case definitions can help to identify infected individuals efficiently. Often a hierarchy of case definitions will be used, so that a suspected case can be defined based on broad epidemiological and clinical criteria (eg, patients with particular exposures or in particular geographical locations, with particular signs or symptoms, at a particular time). A confirmed case can be defined as a suspected case in which the pathogen of interest is identified or isolated with a specific laboratory test. Epidemiological and clinical information for patients who meet a case definition can inform the source or sources of infections, potential modes of transmission, transmission dynamics, and severity of the infection. All this information is important for establishing optimal control measures.

The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.

To account for the uncertainty in estimates of the onset-to-reporting interval, and to allow us to quantify the uncertainty in model parameters including the growth rates, we did our analysis in a Bayesian framework and constructed a Markov chain Monte Carlo algorithm, which allowed joint parameter estimation (appendix p 3). Substantial differences in parameters (eg, growth rate and doubling time) were defined as the non-overlap of their credible intervals (CrI), meaning that the probability that the two parameters were the same was less than 0·05. On the basis of the modelling results, we estimated the number of cases if version 5 of the case definition had been applied throughout the outbreak (appendix p 4). All statistical analyses were done using R version 3.5.2. All data and code required to reproduce the analysis are available online (appendix p 4).
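
The non-overlap criterion described above can be illustrated with a minimal sketch. The function names are hypothetical and are not taken from the study's code; the intervals are the 95% CrIs for the pre-Jan 23 growth rate reported in the table, entered by hand.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


def substantially_different(a, b):
    """Apply the non-overlap rule: two estimates differ substantially
    when their credible intervals do not overlap."""
    return not intervals_overlap(a, b)


# 95% CrIs for the growth rate per day before Jan 23, 2020 (values from the table)
wuhan_adjusted = (0.06, 0.10)
wuhan_unadjusted = (0.14, 0.17)
hubei_excl_wuhan_adjusted = (0.08, 0.12)

print(substantially_different(wuhan_adjusted, wuhan_unadjusted))
print(substantially_different(wuhan_adjusted, hubei_excl_wuhan_adjusted))
```

Under this rule, the adjusted and unadjusted growth rates for Wuhan differ substantially, whereas the adjusted estimates for Wuhan and for the rest of Hubei province do not.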

When the case definition changes, there could be a backfill of cases that fulfilled the new case definition around the time of the change. We allowed for backfill up to 10 days before each change in case definition by assuming that a change in case definition could have a partial effect on incidence before the change date t, accounting for the reporting delay, which was estimated from the onset time series and report time series (appendix p 2). We estimated the growth rate as one of the model parameters, and we estimated the doubling time as log(2) divided by the estimated growth rate. We fitted separate models for Wuhan, Hubei province excluding Wuhan, and the rest of mainland China excluding Hubei province, to account for the regional differences in growth rates, epidemic timing, and potential transmissibility. We estimated the basic reproductive number R0, corresponding to the mean number of secondary infections from one case at the start of the outbreak, using the formula R0 = 1/M(−r), where r was the growth rate and M was the moment generating function of the generation time distribution. We assumed the generation time distribution followed the same gamma distribution as a previously estimated serial interval distribution with a mean of 7·5 days (SD 3·4). As a sensitivity analysis, we used another estimated serial interval distribution with a mean of 4·7 days (SD 2·9). In addition, we did a sensitivity analysis allowing backfill for up to 15 days before each change in case definitions.
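
As a worked illustration of the two formulas above, the following sketch computes the doubling time and the reproductive number for a gamma-distributed generation time. It relies on the standard moment generating function of the gamma distribution, M(t) = (1 − scale × t)^(−shape), so that 1/M(−r) = (1 + r × scale)^shape. The function names are hypothetical and this is not the authors' R code.

```python
import math


def gamma_shape_scale(mean, sd):
    """Convert the mean and SD of a gamma distribution to shape and scale."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def doubling_time(growth_rate):
    """Doubling time in days: log(2) divided by the growth rate per day."""
    return math.log(2) / growth_rate


def reproductive_number(growth_rate, gt_mean, gt_sd):
    """Reproductive number as 1 / M(-r) for a gamma generation time.

    For a gamma distribution, M(t) = (1 - scale * t) ** (-shape),
    so 1 / M(-r) = (1 + r * scale) ** shape.
    """
    shape, scale = gamma_shape_scale(gt_mean, gt_sd)
    base = 1.0 + growth_rate * scale
    if base <= 0:
        raise ValueError("growth rate too negative for this generation time distribution")
    return base ** shape


# Growth rates per day taken from the text; generation time mean 7.5 days (SD 3.4)
for r in (0.08, 0.10, 0.15, 0.19):
    print(r, round(doubling_time(r), 1), round(reproductive_number(r, 7.5, 3.4), 1))
```

With a mean of 7·5 days (SD 3·4), growth rates of 0·08 and 0·10 per day give values of about 1·8 and 2·0, and growth rates of 0·15 and 0·19 give about 2·8 and 3·5. The same function accepts negative growth rates, so it can also be applied to the period after control measures were implemented.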

We reviewed the case definitions and highlighted the key changes in sequential updates. We fitted an exponential growth model to the incidence of cases to quantify the effect of changing case definitions on the epidemic curve for laboratory-confirmed cases (appendix p 2). In the model, we assumed that each change in case definition increased the proportion of cases that would be detected among all infections. Also, we assumed the effect of changing case definition was the same for all regions in China. To account for the control measures, such as the lockdown in Wuhan and other cities in China on Jan 23, 2020, and the subsequent days, we allowed the growth rate to change on this date. Because the interventions acted to prevent infections but the epidemic curve was based on date of symptom onset in our analysis, the interventions would be expected to have a slightly delayed effect on the epidemic curve, which we accounted for by incorporating the incubation period distribution (appendix p 2). The incubation period was assumed to follow a log-normal distribution with a mean of 5·2 days (SD 3·9).
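
The reasoning behind this model can be sketched with a deliberately simplified, noise-free simulation. This is not the model in the appendix: it ignores backfill, reporting delay, the incubation period, and the change in growth rate on Jan 23. The change days, the baseline detection fraction, and the initial number of infections are hypothetical; only the growth rate and the fold increases are taken from the text.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


TRUE_GROWTH = 0.08                    # per day (adjusted estimate for Wuhan in the text)
BASE_DETECTION = 0.01                 # hypothetical detection fraction under the first definition
DETECTION_STEPS = {10: 7.1, 20: 2.8}  # hypothetical change days; fold increases from the text


def relative_detection(day):
    """Detection relative to the first definition on a given day."""
    factor = 1.0
    for change_day, fold in DETECTION_STEPS.items():
        if day >= change_day:
            factor *= fold
    return factor


days = list(range(30))
infections = [100.0 * math.exp(TRUE_GROWTH * d) for d in days]
reported = [BASE_DETECTION * relative_detection(d) * i for d, i in zip(days, infections)]

# Naive fit: treat reported cases as if detection had been constant
naive_growth = ols_slope(days, [math.log(c) for c in reported])

# Adjusted fit: rescale each day's count to a common case definition first
adjusted_growth = ols_slope(
    days, [math.log(c / relative_detection(d)) for d, c in zip(days, reported)]
)

print(naive_growth > TRUE_GROWTH, round(adjusted_growth, 3))
```

In this toy setting the naive log-linear fit returns a growth rate above the true value because each step up in detection is read as extra epidemic growth, whereas the fit on rescaled counts recovers the true rate.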

Changes of case definitions or laboratory testing capacity should be accounted for when analysing an epidemic curve. In China, broadening the case definitions over time allowed a greater proportion of infections to be detected as cases. Taking into account these changes, we estimated that there were at least 232 000 infections in the first epidemic wave of COVID-19 in mainland China. The true number of infections could still be higher than that currently estimated considering the possibility of under-detection of some infections, particularly those that were mild and asymptomatic, even under the broadest case definitions.

We collected publicly available information on epidemic curves in China, and summarised the changes in seven versions of case definitions. We found that changes in the case definitions of COVID-19 had a substantial effect on the proportion of infections that were detected as cases. We estimated that if changing case definitions were unaccounted for, the growth rate would be overestimated. We also estimated the total number of cases if a broader case definition had been applied at the early stage of the epidemic and if there had been sufficient laboratory capacity. With these assumptions, we estimated that as many as 232 000 infections could have been confirmed as COVID-19 cases in China by Feb 20, 2020, around four times more than the 55 508 cases identified by that date.

We examined 19 studies in detail and found three studies that were relevant to the change of case definition. One study estimated the incubation period in the early stage of the outbreak, which can be helpful for modifying the case definitions. Another modelling study allowed for the change of case definitions in Wuhan via additional parameters for the change in case detection probabilities. A study also noted that their analyses were based on earlier case definitions, and stated that the estimated effective reproductive number was an upper bound because later case definitions could capture more cases. We found no study directly estimating the effect of case definitions on epidemic curves except two studies that briefly summarised the changes of case definitions in the guidelines on the diagnosis and treatment of patients with COVID-19 in China.

Coronavirus disease 2019 (COVID-19) case numbers increased throughout January, 2020, in China. As more information became available on disease spectrum, and laboratory testing capacity was expanded, the case definitions were also changed. We searched PubMed for studies published in English from database inception up until April 4, 2020, reporting the effect of changing case definitions on the epidemic curve for COVID-19 using keywords including “COVID-19”, “2019-nCoV”, “novel coronavirus-infected pneumonia”, “SARS-CoV-2”, and “case definition”.

We obtained the officially published guidelines on diagnosis and treatment of COVID-19 from the National Health Commission and other public sources. The first two editions were not originally released publicly, whereas the third and subsequent editions have been released by the National Health Commission. Epidemic curves by onset date and report date from Dec 2, 2019, to Feb 20, 2020, in China were extracted from the data presented in the report of the WHO-China Joint Mission on Coronavirus Disease 2019 in February, 2020.

Results

Figure 1: Evolution of case definitions for COVID-19. Seven editions of the National Guideline for Diagnosis and Treatment of COVID-19 have been published in China since Jan 15, 2020. COVID-19=coronavirus disease 2019. SARS-CoV-2=severe acute respiratory syndrome coronavirus 2. *Version 4 referred to travel history to or residence in an area with sustained local transmission of SARS-CoV-2 infection. †Individuals with symptoms were considered as those showing fever and respiratory symptoms in versions 2 and 3, and fever or respiratory symptoms in other versions of the definition. ‡In version 4, a patient was either one of a cluster of patients or epidemiologically linked to a confirmed COVID-19 case. §Clustering events were further clarified in version 7 as “2 or more cases with fever and/or respiratory symptoms found in a small area within 2 weeks”, but not in previous versions. ¶Multiple rows indicate alternative options for meeting the case definition.

We analysed the changes of the case definition for COVID-19 applied in China from Jan 15 to March 3, 2020. Before Jan 15, 2020, we were unable to identify the case definition that was used in Wuhan to identify the earliest 41 confirmed cases. The first national guideline for diagnosis and treatment was issued on Jan 15, 2020, and required six specific criteria to be met for a patient to be a confirmed case of COVID-19 (figure 1; appendix p 6). Notably, patients needed to have an epidemiological link to Wuhan or a wet market in Wuhan and had to fulfil four clinical conditions indicative of viral pneumonia to be identified as suspected cases. They then had to have a respiratory specimen tested by full genome sequencing showing a close homology with SARS-CoV-2 for the final confirmation of COVID-19. In the following days and weeks, several revisions were made to the case definitions, allowing gradually greater sensitivity in the criteria required for case confirmation (figure 1). We present the seven versions of the case definitions in the appendix (pp 6–18).

The second edition of the case definitions removed the requirement for failure of antibiotic treatment to identify suspected cases and allowed PCR confirmation in addition to whole genome sequencing. There was no change in case definitions in the third edition, but classifications of severe and critical cases were modified and clarified. The fourth edition allowed patients to have an epidemiological link to other areas with reported cases, instead of being restricted to Wuhan, and suspected cases required only two, instead of all three, types of clinical manifestations in addition to an epidemiological link. The greatest change was in the fifth edition, which introduced a new category of cases (ie, clinically confirmed cases), specifically for Hubei province, which was the epicentre of the outbreak and had the largest number of cases identified in the country. Here, clinically confirmed cases were patients that met clinical criteria and had radiological evidence of pneumonia with or without a certain epidemiological link but did not need to have a virological confirmation of infection. In the sixth edition, this criterion for diagnosis of clinically confirmed cases was removed and no distinction was made between cases inside or outside Hubei province. In the seventh edition, serology was added as an additional option for laboratory confirmation.

We modelled the effects of changes in case definition from version 1 to version 2, from version 2 to 4, and from version 4 to 5. We did not explore the effects of changing from version 2 to 3 because version 3 applied the same definitions for suspected and confirmed COVID-19 cases as version 2 but only included updates to the severity classifications and therefore had no effect on the incidence or the epidemic curve. We were not able to explore the change after version 5 as we only analysed data up to Feb 20, 2020, which included just the first 2 days after the release of version 6. We were not able to find publicly available information on incidence of cases by illness onset date after Feb 20, and had to censor our analysis at that point.

Figure 2: Reported COVID-19 cases by date of onset and the modelled exponential growth of daily numbers of cases by application of different versions of case definitions. Modelled data assume that the given version of the case definition was applied throughout the study period in mainland China, as of Feb 20, 2020. Symbols and lines show daily numbers of reported and estimated cases, and colours indicate cases in line with the different versions of COVID-19 case definitions. The coloured shaded areas reflect the assumption that there was a backfill of symptomatic cases who had not yet presented for diagnosis up to 10 days before each change in case definition, so that the effect of changing the case definition would appear to modify the proportion of infections captured as cases before the actual day of change. The vertical dashed line indicates the implementation of control measures. COVID-19=coronavirus disease 2019.

The changes in case definitions had a clear effect on the proportion of infections that were identified and counted as confirmed cases. As of Feb 20, 2020, there were 55 508 confirmed cases in China, among which 27 000 were from Wuhan, 16 000 were from the rest of Hubei province, and 13 000 were from the rest of China. We estimated that the mean onset-to-reporting delay was 8·6 days (95% CrI 7·4–10·1) and the 95th percentile of this distribution was 15·7 days (13·0–20·1). Allowing for a 10 day backfill of cases, we estimated that the proportion of infections being identified as COVID-19 cases increased by 7·1 times (95% CrI 4·8–10·9) when the case definition changed from version 1 to 2, 2·8 times (1·9–4·2) from version 2 to 4, and 4·2 times (2·6–7·3) from version 4 to 5 (figure 2).
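
To give a rough sense of how these fold increases compound, the point estimates can be multiplied together. This back-of-envelope sketch is not a calculation reported by the authors, and multiplying point estimates ignores the joint posterior uncertainty, so the result is indicative only. The variable names are hypothetical.

```python
# Point estimates of the fold increase in detection at each change (from the text)
FOLD_INCREASES = {
    "version 1 to 2": 7.1,
    "version 2 to 4": 2.8,
    "version 4 to 5": 4.2,
}

# Compound the successive fold increases
cumulative_fold = 1.0
for change, fold in FOLD_INCREASES.items():
    cumulative_fold *= fold

# Share of version 5 detections that version 1 would have captured
version1_share = 1.0 / cumulative_fold

print(round(cumulative_fold, 1), round(100 * version1_share, 1))
```

On these point estimates, version 5 would detect roughly 83 times as many infections as version 1, so version 1 would have captured only about 1·2% of the cases detectable under version 5.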

Figure 3: Occurrence of COVID-19 cases by different case definitions. COVID-19 cases by date of illness onset in Wuhan (A), Hubei province excluding Wuhan (B), and other provinces in mainland China excluding Hubei province (C). Observed cases are indicated with blue bars. Red bars indicate estimates for case definition version 2, yellow bars for case definition version 4, and grey bars for case definition version 5. COVID-19=coronavirus disease 2019.

Based on the model, we estimated that if the case definitions from version 5 had been applied throughout the outbreak, and there had been sufficient availability of laboratory testing with RT-PCR from the early phase of the epidemic, 232 000 cases (95% CrI 161 000–359 000) could have met the case definition and could have been detected by Feb 20, 2020, of which 127 000 cases (86 000–198 000) were from Wuhan, 55 000 (38 000–86 000) were from the rest of Hubei province excluding Wuhan, and 50 000 (34 000–78 000) were from the rest of China excluding Hubei (figure 3). Among the 127 000 cases that we estimated in Wuhan by Feb 20, we estimated that there could have been approximately 11 000 infections (95% CrI 7000–21 000) that met version 5 of the case definition with illness onset by Jan 1, 2020. In the observed data, there were 114 confirmed COVID-19 cases with illness onset by Jan 1, 2020, corresponding to around 1% of our estimated total. Before Jan 23, we estimated that 92% (95% CrI 88–95) of cases were undetected.

We estimated that after implementation of control measures on Jan 23, the growth rate declined substantially to less than 0, from 0·08 to −0·15 in Wuhan, which was a change of −0·23 (95% CrI −0·27 to −0·20). The corresponding changes in growth rate were −0·26 (−0·30 to −0·22) for the rest of Hubei province excluding Wuhan, and −0·28 (−0·32 to −0·25) for the rest of China excluding Hubei. These findings suggested that the control measures were very effective, reducing the effective reproductive number to well below 1. Specifically, using a mean serial interval of 7·5 days (Li et al, reference 8), the effective reproductive numbers were reduced to 0·21–0·28 for the three regions, while the estimates were reduced to 0·36–0·44 with a mean serial interval of 4·7 days (Nishiura et al, reference 10).

Table: Estimates of the epidemic growth rate and doubling time before Jan 23, 2020, with or without adjustment for changes in case definitions

| | Wuhan | Hubei province excluding Wuhan | China excluding Hubei province |
|---|---|---|---|
| Growth rate, per day: with adjustment for changes in case definitions | 0·08 (0·06–0·10) | 0·10 (0·08–0·12) | 0·10 (0·08–0·12) |
| Growth rate, per day: without adjustment | 0·15 (0·14–0·17) | 0·18 (0·13–0·28) | 0·19 (0·16–0·24) |
| Doubling time, days: with adjustment for changes in case definitions | 8·7 (7·3–10·8) | 7·0 (5·8–8·8) | 7·0 (5·8–8·7) |
| Doubling time, days: without adjustment | 4·5 (4·1–4·8) | 3·9 (2·4–5·3) | 3·6 (2·9–4·3) |

Data are growth rates and doubling times with 95% credible intervals.

After adjusting for the changes in case definitions, we estimated that the epidemic growth rate before Jan 23, 2020, was around 0·08 to 0·10 and the doubling time was around 7·0 to 8·7 days for these three geographical areas, and the differences among them were not substantial (table). If instead the change in case definitions was unaccounted for, the growth rate would have been substantially overestimated and the doubling time would have been substantially underestimated (table). Using a growth rate of 0·08–0·10 with a mean serial interval of 7·5 days (Li et al, reference 8) would lead to R0 estimates in the range of 1·8–2·0. If we instead used the growth rate estimates of 0·15–0·19 (table), we would obtain R0 estimates in the range of 2·8–3·5. In a sensitivity analysis, using a mean serial interval of 4·7 days (Nishiura et al, reference 10), R0 estimates using a growth rate of 0·08–0·10 were in the range of 1·4–1·5, while using the growth rate estimates of 0·15–0·19 (table) we would obtain R0 estimates in the range of 1·9–2·2.

In a sensitivity analysis allowing for 15 days of backfill each time the case definition changed, the proportion of infections being identified as COVID-19 cases was increased by 3·0–8·8 times. We estimated that 253 000 cases (95% CrI 158 000–436 000) would have met the case definition and could have been detected by Feb 20, 2020. These estimates were slightly higher, as expected given the longer backfill period.