Since it was first reported by WHO on Jan 5, 2020, over 80 000 cases of a novel coronavirus disease (COVID-19) have been diagnosed in China, with exportation events to nearly 90 countries, as of March 6, 2020. Given the novelty of the causative pathogen (named SARS-CoV-2), scientists have rushed to fill epidemiological, virological, and clinical knowledge gaps, resulting in over 50 new studies about the virus between Jan 10 and Jan 30 alone. However, in an era where the immediacy of information has become an expectation of decision makers and the general public alike, many of these studies have been shared first in the form of preprint papers, before peer review.

Over the past three decades, preprint servers have become commonplace in the scientific publication ecosystem, and COVID-19 has prompted a seemingly unprecedented use of these platforms. Although peer review is crucial for the validation of science, the ongoing outbreak has showcased the speed with which preprints can disseminate information during emergencies. In this Comment, we used both preprint and peer-reviewed studies that estimated the transmissibility potential (ie, basic reproduction number [R0]) of SARS-CoV-2 on or before Feb 1, 2020, to investigate the role that preprints have had in information dissemination during the ongoing outbreak. We also analysed the agreement between preprint estimates and those presented by peer-reviewed studies, and we propose a consensus-based approach for evaluating the validity of preprint findings during public health crises. For our analysis, we collected publicly available data from scientific studies, news reports, and search trends pertaining to SARS-CoV-2 and its R0. R0 is defined as the average number of secondary infections that a new case might transmit in a fully susceptible population; estimates of R0 can provide decision makers with insights into the epidemic potential of a given outbreak.
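To make the definition of R0 concrete, the following minimal sketch shows how the expected number of new cases would grow from one generation of infection to the next if every case infected R0 others on average. It assumes a fully susceptible population, no interventions, and no depletion of susceptible individuals. The function name and the R0 value of 2.0 are hypothetical and chosen only for illustration; they are not estimates from any study discussed here.

```python
def expected_cases_by_generation(r0: float, generations: int, initial_cases: int = 1) -> list[float]:
    """Expected new cases in each generation under simple geometric growth.

    Assumes every case produces r0 secondary cases on average and that the
    population stays fully susceptible (an idealisation that only holds early on).
    """
    if r0 < 0 or generations < 0:
        raise ValueError("r0 and generations must be non-negative")
    # Generation g is expected to contain initial_cases * r0**g new cases.
    return [initial_cases * r0 ** g for g in range(generations + 1)]


# Hypothetical R0 of 2.0, used purely for illustration.
print(expected_cases_by_generation(2.0, 3))
```

An R0 above 1 implies growth from one generation to the next, whereas an R0 below 1 implies that chains of transmission are expected to die out.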

Relevant news reports were discovered through MediaCloud and search trends through Google Search Trends; both served as proxy indicators for information dissemination. Relevant scientific studies were discovered through a combination of searches executed with Google Scholar and, to address possible delays in indexing, four popular public preprint servers (ie, arXiv, bioRxiv, medRxiv, and Social Science Research Network [SSRN]) that we believe are representative of the relevant preprint literature. Search terms and specifications for each data source are outlined in the appendix (p 2). All studies discovered through Google Scholar, arXiv, bioRxiv, medRxiv, and SSRN were manually checked for relevance to the topic area of interest. We retained only studies that included estimates for the R0 associated with SARS-CoV-2 in the body of the text.

Figure: R0 mean and range estimates from 11 different studies of COVID-19 as a function of time. For preprints that were revised before publication of the first relevant peer-reviewed study on Jan 29, the version number is indicated between parentheses as (n). When multiple R0 estimates were presented in a single study because of the use of multiple approaches, the version number is followed by a single decimal place to indicate the approach used (n.n). If a first author published more than one relevant independent study before Feb 1, the version number is followed immediately by an alphabetical marker ordered by date of publication (nx). Ranges presented vary by study (eg, 95% CI, 95% credible interval, and so on) and are presented in the appendix (p 3). R0=basic reproduction number.

This initial data discovery phase yielded 11 individual studies. For each study, the date of first publication, publication platform, review status (ie, preprint vs peer-reviewed), and methodological details were manually curated (appendix p 3). R0 estimates were also extracted from each study for further analysis. In the event of multiple R0 estimates, because of preprint revisions after the first version or the use of multiple approaches in a single study, each estimate was recorded and treated as a separate entry to represent all available knowledge at any given point in time (appendix p 3). Given that the first known preprint estimates for R0 were posted to SSRN by us on Jan 23, we plotted search trend fractions and news report volume between Jan 23 and Feb 1 (appendix p 4). Baseline data for both sources before Jan 23, 2020, yielded negligible search trend interest and news report volume, and data collected up to Feb 9, 2020, showed diminishing interest and volume after the catchment window (appendix p 4). To illustrate when each of the 11 relevant studies became available to the public, indicator bars were overlaid against the search trend and news report data by date of publication (appendix p 4). We then plotted each of the 16 R0 estimates produced by the 11 studies, including both the mean and the estimate range (eg, 95% CI, 95% credible interval, and so on) presented (appendix p 3). Estimates were plotted by date of publication and alphabetically therein, offering a side-by-side comparison of preprint versus peer-reviewed results; averages and 95% CIs were also computed for both groups (figure).
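The group-level summary described above (an average and a 95% CI for each set of R0 estimates) can be sketched as follows. The text does not state how the intervals were computed, so this sketch assumes a normal approximation (mean plus or minus 1.96 standard errors); the function name and the input values are hypothetical and are not the estimates listed in the appendix.

```python
import math
import statistics


def mean_with_ci(estimates, z=1.96):
    """Return (mean, lower, upper) for a group of point estimates.

    Assumption: a normal-approximation interval, mean +/- z * standard error,
    using the sample standard deviation. Other interval methods are possible.
    """
    n = len(estimates)
    if n < 2:
        raise ValueError("need at least two estimates to compute a standard error")
    mean = statistics.mean(estimates)
    standard_error = statistics.stdev(estimates) / math.sqrt(n)
    return mean, mean - z * standard_error, mean + z * standard_error


# Hypothetical R0 point estimates, for illustration only.
mean, lower, upper = mean_with_ci([2.0, 3.0, 4.0])
print(f"mean={mean:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With small groups, a t-distribution multiplier would give wider intervals than the fixed value of 1.96 assumed here.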

Google Search Trends and MediaCloud data suggested that both general (ie, search) interest and news media interest in the R0 associated with COVID-19 peaked before the publication of relevant peer-reviewed studies during the early stages of the epidemic. In the selected time frame, search interest peaked on Jan 27 after a sharp increase between Jan 23 and Jan 25, immediately after the publication of five early preprint studies (all of which estimated R0) in bioRxiv, medRxiv, and SSRN. Meanwhile, news media interest peaked on Jan 28, coinciding with a sixth preprint study published in arXiv (appendix p 4). The first peer-reviewed estimates were then published by Li and colleagues in The New England Journal of Medicine on Jan 29 at 17:00 h (eastern standard time), followed by four additional peer-reviewed studies in Eurosurveillance, The International Journal of Infectious Diseases, The Lancet, and Journal of Clinical Medicine up to Feb 1.

Average R0 estimates were 3·61 (95% CI 2·77–4·45) across the preprint group and 2·54 (2·17–2·91) across the peer-reviewed group, showing overlap in 95% CIs despite a wide diversity of modelling methods and data sources used both in-group and across-group (appendix p 3). Although the average for the preprint group was higher than that for the peer-reviewed group, this effect was driven primarily by two upper-limit outlier estimates (with R0 higher than the 95% CI maximum; figure). Exclusion of these two estimates by use of a consensus-based approach based on the 95% CIs yielded an average R0 estimate of 3·02 (95% CI 2·65–3·39) for the preprint group. Notably, two studies in the peer-reviewed group had previously been published as preprints. Although estimates presented by Riou and Althaus remained unchanged after peer review, estimates presented by Zhao and colleagues were higher before peer review than afterwards.
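One way to read the consensus-based exclusion described above is as a single-pass rule: compute the group mean and its 95% CI, then drop any point estimate that lies above the upper bound. The sketch below implements that reading under a normal-approximation interval, which is an assumption because the exact procedure is not spelled out in the text. The function name and the input values are hypothetical and are not the estimates from the appendix.

```python
import math
import statistics


def exclude_upper_outliers(estimates, z=1.96):
    """Drop estimates above the upper 95% CI bound of the group mean.

    Assumptions: normal-approximation interval (mean + z * standard error)
    and a single pass, so the bound is not recomputed after exclusion.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two estimates")
    mean = statistics.mean(estimates)
    standard_error = statistics.stdev(estimates) / math.sqrt(len(estimates))
    upper_bound = mean + z * standard_error
    return [e for e in estimates if e <= upper_bound]


# Hypothetical R0 point estimates with one high value, for illustration only.
example = [2.0, 2.5, 3.0, 3.5, 6.0]
kept = exclude_upper_outliers(example)
print(kept, statistics.mean(kept))
```

Because only the upper bound is checked, the rule is deliberately one-sided; a symmetric version would also drop estimates below the lower bound.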

Our findings suggest that, because of the speed of their release, preprints (rather than peer-reviewed literature in the same topic area) might be driving discourse related to the ongoing COVID-19 outbreak. Although our analysis focused on search trends and news media data as a measure for general discourse, it is likely that preprints are also influencing policy making discussions, given that WHO announced on Jan 26, 2020, that they would be creating a repository of relevant studies, including those that have not yet been peer-reviewed.

Nevertheless, despite the advantages of speedy information delivery, the lack of peer review can also translate into issues of credibility and misinformation, both intentional and unintentional. This particular drawback has been highlighted during the ongoing outbreak, especially after the high-profile withdrawal of a virology study from the preprint server bioRxiv, which erroneously claimed that SARS-CoV-2 contained HIV "insertions". The very fact that this study was withdrawn showcases the power of open peer review during emergencies; the withdrawal itself appears to have been prompted by outcry from dozens of scientists from around the globe who had access to the study because it was placed on a public server. Much of this outcry was documented on Twitter (a microblogging platform) and on longer-form popular science blogs, signalling that such fora would serve as rich additional data sources for future work on the impact of preprints on public discourse. However, instances such as the one described showcase the need for caution when acting upon the science put forth by any one preprint.

With this in mind, taking multiple studies into consideration as presented in our analysis can help operationalise the kind of caution necessitated by preprints while simultaneously allowing for important, robust insights before the publication of a peer-reviewed study in the same topic area. Here, we used a simple method in which we plotted the ten R0 estimates that were posted as preprints before publication of the first peer-reviewed study on Jan 29; we then took the average of these estimates and excluded the two estimates that qualified as upper-limit outliers, both upon visual inspection and as a function of the 95% CI. Even before outlier elimination, this simple method yielded average R0 estimates similar to those presented by the peer-reviewed studies subsequently published on and after Jan 29; however, more complex approaches that incorporate weighted averages based on estimate confidence, similar to traditional meta-analytical methods, offer a promising avenue for future work. Such collective, consensus-based approaches will arguably be easiest to use when the research of interest is quantitative in nature; nevertheless, given that many crucial epidemiological parameters that inform decision making (eg, incubation period, generation time, and so on) are quantitative, our proposed approach could also work well in these contexts.
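As an illustration of the weighted-average idea mentioned above as future work, the following sketch pools estimates by inverse-variance weighting, a standard fixed-effect meta-analytical technique. It is not the method used in this analysis. It assumes that each reported range is a symmetric 95% interval from which a standard error can be approximated as (upper − lower) / (2 × 1.96), that credible intervals can be treated in the same way, and that the estimates are independent. The function name and the input values are hypothetical.

```python
import math


def inverse_variance_mean(estimates, z=1.96):
    """Pool (mean, lower, upper) estimates with inverse-variance weights.

    Assumption: each range is a symmetric 95% interval, so the standard error
    is approximated as (upper - lower) / (2 * z). Narrower ranges get more weight.
    """
    weights = []
    for _, lower, upper in estimates:
        standard_error = (upper - lower) / (2 * z)
        if standard_error <= 0:
            raise ValueError("each range must have upper > lower")
        weights.append(1.0 / standard_error ** 2)
    total_weight = sum(weights)
    pooled = sum(w * mean for w, (mean, _, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se


# Hypothetical (mean, lower, upper) R0 estimates, for illustration only.
pooled, lower, upper = inverse_variance_mean([(2.0, 1.0, 3.0), (4.0, 3.0, 5.0)])
print(f"pooled={pooled:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

When studies differ widely in methods and data sources, as they do here, a random-effects model might be more appropriate than the fixed-effect pooling sketched above.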

Our work showcases the powerful role preprints can have during public health crises because of the timeliness with which they can disseminate new information. Furthermore, given that two of the preprints included in this analysis were later published in peer-reviewed outlets, the evidence shows that even prestigious journals now permit the sharing of important findings before peer review and that the use of preprint platforms does not jeopardise future peer-reviewed publication. Without question, primacy and peer-reviewed publications are key metrics in individual professional advancement (eg, academic promotion); nevertheless, the impact of preprints on discourse and decision making pertaining to the ongoing COVID-19 outbreak suggests that we must rethink how we reward and recognise community contributions during present and future public health crises.

This work was supported in part by grant T32HD040128 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health. We declare no competing interests.
