Abstract The consolidation of scientific knowledge proceeds through the interpretation and then distillation of data presented in research reports, first in review articles and then in textbooks and undergraduate courses, until truths become accepted as such both amongst “experts” and in the public understanding. Where data are collected but remain unpublished, they cannot contribute to this distillation of knowledge. If these unpublished data differ substantially from published work, conclusions may not reflect adequately the underlying biological effects being described. The existence and any impact of such “publication bias” in the laboratory sciences have not been described. Using the CAMARADES (Collaborative Approach to Meta-analysis and Review of Animal Data in Experimental Studies) database we identified 16 systematic reviews of interventions tested in animal studies of acute ischaemic stroke involving 525 unique publications. Only ten publications (2%) reported no significant effects on infarct volume and only six (1.2%) did not report at least one significant finding. Egger regression and trim-and-fill analysis suggested that publication bias was highly prevalent (present in the literature for 16 and ten interventions, respectively) in animal studies modelling stroke. Trim-and-fill analysis suggested that publication bias might account for around one-third of the efficacy reported in systematic reviews, with reported efficacy falling from 31.3% to 23.8% after adjustment for publication bias. We estimate that a further 214 experiments (in addition to the 1,359 identified through rigorous systematic review; nonpublication rate 14%) have been conducted but not reported. It is probable that publication bias has an important impact in other animal disease models, and more broadly in the life sciences.

Author Summary Publication bias is known to be a major problem in the reporting of clinical trials, but its impact in basic research has not previously been quantified. Here we show that publication bias is prevalent in reports of laboratory-based research in animal models of stroke, such that data from as many as one in seven experiments remain unpublished. The result of this bias is that systematic reviews of the published results of interventions in animal models of stroke overstate their efficacy by around one third. Nonpublication of data raises ethical concerns, first because the animals used have not contributed to the sum of human knowledge, and second because participants in clinical trials may be put at unnecessary risk if efficacy in animals has been overstated. It is unlikely that this publication bias in the basic sciences is restricted to the area we have studied, the preclinical modelling of the efficacy of candidate drugs for stroke. A related article in PLoS Medicine (van der Worp et al., doi:10.1371/journal.pmed.1000245) discusses the controversies and possibilities of translating the results of animal experiments into human clinical trials.

Citation: Sena ES, van der Worp HB, Bath PMW, Howells DW, Macleod MR (2010) Publication Bias in Reports of Animal Stroke Studies Leads to Major Overstatement of Efficacy. PLoS Biol 8(3): e1000344. https://doi.org/10.1371/journal.pbio.1000344
Academic Editor: Ian Roberts, London School of Hygiene and Tropical Medicine, United Kingdom
Received: August 24, 2009; Accepted: February 18, 2010; Published: March 30, 2010
Copyright: © 2010 Sena et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: We acknowledge financial support from the Scottish Chief Scientists' Office. MRM acknowledges the support of the Edinburgh MRC Trials Methodology Hub. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: CAMARADES, Collaborative Approach to Meta-analysis and Review of Animal Data in Experimental Studies; tPA, tissue plasminogen activator; IL1-RA, interleukin 1 receptor antagonist

Introduction Few publications describing natural phenomena are in themselves sufficient to change our understanding of the world, and knowledge advances through the summarising of data in conference presentations, review articles, and books. Traditionally this process has been rather haphazard, with sometimes partisan experts using narrative review articles to emphasise their own particular perspective. Attempts have been made to account for this bias using the technique of systematic review, in which there is prespecification of the biological question being addressed, the methods through which contributing data will be identified, and the criteria that will be used to select which data are included in the analysis [1]. While systematic reviewers often go to some lengths to identify unpublished data sources, both approaches are potentially confounded by the ability to include only available data. If experiments have been conducted but are not available to reviewers, and if the results of these experiments as a group are not the same as results from experiments that were published, then both narrative and systematic reviews, and the resulting expert opinion and public understanding, will be biased. This is the “file drawer problem” [2],[3]: at its most extreme, the 95% of studies that were truly neutral (that is, which reported no significant effects) remain in the files of the investigators, the 5% of experiments that were falsely positive are published, and reviewers conclude—falsely—that the literature represents biological truth. The consequences of the drawing of erroneous conclusions would be troubling if it involved, for instance, the interpretation of data from clinical trials; indeed, the recognition of a substantial publication bias in this literature has led to the introduction of clinical trial registration systems to ensure that those summarising research findings are at least aware of all relevant clinical trials that have been performed [4]. 
Publication bias has also been observed in reports of genetic association studies [5] and in ecology and evolution, in which 40% of meta-analyses were confounded by publication bias, and adjusting for publication bias might have altered the conclusions in around one-third of cases [6]. A related group of biases, the citation biases [7], can be addressed through rigorous systematic review, in that an attempt is made to include all relevant publications describing data meeting predefined inclusion or exclusion criteria. However, until recently there has been a paucity of systematic reviews of animal studies [8]. The extent and any impact of publication bias in the experimental sciences are not clear. Timmer and colleagues investigated the process of publication for abstracts submitted to a leading gastrointestinal conference, and suggested both that most of the responsibility for nonpublication rested with the authors (76% of unpublished projects were never submitted as a manuscript) and that for basic science studies there was no relationship between the rate of publication and whether the study reported positive, neutral, or negative findings [9]. It has not previously been possible to ascertain the impact of publication bias in animal studies because of the paucity of systematic reviews and meta-analyses, the substantial heterogeneity in the research questions asked in experimental science and in the outcomes reported, and the qualitative rather than quantitative nature of many of those outcomes. Since 2004 the Collaborative Approach to Meta-Analysis and Review of Animal Data in Experimental Studies (CAMARADES) has curated data collected in the context of systematic reviews of reports of studies describing the efficacy in animals of candidate interventions for stroke [10]–[21]. Here we use that dataset, which includes quantitative data for reported outcomes from individual experiments, to estimate the prevalence and impact of publication bias in laboratory science.

Discussion These data provide to our knowledge the first quantitative estimates of the impact of publication bias in the literature describing animal experiments modelling human disease. Only 2.2% of publications identified in the included reviews did not report any significant findings. While our approach can provide at best only approximations of the magnitude of the problem, our data suggest that effect sizes are inflated by around one-third, and we estimate that around one-sixth of experiments remain unpublished. Many would consider these to be conservative estimates, and indeed a recent systematic review of individual animal data supporting the efficacy of NXY-059 showed that two of four unpublished experiments identified in the course of that review were neutral [28]. The different methods used to ascertain publication bias gave somewhat different results: Egger regression suggested bias for all 16 interventions, whereas trim-and-fill suggested bias for ten of 16 interventions. Importantly, the median number of publications was higher for those interventions in which trim-and-fill suggested publication bias (27) than for those in which it did not (10.5), suggesting that when the number of publications is small the trim-and-fill approach may lack statistical power compared with Egger regression. Nevertheless, it is likely that publication bias is highly prevalent in this literature, and this is likely to bias the conclusions drawn in both narrative and systematic reviews. In discussion of factors that might result in funnel plot asymmetry in animal studies it is important to note that, given their small size and in contrast to clinical trials, variation in study precision relates more to underlying biological variability and to measurement error than to study size.
However, there are a number of factors other than publication bias that can cause funnel plot asymmetry [29]: First, because studies of poorer methodological quality tend to overstate effect sizes [30], lower precision in these studies would lead to funnel plot asymmetry. However, we found no association between study precision and methodological quality in the publications contributing to this analysis. Second, the effect size may vary according to the size of individual studies. In clinical trials, smaller studies may involve patients at greater risk of an adverse outcome, in whom the intervention is proportionately more effective; or higher doses or more powerful interventions may be used in smaller studies; or smaller studies may focus on particular groups in whom the intervention is more effective. However, none of these features apply to the animal studies examined here. Third, the studies identified in the individual reviews may not be representative of all studies published. However, the included reviews used detailed search strategies involving multiple electronic databases and conference abstracts; had no language restriction; and where duplicate publication had occurred only one publication was included (see Methods). Selection bias is therefore unlikely. Finally, if more than one outcome measure was studied, and if effect sizes were consistently higher and precision consistently lower for a particular outcome measure, funnel plot asymmetry would result. However, because this analysis is restricted to studies reporting changes in infarct size, such a problem is unlikely to be an issue here. 
In view of the above, it is important to note that, because we have included all data reporting an effect on infarct volume and not just the largest effect size from each publication, we will have included at least some imprecise studies testing ineffective doses (at the lower end of a dose response curve) or at later time points, which could lead to a reversal of funnel plot asymmetry. For this reason, we think that the present study is more likely to underestimate than to overestimate the effect of publication bias. For meta-analyses of individual interventions, we do not believe that these techniques are sufficiently robust to allow the reliable reporting of a true effect size adjusted for publication bias. This is partly because most meta-analyses are too small to allow reliable reporting, but also because the true effect size may be confounded by many factors, known and unknown, and the empirical usefulness of a precise estimate of efficacy in animals is limited. However, these techniques do allow some estimation both of the presence and of the likely magnitude of publication bias, and reports of meta-analysis of animal studies should include some assessment of the likelihood that publication bias confounds their conclusions, and the possible magnitude of the bias. These quantitative data raise substantial concerns that publication bias may have a wider impact in attempts to synthesise and summarise data from animal studies and more broadly. It seems highly unlikely that the animal stroke literature is uniquely susceptible to the factors that drive publication bias. First, there is likely to be more enthusiasm amongst scientists, journal editors, and the funders of research for positive than for neutral studies. Second, the vast majority of animal studies do not report sample size calculations and are substantially underpowered. 
Neutral studies therefore seldom have the statistical power confidently to exclude an effect that would be considered of biological significance, so they are less likely to be published than are similarly underpowered “positive” studies. However, in this context, the positive predictive value of apparently significant results is likely to be substantially lower than the 95% suggested by conventional statistical testing [31]. A further consideration relating to the internal validity of studies is that of study quality. It is now clear that certain aspects of experimental design (particularly randomisation, allocation concealment, and the blinded assessment of outcome) can have a substantial impact on the reported outcome of experiments [14]. While the importance of these issues has been recognised for some years [32], they are rarely reported in contemporary reports of animal experiments [33]. The ethical principles that guide animal studies hold that the number of animals used should be the minimum required to demonstrate the outcome of interest with sufficient precision. For some experiments, this number may be larger than those currently employed. For all experiments involving animals, nonpublication of data means those animals cannot contribute to accumulating knowledge and that research syntheses are likely to overstate biological effects, which may in turn lead to further unnecessary animal experiments testing poorly founded hypotheses. We estimate that for the interventions described here, experiments involving some 3,600 animals have remained unpublished. We consider this practice to be unethical. Others have considered the issue of publication bias in animal stroke studies [34], and have made suggestions for how this might be addressed. 
Given that a framework regulating animal experimentation already exists in most countries, we suggest that this might be exploited to allow the maintenance of a central register of experiments performed, grouped according to their broad topic, anonymised if required, and referenced in publications arising from that work. Those responsible for preparing conference presentations, review articles, and books would then be much better placed to make a reasonable assessment of the extent to which publication bias may confound their conclusions.
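The headline proportions quoted in this article can be checked with simple arithmetic from the figures given in the Abstract (efficacy falling from 31.3% to 23.8%; 214 imputed experiments in addition to 1,359 identified). The Python sketch below is illustrative only. The choice of denominators is an assumption about how the quoted fractions were derived, not something the text states, and the variable names are hypothetical.

```python
# Figures taken from the Abstract.
reported_efficacy = 31.3      # % improvement in outcome, as published
adjusted_efficacy = 23.8      # % improvement after trim-and-fill adjustment
identified_experiments = 1359
imputed_missing = 214

# Absolute and relative overstatement of efficacy.
absolute_overstatement = reported_efficacy - adjusted_efficacy
# Assumption: "around one-third" is expressed relative to the adjusted estimate.
relative_overstatement = absolute_overstatement / adjusted_efficacy

# Assumption: the 14% (about one in seven) rate uses all experiments conducted
# (identified plus imputed) as the denominator.
nonpublication_rate = imputed_missing / (identified_experiments + imputed_missing)

# Assumption: "around one-sixth" uses only the identified experiments as denominator.
missing_per_identified = imputed_missing / identified_experiments

print(f"absolute overstatement: {absolute_overstatement:.1f} percentage points")
print(f"relative overstatement: {relative_overstatement:.1%}")
print(f"nonpublication rate:    {nonpublication_rate:.1%}")
print(f"missing per identified: {missing_per_identified:.1%}")
```

Under these assumptions the two denominators would account for the difference between the "one in seven" and "one-sixth" descriptions of the same 214 imputed experiments.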

Methods We conducted a systematic review for reports of the quantitative impact of publication bias in animal studies by electronic search of PubMed (4 December 2008) with the search term “publication bias”, limited to “animals”. We sought to include publications reporting a quantitative estimate of publication bias in meta-analyses describing the efficacy of interventions in animal models of human disease. Abstracts were independently screened by two investigators (ESS, MRM). We used data from all meta-analyses (published and unpublished) of interventions tested in animal stroke studies reposited in the database of CAMARADES (an international collaboration established in 2004 to support meta-analyses of animal data for stroke), which had been completed by August 2008. These reviews use a standard methodology including a broad search strategy, inclusion and exclusion criteria, systematic searching of multiple online databases, searching of conference abstracts, and screening of search results by two independent investigators. They perform well against the 12-item checklist for systematic reviews of animal studies (Text S2) proposed by Mignini et al. [35], with a median score of 11 (interquartile range 10.5–11). The CAMARADES data management system includes an analytical package to allow weighted and stratified mean difference meta-analysis; included studies are retained in the database for further analysis, and access to this database is publicly available on request. The database includes details of each individual experiment, including effect size and its standard error. The reviews from which these data are drawn are representative of the literature; they include 11 of a total of 14 meta-analyses of animal studies of stroke which had been published by the end of 2008. Of the five included reviews unpublished at that time (IL1-RA, thrombolytics other than tPA, growth factors, minocycline, stem cells), one has been published and two are under review.
Animal stroke studies report a variety of outcome measures, often measured from the same cohort of animals. To avoid duplication we have restricted the present analysis to reports of effects on infarct size. Where this was determined at multiple time points (for instance using serial MRI), the individual reviews recorded only the last outcome measured. Where a cohort of animals was represented more than once in the database (for instance in studies reporting the effects of tPA and hypothermia in combination), the overall analysis was censored such that each cohort appeared only once. No intervention was the subject of more than one review in the database. For each experiment, effect size and standard error were extracted. For each intervention, and for all interventions together (with individual experiments being pooled for a global analysis), the prevalence of publication bias was assessed using funnel plotting [36], Egger regression [37], and the Duval and Tweedie nonparametric trim-and-fill approach [27] (enabled in METATRIM, an additional module for STATA). The basis of funnel plotting and Egger regression is that, all other things being equal, imprecise studies should be as likely to understate efficacy as to overstate it. Where there is a preponderance of imprecise studies overstating efficacy, and all other things being equal, this suggests that imprecise studies understating efficacy are missing from the analysis, as occurs with publication bias. This leads to asymmetry in the funnel plot and to the movement of the Egger regression line y-intercept away from the origin. The basis of trim-and-fill is to identify the publications contributing most to funnel plot asymmetry, to suppress these from the analysis, and to recalculate the overall estimate of efficacy. Studies contributing most to asymmetry around this new overall estimate are then suppressed, a new estimate is calculated, and so the process continues until no further studies are excluded.
Then the suppressed studies are replaced, along with matching imputed studies with an effect size calculated by reflection around the recalculated overall estimate and variance equal to that of the study which they are balancing. The number of imputed studies added to the dataset provides an estimate of the number of missing unpublished studies, and meta-analysis of this enlarged dataset provides an approximation of what the true efficacy might be were publication bias not present. We attempted to estimate the extent of publication bias in the animal stroke literature by measuring the relative and absolute differences between the observed estimate of efficacy and the estimated true efficacy. We tested any relationship between the precision and the methodological quality of individual studies using a ten-item study quality checklist comprising peer-reviewed publication, statement of control of temperature, random allocation to treatment or control, blinded induction of ischaemia, blinded assessment of outcome, use of anaesthetic without significant intrinsic neuroprotective activity, appropriate animal model (aged, diabetic, or hypertensive), sample size calculation, compliance with animal welfare regulations, and statement of potential conflict of interests [11]. Despite the potential shortcomings of using aggregate checklist scores rather than assessing the impact of individual study quality items [38], across a range of systematic reviews publications scoring highly on this checklist tend to give lower estimates of treatment effect. While the score has not been formally validated, it does have face validity, and it has formed the basis for an international consensus statement of Good Laboratory Practice in the modelling of ischaemic stroke [39].
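The logic of Egger regression and trim-and-fill described above can be illustrated with a minimal Python sketch. It is not the METATRIM implementation used for the analysis. It assumes a fixed-effect inverse-variance pooled estimate, the L0 estimator of Duval and Tweedie for the number of suppressed studies, and that missing studies lie on the low-efficacy side of the funnel. All function names and the example data are hypothetical.

```python
def pooled_estimate(effects, ses):
    """Inverse-variance weighted (fixed-effect) mean of the effect sizes."""
    weights = [1.0 / se ** 2 for se in ses]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def egger_intercept(effects, ses):
    """Least-squares intercept of (effect / SE) regressed on precision (1 / SE)."""
    x = [1.0 / se for se in ses]
    z = [y / se for y, se in zip(effects, ses)]
    mean_x, mean_z = sum(x) / len(x), sum(z) / len(z)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    return mean_z - (sxz / sxx) * mean_x


def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def trim_and_fill(effects, ses, max_iter=50):
    """Return (number of imputed studies, adjusted pooled estimate)."""
    pairs = sorted(zip(effects, ses))  # ascending effect size
    y = [p[0] for p in pairs]
    se = [p[1] for p in pairs]
    n, k0 = len(y), 0
    for _ in range(max_iter):
        # Trim the k0 largest effects and re-estimate the centre of the funnel.
        centre = pooled_estimate(y[:n - k0], se[:n - k0])
        deviations = [yi - centre for yi in y]
        ranks = average_ranks([abs(d) for d in deviations])
        t_n = sum(r for r, d in zip(ranks, deviations) if d > 0)
        l0 = (4 * t_n - n * (n + 1)) / (2 * n - 1)  # L0 estimator
        new_k0 = min(max(0, int(l0 + 0.5)), n - 1)
        if new_k0 == k0:
            break
        k0 = new_k0
    # Fill: reflect each trimmed study around the trimmed estimate, keeping its SE.
    centre = pooled_estimate(y[:n - k0], se[:n - k0])
    mirrored_y = [2 * centre - yi for yi in y[n - k0:]]
    return k0, pooled_estimate(y + mirrored_y, se + se[n - k0:])


# Hypothetical data: imprecise studies (large SE) report the largest effects.
effects = [18, 20, 22, 24, 30, 38, 46]
ses = [2, 2, 2, 4, 4, 8, 8]
observed = pooled_estimate(effects, ses)
k0, adjusted = trim_and_fill(effects, ses)
print(f"Egger intercept: {egger_intercept(effects, ses):.2f}")
print(f"observed {observed:.2f}, imputed studies {k0}, adjusted {adjusted:.2f}")
```

In this toy dataset the Egger intercept is positive and the adjusted estimate is lower than the observed one, which is the pattern the text attributes to publication bias. A symmetric funnel would give no imputed studies and leave the pooled estimate unchanged.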

Acknowledgments We are grateful to the authors of individual works who responded to requests for further information and to the members of the CAMARADES group who were involved in individual reviews.

Author Contributions The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: ESS HBvdW PMWB DWH MRM. Performed the experiments: ESS. Analyzed the data: ESS MRM. Wrote the paper: ESS HBvdW PMWB DWH MRM. Provided data from systematic reviews: ESS HBvdW PMWB DWH. Provided data from systematic reviews and contributed to the design of the study: MRM.