Summary

We found that trialists engage at length with published correspondence identifying misreporting of pre-specified outcomes. However, inaccurate statements and misunderstandings about what constitutes correct outcome reporting were common, even among trialists publishing in high-impact journals. In addition, response styles such as ad hominem criticism, distraction and denial were commonly used.

Strengths and weaknesses

A larger sample of trials and trialists would have been preferable. Our study included the full correspondence with 20 teams of researchers and could have included all 58 trials with misreported outcomes identified during COMPare. However, our ability to engage with trialists was hindered in two ways: by journal editors rejecting the majority of initial correction letters identifying misreporting of outcomes, despite clear evidence that these trial reports had all breached the CONSORT guidelines on correct outcome reporting; and by journals rejecting the majority of COMPare follow-up letters engaging with errors in trialists’ responses, as discussed below.

Context of other research

There have been extensive previous anecdotal reports in the grey and academic literature of researchers’ failures to engage constructively with post-publication peer review that is critical of study methods and results. COMPare is the first study to approach and document this problem systematically with a standardised set of correction letters and on an objective issue of accurate study reporting in line with standard best practice guidelines. COMPare is also the first study to systematically solicit and analyse detailed technical responses from a representative sample of trialists and engage them in a practical real-world detailed discussion of outcome reporting using examples of misreporting from their own work to identify knowledge gaps. There has been extensive previous research establishing the high prevalence of outcome misreporting [1] and other reporting flaws [9] and some questionnaire data on the limitations of trialists’ knowledge around correct outcome reporting. One previous survey on the prevalence of outcome misreporting also engaged trialists in semi-structured telephone interviews to explore their reasons for not reporting specific outcomes: this study design yielded less detail in terms of specific misunderstandings or inaccurate statements than ours; however, consistent with our findings, they did report that trialists “seemed generally unaware of the implications for the evidence base of not reporting all outcomes and protocol changes” and that some regarded non-significant results as “uninteresting” [10]. Another series of semi-structured telephone interviews with 59 trialists similarly yielded the finding that non-significant findings are sometimes regarded as uninteresting, and space constraints may hinder complete outcome reporting [11].

Interpretation

It is challenging to reach a fair interpretation of what drives trialists’ incorrect statements about correct outcome reporting. To retain neutrality, we have labelled all of these statements as “inaccurate” rather than either “misunderstandings” or “misleading comments” because it is not possible to know the level of knowledge for all researchers assessed. Some, none, or all of the inaccurate statements documented may have represented genuine misunderstandings or a lack of knowledge. To expand on this, it is possible that these trialists do not know what correct outcome reporting consistent with CONSORT looks like and are making genuine unintended errors; it is also possible that they do not care about CONSORT and are speaking implicitly or explicitly to a more vague alternative set of unstated principles around correct outcome reporting which they regard as superior.

Equally, some, none, or all of the inaccurate statements may have been used deliberately in an attempt to deflect criticism and publicly defend what the researchers knew to be misreporting. This would imply that researchers were not primarily concerned with what constitutes correct outcome reporting but rather with defending their reputation. At face value, it seems likely that anyone with good knowledge of correct outcome reporting, and concerned to defend their reputation, would be equally concerned by the negative reputational consequences of formally publishing a letter that contained clear misunderstandings around what constitutes correct outcome reporting. For this to be a rational position, therefore, researchers would also have to believe that the public discussion is likely to be brief, poorly understood by onlookers (or ignored), and unlikely to lead to a resolution establishing who was right or wrong on matters of fact.

To an extent, this view is vindicated by the initial findings of COMPare, where journal editors mostly rejected letters reporting outcome misreporting, and often defended such misreporting, despite the journals being publicly listed as endorsing CONSORT. Researchers may also feel bolstered by the fact that a journal has published their paper after peer review and is therefore likely to feel some commitment to supporting it; by the fact that a paper with misreported outcomes is unlikely to be retracted, or even corrected, so this is just a matter for correspondence; and by the fact that letters in journals have lower visibility than original research. Related to the issue of managing the visibility of correspondence, it is notable that some research teams suggested that the discussion on their misreported outcomes should take place as annotations to our raw data archive rather than in the journal where their research was published.

There is also a third option combining both of the previous two: that these were “motivated misunderstandings”, where researchers do not have a full clear working understanding of correct outcome reporting, but are not inclined to develop one, and merely seek to survive a single round of public criticism in the reasonable expectation that any potentially inaccurate statements will not be exposed in the full cycle of post-publication peer review. Under any of these three models, two core problems obtain. First, the failure of journals to curate post-publication peer review such that errors on matters of fact are resolved has resulted in a sub-optimal approach from scientists to the accurate reporting of their own work; second, a widespread lack of knowledge around correct outcome reporting has contributed to both misreporting and poor discourse around that misreporting.

Separately to this, we found many examples of obfuscation, ad hominem criticisms, and other techniques that can fairly be described as “rhetorical”. Although these do not directly relate to the specific issues of outcome reporting and may not be reasonably regarded as unacceptable per se, they are part of a broader set of processes restricting adequate scrutiny of correct reporting. It is also worth noting that we may not have had access to the full breadth of ad hominem comments, because we do not have access to the text of the letters submitted, only those published. Letters published in The Lancet (the majority in our cohort) go through an extensive process of editorial control, proof-reading, and some re-drafting; we note that the tone of BMJ “rapid responses”, which are posted online within hours of submission and usually unchanged, was often much more raw than the formal letters published after a delay in The Lancet. On the issue of self-censorship, it is also possible that the constitution of the COMPare team reduced the quantity of ad hominem criticism. Because such criticism is based on denigrating the recipient rather than ideas, it is likely to be mediated by perceived relative social status, which in turn is mediated by factors such as class, gender and race. It is therefore possible that we received less such criticism than a different team might have done, since those submitting correction letters were all academics at Oxford, recently listed as the leading medical research institute in the world; we have a professor and other senior staff on our team; and the COMPare correspondents named on correction letters were all male and mostly identifiable as White British.

A related issue of power relations concerns the question of who should decide whether an outcome requires reporting. CONSORT is clear that all pre-specified outcomes should be reported or discrepancies flagged. As per our section “Trust the trialist”, many trialists stated that outcome switching is irrelevant if it does not affect the outcomes of the study. Ultimately, in our view, this reflects scientists asserting that they should be trusted to faithfully report summary results without oversight and asserting authority over the data as if it were owned by the trialist rather than participants or the wider community. This is inconsistent with the wider societal shift towards greater transparency and accountability in science.

Implications

We identify various implications of our study for editors, funders, trial registries, ethics committees, and regulators; for initiatives seeking to improve research methods and reporting; and for researchers whether they are publishing work, responding to published work, or consuming published work. We have found that trialists publishing in high-impact journals routinely misreport their pre-specified outcomes and, when challenged, regularly make incorrect statements on the topic of correct outcome reporting. This may reflect a lack of knowledge: where this is the case, we suggest that better education and training on research methods may improve matters. However, trialists are also deprived by journal editors of important feedback that would likely help to raise standards. Journals could improve standards by policing correct outcome reporting, giving feedback to trialists where they have submitted papers that fail to comply with CONSORT standards on outcome reporting, and encouraging trialists to engage positively with feedback on methodological and reporting flaws, as already recommended in ICMJE guidance. In some cases, the incorrect statements made by trialists may reflect deliberate or unconscious use of superficially plausible but incorrect arguments as a rhetorical device to defend misreported studies. Where this is the case, research integrity training may improve standards, alongside support for ongoing efforts to foster a culture of positive and reciprocal critical appraisal in scientific discourse.

Trial registries should emphasise that information on registries is important, give additional guidance on the specific elements required, and give feedback to trialists when registry entries fall short on required information. Registry managers and ethics committees could remind trialists that pre-specified outcomes in protocols and registry entries should match. Ethics committees and funders could take responsibility for “closing the loop” with a report at the end of a project, confirming that all results have been appropriately published, deviations from the ethically approved protocol accounted for, and post-publication peer review engaged with constructively. Organisations such as the EQUATOR (Enhancing the Quality and Transparency of Health Research) network, running the CONSORT guidelines, should disambiguate any areas in their recommendations that are perceived by researchers as unclear, and could offer a service for trialists or journals to check that trials have been correctly reported across a range of methodological issues. Consumers of the research literature, in turn, should be aware that the peer-reviewed academic literature contains a high prevalence of misreported research and that efforts to correct this are routinely resisted by journal editors. The majority of initial letters from COMPare were rejected, and the overwhelming majority of responses to authors’ responses were also rejected. Therefore, the extensive errors documented in Table 1, in Additional file 1, and in the longer COMPare correspondence archive currently stand unaddressed and without a published response in the scientific literature, other than in this article.

Lastly, we believe that the rhetorical approaches demonstrated by many respondents in our cohort—such as diversion, hostility, and challenging the legitimacy of having a discussion—will be recognised by academics more broadly. We hope that this will be useful for those writing letters criticising the content of a scientific paper or anxious about a response they have received from an author. Although clarity and professionalism are important, the wide variation in responses we received to our large set of identical correction letters strongly suggests that hostile or obfuscatory responses are, at least in part, a function of the responding authors rather than the letter that stimulated the response.

Future research

The academic literature already contains a very large number of studies which retrospectively document the overall prevalence of methodological flaws or reporting discrepancies in clinical trials. These studies are expensive, requiring skilled labour from experienced researchers to identify a large number of flaws in published research. In our view, by publishing these findings as only a single anonymised prevalence figure, these teams are failing to maximise the value and impact of their work. We suggest that wherever research is done documenting the prevalence of flaws in individual studies, researchers should also submit letters for publication on each individual paper where a shortcoming has been identified, in order to alert other consumers of the academic literature to the presence of specific flaws in specific studies, to generate informative or corrective discussion with the researchers concerned, to raise awareness among individual researchers about flaws in their own research, and to generate dialogue allowing methodologists to better understand the misunderstandings or structural challenges driving methodological and reporting flaws, and so devise interventions to improve standards.