Conclusions Most pivotal studies forming the basis of EMA approval of new cancer drugs between 2014 and 2016 were randomised controlled trials. However, almost half of these were judged to be at high risk of bias based on their design, conduct, or analysis, some of which might be unavoidable because of the complexity of cancer trials. Regulatory documents and the scientific literature had gaps in their reporting. Journal publications did not acknowledge the key limitations of the available evidence identified in regulatory documents.

Results Between 2014 and 2016, the EMA approved 32 new cancer drugs on the basis of 54 pivotal studies. Of these, 41 (76%) were randomised controlled trials and 13 (24%) were either non-randomised studies or single arm studies. 39/41 randomised controlled trials had available publications and were included in our study. Only 10 randomised controlled trials (26%) measured overall survival as either a primary or coprimary endpoint, with the remaining trials evaluating surrogate measures such as progression free survival and response rates. Overall, 19 randomised controlled trials (49%) were judged to be at high risk of bias for their primary outcome. Concerns about missing outcome data (n=10) and measurement of the outcome (n=7) were the most common domains leading to high risk of bias judgments. Fewer randomised controlled trials that evaluated overall survival as the primary endpoint were at high risk of bias than those that evaluated surrogate efficacy endpoints (2/10 (20%) v 16/29 (55%), respectively). When information available in regulatory documents and the scientific literature was considered separately, overall risk of bias judgments differed for eight randomised controlled trials (21%), which reflects reporting inadequacies in both sources of information. Regulators identified additional deficits beyond the domains captured in risk of bias assessments for 10 drugs (31%). These deficits included magnitude of clinical benefit, inappropriate comparators, and non-preferred study endpoints, which were not disclosed as limitations in scientific publications.

Main outcome measures Study design characteristics (randomisation, comparators, and endpoints); risk of bias using the revised Cochrane tool (bias arising from the randomisation process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result); and reporting adequacy (completeness and consistency of information in trial protocols, publications, supplementary appendices, clinical trial registry records, and regulatory documents).

In this study, we examined the characteristics of randomised controlled trials that supported approval of cancer drugs by the EMA from 2014 to 2016. We focused on three aspects of cancer drug trials. The first aspect was trial design. A key feature of cancer drug trials is whether they are designed to demonstrate a benefit on overall survival or quality of life. 12 29 30 We determined whether recent randomised controlled trials of cancer drugs included overall survival or quality of life outcomes as endpoints. The second aspect was risk of bias. Previous studies primarily focused on crude metrics of trial quality such as blinding of participants and investigators. 18 19 31 Although these aspects are important, they are not an adequate measure of a trial’s validity because randomised controlled trials with blinding might still be at high risk of bias (conversely, trials without blinding could produce valid results). 32 We aimed to perform risk of bias assessments that more thoroughly evaluated deficits in the design, conduct, analysis, and reporting of randomised controlled trials. 28 The third aspect was the adequacy, completeness, and consistency of reporting across different sources. Trial reporting has improved substantially over the past few decades. 33 Yet, discrepancies can occur between regulatory documents and scientific publications, 34 and might lead to different interpretations. We investigated such discrepancies.

Randomised controlled trials are widely considered to be the “gold standard” for evaluating the clinical efficacy of new drugs. 20 However, flaws in the design, conduct, analysis, or reporting of randomised controlled trials can produce bias in estimates of treatment effect, potentially jeopardising the validity of their findings. A large body of literature documents these biases, which could be substantial in magnitude. 21 22 23 24 25 26 27 For example, in a large meta-epidemiological study of 1973 randomised controlled trials, lack of blinding was associated with an average 22% exaggeration of treatment effects among trials that reported subjectively assessed outcomes. 27 Such non-trivial differences could affect how trial results are interpreted and used in regulatory settings and clinical practice. Therefore, it is imperative to systematically examine the validity of randomised controlled trials that support the approval of new drugs through assessment of the risk of bias in their results. 28

Previous work described the characteristics of pivotal studies supporting new cancer drug approvals in Europe and the United States. Regulators in both settings generally review the same set of clinical studies when approving new drugs. 17 In a large evaluation that focused on US Food and Drug Administration approvals, clinical studies of cancer drugs were less likely to be randomised and double blinded than clinical studies of drugs in other therapeutic areas. 18 Another US analysis found that studies of cancer drugs with orphan (rare disease) designations were less likely to be randomised and double blinded than studies of non-orphan cancer drugs. 19 In Europe, most of the new cancer drug approvals between 2009 and 2013 were supported by at least one randomised controlled trial. 11 However, an increasing proportion of new cancer drugs are approved on the basis of non-randomised, single arm studies. 5

About a third of “positive” randomised controlled trials of cancer drugs report treatment effects that are considered to be clinically meaningful according to the European Society for Medical Oncology Magnitude of Clinical Benefit Scale. 5 14 Moreover, there is no association between magnitude of benefit and drug price. 15 Because cancer drugs are responsible for most of the recent increases in pharmaceutical spending across healthcare systems, 16 the evidence base that supports their market entry warrants close scrutiny.

Recently, cancer drugs have comprised the single largest category of new drug approvals in Europe. In 2017, more than a quarter (24/92) of EMA approvals were for cancer drugs. 4 There is considerable debate and controversy about the therapeutic and economic value of these drugs. 5 6 7 8 9 10 Our recent research showed that most new cancer drugs were approved by the EMA without evidence of benefit on overall survival or quality of life. 11 In recent years there has been a substantial shift towards use of surrogate endpoints such as progression free survival. 12 There is growing recognition that the correlation between surrogate endpoints and overall survival is often poor. 13

Regulatory agencies are responsible for evaluating the clinical efficacy and safety of new medicines. In the European Union, the European Medicines Agency (EMA) serves as the gatekeeper to the pharmaceutical market; clinicians can only prescribe a new drug after it receives the EMA’s approval. 1 The EMA bases its decisions on a small number of key clinical studies completed and submitted by pharmaceutical manufacturers. 2 Between 2012 and 2016, about half of new drugs approved by the EMA were associated with a single pivotal study. 3

No patients or members of the public were involved in setting the research question or the outcome measures, nor were they involved in developing plans for design or implementation of the study. No patients or members of the public were asked to advise on interpretation or writing up of results. We plan to involve patients and members of the public when disseminating the study results on a publicly available website. We plan to disseminate the findings of this work to patient organisations.

Finally, one researcher (HN) documented the additional limitations of the available evidence highlighted in European public assessment reports and whether these were acknowledged in trial publications. These issues were related to the appropriateness and generalisability of the available evidence and focused on trial features that were not included in the five domains of the Cochrane risk of bias tool. For example, regulators often commented on the magnitude of clinical benefit, choice of comparators, and endpoints because these could affect the relevance of trial findings to clinical practice. Similarly, “maturity” of statistical analyses was routinely discussed because early termination of trials could affect the reliability and interpretation of findings, especially when interim analyses are not prespecified. 51 52

We assessed the risk of bias in each randomised controlled trial twice to compare the completeness and consistency of trial reporting in the regulatory documents and scientific literature. 50 Firstly, we relied on the published articles, and if available, their protocols and supplementary appendices. Secondly, we repeated the assessments by using information available in European public assessment reports alone (without consulting the trial publication, its protocol and supplementary appendix, or clinical trial registry records). There was a minimum “wash out” period of four weeks between assessments. If our risk of bias judgment differed when we used information from regulatory documents versus publications, we noted the reasons for the observed discrepancies. A third version of each risk of bias assessment was derived using the totality of information available for each trial by combining the scientific literature and regulatory documents (“combined information”).
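The comparison between sources described above can be illustrated with a minimal sketch. It assumes that each assessment is stored as a mapping from the five bias domains (plus an "overall" entry) to a judgment; the function name `compare_assessments` and the example trial are hypothetical and are not taken from the study data.

```python
DOMAINS = (
    "randomisation process",
    "deviations from intended interventions",
    "missing outcome data",
    "measurement of the outcome",
    "selection of the reported result",
)


def compare_assessments(literature, regulatory):
    """Return (domains whose judgments differ, whether the overall judgment differs)."""
    differing = [d for d in DOMAINS if literature[d] != regulatory[d]]
    overall_differs = literature["overall"] != regulatory["overall"]
    return differing, overall_differs


# Invented example, not a trial from the study sample: the regulatory
# document reveals a missing data problem that the publication does not.
literature = {d: "low" for d in DOMAINS}
literature["overall"] = "low"
regulatory = {**literature, "missing outcome data": "high", "overall": "high"}

differing, overall_differs = compare_assessments(literature, regulatory)
print(differing, overall_differs)
```

In this invented case the judgments differ in one domain and the overall judgment changes, which is the kind of discrepancy whose reasons were noted.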

Two trained researchers (XRS and NH) independently assessed the risk of bias in each pivotal randomised controlled trial. Areas of disagreement were resolved by discussion and consensus in face to face meetings, first between the two researchers and then with other members of the project team (HN and CD). The two researchers reached the same overall risk of bias judgment for 74% of trials when using the published articles and for 85% of trials when using the European regulatory documents. A third researcher (HN) independently reviewed and confirmed the accuracy of all assessments. Difficult cases were also discussed among team members with methodological (JS, JPTH, and JACS) and clinical (BG) expertise.

On the fifth domain, studies were at low risk of bias if the results were unlikely to have been selected on the basis of either multiple outcome measurements or multiple analyses of the data.

On the fourth domain, studies were at low risk of bias if outcome assessors were unaware of the intervention received by the study participants. We judged trials to be at high risk of bias if outcome assessors were not masked (or if blinding could be compromised) and outcome assessment could be influenced by knowledge of the intervention received. According to previous meta-epidemiological reviews, lack of blinding of outcome assessors in randomised controlled trials is associated with 36% exaggerated treatment effects for subjective outcomes. 49

On the third domain, studies were at low risk of bias if outcome data were available for all or nearly all randomised participants, as reported in the CONSORT (Consolidated Standards of Reporting Trials) flow diagram. 46 Unless there was evidence that results were robust to missing outcome data in sensitivity analyses, we considered trials to be at high risk of bias if the proportions of, or reasons for, missing outcome data differed between experimental and control groups (potentially resulting in an imbalance in censoring rates). 47 48 For time to event analyses, trials were judged to be at high risk of bias if, firstly, the proportions of participants who withdrew their consent to take part in the study differed between trial arms, and participants were censored when they withdrew consent; or secondly, participants who discontinued treatment were censored. In these cases, missingness could depend on the true value of the outcome. In some trials, patients who withdrew their consent continued to be followed up and contribute outcome data, unless they specifically withdrew their consent for further tumour assessments. In such cases, we judged the trials to be at low risk of bias.

On the second domain, our assessment was based on the effect of assignment to the interventions at baseline (the “intention to treat” effect). Studies were at low risk of bias if there were no deviations from intended interventions. We judged trials to be at high risk of bias if there were clear deviations from the intervention that was intended in the trial protocol; if such deviations were not balanced between the experimental and control groups; and if these deviations likely influenced the outcome. Trials were also judged to be at high risk of bias on this domain if some participants were not analysed in the group to which they were randomised and if there was potential for a substantial impact on study findings. Open label trials were not automatically at high risk of bias owing to deviations from intended interventions; similarly, trials with blinding were not immune to bias by default. In trials that masked participants, carers, and trial personnel, we carefully considered whether blinding could be compromised because of major differences in drug adverse events. 44 45 However, compromised blinding led to high risk of bias judgments for this domain only if there were deviations from intended interventions that were not balanced between groups and potentially influenced the outcome.

On the first domain, trials were judged to be at low risk of bias if they adopted appropriate methods to generate and conceal the allocation sequence. 42 43 We also examined whether there were imbalances in group sizes, baseline characteristics, or key prognostic factors that suggested a problem with the randomisation process. Previous meta-epidemiological studies have shown that trials with inadequate or unclear sequence generation and allocation concealment have on average 7-10% exaggerated treatment effects compared with trials with adequate methods. 26

Our assessment focused on the primary endpoint of each pivotal randomised controlled trial. If a trial had clinical and surrogate measures as coprimary endpoints, we relied on the clinical outcome (for example, risk of bias assessment was based on overall survival if a trial included overall survival and progression free survival as coprimary endpoints), unless data were only available for the surrogate measure at the time of EMA approval.

We used the revised Cochrane risk of bias assessment tool (RoB 2.0, version 2016, available at www.riskofbias.info ) to examine the internal validity of randomised controlled trials. 39 The Cochrane risk of bias tool was initially published in 2008 28 and has been widely used in systematic reviews of randomised controlled trials. 40 The updated version considers five bias domains: (a) bias arising from the randomisation process; (b) bias owing to deviations from intended interventions; (c) bias caused by missing outcome data; (d) bias in measurement of the outcome; and (e) bias in selection of the reported result. 41 Risk of bias judgments were based on answers to a series of signalling questions in each of the five bias domains. We relied on the tool’s standard algorithms to map our responses to signalling questions to risk of bias judgments. As recommended in the guidance document, if a trial was judged to be at “high risk of bias” in one domain, we considered it to be at high risk of bias overall. 39 In addition, a trial judged to have “some concerns” in three or more domains was considered to be at high risk of bias overall. 39
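The two conditions for a high overall risk of bias stated above can be written as a short sketch. It encodes only those two conditions and does not reproduce the tool's signalling questions or its domain level algorithms; the function name `is_high_risk_overall` and the data layout are assumptions made for illustration.

```python
from collections import Counter

DOMAINS = (
    "randomisation process",
    "deviations from intended interventions",
    "missing outcome data",
    "measurement of the outcome",
    "selection of the reported result",
)


def is_high_risk_overall(judgments):
    """Apply the two stated rules for a high overall risk of bias.

    `judgments` maps each of the five domains to "low", "some concerns",
    or "high" (an assumed representation, not part of the tool itself).
    """
    missing = [d for d in DOMAINS if d not in judgments]
    if missing:
        raise ValueError(f"missing domain judgments: {missing}")
    counts = Counter(judgments[d] for d in DOMAINS)
    # Rule 1: high risk of bias in any single domain.
    if counts["high"] >= 1:
        return True
    # Rule 2: "some concerns" in three or more domains.
    return counts["some concerns"] >= 3
```

The function returns only whether a trial meets either condition; it deliberately does not distinguish between "low" and "some concerns" overall, because that mapping is not spelt out in the text.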

We documented the therapeutic indications for which cancer drugs received EMA first marketing authorisations. Pivotal studies were then characterised in terms of their design (randomised v non-randomised), study arms (experimental treatment and comparators), and primary and secondary endpoints. We categorised pivotal studies as randomised controlled trials if participants were randomly allocated to different treatment arms. We noted if trial endpoints included overall survival, health related quality of life, progression free survival, disease response rates, or response duration. In addition to recording the primary endpoint of each study, we noted whether the secondary endpoints included overall survival or quality of life outcomes. All data were collected independently by two researchers (XRS and NH) and verified by a third (HN).

Regulatory approval could precede the publication of trial results in the scientific literature. We searched publicly available clinical trial registries in Europe (European Clinical Trials Database: EudraCT) and the US (US National Library of Medicine database of clinical trials: ClinicalTrials.gov) to identify published accounts of pivotal studies in the scientific literature. The latest search date was 15 May 2018. ClinicalTrials.gov routinely searches PubMed and automatically retrieves the peer reviewed publications associated with each record in the registry (study sponsors can also manually enter information on trial publications in ClinicalTrials.gov). 38 Therefore, we primarily relied on ClinicalTrials.gov to identify the publications of pivotal studies supporting EMA approvals during our study period. We cross checked EudraCT to capture any studies which might have been missed in ClinicalTrials.gov. If available, we also identified the protocols and supplementary appendices of pivotal studies. When the protocol was not available, we contacted the corresponding authors and requested access to their study protocol.

We included the pivotal studies that formed the basis of cancer drug approvals during our study period. Two researchers (XRS and NH) independently identified the pivotal studies, which were defined as those labelled as “main studies” in the European public assessment reports on the EMA website. A third researcher (HN) confirmed the list of pivotal studies. European public assessment reports are summaries of documents compiled by rapporteurs from European member states by using data submitted by pharmaceutical companies. These reports include publicly available information on the characteristics, findings, and EMA’s appraisal of pivotal and supportive clinical studies that support marketing authorisation decisions of new products.

We noted when a drug received a “conditional marketing authorisation” from the EMA. Conditional marketing authorisations are granted for drugs aimed at treating serious or life threatening conditions with an unmet medical need. 35 Such approvals rely on less comprehensive data than those required for regular marketing authorisations, and pharmaceutical companies are required to conduct additional studies to evaluate the clinical benefit of their products after market entry. 36 37 We also noted if a drug received an “orphan drug” designation, which is granted for the treatment of rare diseases.

Our study period ended in 2016, which allowed a minimum of one year for trials to be published in the peer reviewed literature after authorisation. We excluded approvals for the treatment of benign tumours, supportive treatments, and generic products, which was consistent with our previous study. 11

Two researchers (XRS and NH) independently searched the publicly available EMA database of European public assessment reports from 1 January 2014 to 31 December 2016. They used Anatomical Therapeutic Chemical Classification (ATC) codes L01-04 to identify “antineoplastic and immunomodulating” agents for solid tumours and haematological malignancies that received “first marketing authorisations”. A third researcher (HN) independently confirmed the sample of cancer drug approvals during this period. We excluded “type 2 variations,” which are additional marketing authorisations of already approved drugs in new therapeutic indications.

Overview of findings at the drug level according to approval characteristics (orphan v non-orphan conditions; conditional marketing authorisations v regular approvals). The figure shows whether there was at least one randomised controlled trial at low risk of bias; and whether there was at least one randomised controlled trial at low risk of bias and without major regulatory concerns before approval

Figure 4 summarises our findings according to approval characteristics. Of 13 cancer drugs approved in orphan conditions, 4 (31%) had at least one randomised controlled trial at low risk of bias and without major regulatory concerns. The corresponding number was 5 among the subset of 19 drugs (26%) approved in non-orphan conditions. A lower proportion of cancer drugs with conditional marketing authorisations had at least one randomised controlled trial at low risk of bias and without major regulatory concerns compared with drugs with regular EMA approvals (1/5 (20%) v 8/27 (30%), respectively).

Overview of findings at the drug level. For each cancer drug approval from 2014 to 2016, the table shows whether there was at least one randomised controlled trial supporting the EMA’s approval decision; whether there was at least one randomised controlled trial evaluating overall survival as a primary or coprimary endpoint; whether there was at least one randomised controlled trial at low risk of bias; whether EMA scientists and committee members raised additional concerns about the appropriateness of the available evidence according to factors that were not captured in risk of bias assessments; and whether EMA committee members issued a divergent opinion on the approval decision based on those concerns

Table 2 summarises our findings at the cancer drug level. Of 32 new cancer drugs approved by the EMA from 2014 to 2016, 27 entered the European market with at least one randomised trial. Of the cancer drugs with randomised controlled trials, only seven were evaluated in trials powered to measure overall survival as a primary or coprimary endpoint. Half (n=16) of cancer drugs had at least one randomised controlled trial at low risk of bias. European regulators identified other concerns for 7 of the 16 drugs that had at least one randomised controlled trial at low risk of bias.

The published article reported only the first set of interim results, which were initially submitted to the EMA, and stated that “in accordance with the statistical analysis plan in the protocol and the principle of group sequential design, this was the final statistical analysis of progression free survival,” even though more mature progression free survival data had been provided to the European regulators.

In response to the manufacturer’s request for reexamination, a SAG meeting was convened. The SAG “considered that on the basis of the primary PFS analysis, which was conducted according to the pre-specified statistical considerations, the trial met its objective of showing a statistically and clinically significant improvement in PFS.” Although the CHMP still maintained that “the total available evidence on efficacy is not as comprehensive as normally would be required,” a conditional marketing authorisation was ultimately granted.

There were several concerns regarding the single pivotal trial submitted by the manufacturer. Notably, there was a “worsening of results between the initially submitted analysis and the updated analysis.”

This was not highlighted in the published account of the trial. Instead, the publication reported: “although this study did not have a control arm, patients with the degree of treatment refractoriness in our study historically have poor outcomes.”

Despite the statement in the EPAR that “it cannot be concluded that an effect on OS has been established for talimogene,” the published article reported that it was “well tolerated and resulted in a higher DRR (P<.001) and longer median OS (P=.051) …” The discussion section added, “combined with the limited toxicity observed, these are clinically important results.”

In addition, “although there appeared to be an effect on OS in the subgroup of patients with Stage IIIB-IVM1a disease, OS was a secondary endpoint and the effect was based on exploratory subgroup analyses, after the analysis in the full analysis set was not statistically significant, and without a pre-specified strategy for multiplicity adjustment.”

The CAT expressed concerns over the comparator; questioned “the validity of using DRR as the primary endpoint for the pivotal trials as opposed to using other more robust endpoints such as PFS or OS”; and questioned the clinical relevance of the magnitude of treatment effect on DRR.

While the EPAR showed that “the results from the EORTC QLQ-C30 captured a consistently negative effect by the experimental regimen compared to the control arm …,” these results were not reported in the primary publication of the trial.

According to the EPAR, the CHMP concluded that “the benefit in PFS has not been translated into a similar relative benefit in OS.” A SAG meeting was convened to discuss whether the clinical benefit of panobinostat was “sufficient to justify exposing these patients to the severe adverse event profile of the drug.”

Although the published article reported that “baseline characteristics were similar in the nivolumab and [investigators’ choice chemotherapy] study groups, with the exception of history of brain metastases and high lactate dehydrogenase, which were higher in the nivolumab than the [investigators’ choice chemotherapy group],” findings on OS were not reported.

In addition, “the CHMP had concerns over the shape of the OS curves … The applicant provided a discussion on the possible confounding factors, mainly that there were baseline imbalances between the nivolumab arm and chemotherapy arm.”

According to the EPAR, “the CHMP was concerned that … carrying out an unplanned interim analysis … was questionable and would introduce uncertainties such as the potential for informative bias.” While the published article specified that this analysis was “unplanned,” it did not comment on the potential implications of this decision.

In contrast, the published article for the REGARD study stated that: “the survival benefit with ramucirumab was consistent across almost all subgroups. Although the effect on OS was attenuated in women, the PFS estimate in women favoured ramucirumab.”

In addition, some CHMP members “considered that the effect associated with ramucirumab as single agent was too marginal and possibly even inferior to single agent chemotherapy that are used in this setting.”

According to the EPAR, there was a differential OS outcome across regions in the REGARD study. “Inconsistency was also observed regarding gender (ramucirumab effective in men, but potentially detrimental in women).”

These limitations were not acknowledged in the published article. However, the published article reported that: “Our data cannot address differences that might exist between patients with BRCA germline mutations and those with a BRCAness phenotype.”

The target patient population for which approval was sought was retrospectively defined. In addition, “the SAG was uncertain about the true effect of olaparib in this [target] patient population due to the shortcomings of the pivotal study being a small phase II randomised study with a large percentage of censored observations for PFS analysis, and in view of the absence of improvement in OS.”

The pivotal study was terminated early owing to efficacy. According to the EPAR, “the magnitude of the treatment effect is therefore not well defined and further follow-up is needed.”

Regulatory reviewers identified additional limitations in the evidence base beyond the domains captured in the risk of bias assessments for 10 drugs (31%). These limitations focused on the appropriateness and generalisability of the available data and included choice of comparators, study endpoints, interim analyses, or a combination of factors ( box 1 ). In five cases, the committee questioned either the consistency or magnitude of the observed clinical benefit. The regulatory reviewers raised concerns that were substantial enough to warrant a divergent committee opinion in four cases, all of which were judged to be at low risk of bias according to our risk of bias assessment. None of these regulatory concerns were fully disclosed and discussed as limitations or uncertainties in the scientific literature (appendix box 1).

Table 1 lists the reasons for observed differences. Overall, the content and consistency of reporting varied between the two sources. For example, the methods adopted in generating and concealing the allocation sequence were more readily available in the scientific literature than in regulatory documents (n=15). In contrast, major protocol deviations were only explicitly reported in regulatory documents, albeit inconsistently (n=3). Although protocols were available for 23/39 randomised controlled trials in our sample, we could not spot major deviations without explicit acknowledgment and discussion of such deviations in the reports. For the remaining bias domains, there was no discernible pattern. While some regulatory documents had more complete reporting in terms of missing outcome data, this information was more routinely and comprehensively reported in the scientific literature in other cases.

Risk of bias assessments using combined information from the scientific literature and regulatory documents, only information available in the scientific literature, and only information available in regulatory documents. Risk of bias assessments were based on the primary efficacy endpoints

Figure 3 shows the domain specific risk of bias judgments when we considered information reported in the scientific literature (trial publication, protocol, appendix, clinical trial registry record) and European public assessment reports separately. Our judgments differed for at least one domain in 26 out of 39 randomised controlled trials (table 1). Most of these differences did not change the overall risk of bias judgments; however, our conclusions changed for eight randomised controlled trials (21%).

Fewer randomised controlled trials that evaluated overall survival as the primary or coprimary endpoint were at high risk of bias than those that evaluated surrogate efficacy endpoints (2/10 (20%) v 16/29 (55%), respectively). Of the 16 randomised controlled trials with surrogate endpoints that were at high risk of bias, our judgments were informed by concerns about missing outcome data for six randomised controlled trials; measurement of the outcome for three randomised controlled trials; and a combination of domains for seven randomised controlled trials.

Taken together, 19 randomised controlled trials (49%) were at high risk of bias overall, 2 (5%) had some concerns, and 18 (46%) were at low risk of bias according to our assessments using the revised Cochrane tool. Of the 19 randomised controlled trials that were at high risk of bias overall, 13 had one domain at high risk of bias, 5 had two domains at high risk of bias, and 1 had three domains with some concerns. Detailed justifications for our judgments are included in appendix table 3.
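The counts above can be cross checked with a few lines of arithmetic. The sketch below recomputes the reported percentages and confirms that the breakdown of the 19 high risk trials agrees with the domain level counts reported for figure 2 (2, 4, 10, and 7 high risk judgments, and none for selection of the reported result). The helper `pct` and the variable names are hypothetical.

```python
def pct(numerator, denominator):
    """Percentage rounded half up to a whole number."""
    return int(100 * numerator / denominator + 0.5)


TOTAL_TRIALS = 39
overall = {"high": 19, "some concerns": 2, "low": 18}
assert sum(overall.values()) == TOTAL_TRIALS

percentages = {label: pct(n, TOTAL_TRIALS) for label, n in overall.items()}
print(percentages)

# Breakdown of the 19 trials at high risk of bias overall.
one_high_domain, two_high_domains, three_some_concerns = 13, 5, 1
assert one_high_domain + two_high_domains + three_some_concerns == overall["high"]

# High risk judgments per domain, as reported for figure 2.
high_by_domain = {
    "randomisation process": 2,
    "deviations from intended interventions": 4,
    "missing outcome data": 10,
    "measurement of the outcome": 7,
    "selection of the reported result": 0,
}
# Each trial contributes one high risk judgment per affected domain.
implied_total = one_high_domain * 1 + two_high_domains * 2
assert sum(high_by_domain.values()) == implied_total
```

Both checks pass: the 23 domain level high risk judgments match 13 trials with one affected domain plus 5 trials with two.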

Risk of bias assessment for the pivotal randomised controlled trials supporting European Medicines Agency cancer drug approvals using information available in the scientific literature (trial publications, protocols, supplementary materials, and clinical trial registry records) and regulatory documents. Risk of bias assessments were based on the primary efficacy endpoints. DFS=disease free survival; EFS=event free survival; HRQoL=health related quality of life; OS=overall survival; PFS=progression free survival

Figure 2 shows the risk of bias in pivotal randomised controlled trials of cancer drugs by using combined information obtained from regulatory documents and the scientific literature. Based on our answers to signalling questions (appendix table 3), we judged two trials to be at high risk of bias that arose from the randomisation process. Four trials were at high risk of bias owing to deviations from intended interventions. Twenty three randomised controlled trials had some concerns because of deviations from intended interventions, which reflected either lack of blinding or risk of compromised blinding; however, none of these was responsible for a high risk of bias judgment overall. Ten trials were judged to be at high risk of bias owing to missing outcome data and seven because of measurement of the outcome. All randomised controlled trials were at low risk of bias in selection of the reported result.
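The domain level counts above can be reconciled with the overall judgments reported earlier in this section. The Python sketch below is a consistency check only. It assumes that every domain level high risk judgment belongs to one of the 19 trials at high risk of bias overall, and that the single trial described as having three domains with some concerns had no domain at high risk; the variable names are hypothetical.

```python
# Domain level "high risk of bias" judgments reported for figure 2.
high_risk_by_domain = {
    "randomisation process": 2,
    "deviations from intended interventions": 4,
    "missing outcome data": 10,
    "measurement of the outcome": 7,
    "selection of the reported result": 0,
}

# Trials at high risk overall, keyed by number of high risk domains per trial.
# Assumption: the trial with three "some concerns" domains has zero high risk domains.
trials_by_high_domain_count = {1: 13, 2: 5, 0: 1}

domain_level_total = sum(high_risk_by_domain.values())
trial_level_total = sum(
    n_domains * n_trials
    for n_domains, n_trials in trials_by_high_domain_count.items()
)
trials_high_overall = sum(trials_by_high_domain_count.values())

print(f"High risk judgments summed over domains: {domain_level_total}")
print(f"High risk judgments summed over trials: {trial_level_total}")
print(f"Trials at high risk of bias overall: {trials_high_overall}")
```

If the two totals agree, the domain level and trial level summaries describe the same set of judgments under the stated assumptions.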

We were able to identify published accounts of 39 of 41 randomised controlled trials. The two unpublished randomised controlled trials were for pegaspargase, which was previously approved by national health authorities in several European countries. European public assessment reports were available for all 39 randomised controlled trials. Trial protocols were publicly available for 21 randomised controlled trials and we gained access to two additional protocols.

Thirty two cancer drug approvals were supported by 54 pivotal studies. Of these, 41 (76%) were randomised controlled trials (two of which randomised patients to different doses of the experimental treatment), 11 (20%) were single arm studies that evaluated the experimental treatment alone without a comparator, and 2 (4%) were non-randomised comparative studies. Only seven cancer drug approvals were supported by two or more randomised controlled trials. Two of 13 cancer drugs (15%) with orphan designation were supported by single arm studies alone compared with 2 of 19 non-orphan drugs (11%). Two of 5 cancer drugs (40%) with conditional marketing authorisations were supported by single arm studies alone, whereas 2 of 27 drugs (7%) with regular approvals had only single arm studies.

Figure 1 shows the process that led to identification of our study sample. Of 64 potentially relevant marketing authorisations granted by the EMA between 1 January 2014 and 31 December 2016, 48 were for cancer products. After we excluded 16 generic and supportive care drugs, our sample consisted of 32 cancer drug approvals. A total of five (16%) approvals were indicated for the treatment of multiple myeloma, four (13%) for melanoma, and four (13%) for lung cancer (appendix table 1). During this period, 13 (41%) cancer drugs were approved for orphan indications and five (16%) received conditional marketing authorisations.
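The sample derivation above reduces to simple arithmetic, sketched below in Python. The sketch assumes that percentages were rounded to the nearest integer with halves rounded up (so that 4/32 is shown as 13%); this rounding rule and all names are assumptions for illustration, not part of the study methods.

```python
def percent_half_up(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer, with halves rounded up."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return (200 * part + whole) // (2 * whole)


# Counts reported in the text above.
authorisations_screened = 64
cancer_products = 48
excluded_generic_or_supportive = 16
sample = cancer_products - excluded_generic_or_supportive

characteristics = {
    "multiple myeloma": 5,
    "melanoma": 4,
    "lung cancer": 4,
    "orphan indication": 13,
    "conditional marketing authorisation": 5,
}

print(f"Cancer drug approvals in sample: {sample}")
for label, count in characteristics.items():
    print(f"{label}: {count} ({percent_half_up(count, sample)}%)")
```

Integer arithmetic is used for rounding because Python's built-in `round` rounds exact halves to the nearest even number, which would show 12.5 as 12 rather than 13.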

Discussion

Summary of findings

In this study, we evaluated the evidence base underpinning the EMA’s recent cancer drug approvals. Between 2014 and 2016, a quarter of pivotal studies supporting cancer drug approvals were not randomised designs. Of the 39 randomised controlled trials that formed the basis of new cancer drug approvals, almost three quarters did not measure overall survival or quality of life outcomes as primary endpoints. Using the revised Cochrane tool, we judged 49% of randomised controlled trials to be at high risk of bias. Our judgments changed in either direction for a fifth of randomised controlled trials when we relied on information reported in regulatory documents and scientific publications separately. Regulators identified additional deficits beyond the domains captured in risk of bias assessments for several trials, which were not disclosed as limitations in scientific publications.

The three key findings of this study warrant further discussion. Firstly, our evaluation characterises the design features of contemporary cancer drug trials. Although randomised controlled trials accounted for about 90% of pivotal studies from 2009 to 2013,11 such designs accounted for 76% of studies from 2014 to 2016. A growing proportion of recent cancer drug approvals were based on single arm studies; drugs supported by such studies are more likely to receive conditional marketing authorisations, which target indications with unmet medical need.53 Even when trials had comparators, their appropriateness was at times questionable. We found two randomised controlled trials in which participants were randomised to receive different doses of the same experimental treatment (without a control). In other cases, the comparator either precluded isolation of the effect of the experimental treatment or did not adequately reflect standard of care; these trials were subsequently criticised by the EMA.
In terms of study endpoints, only a quarter of randomised controlled trials were powered to evaluate overall survival as the primary outcome. According to the recent EMA guidelines on the evaluation of anticancer treatments,54 “convincingly demonstrated favourable effects on overall survival are from both a clinical and methodological perspective the most persuasive outcome of a clinical trial.” Yet, most cancer drugs were approved on the basis of other endpoints, such as progression free survival and disease response. Recent systematic reviews showed that progression free survival and disease response do not consistently translate to survival gains or quality of life benefits.13,55,56,57,58 Cancer drugs that appear effective on these surrogate measures could even turn out to be harmful.59

Secondly, the evidence base underpinning EMA approvals of new cancer drugs has methodological weaknesses. In this study, the primary domains responsible for the high risk of bias judgments were missing outcome data and measurement of the outcome (see box 2 for illustrative examples). In several trials, the proportions and reasons for missing outcome data differed between arms, which probably resulted in unbalanced censoring60 and potentially favoured the experimental drug.61 Considerable differences in the toxicity profiles of drugs were another common issue. When such differences were substantial, we concluded that blinding of participants, carers, or investigators could be compromised. As recognised by EMA scientists, “the real effectiveness of the blinding for cancer drugs can always be questioned.”62 We judged trials to be at high risk of bias if a subjective primary outcome (such as progression free survival) was assessed by local investigators who we concluded might no longer be blinded to treatment allocation.
This judgment was also supported by the recent US Food and Drug Administration guidelines that recommend independent verification of tumour assessment endpoints when the adverse event profiles of comparator treatments could substantially unblind the trial in practice.63

Box 2 Illustrative examples of trials judged to be at high risk of bias because of missing outcome data and measurement of the outcome

Elotuzumab

We judged one of the pivotal trials of elotuzumab to be at high risk of bias. In trial CA204-009, which was open label, outcome assessors were aware of the intervention received by study participants, and assessment of the progression free survival outcome could have been influenced by knowledge of the intervention received. Because there was no blinded central assessment of outcomes, we concluded that this trial was at high risk of bias in measurement of the outcome. This potential limitation was also acknowledged in the trial publication.

Nivolumab

We judged one of the pivotal trials of nivolumab to be at high risk of bias because of missing outcome data. In trial CA209-037, outcome data were potentially missing for a considerable proportion of the population. In the investigator’s choice chemotherapy arm of the trial, 22/133 (16.5%) patients withdrew their consent, which meant withdrawing consent from the full protocol, including study treatment, study procedures, and survival follow-up. Proportions of missing outcome data and reasons for missing outcome data differed across intervention groups: 16.5% v <1% of patients withdrew their consent in the investigator’s choice chemotherapy and nivolumab arms of the trial, respectively. Missingness in the outcome could be related to both the intervention group and the true value of the outcome. Also, there were no sensitivity analyses conducted to test the robustness of study results to different assumptions about missing outcome data.

Trametinib

We judged the pivotal trial of trametinib to be at high risk of bias. In trial BRF113220, there was potential evidence of unbalanced censoring. According to the European public assessment report, there were relatively large proportions of censored participants in both trial arms. The censoring method included censoring for extended loss to follow-up, new anticancer therapy, and excluding symptomatic progression. While 31% of participants who received dabrafenib 150 mg were censored, 11% were censored among participants receiving trametinib. In the absence of evidence that results were robust to the presence of potentially missing outcome data (we were unable to find the results of “eight sensitivity analyses planned to investigate the robustness of progression-free survival against these censoring rules”), we concluded that this trial was at high risk of bias due to missing outcome data. In addition, local investigators in trial BRF113220 were unblinded and therefore aware of the intervention received by study participants. Because the assessment of the progression free survival outcome could be influenced by knowledge of the intervention received, the trial was also at high risk of bias owing to measurement of the outcome. The authors reported local results as their main analysis, although the results from the blinded review committee were also available in the main body of the publication. Results obtained from blinded assessment were less pronounced (hazard ratio for progression free survival was 0.39, 95% confidence interval 0.25 to 0.62, according to local assessment, compared with 0.55, 0.33 to 0.93).

Thirdly, the regulatory documents and scientific publications had limitations in their reporting. Some of these limitations followed a discernible pattern.
For example, key design elements of randomised controlled trials such as sequence generation and allocation concealment were consistently reported in trial publications, their protocols, or appendices. In contrast, regulatory documents seldom specified randomisation methods; instead, regulators often discussed potential imbalances in baseline characteristics to gauge the success of randomisation.64 Other discrepancies among regulatory documents and publications were less predictable. While both sources typically showed the flow of participants, strategies to deal with missing outcome data and the sensitivity of findings to censoring rules and assumptions were only haphazardly reported. Moreover, neither source consistently reported major deviations from intended interventions.

Comparison to other studies in the literature

Previous studies have documented the shift in cancer trials away from evaluating overall survival in the 1970s to measuring surrogates of clinical benefit in more recent decades.29,65,66 Our findings confirm that this trend has continued for trials that informed regulatory decisions from 2014 to 2016. Similar to recent evaluations, we found low risk of bias arising from the randomisation process and selection of the reported result.33,67 Our concerns about missing outcome data were supported by an earlier study which showed that about one third of breast cancer trials had differential rates and reasons for censoring.61 Finally, our findings concur with those from previous studies which showed that incorporating data from additional documents often improves risk of bias assessments.68,69,70,71

Implications for practice and policy

Our findings highlight the need to improve the design, conduct, analysis, and reporting of cancer drug trials.72 Regulatory agencies and their evidence requirements shape the design features of pivotal trials.73,74,75 Therefore, regulatory action is needed to ensure that pharmaceutical manufacturers routinely evaluate their products in randomised trials that collect data on meaningful outcomes. In the absence of such data, it remains difficult to know whether new cancer drugs meet the needs of patients, clinicians, and healthcare systems.

While some of the methodological problems identified in our study were avoidable (for example, by ensuring adequate sequence generation and allocation concealment), others could be less straightforward to address in complex cancer trials. For example, ensuring outcome data availability when participants withdraw their consent might not be possible. The proportions of participants who withdrew their consent frequently differed among trial arms, which likely reflected meaningful differences in toxicity profiles of comparator drugs. In such instances when missingness could depend on the true value of the outcome, it is essential to evaluate the sensitivity of trial findings to different assumptions about missing outcome data.76 However, some trials censored participants when they changed from their assigned treatment. This analysis strategy is not appropriate when estimating intention to treat effects and could lead to bias because of missing outcome data.

Addressing other methodological problems in cancer trials could be feasible, but would come at a cost.
Strategies to prevent unblinding might add complexity to trial designs.23,77 For example, methods to avoid unblinding in randomised controlled trials include centralised dosage modification of treatments and centralised assessment of clinical side effects.77 Similarly, independent clinicians could perform blinded central evaluation of tumour assessment endpoints, but this might have major cost implications.78 In a recent review of randomised controlled trials in solid tumours, there was no systematic bias between the findings from blinded independent central review and local assessment, but there were statistical inconsistencies between the two sets of results in almost a quarter of trials.79 Moreover, previous meta-epidemiological reviews across different therapeutic areas have found that studies with non-blinded assessors of subjective outcomes generate biased findings.49,80 This research strengthens the argument in favour of implementing blinded centralised assessments of tumour endpoints despite the associated costs.

An important design consideration in cancer trials is the choice of primary endpoint. Surrogate measures of clinical benefit (eg, progression free survival and disease response) have important feasibility advantages because they can be assessed earlier and with smaller sample sizes (and therefore fewer resources) compared with overall survival. A recent study found that cancer drugs approved on the basis of surrogate measures had on average an 11 month shorter development duration compared with drugs approved on the basis of overall survival.81 However, the feasibility advantages of using surrogate measures should be weighed against their several disadvantages.
Firstly, patients might misinterpret such endpoints and overestimate the magnitude of benefit associated with new cancer drugs.82,83 Secondly, the strength of the correlation between surrogate and clinical outcomes in cancer trials is unclear.13 Over the past decade, several drugs (eg, bevacizumab in metastatic breast cancer)84 approved on the basis of surrogate measures failed to demonstrate overall survival gains in subsequent trials. In the recent BELLINI trial, patients who received venetoclax had shorter survival than those who received a control treatment (even though venetoclax appeared more effective than the control on the basis of progression free survival and response rate).59 Thirdly, and as recommended by regulators, unblinded trials (or trials at risk of unblinding) with surrogate endpoints might require additional (costly) safeguards such as independent blinded endpoint review to minimise risk of bias.63 Finally, and perhaps most importantly, evidence of overall survival benefit might never emerge for cancer drugs approved on the basis of surrogate measures alone.85 In an earlier study, we found that data on overall survival did not emerge in the postmarketing period for more than 90% of indications for which there was no evidence of such a benefit at the time of marketing authorisation.11

Taken together, these findings support more widespread use of overall survival as the primary endpoint in pivotal trials of new cancer drugs. Randomised controlled trials with overall survival endpoints were less likely to be at high risk of bias in our sample; this finding is consistent with an earlier assessment that showed that 66% of randomised controlled trials evaluating overall survival were at low risk of bias68 (the corresponding figure in our study was 80%).
Overall survival would be largely immune to the risk of bias attributable to potential unblinding of outcome assessors and missing outcome data.22,26,27

There is also an opportunity to further improve the reporting standards of regulatory documents and scientific publications. Publication of the CONSORT statement in 1996 (and its update in 2010)46 has led to major improvements in randomised controlled trial reporting in the scientific literature.86 Also, the 2015 revision87 of the European public assessment report template addressed some of the previous criticisms.88 Currently, publications and regulatory documents make it difficult to distinguish between trial deficits that can be avoided and those that are more difficult to address. When methodological shortcomings are inevitable (eg, missing outcome data owing to withdrawal of participant consent), more transparent reporting is warranted. In addition, key information required to perform risk of bias assessments is still inconsistently reported in regulatory documents, trial protocols, publications, supplementary appendices, and clinical trials registries. For example, neither journal articles nor regulatory documents discuss the possibility that trial investigators could be unblinded when the adverse event profiles of comparator treatments are substantially different. Similarly, these sources do not consistently report the occurrence of protocol deviations that arose from the experimental context; whether deviations are balanced between the groups; and whether deviations could affect the outcome.
Journal editors and European regulators can take further action to facilitate more complete and consistent reporting of pivotal studies.89 Our recommendations for improving the design, conduct, analysis, and reporting standards of cancer trials are listed in box 3.29

Box 3 Recommendations to improve the design, conduct, analysis, and reporting of pivotal trials of new cancer drugs

Because overall survival would be largely immune to several sources of potential bias, regulators should require overall survival to be the primary endpoint of pivotal trials.

Other desirable trial endpoints include quality of life and measures with established surrogacy.

The magnitude of benefit associated with new cancer therapies should be carefully considered in trial design.

When subjectively assessed outcomes are used as primary endpoints, trial sponsors should implement blinded independent central review of tumour assessments.

If the trial is blinded, trial sponsors and investigators should adopt strategies to avoid unblinding of investigators, for example centralised dosage modification of treatments; this is especially important when outcome assessment is not blinded.

Trial sponsors and investigators should report the risk of unblinding in trials in which investigators are not aware of treatment allocation; this is especially important when outcome assessment is not blinded.

Trial sponsors and investigators should conduct sensitivity analyses to evaluate the robustness of trial results to missing outcome data. Regulators and journal editors should require consistent reporting of the findings of these sensitivity analyses.

Regulators and journal editors should require consistent reporting of any major deviations from intended interventions that arose from the experimental context; whether deviations are balanced between the groups; and whether deviations could affect the outcome.

Limitations

Our study had several limitations. We did not include clinical study reports, which, according to previous reviews, might provide the most comprehensive set of information on randomised controlled trials.50,90 Because there is no established guidance on how to feasibly collect information from such reports, their use in systematic reviews remains limited.91,92 We focused on cancer drug trials; therefore, the generalisability of our findings to trials in other therapeutic areas is unclear. Nevertheless, cancer drugs comprise the single largest category of recent drug approvals.93,94 Additionally, our study included cancer drug approvals between 2014 and 2016; characteristics of randomised controlled trials that supported EMA approvals during our study period might not reflect the design, conduct, analysis, and reporting of cancer drug trials outside of this period. Furthermore, our risk of bias assessments were not blinded to study results because risk of bias assessments require examination of results. However, a systematic review of randomised trials did not identify evidence overall of a difference in risk of bias judgments between blinded and unblinded assessments.95

We examined the risk of bias, rather than bias itself. Therefore, it remains a possibility that trial results are unbiased despite the methodological flaws identified in our assessments. According to previous studies, risk of bias judgments based on publications alone might rely on incomplete information,50 and might not reflect the true methodological rigour of underlying studies.69,70 To address this issue, we relied on a combination of regulatory documents, trial protocols, publications, supplementary materials, and clinical trials registries.
In some cases, our judgments were substantiated with potential evidence of bias; for instance, when outcome measurements available from unblinded local investigators produced exaggerated findings compared with those obtained from an independent panel of masked assessors. For example, the magnitude of progression free survival benefit reported in the BRF113220 trial was less pronounced when assessed by an independent committee than when assessed by investigators (hazard ratio 0.55 v 0.39, respectively).96 Notably, our findings do not imply that EMA decisions are biased, and they do not suggest that pharmaceutical manufacturers deliberately introduce bias into their trials. Instead, our findings identify methodological shortcomings in pivotal trials of new cancer drugs.

Finally, our assessments focused on the primary endpoints of randomised controlled trials; it remains possible that results for other outcomes could be at lower risk of bias. However, this is unlikely because pervasive limitations are well documented for secondary endpoints of cancer drug trials, including harms97,98,99 and quality of life outcomes.100,101 Therefore, we might not have fully captured other important shortcomings of randomised controlled trials that support cancer drug approvals.