The study sample, which has been described previously [22], consisted of all approved academic clinical drug trial applications submitted to the Danish Health and Medicines Authority in 1999, 2001, and 2003. Trials were defined as academic if the data and publication rights belonged to publicly employed researchers and no pharmaceutical company name appeared on the first page of the protocol. Trials with sponsors residing outside Denmark were excluded, whereas 39 previously examined trials were included [23].

Screening

For each trial, the corresponding published reports were identified by a systematic PubMed search conducted from May to September 2009, giving a follow-up time of at least 5 years. The search terms were based on selected data from electronic files at the Danish Health and Medicines Authority: the sponsor’s name, the protocol title, the investigational medicinal products, and, if available, a brief description of the study. A published report was defined as any article reporting data on the trial subjects. PhD theses, conference abstracts, reviews, and published reports not reporting data on trial subjects were excluded because they are not indexed in PubMed, are insufficiently detailed, or do not contain trial results.

The names of the submitting sponsors were extracted from all included trials, and their contact information was updated by searching Google or a registry of Danish physicians. Sponsors were contacted by e-mail or letter and asked to confirm or correct the identified published report(s), or the absence thereof. Two reminders were sent to non-responders.

Data collection

Data were extracted from the protocols (including correspondence and amendments) and from the corresponding published reports. Pre-specified definitions of consistency and discrepancy for the variables making up the composite outcome were developed and tested. Data were extracted by LB, and uncertainties were discussed with LGP and TC. A decision log was maintained throughout to ensure reproducibility.

To avoid confusion, "main outcome" refers to our study, whereas "primary endpoint" refers to the trials in the study sample. The main outcome was overall consistency between protocols and their corresponding published reports. It was defined a priori as consistency on all of the following variables: study type (exploratory/confirmatory), primary objective, and primary endpoint. For pairs of a confirmatory protocol and a confirmatory published report, consistency on the hypothesis and the sample size calculation was also required. We also calculated the number of discrepancies per trial and the prevalence of discrepancy for each component variable.

If a published report deviated from the protocol on a given variable but was transparent about it, the variable was considered consistent. Transparency meant either clearly stating the deviation or referencing a previously published report that describes the study in accordance with the protocol.
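The composite main outcome and the transparency rule can be expressed as a small scoring function. The sketch below is illustrative only and is not the authors' code. The `PairAssessment` record, its field names, and the variable labels are hypothetical. It also assumes that each variable's deviation has already been judged by the assessor.

```python
from dataclasses import dataclass, field

# Hypothetical record for one protocol/published-report pair.
# Field names are illustrative, not taken from the study's database.
@dataclass
class PairAssessment:
    protocol_confirmatory: bool
    report_confirmatory: bool
    # Variable name -> True if the report deviates from the protocol on it.
    deviates: dict = field(default_factory=dict)
    # Variables on which the report was transparent about its deviation.
    transparent: set = field(default_factory=set)


CORE = ("study_type", "primary_objective", "primary_endpoint")
CONFIRMATORY_ONLY = ("hypothesis", "sample_size_calculation")


def assessed_variables(pair):
    """Hypothesis and sample size count only for confirmatory/confirmatory pairs."""
    if pair.protocol_confirmatory and pair.report_confirmatory:
        return CORE + CONFIRMATORY_ONLY
    return CORE


def is_discrepant(pair, variable):
    if variable == "study_type":
        deviates = pair.protocol_confirmatory != pair.report_confirmatory
    else:
        deviates = pair.deviates.get(variable, False)
    # A transparent deviation is scored as consistent.
    return deviates and variable not in pair.transparent


def n_discrepancies(pair):
    return sum(is_discrepant(pair, v) for v in assessed_variables(pair))


def overall_consistency(pair):
    return n_discrepancies(pair) == 0


# Hypothetical example: the endpoint change is disclosed, the sample size change is not.
example = PairAssessment(
    protocol_confirmatory=True,
    report_confirmatory=True,
    deviates={"primary_endpoint": True, "sample_size_calculation": True},
    transparent={"primary_endpoint"},
)
print(n_discrepancies(example), overall_consistency(example))
```

Study type is derived here from the two confirmatory flags rather than stored separately. This prevents the flag and the recorded discrepancy from contradicting each other.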

The variables were defined as indicated below.

Discrepancy in the study type

We defined a confirmatory protocol/published report as a study testing a pre-specified hypothesis, which was associated with a formal sample size calculation. Studies with a primary confirmatory analysis and secondary exploratory analyses were considered confirmatory. All other studies were considered exploratory. A published report was categorized as discrepant if the study type differed from the study type derived from the protocol.
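The study-type rule reduces to a two-condition test, with the protocol-derived type as the reference. The following is a minimal sketch. The function and parameter names are hypothetical, and it assumes the assessor has already decided whether the primary analysis has a pre-specified hypothesis and a formal sample size calculation.

```python
def classify_study_type(primary_hypothesis_prespecified: bool,
                        formal_sample_size_calculation: bool) -> str:
    """Confirmatory if the primary analysis tests a pre-specified hypothesis
    backed by a formal sample size calculation; otherwise exploratory.
    Secondary exploratory analyses do not change the classification."""
    if primary_hypothesis_prespecified and formal_sample_size_calculation:
        return "confirmatory"
    return "exploratory"


def study_type_discrepant(protocol_type: str, report_type: str) -> bool:
    # The study type derived from the protocol is the reference.
    return protocol_type != report_type


protocol_type = classify_study_type(True, True)
report_type = classify_study_type(True, False)  # e.g. no sample size calculation reported
print(protocol_type, report_type, study_type_discrepant(protocol_type, report_type))
```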

Discrepancy in the primary objective

The primary objective was defined as an objective explicitly designated as primary. If no primary objective was explicitly defined, the objective related to the primary endpoint was considered primary. When a protocol contained more than one explicitly defined primary objective, consistency was determined as follows: 1) a published report stating all or some of the protocol-specified primary objectives was considered consistent with the protocol; 2) a published report stating a primary objective not specified in the protocol was considered discrepant, regardless of the consistency of the other primary objectives. A published report that reported only secondary objectives and did not state the protocol-specified primary objective was considered consistent only if it referenced a published report reporting or stating the primary objective (that is, it provided transparency).
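The rules for protocols with several primary objectives can be written as set comparisons. This sketch assumes that each objective has been reduced to a label by the assessor, since matching the wording of objectives is a judgement made before this step. The function name and labels are hypothetical. The transparency rule for explicitly stated deviations would be applied on top of this result, as in the composite scoring above.

```python
def primary_objective_consistent(protocol_primary: set,
                                 report_primary: set,
                                 cites_report_with_primary: bool = False) -> bool:
    """Objectives are hypothetical assessor-assigned labels, e.g. {"A", "B"}."""
    # Rule 2: any primary objective not in the protocol makes the report discrepant.
    if report_primary - protocol_primary:
        return False
    # Rule 1: the report states all or some of the protocol's primary objectives.
    if report_primary:
        return True
    # No protocol primary objective stated (e.g. secondary objectives only):
    # consistent only if the report cites a report that states it.
    return cites_report_with_primary


print(primary_objective_consistent({"A", "B"}, {"A"}))   # subset of protocol objectives
print(primary_objective_consistent({"A"}, {"A", "C"}))   # adds objective "C"
```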

Discrepancy in the primary endpoint

The primary endpoints were defined as the one or two endpoints explicitly designated as primary. If no primary endpoint was explicitly defined, the endpoint used in the sample size calculation was considered primary. If more than two endpoints were explicitly designated as primary, the protocol was considered to have no primary endpoint. In the case of inconsistency within a protocol or within a published report, only the primary endpoint(s) substantiated in the body text were considered primary. If either of two primary endpoints specified in a published report differed from the primary endpoint(s) specified in the protocol, the published report was considered discrepant.
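The endpoint rules combine a resolution step (which endpoints count as primary) with a comparison step. The sketch below uses hypothetical function names, and endpoints are hypothetical labels. The body-text tie-breaking rule is a manual judgement and is not modelled. Two points are assumptions, since the text does not state them. First, every report-specified primary endpoint must appear among the protocol's primary endpoints (a subset test). Second, pairs in which either side has no primary endpoint are returned as `None` for manual review.

```python
from typing import List, Optional, Set


def resolve_primary_endpoints(explicit: List[str],
                              sample_size_endpoint: Optional[str]) -> Set[str]:
    if len(explicit) > 2:
        # More than two explicit primary endpoints: treated as having none.
        return set()
    if explicit:
        return set(explicit)
    if sample_size_endpoint is not None:
        # Fall back to the endpoint used in the sample size calculation.
        return {sample_size_endpoint}
    return set()


def primary_endpoint_consistent(protocol_endpoints: Set[str],
                                report_endpoints: Set[str]) -> Optional[bool]:
    if not protocol_endpoints or not report_endpoints:
        return None  # assumption: not covered by the stated rules, decided manually
    # Discrepant if any report-specified primary endpoint is not in the protocol.
    return report_endpoints <= protocol_endpoints


protocol = resolve_primary_endpoints([], "mortality")
report = resolve_primary_endpoints(["mortality", "length of stay"], None)
print(protocol, report, primary_endpoint_consistent(protocol, report))
```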

Pairs of a confirmatory protocol and a confirmatory published report were also reviewed for discrepancy in the hypothesis and the sample size calculation.

Discrepancy in the hypothesis

Hypotheses from the protocol and published report were compared. In the absence of an explicitly defined hypothesis, we formulated a hypothesis based on the sample size calculation as well as the rationale of the study (for example, “A better than B”). In case of a within-protocol inconsistency, the formulated hypothesis was based on the sample size calculation. For example, a protocol with a research question suited for an equivalence or noninferiority trial, but statistically designed to demonstrate superiority, was considered a superiority trial.

Discrepancy in the sample size calculation

The sample size calculation was considered discrepant if either the calculated sample size or any of the available components of the calculation differed between the protocol and the published report. It was also considered discrepant if a sample size calculation was stated in the protocol but missing from the published report. The achieved sample size was not taken into account.
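The sample size rule can be sketched as a comparison of calculation components. The following assumes that each calculation is stored as a mapping from a hypothetical component name (for example `"n"`, `"alpha"`, `"power"`, `"difference"`) to a numeric value in common units. It also reads "available components" as those stated in both documents; that reading is an interpretation. The planned `"n"` is compared, never the achieved sample size.

```python
import math
from typing import Mapping, Optional


def sample_size_discrepant(protocol_calc: Mapping[str, float],
                           report_calc: Optional[Mapping[str, float]]) -> bool:
    """protocol_calc: components stated in the protocol (hypothetical keys).
    report_calc: components stated in the published report, or None if absent."""
    if report_calc is None:
        # Stated in the protocol but missing from the published report.
        return True
    # Interpretation: compare only components available in both documents.
    shared = protocol_calc.keys() & report_calc.keys()
    return any(not math.isclose(protocol_calc[k], report_calc[k]) for k in shared)


protocol = {"n": 100, "alpha": 0.05, "power": 0.8, "difference": 10}
report = {"n": 100, "alpha": 0.05, "power": 0.9}
print(sample_size_discrepant(protocol, report))
```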

Data analysis and statistics

The sample size calculation was based on expected frequencies of overall consistency of 40% or 62% of the trials. A sample size of 100 trials was chosen because including at least 92 trials would keep the product of the standard error of proportion (SEP) and $z_{2\alpha}$ below 0.1, that is, $\mathrm{SEP} \cdot z_{2\alpha} < 0.1$. Data were registered in a Microsoft Access database with an audit trail and analyzed in SAS 9.2 using χ² tests, Fisher’s exact tests, and logistic regression. P values < 0.05 were considered statistically significant. Kappa values were calculated with GraphPad QuickCalcs (http://graphpad.com/quickcalcs). Multivariate logistic regression was planned but not conducted, because only a few of the pre-specified variables showed an association with overall consistency in 2 × 2 tables. We conducted post hoc logistic regression analyses that accounted for the correlation between published reports of the same protocol, using a repeated measures statement with published reports as the unit of analysis.
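The precision criterion can be checked with the usual standard error of a proportion, $\mathrm{SEP} = \sqrt{p(1-p)/n}$. The sketch below assumes that $z_{2\alpha}$ is the two-sided 5% critical value (about 1.96). It then evaluates $\mathrm{SEP} \cdot z_{2\alpha}$ at the two expected consistency frequencies. The function names are illustrative.

```python
import math
from statistics import NormalDist


def sep(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n units."""
    return math.sqrt(p * (1 - p) / n)


def half_width(p: float, n: int, alpha: float = 0.05) -> float:
    # Assumption: z_{2alpha} is the two-sided critical value, about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sep(p, n)


for p in (0.40, 0.62):
    for n in (92, 100):
        print(f"p={p:.2f}  n={n}:  SEP*z = {half_width(p, n):.4f}")
```

With these assumptions, both frequencies give values of roughly 0.1 at 92 trials and values clearly below 0.1 at 100 trials.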

Intra-rater agreement during data collection was determined by test-retest: five protocols and 16 corresponding published reports were assessed twice, 6 months apart. The variables were assumed to be independent of each other. Study types, primary objectives, and primary endpoints were extracted from 21 documents (five protocols and 16 published reports). Hypotheses and sample size calculations were extracted from 10 documents (four protocols and six published reports). Overall, 77 of the 83 data points showed perfect agreement. The six disagreements were distributed as follows: primary endpoint (2/21), primary objective (1/21), hypothesis (1/10), and sample size calculation (2/10). No disagreements were found regarding study type (exploratory/confirmatory).
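The agreement figures can be reproduced from the counts in the text: 3 variables × 21 documents + 2 variables × 10 documents = 83 data points. Item-level ratings are not reported here. The kappa function is therefore a generic Cohen's kappa for two ratings of the same items, shown on hypothetical ratings. It is not a reproduction of the GraphPad calculation.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(first: Sequence[Hashable], second: Sequence[Hashable]) -> float:
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def cohens_kappa(first: Sequence[Hashable], second: Sequence[Hashable]) -> float:
    p_o = percent_agreement(first, second)  # observed agreement
    n = len(first)
    c1, c2 = Counter(first), Counter(second)
    # Chance agreement from the marginal frequencies of each rating.
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Counts reported in the text.
n_data_points = 3 * 21 + 2 * 10
overall_agreement = 77 / n_data_points
print(n_data_points, round(overall_agreement, 3))

# Hypothetical test-retest ratings for one variable (1 = consistent, 0 = discrepant).
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```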