Study Design and Oversight

The Testosterone Trials are a coordinated set of seven double-blind, placebo-controlled trials that are being conducted at 12 sites.10 To enroll in these trials overall, participants had to qualify for at least one of the three main trials (the Sexual Function Trial, the Physical Function Trial, or the Vitality Trial), but they could participate in more than one if they qualified. Participants were assigned to receive testosterone gel or placebo gel for 1 year. Efficacy was assessed at baseline and at 3, 6, 9, and 12 months. Data on adverse events were collected during the treatment period and for 12 months afterward. This report describes the efficacy results for the three main trials and adverse events in all the participants in these trials.

The protocol and consent forms were approved by the institutional review boards at the University of Pennsylvania and each participating trial site. All participants provided written informed consent. A data and safety monitoring board monitored data in an unblinded fashion every 3 months. The protocol, consent forms, and statistical analysis plan are available with the full text of this article at NEJM.org.

The investigators developed the protocol with assistance from the National Institutes of Health. AbbVie, one of the funders of the trial, donated the testosterone and placebo gels but did not participate in the design or conduct of the trials or in the analysis, review, or reporting of the data before the manuscript was submitted for publication. All the authors participated in the design and conduct of the trials. Trial statisticians performed all data analyses. The first author wrote the first draft of the manuscript, and all the authors contributed to subsequent drafts.

Participants

Participants were recruited principally through mass mailings.11 Respondents were screened first by telephone interview and then during two clinic visits. Eligibility criteria included an age of 65 years or older and serum testosterone levels that averaged less than 275 ng per deciliter. Exclusion criteria were a history of prostate cancer, a risk of all prostate cancer of more than 35% or of high-grade prostate cancer of more than 7% as determined according to the Prostate Cancer Risk Calculator,12 an International Prostate Symptom Score (IPSS; range, 0 to 35, with higher scores indicating more severe symptoms of benign prostatic hyperplasia) of more than 19, conditions known to cause hypogonadism, receipt of medications that alter the testosterone concentration, high cardiovascular risk (myocardial infarction or stroke within the previous 3 months, unstable angina, New York Heart Association class III or IV congestive heart failure, a systolic blood pressure >160 mm Hg, or a diastolic blood pressure >100 mm Hg), severe depression (defined by a score of ≥20 on the Patient Health Questionnaire 9 [PHQ-9; range, 0 to 27, with higher scores indicating greater severity of depressive symptoms]), and conditions that would affect the interpretation of the results.
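
The numeric thresholds in the criteria above can be summarized as a simple screening check. The following is only an illustrative sketch: the `Screening` record, its field names, and the function `failed_numeric_criteria` are hypothetical and are not part of the trial protocol, and the non-numeric criteria (history of prostate cancer, causes of hypogonadism, medications, recent cardiovascular events, and so on) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    """Hypothetical record of the numeric screening values for one respondent."""
    age_years: int
    mean_testosterone_ng_dl: float
    risk_any_prostate_cancer_pct: float
    risk_high_grade_prostate_cancer_pct: float
    ipss: int
    systolic_mm_hg: int
    diastolic_mm_hg: int
    phq9: int


def failed_numeric_criteria(s: Screening) -> list[str]:
    """Return the numeric criteria a respondent fails (empty list if none)."""
    failures = []
    if s.age_years < 65:
        failures.append("age under 65 years")
    if s.mean_testosterone_ng_dl >= 275:
        failures.append("mean testosterone not below 275 ng/dL")
    if s.risk_any_prostate_cancer_pct > 35:
        failures.append("risk of any prostate cancer above 35%")
    if s.risk_high_grade_prostate_cancer_pct > 7:
        failures.append("risk of high-grade prostate cancer above 7%")
    if s.ipss > 19:
        failures.append("IPSS above 19")
    if s.systolic_mm_hg > 160:
        failures.append("systolic blood pressure above 160 mm Hg")
    if s.diastolic_mm_hg > 100:
        failures.append("diastolic blood pressure above 100 mm Hg")
    if s.phq9 >= 20:
        failures.append("PHQ-9 score of 20 or more")
    return failures
```

A respondent with an empty result would still need to satisfy the remaining clinical criteria, which this sketch does not model.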

Inclusion in the Sexual Function Trial required self-reported decreased libido, a score of 20 or less on the sexual-desire domain (range, 0 to 33, with higher scores indicating greater desire) of the Derogatis Interview for Sexual Functioning in Men–II (DISF-M-II),13 and a partner willing to have intercourse twice a month. Inclusion in the Physical Function Trial required self-reported difficulty walking or climbing stairs and a gait speed of less than 1.2 m per second on the 6-minute walk test.14 Men who were not ambulatory or who had disabling neuromuscular or arthritic conditions were excluded. Inclusion in the Vitality Trial required self-reported low vitality and a score of less than 40 on the Functional Assessment of Chronic Illness Therapy (FACIT)–Fatigue scale (range, 0 to 52, with higher scores indicating less fatigue).15
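
As a rough illustration of how a man could qualify for more than one main trial, the trial-specific thresholds above can be written as a single function. The function name and parameter names are hypothetical, and the exclusion for nonambulatory men or disabling neuromuscular or arthritic conditions is assumed to have been applied beforehand.

```python
def qualifying_trials(
    decreased_libido: bool,
    desire_score: float,          # DISF-M-II sexual-desire domain, 0 to 33
    partner_willing: bool,
    difficulty_walking: bool,     # self-reported difficulty walking or climbing stairs
    gait_speed_m_per_s: float,    # from the 6-minute walk test
    low_vitality: bool,
    facit_fatigue_score: float,   # FACIT-Fatigue scale, 0 to 52
) -> list[str]:
    """Return the main trials whose inclusion thresholds are met."""
    trials = []
    if decreased_libido and desire_score <= 20 and partner_willing:
        trials.append("Sexual Function")
    if difficulty_walking and gait_speed_m_per_s < 1.2:
        trials.append("Physical Function")
    if low_vitality and facit_fatigue_score < 40:
        trials.append("Vitality")
    return trials
```

An empty list would mean that the man did not qualify for any main trial and therefore could not enroll in the Testosterone Trials overall.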

Study Treatment

We assigned participants to testosterone or placebo by means of a minimization technique: with 80% probability, each participant was assigned to the study treatment that best balanced the balancing variables between groups.16,17 Balancing variables included participation in the main trials, trial site, screening testosterone concentration (≤200 or >200 ng per deciliter), age (≤75 or >75 years), use or nonuse of antidepressants, and use or nonuse of phosphodiesterase type 5 inhibitors.
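
The general idea of minimization with a biased coin can be sketched as follows. This is a minimal illustration and not the trial's actual allocation software: the class name `Minimizer`, the use of a simple sum of absolute differences as the imbalance score, and the 50:50 handling of ties are all assumptions; only the 80% probability and the kinds of balancing variables come from the text.

```python
import random
from collections import defaultdict

ARMS = ("testosterone", "placebo")


class Minimizer:
    """Two-arm minimization: favor the arm that minimizes marginal imbalance."""

    def __init__(self, p_best=0.8, rng=None):
        self.p_best = p_best
        self.rng = rng or random.Random()
        # counts[arm][(factor, level)] = participants already in that arm
        # who share this level of this balancing factor
        self.counts = {arm: defaultdict(int) for arm in ARMS}

    def _imbalance_if_assigned(self, arm, factors):
        """Total imbalance across factors if the new participant joined `arm`."""
        other = ARMS[1] if arm == ARMS[0] else ARMS[0]
        total = 0
        for key in factors.items():
            total += abs(self.counts[arm][key] + 1 - self.counts[other][key])
        return total

    def assign(self, factors):
        """Assign one participant, given a dict of factor name -> level."""
        scores = {arm: self._imbalance_if_assigned(arm, factors) for arm in ARMS}
        if scores[ARMS[0]] == scores[ARMS[1]]:
            arm = self.rng.choice(ARMS)  # tie: plain 50:50 randomization (assumed)
        else:
            best = min(ARMS, key=scores.get)
            worst = ARMS[1] if best == ARMS[0] else ARMS[0]
            arm = best if self.rng.random() < self.p_best else worst
        for key in factors.items():
            self.counts[arm][key] += 1
        return arm
```

For example, `Minimizer().assign({"site": 3, "age_over_75": False, "testosterone_le_200": True})` returns one of the two arm labels and updates the running counts. Keeping a random element (80% rather than 100%) makes the next assignment harder to predict than it would be under fully deterministic minimization.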

The testosterone preparation was AndroGel 1% in a pump bottle (AbbVie). The initial dose was 5 g daily. The placebo gel was formulated to have a similar application and appearance. Serum testosterone concentration was measured at months 1, 2, 3, 6, and 9 in a central laboratory (Quest Clinical Trials), and the dose of testosterone gel was adjusted after each measurement in an attempt to keep the concentration within the normal range for young men (19 to 40 years of age). To maintain blinding when the dose was adjusted in a participant receiving testosterone, the dose was changed simultaneously in a participant receiving placebo.

Assessments

At the end of the trials, the concentrations of total testosterone, free testosterone, dihydrotestosterone, estradiol, and sex hormone–binding globulin were measured in serum samples that had been stored frozen at −80°C (see the Supplementary Appendix, available at NEJM.org). Steroid assays were performed at the Brigham Research Assay Core Laboratory (Boston) by liquid chromatography with tandem mass spectrometry, and free testosterone was measured by equilibrium dialysis. All samples from each participant were measured in the same assay run.

Serum prostate-specific antigen (PSA) was measured and a digital rectal examination was performed at months 3 and 12, and PSA was measured at month 18. Detection of a prostate nodule or a confirmed increase in the PSA level by at least 1.0 ng per milliliter above baseline led to referral to the site urologist for consideration of prostate biopsy. The IPSS was determined at months 3 and 12. At every visit, adverse events were recorded and a cardiovascular-event questionnaire (see the protocol) was administered. Cardiovascular events were adjudicated by two cardiologists and two neurologists (see the Supplementary Appendix).

Outcomes

Efficacy outcomes were assessed at baseline and after 3, 6, 9, and 12 months of treatment. Dichotomous outcomes were used when a clinically important difference had previously been established. The primary efficacy outcome of each trial and the secondary outcomes of the Physical Function Trial were assessed in all participants; secondary outcomes for the other trials were assessed only in participants in those trials.

The primary outcome of the Sexual Function Trial was the change from baseline in the score for sexual activity (question 4) on the Psychosexual Daily Questionnaire (PDQ-Q4; range, 0 to 12, with higher scores indicating a greater number of activities).10,18 Secondary outcomes were changes in the score on the erectile-function domain (range, 0 to 30, with higher scores indicating better function) of the International Index of Erectile Function (IIEF)19 and the sexual-desire domain of the DISF-M-II.13 Details on the assessments in the Sexual Function Trial are provided in the protocol. The primary outcome of the Physical Function Trial was the percentage of men who increased the distance walked in the 6-minute walk test by at least 50 m.10,14 Secondary outcomes were the percentage of men whose score on the physical-function domain (PF-10; range, 0 to 100, with higher scores indicating better function) of the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) increased by at least 8 points20 and changes from baseline in the 6-minute walking distance and PF-10 score. The primary outcome of the Vitality Trial was the percentage of men whose score on the FACIT–Fatigue scale increased by at least 4 points10,15; secondary outcomes were the change from baseline in the FACIT–Fatigue score, the score on the vitality scale (range, 0 to 100, with higher scores indicating more vitality) of the SF-36,21 scores on the Positive and Negative Affect Schedule (PANAS) scales (range, 5 to 50 for positive affect and for negative affect, with higher scores indicating a greater intensity of the affect),22 and the PHQ-9 depression score.23 Every 3 months, participants were asked about their general impression of the change in sexual desire, walking ability, or energy (depending on the trial) and in overall health.

Statistical Analysis

Participants were evaluated according to the intention-to-treat principle. Each outcome was prespecified. Primary analyses of outcomes at all time points were performed with random-effects models for longitudinal data. Models included visit time as a categorical variable and a single main effect for treatment. For linear models of continuous outcomes, the treatment effect denoted the average difference in response between study groups across all four post-baseline visits. For logistic models of binary outcomes, the treatment effect was the log odds ratio of a positive versus negative outcome for participants who received testosterone versus those who received placebo, averaged over all visits. Additional fixed effects were the baseline value for each outcome and the balancing variables. A random intercept was included for each participant.
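
One way to write down a model of the kind described above is shown here. The notation is ours and is only a sketch: the symbols, the choice of month 3 as the reference visit, and the normal distributions for the random intercept and residual error are assumptions that are not stated in the text.

```latex
% Continuous outcome Y_{ij} for participant i at visit j (months 3, 6, 9, 12)
Y_{ij} = \beta_0 + \beta_{\mathrm{trt}}\, T_i
       + \sum_{k \in \{6, 9, 12\}} \gamma_k \,\mathbf{1}[j = k]
       + \delta\, Y_{i0} + \mathbf{x}_i^{\top} \boldsymbol{\eta}
       + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)

% Binary outcome: the same linear predictor on the logit scale
\operatorname{logit} \Pr(Y_{ij} = 1 \mid b_i)
     = \beta_0 + \beta_{\mathrm{trt}}\, T_i
       + \sum_{k \in \{6, 9, 12\}} \gamma_k \,\mathbf{1}[j = k]
       + \delta\, Y_{i0} + \mathbf{x}_i^{\top} \boldsymbol{\eta} + b_i
```

Here T_i is 1 for testosterone and 0 for placebo, Y_{i0} is the baseline value of the outcome, x_i holds the balancing variables, and b_i is the participant-level random intercept. Because there is no treatment-by-visit interaction, the single coefficient β_trt is the between-group difference (or the log odds ratio, in the logistic model) averaged over the visits.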

We analyzed the three trials as independent studies, without adjusting analyses of the primary outcomes for multiple comparisons. We also did not adjust the analyses of the primary and secondary outcomes within each trial for multiple comparisons, because the correlations among outcomes within a trial were expected to be very high, making such adjustment excessively conservative. Analyses of the primary outcomes that included all participants, however, were adjusted for multiple comparisons; we report the nominal P value only when it was lower than the threshold specified by the multiple-comparisons procedure.24 The sensitivity of results to missing data was assessed with the use of pattern-mixture models25 and shared random-effects models.26 The effect of change in total testosterone level on primary outcomes was assessed with the use of instrumental variables by two-stage residual inclusion,27 with study-group assignment as the instrument and change in testosterone level from baseline as the exposure of interest.

Sample sizes were calculated such that the studies would have 90% power, with the use of a two-sided test at a type I error rate of 0.05,10 to detect the following differences between the placebo group and the testosterone group: 15% versus 30% in the proportion of men with an increase of at least 50 m in the 6-minute walking distance, 20% versus 35% in the proportion of men with an increase of at least 4 points in the FACIT–Fatigue score, and a difference in change of 0.75 in the PDQ-Q4 score. These differences were conservatively based on comparisons between baseline and 12 months. Enrollment targets were 275 men for the Sexual Function Trial, 366 for the Physical Function Trial, and 420 for the Vitality Trial.
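
For orientation, the textbook normal-approximation formula for comparing two proportions gives the order of magnitude implied by these assumptions. This is a simplified sketch and not the authors' calculation: it ignores dropout, the longitudinal design, and any other allowance the investigators may have made, and the function name `n_per_group` is ours.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, p_treated, alpha=0.05, power=0.90):
    """Participants per group for a two-sided test of two proportions
    (unpooled normal approximation, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)            # quantile corresponding to the power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


walk = n_per_group(0.15, 0.30)      # 6-minute walk responders
fatigue = n_per_group(0.20, 0.35)   # FACIT-Fatigue responders
print(walk, fatigue)
```

Under these simplifications the formula gives 158 men per group (316 in total) for the walking outcome and 181 per group (362 in total) for the fatigue outcome, both somewhat below the stated enrollment targets of 366 and 420.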