Trial Design, Interventions, and Oversight

This randomized, open-label, phase 2 trial was conducted at 26 sites (Table S1 in the Supplementary Appendix, available with the full text of this article at NEJM.org). Randomization was performed on a 1:1 basis and stratified according to site, with the difference in the number of participants in the two treatment groups constrained to two or fewer at each site.14 The trial ended when the last participant completed the month 54 evaluation, with a maximum follow-up to 72 months. After an event of respiratory, renal, or cardiac failure occurred (i.e., event-free survival was not achieved), clinic visits ended but telephone contacts continued to month 54. Death was investigated with the use of public records, as needed.
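The allocation constraint described above can be illustrated with a minimal sketch. The text does not state which algorithm was used, so the code below assumes a "big stick" style rule: a fair coin flip unless the within-site imbalance has already reached the cap, in which case the next participant is assigned to the smaller group. The class name, method names, and arm labels are hypothetical and are not taken from the trial's randomization system.

```python
import random
from collections import defaultdict

MAX_IMBALANCE = 2  # from the text: group sizes differ by two or fewer at each site

ARMS = ("transplant", "cyclophosphamide")  # illustrative labels only


class SiteStratifiedRandomizer:
    """Hypothetical 1:1 allocator, stratified by site, with capped imbalance."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        # Per-site running counts for each arm.
        self._counts = defaultdict(lambda: {arm: 0 for arm in ARMS})

    def assign(self, site):
        counts = self._counts[site]
        diff = counts["transplant"] - counts["cyclophosphamide"]
        if diff >= MAX_IMBALANCE:
            arm = "cyclophosphamide"      # forced: transplant group is ahead by the cap
        elif diff <= -MAX_IMBALANCE:
            arm = "transplant"            # forced: cyclophosphamide group is ahead by the cap
        else:
            arm = self._rng.choice(ARMS)  # otherwise a fair 1:1 draw (assumed rule)
        counts[arm] += 1
        return arm

    def imbalance(self, site):
        counts = self._counts[site]
        return abs(counts["transplant"] - counts["cyclophosphamide"])
```

Under this assumed rule the within-site difference can never exceed two, whatever the sequence of draws; the procedure actually used in the trial is the one described in the cited reference.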

Hematopoietic progenitors were mobilized with granulocyte colony-stimulating factor (G-CSF); after leukapheresis and CD34+ cell enrichment, the autologous product was cryopreserved.15 Fractionated total-body irradiation (800 cGy), cyclophosphamide (120 mg per kilogram of body weight), and equine antithymocyte globulin (90 mg per kilogram) were given as previously reported.9,16 Pulmonary and renal shields limited organ exposure to a target of 200 cGy.17 After conditioning, participants received CD34+ cells (median, 5.6×10⁶ per kilogram; interquartile range, 3.8×10⁶ to 6.0×10⁶) and post-transplantation care with G-CSF, glucocorticoids, lisinopril, and antiinfective agents including acyclovir, which was given for 1 year (Table S2 in the Supplementary Appendix).9,18 In the cyclophosphamide group, an initial intravenous dose of 500 mg per square meter of body-surface area was followed by 11 monthly infusions of 750 mg per square meter with mesna prophylaxis.

A data and safety monitoring board appointed by the National Institute of Allergy and Infectious Diseases, the trial sponsor, provided oversight. An independent end-point review committee whose members were unaware of the trial-group assignments verified causes of death and verified events of respiratory, renal, or cardiac failure. Institutional review boards at each site approved the protocol, and Rho (Chapel Hill, NC) held and analyzed the data. Members of the steering committee (Table S1 in the Supplementary Appendix) designed the trial, vouch for its adherence to the protocol, and attest to the accuracy and completeness of the data and analyses as specified in the protocol and statistical analysis plan, which are available at NEJM.org. The trial data are accessible from ImmPort (www.immport.org) in study SDY1039 (DOI: 10.21430/M3SM4LTLH). The first author wrote the initial draft, and all the coauthors reviewed the manuscript and agreed to publication. All the participants in the trial provided written informed consent.

Participants

Adults (18 to 69 years of age) with scleroderma (American College of Rheumatology 1980 criteria) for 5 years or less with pulmonary or renal involvement were eligible. Pulmonary involvement required active interstitial lung disease (as determined by bronchoalveolar cell composition or ground-glass opacities on computed tomography of the chest) plus either a forced vital capacity (FVC) or a diffusing capacity of the lung for carbon monoxide (DLco) of less than 70% of the predicted value. Renal involvement required previous scleroderma-related renal disease. Key exclusion criteria were active gastric antral vascular ectasia, a DLco of less than 40% of the predicted value, an FVC of less than 45% of the predicted value, a left ventricular ejection fraction of less than 50%, a creatinine clearance of less than 40 ml per minute, pulmonary arterial hypertension, or more than 6 months of previous treatment with cyclophosphamide.19,20

Evaluations and End Points

Participants were evaluated monthly through year 1, then approximately quarterly through year 5. Serial pulmonary-function testing was performed in the same laboratory, with DLco (Crapo–Morris calculation) corrected for anemia. Rheumatologists were certified to assess the modified Rodnan skin score (range, 0 [normal] to 51 [severe skin thickening]); minimally important differences are 3.2 to 5.3 points. Scales are more fully detailed in Section S1 of Methods in the Supplementary Appendix.21,22

The primary end point was the global rank composite score at 54 months. The global rank composite score is an analytic tool that accounts for multiple disease manifestations simultaneously but does not measure disease activity or severity. It reflects how participants compare with one another on the basis of a hierarchy of ordered outcomes: death, event-free survival (survival without respiratory, renal, or cardiac failure), FVC, the score on the Disability Index of the Health Assessment Questionnaire (HAQ-DI; range, 0 to 3, with higher scores indicating more disability), and the modified Rodnan skin score (Section S1 of Methods in the Supplementary Appendix). Participants who were alive at 54 months rank higher than those who died; those who survived event-free rank higher than those who had an event, and so forth down the hierarchy (Sections S2 and S3 of Methods in the Supplementary Appendix). With the assumption that transplant recipients would have worse early outcomes but could fare better long-term than participants in the comparison group, the global rank composite score is intentionally constructed to treat early and late deaths (or events of organ failure) as equal, irrespective of timing. Variables that were used to define an event included death, respiratory failure (decrease from baseline of >30% in percent predicted DLco or >20% in percent predicted FVC) (Section S2 of Methods in the Supplementary Appendix), renal failure (long-term dialysis or renal transplantation), or cardiac failure (clinical congestive heart failure or left ventricular ejection fraction <30%).
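The hierarchy can be made concrete with a small sketch of a pairwise comparison and the resulting ranks. This is an illustration under stated assumptions, not the trial's algorithm, which is specified in the Supplementary Appendix. It assumes that participants who died are tied with one another, that survivors with organ failure are tied with one another, and that the three lower components break ties only among event-free survivors, each coded as +1 (improvement), 0 (no change), or -1 (worsening). The names `Outcome`, `grcs_key`, `compare`, and `midranks` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    alive: bool                   # alive at 54 months
    event_free: bool              # no respiratory, renal, or cardiac failure
    fvc: Optional[int] = None     # +1 improved, 0 no change, -1 worsened
    haq_di: Optional[int] = None  # same coding
    mrss: Optional[int] = None    # same coding


def grcs_key(o):
    """Sort key in which a larger tuple means a better outcome (assumed tie rules)."""
    if not o.alive:
        return (0,)    # all deaths tied, irrespective of timing
    if not o.event_free:
        return (1, 0)  # alive with organ failure; assumed tied with one another
    # Event-free survivors: all three ordinal components must be set (no missing data).
    return (1, 1, o.fvc, o.haq_di, o.mrss)


def compare(a, b):
    """Return 1 if a ranks higher than b, -1 if lower, 0 if tied."""
    ka, kb = grcs_key(a), grcs_key(b)
    return (ka > kb) - (ka < kb)


def midranks(outcomes):
    """Rank all participants from worst (1) to best, averaging ranks within ties."""
    keys = [grcs_key(o) for o in outcomes]
    ranks = []
    for k in keys:
        lower = sum(other < k for other in keys)
        tied = sum(other == k for other in keys)
        ranks.append(lower + (tied + 1) / 2)
    return ranks
```

Because the key is compared lexicographically, a difference higher in the hierarchy always outweighs any combination of differences lower down, which is the property the paragraph above describes.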

The lowest three components of the global rank composite score are ordinal. They were defined by improvement (increase of ≥10% in the percentage of the predicted FVC, decrease of >0.4 in the HAQ-DI score, or decrease of ≥25% in the modified Rodnan skin score, as compared with baseline values), no change (neither improvement nor worsening), or worsening (decrease from baseline of ≥10% in the percentage of the predicted FVC, increase of >0.4 in the HAQ-DI score, or increase of ≥25% in the modified Rodnan skin score, as compared with baseline values).
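These thresholds can be written as three small classification functions. The sketch below assumes that the percentage thresholds for FVC and the modified Rodnan skin score refer to relative change from baseline and that the HAQ-DI threshold is an absolute change in score points; the text does not settle either reading, and the function names are hypothetical.

```python
def _percent_change(baseline, followup):
    """Relative change from baseline, in percent (assumed interpretation)."""
    if baseline == 0:
        raise ValueError("percent change is undefined for a baseline of 0")
    return (followup - baseline) / baseline * 100


def classify_fvc(baseline_pct_pred, followup_pct_pred):
    """Percent-predicted FVC: higher is better; threshold is a change of >=10%."""
    change = _percent_change(baseline_pct_pred, followup_pct_pred)
    if change >= 10:
        return "improved"
    if change <= -10:
        return "worsened"
    return "no change"


def classify_haq_di(baseline, followup):
    """HAQ-DI (0 to 3): lower is better; threshold is a change of >0.4 points."""
    delta = followup - baseline
    if delta < -0.4:
        return "improved"
    if delta > 0.4:
        return "worsened"
    return "no change"


def classify_mrss(baseline, followup):
    """Modified Rodnan skin score: lower is better; threshold is a change of >=25%."""
    change = _percent_change(baseline, followup)
    if change <= -25:
        return "improved"
    if change >= 25:
        return "worsened"
    return "no change"
```

For example, under these assumptions `classify_fvc(70, 80)` returns `"improved"`, because a rise from 70% to 80% of the predicted value is a relative increase of about 14%.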

Secondary end points included individual components of the global rank composite score, measures of disease progression, and quality of life. Safety end points included treatment-related death, death from any cause, treatment-related toxic effects, infections, and hematologic engraftment. Deaths, cancers, and disease-progression events that occurred after an event of respiratory, renal, or cardiac failure were tracked as secondary end points but were not reported as adverse events.

Statistical Analysis

The trial was originally designed for 226 participants, with event-free survival as the primary end point. Low accrual prompted amendments, first to broaden entry criteria, then, ultimately, to reduce the sample size by changing the primary end point to the global rank composite score. Power for the new design with 114 participants was estimated by simulations at 93% for a two-sided test with an alpha level of 0.05. Assumptions for the simulations were guided by data on similarly treated patients involved in previous studies.4,9 No SCOT data informed the redesign process (details in Section S3 of Methods in the Supplementary Appendix). With continued slow accrual but without reviewing efficacy results, the data and safety monitoring board recommended stopping randomization at 75 participants.

For ordinal end points, including the global rank composite score, the Wilcoxon rank-sum test was used for comparisons; the van Elteren extension of the Wilcoxon rank-sum test was used for stratified analyses.23,24 The effect size for the Wilcoxon rank-sum test is reported as the percentage of all possible pairs between the two groups that favor transplantation (or cyclophosphamide). Fisher’s exact test was used for dichotomous events, including death and event-free survival at 54 and 48 months; the Mantel–Haenszel chi-square test was used for stratified analysis. Kaplan–Meier survival curves were compared with the use of log-rank tests. The data and safety monitoring board reviewed four prespecified futility analyses that included an ability to stop for efficacy with P<0.0001, leaving an alpha level of 0.0496 for the primary intention-to-treat analysis of the global rank composite score at 54 months after randomization. The intention-to-treat population was defined as all the participants who had undergone randomization. Secondary analyses are presented for the per-protocol population, defined as participants who received a transplant or completed nine or more doses of cyclophosphamide. Secondary analyses are supportive; P values were not adjusted for multiple comparisons. Safety results are summarized for all the participants who initiated treatment. Analyses used SAS software, version 9.3 or higher.
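Two quantities in this paragraph lend themselves to a short worked example: the pairwise effect size and the alpha level left for the primary analysis. The sketch below assumes that a higher score means a better outcome, that tied pairs are counted separately rather than split between the groups, and that the alpha reserved for the four interim looks is simply subtracted from 0.05, which matches the arithmetic in the text (0.05 − 4 × 0.0001 = 0.0496). The function names are hypothetical, and the trial's own computations were performed in SAS.

```python
def pairwise_favor(transplant_scores, cyclophosphamide_scores):
    """Percentage of all between-group pairs favoring each group, plus ties.

    Every transplant participant is paired with every cyclophosphamide
    participant; a higher score is assumed to mean a better outcome.
    """
    wins = losses = ties = 0
    for t in transplant_scores:
        for c in cyclophosphamide_scores:
            if t > c:
                wins += 1
            elif t < c:
                losses += 1
            else:
                ties += 1
    total = wins + losses + ties
    return (100 * wins / total, 100 * losses / total, 100 * ties / total)


def remaining_alpha(overall_alpha, n_interim_looks, alpha_per_look):
    """Alpha left for the final analysis under simple subtraction (assumed rule)."""
    return overall_alpha - n_interim_looks * alpha_per_look


if __name__ == "__main__":
    # Made-up scores for illustration only; these are not trial data.
    favor_t, favor_c, tied = pairwise_favor([5, 4, 3], [4, 2, 1])
    print(f"favor transplantation: {favor_t:.1f}%")
    print(f"favor cyclophosphamide: {favor_c:.1f}%")
    print(f"tied: {tied:.1f}%")
    print(f"alpha for primary analysis: {remaining_alpha(0.05, 4, 0.0001):.4f}")
```

With three participants per group there are nine possible pairs, so each pair contributes about 11 percentage points to one of the three totals.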