A reanalysis of the original primary outcome measure in the STAR*D study suggests that the STAR*D reports inflated improvement on antidepressant medication and that exclusion criteria in conventional clinical trials result in an overestimation of antidepressant efficacy.

A new study, led by Irving Kirsch, Associate Director of the Program in Placebo Studies at Harvard Medical School, reanalyzes primary outcome data from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study. Results of the study, published in Psychology of Consciousness: Theory, Research, and Practice, suggest inflation of antidepressant efficacy both in the STAR*D trial reports and in conventional clinical trials.

“Comparisons of [Hamilton Rating Scale for Depression] HRSD improvement in the STAR*D trial with improvement reported in conventional trials indicate that the improvement following antidepressant treatment is substantially lower in this highly generalizable sample than it is in conventional clinical trials,” Kirsch and his colleagues write. “The actual real-world effectiveness of antidepressants is approximately half that reported in efficacy trials.”

The STAR*D study attempted to mirror “real world” clinical practice by recruiting patients from routine medical and psychiatric outpatient treatment centers and by not including a placebo control. Additionally, the STAR*D did not exclude patients with comorbid diagnoses, as is often done in clinical trials. With over 4,000 participants, the STAR*D study “is the largest and most expensive antidepressant effectiveness trial ever conducted,” note the authors. Since the STAR*D was published, many have critiqued the study’s methodology and interpretation of findings.

The first step of the STAR*D, which is the focus of the present study, was a 12-week trial of citalopram. The authors note that the STAR*D research protocol identifies the Hamilton Rating Scale for Depression (HRSD) as the primary outcome measure. However, in the initial report, HRSD results are not provided. Instead, the STAR*D presents outcomes on the Quick Inventory of Depressive Symptomatology (QIDS). The authors address issues with swapping outcome measures, as well as limitations of the QIDS (i.e., it was developed by STAR*D researchers and therefore had not been used in previous studies).

In the present study, the authors acquired the STAR*D raw data through NIMH and reanalyzed the HRSD results. Because reporting only response or remission rates has shortcomings, the authors also report the average improvement in HRSD scores. A total of 3,110 patients are included in the analysis. The researchers then compared their findings to a large meta-analysis of antidepressant comparator trials that also used the HRSD measure. A major difference between the STAR*D and the comparator clinical trials is that conventional clinical trials often have more stringent exclusion criteria (e.g., excluding individuals with comorbid diagnoses).

Findings show that 26% of STAR*D participants achieved remission (i.e., an exit HRSD score of 7 or less) and 33% were treatment responders (i.e., an improvement of 50% or more in HRSD score). The average improvement in HRSD score from baseline to exit was 6.6 points. The authors note that this average improvement corresponds to only “minimally improved” in terms of clinical significance. They compare these results with outcomes in the meta-analysis of clinical trials, which showed a 49% remission rate, a 65% response rate, and a mean HRSD improvement of 14.8 points.
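To make the outcome definitions concrete, here is a minimal Python sketch that applies the remission and response cutoffs described above to a single patient. The function name `classify_hrsd` and the example scores are hypothetical illustrations, not taken from the study or its data set.

```python
def classify_hrsd(baseline: float, exit_score: float) -> dict:
    """Apply the cutoffs described in the article to one patient's HRSD scores.

    Hypothetical helper for illustration only; not the authors' code.
    """
    # Improvement is the drop in HRSD score from baseline to exit.
    improvement = baseline - exit_score
    # Remission: exit HRSD score of 7 or less.
    remission = exit_score <= 7
    # Response: improvement of 50% or more relative to baseline.
    response = baseline > 0 and (improvement / baseline) >= 0.5
    return {
        "improvement": improvement,
        "remission": remission,
        "response": response,
    }


# Made-up scores: a patient who moves from 22 to 15 improves by 7 points,
# close to the reported average, yet counts as neither remitted nor responding.
example = classify_hrsd(22, 15)
```

The example shows why the authors report average improvement alongside the two rates: a patient can improve by several points without crossing either threshold.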

“These results suggest that the exclusion criteria used in conventional clinical trials inflate remission rates by 89%, response rates by 101%, and continuous improvement scores by 126%,” the researchers write.
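The inflation figures are relative differences between the conventional-trial outcomes and the STAR*D outcomes. The Python sketch below reproduces that arithmetic from the rounded values given in this article; the function name `relative_inflation` is a hypothetical label. Because the inputs are rounded, the results (roughly 88%, 97%, and 124%) come out close to, but not exactly equal to, the published 89%, 101%, and 126%. That the authors worked from unrounded values is an assumption here, not something the article states.

```python
def relative_inflation(trial_value: float, stard_value: float) -> float:
    """Percentage by which the conventional-trial figure exceeds the STAR*D figure."""
    return (trial_value / stard_value - 1) * 100


# Rounded figures as reported in the article:
# (conventional trials, STAR*D reanalysis)
reported = {
    "remission rate (%)": (49, 26),
    "response rate (%)": (65, 33),
    "mean HRSD improvement (points)": (14.8, 6.6),
}

inflation = {
    name: relative_inflation(trial, stard)
    for name, (trial, stard) in reported.items()
}

for name, pct in inflation.items():
    print(f"{name}: inflated by about {pct:.1f}%")
```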

The researchers also compare their results to the outcomes reported on the QIDS in the STAR*D study. In the STAR*D, the QIDS remission rate was 30%, and the response rate was 43%. Both rates are higher than the corresponding HRSD rates of 26% and 33%.

“These results indicate that remission and response rates are substantially inflated on the QIDS-SR relative to the HRSD and that scores from studies reporting one of these measures cannot be compared validly with scores reported in studies using the other,” state the authors.

The researchers note some limitations of their study. First, neither the STAR*D nor the studies in the comparator meta-analysis included a placebo. Second, the comparator meta-analysis included studies of various antidepressants, while the STAR*D data analyzed here involved only citalopram. Lastly, the authors note that the study did not take into account contextual factors that may influence participants’ depression and recovery.

The present study is the first to report on the original outcome measure of the STAR*D: the HRSD. As the authors note, “such an analysis is long overdue.” The researchers demonstrate that (a) STAR*D reporting on the QIDS rather than the HRSD significantly inflated depression improvement scores, and (b) conventional clinical trials lead to inflated estimates of antidepressant efficacy compared to “real world” clinical practice.

****

Kirsch, I., Huedo-Medina, T. B., Pigott, H. E., & Johnson, B. T. (2018). Do outcomes of clinical trials resemble those of “real world” patients? A reanalysis of the STAR*D antidepressant data set. Psychology of Consciousness: Theory, Research, and Practice. Advance online publication. http://dx.doi.org/10.1037/cns0000164