SSRIs may affect the concentration of essential neurotransmitter substances in the brain and are therefore considered to exert effects on depressive symptoms. The question, however, is whether these effects are beneficial and clinically meaningful. Estimating a meaningful threshold for clinical significance is difficult, and an assessment of clinical significance should ideally not rely on a threshold on an assessment scale alone [182]. Major depressive disorder affects daily functioning, increases the risk of suicidal behaviour, and decreases quality of life [183]. Some adverse events might therefore be acceptable if SSRIs have clinically significant beneficial effects [13, 183, 184]. We therefore both predefined a threshold for clinical significance and assessed the balance between beneficial and harmful effects [13, 17, 184].

As the threshold for clinical significance [14], we chose a drug-placebo difference of 3 points on the 17-item HDRS (range 0 to 52 points) or a standardised mean difference of 0.50. This threshold has been recommended by the National Institute for Clinical Excellence (NICE) in England and has been used in other reviews [4, 8, 31]. Nevertheless, these recommendations are not universally accepted and have been questioned [3]. Others have suggested the following ‘rules of thumb’ for the standardised mean difference: 0.2 is a small effect, 0.5 a moderate effect, and 0.8 a large effect [16, 185]. One study has shown that an SSRI-placebo mean difference of up to 3 points on the HDRS corresponds to ‘no clinical change’ [186]. Another valid study has shown that an SSRI-placebo difference of 3 points is undetectable by clinicians, and that a mean difference of 7 HDRS points, or a standardised effect size of 0.875, is required to correspond to a rating of ‘minimal improvement’ [187]. It has been speculated that the ‘placebo’ response in antidepressant trials has increased in recent years [188]. If there is a ‘response’ to placebo, this must of course be considered when interpreting a mean difference between drug and placebo. However, it is unlikely that depressed patients have a significant placebo effect [189], and it has recently been shown that the placebo response has been stable for 25 years [188]. Even by our predefined minimal thresholds for clinical significance, SSRIs did not have a clinically meaningful effect on depressive symptoms. Furthermore, according to our meta-analyses, SSRIs significantly increase the risk of both serious and non-serious adverse events.
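The thresholds above are linked by the definition of the standardised mean difference, SMD = MD / SD, where MD is the raw HDRS drug-placebo difference and SD the pooled standard deviation of HDRS scores. The 7-point and 0.875 pairing implies an SD of about 8 points (7 / 8 = 0.875), and an SD of about 6 points would make the two predefined thresholds coincide (3 / 6 = 0.50). These SD values are illustrative assumptions, not figures reported by the trials. The minimal Python sketch below converts raw differences and labels them with the 0.2/0.5/0.8 rules of thumb; the function names are hypothetical.

```python
# Minimal sketch: convert raw HDRS mean differences to standardised mean
# differences (SMD = MD / SD) and label them with the 0.2 / 0.5 / 0.8
# rules of thumb. The pooled SDs passed in are illustrative assumptions.


def to_smd(mean_difference: float, pooled_sd: float) -> float:
    """Convert a raw HDRS mean difference into a standardised mean difference."""
    if pooled_sd <= 0:
        raise ValueError("pooled_sd must be positive")
    return mean_difference / pooled_sd


def rule_of_thumb(smd: float) -> str:
    """Label the magnitude of an SMD using the 0.2 / 0.5 / 0.8 cut-offs."""
    size = abs(smd)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical SDs chosen to reproduce the pairings quoted in the text.
print(to_smd(3, 6), rule_of_thumb(to_smd(3, 6)))  # 0.5 moderate
print(to_smd(7, 8), rule_of_thumb(to_smd(7, 8)))  # 0.875 large
```

Under these assumptions, the difference that clinicians rate as only ‘minimal improvement’ already exceeds the conventional cut-off for a ‘large’ effect, which is one reason generic rules of thumb may be too lenient here.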

The best-worst and worst-best case scenarios showed that incomplete outcome data bias alone could theoretically have caused the apparent statistically significant beneficial effect of SSRIs. Furthermore, considering the total number of trials, only a relatively limited number of trials reported on each of our predefined outcomes. This increases the risk of selective outcome reporting bias. In addition to the high risk of incomplete outcome data bias and selective outcome reporting bias, all the included trials were assessed as being at high risk of bias. All trials used placebo as the control intervention, and because of the large number of adverse events, some patients might have figured out whether they received an ‘active’ intervention or not, which may compromise the blinding of the trials. It may be argued that our bias risk assessment will often result in no trials being at low risk of bias. However, similar bias risk assessments have been used in several previous systematic reviews (see, e.g., most Cochrane Hepato-Biliary Group systematic reviews), and our bias risk assessment is based on valid evidence clearly showing that, for each of the bias risk domains used, a rating of ‘high risk of bias’ or ‘unclear risk of bias’ is associated with a risk of overestimating benefits and underestimating harms [184, 190–197]. Furthermore, the risks of bias observed here mirror our experience with 786 randomised trials on depression [198].
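The best-worst and worst-best case scenarios are a sensitivity analysis for missing outcome data. For a dichotomous outcome such as remission, a minimal sketch is: in the best-worst case, every participant with missing data in the SSRI group is assumed to have remitted and none in the placebo group; the worst-best case reverses this. If the two scenarios give estimates on opposite sides of no effect (risk ratio 1), missing data alone could explain the observed result. The counts below are hypothetical, and the `Arm` and `scenarios` names are illustrative; the corresponding procedure for continuous outcomes (imputing extreme values) is not shown.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    events: int      # participants observed with the outcome (e.g. remission)
    observed: int    # participants with outcome data
    randomised: int  # participants randomised to the arm

    @property
    def missing(self) -> int:
        return self.randomised - self.observed


def risk_ratio(e1: int, n1: int, e0: int, n0: int) -> float:
    """Risk ratio of group 1 versus group 0."""
    return (e1 / n1) / (e0 / n0)


def scenarios(ssri: Arm, placebo: Arm) -> dict:
    """Risk ratios for remission under three assumptions about missing data."""
    complete_case = risk_ratio(ssri.events, ssri.observed,
                               placebo.events, placebo.observed)
    # Best-worst: all missing SSRI participants remitted, no missing placebo participant did.
    best_worst = risk_ratio(ssri.events + ssri.missing, ssri.randomised,
                            placebo.events, placebo.randomised)
    # Worst-best: no missing SSRI participant remitted, all missing placebo participants did.
    worst_best = risk_ratio(ssri.events, ssri.randomised,
                            placebo.events + placebo.missing, placebo.randomised)
    return {"complete_case": complete_case,
            "best_worst": best_worst,
            "worst_best": worst_best}


# Hypothetical trial: 50 and 40 participants lost to follow-up.
result = scenarios(Arm(events=60, observed=150, randomised=200),
                   Arm(events=45, observed=160, randomised=200))
print(result)
```

With these made-up counts, the complete-case risk ratio is about 1.42, but the scenarios range from about 2.44 to about 0.71, so the direction of the effect depends entirely on what happened to the missing participants.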

We chose ‘remission’ as a primary outcome because we expected trialists to use this outcome frequently. To present a complete overview of the evidence on SSRIs for depression, we also included ‘no response’ (less than 50% reduction on the HDRS or MADRS during the intervention period) in a post hoc analysis, because this outcome was frequently used in the included trials and at the request of peer reviewers. However, our results on ‘no remission’ and ‘no response’ should be interpreted with great caution for several reasons: 1) the assessments of remission and response were primarily based on single HDRS scores, and it is questionable whether single HDRS scores indicate full remission or an adequate response to the intervention; 2) information is lost when continuous data are transformed into dichotomous data, and the results can be greatly influenced by the distribution of the data and the choice of an arbitrary cut-point [16, 199–201]; 3) even though a larger proportion of participants crosses the arbitrary cut-point in the SSRI group than in the control group (often an HDRS score below 8 for remission and a 50% HDRS reduction for response), the effect measured on the HDRS might still be limited to a few HDRS points (e.g., 3 HDRS points) or less; 4) by focusing only on how many patients cross a certain line for benefit, investigators ignore how many patients deteriorate at the same time. If results show, e.g., relatively large beneficial effects of SSRIs when remission and response are assessed but very small averaged effects (as our results show), it must be because similar proportions of the participants are harmed by SSRIs (an increase on the HDRS compared with placebo); otherwise the averaged effect would not show a small or no difference. The clinical significance of our results on ‘no remission’ and ‘no response’ should therefore be questioned.
The methodological limitations of using ‘response’ as an outcome have been investigated in a valid study by Kirsch et al., who concluded that “response rates based on continuous data do not add information, and they can create an illusion of clinical effectiveness” [202]. In retrospect, because of these methodological limitations, we should not have assessed ‘no remission’ or ‘no response’ as outcomes. This is a clear limitation of our review [16, 199–201].
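Points 2) and 3) above, and the ‘illusion of clinical effectiveness’ described by Kirsch et al., can be illustrated with a simple normal model. The sketch assumes, purely hypothetically, that end-of-treatment HDRS scores are normally distributed with a mean of 12 in the SSRI group, 15 in the placebo group, and an SD of 8 in both, and it uses the remission cut-point of an HDRS score below 8 mentioned above. It relies only on Python's standard `statistics.NormalDist`; the function names are illustrative.

```python
from statistics import NormalDist


def proportion_below(cut_point: float, mean: float, sd: float) -> float:
    """Share of a normally distributed HDRS population scoring below cut_point."""
    return NormalDist(mu=mean, sigma=sd).cdf(cut_point)


def dichotomised_view(mean_ssri: float, mean_placebo: float,
                      sd: float, cut_point: float = 8.0) -> dict:
    """Contrast the continuous mean difference with the dichotomised 'remission' view."""
    p_ssri = proportion_below(cut_point, mean_ssri, sd)
    p_placebo = proportion_below(cut_point, mean_placebo, sd)
    return {
        "mean_difference": mean_placebo - mean_ssri,  # HDRS points favouring SSRI
        "remission_ssri": p_ssri,
        "remission_placebo": p_placebo,
        "risk_ratio": p_ssri / p_placebo,
    }


# Hypothetical means and SD; not estimates from the included trials.
print(dichotomised_view(mean_ssri=12.0, mean_placebo=15.0, sd=8.0))
```

Under these assumptions, roughly 31% versus 19% of participants fall below the cut-point, a risk ratio of about 1.6, even though the mean difference stays at 3 HDRS points.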

Our tests for subgroup differences comparing trials with a mean baseline HDRS score below and above 23 points, together with meta-regression, showed that the effects of SSRIs seem to increase with increasing baseline HDRS score. Others have also shown that trials randomising participants with a higher mean baseline HDRS score seem to show larger effects of antidepressants [7, 8]. However, it is difficult to interpret why trials with a higher mean baseline HDRS score seem to show a larger effect of SSRIs. This might simply be due to random error. Regardless, it cannot be concluded from these results that SSRIs work better in more severely depressed patients. Such a conclusion would require individual patient data, i.e., it would be necessary to show that it is actually the patients with higher baseline HDRS scores who have the larger effects. Gibbons et al. used longitudinal person-level data from a large set of published and unpublished studies and showed that baseline severity was not significantly related to the degree of SSRI treatment advantage over placebo [3]. It should be noted that the intervention effects in the subgroup with HDRS scores above 23 points were still below our threshold for clinical significance, supporting Gibbons and co-workers’ results.
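A trial-level meta-regression regresses each trial's effect estimate on a trial-level covariate, here the mean baseline HDRS score, weighting trials by the inverse of their variance. The minimal sketch below is a fixed-effect version: it ignores between-trial variance (tau²), which a random-effects meta-regression, possibly the one used in this review, would add. The function name and inputs are hypothetical.

```python
import math


def weighted_meta_regression(effects, variances, covariate):
    """Fixed-effect (inverse-variance) meta-regression on one trial-level covariate.

    effects:   per-trial effect estimates (e.g. placebo-minus-SSRI HDRS differences)
    variances: per-trial sampling variances of those estimates
    covariate: per-trial values of the covariate (e.g. mean baseline HDRS)
    Returns (intercept, slope, standard error of the slope).
    """
    if not (len(effects) == len(variances) == len(covariate)) or len(effects) < 3:
        raise ValueError("need at least three trials with matching data")
    w = [1.0 / v for v in variances]
    sw = sum(w)
    # Weighted means of covariate and effect.
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Weighted sums of squares and cross-products.
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Under a fixed-effect model, the slope variance is 1 / sxx.
    return intercept, slope, math.sqrt(1.0 / sxx)
```

A positive slope shows only that trials with higher mean baseline scores reported larger average effects. Because the regression operates on trial means, it cannot show which individual patients benefit, which is the ecological limitation described above.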

Leucht et al. have suggested that effect sizes of SSRIs in randomised clinical trials have declined over time [203]. Our post hoc meta-regression of the HDRS results confirmed their findings (effect sizes decreasing from around 0.8 in the early 1980s to 0.25 in 2012). The reasons for the decreasing effect are not entirely understood but might be better methodology nowadays or the recruitment of different types of participants [203]. Leucht et al. also suggested that a lack of difference between antidepressants and placebo is caused by an increasing ‘placebo’ effect (spontaneous recovery) [203]. This seems less important from a patient perspective, i.e., whether a certain drug should be used should be based on the benefits and harms of this drug compared with placebo. Furthermore, the notion of an increasing placebo effect has recently been severely questioned [188].

Our present systematic review has several strengths. Our protocol was registered before the systematic literature search in all relevant databases, data extraction, and data analyses [14]. Data were double-extracted by independent authors, minimising the risk of inaccurate data extraction, and we assessed the risk of bias in all trials according to Cochrane methodology [16]. We used Trial Sequential Analysis to control the risks of random errors [25, 29, 204], and the analyses of the primary outcomes showed that the accrued information sizes were sufficient. Both visual inspection of forest plots and statistical tests showed limited signs of statistical heterogeneity; e.g., I² was 0% when assessing the risk of serious adverse events. These findings increase the validity of our review results and indicate that the effects shown are consistent across the different trials. As mentioned in our Background, multiple previous reviews and meta-analyses have assessed the effects of SSRIs and have generally concluded that SSRIs have significant effects on depressive symptoms [3–8]. However, the estimated results of these reviews and meta-analyses (as opposed to the conclusions their authors drew) are in agreement with our present results and show that SSRIs do not seem to benefit patients by more than a few HDRS points. This increases the validity of our present results. Furthermore, we assessed in detail the risks of serious and non-serious adverse events and found that both were significantly increased by SSRIs.
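Trial Sequential Analysis compares the number of participants accrued in a meta-analysis with a required information size. For a continuous outcome in two equally sized groups, one commonly used formula is RIS = 4 (z₁₋α/₂ + z₁₋β)² σ² / δ², where σ is the SD, δ the minimal relevant difference, and the result is inflated by 1 / (1 − D²) to account for between-trial diversity D². The minimal sketch below assumes α = 0.05 (two-sided), 80% power, σ = 8 and δ = 3 HDRS points; these values are illustrative and are not necessarily the settings used in this review, and the sketch does not reproduce the monitoring boundaries computed by the TSA software.

```python
from statistics import NormalDist


def required_information_size(alpha: float, beta: float, sd: float,
                              mird: float, diversity: float = 0.0) -> float:
    """Total participants needed to detect a mean difference `mird`.

    Assumes two equally sized groups, a two-sided type I error `alpha`,
    power 1 - `beta`, and inflates the result by 1 / (1 - D^2),
    where D^2 is the between-trial diversity.
    """
    if not 0.0 <= diversity < 1.0:
        raise ValueError("diversity must be in [0, 1)")
    z = NormalDist().inv_cdf  # standard normal quantile function
    fixed = 4 * (z(1 - alpha / 2) + z(1 - beta)) ** 2 * sd ** 2 / mird ** 2
    return fixed / (1 - diversity)


# Hypothetical inputs: SD 8 and a minimal relevant difference of 3 HDRS points.
print(required_information_size(0.05, 0.20, sd=8.0, mird=3.0))
print(required_information_size(0.05, 0.20, sd=8.0, mird=3.0, diversity=0.5))
```

With these inputs the unadjusted requirement is roughly 224 participants, and a diversity of 50% doubles it.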

Our systematic review has several limitations. Our HDRS mean differences were averaged effects. Hence, it cannot be concluded that SSRIs lack clinically significant effects in every depressed participant. For example, some severely depressed patients, as opposed to mildly depressed patients (e.g., so-called professional patients or symptomatic volunteers [203]), might benefit from SSRIs, even though there is no evidence backing this hypothesis. However, any clinical research result has this ‘limitation’: specific patients might benefit from any given intervention even though valid research has shown that the intervention is ‘on average’ ineffective or even harmful. All trials were at high risk of bias in several bias risk domains, and incomplete outcome data, selective outcome reporting, and insufficient blinding in particular may bias our review results. Our GRADE assessments show that, because of the high risks of bias, the quality of the evidence must be regarded as very low. The high risks of bias question the validity of our meta-analysis results, as high risk of bias trials tend to overestimate benefits and underestimate harms [194, 205]. The ‘true’ effect of SSRIs might not even be statistically significant.

We chose to include all SSRIs in our primary analysis. We did this to increase statistical power and precision and to be able to compare the effects of the different SSRIs in subgroup analyses. Tests for subgroup differences between the different SSRIs did not show significant differences, indicating that the effects (or lack of effects) of the different SSRIs are similar. Nevertheless, we cannot rule out that certain SSRIs may have beneficial or harmful effects that we have not identified in this review because of a lack of relevant data. We identified very limited data on the effects of SSRIs on long-term outcomes, suicidal behaviour, and quality of life, so the effects of SSRIs on these outcomes are unclear. For example, we identified only six trials assessing quality of life, which substantially increases the risk of selective outcome reporting bias and thereby limits the validity of the meta-analysis result. Furthermore, the trialists did not use the same questionnaire. Quality of life is without question an outcome of great relevance to patients, and we urge future trialists to assess it. However, any given quality of life questionnaire must be validated (shown to correlate with, e.g., suicidal behaviour or other clinical events) before valid conclusions can be drawn from this outcome; it must be shown that scores on a given questionnaire actually reflect ‘quality of life’. No valid consensus exists on the optimal method for assessing quality of life, and this is a limitation of assessing quality of life in depressed patients. Our eight-step procedure for assessing whether the thresholds for statistical and clinical significance are crossed is based on generally accepted and validated methodology, but its use has not yet been validated in simulation studies or empirical studies [12, 13]. Even though the eight-step procedure has been used in several systematic reviews, it is not universally accepted. This may be a limitation of our methodology.
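A test for subgroup differences asks whether the pooled estimates of the subgroups (here, the individual SSRIs) differ more than expected by chance. A standard version computes Q_between = Σ w_g (θ_g − θ̄)², where θ_g is each subgroup's pooled estimate, w_g the inverse of its variance, and θ̄ their weighted mean, and compares Q_between with a chi-square distribution on G − 1 degrees of freedom. The minimal sketch below pools within subgroups with a fixed-effect model, which is an assumption, since the review may have used random-effects estimates. It computes the chi-square tail probability from the series for the regularised incomplete gamma function to stay within the standard library. All names and inputs are illustrative.

```python
import math


def pooled_fixed(effects, variances):
    """Inverse-variance fixed-effect pooled estimate and its variance."""
    w = [1.0 / v for v in variances]
    estimate = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return estimate, 1.0 / sum(w)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, 1 - P(df/2, x/2), via the gamma series."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 1
    while term > total * 1e-15:  # add series terms until they are negligible
        term *= z / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def subgroup_difference_test(subgroups):
    """Q-test for differences between subgroups.

    `subgroups` maps a label (e.g. a drug name) to (effects, variances) lists.
    Returns (Q_between, degrees of freedom, p-value).
    """
    pooled = [pooled_fixed(e, v) for e, v in subgroups.values()]
    w = [1.0 / var for _, var in pooled]
    overall = sum(wi * est for wi, (est, _) in zip(w, pooled)) / sum(w)
    q = sum(wi * (est - overall) ** 2 for wi, (est, _) in zip(w, pooled))
    df = len(pooled) - 1
    return q, df, chi2_sf(q, df)
```

A non-significant Q_between is compatible with similar effects across SSRIs, but with few trials per drug such a test has limited power to detect real differences.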

The Committee for Medicinal Products for Human Use (CHMP) concluded “… that, as no public health concerns have been identified, no regulatory action is necessary on the basis of Kirsch et al.'s findings” when the latter team questioned the benefits of antidepressants [182]. Based on our results, we now believe that there is valid evidence for a public concern regarding the effects of SSRIs. We agree with Andrews et al. that antidepressants seem to do more harm than good [206]. We have clearly shown that SSRIs significantly increase the risks of both serious and several non-serious adverse events. The observed harmful effects seem to outweigh the potential small beneficial clinical effects of SSRIs, if they exist. Our results confirm the findings of other studies questioning the effects of SSRIs [8, 207] but contrast with those of other reviews concluding that SSRIs are effective interventions for depression [3, 6, 10, 208]. However, our present analyses represent the most comprehensive systematic review on the topic, and we hope it may guide clinical practice.