Literature search

This meta-analysis was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement.33 The literature search was conducted by two independent researchers (C.L. and M.P.) using PubMed (MEDLINE), Embase, the Cochrane Database of Systematic Reviews, and PsycINFO. Combinations of the following search terms were used: “raloxifene”, “evista” or “SERM” and “schizophrenia”, “psychosis”, “psychotic”, “schizoaffective”, or “schizophreniform”. The search had no year or language restrictions. See Table S2 for an example search string. The search cutoff date was 10 October 2017. Reference lists of the included studies were searched for cross-references. After independent screening by M.P. and C.L., all authors reached consensus on the studies to be included.

Inclusion criteria

Articles were included when the following criteria were met: (1) the study was a randomized, double-blind, placebo-controlled trial (used for quantitative synthesis) or a case report (used for qualitative synthesis) that assessed the effect of raloxifene on one of our outcome measures; (2) the study included patients with a schizophrenia spectrum disorder (schizophrenia, schizoaffective disorder, schizophreniform disorder or psychotic disorder not otherwise specified), according to the diagnostic criteria of the Diagnostic and Statistical Manual of Mental Disorders (DSM-III, DSM-III-R, DSM-IV, DSM-IV-TR, DSM-5)34,35, or the International Classification of Diseases (ICD-9 or ICD-10); (3) the study was published in a peer-reviewed journal. For two studies that included the same patient sample, outcome measures reported in both were included in the analysis only once.23,24 Risk of bias was assessed independently by J.B. and M.P. using the Cochrane Risk of Bias tool for RCTs (Table S3).36

Outcome measures

The primary outcome measure was psychotic symptom severity, measured with the Positive and Negative Syndrome Scale (PANSS).37 Secondary outcome measures were cognitive functioning (for domains and included tests, see Table S4) and depressive symptoms, assessed with the Montgomery-Åsberg Depression Rating Scale (MADRS)38 or the Depression Anxiety and Stress Scale (DASS).39

Statistics

Comprehensive Meta-Analysis (CMA) software version 2.0 was used to perform all analyses, using a random-effects model.40 For every individual study, Hedges’ g was calculated for each outcome measure. To obtain this effect size, we used, per treatment arm, either mean change scores (end of treatment minus baseline) with standard deviations (SD) or pre- and post-treatment means (± SD). To avoid overestimation of the true effect sizes caused by the pre-post treatment correlation,41 change scores were preferred. When these values were not reported, we used exact F-, t-, or p-values. All effect sizes were calculated twice, independently, from the original articles to check for errors.
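As an illustration of the effect size described above, the following is a minimal sketch of Hedges’ g computed from the mean change scores of two arms. It is not the CMA implementation; the function name `hedges_g`, the approximate small-sample correction factor and the example values are assumptions made for illustration only.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance of g) for two independent arms.

    Inputs are the mean change score, its SD and the sample size
    for the treatment arm (t) and the control arm (c).
    """
    df = n_t + n_c - 2
    # Pooled standard deviation of the two arms.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled
    # Common approximation of the small-sample correction factor J.
    j = 1 - 3 / (4 * df - 1)
    # Large-sample variance of d, then scaled by J squared.
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var_d


# Hypothetical values, not taken from any included study.
g, var_g = hedges_g(-10.0, 8.0, 20, -6.0, 8.0, 20)
print(round(g, 2))  # about -0.49
```

With a symptom scale on which lower scores are better, a negative g in this sketch means a larger reduction in the treatment arm than in the control arm.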

Studies were combined in meta-analyses to calculate a mean weighted effect size for each outcome measure, using a random-effects model. To investigate whether studies could be assumed to share a common population effect size, the Q-value and I²-statistic were evaluated for each analysis. The Q-statistic tests for the presence of heterogeneity and follows a chi-square distribution with k − 1 degrees of freedom (k = number of studies); Q-values substantially higher than the degrees of freedom indicate between-study variability beyond what sampling error alone would produce. I² is the proportion of the observed variance that reflects differences in true effect sizes rather than sampling error, and ranges from 0 to 100%. Values of 25%, 50%, and 75% can be interpreted as low, moderate, and high heterogeneity, respectively.42
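The pooling and heterogeneity statistics described above can be sketched as follows. The sketch assumes the DerSimonian-Laird estimator for the between-study variance; the estimator actually applied by the software is not stated here, and the function name `random_effects` and the example values are hypothetical.

```python
import math


def random_effects(effects, variances):
    """Pool study effect sizes with a DerSimonian-Laird random-effects model."""
    k = len(effects)
    if k < 2:
        raise ValueError("at least two studies are required")
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # Between-study variance (tau squared), truncated at zero.
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # I squared: share of observed variance not due to sampling error.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau squared to each study variance.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"pooled": pooled, "se": se, "q": q, "df": df,
            "i2": i2, "tau2": tau2}


# Hypothetical effect sizes and variances for three studies.
result = random_effects([0.0, 0.5, 1.0], [0.04, 0.04, 0.04])
print(round(result["q"], 2), round(result["i2"], 1))
```

In this hypothetical example Q is 12.5 on 2 degrees of freedom and I² is 84%, which the thresholds above would classify as high heterogeneity.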

Additionally, funnel plots were inspected for asymmetry in order to check for publication bias. Potential asymmetry was tested with Egger’s test, using a significance level of α = 0.05 (2-tailed). Effect sizes with a p-value smaller than 0.05 were considered statistically significant. Effect sizes were interpreted according to the guidelines by Cohen, with an effect size of 0.20 indicating a small effect, 0.50 a medium effect, and 0.80 or higher a large effect.43
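For readers unfamiliar with Egger’s test, a minimal sketch of the underlying regression is given below: each standardized effect (effect size divided by its standard error) is regressed on precision (one divided by the standard error), and an intercept that differs from zero suggests funnel plot asymmetry. The function name `egger_intercept` and the example values are hypothetical, and the sketch returns only the t-statistic; the p-value would be obtained from a t-distribution with k − 2 degrees of freedom.

```python
import math


def egger_intercept(effects, std_errors):
    """Return (intercept, standard error, t-statistic) of Egger's regression."""
    k = len(effects)
    if k < 3:
        raise ValueError("at least three studies are required")
    x = [1 / se for se in std_errors]                      # precision
    z = [y / se for y, se in zip(effects, std_errors)]     # standardized effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    # Ordinary least-squares fit of z on x.
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance with k - 2 degrees of freedom.
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    s2 = sse / (k - 2)
    se_intercept = math.sqrt(s2 * (1 / k + x_bar**2 / sxx))
    # A perfect fit leaves no residual variance, so t is undefined.
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t


# Hypothetical effect sizes and standard errors for four studies.
ses = [1.0, 0.5, 1 / 3, 0.25]
ys = [1.1, 0.95, 3.2 / 3, 0.95]
print(egger_intercept(ys, ses))
```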

Because all papers administered raloxifene at a dosage of either 60 mg or 120 mg, a subgroup analysis was performed based on this categorization. This was done for PANSS outcomes only, as the number of papers that reported depressive symptoms or cognitive functioning as an outcome measure was insufficient to perform this analysis. Furthermore, to assess the effect of treatment duration, this variable was used as a regressor in additional analyses.
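The use of treatment duration as a regressor can be illustrated with a simple weighted meta-regression. This is a sketch under stated assumptions, not the procedure implemented in the software: the between-study variance `tau2` is supplied by the caller rather than estimated, the slope is tested with a normal (Z) approximation, and the function name `meta_regression` and all example values are hypothetical.

```python
import math


def meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least-squares regression of effect size on one moderator."""
    # Random-effects weights: inverse of within- plus between-study variance.
    w = [1 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    if sxx == 0:
        raise ValueError("the moderator must vary across studies")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, moderator, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1 / sxx)
    z = slope / se_slope
    # Two-tailed p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return {"slope": slope, "intercept": intercept,
            "se_slope": se_slope, "z": z, "p": p}


# Hypothetical treatment durations (weeks), effect sizes and variances.
weeks = [8, 12, 16, 24]
effects = [0.26, 0.34, 0.42, 0.58]
fit = meta_regression(effects, [0.04, 0.05, 0.06, 0.04], weeks, tau2=0.01)
print(round(fit["slope"], 3))
```

The slope estimates the change in effect size per additional week of treatment; a slope close to zero would suggest that treatment duration does not explain differences between studies.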

Data availability

The authors declare that the main data supporting the findings of this study are available within the article and its Supplementary files. Since this is a meta-analysis, no primary data were collected during this study. Additional data are available from the corresponding author upon request.