A total of 769 full-text publications and 40 conference abstracts were found from the literature search. These titles and abstracts were screened for eligibility. After excluding duplicates and irrelevant studies, 70 studies were assessed for full-text eligibility, from which 15 full-text articles [17,18,19,20,21,22,23,24,25,26,27,28,29,30,31] and one abstract [32] were included in the analysis. Among the excluded studies, two [33, 34] used the same cohort as the study by Healey et al. [20] and two [35, 36] reported the same cohort as Hakkou et al. [26]. A summary of the selection process is shown in the flowchart (Additional file 1: Figure S1).

Of the 16 included studies, 14 recruited participants with AS (including three axSpA studies which reported on AS participants separately) and two reported on nr-axSpA cohorts (including one axSpA study which reported on nr-axSpA participants separately). Sample size ranged from 60 to 1504. A total of 4753 axSpA patients were included across all studies, including separately described groups of 2857 AS and 334 nr-axSpA patients. Twelve studies were cross-sectional in design, three were longitudinal, and one was a randomised controlled trial (RCT). Four studies were from Turkey, five from China (including one from Hong Kong), three from the UK, and one each from Greece, Morocco and Spain (Canary Islands). The RCT recruited from Europe, Asia and South America.

Three screening criteria and one diagnostic criterion were used for identifying depression. Nine studies used HADS with three different thresholds, five used SDS with three different thresholds, and one used PHQ-9. For the purposes of meta-analyses, HADS ≥ 7/8 were grouped together, and SDS ≥ 50/51/53 were grouped together. Only one study used diagnostic criteria for depression, the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders (SCID).

Given the strict inclusion criteria employed for this meta-analysis, most studies had high quality scores with one scoring 6, three scoring 8, nine scoring 9 and three scoring ≥ 10 on the Health States Quality Index (Additional file 1: Table S2).

Prevalence of depression

The prevalence of depression ranged from 11 to 64%, depending on the criteria and threshold used to identify depression. Table 1 summarises the study characteristics, depression prevalence and quality score. Funnel/Doi plots and the LFK index suggested no evidence of publication bias (Additional file 1: Figure S2).

Table 1 Summary of study characteristics, prevalence of depression and quality of studies included in this meta-analysis

Figure 1 shows a forest plot of prevalence estimates using quality-effects models, grouped by criteria and threshold. Pooled prevalence of mild depression (HADS ≥ 7/8) was 38% (95% CI 30 to 45%, I² = 85%). Pooled prevalence of at least moderate depression was 15% using HADS ≥ 11 (95% CI 6 to 25%, I² = 89%) and 52% using SDS (95% CI 29 to 75%, I² = 96%). The study by Hyphantis et al. [19] reported 15% depression using PHQ-9 ≥ 10. The study by Chan et al. [24] reported a prevalence of 11% for major depressive disorder using the SCID. Pooled prevalence estimates from both quality- and random-effects meta-analyses are shown in Table 2.
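To make the pooled prevalence and I² figures concrete, the following is a minimal sketch of random-effects pooling of proportions. It assumes the DerSimonian-Laird estimator applied to untransformed proportions, which is a simplification and not necessarily the method used here; it does not implement the quality-effects model. The function name `pool_prevalence` and the example counts are hypothetical and are not study data.

```python
import math


def pool_prevalence(events, totals):
    """Pool raw proportions with a DerSimonian-Laird random-effects model.

    Returns (pooled, ci_low, ci_high, i2_percent).
    Assumption: no study has a prevalence of exactly 0 or 1, since the
    binomial variance would then be zero and the weight undefined.
    """
    p = [e / n for e, n in zip(events, totals)]
    var = [pi * (1 - pi) / n for pi, n in zip(p, totals)]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / v for v in var]
    fixed = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, p))
    df = len(p) - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in var]
    pooled = sum(wi * pi for wi, pi in zip(w_re, p)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, i2


# Hypothetical counts of screen-positive patients and cohort sizes.
example = pool_prevalence(events=[30, 45, 20], totals=[100, 100, 100])
```

A high I², as reported for every subgroup here, means that most of the spread between study estimates exceeds what sampling error alone would produce, which is why sensitivity analyses excluding outlying studies are informative.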

Fig. 1 Pooled prevalence of depression in axSpA cohorts, grouped by criteria and threshold

Table 2 Summaries of pooled depression prevalence grouped by screening criteria and thresholds used

Two studies reported disproportionately high prevalence of depression. Hakkou et al. [26] attributed this to the cohort’s low socioeconomic status. Excluding this study improved the HADS subgroup heterogeneity without altering the pooled estimates significantly (Table 2). The Chinese study by Jiang et al. [31] reported the highest depression prevalence (64%). This cohort had the lowest mean age (27 years) and reported a low participation rate, with only 25% (683/2772) of the total cohort completing the required assessments. Excluding this study reduced both the heterogeneity and the pooled prevalence estimate (from 52% to 36%) for the SDS group.

Prevalence of depression was inversely associated with age (r_s = −0.71, P = 0.003) but not with study size, BASDAI, year of publication or proportion of males (data not shown).
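As an illustration of how a study-level rank correlation of this kind can be computed, the sketch below calculates a Spearman coefficient as the Pearson correlation of average ranks. The helper names `average_ranks` and `spearman` and the example values are hypothetical and are not the data from the included studies.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical study-level values: mean cohort age (years) and prevalence (%).
mean_age = [30, 36, 41, 44, 48]
prevalence = [55, 40, 42, 25, 18]
r_s = spearman(mean_age, prevalence)
```

A negative coefficient indicates that cohorts with a higher mean age tended to report lower prevalence. Because each study contributes a single point, an association at this level does not by itself show that age affects depression risk within individuals.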

Comparing axSpA, AS and nr-axSpA cohorts

Fourteen of 16 included studies reported the prevalence of depression for AS cohorts. The pooled prevalences for AS cohorts are shown in Table 2. Again, the studies by Hakkou et al. and Jiang et al. reported high prevalence of depression (Additional file 1: Figure S3) and were excluded in sensitivity analyses.

The studies by Chan et al. [24], Zou et al. [28] and Kilic et al. [18] reported similar prevalence of depression between axSpA and their AS subgroups (Table 1). Kilic et al. also reported similar prevalence between AS and nr-axSpA subgroups (45.4 vs 42.3%, P = 0.58).

Two studies reported depression prevalence for nr-axSpA cohorts. Pooled prevalence of mild depression (HADS ≥ 7/8) for nr-axSpA was similar to that of AS cohorts (Table 2).

Comparing markers of disease severity between groups with and without depression

Eight studies compared markers of disease severity between groups with and without depression (Additional file 1: Table S3). For BASDAI, spVAS and BASFI, most studies reported significantly higher scores in the group with depression compared to those without, regardless of criteria or threshold used to define depression (Fig. 2). Across the depressed groups, scores were generally worse for the Bath AS metrology index (BASMI), AS disease activity score (ASDAS), CRP and ESR, but few individual comparisons were statistically significant.

Fig. 2 Measures of disease activity and functional impairment are worse in axial spondyloarthritis patients with comorbid depression. Effect sizes shown as weighted mean difference (WMD)

All eight studies reported significantly worse BASDAI in the group with depression. Despite the variety of criteria and thresholds used, the weighted mean differences (WMDs) were similar. Pooling WMDs, BASDAI was 1.4 units (95% CI 1.0 to 1.9) higher in the depressed group. In the six studies that reported spVAS, the groups with depression scored 1.2 units (95% CI 0.7 to 1.7) higher. Only two studies reported ASDAS, with a pooled WMD of 0.5 units (95% CI 0.3 to 0.7) between the two groups. ESR was significantly higher in groups with depression (3.5 mm/h, 95% CI 0.6 to 6.4), whereas the difference in CRP was not statistically significant (1.3 mg/dl, 95% CI −0.9 to 3.4).
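The pooling of mean differences can be sketched as follows. This is a simplified fixed-effect, inverse-variance version, offered as an assumption for illustration; the model actually used for the WMDs may differ. The function names and the input numbers are hypothetical.

```python
import math


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference (group 1 minus group 2) and its variance."""
    return m1 - m2, sd1 ** 2 / n1 + sd2 ** 2 / n2


def pool_wmd(studies):
    """Inverse-variance weighted mean difference with a 95% CI.

    `studies` is a list of (m1, sd1, n1, m2, sd2, n2) tuples, where
    group 1 is the depressed group and group 2 the non-depressed group.
    """
    diffs, weights = [], []
    for study in studies:
        md, var = mean_difference(*study)
        diffs.append(md)
        weights.append(1 / var)  # more precise studies weigh more
    pooled = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se


# Hypothetical score summaries: (mean, SD, n) with and without depression.
hypothetical = [
    (5.0, 2.0, 50, 3.5, 2.0, 50),
    (4.8, 2.2, 80, 3.6, 2.1, 120),
]
wmd, ci_low, ci_high = pool_wmd(hypothetical)
```

A confidence interval that excludes zero corresponds to a statistically significant pooled difference, as for ESR above; an interval that spans zero, as for CRP, does not.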

All studies reported significantly worse BASFI in the group with depression. The pooled difference in BASFI was 1.2 units (95% CI 0.6 to 1.8) but with more variation among the studies. The group with depression in the study by Hakkou et al. had much poorer function (BASFI, BASMI) compared to depressed groups of other studies, despite using a threshold for ‘mild’ depression. In contrast to other Bath indices, not all studies reported a difference in BASMI when comparing groups with and without depression. Nevertheless, the pooled estimate showed that axSpA patients with comorbid depression had significantly higher BASMI than those without (0.6 units, 95% CI 0.3 to 0.8).