Abstract

Importance Physicians in training are at high risk for depression. However, the estimated prevalence of this disorder varies substantially between studies.

Objective To provide a summary estimate of depression or depressive symptom prevalence among resident physicians.

Data Sources and Study Selection Systematic search of EMBASE, ERIC, MEDLINE, and PsycINFO for studies with information on the prevalence of depression or depressive symptoms among resident physicians published between January 1963 and September 2015. Studies were eligible for inclusion if they were published in the peer-reviewed literature and used a validated method to assess for depression or depressive symptoms.

Data Extraction and Synthesis Information on study characteristics and depression or depressive symptom prevalence was extracted independently by 2 trained investigators. Estimates were pooled using random-effects meta-analysis. Differences by study-level characteristics were estimated using meta-regression.

Main Outcomes and Measures Point or period prevalence of depression or depressive symptoms as assessed by structured interview or validated questionnaire.

Results Data were extracted from 31 cross-sectional studies (9447 individuals) and 23 longitudinal studies (8113 individuals). Three studies used clinical interviews and 51 used self-report instruments. The overall pooled prevalence of depression or depressive symptoms was 28.8% (4969/17 560 individuals, 95% CI, 25.3%-32.5%), with high between-study heterogeneity (Q = 1247, τ2 = 0.39, I2 = 95.8%, P < .001). Prevalence estimates ranged from 20.9% for the 9-item Patient Health Questionnaire with a cutoff of 10 or more (741/3577 individuals, 95% CI, 17.5%-24.7%, Q = 14.4, τ2 = 0.04, I2 = 79.2%) to 43.2% for the 2-item PRIME-MD (1349/2891 individuals, 95% CI, 37.6%-49.0%, Q = 45.6, τ2 = 0.09, I2 = 84.6%). There was an increased prevalence with increasing calendar year (slope = 0.5% increase per year, adjusted for assessment modality; 95% CI, 0.03%-0.9%, P = .04). In a secondary analysis of 7 longitudinal studies, the median absolute increase in depressive symptoms with the onset of residency training was 15.8% (range, 0.3%-26.3%; relative risk, 4.5). No statistically significant differences were observed between cross-sectional vs longitudinal studies, studies of only interns vs only upper-level residents, or studies of nonsurgical vs both nonsurgical and surgical residents.

Conclusions and Relevance In this systematic review, the summary estimate of the prevalence of depression or depressive symptoms among resident physicians was 28.8%, ranging from 20.9% to 43.2% depending on the instrument used, and increased with calendar year. Further research is needed to identify effective strategies for preventing and treating depression among physicians in training.

Introduction

Studies have suggested that resident physicians experience higher rates of depression than the general public.1-5 Beyond the effects of depression on individuals, resident depression has been linked to poor-quality patient care and increased medical errors.6-8 However, estimates of the prevalence of depression or depressive symptoms vary across studies, from 3% to 60%.9,10 Studies also report conflicting findings about resident depression depending on specialty, postgraduate year, sex, and other characteristics.4,11-13 A reliable estimate of depression prevalence during medical training is important for informing efforts to prevent, treat, and identify causes of depression among residents.14 We conducted a systematic review and meta-analysis of published studies of depression or depressive symptoms in graduate medical trainees.

Methods

Search Strategy and Study Eligibility

Cross-sectional and longitudinal studies published between January 1963 and September 2015 that reported on the prevalence of depression or depressive symptoms in interns, resident physicians, or both were identified using EMBASE, ERIC, MEDLINE, and PsycINFO (independently performed by D.A.M. and M.A.R.); by screening the reference lists of articles identified; and by correspondence with study investigators using the approach recommended by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Figure 1).15 The computer-based searches combined terms related to interns, resident physicians, and study design with those related to depression, without language restriction (full details of the search strategy are provided in eMethods 1 in the Supplement). Studies were included if they reported data on resident physicians, were published in peer-reviewed journals, and used a validated method to assess for depression or depressive symptoms.16

Data Extraction and Quality Assessment

The following information was independently extracted from each article by 2 trained investigators (D.A.M. and M.A.R.) using a standardized form: study design, geographic location, years of survey, specialty, postgraduate level, sample size, average age of participants, number and percentage of male participants, diagnostic or screening method used, outcome definition (ie, specific diagnostic criteria or screening instrument cutoff), and reported prevalence of depression or depressive symptoms. The most comprehensive publication was used when there were several involving the same population of residents. A modified version of the Newcastle-Ottawa Scale was used to assess the quality of nonrandomized studies included in systematic reviews and meta-analyses.17 This scale assesses quality in several domains: sample representativeness and size, comparability between respondents and nonrespondents, ascertainment of depressive symptoms, and statistical quality (full details regarding scoring are provided in eMethods 2 in the Supplement). Studies were judged to be at low risk of bias (≥3 points) or high risk of bias (<3 points). All discrepancies were resolved by discussion and adjudication of a third reviewer (S.S.).

Data Synthesis and Analysis

Prevalence estimates of depression or depressive symptoms were calculated by pooling the study-specific estimates using random-effects meta-analysis that accounted for between-study heterogeneity.18 Binomial proportion confidence intervals for individual studies were calculated using the Clopper-Pearson method, which allows for asymmetry. When longitudinal studies reported prevalence estimates made at different time periods within the year, the overall period prevalence for the time period was used. Between-study heterogeneity was assessed by standard χ2 tests and the I2 statistic (ie, the percentage of variability in prevalence estimates due to heterogeneity rather than sampling error, or chance, with values ≥75% indicating considerable heterogeneity)19,20 and by comparing results from studies grouped according to prespecified study-level characteristics (study design, country, year of baseline survey, specialty, postgraduate level, Newcastle-Ottawa Scale components, age, sex, and diagnostic method) using stratified meta-analysis and meta-regression.21,22 The influence of individual studies on the overall prevalence estimate was explored by serially excluding each study in a sensitivity analysis. A secondary analysis restricted to longitudinal studies reporting both preresidency and intraresidency depressive symptom prevalence estimates was performed to better isolate associations with the residency experience from associations with assessment tools. Bias secondary to small study effects was investigated by funnel plot and Egger test.23,24 All analyses were performed using R version 3.2.2 (R Foundation for Statistical Computing).25 Statistical tests were 2-sided and used a significance threshold of P < .05.
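As an illustration of the Clopper-Pearson interval named above, the following minimal Python sketch computes an exact binomial interval for a single study by bisection on the binomial tail probabilities. It is not the authors' R code; the function names (`binom_cdf`, `clopper_pearson`) and the example counts are hypothetical and chosen only for illustration.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)

def _solve_increasing(g, target, iters=60):
    """Bisection on [0, 1] for g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for x events among n participants."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(x, n, p), 1.0 - alpha / 2)
    return lower, upper

if __name__ == "__main__":
    # Hypothetical study: 30 of 100 residents screen positive.
    print(clopper_pearson(30, 100))
```

The interval is generally not symmetric around the observed proportion, which is the asymmetry the Methods refer to; it becomes most visible when the proportion is near 0 or 1 or the study is small.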

Results

Study Characteristics

Thirty-one cross-sectional10-13,26-52 and 23 longitudinal4,6-8,53-71 studies involving a total of 17 560 individuals were included in the study (Figure 1, Table 1, and Table 2). Thirty-five took place in North America, 9 in Asia, 5 in Europe, 4 in South America, and 1 in Africa. Twenty-eight studies recruited residents from multiple specialties, while 26 recruited exclusively from single specialties. Thirteen studies included interns only, 36 included both interns and residents, and 5 included upper-level residents only. The median number of participants per study was 141 (range, 27-2323). Eleven studies assessed for depressive symptoms using the Beck Depression Inventory (BDI),72 11 used the Center for Epidemiologic Studies Depression Scale (CES-D),73 8 used the 2-item Primary Care Evaluation of Mental Disorders questionnaire (PRIME-MD),74 7 used the 9-item Patient Health Questionnaire (PHQ-9),75 4 used the Zung Self-rating Depression Scale (SDS),76 3 used the Harvard Department of Psychiatry/National Depression Screening Day Scale (HANDS),77 and 7 used other methods.78-82 Three assessed for depression using structured interviews.83 The diagnostic criteria and scoring cutoffs used by the studies are summarized in Table 1. When evaluated by Newcastle-Ottawa quality assessment criteria, out of 5 possible points, 3 studies received 5 points, 13 received 4 points, 23 received 3 points, 10 received 2 points, 4 received 1 point, and 1 received 0 points (scores for individual studies are presented in eTable 1 in the Supplement).

Prevalence of Depression or Depressive Symptoms Among Resident Physicians

Meta-analytic pooling of the prevalence estimates of depression or depressive symptoms reported by the 54 studies yielded a summary prevalence of 28.8% (4969/17 560 individuals, 95% CI, 25.3%-32.5%), with significant evidence of between-study heterogeneity (Q = 1247, P < .001, τ2 = 0.39, I2 = 95.8%) (Figure 2). Sensitivity analysis, in which the meta-analysis was serially repeated after exclusion of each study, demonstrated that no individual study affected the overall prevalence estimate by more than 1% (eTable 2 in the Supplement).
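The pooling and leave-one-out sensitivity steps can be sketched in Python as follows. The article does not state which between-study variance estimator or transformation was used, so this sketch assumes a DerSimonian-Laird estimator applied to logit-transformed proportions; the function names and the study counts in the usage example are hypothetical and are not taken from the included studies.

```python
import math

def pool_logit_dl(events, totals):
    """Random-effects pooled prevalence (DerSimonian-Laird, logit scale).

    Assumes 0 < events[i] < totals[i] for every study.
    """
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]   # logit prevalence
    v = [1.0 / e + 1.0 / (n - e) for e, n in zip(events, totals)] # within-study variance
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))       # Cochran Q
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0               # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def expit(t):
        return 1.0 / (1.0 + math.exp(-t))

    return {
        "prevalence": expit(mu),
        "ci": (expit(mu - 1.96 * se), expit(mu + 1.96 * se)),
        "q": q,
        "tau2": tau2,
    }

def leave_one_out(events, totals):
    """Pooled prevalence after dropping each study in turn."""
    results = []
    for i in range(len(events)):
        e = events[:i] + events[i + 1:]
        n = totals[:i] + totals[i + 1:]
        results.append(pool_logit_dl(e, n)["prevalence"])
    return results

if __name__ == "__main__":
    # Hypothetical counts for five studies (not data from the review).
    events = [30, 45, 12, 80, 25]
    totals = [100, 200, 60, 250, 70]
    overall = pool_logit_dl(events, totals)
    print(overall)
    # Largest shift in the pooled estimate when any one study is removed.
    print(max(abs(p - overall["prevalence"]) for p in leave_one_out(events, totals)))
```

Because the random-effects weights depend on the between-study variance as well as on sample size, the pooled estimate generally differs from the crude ratio of total cases to total participants.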

To provide a range of the depression or depressive symptom prevalence estimates identified by these methodologically diverse studies, estimates were stratified by screening instrument and cutoff score (Figure 3). Summary prevalence estimates ranged from 20.9% for the PHQ-9 with cutoff of 10 or more (741/3577 individuals, 95% CI, 17.5%-24.7%, Q = 14.4, τ2 = 0.04, I2 = 79.2%) to 43.2% for the 2-item PRIME-MD (1349/2891 individuals, 95% CI, 37.6%-49.0%, Q = 45.6, τ2 = 0.09, I2 = 84.6%). The 8 studies using the 2-item PRIME-MD yielded significantly higher estimates than did the others (Q = 69.0, P < .001). In contrast, there were no significant differences between estimates made using the CES-D, PHQ-9, HANDS, BDI, or Zung SDS (Q = 8.65, P = .12), suggesting that variation between instruments did not explain the heterogeneity in the observed depression or depressive symptom prevalence estimates. A model including only those studies4,7,34,47,48,50,60,66 using inventories with specificities greater than 88% yielded a prevalence estimate of 20.2% (1119/5425, 95% CI, 18.0%-22.6%, Q = 22.0, P < .01, τ2 = 0.02, I2 = 68.2%).
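The I2 values reported with each Q statistic can be checked with the standard relationship I2 = (Q − df)/Q, where df is the number of studies minus 1. The short Python sketch below applies it to two subgroups for which the text gives both Q and the number of studies; the function name `i_squared` is hypothetical.

```python
def i_squared(q, k):
    """I2 (as a percentage) from Cochran Q and the number of studies k."""
    df = k - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to 0 by convention.
    return max(0.0, (q - df) / q) * 100.0

if __name__ == "__main__":
    # 8 studies using the 2-item PRIME-MD, Q = 45.6
    print(round(i_squared(45.6, 8), 1))
    # 8 studies using inventories with specificity greater than 88%, Q = 22.0
    print(round(i_squared(22.0, 8), 1))
```

Both calls agree with the reported values of 84.6% and 68.2%, respectively.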

Prevalence of Depression or Depressive Symptoms by Study-Level Characteristics

Among all 54 studies, the prevalence of depression or depressive symptoms significantly increased with baseline survey year (slope = 0.5% per calendar-year increase; 95% CI, 0.03%-0.9%; test of moderator, Q = 4.4, P = .04). This association persisted when studies using the 2-item PRIME-MD were excluded and the analysis was restricted to the 23 studies using the CES-D, PHQ-9, HANDS, BDI, or Zung SDS presented in Figure 3 (slope = 0.6% per calendar-year increase; 95% CI, 0.1%-1.2%, P = .02).
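A slope of this kind comes from a meta-regression of study-level prevalence on survey year. The article does not describe the model in detail, so the Python sketch below is only a simplified stand-in: an inverse-variance weighted least-squares fit on the raw proportion scale, with the between-study variance `tau2` supplied by the caller as an assumption. The function name and all input values are hypothetical.

```python
import math

def weighted_slope(x, y, variances, tau2=0.0):
    """Weighted least-squares line y = intercept + slope * x.

    Weights are 1 / (variance + tau2). The returned standard error of the
    slope is valid only if those weights are the true inverse variances.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    se_slope = math.sqrt(1.0 / sxx)
    return slope, intercept, se_slope

if __name__ == "__main__":
    # Hypothetical studies: baseline survey year, prevalence, sample size.
    years = [1990, 2000, 2005, 2010, 2014]
    prev = [0.22, 0.27, 0.26, 0.31, 0.33]
    sizes = [120, 300, 150, 400, 250]
    variances = [p * (1 - p) / n for p, n in zip(prev, sizes)]
    slope, intercept, se = weighted_slope(years, prev, variances, tau2=0.002)
    # Slope expressed in percentage points per calendar year, with a Wald interval.
    print(100 * slope, 100 * (slope - 1.96 * se), 100 * (slope + 1.96 * se))
```

Adding `tau2` to every study's variance flattens the weights, so large studies dominate the fit less than they would under a fixed-effect model.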

Among the full set of studies, no statistically significant differences in prevalence estimates were noted between cross-sectional vs longitudinal studies (2851/9447, 29.1% [95% CI, 23.9% to 34.9%] vs 2111/8113, 28.4% [95% CI, 24.2% to 33.0%]; test for subgroup differences, Q = 0.04, P = .85), studies in the United States vs elsewhere (3026/10 883, 26.6% [95% CI, 21.9% to 31.9%] vs 1936/6677, 31.1% [95% CI, 26.0% to 36.7%]; Q = 1.4, P = .23), studies of nonsurgical vs both nonsurgical and surgical residents (1570/5841, 28.9% [95% CI, 24.7% to 33.4%] vs 3392/11 719, 28.8% [95% CI, 23.6% to 34.7%]; Q = 0, P = .98), or studies of only interns vs those of only upper-level residents (1411/5127, 31.9% [95% CI, 25.4% to 39.1%] vs 211/1061, 26.6% [95% CI, 14.9% to 42.8%]; Q = 0.9, P = .62) (Figure 4). There were no significant associations between prevalence and mean or median age (slope = −1.0% per year [95% CI, −2.8% to 0.8%]; Q = 1.2, P = .28) or percentage of males (slope = 3.4% per percentage increase in males [95% CI, −28.9% to 22.1%]; Q = 0.1, P = .79).

When evaluated by Newcastle-Ottawa criteria, studies with lower total overall quality scores yielded higher depression estimates (660/1658, 36.7% [95% CI, 30.2%-43.7%] vs 4302/15 902, 26.1% [95% CI, 22.4%-30.2%]; Q = 7.3, P = .007) (Figure 5). In terms of individual quality assessment criteria, higher prevalence estimates were found among studies with less representative participant populations (569/1472, 37.7% [95% CI, 32.4%-43.2%] vs 4393/16 088, 26.8% [95% CI, 23.1%-30.9%]; Q = 10.4, P = .001) and less valid assessment methods (1835/4425, 36.2% [95% CI, 29.9%-43.0%] vs 3127/13 135, 25.7% [95% CI, 22.6%-29.0%]; Q = 8.6, P = .003). No statistically significant differences in prevalence estimates were noted when studies were stratified by respondent/nonrespondent comparability criteria (Q = 0.11, P = .75) or by quality of descriptive statistic reporting (Q = 0.23, P = .63).

Heterogeneity Within Screening Instruments

To identify potential sources of heterogeneity independent of assessment modality, heterogeneity was examined within the studies using common instruments when at least 5 studies were available and at least 2 studies were in each comparator subgroup. Among the 7 studies using the CES-D and a cutoff of 16 or greater, heterogeneity was not accounted for by study design (Q = 0.3, P = .61), baseline survey year (Q = 1.3, P = .25), specialty (Q = 0.2, P = .70), sample size (Q = 2.1, P = .15), age (Q = 0.7, P = .41), or sex (Q = 0.7, P = .41) (full results are provided in eTable 3 in the Supplement). Among the 8 studies using the 2-item PRIME-MD, heterogeneity was partially explained by study design (cross-sectional studies yielded higher estimates, 49.8% vs 41.3%; Q = 5.2, P = .02) and respondent/nonrespondent comparability (studies that established comparability yielded lower estimates, 39.6% vs 50.4%; Q = 10.3, P = .001) but was not significantly explained by sample size (Q = 0.2, P = .64), sex (Q = 2.7, P = .10), baseline survey year (Q = 0.1, P = .80), or Newcastle-Ottawa score (Q = 0.2, P = .64). Among 7 studies using the 21-item BDI with cutoff of 10 or greater, heterogeneity was in part explained by country (United States vs other, 10.7% vs 44.6%; Q = 30.7, P < .001), baseline survey year (Q = 13.4, P < .001), and sex (Q = 10.7, P = .001), but not by specialty (Q = 0.3, P = .58), postgraduate year (Q = 0, P = .99), age (Q = 1.3, P = .26), or respondent/nonrespondent comparability (Q = 0, P = .99).

Secondary Analysis of Longitudinal Studies

In a secondary analysis of 7 longitudinal studies,4,58,59,66-68,70 the temporal relationship between exposure to residency training and increased depressive symptoms was assessed (Table 3). Because studies used different assessment instruments, the relative change in depressive symptoms was calculated for each study individually (ie, follow-up divided by baseline prevalence), and then the relative changes derived from individual studies were meta-analyzed. Overall, the median absolute increase in depressive symptoms with the onset of residency training was 15.8% (range, 0.3%-26.3%; relative risk, 4.5).
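The per-study quantities described above can be sketched in Python as follows: the absolute change (follow-up minus baseline prevalence), the relative change (follow-up divided by baseline prevalence), and the median of the absolute changes. The function name and the prevalence values are hypothetical and are not the Table 3 data; the subsequent pooling of the relative changes is not shown.

```python
import statistics

def prevalence_changes(baseline, followup):
    """Per-study absolute and relative change in prevalence.

    Assumes every baseline prevalence is greater than 0.
    """
    absolute = [f - b for b, f in zip(baseline, followup)]
    relative = [f / b for b, f in zip(baseline, followup)]
    return absolute, relative

if __name__ == "__main__":
    # Hypothetical preresidency and intraresidency prevalences for 3 studies.
    baseline = [0.04, 0.10, 0.08]
    followup = [0.20, 0.25, 0.30]
    absolute, relative = prevalence_changes(baseline, followup)
    print(statistics.median(absolute))  # median absolute increase
    print(relative)                     # per-study relative changes
```

Working with relative changes puts studies that used different instruments, and therefore had different baseline prevalences, on a common scale before they are combined.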

Assessment of Publication Bias

Although visual inspection of the funnel plot revealed relatively minimal asymmetry (eFigure in the Supplement), there was evidence of small-study effects (Egger test P = .02), with smaller studies (<200 participants) reporting more extreme depression prevalence estimates than larger studies (32.0% [95% CI, 27.1%-37.4%] vs 24.5% [95% CI, 20.0%-29.7%]; Q = 4.2, P = .04) (Figure 5).

Discussion

This systematic review and meta-analysis of 54 studies involving 17 560 physicians in training demonstrated that between 20.9% and 43.2% of trainees screened positive for depression or depressive symptoms during residency. Because the development of depression has been linked to a higher risk of future depressive episodes and greater long-term morbidity, these findings may affect the long-term health of resident doctors.84,85 Depression among residents may also affect patients, given established associations between physician depression and lower-quality care.6-8 These findings highlight an important issue in graduate medical education.

In interpreting the results of this meta-analysis, it is important to note that the vast majority of participants were assessed through self-report inventories that measured depressive symptoms, rather than gold-standard diagnostic clinical interviews for major depressive disorder. The sensitivity and specificity of these instruments for diagnosing major depressive disorder vary substantially (eTable 4 in the Supplement).86 Instruments such as the 2-item PRIME-MD have low specificity (66%, 95% CI, 48%-84%) and should be viewed as screening tools. In contrast, other commonly used instruments, such as the PHQ-9, have high sensitivity (88%, 95% CI, 74%-96%) and specificity (88%, 95% CI, 85%-90%) for diagnosing major depressive disorder and have been shown to be comparable with clinician-administered assessments. Furthermore, although self-report measures of depressive symptoms have limitations, there is evidence that among medical trainees the absence of anonymity in formal diagnostic assessments may compromise accurate assessment of sensitive personal information such as depressive symptoms.87 To reflect the heterogeneity of the measures included in this meta-analysis, a range of prevalence estimates (ie, 20.9%-43.2%) was reported in addition to a single measure (ie, 28.8%).

This study found an increase in depressive symptoms among residents over time that in part explained the heterogeneity between studies. This increase, while modest, is notable given efforts by the Accreditation Council for Graduate Medical Education,88 European Working Time Directive,89 and others90 to limit trainee duty hours and improve work conditions. The identified trend may reflect the medical community’s increased awareness of depression or developments external to medical education.91 Future studies should explore specific factors that may explain this trend.

A secondary analysis restricted to longitudinal studies found a significant increase in depressive symptoms among trainees after the start of residency. The median absolute increase in depressive symptoms among trainees was 15.8% (range, 0.3%-26.3%) within a year of beginning training. This finding, in combination with evidence that the prevalence of depressive symptoms is similar across specialties and countries, suggests that the underlying causes of depressive symptoms are common to the residency experience. Identifying the factors that negatively affect trainee mental health may help inform the development of effective interventions for the reduction of depression that would be generalizable to different countries and specialties.

Variation in study sample size contributed importantly to the observed heterogeneity in the data. Studies with fewer participants generally yielded more extreme prevalence estimates, suggesting the presence of publication bias. Furthermore, some studies used screening instruments in nonstandard ways (eg, with cutoff scores that have not been validated). These variations were captured in part by Newcastle-Ottawa score, which assessed the risk of bias in each study. Studies with higher risk of bias yielded higher prevalence estimates of depressive symptoms. Study design (ie, cross-sectional vs longitudinal), country, survey years, specialty, postgraduate level, age, and sex also contributed to the heterogeneity between studies.

Limitations should be considered when interpreting the findings of this study. First, a substantial amount of the heterogeneity among the studies remained unexplained by the variables examined. Unexamined factors, such as the institutional cultures of specific residency programs, may contribute to the risk for depressive symptoms among trainees. A better understanding of program culture and working environments may help elucidate some of the root causes of depressive symptoms. Second, the data were derived from studies that used different designs and involved different groups of trainees (eg, from different countries, specialties, and years of training). For example, all but 3 studies used screening tools to measure depressive symptoms, and the 3 that employed structured interviews used convenience samples not representative of the resident population at large. Because the studies were heterogeneous with respect to screening inventories and resident populations, the prevalence of major depressive disorder could not be precisely determined. However, a secondary meta-analysis of studies using validated, high-specificity (>88%) inventories involving 5425 participants yielded a prevalence of 20.2%, which may better reflect the true prevalence of major depression. Third, the analysis relied on aggregated published data. A multicenter prospective study using a single validated measure of depression and structured diagnostic interviews in a random subset of participants would provide a more accurate estimate of the prevalence of depression among physicians in training.

Conclusions

In this systematic review, the summary estimate of the prevalence of depression or depressive symptoms among resident physicians was 28.8%, ranging from 20.9% to 43.2% depending on the instrument used, and increased with time. Further research is needed to identify effective strategies for preventing and treating depression among physicians in training.

Article Information

Corresponding Author: Douglas A. Mata, MD, MPH, Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis St, Boston, MA 02115 (dmata@bwh.harvard.edu).

Author Contributions: Dr Mata had full access to the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Mata.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: Mata, Ramos.

Critical revision of the manuscript for important intellectual content: All authors.

Statistical analysis: Mata, Bansal, Di Angelantonio.

Obtained funding: Guille, Sen.

Administrative, technical, or material support: Guille, Sen.

Study supervision: Guille, Sen.

Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.

Funding/Support: This work was supported in part by a US Department of State Fulbright Scholarship (D.A.M.), National Institutes of Health (NIH) funding (R01MH101459 to S.S.), and NIH Medical Scientist Training Program funding (TG 2T32GM07205 to M.A.R.).

Role of the Funder/Sponsor: The study funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Disclaimer: The opinions, results, and conclusions reported in this article are those of the authors and are independent from the funding sources.