The search for the genetic factors underlying complex neuropsychiatric disorders has proceeded apace in the past decade. Despite some advances in identifying genetic variants associated with psychiatric disorders, most variants have small individual contributions to risk. By contrast, disease risk increase appears to be less subtle for disease-predisposing environmental insults. In this study, we sought to identify associations between environmental pollution and risk of neuropsychiatric disorders. We present exploratory analyses of 2 independent, very large datasets: 151 million unique individuals, represented in a United States insurance claims dataset, and 1.4 million unique individuals documented in Danish national treatment registers. Environmental Protection Agency (EPA) county-level environmental quality indices (EQIs) in the US and individual-level exposure to air pollution in Denmark were used to assess the association between pollution exposure and the risk of neuropsychiatric disorders. These results show that air pollution is significantly associated with increased risk of psychiatric disorders. We hypothesize that pollutants affect the human brain via neuroinflammatory pathways that have also been shown to cause depression-like phenotypes in animal studies.

Funding: This work was funded by the NordForsk project 75007: Understanding the Link Between Air Pollution and Distribution of Related Health Impacts and Welfare in the Nordic countries (NordicWelfAir); the DARPA Big Mechanism program under ARO contract W911NF1410333; by National Institutes of Health grants R01HL122712, 1P50MH094267, and U01HL108634-01; and by a gift from Liz and Kent Dauten. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

This Short Report received both positive and negative reviews by experts. The Academic Editor has written an accompanying Primer that is published alongside this article ( https://doi.org/10.1371/journal.pbio.3000370 ). The linked Primer presents a complementary expert perspective; it discusses how the current study should be interpreted, given the modest effect size, potential biases, and other limitations.

The environment for the US part of this study appears as 3 sets of variables at the county level: (i) quality of air, water, land, and “built” environment (e.g., amount of vehicular traffic, transit access, and pedestrian safety); (ii) weather indices split into number of days with at least 4 hours of pleasant weather (defined according to the design standards for climate-controlled buildings) and number of days with at least 4 hours of harsh (either too hot or too cold) weather (this last group of factors is useful in dissecting the outdoor environment’s positive [open-air activities] and negative [pollution] influences); and (iii) sociodemographic factors, such as median income, population density, and urbanicity, which are known risk factors for many psychiatric disorders. Therefore, individuals’ exposures to pollutants were measured at a county level for the US data. For the Denmark counterpart of our analysis, the environmental factors were estimated as exposure to air pollution during the initial 10 years of life. Our hypothesis was that these environmental factors causally contribute toward the onset and development of the psychiatric disorders in exposed individuals.

Our exploratory analysis and conclusions concerning the significant associations between environmental quality and the rates of neuropsychiatric disorders are based on 2 independent, very large datasets. The first dataset is the IBM Health MarketScan Commercial Claims and Encounters Database [ 34 ], comprising insurance claims for 151,104,811 unique US individuals from 2003–2013. MarketScan was previously used for numerous studies involving, for example, estimation of prevalence of diseases in the US: traumatic brain injury [ 35 , 36 ], attention deficit-hyperactivity disorder (ADHD) [ 37 ], epilepsy [ 38 , 39 ], and depression [ 40 ]. The second dataset is the collection of Danish national treatment and pollution registers [ 41 ] comprising all individuals born in Denmark between January 1, 1979, and December 31, 2002, who were alive and residing in Denmark at their 10th birthday (1,436,702 unique individuals).

Far fewer studies have explored the links between physical environments and mental illnesses (see [ 18 – 22 ]), with a small subset of these specifically focused on environmental pollution or its constituent toxicants [ 23 ]. Yet concern has been growing about the diverse negative health effects of air pollution, raising the possibility that air quality may play an important role in mental health and cognitive function. The study of air pollution and health was originally driven by dramatic events and drastic outcomes, such as mortality during the 1930 Meuse Valley fog [ 24 ], which was due to the combination of industrial air pollution and climatic conditions, and the 1952 Great London Fog event [ 25 – 27 ], in which a multiple-day temperature inversion concentrated coal-based air pollutants and resulted in thousands of deaths. Attention has since been turning to the question of chronic exposures and chronic diseases, including neurodevelopmental and neurodegenerative conditions [ 28 , 29 ]. More recent events, such as the Eastern China smog in 2013 [ 30 ] and the New Delhi smog in 2017 [ 31 ], saw air pollution measurements reach record levels, conditions that led to significant increases in morbidity and mortality rates. Such events have led to considerable debate, along with an upsurge of environmental research, new government regulation (e.g., the Clean Air Act of 1956 in the UK and the Chinese Air Pollution Control Law in 2015), and heightened public awareness of the relationship between air quality and health. Increasing interest in the effect of pollution on neuropsychiatric disorders has only recently begun to direct attention toward the brain, with in vitro and animal model studies lending mechanistic insight into how air pollution components can be neurotoxic [ 32 , 33 ].

What aspects of human environments are driving psychiatric and neurological disease prevalence? Recent umbrella reviews of epidemiological studies analyzing putative risk factors associated with common psychiatric and neurological disorders suggest several contributing factors to mental health and well-being, such as individual attributes and behavior (medical illness, stressful life events, substance abuse, cognitive and/or emotional immaturity), social circumstances (poor access to basic services, unemployment, poverty, neglect, social injustice, relationship conflicts, work stress, exposure to violence, and abuse), and environmental factors (occupational exposure and exposure to pollution) [ 9 – 11 ]. These reviews stressed that well-designed and adequately powered studies are necessary to map the environmental risk factors for psychiatric disorders. Studies of gene-environment interactions in the context of psychiatric disorders likewise point to a wide range of factors interacting with genotype in mental disorder prevalence [ 2 , 12 – 15 ]. Historically, most of the attention to the environment as a causal factor in these studies has focused on home or family environments, with an empirically justified emphasis on childhood adversity and trauma [ 16 ] and, more recently, on prenatal influences [ 17 ].

The increasing prevalence of mental disorders is a major global problem that affects millions of people every year. In addition to personal suffering, psychiatric disorders are associated with significant societal costs [ 1 ]. A number of putative contributors to the etiology of these illnesses have been identified, but the majority of risk factors remain unknown. Mental illnesses such as bipolar disorder and schizophrenia develop due to a complex interplay of genetic predispositions and life experiences or exposures [ 2 – 5 ]. In the last decade, the genetic underpinnings of mental disorders have been extensively studied. For instance, recent work has identified 145 genome-wide significant associations for schizophrenia [ 6 , 7 ]. However, genetics alone cannot account for full phenotypic variation in mental health and disease, and it has long been believed that genetic, neurochemical, and environmental factors interact at many different levels to play a role in the onset, severity, and progression of these illnesses. The major neuropsychiatric disorders cover a broad range of heritability values, leaving ample room for environmental influences to play a role. From a comprehensive twin meta-analysis [ 8 ], environmental effects contribute to a 55% to 66% risk for major depression, 32% risk for bipolar disorder, and 23% risk for schizophrenia. Increased knowledge of environmental risk factors is therefore vital for a more comprehensive understanding of disease causation.

Results

For the US cohort, we studied 4 psychiatric and 2 neurological conditions: bipolar disorder, major depression, personality disorder, schizophrenia, epilepsy, and Parkinson disease, each defined by sets of specific International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes (see Methods section). When we refer to these 6 conditions below, we are explicitly referring to data captured by IBM MarketScan database, which is the treated prevalence inferred from US insurance claims (see Table 1); because the data were potentially influenced by reporting biases, we refer to the IBM MarketScan disease rates as raw rates, to be further adjusted for confounders.

Table 1. Demographics of unique persons in IBM MarketScan database with at least one health insurance claim with diagnosis of bipolar disorder, schizophrenia, Parkinson disease, personality disorder, epilepsy, or major depression during 2003 to 2013. https://doi.org/10.1371/journal.pbio.3000353.t001

Spatial patterns of putative environmental risk factors in the US

The spatial distribution of environmental risk factors varies significantly across the US (see Fig 1A–1D). Air quality (Fig 1A) is predictably worse near larger cities on both the US East and West Coasts while generally much better in the middle of the country. Water quality (Fig 1B) shows very little variation across the US and is generally worse in the western US, as well as in some interior states (e.g., Wyoming and Illinois). The resolution of the water quality data is not very high, as county water quality descriptors closely follow state boundaries. Land quality (Fig 1C) appears to be worse in the north of the continental US as well as in the west. Importantly, land quality is not highly correlated with air quality across geographical space, which makes it easier to disentangle the associations of the two factors. Built quality (Fig 1D) is patchy rather than continuous across counties. Regarding fair- and poor-weather days (Fig 1E and Fig 1F), central US counties far from coasts tend to have many poor-weather days, whereas coastal areas tend to be enriched with fair-weather days. Continental counties are associated with higher numbers of both poor- and fair-weather days. The sociodemographic factors, including population density, urbanicity, insurance status, and poverty, showed variable patterns across the US (S1 Fig).

Fig 1. Spatial patterns of putative environmental risk factors in the US. The county-level environmental quality is assessed by the EQIs designed by the US EPA. (A) A map showing the EPA air quality index across US counties divided into septiles such that Q1 represents the best and Q7 represents the worst air quality regions. The EPA designed the index based on the measurements of 87 pollutants. (B) A map showing the EPA water quality index across counties constructed by the analysis of 80 water quality indicator variables. (C) EPA land quality index map constructed by the analysis of 26 land quality indicator variables. (D) EPA built quality index designed by the analysis of 14 built quality indicators (e.g., amount of vehicular traffic, transit access, and pedestrian safety). (E) A county-level map showing the average number of good weather days that indicated whether at least 4 hours in a diurnal cycle were in a “comfort zone,” defined as a 4-point patch with vertices in temperature and humidity space (temperature [18 °C, 27 °C, 27 °C, 18 °C] and relative humidity [6.71%, 8.85%, 13.85%, 10%]). (F) A county-level map showing average number of bad weather days that indicated whether at least 4 hours in a diurnal cycle were in an “extremely uncomfortable zone,” defined as <−5 °C or >35 °C. For both the “good weather days” and “bad weather days,” the number per year was averaged over the years during the period 2003–2012. The underlying data for producing these maps can be found in S1 Data. EPA, Environmental Protection Agency; EQI, Environmental Quality Index; NA, not available. https://doi.org/10.1371/journal.pbio.3000353.g001

Raw prevalence, sex ratio, and spatial disease patterns in the US

In the health insurance claims of over 151 million individuals represented in the IBM MarketScan database (during 2003–2013), the raw prevalence of 4 psychiatric and 2 neurological disorders in the US differs geographically to a remarkable extent (Fig 2A–2G, S2 Fig and S3 Fig). The raw (unadjusted) prevalence rates for bipolar and personality disorders were 0.82% and 0.15%, respectively, with both disorders 1.6 times more prevalent among female patients. The prevalence of major depression was 6.64% and was 2.1 times more common among women. Prevalence of schizophrenia and epilepsy was 0.55% and 0.62%, respectively, with both disorders at 1.2 times higher prevalence among female patients. In contrast, Parkinson disease was 1.3 times more common in males, with an overall prevalence of 0.16% (see Table 1). Note that after correcting for potential confounders (regression analysis), we found that the adjusted rates of bipolar disorder and personality disorder were 1.5 times higher among women. The rate of major depression was twice as high, and the rate of epilepsy 1.12 times higher, among female patients. There was no significant difference in the adjusted rate of schizophrenia in male and female populations. These MarketScan prevalence estimates are in excellent agreement with those published previously (S1 Table).

Fig 2. Cartogram maps showing the spatial patterns of apparent neurological and psychiatric disorder prevalence inferred from IBM MarketScan database. (A) Cartogram map of the total patient population (151 million) present in the MarketScan database during 2003–2013. County and state land areas are rescaled in proportion to their patient population, producing distorted maps. The squeezed regions contribute smaller shares of patient population compared to their corresponding land area and vice versa. The total map area remains the same. The subsequent cartogram maps show the prevalence of 4 psychiatric disorders: (B) bipolar disorder, (C) schizophrenia, (D) personality disorder, and (E) major depression, and 2 neurological disorders: (F) epilepsy and (G) Parkinson disease. The underlying data for producing these cartogram maps can be found in S1 Data. https://doi.org/10.1371/journal.pbio.3000353.g002

Areas of the country distant from large bodies of water (in the continental US) are the most enriched for neuropsychiatric disorders across the board. This is particularly evident for major depression and bipolar disorder, and in Kentucky and Missouri, when comparing Fig 2A to the rest of the subfigures. At the state level, Alaska shows more psychiatric disorder diagnoses than expected for the overall population size, particularly for personality disorders and schizophrenia. Hawaii shows higher-than-expected rates of Parkinson disease and schizophrenia, whereas Michigan has an apparent increased prevalence of Parkinson disease, major depression, bipolar disorder, and schizophrenia. Our mixed-effect regression analyses suggested that Michigan’s apparent higher rate across all disorders is associated with reporting biases, visible in our analysis as high, state-specific random effects. The US East Coast experiences a higher prevalence of these phenotypes than the West Coast (S2 Fig and S3 Fig). Geospatial clusters with a high prevalence of major depression are observed among almost all counties of Michigan, New Hampshire, and Maine (Fig 2E and S3 Fig).

Association between environmental factors and the risk of neurological and psychiatric disorders in the US

We considered several environmental factors for the prediction of neurological and psychiatric disease diagnosis among different age and sex groups at the US county level. These factors included the quality of air, water, land, and built environment, as well as weather conditions. In addition, population density, median income, ethnic and racial composition, and the percentages of poor and insured populations were also included in the model. All environmental predictors were transformed into septiles, with Q1 representing the best-quality and Q7 representing the worst-quality regions (US counties). Similarly, for weather variables and sociodemographic covariates, Q1 and Q7 represent the regions with the lowest and highest percentages, respectively. We report the comparison of disease rates between the referent group Q1 and each higher septile (Q2–Q7). Reviewing the results of our mixed-effect Poisson regression model, we noticed significant variability in the prevalence of neuropsychiatric disorders across different racial/ethnic groups (Fig 3A–3F). The strongest predictor of mood disorders (major depression and bipolar disorder) in a county was its percentage of white individuals (using US Census race/ethnicity categories). By contrast, a higher percentage of black non-Hispanic individuals was associated with higher rates of schizophrenia and epilepsy.
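
The septile transformation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `assign_septiles` is hypothetical, ties are broken by input order, and we assume that a higher index value means worse quality, so that Q1 is the best-quality group.

```python
def assign_septiles(values):
    """Label each value with a septile 1..7 by rank (1 = lowest values).

    Assumption: ties are broken by input order, and groups are made as
    equal in size as the number of units allows.
    """
    n = len(values)
    # Indices of the units, ordered from the lowest to the highest value.
    order = sorted(range(n), key=lambda i: values[i])
    septiles = [0] * n
    for rank, idx in enumerate(order):
        # rank runs 0..n-1, so rank * 7 // n runs 0..6.
        septiles[idx] = rank * 7 // n + 1
    return septiles


# Hypothetical index values for 14 counties (illustrative only).
index_values = [0.3, -1.2, 0.8, 1.5, -0.4, 0.0, 2.1,
                -0.9, 0.5, 1.1, -1.6, 0.2, 1.8, -0.1]
county_septile = assign_septiles(index_values)
```

With roughly 3,000 counties, each septile produced this way holds on the order of 400 counties, matching the group size quoted in the Fig 3 legend.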

Fig 3. Relationship between environmental factors and neurological and psychiatric disorders in the US. The results from the US data analysis in which all predictor variables are divided into septiles (7 groups) with each septile representing approximately 400 counties. Septile 1 (counties with the least exposure or the least percentage) is used as a referent to compare the disorder rates in the higher septiles (counties with systematically higher exposures or percentages). For air, water, land, and built qualities, a higher septile corresponds to the group of counties with poor quality. Similarly, for all other variables, a higher septile represents a higher fraction or the corresponding percentages. The estimated disorder rate from the mixed-effects regression model is shown for (A) bipolar disorder, (B) schizophrenia, (C) personality disorder, (D) major depression, (E) epilepsy, and (F) Parkinson disease. (G) Map showing the aggregated state-level random effects. The random effects for the 6 disorders are aggregated to produce 1 representative map. States shaded red show higher disorder diagnoses, and those shaded blue show lower disorder diagnoses, that are not captured by our model. An apparent high neurological and psychiatric disorder rate in the states of Michigan, Missouri, Georgia, and New Mexico, and the apparent low rate in the states of South Dakota, Iowa, Wyoming, and North Carolina could be associated with reporting biases. (H) Map showing aggregated, county-level random effects. Random effects for the 6 disorders are aggregated to produce 1 representative map. Counties in red show higher disorder rates, and those in blue show lower disorder rates, not captured by our model. County-level random effects can be thought of as residual variations not explained by fixed-effect predictors and state-level random effects. There are relatively few counties in which the county-level random effect is consistently low or high. For example, several counties are consistently low (San Diego, Imperial, Orange, and San Bernardino Counties in Southern California), and several counties are consistently high (San Luis Obispo County in California and Snohomish and King Counties in Washington). The underlying data for this figure can be found in S2 Table. https://doi.org/10.1371/journal.pbio.3000353.g003

The strongest predictor for bipolar disorder diagnosis, after a population’s ethnicity composition, was air quality (defined by the US Environmental Protection Agency [EPA] Environmental Quality Index [EQI]). The worst air quality was associated with an approximately 27% increase (95% credible interval [CrI] 15%–40%, p_MCMC < 10^−4) in the apparent rate of bipolar disorder (Fig 3A and S2 Table). The estimated rate of bipolar disorder was 16.4% higher (95% CrI 5.8%–29.6%, p_MCMC = 0.0044) in the most densely populated counties (Fig 3A). For major depression, a slight increase of 6% in the diagnosis rate (95% CrI 0%–12.4%, p_MCMC = 0.05) was observed only among the worst air quality regions (Q7). We also observed a positive association with a small effect size between population density, urbanicity, and the rate of major depression diagnosis (see Fig 3D and S2 Table). Personality disorder was best predicted by land pollution (Alaska and Hawaii were not included in this analysis because we did not have matching high-resolution weather data). The regions with worst land quality (Q7) were associated with an estimated 19.2% increase (95% CrI 8.8%–29.9%, p_MCMC < 10^−4) in the apparent rate of personality disorder (Fig 3C and S2 Table). The apparent protective effect of pleasant weather days was high across all our target disorders and was highest for bipolar disorder in our analysis. The counties with the highest number of pleasant weather days (Q7) were associated with an estimated 21.8% decrease (95% CrI 16.8%–26.8%, p_MCMC < 10^−4) in the rate of bipolar disorder (Fig 3A–3F).
At first glance, it seems counterintuitive that across all studied psychiatric and neurological disorders, both mean numbers of pleasant and harsh days would appear to be associated with a protective effect in neuropsychiatric disorders (Fig 3A–3F). However, this is not a contradiction or error because, in the continental climate, the number of days with at least 4 pleasant hours is strongly correlated with the number of days with at least 4 harsh hours. In these places, the same day can contribute to both the pleasant and the harsh list (e.g., pleasant in the early morning or late evening and harsh at midday). Therefore, it is likely that one effect, possibly the protective days with harsh weather (keeping individuals indoors, away from environmental exposure to contaminated air and land), is causal, and another effect—the number of pleasant days—is driven by a secondary correlation. Random effects at the state and county levels showed dissimilar distribution across all 6 disorders studied here. For example, random effects for Michigan, Missouri, New Mexico, and Georgia were consistently high, whereas those for South Dakota, Iowa, Wyoming, and North Carolina were consistently low (see Fig 3G and S4 Fig). There were relatively few counties in which the county-level random effect was consistently low or high. For example, several counties in Southern California were low: San Diego, Imperial, Orange, and San Bernardino. Likewise, several counties were consistently high: San Luis Obispo in California and Snohomish and King in Washington (see Fig 3H and S4 Fig).
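
To see how a single day can count toward both weather tallies, consider the following sketch. It is a simplification under stated assumptions: the function name `classify_day` is hypothetical, the pleasant band uses only the 18 °C to 27 °C temperature bounds from the Fig 1 legend (the humidity bounds of the comfort zone are ignored), and harsh hours are those below −5 °C or above 35 °C.

```python
def classify_day(hourly_temps_c, min_hours=4):
    """Return (is_pleasant_day, is_harsh_day) for one day of hourly readings.

    Assumption: temperature-only comfort band; the humidity part of the
    comfort zone described in the Fig 1 legend is omitted here.
    """
    pleasant_hours = sum(1 for t in hourly_temps_c if 18.0 <= t <= 27.0)
    harsh_hours = sum(1 for t in hourly_temps_c if t < -5.0 or t > 35.0)
    return pleasant_hours >= min_hours, harsh_hours >= min_hours


# Hypothetical hot continental summer day: mild early morning, harsh midday.
hot_day = [24, 23, 22, 22, 23, 25,        # 00:00-05:00, within the comfort band
           28, 30, 32, 34,                # 06:00-09:00, warming up
           36, 37, 38, 38, 37, 36, 36,    # 10:00-16:00, above 35 °C
           34, 32, 30, 29, 28, 28, 28]    # 17:00-23:00, cooling but above 27 °C
is_pleasant, is_harsh = classify_day(hot_day)  # both True for this day
```

Because such days add to both counts, the two weather covariates rise together in continental counties, which is the correlation the paragraph above relies on.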

Sensitivity analysis for the association between air quality and bipolar disorder

To correct for multiple testing, we applied false discovery rate (FDR) correction to the p-values obtained from the regression analysis. The association between air quality and bipolar disorder remained statistically significant after FDR correction, whereas the previously observed weak association of major depression with only the worst air quality regions (Q7) did not survive the multiple-testing correction (S2 Table). We performed further sensitivity analysis to test the significant association observed between air quality and the rate of bipolar disorder in the US. A validation study of bipolar disorder diagnoses in hospital discharge registers suggests that a measure requiring 2 separate discharge diagnoses is sufficiently sensitive and specific for use in our epidemiological study [42]. We further validated our model by considering a subset of the population with at least 2 insurance claims diagnosed as bipolar disorder during the study period of 2003–2013. A total of 906,175 individuals (345,318 males and 560,857 females) met this criterion. Validating the model with this new criterion showed similar trends as reported above (S5A Fig and S6A Fig). Notably, air quality turned out to be the strongest environmental predictor of bipolar disorder. The regions with worst air quality (Q7) showed a 29% increase (95% CrI 16.4%–43.4%, p_MCMC < 10^−4) in the apparent rate of bipolar disorder (see S5A Fig and S6A Fig). Lithium is often considered a gold standard for treating bipolar disorder [43, 44]. We ran an additional model by redefining the bipolar disorder cohort to include individuals with a history of at least 1 dispensed prescription of lithium (37,964 individuals) in addition to those who had at least 1 insurance claim of bipolar disorder. The results and the trends from these models were comparable to the results reported earlier (S5B and S5C Fig). Random effects at the state and county levels showed dissimilar distribution across all neuropsychiatric disorders (see S6B and S6C Fig).
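
The text does not state which FDR procedure was applied. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg step-up adjustment; the function name `bh_adjust` and the example p-values are hypothetical and are not taken from S2 Table.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: this is one common FDR procedure; the study may have used
    a different variant.
    """
    m = len(p_values)
    # Indices ordered from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical family of four tests (illustrative values only).
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = bh_adjust(raw_p)
significant = [p <= 0.05 for p in adj_p]
```

A borderline raw p-value (for example, one close to 0.05) is typically pushed above the threshold once it is adjusted within a large family of tests, which is the behavior described for the major depression association.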

Model validation and adjustment for spatial autocorrelation

The 6 neuropsychiatric disorders considered in this study showed variable degrees of spatial autocorrelation at the county level. These spatial dependencies could artificially reduce variance in observations and inflate the effect size of the covariates, leading to biased parameter estimates. To probe the importance of the spatial dependency of outcomes, we tested both nonspatial and spatially explicit (conditional autoregressive [CAR]) models. Bayesian analysis of very large datasets with hierarchical mixed-effects models and spatial correction was computationally very expensive. Therefore, for this comparative analysis, we did not stratify data by age and gender groups, and the models do not represent age- and sex-corrected estimates. Parameter estimates, analyses of residual spatial autocorrelation, and Bayesian posterior predictive checks were used to compare model performances. For the nonspatial model, we used a mixed-effect Poisson regression with the same exposure and covariates as used previously (except for age and sex) and measured random effects at the state and county levels. For the spatial model, we used county adjacency information (from the US Census Bureau) to construct a binary, first-order adjacency weight matrix and corrected for spatial autocorrelation using a CAR model. We tested for spatial autocorrelation among the residuals using Moran’s I test and found none. Comparing the 2 versions of the analysis, we observed slight variations in some of the model estimates after accounting for spatial autocorrelation (S7 Fig and S3 Table). For bipolar disorder, the comparison of best (Q1) and worst (Q7) air quality regions suggests that risk increases by 29.7% (95% CrI 17.3%–43.3%) under the nonspatial setting and by 23.4% (95% CrI 12.7%–36.3%) under spatial correction (S7 Fig and S3 Table). 
It should be noted that correction for spatial dependencies slightly reduced the estimated effect of air quality on the rate of bipolar disorder, but the association remains strong and statistically significant. On the other hand, a marginally higher rate of major depression (only among the worst air quality regions [Q7]) remained consistent across the models. After correcting for spatial autocorrelation, the estimated rate of personality disorder in the worst land quality regions (Q7) increased from 19.7% (95% CrI 9.4%–29.7%) to 25.9% (95% CrI 13.9%–37.7%) compared to the best land quality regions (Q1) (S7 Fig and S3 Table). In general, for all disorders, the correction of spatial dependencies slightly reduced the estimates for ethnicity, population density, and weather variables (S7 Fig and S3 Table). With leave-one-out cross-validation, the comparison of nonspatial and spatially explicit models suggests that the predictive performance decreases marginally in all 6 models after adjusting for spatial autocorrelation. We tested for spatial autocorrelation among the residuals by computing Moran’s I statistics and found no signs of spatial correlation in any of the models, suggesting that first-order binary adjacency weights were sufficiently able to eliminate spatial dependencies. To further evaluate the robustness of the models, we split the data into 2 subsets (subset 1 and subset 2). For each state, we randomly assigned equal numbers of counties to both subsets. The 2 subsets included representative samples from 49 states (excluding Alaska and Hawaii), with subset 1 consisting of 1,532 and subset 2 consisting of 1,557 counties. For each neuropsychiatric disorder, we produced separate models from subset 1 and subset 2 and tested them against each other. In general, with few exceptions, the model estimates from subset 1 and subset 2 were mostly consistent and comparable (S8 Fig). 
The association between air quality and bipolar disorder remained significant in both the models. Importantly, model 1 suggested a 33.6% increase (95% CrI 16.1%–53.5%) and model 2 suggested a 29.6% increase (95% CrI 11.6%–50.7%) in the rate of bipolar disorder when comparing the worst air quality regions (Q7) with the best air quality regions (Q1). When tested against one another, the 2 independent models showed robust prediction capability, with Bayes R-Square for the bipolar disorder models as follows: subset 1 (0.989) when tested on subset 2 (0.95), and subset 2 (0.987) when tested on subset 1 (0.948). Models for other phenotypes similarly showed strong prediction strength when tested with independent datasets (S9 Fig). These independent model validations indicate robustness of the associations reported earlier in this study.
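
For readers unfamiliar with Moran's I, the statistic computed on model residuals with binary first-order adjacency weights can be sketched as below. This is a minimal illustration: the function name `morans_i` and the toy four-unit adjacency are hypothetical, and the sketch returns only the statistic, without the significance test that the analysis reports.

```python
def morans_i(values, neighbors):
    """Moran's I with binary adjacency weights.

    values:    one residual per spatial unit.
    neighbors: dict mapping a unit index to the list of adjacent unit
               indices (weight 1 for adjacent pairs, 0 otherwise).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all weights: each directed adjacency contributes 1.
    w_sum = sum(len(adj) for adj in neighbors.values())
    # Cross-products of deviations over adjacent pairs.
    cross = sum(dev[i] * dev[j] for i, adj in neighbors.items() for j in adj)
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (cross / denom)


# Toy example: four units arranged in a line, 0 - 1 - 2 - 3.
line_adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
smooth = morans_i([1.0, 2.0, 3.0, 4.0], line_adjacency)         # positive
alternating = morans_i([1.0, -1.0, 1.0, -1.0], line_adjacency)  # negative
```

Values near zero on the residuals indicate that little spatial structure is left unexplained, which is the pattern reported for the models above.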

Childhood individual-level exposure to air pollution and the risk of psychiatric disorders in Denmark

We used Danish national registers comprising all individuals born in Denmark between January 1, 1979, and December 31, 2002, who were alive and residing in Denmark at their 10th birthday (1,436,702 unique individuals) to study 4 psychiatric disorders: bipolar disorder, schizophrenia, personality disorder, and depression. We estimated air pollution exposure for all individuals from birth until age 10 and studied the association between childhood exposure to air pollution and the 4 psychiatric disorders. We performed principal components analysis (PCA) on 14 air quality indicators to obtain a summarized measure of exposure to air pollution (see S10 Fig, S11 Fig, and Methods for details). We transformed air pollution exposure into septiles, with Q1 representing the least exposure and Q7 representing the highest exposure to the air pollutants. It is important to highlight here that, though the general concept and pipeline are similar, the exposure composition and the statistical model used for the Denmark data analysis are technically different from those used for the US analysis (see Methods for details). The high resolution of the Danish national registers made it possible to estimate the exposure to air pollution at the individual level, in contrast with the US data analysis reported earlier, in which the exposure is measured at the county level. These differences were primarily dictated by the availability and resolution of the data. Caution should be exercised when directly comparing the results of the two countries’ analyses. Results from the Cox regression models suggest that, for all 4 psychiatric disorders, the rate of disorders increases with increasing levels of exposure to air pollution. 
The estimated rate of schizophrenia was 148% higher (95% confidence interval [CI] 119%–180%, p < 2 × 10⁻¹⁶) among individuals in the group with the highest exposure to air pollution (Q7) compared with those with the least exposure (Q1, the referent group; shown in Fig 4 and S4 Table). The estimated rate of bipolar disorder was 29.4% higher (95% CI 9.4%–52.9%, p < 3 × 10⁻³) and 24.3% higher (95% CI 4.5%–47.9%, p < 0.014) in the exposure categories Q6 and Q7, respectively, compared with Q1. The strongest association was between air pollution and personality disorder, showing a 162% increase (95% CI 142%–183%, p < 2 × 10⁻¹⁶) in the disorder rate among category Q7 compared with category Q1. The estimated rate of major depression increased by 50.5% (95% CI 42.8%–58.7%, p < 2 × 10⁻¹⁶) among the group with the highest exposure to air pollution (Fig 4). (Note: The complete within-group comparison of estimated rates can be found in Fig 4 and S4 Table.) The association between air quality and the risk of all 4 psychiatric disorders remained statistically significant even after correcting for multiple comparisons (see S4 Table).
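For readers who want to move between the percentages quoted above and the ratio scale on which Cox models are estimated, the following sketch shows the arithmetic. A rate that is 148% higher corresponds to a hazard ratio of 2.48, and a Wald interval is symmetric on the log scale. The standard error used here is back-calculated from the reported interval under that Wald assumption purely for illustration; it is not a value reported in the study, and the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def hazard_ratio(percent_higher):
    """Turn 'X% higher rate' into a ratio, e.g. 148 gives about 2.48."""
    return 1.0 + percent_higher / 100.0


def percent_change(log_hr, se, z=Z_95):
    """Convert a log hazard ratio and its standard error into a percent
    change in rate with a Wald confidence interval (estimate, lower, upper)."""
    def to_pct(b):
        return (math.exp(b) - 1.0) * 100.0
    return to_pct(log_hr), to_pct(log_hr - z * se), to_pct(log_hr + z * se)


# Schizophrenia, Q7 versus Q1: 148% higher, 95% CI 119%-180%.
hr = hazard_ratio(148.0)
# Illustrative standard error implied by the reported interval (assumption).
se = (math.log(hazard_ratio(180.0)) - math.log(hazard_ratio(119.0))) / (2 * Z_95)
pct, lower, upper = percent_change(math.log(hr), se)
```

The recovered interval agrees with the reported one only up to rounding, because the published percentages are themselves rounded.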

Fig 4. Association between air quality and the risk of psychiatric disorders in Denmark. The results from the Danish data analysis in which the individual-level estimates of air quality exposure are divided into septiles, with each septile representing approximately 200,000 individuals. Septile 1 (representing the least exposure) is used as a referent to compare disorder rates in the higher septiles for bipolar disorder, schizophrenia, personality disorder, and major depression. Higher septiles represent individuals with systematically higher exposure to low-quality air. Five different models (labelled M0–M4) were run for each phenotype, briefly as follows: M0: crude model with 7 air-quality–exposure groups; M1: M0 + calendar time using splines; M2: M1 + sex; M3: M2 but restricted to the subset of the population with no missing covariates; and M4: M3 + socioeconomic status + urbanization. Further, to cross-validate the models, the whole dataset was split into 2 equal subsets (subset A and subset B), separate models were run on each subset, and the parameter estimates were compared. The figure shows estimates from subset A, subset B, and from the model using all the data. The underlying data for this figure can be found in S4 Table. https://doi.org/10.1371/journal.pbio.3000353.g004

To test the robustness of these model estimates, a cross-validation analysis was performed on the Danish dataset. The whole cohort was randomly partitioned into 2 equal-size subsets that were analyzed separately, and results of the analyses were compared (Fig 4). The two subsets provided nearly identical results.
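The split-half check described above can be sketched in a few lines. To keep the example self-contained, a crude rate ratio (events per person-year in Q7 divided by that in Q1) stands in for the Cox models used in the study, so this illustrates only the partition-and-compare logic. The simulated cohort, its risk gradient, and the function names are hypothetical.

```python
import random


def split_half(ids, seed=0):
    """Randomly partition ids into two halves of equal size (up to 1)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def crude_rate_ratio(records, ids, high=7, low=1):
    """Events per person-year in the highest septile over the lowest."""
    def rate(q):
        rows = [records[i] for i in ids if records[i]["septile"] == q]
        return sum(r["event"] for r in rows) / sum(r["years"] for r in rows)
    return rate(high) / rate(low)


# Hypothetical cohort: risk rises with exposure septile.
rng = random.Random(1)
records = {}
for i in range(70000):
    q = rng.randint(1, 7)
    p_event = 0.02 + 0.005 * (q - 1)
    records[i] = {"septile": q, "years": 10.0, "event": int(rng.random() < p_event)}

subset_a, subset_b = split_half(records.keys(), seed=2)
rr_a = crude_rate_ratio(records, subset_a)  # estimate from subset A
rr_b = crude_rate_ratio(records, subset_b)  # estimate from subset B
```

If the association is not driven by a handful of individuals, the two estimates should be close to each other and to the estimate from the full cohort.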

Harmonization of the US and Denmark data analysis

In the Denmark analysis, it did not make sense to aggregate data geographically by administrative region when individual-level data at a resolution of 1 square kilometer were available. We did, however, rerun the analysis of the Denmark cohort using a Poisson model instead of a Cox model. The results were very similar to those of the initial Cox regression analysis, as shown in the Supporting Information (S12 Fig). To harmonize the analysis of data from 2 different countries, we adjusted the models built on the Denmark data for potential socioeconomic confounders such as urbanicity, parental educational levels, income, and employment status (all measured at an individual level on their 10th birthday). The information on these covariates was not readily available for the entire study population, so a subset of the dataset was used for the subsequent analysis. The results from the adjusted models were consistent with and comparable to the results reported in the earlier models (see Fig 4 and S4 Table). Notably, after adjusting for socioeconomic confounders, the previously estimated rate of bipolar disorder diminished slightly and that of personality disorder increased, but the overall trend of association remained comparable. The air quality index used in the US analysis (designed by the EPA) is a summary measure, obtained from the PCA of mean exposure to 87 air quality indicators, whereas for Denmark, the exposure is a summary indicator of 14 air quality indicators modeled from birth until a patient’s 10th birthday. In an attempt to harmonize the 2 analyses, we performed a sensitivity analysis by using the same air quality indicator variables across the 2 studies.
First, we recomputed the US county-level air quality index with a subset of 6 air components (carbon monoxide [CO], nitrogen dioxide [NO₂], ozone [O₃], particulate matter smaller than 10 μm [PM₁₀], particulate matter smaller than 2.5 μm [PM₂.₅], and sulfur dioxide [SO₂]) that were available for both the US and Denmark. With a mixed-effect Poisson regression model, we again observed a significant association between air quality and the risk of bipolar disorder in the US. The counties with the worst air quality (Q7) showed an estimated 11.6% increase in the rate of bipolar disorder (S13 Fig). Second, we reanalyzed the Denmark data with the exposure estimated from the same 6 air components. The estimates from these models were again very similar and comparable. Specifically, the rate increase in the highest exposure group (Q7) compared to the least-exposure group (Q1) was as follows: bipolar disorder 31.4% (95% CI 7.4%–60.8%, p = 0.007), schizophrenia 104.3% (95% CI 76.3%–136.8%, p < 2 × 10⁻¹⁶), personality disorder 209.6% (95% CI 183.5%–238%, p < 2 × 10⁻¹⁶), and major depression 68.3% (95% CI 57.9%–79.4%, p < 2 × 10⁻¹⁶) (S4 Table). (Note that we present complete results of this analysis, for all 7 groups of environmental quality, in S4 Table.)
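To make the Poisson rate model concrete, the sketch below fits a log-linear model of case counts with a person-time offset and septile indicator variables, then reads off the Q7 versus Q1 rate increase. It is a deliberately simplified, fixed-effects-only version: the mixed-effect model used for the US analysis includes random effects and covariates that are not reproduced here. The county counts, person-years, and function name are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np


def fit_poisson(X, y, exposure, n_iter=25):
    """Fit log E[y] = log(exposure) + X @ beta by Newton-Raphson.

    Column 0 of X is assumed to be the intercept.
    """
    offset = np.log(exposure)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.sum() / exposure.sum())  # start at the overall rate
    for _ in range(n_iter):
        mu = np.exp(offset + X @ beta)          # expected counts
        grad = X.T @ (y - mu)                   # score vector
        hess = X.T @ (X * mu[:, None])          # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Hypothetical data: 2 counties per air quality septile.
septile = np.repeat(np.arange(1, 8), 2)
person_years = np.tile([50000.0, 80000.0], 7)
cases = np.array([100, 160, 102, 165, 105, 168, 108, 172,
                  110, 178, 115, 185, 120, 192], dtype=float)

# Design matrix: intercept plus indicators for Q2..Q7 (Q1 is the referent).
X = np.column_stack([np.ones(len(septile))] +
                    [(septile == q).astype(float) for q in range(2, 8)])
beta = fit_poisson(X, cases, person_years)
pct_q7 = (np.exp(beta[6]) - 1.0) * 100.0  # percent rate increase, Q7 vs Q1
```

With indicator variables only, the fitted Q7 coefficient is the log of the ratio of pooled rates, so the toy data above (312 versus 260 cases over equal person-time) give a 20% increase.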