Study subjects

Cases and controls were collected as part of the International Cohort Collection for Bipolar Disorder (ICCBD), a US, UK, and Swedish consortium established to accelerate genomic studies of BD by applying high throughput phenotyping methods7,10. The Massachusetts General Hospital site of the ICCBD aimed to collect DNA from 4500 cases and 4500 controls by linking discarded blood samples to de-identified EHR data. As described in detail elsewhere10, cases and controls were identified by deriving EHR-based phenotyping algorithms applied to the Partners Healthcare Research Patient Data Registry (RPDR), which spans more than 20 years of data from 4.6 million patients. As the first step in building our BD phenotyping algorithms, we created a “datamart” of 52,235 individuals by filtering medical records to identify patients seen at Massachusetts General Hospital, Brigham and Women’s Hospital, or McLean Hospital who had at least one diagnosis of bipolar disorder (ICD- 9 and DSM-IV-TR codes 296.4*–296.8*) or manic disorder (ICD 296.0*–296.1*). Next, four phenotyping algorithms were developed to identify cases and one algorithm to identify controls.

The development and clinical validation of case and control algorithms described here is adapted from Castro et al. 10. The five phenotyping algorithms developed comprised the following:

1. 95-NLP: This BD case algorithm incorporated natural language processing (NLP) of narrative notes using the i2b2 suite of software11. Expert clinicians manually reviewed 612 notes from 209 randomly selected patients to identify gold-standard cases and to extract relevant features from narrative notes to be processed by NLP. We trained a model based on 414 features to predict the probability of BD using a logistic regression classifier with the adaptive least absolute shrinkage and selection operator (LASSO) procedure. The final model, comprising 13 features, achieved an area under the receiver operating curve (AUC) of 0.93, with a sensitivity of 0.53 when the specificity was set to 0.95. 2. Coded-strict: This algorithm was a rule-based classifier that required at least three ICD codes for BD, a predominance of BD diagnoses in the longitudinal record, and either (a) treatment with lithium or valproate within a year of BD diagnosis or (b) treatment at a bipolar specialty clinic. 3. Coded-broad: This algorithm required at least two ICD codes for BD, a predominance of BD diagnoses, and treatment with at least two bipolar medications (lithium, valproate, carbamazepine, or an atypical antipsychotic). 4. Coded-broad-SV: This algorithm was the same as “Coded-broad” except that two or more BD diagnoses were allowed to occur during the same inpatient or outpatient episode of illness. 5. Controls: This algorithm defined controls as those age 30 years or older with no ICD-9 codes or history of medications related to a psychiatric or neurological condition.

As reported earlier, we conducted a direct-interview study to examine the predictive validity of these algorithms. Patients in the Partners Healthcare system who were identified by each algorithm as BD cases or controls were invited by mail to participate. Participants in the validation study were independent of the patient samples on which the algorithms were trained. After informed consent was obtained, participants underwent semistructured diagnostic interviews (SCID-IV) conducted by experienced doctoral-level clinicians blinded to classifier diagnosis. To further preserve clinician blinding, we recruited individuals from MGH clinics who reported a previous diagnosis of schizophrenia or major depressive disorder, disorders commonly considered in the differential diagnosis of BD. The protocol was approved by the Partners HealthCare system institutional review board. A total of 190 participants were interviewed and PPVs for each algorithm were calculated as the proportion of algorithm defined BD cases (or controls) who received a concordant diagnosis by SCID interview. The PPVs for each algorithm using a non-hierarchical approach (where each case was assigned to any algorithm for which they satisfied inclusion criteria) are shown in Table 1 and reported in Castro et al.10.

Table 1 SNP-based heritability (h2g) for EHR-based bipolar disorder from the Partners Healthcare Research Patient Data Registry Full size table

DNA sample collection and genotyping

The phenotyping algorithms were applied to the Partners Healthcare system to ascertain case and control DNA samples by linking phenotypic data to discarded blood samples as previously described10,11,12. In brief, case and control medical record numbers are submitted to the Partners HealthCare Crimson system, which acts as an “honest broker” to match deidentified phenotypic data to discarded blood samples under an approved Institutional Review Board (IRB) protocol. Genotyping was performed in five batches that included case and control samples at the Broad Institute of MIT and Harvard using the Illumina PsychChip, a genomewide array that includes ~250,000 common variants as a backbone for GWAS imputation, ~250,000 rare variants, and ~50,000 additional markers focusing on psychiatrically relevant loci.

Genotype quality control (QC) and imputation

A total of 3772 BD cases and 4141 controls with genomewide data were available for this analysis. The number of genotyped BD cases by algorithm and controls are listed in Table 1 and described in Supplementary Figure 1. We performed QC on each genotyping batch separately with the following steps: (1) we removed single nucleotide polymorphisms (SNPs) with genotype missing rate >0.05 before sample-based QC; (2) we excluded samples with genotype missing rate >0.02, absolute value of heterozygosity >0.2, or failed sex checks; (3) after sample-based QC, we then removed SNPs with missing rate >0.02, with differential missing rate between cases and controls >0.02, or failed Hardy–Weinberg equilibrium test (p-value <1.0 × 10−6 in controls and p-value <1.0 × 10−10 in cases). To merge genotyping batches for imputation and analyses, we performed batch QC by removing SNPs with differential missing rate >0.005 between batches or significant batch association (p-value <5.0 × 10−8 between controls form different batches). All QC were conducted using PLINK v1.913.

The BD cases and controls included individuals from diverse populations. To control for population stratification and ensure the comparability between the current sample and previous European ancestry BD GWAS, we extracted samples with European ancestry for imputation and analyses. We used the International Haplotype Map project (HapMap3) samples as a population reference panel and performed principal component analysis (PCA) with the study samples and HapMap3 samples combined14. We calculated the distance between each study sample and the average European population samples in HapMap3 using PC1 and PC2. We selected the study samples with distance to average European HapMap3 samples <0.01 (Supplementary Figure 2–4)14. We also removed one sample from each pair of related or duplicate samples (\(\hat \pi\)> 0.2).

The final analytic dataset comprised 3330 BD cases (862 95-NLP, 1968 coded-strict, 2581 coded-broad, and 408 coded-broad-SV) and 3952 controls. The sum of the individual cases groups exceeds 3330 due to the non-hierarchical design in which cases were assigned to each phenotype for which they met inclusion criteria. We performed two-step genotype imputation with Eagle2 software for pre-phasing and IMPUTE2 on the European population study samples15,16.

Statistical analysis