Data sets

The genetic data for this study came from the summary statistics of the largest and most recently published GWAS for AN [19] and OCD [22]. There was no sample overlap between these studies according to single nucleotide polymorphism (SNP) based relatedness testing. Estimates of pair wise identity-by-descent were calculated with LD-pruned sets of SNPs using PLINK, and no pair of samples with pi-hat > 0.1 were identified. For AN, we used data from a GWAS of 3495 cases and 10,982 controls. For OCD, we used data derived from a similar recently published GWAS of 2688 OCD cases and 7031 controls. All participants were of European ancestry. Using these data sets, we conducted a meta-analysis of 6183 AN–OCD cases and 18,013 controls. Although records regarding comorbidities are incomplete and exist for ~50% of the OCD and 35% of the AN sample, we have identified 101 definite and 24 probable cases of AN among the OCD sample and 447 AN cases with self-reported OCD diagnoses. More information about these data sets can be found in the Supplementary Information.

AN–OCD cross-disorder GWAS meta-analysis

As a part of the pre-processing step prior to meta-analysis, we identified all variants for which post-imputation dosage information was available to a sufficient degree and at an adequate level of quality at the call level in both data sets. To do this, we identified SNPs that met the following criteria for each data set: (1) minor allele frequency (MAF) > 0.01; (2) imputation quality (INFO) score > 0.6; (3) ≥ 60% of cases and controls from the overall meta-analysis per trait available for each variant; and (4) not an A/T or G/C variant with frequencies between 0.4 and 0.6. A total of 90.0% (9,586,433 of 10,641,224) of the variants in the AN data set and 91.6% (7,707,306 of 8,410,839) of the variants in the OCD data set met these criteria. Following additional pre-processing of the summary statistics files (Supplementary Information), we identified 7,461,827 variants which were present in both data sets.

The meta-analysis was run using the Ricopili pipeline (https://sites.google.com/a/broadinstitute.org/ricopili), which is the standard analysis pipeline for the Psychiatric Genomics Consortium (PGC). This pipeline utilizes METAL [24] and takes into account sample sizes as well as strength in direction of effect in each separate data set (inverse-variance weighted meta-analysis using a fixed-effects model). The meta-analysis summary statistics files for AN and OCD are available on the PGC website (http://www.med.unc.edu/pgc/results-and-downloads).

AN and OCD heritability, intercept and genetic correlation estimates

We used linkage disequilibrium (LD) score regression [25] to evaluate the relative contribution of polygenic effects and confounding factors (e.g., cryptic relatedness or population stratification) to deviation from the null in the genome-wide distribution of GWAS χ2 statistic for AN and OCD. Analysis was performed using pre-computed LD scores from 1000 Genomes European-ancestry samples. To estimate the impact of confounding factors, we evaluated the deviation of the LD score intercept from one, the expected value for a GWAS showing no sign of population stratification. LD score regression was also used to estimate SNP heritability [25]. For AN, OCD, and AN–OCD cross-disorder phenotypes, we estimated liability-based heritability using population prevalence estimates of 0.9%, 2.5% and 3%, respectively. We also computed the genetic correlation between AN and OCD in order to determine whether the correlation is consistent with the previously reported high genetic correlation between the two disorders [23].

Genetic correlation of AN–OCD single and cross-disorder phenotypes with other traits

The summary statistics files from AN, OCD, and AN/OCD cross-disorder analyses were uploaded to LD Hub server (v1.4.1; http://ldsc.broadinstitute.org). For each summary statistics file, we computed pairwise genetic correlations with 231 data sets, each representing a particular GWAS phenotype [26]. Subsequent exploratory analyses of the computed genetic correlations were then carried out, with the goal of describing consistent and unique pairwise correlations with other traits.

We first asked if there were traits that have significant genetic correlations with the AN–OCD cross-disorder data at a multiple testing-corrected level of significance. We applied a stringent Bonferroni correction based on a total of 231 × 3 = 693 separate tests performed, with a p value significance threshold of 7.2 × 10−5. We identified all single tests that passed this threshold, taking note of traits that did not meet the threshold in AN and OCD cohorts alone, but rather in the combined AN–OCD cohort. Second, we assessed the directionality of genetic correlations between AN, OCD, and the tested traits, as well as identified traits where the correlations were in opposite directions. To analyze the concordance of correlation directions between AN and OCD, we constructed a sign test by stratifying traits into bins based on p values (AN and OCD ≥ 0.05; AN or OCD < 0.05; AN or OCD < 1 × 10−3; AN or OCD < 1 × 10−5). For each bin, we counted the number of correlations which were in the same direction for AN and OCD, and binomial tests were performed under the null hypothesis that 50% of correlations should be in the same direction. We extracted traits for which: (1) AN or OCD had a genetic correlation p value that at least trended towards significance (p < 0.1) and (2) AN and OCD genetic correlations had opposite signs. We hypothesized that the traits that came out of this exploration could potentially capture some of the genetic signal specific to each disorder rather than the cross-disorder shared risk.

Additionally, we performed tests to determine if there was evidence for sets of traits for which concordant correlations with AN and OCD were biased in magnitude towards one phenotype over the other. We identified 80 traits that had concordant direction of correlation between AN and OCD, as well as a correlation p value < 0.05 in at least one of the two phenotypes. We binned these traits into categories as defined in LD Hub output and selected a subset of ten categories for formal testing, where we had at least three traits that fit the criteria above. For each category, we tested the null hypothesis that the number of traits for which the magnitude of the correlation is greater in AN matches the number of traits for which the magnitude is greater in OCD using a two-sided binomial test with an expected fraction of 0.5.

Gene-based and pathway-based tests

We used MAGMA v1.06 [27] to conduct gene-based and pathway-based association tests on the AN–OCD cross-disorder phenotype. Association was tested using the SNP-wise mean model, in which the sum of −log(p) values for SNPs located within the transcribed region (defined using NCBI 37.3 gene definitions) was used as test statistic. Included variants had MAF > 0.01, and INFO scores > 0.8. All gene-based tests were performed on loci extending from 35 kb upstream of transcription start site to 10 kb downstream of the transcription end site. MAGMA accounts for gene-size, number of SNPs in a gene, and LD between markers when estimating gene-based p values. LD correction was based on estimates from the 1000 Genome phase 3 European ancestry samples. This yielded a total of 18,279 gene-based tests, with a multiple-test-corrected significance level of 0.05/18,279 = 2.74 × 10−6.

These gene-based p values were used to analyze sets of genes in order to test for enrichment of association signals in genes belonging to specific biological pathways or processes. In the analysis, only genes on autosomes and genes located outside the broad major histocompatibility complex (MHC) region (hg19:chr6:25–35 Mb) were included. For gene sets, we used sets with 10–1000 genes from the Gene Ontology sets curated from MsigDB 6.0 [28, 29], yielding a total of 17,081 separate pathway-based tests, with a multiple-test-corrected significance threshold of 0.05/17,081 = 2.92 × 10-6.

Partitioned LD score regression

Heritability was partitioned into different genomic regions using LD score regression [25, 30]. LD score regression can estimate common SNP heritability by regressing the χ2 association statistics against the LD scores, which are defined as the sum of r2 for each SNP. Indeed, assuming polygenicity, SNPs in high LD should on average be more associated with the trait than SNPs in low LD because they are expected to be in linkage with more causal variants. The relationship between χ2 statistics and LD scores is directly dependent on the proportion of genetic variance of the trait, which allows to estimate heritability.

Partitioned LD score regression uses the same principle except that SNPs are partitioned into diverse functional categories. Indeed, if some categories are enriched in causal variants, the relationship between χ2 statistics and LD scores should be stronger than for categories with few causal variants, allowing to estimate heritability enrichment of diverse functional categories. We used the “baseline model” provided on the partitioned LD score regression website (https://github.com/bulik/ldsc/wiki/Partitioned-Heritability) to compute heritability enrichment for 28 functional categories.

Tissue and cell-type association

We used MAGMA [27] and partitioned LD score regression [30] to look for association between tissue/cell-type specificity in gene expression and gene-level genetic association with AN–OCD. The methods for these analyses were extensively described elsewhere [31]. Briefly, for tissue-level expression, we downloaded the median RPKM across individuals (V6p) from the GTEx portal (https://www.gtexportal.org). For the single-cell RNA-seq, we examined single-cell RNA-seq data from 9970 single cells coming from five different mouse brain regions (cortex, hippocampus, striatum, midbrain, and hypothalamus) allowing the identification of 24 different brain cell types. The prototypical expression of each cell type was obtained by averaging gene expression levels for all cells classified in each of the 24 cell types. We then obtained a measure of specificity for each gene and each cell type/tissue by dividing the expression of each gene in each cell type/tissue by the total expression of the gene.

For MAGMA, we binned the specificity measure for each cell type into 41 bins (0 for genes not expressed, 1 for genes below the 2.5% quantile in specificity, …, 40 for genes between the 97.5% quantile in specificity and the 100% quantile). We then tested for a positive relationship between the specificity bins in each cell type/tissue and the gene-level genetic association of each gene (−10 kb upstream to 1.5 kb downstream). For LD score regression, we binned the specificity measure into 11 bins for each cell type (0 for genes not expressed, 10 for genes between the 90% quantile and the 100% quantile) and tested whether genomic regions around genes (−10 kb downstream to +1.5 kb upstream) in the top 10% most specific genes for each cell type/tissue were enriched for AN–OCD cross-disorder heritability.