UK Biobank cohort

UK Biobank data6 (http://www.ukbiobank.ac.uk), comprising health, cognitive and genetic measures, were collected on over 500,000 individuals from across Great Britain (England, Wales and Scotland) who were aged between 37 and 73 years at the study baseline (2006–2010).

The Research Ethics Committee (REC) granted ethical approval for the study (reference 11/NW/0382), and the current analysis was conducted under data application 10279.

Genotyping

Genotyping details for the UK Biobank cohort have been reported previously7,8. Briefly, two custom genotyping arrays were used: 49,950 participants were typed using the UK BiLEVE Axiom Array and 438,427 participants were typed using the UK Biobank Axiom Array7,8. The released genotyped data contained 805,426 markers on 488,377 individuals. Imputed genotypes were supplied with the UK Biobank data, with the Haplotype Reference Consortium (HRC) used as the imputation reference panel7.

Downstream quality control for the current analysis removed (1) individuals of non-British ancestry, based on both self-report and a principal components analysis, (2) outliers based on heterozygosity and missingness, (3) individuals with sex chromosome configurations that were neither XX nor XY, (4) individuals whose reported sex did not match the sex inferred from their genetic data and (5) individuals with >10 putative third-degree relatives in the kinship table. This left a sample of 408,095 individuals. To avoid double contributions from siblings, whose parents necessarily have the same AD status, we first considered the list of all participants with a relative (N = 131,790). A genetic relationship matrix was built for these individuals using GCTA-GRM9 and a relationship threshold of 0.025 was applied to identify related individuals. After removing one person from each pair of related individuals, the sample size was 332,050.

Quality control thresholds applied to the GWAS were: minor allele frequency >0.01, imputation quality score >0.3 and restriction to HRC-imputed SNPs, leaving a total of 7,795,605 SNPs for the GWAS.
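The SNP-level thresholds above can be illustrated with a minimal sketch. The `Snp` record, its field names and the toy identifiers are hypothetical and are not part of the UK Biobank release or of any tool used in the study; only the three criteria (minor allele frequency >0.01, imputation quality >0.3, HRC-imputed) come from the text.

```python
from dataclasses import dataclass

MAF_MIN = 0.01   # minor allele frequency must exceed this value
INFO_MIN = 0.3   # imputation quality score must exceed this value


@dataclass
class Snp:
    """Hypothetical per-SNP summary record used only for illustration."""
    snp_id: str
    maf: float
    info: float
    in_hrc: bool  # True if the SNP was imputed from the HRC panel


def passes_qc(snp: Snp) -> bool:
    """Apply the three SNP-level criteria described in the text."""
    return snp.maf > MAF_MIN and snp.info > INFO_MIN and snp.in_hrc


# Toy records (invented values, not real variants)
toy_snps = [
    Snp("snp_a", maf=0.20, info=0.95, in_hrc=True),    # passes all criteria
    Snp("snp_b", maf=0.005, info=0.99, in_hrc=True),   # fails on MAF
    Snp("snp_c", maf=0.10, info=0.25, in_hrc=True),    # fails on imputation quality
    Snp("snp_d", maf=0.10, info=0.90, in_hrc=False),   # not HRC-imputed
]

kept = [s.snp_id for s in toy_snps if passes_qc(s)]
print(kept)
```

Both thresholds are strict inequalities, matching the ">" used in the text.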

Phenotypes

Family history of Alzheimer’s disease was ascertained via self-report. Participants were asked “Has/did your father ever suffer from Alzheimer’s disease/dementia?” and “Has/did your mother ever suffer from Alzheimer’s disease/dementia?” Self-report data from the initial assessment visit (2006–2010), the first repeat assessment visit (2012–2013) and the imaging visit (2014+) were aggregated. Participants were excluded if the parent in question was aged under 60 years, had died before reaching age 60 years, or had no age information. After merging with the genetic data, this left 27,696 cases of maternal AD with 260,980 controls, and 14,338 cases of paternal AD with 245,941 controls. There were 314,278 instances where AD information was available on at least one parent. Given the expected difference in disease prevalence due to sex differences in longevity (AD prevalence was 1.7-fold higher in mothers than in fathers), GWA studies were performed separately for maternal and paternal AD.
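As a check on the arithmetic, the observed sample prevalences and the maternal-to-paternal ratio can be recomputed directly from the case and control counts reported above. The variable names are illustrative only.

```python
# Case and control counts as reported in the text
maternal_cases, maternal_controls = 27_696, 260_980
paternal_cases, paternal_controls = 14_338, 245_941


def prevalence(cases: int, controls: int) -> float:
    """Observed sample prevalence: cases as a fraction of all participants with data."""
    return cases / (cases + controls)


maternal_prev = prevalence(maternal_cases, maternal_controls)
paternal_prev = prevalence(paternal_cases, paternal_controls)
fold_difference = maternal_prev / paternal_prev

print(f"maternal prevalence: {maternal_prev:.3f}")
print(f"paternal prevalence: {paternal_prev:.3f}")
print(f"maternal / paternal: {fold_difference:.1f}")
```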

Genome-wide association study

The GWA studies were conducted using BGENIE7. The outcome variable was the residuals from a linear regression model of maternal or paternal AD on age of parent at death or at time of the offspring’s self-report, assessment centre, genotype batch, array and 40 genetic principal components. Each autosomal SNP was then tested as the predictor variable under an additive model.
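The two-step design described above (residualise the phenotype on covariates, then regress the residuals on SNP dosage) can be sketched as follows. This is a deliberately simplified illustration, not the BGENIE implementation: it uses a single covariate (parental age) instead of the full covariate set, plain least squares written out by hand, and invented toy data.

```python
def simple_ols(x, y):
    """Return (intercept, slope) from a least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residualise(y, covariate):
    """Step 1: remove the linear effect of one covariate from the phenotype."""
    intercept, slope = simple_ols(covariate, y)
    return [yi - (intercept + slope * ci) for yi, ci in zip(y, covariate)]


# Toy data (invented): parental age, parental AD status (0/1), SNP dosage (0, 1 or 2)
parent_age = [62, 70, 75, 81, 88, 90, 66, 79]
parental_ad = [0, 0, 0, 1, 1, 1, 0, 0]
dosage = [0, 1, 0, 2, 1, 2, 0, 1]

# Step 1: residuals of AD status after adjusting for the covariate
ad_residuals = residualise(parental_ad, parent_age)

# Step 2: additive model, residuals regressed on allele dosage
_, snp_beta = simple_ols(dosage, ad_residuals)
print(f"per-allele effect on the residualised phenotype: {snp_beta:.4f}")
```

Residualising once and reusing the residuals for every SNP avoids refitting the covariates for each of several million tests.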

The GWAS linear regression coefficients were converted to odds ratios (ORs) using observed sample prevalences of 0.096 and 0.055 for maternal and paternal AD, respectively10. The conversion uses the following equation, derived in Lloyd-Jones et al.10, where k = disease prevalence, p = population allele frequency and β = the estimated SNP regression coefficient on the binary disease scale from the GWAS: \(\mathrm{OR} = \frac{\left(k + \beta(1 - p)\right) \times \left(1 - k + \beta p\right)}{\left(k - \beta p\right) \times \left(1 - k - \beta(1 - p)\right)}\). The log-odds were then multiplied by two so that the effect sizes are reported on the same scale as a traditional case–control design5. Standard errors (SEs) for the log-odds were calculated from the adjusted OR and the P-value from the initial GWAS (Supplementary Note 1).

The ORs and SEs were carried forward to an SE-weighted meta-analysis in METAL11, first to create a UK Biobank parental meta-analysis, and then to combine this with the stage 2 summary output from the International Genomics of Alzheimer’s Project (IGAP) study3, using the stage 1 output for the SNPs that did not contribute to stage 2. Linkage disequilibrium score (LDSC) regression was used to estimate the genetic correlation between the maternal and paternal AD GWAS results and to test for residual confounding in the meta-analysis by examining the LDSC intercept12,13.
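The conversion and pooling steps can be sketched in a few lines. The OR formula is the one given above. Two parts are assumptions made for illustration: the SE is recovered as |log OR| divided by the two-sided z-score implied by the P-value (the exact procedure is in Supplementary Note 1, which is not reproduced here), and the pooling uses a standard fixed-effect inverse-variance weighting. Function names and input values are hypothetical.

```python
import math
from statistics import NormalDist


def beta_to_or(beta: float, k: float, p: float) -> float:
    """Convert a linear-regression beta to an OR given prevalence k and allele frequency p."""
    numerator = (k + beta * (1 - p)) * (1 - k + beta * p)
    denominator = (k - beta * p) * (1 - k - beta * (1 - p))
    return numerator / denominator


def proxy_log_or(beta: float, k: float, p: float) -> float:
    """Log-odds doubled to place parental-history effects on a case-control scale."""
    return 2.0 * math.log(beta_to_or(beta, k, p))


def se_from_p(log_or: float, p_value: float) -> float:
    """ASSUMPTION: SE = |log OR| / z, with z the two-sided normal quantile of the P-value.
    Undefined for p_value of exactly 0 or 1."""
    z = abs(NormalDist().inv_cdf(p_value / 2.0))
    return abs(log_or) / z


def inverse_variance_meta(estimates, ses):
    """Fixed-effect pooled estimate and SE using weights of 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


# Toy inputs (invented betas, allele frequency and P-values; prevalences from the text)
maternal = proxy_log_or(beta=0.004, k=0.096, p=0.30)
paternal = proxy_log_or(beta=0.003, k=0.055, p=0.30)
maternal_se = se_from_p(maternal, 1e-4)
paternal_se = se_from_p(paternal, 1e-3)

pooled, pooled_se = inverse_variance_meta([maternal, paternal], [maternal_se, paternal_se])
print(f"pooled OR: {math.exp(pooled):.3f} (SE of log-odds: {pooled_se:.4f})")
```

A beta of zero maps to an OR of exactly one, which is a quick sanity check on the formula.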

The number of independent loci from the meta-analysis was determined using the default settings in FUMA14. Independent significant SNPs had P < 5 × 10⁻⁸ and were independent of one another at r² < 0.6. Within this pool of independent SNPs, lead SNPs were defined as those independent of one another at r² < 0.1. Loci were defined by combining lead SNPs within a 250 kb window, together with all SNPs in LD of at least r² = 0.6 with one of the independent SNPs. A gene-based analysis was carried out on all SNP output using the MAGMA software15 with default settings (SNP-wise (mean) model for each gene), assuming a constant sample size for all genes. A Bonferroni-adjusted P-value threshold of 0.05/18,251 = 2.7 × 10⁻⁶ was used to identify significant genes. The 1000 Genomes phase 3 data16 were used to map LD in both the independent locus and MAGMA analyses.
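The window-based step of the locus definition can be illustrated with a simplified sketch. It merges lead SNPs on the same chromosome whenever consecutive lead SNPs lie within 250 kb of each other. It is not FUMA's implementation: it ignores the LD-based extension of locus boundaries described above, and the function name and toy positions are hypothetical.

```python
WINDOW_BP = 250_000  # 250 kb merging window


def merge_lead_snps(lead_snps, window=WINDOW_BP):
    """Group (chromosome, position) pairs into loci.

    A lead SNP joins the current locus if it is on the same chromosome and
    within `window` base pairs of the last lead SNP already in that locus.
    Returns a list of (chromosome, start, end, n_lead_snps) tuples.
    """
    loci = []
    for chrom, pos in sorted(lead_snps):
        if loci and loci[-1][0] == chrom and pos - loci[-1][2] <= window:
            prev_chrom, start, _, count = loci[-1]
            loci[-1] = (prev_chrom, start, pos, count + 1)
        else:
            loci.append((chrom, pos, pos, 1))
    return loci


# Toy lead SNPs (invented positions)
toy_leads = [(1, 100_000), (1, 300_000), (1, 900_000), (2, 150_000)]
for locus in merge_lead_snps(toy_leads):
    print(locus)
```

Because each comparison is made against the most recently added lead SNP, a chain of closely spaced lead SNPs can produce a locus wider than 250 kb.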

Summary data-based Mendelian randomisation

To test for pleiotropic associations between SNPs and AD and gene expression/DNA methylation in the brain, summary data-based Mendelian randomisation (SMR) was performed17. GWAS summary output from the meta-analysis of UK Biobank and IGAP (sample size specified as 314,278 + 74,046 = 388,324) was included along with expression quantitative trait loci (eQTL) summary output from the Common Mind Consortium, which contains data on >600 dorsolateral prefrontal cortex samples, and DNA methylation QTL (methQTL) summary output on 258 prefrontal cortex samples (age >13)18. The reference genotypes were based on the Health and Retirement Study, imputed to the 1000 Genomes phase 1 reference panel. SNP exclusions included: imputation score <0.3, Hardy–Weinberg P-value < 1 × 10⁻⁶ and a minor allele frequency <0.01. Related individuals, based on a genomic-relationship matrix cutoff of 0.05, were removed.

Two sets of eQTL summary data were considered: (1) after adjustment for diagnosis, institution, sex, age of death, post-mortem interval, RNA integrity number (RIN), RIN² and clustered library batch; and (2) with additional adjustment for 20 surrogate variables to account for other possible confounders. Five ancestry vectors were included as covariates in the eQTL analyses. Further details are available at: https://www.synapse.org/#!Synapse:syn4622659.

Default parameters for the SMR analysis were used and cis eQTLs/methQTLs were considered for analysis. Bonferroni-corrected P-value thresholds were applied (P < 0.05/2011 = 2.5 × 10⁻⁵ for eQTL data set 1, P < 0.05/4380 = 1.1 × 10⁻⁵ for eQTL data set 2 and P < 0.05/54,624 = 9.2 × 10⁻⁷ for the methQTL data set). The SMR P-value highlights candidate transcripts or methylation sites through which a cis SNP may be acting on the outcome, AD. The heterogeneity in dependent instruments (HEIDI) P-value then distinguishes two explanations for a significant SMR result: P > 0.05 is consistent with a single causal SNP (the effect on AD is mediated through the transcript/methylation site), whereas P < 0.05 indicates that different SNPs affect AD and the transcript/methylation site.
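The decision rule described above can be written out as a short sketch. The test counts and the 0.05 HEIDI cut-off come from the text; the function names, the returned labels and the example P-values are hypothetical, and the code does not call the SMR software itself.

```python
ALPHA = 0.05        # family-wise significance level before Bonferroni correction
HEIDI_ALPHA = 0.05  # HEIDI cut-off described in the text

# Number of probes tested per data set, as reported in the text
N_TESTS = {"eQTL set 1": 2011, "eQTL set 2": 4380, "methQTL": 54_624}


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Bonferroni-corrected P-value threshold for n_tests comparisons."""
    return alpha / n_tests


def interpret_smr(p_smr: float, p_heidi: float, n_tests: int) -> str:
    """Classify one probe using the SMR and HEIDI P-values."""
    if p_smr >= bonferroni_threshold(n_tests):
        return "not significant"
    if p_heidi > HEIDI_ALPHA:
        # No evidence of heterogeneity: consistent with a single causal SNP
        return "single causal SNP"
    # Evidence of heterogeneity: different SNPs likely underlie the two signals
    return "different SNPs"


for name, n in N_TESTS.items():
    print(f"{name}: P < {bonferroni_threshold(n):.1e}")

# Hypothetical probes from eQTL data set 1
print(interpret_smr(p_smr=1e-6, p_heidi=0.40, n_tests=N_TESTS["eQTL set 1"]))
print(interpret_smr(p_smr=1e-6, p_heidi=0.001, n_tests=N_TESTS["eQTL set 1"]))
```

The text leaves a HEIDI P-value of exactly 0.05 unclassified; this sketch groups it with the heterogeneous case.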