Collection of samples analysed by Affymetrix Exon Arrays

CNS tissues originating from 137 control individuals were collected by the Medical Research Council (MRC) Sudden Death Brain and Tissue Bank, Edinburgh, UK40, and the Sun Health Research Institute (SHRI), an affiliate of Sun Health Corporation, USA41. A detailed description of the samples used in the study, tissue processing and dissection is provided in the study by Trabzuni et al.10 and in Supplementary Data 1. All samples had fully informed consent for retrieval and were authorized for ethically approved scientific investigation (National Hospital for Neurology and Neurosurgery and Institute of Neurology Research Ethics Committee, 10/H0716/3).

Processing of samples analysed by Affymetrix Exon Arrays

Total RNA was isolated from human post-mortem brain tissues using the miRNeasy 96 kit (Qiagen). The quality of total RNA was evaluated by the 2100 Bioanalyzer (Agilent) and RNA 6000 Nano Kit (Agilent) before processing with the Ambion WT Expression Kit and Affymetrix GeneChip Whole Transcript Sense Target Labelling Assay, and hybridization to the Affymetrix Exon 1.0 ST Arrays following the manufacturers’ protocols. Hybridized arrays were scanned on an Affymetrix GeneChip Scanner 3000 7G. Further details regarding RNA isolation, quality control and processing are reported in Trabzuni et al.10. A full list of the CEL files used in this study is provided in Supplementary Data 2.

Analysis of Affymetrix Exon Array data

All arrays were pre-processed using Robust Multi-array Average quantile normalisation with GC background correction (GC-RMA)42 and log2 transformation in Partek’s Genomics Suite v6.6 (Partek Incorporated, USA). We also calculated the ‘detection above background metric’ (DABG) using Affymetrix Power Tools (Affymetrix). After re-mapping the Affymetrix probe sets onto human genome build 19 (GRCh37) using Netaffx annotation file HuEx-1_0-st-v2 Probeset Annotations, Release 31, we restricted analysis to 174,228 probe sets annotated with gene names, containing at least three probes with unique hybridization and DABG P-values <0.001 in 50% of male or female individuals. The gene-level expression was calculated for up to 17,501 genes by using the median signal of probe sets corresponding to each gene.
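The detection filter and gene-level summary described above can be illustrated with a minimal sketch. It is not the Partek or Affymetrix Power Tools implementation. The record layout (keys `gene`, `signal`, `dabg`) and the function names are hypothetical, "in 50%" is assumed to mean "in at least 50%", and the requirement of at least three uniquely hybridizing probes is assumed to have been applied upstream.

```python
from statistics import median


def passes_dabg(dabg_by_sex, threshold=0.001, min_fraction=0.5):
    """Return True if the probe set is detected (DABG P < threshold) in at
    least min_fraction of the male samples or of the female samples."""
    for pvals in dabg_by_sex.values():
        if pvals and sum(p < threshold for p in pvals) / len(pvals) >= min_fraction:
            return True
    return False


def gene_level_expression(probe_sets):
    """Summarise probe-set signals to gene level.

    probe_sets: list of dicts with hypothetical keys
        'gene'   - gene name ('' if the probe set is unannotated)
        'signal' - log2 signals, one per sample, same sample order throughout
        'dabg'   - {'M': [...], 'F': [...]} DABG P-values per sex
    Returns {gene: [median signal per sample]} over retained probe sets.
    """
    kept = {}
    for ps in probe_sets:
        if ps["gene"] and passes_dabg(ps["dabg"]):
            kept.setdefault(ps["gene"], []).append(ps["signal"])
    # zip(*signals) walks the samples; the median is taken across probe sets.
    return {
        gene: [median(per_sample) for per_sample in zip(*signals)]
        for gene, signals in kept.items()
    }
```

Taking the median rather than the mean keeps a single aberrant probe set from dominating the gene-level value, which is presumably why it was chosen here.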

Using the age at death and the reported age-related probability of being pre-menopausal in the US Caucasian population27, we predicted that 67% of the female donors (n=24) were post-menopausal, whereas 17% (n=6) were likely to be pre-menopausal. As we were unable to detect any significant differences in gene expression between pre- and post-menopausal women, our analysis is limited to sex differences alone.

Sex-biased expression and splicing were investigated in each brain region separately using Partek’s mixed-model ANOVA (equation 1) and alternative splice ANOVA (equation 2, Partek Genomics Suite v6.6) as described below:

$$Y_{ijkl} = \mu + \text{Brain Bank}_i + \text{Sex}_j + \text{Scan Date}_k + \varepsilon_{ijkl} \tag{1}$$

where $Y_{ijkl}$ represents the $l$th observation on the $i$th Brain Bank, $j$th Sex and $k$th Scan Date, $\mu$ is the common effect for the whole experiment, and $\varepsilon_{ijkl}$ represents the random error present in the $l$th observation on the $i$th Brain Bank, $j$th Sex and $k$th Scan Date. The errors $\varepsilon_{ijkl}$ are assumed to be normally and independently distributed with mean 0 and standard deviation $\delta$ for all measurements. Brain Bank and Scan Date are modelled as random effects.

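$$Y_{ijklmn} = \mu + \text{Brain Bank}_i + \text{Sex}_j + \text{Scan Date}_k + \text{Marker ID}_l + \text{Sample ID}(\text{Brain Bank} \times \text{Sex} \times \text{Scan Date})_{ijkm} + (\text{Sex} \times \text{Marker ID})_{jl} + \varepsilon_{ijklmn} \tag{2}$$

where $Y_{ijklmn}$ represents the $n$th observation on the $i$th Brain Bank, $j$th Sex, $k$th Scan Date, $l$th Marker ID and $m$th Sample ID, $\mu$ is the common effect for the whole experiment, and $\varepsilon_{ijklmn}$ represents the random error present in the $n$th observation on the $i$th Brain Bank, $j$th Sex, $k$th Scan Date, $l$th Marker ID and $m$th Sample ID. The errors $\varepsilon_{ijklmn}$ are assumed to be normally and independently distributed with mean 0 and standard deviation $\delta$ for all measurements. $\text{Marker ID}_l$ is the exon-to-exon effect (alternative splicing independent of tissue type).

$(\text{Sex} \times \text{Marker ID})_{jl}$ represents whether an exon is expressed differently at different levels of the specified Alternative Splice Factor(s), here sex. $\text{Sample ID}(\text{Brain Bank} \times \text{Sex} \times \text{Scan Date})_{ijkm}$ is a sample-to-sample effect. Brain Bank, Scan Date and Sample ID are modelled as random effects.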

To reduce the likelihood of false positives, only probe sets called as present in both male and female samples were analysed for evidence of alternative splicing by sex. In all types of analysis, the date of array hybridisation and brain bank (SHRI or MRC Sudden Death Brain Bank) were included as cofactors to eliminate batch effects, as discussed in detail in Trabzuni et al.10. All P-values were corrected for multiple comparisons using the FDR step-down method.

We also investigated the value of integrating data across the different brain regions. The rationale was that small yet consistent differences in gene expression may exist between male and female brain samples: such differences might not reach significance in a single brain region, but might be detected when all samples are considered together. To test this approach, we calculated the average of each gene-level signal across all regions for each individual. The resulting values were tested for sex-biased expression (including scan date and brain bank as covariates).
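The cross-region averaging step can be sketched as follows. This is an illustration only: the nested-dictionary layout and the function name are hypothetical, and averaging over whichever regions are available for an individual is an assumption, since the text does not state how individuals with missing regions were handled.

```python
from statistics import mean


def average_across_regions(expr):
    """Average each gene-level signal over brain regions, per individual.

    expr: {individual: {region: {gene: log2 signal}}}  (hypothetical layout)
    Returns {individual: {gene: mean signal over the regions in which the
    gene was measured for that individual}}.
    """
    averaged = {}
    for individual, regions in expr.items():
        per_gene = {}
        for gene_values in regions.values():
            for gene, value in gene_values.items():
                per_gene.setdefault(gene, []).append(value)
        averaged[individual] = {g: mean(v) for g, v in per_gene.items()}
    return averaged
```

The per-individual averages would then be passed to the same sex-bias test as the single-region data, with scan date and brain bank as covariates.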

To ensure that any reported sex differences in gene-level expression or splicing could not be explained by other known covariates, we performed additional analyses in which we modelled the effects of cause of death, post-mortem interval, age at death and RIN, as well as the factors described above. The reported findings remained substantively the same.

Quantitative RT–PCR

Aliquots of total RNA previously extracted from each brain region and analysed on Exon Arrays were used for validation by quantitative RT–PCR. These experiments were performed on a subset of samples (N=85), which were analysed for the expression of genes/transcripts using human-specific TaqMan assays (Applied Biosystems, UK). RPLP0-, TUBB- and UBC-specific assays were used as endogenous controls. Samples were analysed using Fluidigm 96.96 Dynamic Arrays (Fluidigm Europe) with assay triplicates in accordance with the manufacturer’s protocol. RNA (100 ng) was used as input; reverse transcription was performed using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems) in accordance with the manufacturer’s protocol, and the cDNA was amplified as described in the ‘Fluidigm Specific Target Amplification Quick Reference Manual’ (Fluidigm, Europe). The Ct value (cycle number at threshold) was used to calculate the relative amount of mRNA molecules. The Ct value of each target gene was normalized by subtraction of the Ct value from the geometric mean of the three endogenous control genes to obtain the ΔCt value. The relative gene expression level was expressed as 2^(−ΔCt). Sex-biased gene expression or splicing candidates were considered confirmed if the P-value calculated by unpaired t-test was <0.05.
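The normalization arithmetic can be made concrete with a short sketch. It assumes the conventional sign, ΔCt = Ct(target) − geometric mean of Ct(controls), so that a larger 2^(−ΔCt) means higher expression; the function names are hypothetical and the averaging of assay triplicates is left out.

```python
from statistics import geometric_mean  # available from Python 3.8


def delta_ct(target_ct, control_cts):
    """ΔCt of a target gene relative to the geometric mean of the Ct values
    of the endogenous controls (here RPLP0, TUBB and UBC)."""
    return target_ct - geometric_mean(control_cts)


def relative_expression(target_ct, control_cts):
    """Relative expression level, 2^(-ΔCt)."""
    return 2.0 ** (-delta_ct(target_ct, control_cts))
```

For example, a target with Ct = 22 measured against three controls that each have Ct = 20 gives ΔCt = 2 and a relative expression of 2^(−2) = 0.25, that is, a quarter of the control level.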

Gene set enrichment analysis

GSEA was performed using GSEA v2.0.6 software33 with phenotype permutation. We investigated sex-biased enrichment of 71 gene sets annotated by the KEGG34,35, Reactome36 and BioCarta pathway databases with relevance to CNS function (Supplementary Table S1).

DNA genotyping and imputation

Genomic DNA was extracted from subdissected samples of human post-mortem brain tissue using either Qiagen’s DNeasy Blood & Tissue Kit (Qiagen, UK) or phenol–chloroform. The samples provided by either the MRC Sudden Death Brain and Tissue Bank or the Sun Health Research Institute were genotyped on the Illumina Infinium Omni1-Quad BeadChip and on the Immunochip, a custom genotyping array designed for the fine-mapping of auto-immune disorders43. The other samples were genotyped using the Illumina Infinium HumanHap550 v3 (Illumina, USA). In all cases, the BeadChips were scanned using an iScan (Illumina) with an AutoLoader (Illumina, USA). GenomeStudio v.1.8.X (Illumina, USA) was used for analysing the data and generating SNP calls.

After standard quality controls both genotype data sets were combined and imputed using MaCH44,45 and Minimac using the European panel of the 1,000 Genomes Project (March 2012: Integrated Phase I haplotype release version 3, based on the 2010-11 data freeze and 2012-03-14 haplotypes). We used the resulting ~5.5 million SNPs with good post-imputation quality (Rsq>0.50) and minor allele frequency of at least 5%.

Processing of samples analysed by Illumina expression arrays

Subdissected cerebellar and frontal cortex samples originating from 390 control individuals were frozen before processing12,13,43. Total RNA was extracted from the subdissected samples using either Qiagen’s miRNeasy Kit (Qiagen, UK) or a glass-Teflon homogenizer and 1 ml TRIzol (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. RNA was biotinylated and amplified using the Illumina TotalPrep-96 RNA Amplification Kit and directly hybridized onto HumanHT-12 v3 Expression BeadChips (Illumina Inc.) in accordance with the manufacturer’s instructions.

Analysis of Illumina expression arrays

To maximise the number of samples available, this analysis was performed using expression data generated from 390 individuals who were expression-profiled on the HT-12 v3 BeadChip array, some of whom had also been analysed using the Exon Arrays. Raw intensity values for each probe were transformed using the cubic spline normalization method and then log2-transformed for mRNA analysis. After re-mapping the annotation for probes according to ReMOAT46, we restricted the analysis to probes that uniquely hybridized, were associated with gene descriptions, were located on autosomal chromosomes and passed Illumina Detection P-values of <0.01 in >10% of male or female individuals. We also removed all probes containing SNPs or indels present in the European panel of the 1,000 Genomes Project (March 2012: Integrated Phase I haplotype release version 3, based on the 2010-11 data freeze and 2012-03-14 haplotypes) with a frequency of at least 1%. This resulted in the analysis of 13,425 transcripts in cerebellum and 13,396 transcripts in frontal cortex. The resulting expression data were adjusted for age, post-mortem interval and batch effects.

Identification of sex-biased eQTLs

A combined data set of 390 individuals (121 women and 269 men) was used to identify expression QTLs that behave differently in men and women. For each probe, an expression value that deviated by more than 3 s.d. from the mean was considered an outlier and removed from analysis. The outlier detection was run in men and women separately, and a total of 1.2% of the expression values were removed. The QTL analysis was run for each probe against a SNP, sex and the interaction term between sex and SNP in MatrixEQTL47. The P-value for the interaction term was used to select combinations of SNPs and probes for further analysis in R (http://www.r-project.org/). We treated multiple expression QTLs from a tissue as one signal if the SNPs involved were clustered with linkage disequilibrium >0.50, and report the most significant expression QTL. The P-value threshold corresponding to the Bonferroni correction for multiple testing is ~3.7 × 10^−12.
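The sex-stratified outlier step can be sketched as below for a single probe. This is an illustration, not the authors' code: the function name is hypothetical, the sample standard deviation (n − 1 denominator) is an assumption, and removed values are marked as `None` rather than dropped so that sample order is preserved for the later SNP × sex interaction test.

```python
from statistics import mean, stdev


def mask_outliers_by_sex(values, sexes, n_sd=3.0):
    """Mask expression values lying more than n_sd standard deviations from
    the mean, computing the mean and s.d. within each sex separately.

    values: expression values for one probe, one per individual
    sexes:  sex labels in the same order as values
    Returns a copy of values with outliers replaced by None.
    """
    masked = list(values)
    for sex in set(sexes):
        idx = [i for i, s in enumerate(sexes) if s == sex]
        group = [values[i] for i in idx]
        if len(group) < 2:
            continue  # s.d. is undefined for a single observation
        centre, spread = mean(group), stdev(group)
        for i in idx:
            if spread > 0 and abs(values[i] - centre) > n_sd * spread:
                masked[i] = None
    return masked
```

Running the detection within each sex keeps a genuine sex difference in mean expression from being mistaken for a set of outliers in the smaller group.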