In Alzheimer’s disease, aggregates of Aβ and tau in amyloid plaques and neurofibrillary tangles spread progressively across brain tissues following a characteristic pattern, implying a tissue-specific vulnerability to the disease. We report a transcriptional analysis of healthy brains and identify an expression signature that predicts—at ages well before the typical onset—the tissue-specific progression of the disease. We obtain this result by finding a quantitative correlation between the histopathological staging of the disease and the expression patterns of the proteins that coaggregate in amyloid plaques and neurofibrillary tangles, together with those of the protein homeostasis components that regulate Aβ and tau. Because this expression signature is evident in healthy brains, our analysis provides an explanatory link between a tissue-specific environmental risk of protein aggregation and a corresponding vulnerability to Alzheimer’s disease.

Keywords

Our results identify a quantitative correlation between the histopathological staging of AD and the specific expression patterns of the genes corresponding to the proteins that coaggregate in plaques and tangles, together with those of the genes corresponding to the protein homeostasis components that regulate Aβ and tau. Our analysis thus associates the risk of protein aggregation in healthy brain tissues, as determined by their cellular environments, and their corresponding vulnerability to AD.

To test this hypothesis, we used the Braak staging of AD, which is based on the postmortem detection of NFTs ( 11 ), and has been verified by in vivo identification by imaging-based positron emission tomography scans ( 26 – 28 ). Tau can be considered as an effective reporter of a disease state because of its vulnerability to misfolding and aggregation when its environment becomes poorly regulated. The present study, therefore, can also be seen as an investigation of the variety of complex factors involved in such dysregulation.

To implement this idea, we asked whether tissue vulnerability to AD could be associated with failures of the protein homeostasis system ( 16 – 19 ), resulting from its saturation with an excess of aggregation-prone proteins ( 20 – 25 ). We therefore explored the hypothesis that vulnerable tissues are simultaneously characterized by the elevated expression of a specific subset of proteins vulnerable to aggregation in AD—those that coaggregate with Aβ and tau in amyloid plaques and NFTs—and by specific expression signatures of the corresponding protein homeostasis components that further exacerbate the risk of aggregation of these metastable proteins. To obtain a measure of the overall propensity of given tissues to aggregate, we defined a vulnerability score (called Δ score) based on the differential expression of the genes corresponding to aggregation-prone proteins and their regulators. With this vulnerability score, we created a vulnerability map of healthy brains. If our hypothesis is correct, this vulnerability map should recapitulate the histopathological staging of AD.

Alzheimer’s disease (AD), the most common form of dementia, is characterized by the aggregation of amyloid β (Aβ) and tau in amyloid plaques and neurofibrillary tangles (NFTs) ( 1 , 6 ). These assemblies spread across specific brain tissues, particularly through the formation of oligomers ( 7 – 10 ), following a characteristic pattern ( 11 ). However, the mechanisms that govern the selective vulnerability of these tissues are still debated ( 12 – 15 ). To understand why aberrant protein aggregates form in some tissues but not in others, we investigated whether the cellular environments of the most vulnerable tissues favor disease-specific protein aggregation.

For different brain cell types, including neurons, astrocytes, microglia, and endothelial cells, we calculated the relative mRNA expression levels ( 40 ), as measured by the score (see Methods), of genes corresponding to Aβ and tau, and the corresponding aggregation protectors and promoters. For each gene set in neurons, the significance of the difference with the expression distribution for all other brain cell types in combination was calculated using Mann-Whitney U test with Benjamini-Hochberg multiple hypothesis testing correction ( 50 ); ***P < 0.001 and *****P < 0.00001.

We next investigated whether the analysis described above is capable of identifying the types of cells that are most vulnerable to pathological aggregation in AD (see Methods). To address this point, we used single-cell human mRNA sequencing data ( 40 ). By calculating the relative expression of Aβ and tau for neurons, astrocytes, microglia, and endothelial cells, we found that their relative expression was elevated significantly in neurons ( Fig. 5 ). Simultaneously, we found that the relative expression of their aggregation protectors was the lowest and that of their promoters was the highest. These results indicate that neurons exhibit a cellular environment most conducive to Aβ and tau aggregation in comparison with the different brain cell types that we analyzed in this work.

To assess the possibility that the transcriptional signatures observed here are generic to all neurodegenerative disorders, we repeated the Δ analysis for a number of aggregation modulator sets relevant to amyotrophic lateral sclerosis (ALS) and PD (fig. S7). We compared the distributions of mean Δ BI– III scores for aggregation modulators of Aβ and tau (AD), α-synuclein (PD), and TDP43 and SOD1 (ALS), finding a statistically significant deviation from proteome Δ BI–III scores for Aβ and its protectors only (fig. S7A), despite the overlap between the aggregation modulators in the different diseases (fig. S7B).

Box plots of the Δ BI–III score distributions for each pathway category (x axis) in the context of the whole proteome. “All pathways” is the distribution of the Δ BI–III scores of all proteins in the human proteome with at least one KEGG pathway assigned. Boxes represent the first and third quartiles of the distribution, whiskers represent the 1.5 interquartile range, and notches are the standard errors on the median calculated with 10 4 bootstrap cycles. Significance values (**P < 0.01 and *****P < 0.00001) report the statistical significance of the difference with the first box plot (All pathways) calculated using Mann-Whitney U test with Benjamini-Hochberg multiple hypothesis testing correction ( 50 ).

We further investigated other possible causes of tissue vulnerability to AD, particularly the immune response ( Fig. 4 and table S7) ( 37 , 38 ). Our analysis of Δ BI–III scores of biochemical pathways listed in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database ( 39 ) revealed that genes associated with inflammatory responses are expressed at elevated relative levels in healthy brains, whereas genes involved in autoimmune responses are expressed at lower relative levels in AD-vulnerable tissues ( Fig. 4 and table S7). Because no other immune pathway shows significant variation in expression ( Fig. 4 and table S7), these results support previous suggestions of a role for inflammation in the pathogenesis of AD ( 37 , 38 ). Thus, the vulnerability of specific tissues in AD may result from the sum of a number of factors, including the expression levels of disease-specific, aggregation-prone proteins and their corresponding protein homeostasis complements, as well as the immune system.

( A ) Tissues are colored according to the mean Δ score for expression in healthy brains of the aggregation modulators (protectors and promoters) of Aβ and tau aggregation (left) and to the Braak staging (right) (tables S4 to S6). The mean Δ score for aggregation modulator is calculated as the difference between the mean Δ scores for aggregation promoters and protectors in the region under scrutiny. ( B ) Box plot of the mean Δ scores for aggregation modulators [as calculated in (A)] in perfect-match tissues (see Methods) affected at progressive Braak stages (x axis). *****P < 0.00001; P values were calculated with Mann-Whitney U test with Benjamini-Hochberg multiple hypothesis testing correction ( 50 ). NB, non-Braak.

To investigate the relationship between the levels of expression of protein homeostasis components and the tissue vulnerability to AD at a more granular level, we focused our analysis on tissues where an accurate alignment is possible between AD-staged tissues and the tissue parcellation used in microarray analysis (see Methods and table S5). We found that expression levels of protein homeostasis components that modulate the aggregation of Aβ and tau in healthy brains (table S6) are good predictors of tissue vulnerability to AD and recapitulate the staging of the disease ( Fig. 3 and fig. S5). In addition, we performed a control using proteome-level data, which were available for mouse tissues ( 36 ). We found that the ranking of the tissue-specific risk using protein data is consistent with that using mRNA data (fig. S6).

Proteins known to promote Aβ and tau aggregation (termed “promoters”) are shown within ellipses, and proteins known to protect against it (termed “protectors”) are shown within rectangular boxes (table S4). Proteins whose roles in Aβ and tau aggregation are ambiguous in the literature are shown without frames and not considered in further analysis. Δ BI–III scores are color-coded according to the legend in the lower left corner. Error bars in the plot represent a 95% confidence interval on the mean (X symbol) calculated with 10 5 bootstrap cycles, ***P < 0.001 calculated with a two-tailed t test ( 49 ).

Elevated expression levels of proteins that coaggregate in plaques and tangles may promote aggregation in tissues vulnerable to AD. However, these levels are similar across different Braak regions and therefore do not explain by themselves the gradient of tissue vulnerability to disease (fig. S5). We therefore investigated whether a weak regulation of these proteins might also contribute to their vulnerability to aggregation. To analyze the role of the protein homeostasis components, we performed a literature search to identify molecular chaperones and posttranslational modifiers known to influence Aβ and tau aggregation (table S4). We found that components of the protein homeostasis system known to protect against the aggregation of these two proteins have a mean expression level in tissues most vulnerable to AD lower than that in AD-resistant tissues ( Fig. 2 ). These effects of reduced activity against aggregation are compounded by an elevated relative expression of other protein homeostasis components reported to promote aggregation in highly vulnerable tissues ( Fig. 2 ). These results suggest that the expression patterns of certain protein homeostasis components associated with protein aggregation predispose specific tissues in normal brains to be vulnerable to Aβ and tau deposition.

Our analysis revealed the presence of elevated expression levels of proteins that coaggregate in plaques and tangles in the tissues in which AD is first evident, as measured by the average Δ BI–III score (the Δ score for Braak regions I to III) ( Fig. 1B and tables S2 and S3). We tested the statistical significance of these results by calculating the Δ BI–III scores of 10 6 random sets of genes of equal size and comparing them to that of Aβ and tau aggregation modulators (fig. S3). We also found that genes corresponding to proteins that coaggregate within Lewy bodies ( 33 ) in Parkinson’s disease (PD; see Methods) ( 34 ) have elevated expression in a region of the brain [substantia nigra pars compacta (SNPC)] ( 35 ) that is highly vulnerable to this condition (fig. S4 and table S3), a result that suggests that the paradigm of tissue vulnerability that we propose here can be applied to other neurodegenerative disorders.

( A ) For a given gene, we define a Δ score by the normalized differential expression between the brain regions under scrutiny and all non-Braak regions (see Methods). ( B ) Box plot of Δ BI–III (the Δ score for Braak regions I to III) for the whole proteome and the proteins that coaggregate with Aβ and tau in plaques and tangles. Aβ and tau are shown as square points in their respective distribution. ***P < 0.001; the statistical significance of the difference between the distributions of the coaggregators and that of the proteome was calculated using Mann-Whitney U test with Benjamini-Hochberg multiple hypothesis testing correction ( 50 ).

To quantify differential gene expression between tissues, we defined a Δ score (see Methods, Fig. 1A , and fig. S1); for a given gene, a positive Δ score indicates that the expression in the region under scrutiny is higher than that in tissues not affected by AD (non-Braak regions). Although a relatively weak correlation exists between mRNA and protein levels ( 30 , 31 ), in this work, we consider average values across groups of genes, and hence, we expect stronger correlations to be present. To validate this type of approach, we verified that the patterns of gene expression analyzed here are consistent with the corresponding patterns of protein expression using two independent data sets (fig. S2) ( 31 , 32 ), and when possible, we directly analyzed proteome-level data (see below).

To investigate whether the expression patterns of proteins that coaggregate with Aβ and tau in plaques and tangles underlie tissue-specific vulnerability to AD, we performed a transcriptome-wide microarray analysis across more than 500 healthy brain tissues from the Allen Brain Atlas (see Methods) ( 29 ). Here, we characterized the progression of AD using Braak staging (see Methods) ( 11 ), whereby brain regions are classified according to the temporal appearance of the NFTs, whose deposition correlates with neuronal loss (see Methods and table S1). We analyzed the expression data derived from six healthy human brains of individuals aged 24 to 57 years, with 93% of known genes represented by at least two probes ( 29 ).

Together, our results show that brain tissues vulnerable to the pathological protein aggregation that defines AD are characterized by an elevated expression of a specific subset of aggregation-prone proteins, as well as by suboptimal levels of protein homeostasis components that predispose the aggregation of Aβ and tau. We have observed this expression signature in the brains of healthy subjects at ages at which AD is rarely evident (aged 24 to 57 years). These results indicate that the susceptibility of specific tissues to aggregation is a feature of healthy brains, which is permissive for, although not necessarily causative of, disease onset. We expect additional factors to influence tissue vulnerability, including the connectivity of tissues, because of the increasing evidence of the contribution to disease progression from the cell-to-cell spreading of Aβ and tau toxic oligomers ( 41 – 43 ). Our findings may inspire novel therapeutic approaches for AD, which, rather than trying to prevent a wide range of possible triggering events, could be based on the pharmacological enhancement of our natural defense mechanisms that maintain our proteome in a soluble state in the specific tissues where protein aggregation may take place more readily ( 16 , 21 , 44 ). In summary, our results illustrate how the origins of variable tissue vulnerability to AD may lie within the proteome through the identification in vulnerable tissues of an intrinsic expression signature associated with protein aggregation, observed decades before the typical age of disease onset.

METHODS

Data sources A full list of data sources for the results presented in this work is provided in table S8.

Allen Brain Atlas data We analyzed data from six healthy human brains from individuals aged 24 to 57 years. Samples were taken from more than 500 regions for each hemisphere, and more than 19,700 genes were studied with multiple probes. Microarray data were downloaded from the Allen Brain Atlas (29). Data were downloaded from the Allen Brain Atlas on 19 December 2014. UniProt data (45) for subcellular localization and biological process gene ontology assignments for most proteins were downloaded from www.uniprot.org/downloads on 21 May 2015. Protein IDs were converted between UniProt and Entrez ID (used by the Allen Brain Atlas) using the UniProt ID mapping service (table S2). With this procedure, expression data were assigned to about 90% of the human reference proteome.

Braak staging At progressive clinical stages of AD, conserved patterns of NFT deposition in neural tissues were observed, with increasingly large areas of the brain affected with advancing stages. In the Braak staging of AD (11), tissues were classified according to when, in the progression of AD, NFTs appear in constituent neurons because NFT formation is a pathological hallmark of AD and correlates well with cell atrophy (46). In Braak stages I and II, NFT involvement is confined primarily to the transentorhinal region of the brain. In stages III and IV, limbic regions are also affected, with all regions of the hippocampus exhibiting AD pathology. In stages V and VI, there is extensive neocortical involvement. Even at late stages of AD, some regions of the brain, notably the cerebellum, remain unaffected; we classified these regions as “non-Braak.” In the original paper describing the Braak staging (11), disease stages were discussed sequentially, with the regions affected noted at each stage, in addition to the severity of the pathology in these regions.

Mapping with the Allen Brain Atlas To assign the brain regions from the Allen Brain Atlas to the correct Braak stage, a rubric was developed. First, regions mentioned in the original paper (11), which we refer to as “Braak staging” regions, were assigned to the earliest Braak stage that they are associated with. Next, these regions were matched to the regions characterized in the Allen Brain Atlas. When a region in the Allen Brain Atlas was larger than a Braak staging region, the whole of the Allen Brain Atlas region was allocated to the corresponding Braak stage. For instance, although only two isocortical layers were affected in Braak stage III, all isocortical tissues were assigned to Braak stage III because isocortical expression data were not distinguished by the layer they came from in the Allen Brain Atlas parcellation. For this reason, when investigating the relationship between Braak staging and expression signature, “perfect-match” regions provide the most accurate data. Perfect-match tissues have a high correspondence between their Braak and Allen Brain Atlas perimeters. Assignments from Braak to Allen Brain Atlas regions are listed in table S1. Of the two main types of tissue in the brain, white matter consists mostly of glial cells and myelinated axons, whereas gray matter has a more diverse composition, including neuronal cell bodies, dendrites, myelinated and unmyelinated axons, glial cells, synapses, and capillaries. Thus, because NFTs are not found in AD in the axon hillock or initial axon segment, one would not expect to see them in white matter in AD (47). However, the volume of white matter does shrink in some regions during the progression of AD, where affected neurons have their cell bodies in gray matter and their axons in white matter. This fact implies that NFT appearance, and thus Braak staging, may not be ideal for describing vulnerability to AD in white matter tissues. However, the effect of this caveat on our study is limited because only 2 of the more than 500 regions studied in the Allen Brain Atlas include white matter and is accounted for in fig. S5C.

The Δ score of the expression of a gene Because the expression of a given gene in the Allen Brain Atlas is measured by multiple probes (29), we first normalized the expression reading E p,r for each probe p in each region r in the Allen Brain Atlas as (1)where μ p and σ p are the average and SD of E p,r across all regions, respectively. To quantify the variability of gene expression between tissues, we defined a Δ score (fig. S1) for a given probe p over a brain region R (which is typically made up of several Allen Brain Atlas regions) as (2)where (3)is the average of for the non-Braak regions, that is, those N NB regions that do not map onto any Braak staging regions (11), and (4)represents the average of over the set of Allen Brain Atlas regions mapped onto brain region R under scrutiny (for example, a Braak stage). Then, Δ scores for different probes measuring the same gene were averaged to give a gene Δ score (5)where the average is over the N p(g) probes p(g) used to measure a gene g.

Δ Scores of proteins coaggregating in plaques and tangles We calculated the Δ scores for two protein sets with high aggregation propensity in AD (table S3), that is, those found to coaggregate with Aβ and tau in plaques and tangles, respectively (Fig. 1B) (23). We found the mean Δ BI–III (Braak regions I to III) scores for these two subsets to be 0.3 (P = 2 × 10−4) and 0.1 (P = 2 × 10−4), respectively, which are significantly higher than that of the whole proteome (“Proteome”).

Δ Scores of proteins associated with PD PD is a neurodegenerative disorder whose onset is characterized by the death of dopaminergic neurons in the SNPC region (35). Lewy bodies, protein aggregates primarily composed of α-synuclein (34), are a major pathological hallmark of PD. The calculated mean Δ SNPC score for Lewy bodies coaggregators (table S3) was 0.6 (P = 1 × 10−9) (fig. S4). We also tested α-synuclein, finding a Δ SNPC score of 1.1.

Δ Distribution of KEGG pathways To evaluate differential pathway expression between Braak and non-Braak tissues for Fig. 4, we used the KEGG pathways (39). The Δ BI–III scores were calculated for each KEGG pathway. The Δ BI–III distribution for KEGG pathway members hand-sorted into larger groups (for instance, neurotransmission; see table S7) was compared to that of the human proteome.

Relative expression for cell types Data were obtained from a previous mRNA sequencing study of human brain tissue (40). To evaluate the vulnerability of different brain cell types (Fig. 5), the relative expression was calculated for each gene within each cell type as where E g,c is the expression for each gene g in a given cell type c, μ g,c is the mean expression of that gene in a given cell type c, and σ g,c is the SD of expression of that gene in a given cell type c.

Calculation of the relative expression for aggregation regulators We undertook an unbiased literature search to identify all molecular chaperones and posttranslational modifiers reported to affect the aggregation of Aβ or tau (table S4). These aggregation regulators were sorted into three groups: (i) proteins that protect against aggregation (protectors), (ii) proteins that promote aggregation (promoters), and (iii) proteins whose cumulative role on aggregation was ambiguous in the current literature. Entrez IDs of the molecular chaperones and posttranslational modifiers were identified and are provided in table S4 together with the relevant references. We evaluated relative expression for proteins in groups 1 and 2 by calculating the Δ BI–III scores (Fig. 2).

Evaluation of statistical significance To assess the differences in the distributions of Δ scores between various data sets, we used the nonparametric Wilcoxon/Mann-Whitney U test (48), or a two-tailed t test (49), as specified in the figure captions. Because of the high number of data and hypotheses tested in this study, we adjusted the P values to reduce the false discovery rate (FDR). Specifically, for Figs. 1 and 3 to 5 and figs. S2 and S5 to S7, we used the Benjamini-Hochberg multiple hypothesis testing correction to control the FDR (50) because this method allows the cost paid for the control of multiplicity to be kept relatively low. More generally, from the analysis of the relationship between FDR, sensitivity, and study sample size, it is known that microarray studies can be susceptible to large FDR, which, besides measurement variability, is primarily determined by the proportion of truly differentially expressed genes, the magnitude of the true differences, and sample size (51, 52). Because our work relies on 3700 microarray studies (up to 900 samples from six brains), the FDR rate analysis was performed on a relatively large sample size, allowing for rather sensitive detection of truly differentially expressed genes. We further increased the statistical significance of the results and avoided a high false-negative rate by calculating the significance of the difference of Δ score distributions for groups of genes. In comparison to calculating the significance of the differences of Δ scores of individual genes, this approach greatly reduced the number of hypotheses in our study. These tests were performed using the SciPy and rpy2 modules for Python.