Alzheimer’s disease (AD), a progressive neurodegenerative disorder, is the most common untreatable form of dementia. Identifying molecular biomarkers that allow early detection remains a key challenge in the diagnosis, treatment, and prognostic evaluation of the disease. Here, we report a novel experimental and analytical model characterizing epigenetic alterations during AD onset and progression. We generated the first integrated base-resolution genome-wide maps of the distribution of 5-methyl-cytosine (5mC), 5-hydroxymethyl-cytosine (5hmC), and 5-formyl/carboxy-cytosine (5fC/caC) in normal and AD neurons. We identified 27 AD region–specific and 39 CpG site–specific epigenetic signatures that were independently validated across our familial and sporadic AD models, and in an independent clinical cohort. Thus, our work establishes a new model and strategy to study the epigenetic alterations underlying AD onset and progression and provides a set of highly reliable AD-specific epigenetic signatures that may have early diagnostic and prognostic implications.

DNA methylation at the fifth position of cytosine [5-methyl-cytosine (5mC)] plays an important role in neuronal gene expression and neural development. Aberrant DNA methylation is associated with many neuronal disorders, including AD ( 7 – 10 ). 5mC can be further oxidized to 5-hydroxymethyl-cytosine (5hmC), 5-formyl-cytosine (5fC), and 5-carboxy-cytosine (5caC) by the ten-eleven-translocation (TET) family of dioxygenases. The oxidized products of 5mC accumulate during brain development and early adulthood, suggesting an independent and critical role in neuronal development and function ( 11 , 12 ). Several studies have suggested that dysregulated DNA methylation/demethylation is linked to AD onset and progression ( 13 – 18 ); however, the relationship between altered levels of 5mC, 5hmC, 5fC, and 5caC and AD is not known. This is primarily due to the lack of comprehensive (base resolution) integrated reference maps of these epigenetic marks during neural cell differentiation, maturation, and brain development. It is also, in part, due to the lack of experimental AD models and research strategies to map, analyze, and define AD-specific changes of the methylome and their oxidized derivatives. Therefore, illustrating the DNA 5mC, 5hmC, and 5fC/caC methylomes of early- or late-onset AD will likely uncover unique and common epigenetic pathways, and signatures that may apply to all AD pathogenesis.

Alzheimer’s disease (AD) is clinically characterized by the accumulation of amyloid-β (Aβ) plaques and neurofibrillary tangles, synaptic and neuronal loss, and cognitive decline ( 1 ). Mutations in the amyloid precursor protein (APP), presenilin 1 (PSEN1), or PSEN2 genes lead to autosomal dominant AD, with disease onset occurring before the age of 65 ( 2 ), while the presence of two copies of apolipoprotein ɛ4 (APOE4) is associated with late-onset AD (onset >60 years) ( 3 , 4 ). Recently, large-scale genome-wide association studies have identified several additional genes involved with the onset and progression of AD ( 5 ); however, the identified genes have not had a meaningful prognostic value for the onset of AD when compared to the abovementioned genes. Altogether, these genetic studies provide a framework of AD-associated gene networks and a classic panel for genetic screening of familial AD, which accounts for less than 20% of the AD patient population ( 6 ). On the other hand, the identification of effective biomarkers for the early detection and diagnosis of sporadic AD, which accounts for the vast majority of AD cases, remains a major challenge.

RESULTS

Description of the study model and subjects The overview of our experimental paradigm is outlined in Fig. 1. We used new base-resolution mapping and analytical technologies, such as oxidative bisulfite deep sequencing (OXBS-seq) and methylase-assisted bisulfite deep sequencing (MAB-seq), to characterize dynamic genome-wide distributions of 5mC, 5hmC, and 5fC/5caC modifications. We established cell culture models of neurogenesis by directing the differentiation of induced pluripotent stem cells (iPSCs) derived from healthy control [wild type (WT)] and AD patient–derived cells to neural progenitor cells, and then to cortical neurons, as previously described (19). The cell lines include a normal cell line (WT), two early-onset familial AD (EOAD) cell lines (PSEN1 and PSEN2), and a late-onset familial AD (LOAD) cell line (APOE4), which are the strongest genetic risk factors for developing AD (5). AD cell lines either carrying mutations in PSEN1, PSEN2, or homozygous APOE4 served as “AD-in-dish” models to mimic EOAD and LOAD, respectively. To ensure generalizability of the epigenetic features and signatures of healthy controls and AD-associated epigenetic alterations found in cell culture models, we extended our epigenomic studies to postmortem brain tissues of a normal donor and sporadic AD cases. Last, we validated the AD-specific 5mC signatures defined in our study using publicly available normal and AD cohorts. Fig. 1 Schematic of our study model. iPSC, induced pluripotent stem cells; NPC, neural precursor cells; N, neurons; APOE4, apolipoprotein E isoform 4; PSEN1, presenilin 1; PSEN2, presenilin 2; WT, wild type; MAB-seq, methylase-assisted bisulfite sequencing; OXBS-seq, oxidative bisulfite sequencing; DMRs, differentially methylated regions; DHMRs, differentially hydroxymethylated regions; DFCRs, differentially formyl/carboxylated regions.

Identification of 5mC, 5hmC, and 5fC/caC landscapes during neurogenesis Pluripotency genes, such as NANOG, OCT4, and SOX2, are expressed in iPSC-WT. Expression of neural precursor cell (NPC) markers, including FOXG1, NES, and TBR2, confirmed the complete induction of iPSCs to NPCs. Subsequently, successful differentiation of NPCs to neurons was confirmed with the expression of neuronal specific marks, such as CUX1, MAP2, TBR1, and TUJ1 (Fig. 2, A and B). Similarly, all iPSCs derived from AD patients can also be differentiated to NPCs and neurons (fig. S1, A to C). Fig. 2 Global methylation trends in iPSCs and iPSC-derived NPCs and neurons in WT cell lines. (A and B). Pluripotency, neural stem cell, and neural cell markers determined by quantitative reverse transcription polymerase chain reaction (qRT-PCR) and immunofluorescence. Scale bar, 200 μm. (C) Distribution of 5mC/5hmC/5fC/caC levels in WT iPSCs, NPCs, and neurons. The y axis indicates the levels of 5mC, 5hmC, or 5fC/caC at each reference cytosine (at least 10 reads required). (D) Single-base profiles of 5mC, 5hmC, and 5fC/caC for the pluripotency gene OCT4 and the neural-specific gene MAP2 in iPSCs and the consecutive stages of iPSC-derived NPCs and neurons. Genomic coordinates: for OCT4, chr6:31,132,042-31,139,528; for MAP2, chr2:210,002,875-210,783,411. In all samples, the scale for 5mC, 5hmC, and 5fC/caC was 0 to 100%. Until recently, uncovering the exact patterns of 5mC, 5hmC, and 5fC/caC at base resolution has proven to be a challenging task due to the lack of technologies that allow us to distinguish 5mC from its oxidized modifications (5hmC and 5fC/caC) at a given single cytosine base. To circumvent this problem, we developed an in-house protocol and analytical algorithm to identify genome-wide distribution of sites and levels of 5mC, 5hmC, and 5fC/caC based on recently reported OXBS-seq (identifies 5mC and 5hmC, where the combined levels range from 0 to 100%) and MAB-seq (identifies 5fC/caC, where levels range from 0 to 100%) (20, 21). Globally, the levels of 5mC were higher in both NPCs and neurons (89.6 and 89.3%, respectively; P < 2.2 × 10−16) compared to iPSCs (82.5%), indicating that these 5mC sites tend to be more frequently methylated in the differentiated neural stages than in iPSCs (Fig. 2C, left). These findings are consistent with previously published reports (22, 23). Levels of 5hmC were lower (P < 2.2 × 10−16) in NPCs (8.3%) and neurons (7.7%) than in iPSCs (9.8%) (Fig. 2C, middle). The levels of 5fC/caC were higher in the iPSCs (36%) than in neurons (34%, P < 2.2 × 10−16) (Fig. 2C, right). We observed a surge of 5fC/caC level (48%) in NPCs, which is in agreement with previous studies in mice showing that levels of 5fC/5caC transiently increase during NPC stage and rapidly decline as cells commit to neural lineages (24), highlighting the critical roles of this modification in lineage commitment. We should note that due to its technical nature, MAB-seq is unable to distinguish between 5fC and 5caC. Therefore, the identified cytosine modification by MAB-seq reflects the total modification of 5fC/caC. Detailed description and quantification of genome-wide sites of 5mC, 5hmC, and 5fC/caC at base resolution are provided in Supplementary Text and table S1. Region- and gene-specific examples of 5mC, 5hmC, and 5fC/caC at base resolution are shown in Fig. 2D.

Disease and developmental differences in 5mC, 5hmC, and 5fC/caC profiles Using similar analytical approaches as in WT cells, we have also successfully mapped and analyzed at base resolution 5mC, 5hmC, and 5fC/caC distribution in AD patient–derived cell lines [for technical details and description of each sequencing (SEQ) dataset, see table S1]. Overall, global levels of 5mC and 5hmC in EOAD models and PSEN1 and PSEN2 cell lines followed the same patterns as the WT cells, which were characterized by gain of 5mC and loss of 5hmC in neurons compared to the iPSCs, whereas levels of 5fC/caC decreased in neurons compared to the iPSCs (Fig. 3, A and B). Intriguingly, in both PSEN1 and PSEN2 cell lines, levels of 5fC/caC failed to peak at the NPC stage as opposed to the trend in the WT NPCs. In the LOAD model cell line APOE4, we noted that 5mC levels increased during differentiation; however, the differences between iPSCs and neurons were not as prominent when compared to the WT or EOAD cell lines (Fig. 3C). Changes in 5hmC levels were also marginal between the different stages, while, similar to the EOAD models, 5fC/caC failed to peak in the NPC stage, which appears to be a common feature in all AD cell lines (Fig. 3, A to C). Furthermore, both PSEN1 and PSEN2 cell lines show similar global patterns of 5mC, 5hmC, and 5fC/caC during differentiation of iPSCs to neurons that appear to be independent of PSEN1 or PSEN2 gene mutations, but common to the EOAD model. Changes in methylation of cytosine at the base level for 5mC, 5hmC, and 5fC/caC across our different cell line models as well as during the differentiation process in normal and disease settings are depicted in Fig. 3D. Fig. 3 Disease and developmental differences in WT and AD neurons. (A to C) Distribution of 5mC/5hmC/5fC/caC levels in PSEN2, PSEN1, and APOE4 patient-derived cell lines. The y axis indicates the levels of 5mC, 5hmC, or 5fC/caC at each reference cytosine (at least 10 reads required). (D) Representative regions showing 5mC, 5hmC, and 5fC/5caC levels at single-base resolution in WT and AD-patient derived cell lines at the human leukocyte antigen (HLA) gene cluster (chr6:32,440,096-32,643,715).

Differentially methylated DNA regions demarcate key AD genes To determine the loci-specific similarities and differences of epigenetic profiles between WT and AD cells, we next performed more detailed comparative analysis of the 5mC, 5hmC, and 5fC/caC profiles between WT and PSEN2-AD cell models. We focused our initial comparative analysis between WT and PSEN2 lines, as the N141I mutation in PSEN2 has been well characterized and gives rise to autosomal dominant early-onset AD in patients younger than 65 years old (2). In both WT and PSEN2, levels of 5mC increased in NPCs and neurons compared to iPSCs (Fig. 4A, top). Gain of 5mC levels was marked by loss of 5hmC levels in both WT and PSEN2 lines during neural differentiation (Fig. 4A, middle). Of note, at any given stage, we observed that both 5mC and 5hmC of PSEN2 were slightly, but significantly, higher (P < 2.2 × 10−16) in iPSCs and neurons, but lower (P < 2.2 × 10−16) in NPCs compared to the WT lines. The dynamic changes of 5fC/caC levels in WT and PSEN2 were markedly different from those of 5mC and 5hmC during neural differentiation. The peak in 5fC/caC levels in the NPC stage as observed in our WT line appears to be an epigenetic feature that is critical in neurodevelopment (24). However, this pattern fails to be established in PSEN2 cells (Fig. 4A, bottom), suggesting an aberrant 5fC/caC methylome in the PSEN2 AD cell lines. Fig. 4 iPSCs and iPSC-derived NPCs and neurons in PSEN2 lines are characterized by distinct methylomes compared to the WT. (A) Total 5mC, 5hmC, and 5fC/caC levels in iPSCs and iPSC-derived NPCs and neurons in WT and PSEN2. The x axis indicates the levels of 5mC, 5hmC, or 5fC/caC at each reference cytosine (at least 10 reads required). ***P < 2.2 × 10−16, determined by Wilcoxon rank sum test. (B) Levels of 5mC, 5hmC, and 5fC/caC in WT and PSEN2 cell lines at TSS and gene body in iPSCs, NPCs, and neurons. For each cytosine modification, gene body was normalized to 0 to 100%. Normalized density is plotted from 5 kbp upstream of TSSs to 5 kbp downstream of TESs. (C) Stacked bars show the differentially 5mC, 5hmC, and 5fC/caC regions presented as gains and losses during directed differentiation of iPSCs to neurons in WT and PSEN2 lines. (D) Representative regions showing 5mC, 5hmC, and 5fC/5caC levels at single-base resolution in WT and PSEN2 cell lines. In all samples, the scale for 5mC and 5fC/caC was 0 to 100%, while for 5hmC, the scale was 0 to 70%. Genome coordinates for the images shown were as follows: PCDHB8, chr5:140,556,693-140,560,759; ANK1, chr8:41,731,112-41,755,347; HLA-DPA1, chr6:33,030,742-33,049,833. We next analyzed the distribution of 5mC, 5hmC, and 5fC/caC across averaged RefSeq genes in WT and PSEN2 cell lines. Profiles of 5mC in both WT and PSEN2 cells were similar at all differentiation stages, characterized by a large drop around the transcription start site (TSS) and an increase in gene body toward transcription end site (TES), dropping again around TES (Fig. 4B, top row). 5hmC profiles also showed a dip at TSS, followed by a gradual accumulation after TSS, finally surging at TES at all stages in both cell lines (Fig. 4B, middle row). On the other hand, 5fC/caC profiles are more complex than those of 5mC and 5hmC during differentiation in the two cell lines (Fig. 4B, bottom row). In iPSCs and NPCs, the profiles were characterized by a peak at TSS. Of note, in NPCs, the peak was greatly diminished in PSEN2 cells. In neurons, the peaks observed at TSS in iPSC and NPC disappear and become a dent. To identify the regions that contain differential levels of 5mC (DMR), 5hmC (DHMR), and 5fC/caC (DFCR) during differentiation (iPSCs to neurons) in both WT and PSEN2 cells, we analyzed stage-specific differences in DMRs, DHMRs, and DFCRs between WT and PSEN2 cell lines (Fig. 4C). The detailed characterization of each stage-specific group of DMRs, DHMRs, or DFCRs between WT and PSEN2 lines is depicted in the Supplementary Text. Gene ontology (GO) enrichment analyses for the genes identified in DMRs (8713), DHMRs (1052), and DFCRs (2147) revealed a significant enrichment on neurodevelopmental processes, including forebrain development (P < 10−6), synapse organization (P < 10−9), and differentiation of neurons (P < 10−8), highlighting the link between 5mC, 5hmC, and 5fC/caC methylome changes and disease-critical neural gene programs. The top 10 most highly significant gene sets ranked by P value are presented in fig. S2 (A to C). Focusing only on the pathways identified in the GO enrichment analysis that contained known AD risk genes revealed that changes in the epigenetic component overlap with diverse and critical biological pathways, including immune response, metabolism, and oxidative stress response, all of which have been shown to be disrupted in AD (table S2, A to C). Other relevant genes identified in DMRs, DHMRs, or DFCRs include ANK1, BACE2, BIN1, CLU, HLA-DPA1, and PCDHB8, which are known AD risk genes and have well-established roles in AD pathology. These findings demonstrate that changes in 5mC, 5hmC, and 5fC/caC epigenetic profiles are directly associated with a network of key AD susceptibility genes in PSEN2 AD cell models. For instance, we observed loss of 5mC at TSS and gene body of PCDHB8 at all stages in PSEN2 cells compared to the WT cells. This gene codes for neural cadherin–like protein and has been reported as an AD risk gene. ANK1 is a well-established gene with a critical role in AD pathology, and few studies have reported its epigenetic deregulation in AD (17, 25). ANK1 is generally characterized by accumulation of 5hmC in the PSEN2 model in comparison to the WT. An example from the DFCRs is the HLA-DPA1 gene. The role of HLA-DPA1 has also been studied in the context of AD, and genetic polymorphisms of this gene increase risk for AD (26). We observed stark differences in 5fC/caC levels along the entire HLA-DPA1 gene in the PSEN2 cell model compared to the WT (Fig. 4D). Collectively, our findings highlight the pivotal role of proper timing and landscape of the three DNA methylation states in regulation of AD-critical genes. Because the patterns of methylation of all states are disrupted in AD patient–derived lines at all differentiation stages, it is postulated that the dysregulation of 5mC, 5hmC, and 5fC/caC on these key AD genes may intrinsically and directly be linked to AD pathology.

Discovery of AD-specific 5mC, 5hmC, and 5fC/caC signatures To verify the aforementioned global changes in DNA methylation at all three states in additional AD cell lines, we applied the same in-depth analysis approaches to characterize DMRs, DHMRs, and DFCRs in another EOAD cell model, PSEN1, and the LOAD model APOE4. The comparison analysis identified a total of 2197 DMRs, 22 DHMRs, and 128 DFCRs, which were shared among all three AD cell models. To broaden our findings from cell culture models and to ensure that the changes of DNA methylation states are disease specific and not due to the aging process itself or a cell culture artifact, we also mapped and analyzed 5mC, 5hmC, and 5fC/caC of DNA samples derived from postmortem brain tissues of healthy donor and AD patients. Last, we overlapped our two sets of DMRs, DHMRs, and DFCRs (originating from cell lines or brain tissues), which resulted in 1447 DMRs, 7 DHMRs, and 23 DFCRs. We further extracted these shared DMRs/DHMRs/DFCRs to a final number of 71 AD-specific epigenetic signature regions located in autosomes, which were defined according to the following criteria: (i) All signature loci must be consistently presenting the same trends of gain or loss of 5mC, 5hmC, or 5fC/caC across all disease models compared to the control, and (ii) because 5mC is the most abundant modification, we applied more stringent criteria using a methylation difference of <−1 or >1 as a cutoff to reduce confounding background signal, while cutoff for 5hmC and 5fC/caC was <−0.1 or >0.1. These 71 regions were associated with 56 different genes (table S3). Because biological reproducibility is key, we applied another filter to our regional signatures based on methylation variation (using a cutoff of ≤±1 fold change for 5mC and ≤±1.5 fold change for 5hmC and 5fC/caC) between the samples we used in our study. This led to 19 signature regions for the 5mC modification, of which 17 were marked by loss of 5mC and 2 were marked by gain of 5mC in AD samples compared to the controls (Fig. 5A). Using the same approach, we narrowed down our 5hmC signature regions to five, of which four were marked by loss of 5hmC and one was characterized by gain of 5hmC in AD samples versus controls (Fig. 5B). Similarly, for the 5fC/caC group of signatures, we identified the final three signatures, where two had loss of 5fC/caC, while one had gain of 5fC/caC in AD samples compared to normal controls (Fig. 5C). Most of the 5mC signatures (8 of 19) were located in intronic regions, followed by 5 of 19 in promoters, 3 of 19 in distal intergenic regions, 2 of 19 in exons, and 1 of 19 in 3′ untranslated regions (3′UTRs). The 5hmC gene signatures were physically located as follows: one of five in distal intergenic regions and the rest were equally distributed in promoters (two of five) and 3’UTRs (two of five). All 5fC/caC (three of three) signatures were located in distal intergenic regions (table S4). The genes harboring epigenetic signatures can be classified into four major subgroups according to their diversified biological function: (i) neurodevelopment and neuronal transcription factors, (ii) critical cellular processes, (iii) RNA and associated proteins (including noncoding RNAs), and (iv) cell signaling. Additional information for each signature region is provided in table S4 and the Supplementary Text. Collectively, we have identified 27 regional signatures of the methylated DNA cytosine that are specifically associated with AD. Fig. 5 Epigenetic signatures associated with AD. (A to C). The outermost circle (1) presents chromosome ideograms (in megabases). Changes in 5mC (A), 5hmC (B), or 5fC/caC (C) levels are shown as differences of 5mC, 5hmC, or 5fC/caC levels between normal and AD. Red bars indicate gain of 5mC, 5hmC, or 5fC/caC in AD versus controls, while blue bars mark loss of 5mC, 5hmC, and 5fC/caC in AD versus normal. The second layer of the circle (2) shows genes associated with AD epigenetic signatures. Genes in red letters fall in the ultraconserved regions. Layers 3 to 7 show 5mC, 5hmC, and 5fC/caC ratios, respectively, in AD versus control: (3) PSEN2 neurons/WT neurons, (4) PSEN1 neurons/WT neurons, (5) APOE4 neurons/WT neurons, (6) AD–frontal lobe/normal brain, and (7) AD–left frontal lobe/normal brain. (D) 5mC, 5hmC, and 5fC/caC signature regions narrowed down to individual CpG sites. Using methylation variation difference of ≤±1 fold change for 5mC and ≤±1.5 fold change for 5hmC and 5fC/caC, we identified 19 regions for 5mC signatures, 5 regions for 5hmC signatures, and 3 regions for 5fC/caC signatures. We then further filtered our 5mC, 5hmC, and 5fC/caC signatures by using methylation difference of ≤−0.1 or ≥0.1 fold change as a final cutoff. This yielded 27 CpG sites in the 5mC signatures, of which 21 and 6 CpG sites were marked by loss or gain of methylation, respectively, in AD samples versus controls. In the 5hmC group, we identified three CpG sites and one CpG site, which were marked by loss or gain of 5hmC, respectively, in AD samples compared to controls. In the three signature regions of 5fC/caC, we identified an equal number of CpG sites (four in each group) that were marked by gain or loss of 5fC/caC levels in AD versus controls. Lastly, because our approaches allow us to investigate these cytosine modifications at single-base resolution, we next aimed to narrow down our regional signatures to individual CpG sites based on the following criteria: (i) identify the CpG sites that overlapped in the signature regions in all study samples, (ii) sort CpG sites that have the same trends of methylation (gain or loss) in AD versus controls, and (iii) apply cutoff of methylation fold-change difference of ≤−0.1 and ≥0.1 to avoid false positives. This led to a total of 39 AD-specific CpG sites that were consistently modified in early-onset, late-onset, familial, and sporadic AD models, which contain 27 5mC-, 4 5hmC-, and 8 5fC/caC-specific sites (Supplementary Text, Fig. 5D, and table S5). Finally, most of our 5mC, 5hmC, and 5fC/caC signatures were directly associated with genes involved in neuronal functions and/or have been implicated in AD.