Investigators have long suspected that pathogenic microbes might contribute to the onset and progression of Alzheimer’s disease (AD) although definitive evidence has not been presented. Whether such findings represent a causal contribution, or reflect opportunistic passengers of neurodegeneration, is also difficult to resolve. We constructed multiscale networks of the late-onset AD-associated virome, integrating genomic, transcriptomic, proteomic, and histopathological data across four brain regions from human post-mortem tissue. We observed increased human herpesvirus 6A (HHV-6A) and human herpesvirus 7 (HHV-7) from subjects with AD compared with controls. These results were replicated in two additional, independent and geographically dispersed cohorts. We observed regulatory relationships linking viral abundance and modulators of APP metabolism, including induction of APBB2, APPBP2, BIN1, BACE1, CLU, PICALM, and PSEN1 by HHV-6A. This study elucidates networks linking molecular, clinical, and neuropathological features with viral activity and is consistent with viral activity constituting a general feature of AD.

This study presents novel evidence linking the activity of specific viruses with AD. This has been enabled by comprehensive molecular profiling of large patient cohorts, facilitating the integration of diverse biomedical data types into an expansive view spanning multiple disease stages, brain regions, and -omic domains. This has also allowed us to direct our analysis in an entirely data-driven manner and benefit from a form of data capable of implicating specific viral species. Our results offer evidence of complex viral activity in the aging brain, including changes specific to AD, particularly implicating Herpesviridae, HHV-6, and HHV-7. Taken together, these data provide compelling evidence that specific viral species contribute to the development of neuropathology and AD.

We designed this study to map and compare biological networks underlying two distinct AD-associated phenotypes using multiple independent datasets collected from human subjects. We began with a computational network characterization of a specific endophenotype of AD: brains meeting neuropathological criteria for AD from individuals who were cognitively intact at the time of death (), which we refer to here as “preclinical AD” (). We presumed that a network model of preclinical AD (and its comparison with networks built from cognitively intact persons without neuropathology) might provide novel insights into the molecular context of neuropathology in the absence of clinical symptoms. Alternatively, since these individuals had eluded cognitive decline despite significant AD pathology, we reasoned that this might illuminate protective or resilience mechanisms. Functional genomic analysis of preclinical AD network alterations revealed multiple lines of evidence consistent with viral activity. We then directly evaluated viral activity in a multiscale network analysis of four large, multi-omic datasets (comprising samples from individuals with “clinical AD” as well as neuropathologically and cognitively normal controls) that included next generation sequencing data, enabling direct examination of viral DNA and RNA sequences.

Recent reports demonstrate that diverse classes of microbes can stimulate amyloid-beta (Aβ) aggregation and deposition as part of an intra-CNS anti-microbial innate immune response whereby the amyloidosis triggered by various microbes results in the coating of the infectious particles by the growing amyloid aggregate (). These microbes coated with aggregated Aβ become unable to interact with cell surfaces, thereby arresting the infectious process.

Beginning with Crapper McLachlan in 1980 (), several investigators have proposed that AD is an SSPE-like illness, caused by a slow virus form of herpes simplex (). Hundreds of reports have associated AD with diverse bacterial and viral pathogens (), most frequently implicating Herpesviridae (particularly HSV-1 [], EBV, HCMV, and HHV-6 []). The results of these studies, taken in aggregate, are suggestive of a viral contribution to AD, though findings offer little insight into potential mechanisms, and a consistent association with specific viral species has not emerged.

Important roles for microbes and antimicrobial defenses in the pathogenesis of Alzheimer’s disease (AD) have been postulated or evaluated for at least six decades, beginning with Sjögren in 1952 (). “Slow virus” was one of the early names used for the illness that eventually came to be known as prion disease, referring to the hypothesis that conventional viruses might be capable of acting to cause not only acute encephalitis, but also a progressive neuronal destruction process that might engender less inflammation because of its slowly progressive nature (). Measles (MV) is a conventional virus that can act through acute inflammatory and slow neurodegenerative processes, occasionally re-emerging as a fatal brain disease known as subacute sclerosing panencephalitis (SSPE) up to a decade after a typical acute MV infection ().

We aggregated the results across several analyses to summarize the systems-level impact of individual viral species on AD biology ( Figure 8 A). This included multiregional viral RNA differential abundance, multiregional correlations of RNA abundance with AD traits, associations between vQTL sets and AD genetics, viral DNA differential abundance, and AD risk gene enrichment scores for virus-host subnetworks. Viral species implicated in any of these analyses were assigned a combined score based on the summed -log10(p value) of each individual association. This view indicated that multiple viruses impact on AD associated biology, across multiple -omic domains. The most strongly implicated viruses, were Roseoloviruses HHV-6A, HHV-7, and HHV-6B. HHV-6A in particular was robustly prioritized on the basis of: (1) RNA and (2) DNA differential abundance, (3) RNA abundance association with AD traits, (4) vQTL markers association with AD traits, (5) HHV-6A/host network eQTL enrichments for AD risk loci, and (6) inducing the expression of a significant fraction of AD risk genes within the HHV-6A/host network.

(B) Findings from this study indicate complex relationships between viral and host factors that are likely to be relevant across a range of timescales and organ systems. Key biological processes that have been highlighted are shown, along with top candidate molecular mediators.

These results indicate novel molecular mediators and associated pathways that might help shed light on mechanisms of viral pathogenicity, such as a potential role for AARSs in HHV-6A co-option of host protein synthesis machinery. These findings may also suggest novel molecular mediators and mechanisms for more widespread changes we have observed, such as dysregulation of G4 activity.

We also found that HHV-6A-induced expression of multiple AARS enzymes (VARS and SARS), responsible for charging their cognate tRNAs with valine and serine, respectively. Increased d-serine has been reported in the CSF of patients with AD, as well cellular and mouse AD models (). Some viruses possess endogenous tRNA and AARS sequences, which appear critical to viral protein synthesis throughout the infectious cycle (). Multiple viruses recruit host tRNAs into virions for use as primers in reverse transcription () and some such as HIV also selectively incorporate host AARSs as well (). Recent works have also demonstrated the role of tRNA in shifting the G4 conformational equilibrium toward a hairpin conformer ().

We constructed virus-host protein networks to evaluate proteomic consequences of viral activity in AD. We generated protein expression profiles for a subset of APFC samples (n = 152), performing liquid-chromatography-mass spectrometry (LC-MS) and using MaxQuant () for quantifying label free protein. Using a similar procedure as outlined for generation of virus-host RNA networks, we identified proteins that are associated with vQTLs and that are regulated by (or that regulate) each virus. Of the viruses that we found as differentially abundant in the APFC RNA data, we detected interactions with host proteins for HSV-1 (14 proteins), HSV-2 (34 proteins), and HHV-6A (28 proteins) ( Figure S3 Table S11 ). Protein regulators of cellular nucleotide pools, especially purine biosynthesis (NUDT16 and GMPR2), guanine nucleotide binding proteins (GNAS, GNAO1, GNG3, and GNG5), aminoacyl-tRNA synthetases (SARS, VARS, and AARS), mitochondrial function (MT-ND5 and MFF), nuclear organization (NCL and LMNA), and cytoskeletal disruption (CAMK2D, LMNA, MYLK, PRKCB, and TF) are among the most prominent biological themes of the networks. For instance, we found that several viruses alter expression of nucleotide regulating proteins, including HSV-2 induction of reductase enzyme GMPR2, which catalyzes conversion of G to A nucleotides, and HHV-6A induction of inosine diphosphatase NUDT16, which depletes the cellular pool of non-canonical purines IDP and ITP. Collectively, this suggested a picture of virally induced dysregulation of nucleotide pool metabolism, especially purine bases, consistent with several metabolomics studies in AD (). This is notable, given our observations of G4 activity as key regulatory features among preclinical AD driver genes, which are primarily features of guanine-enriched sequences. Nucleotide pool depletion or imbalance is known to induce replicative stress, mediated through inadequate unwinding of stabilized G4 sequences, with associated genomic and epigenetic instability ().

These observations indicate the potential for viral perturbation of host TFs and TF-Target networks in the context of AD and offer an explanatory mechanism that could account for the large number of virus-host interaction detected for species such as HHV-6A and HSV-2. The described kinase-TF enrichments represent a potential upstream mechanism whereby HHV-6A modulation of kinase activity could alter the activity of specific endogenous TFs, thus perturbing host regulatory programs in the manner reflected in the HHV-6A host networks.

This unanimous virus/TF network ( Figure 7 D) includes 26 associations between four different viruses (HSV-2, HHV-6A, HCMV, and KSHV) and 14 TFs. The majority of connections (20 of 26) indicate associations where a viral feature exerts an agonist-like effect on TF targets. We found multiple virus-TF interactions where the TF was also directly detected as being regulated by the virus, including agonist-like interactions between HHV-6A and OLIG1, TCF12, SOX8, and ZEB2. In each of these cases, the TF was upregulated by the virus, consistent with the agonist-like associations inferred from the TF network enrichments. We hypothesized that a potential mechanism to link viral perturbation of multiple TF networks might be mediated through viral mimicry or modulation of host kinase activity (). We compared the set of four HHV-6A regulated TF (that were also directly detected as regulated by HHV-6A) with a library of kinase coexpression networks () and identified significant (FDR < 0.1) enrichments with seven different human kinases ( Figure 7 E). This includes four kinases that were also detected as directly upregulated by HHV-6A. For example, Neurotrophic Receptor Tyrosine Kinase 2 (NTRK2) expression is linked with neuronal survival in AD () and associated with AD risk (). FYN tyrosine kinase, linked with synaptic dysfunction in AD via a range of Aβ- and/or tau-related mechanisms (), is also bound directly by the HHV-6A U24 protein potentially disrupting interactions with endogenous ligands ().

Within each virus-host network, we compared the “virus to host” genes with the target genes for each TF, across all seven TF-target networks to identify significantly (FDR < 0.1) associated virus/TF pairs ( Figure 7 A). For each enriched pair, we examined the concordance of effect that the virus and TF demonstrate on the genes driving the enrichment ( Figure 7 B). We calculated the Pearson’s correlation between the individual virus-host gene correlations and the TF/target gene correlations to identify instances where the virus and TF both exerted a similar effect (“TF agonist-like,” Pearson Corr > 0, FDR < 0.1), and instances where they exerted an opposing effect (“TF antagonist-like,” Pearson Corr < 0, FDR < 0.1). We then identified the set of virus/TF pairs that were associated in all seven TF-networks and that demonstrated a consistent status as TF agonist-like/antagonist-like in each case ( Figure 7 C), reasoning that these “unanimous virus/TF associations” would represent the most robust candidates for considering viral perturbation of a TF network.

Given our earlier findings of widespread changes in C2H2-TF in the preclinical networks, we were interested in determining whether the virus-host networks might indicate viral targeting (directly or indirectly) of specific TFs ( Figure 7 ). We constructed a diverse collection of TF-target networks, generated from the MSBB, MAYO, and ROSMAP consortium data, using the Transcriptional Regulatory Network Analysis (TReNA) approach comprising seven TF-target networks in total (reflecting tissue-specific networks within each cohort), which collectively model transcriptional relationships between 569 unique TFs and 14,583 target genes.

(E) Kinase enrichment analysis of the most strongly implicated TF identified several kinases that we also detected as regulated by HHV-6A, with known associations to AD and HHV-6A, indicating a potential mechanism for viral co-option of TF networks in AD.

(B–D) Examination of the concordance of effect exerted by associated virus and TF upon target genes (B), and summarization of results across all TF-target networks (C) to identify unanimous virus/TF associations (D).

Collectively, these findings support the role of miR-155 as a key node in host response to AD-relevant viral perturbation, and as a potential mediator of neuronal loss. This is also consistent with a contribution of viral perturbation in driving the preclinical AD transcriptional phenotype given that our prioritization of miR-155 was informed by findings in the preclinical AD networks. The finding that miR-155-KO causes increased Aβ plaque deposition in the presence of APP/PS1 mutations also suggests a pathway linking viral perturbation with AD-associated neuropathology. This line of evidence is also consistent with findings linking viral infection with antimicrobial innate immune response, proamyloidogenesis, and microbe entombment ().

Given the context of HHV-6A-induced inhibition of miR-155 within the NLN, we wondered whether we would observe transcriptomic evidence for mechanisms associated with neuronal loss. Molecular and functional enrichments of the miR-155-KO DEGs ( Table S10 ) highlighted multiple enrichments for apoptosis regulatory pathways and neuronal signature gene sets ( Figure 6 N).

Given the convergence of multiple analyses upon miR-155 ( Figures 6 G and 6I), we crossed miR-155-KO mice () with a standard APP/PS1 amyloidosis strain () to evaluate the effect of miR-155 depletion on molecular and tissue pathology. At age 4 months, we observed that the brains of miR-155-KO/APP/PS1 mice displayed larger, more frequent cortical amyloid plaques ( Figures 6 K and 6L) and a higher level of Aβ42 (p value: 0.03, Mann-Whitney test) as compared with APP/PS1 mice. We hypothesized that if HHV-6A was inhibiting the expression of miR-155, then this might cause the de-repression of certain miR-155 mRNA targets ( Figure 6 M). We generated RNA-seq on cortical samples comparing miR-155-KO versus wild-type mice to identify the differentially expressed genes (DEG, FDR < 0.1, Table S10 ) and found a significant overlap between upregulated DEGs, and the set of host genes we had identified as upregulated by HHV-6A (FDR < 0.03), suggesting that some component of our detected HHV-6A network is mediated through an effect on miR-155, or a viral ortholog of miR-155 (which has not been described for HHV-6A).

These findings show that computational deconvolution of RNA-seq in AD recapitulates the expected neuronal loss. The negative correlations between STG neuronal fraction and HHV-6A sequences (that are also at increased abundance in the STG), in combination with the findings of causal testing, and the association of the NLN cis-eQTLs with AD risk are consistent with HHV-6A exerting an effect on neuronal fraction in AD.

We were also interested to find that MIR155 Host Gene (MIR155HG) was within the HHV-6A NLN and associated with multiple low AD risk p value eQTLs. We had prioritized miR-155 during our analysis of the preclinical AD networks ( Figures 6 H and 6I, also Table S1 ) due to strong associations between its mRNA targets and a variety of multiregional AD and preclinical AD transcriptomic changes as well as “gained in preclinical AD” drivers. In addition to diverse associations with viral biology including EBV (), HHV-6A (), KSHV (), and MDV (), miR-155 has also been reported as a regulator of T cell response in AD (), a mediator of inflammation-induced neurogenic dysfunction and apoptosis (), and an effector of TREM2-APOE regulation of a microglial neurodegenerative phenotype (). Reports of viral tropism have also demonstrated that HHV-6 and HHV-7 can infect microglia (), and macrophages (). In the NLN, we found that HHV-6A suppressed miR-155, as described in recent reports of miR-155 inhibition by HHV-6A in infected T cells ().

The HHV-6A virus level NLN includes many genes with roles in cellular viral response, as well as associations to biological mechanisms that could help account for neuronal death, including upregulation of Poly(ADP-Ribose) Polymerase 1 (PARP1) by HHV-6A (and negative regulation of neuronal fraction by PARP1), consistent with its reported activation by EBV (), HIV (), and HSV-1 (), and induction of caspase-independent apoptosis in the latter (). AD risk-associated haplotypes for PARP1 have also been reported () and suggested as a mechanism mediating cell death following cytotoxic response (). The NLN gene with the lowest eQTL AD risk p values ( Figure 6 G) is N-acylethanolamine acid amidase (NAAA), a close homolog of Acid Ceramidase (AC), suggested to regulate neuronal apoptosis in AD ().

We applied causal inference testing to evaluate the hypothesis that abundance of HHV-6A exerts a significant effect on neuronal fraction in the STG and detected multiple instances consistent with the HHV-6A virus and the HHV-6A U3/U4 gene mediating such an effect ( Figures 6 E and 6F, Table S9 ). We reasoned that part of this effect might be through the direct impact of viral products on biological mechanisms underlying the neuronal loss; however, there might also be informative, indirect effects mediated through a host subnetwork (i.e., host genes regulated by HHV-6A, and which also causally regulate neuronal fraction). We iterated over the set of host genes we detected as regulated by each HHV-6A sequence in the STG, and for each gene, tested whether it also correlated with neuronal fraction, and whether the vQTL marker for that host gene was associated with neuronal fraction in the STG. We performed causal inference testing and detected genes that are regulated by HHV-6A virus (121 genes), HHV-6A unique region (86 genes), and HHV-6A U3/U4 (6 genes) and that exert an effect on neuronal fraction ( Figure 6 G, Table S9 ). To determine whether these “neuronal loss networks” (NLN) might overlap with AD GWAS risk loci, we used the VEGAS approach described above and found a significant enrichment within the cis-eQTL of the HHV-6A virus level NLN (FDR < 7e-3).

We found multiple correlations between cell fractions and viral RNA abundance ( Figure 6 C, Table S9 ), with associations detected in the APFC, STG, and IFG, collectively implicating all cell types with five viruses. Only a single virus (HHV-6A) was associated with cell fraction changes in multiple tissues (APFC and STG; the same two tissues in which HHV-6A was observed at increased abundance in AD), although the specific cell profiles were different between tissues. We were most interested in focusing on viruses or viral genes that are negatively correlated with neuronal fraction, particularly if they were also detected as differentially abundant in that same brain tissue ( Figure 6 D). Only HHV-6A met these criteria demonstrating strong negative correlation with neuronal fraction in the STG, as well as being highlighted in the viral RNA analyses of the MSBB, MAP, MAYO TCX cohorts, and meta-analysis.

The most consistent AD-associated change ( Table S8 ) was a significant (FDR < 0.1) decrease in neuronal fraction, and an increased endothelial cell fraction in the PHG, IFG, and STG, which we confirmed using an alternative cell signature enrichment method, xCell () ( Table S8 ). We observed strong negative correlation between neuronal fractions and CDR in all four brain tissues, and with Braak score and neuritic plaque density in three (PHG, IFG, STG) ( Figure 6 B, Table S8 ).

Progressive neuronal dysfunction and eventual death is a hallmark feature of AD and is the primary driver of the striking cortical atrophy associated with the disease. Diverse findings implicate aberrant regulation of apoptosis (), necroptosis (), and autophagy () although findings are conflicting and a unified understanding of the mechanisms underlying neuronal loss is lacking. Given the potential for viral infection to modulate cellular death pathways through these mechanisms and others (), we were interested in understanding whether viral abundance might be associated with specific brain cell-type fractions within our dataset, particularly neurons. We used CIBERSORT () to deconvolute each MSBB RNA-seq sample into estimated fractions for major brain cell types (neurons, astrocytes, microglia, endothelial cells, and oligodendrocytes), based on comparison with a reference panel of single-cell RNA-seq in cortical samples from neurologically normal, middle-aged adults () ( Figure 6 A). Our goal was to identify significant associations between estimated cell fractions, AD traits, and viral abundance.

(M) We hypothesized that HHV-6A inhibition of miR-155 might cause a disinhibition of miR-155 targets. Cortical RNA-seq of miR-155-KO versus WT mice demonstrated significant overlap between genes upregulated (FDR < 0.1) in the absence of miR-155, and host genes detected as upregulated by HHV-6A in the virus/host networks.

(H and I) We initially prioritized miR-155 during our analysis of the preclinical AD networks, due to strong associations between known miR-155 gene targets and a variety of multiregional AD and preclinical AD transcriptomic changes, as well as differential drivers of the preclinical AD network.

(G) Neuronal loss network of host genes regulated by HHV-6A, which also exerts an effect on neuronal fraction, including MIR155HG. Gene label size is inversely proportional to the smallest AD risk p value for its associated cis-eQTL.

To investigate the broader context of how virus-host interactions might overlap with AD genetic risk, we integrated cis-eQTL data with AD GWAS summary statistics ( Figures 5 E and 5F). Our hypothesis was that identifying virus-host networks that are enriched for AD GWAS loci could provide a natural means to prioritize the relevance of individual viruses to AD, as well as provide useful functional context for specific viruses. Our approach was to use cis-eQTL identified in any of the four brain tissues within the MSBB data, to define the set of markers that are associated with the host genes in each virus/host network (“virus network eQTLs”) and then determine whether virus network eQTLs are enriched for AD risk-associated loci () using the versatile gene-based association study (VEGAS) approach (). We observed that multiple viruses have host networks that are enriched for AD risk-associated eQTLs ( Figure 5 G), most consistently for HHV-6A, but also HCMV, HSV-1, Aravan, HHV-6B, and VZV. This suggests that loci that alter the expression of genes that regulate, or are regulated by specific viruses, are in aggregate associated with AD risk.

We evaluated the set of genes causally regulated by each virus, against a set of known AD-associated genes, including risk genes for early- and late-onset AD, as well as AD associated traits (such as β-amyloid plaque density, rate of disease progression, neurofibrillary tangle density) from multiple human genetics disease resources (). We found that multiple viruses interact with AD risk genes. HHV-6A stood out as notable with significant overlap (FDR < 3e-3) between the set of host genes it collectively induces across all tissues and AD-associated genes ( Figure 5 D, Table S7 ). This includes several regulators of APP processing and AD risk-associated genes, including gamma-secretase subunit presenilin-1 (PSEN1), BACE1, amyloid beta precursor protein binding family B member 2 (APBB2), Clusterin (CLU), Bridging Integrator 1 (BIN1), and Phosphatidylinositol Binding Clathrin Assembly Protein (PICALM). We also found that several other viruses regulate, or are regulated by, AD risk genes, including: (1) HAdV-C-induced expression of Complement Receptor 1 (CR1), and inhibition of Solute Carrier Family 24 Member 4 (SLC24A4), (2) inhibition of KSHV by Fermitin Family Member 2 (FERMT2), and (3) inhibition of HSV-2 by Translocase of Outer Mitochondrial Membrane 40 (TOMM40). These findings indicate multiple points of overlap between virus-host interactions and AD risk genes.

Host genes that are most commonly perturbed by viruses are shown ( Figure 5 C), ranked according to the number of unique viruses that we detected perturbing that gene across all four regions surveyed. This includes several genes implicated in regulation of APP processing and AD, including β-site amyloid precursor protein cleaving enzyme 1 (BACE1), FYN Proto-Oncogene, Src Family Tyrosine Kinase (FYN), and Peroxisome Proliferator Activated Receptor Gamma (PPARG). Several of these genes are also associated with pro- and anti-viral signaling, including positive regulation of interferon-λ1 genes by FYN during viral infection (), negative regulation of viral replication by PPARG (), and promotion of viral translation by SRSF6 ().

We generated virus-host gene regulatory networks to understand the biological context of viral activity across samples. We constructed informative models by detecting the set of host genes that are correlated with vQTL/virus pairs within each tissue, iterating over each candidate trio (each trio comprising a vQTL marker, virus feature abundance, and host gene expression), and then testing the mathematical conditions required to demonstrate causal mediation of an association between a DNA marker and a trait () ( Figure 5 A). We built tissue-specific virus-host gene networks for all detected viral features ( Table S6 ) identifying interactions in all four brain tissues ( Figure 5 B), comprising a total of 4,110 “virus to host” interactions, and 2,255 “host to virus” interactions, collectively associating 16 different viral species with 4,929 host genes. Only three viruses had detected interactions in all four tissues: HSV-1, HSV-2, and HHV-6A. Several viruses only had interactions detected in a single tissue: Aravan Virus, HCV-2, Variola Virus, and Wesselbron Virus, which may reflect higher regional tropism for the species or the potential for contamination to be driving the detection of these viruses in a subset of samples.

These results indicate a significant overlap between the genetic basis for AD traits, and the abundance of specific viral species and viral genomic features. This supports our broader hypothesis that viral activity plays a role in the development and progression of AD and is consistent with a role for HHV-6A in particular.

We employed the sequence kernel association testing () approach to understand whether vQTLs are genetic risk markers for AD status, clinical dementia rating or neuropathology (AD traits) ( Figure 4 D, Table S5 ). Iterating over each viral gene feature, we combined vQTL from all regions into a single set, and calculated a P value associating each vQTL set with each AD trait. We detected multiple viral features with significant associations to AD traits ( Figures 4 E and 4F), supporting the hypothesis that DNA variants that predispose to AD also predispose to abundance of key viral species—most notably HHV-6A. We also found multiple viral genes with vQTL sets that associate with multiple AD traits, including the HHV-6A unique region and the HSV-1 Neurovirulence protein ICP34.5.

We hypothesized that cis-eQTL associations could represent a potential mechanism linking vQTL with viral abundance, whereby a vQTL might alter the expression of a nearby host gene that has the potential to impact, or be impacted by viral abundance. We performed a cis-eQTL analysis across this same cohort, detecting significant associations (FDR < 0.1) between host gene expression, and markers within 1 MB of gene boundaries. Thirty-five percent of the unique vQTL markers (263 of 747) were associated with at least one cis-gene, including several with associations to AD and other dementias, for instance, rs4942746 is a vQTL for HSV-1 LAT and is also a cis-eQTL for Integral Membrane Protein 2B (ITM2B), an endogenous inhibitor of Aβ aggregation and associated with familial British () and Danish () dementia.

We identified 1,672 vQTL associations across the four regions assayed (APFC: 883, STG: 479, PHG: 175, IFG: 135). This represented 747 non-independent vQTL markers that collectively associate with 16 separate viruses. The viruses with the largest number of separate vQTL markers were human adenovirus-A (HAdV-A), HHV-6A, HSV-2, and HSV-1 (222, 103, 91, and 87 markers, respectively). The vQTL associated with the most viruses and regions ( Figure 4 B) (rs71454075) falls within the glycoprotein Mucin 6, Oligomeric Mucus/Gel-Forming gene (MUC6), expressed particularly in gastric epithelium, and with cytoprotective roles against pathogens, acids, and proteases (). The genes that collectively overlapped vQTLs across the most regions and viruses ( Figure 4 C) indicated biological themes that plausibly relate to individual variability in virome composition, including mucosal immunity (MUC6 and NTRK1), innate immunity, and anti-viral sensing (ISG20L2, MORC3, NTRK1, and TRIM7).

We identified host DNA markers () across all four brain regions with paired DNA and RNA samples (APFC: n = 174, STG: n = 86, PHG: n = 80, IFG: n = 147) that significantly associated with normalized viral abundance, for any viral species detected within that region ( Figure 4 Table S4 ). DNA markers with a permutation-based FDR < 0.25 were classified as vQTLs for that specific viral gene in the context of that region.

Given the prolonged preclinical course of AD, a primary question for us was whether these viral species represent a truly informative, causal component of AD or instead reflect an “opportunistic passenger” of a neurodegenerative process driven by other factors. To help address this, we integrated WES data with RNA-seq for each donor within the MSBB. Our goal was to identify host DNA variants that significantly associate with viral abundance, which we refer to as “viral quantitative trait loci” (vQTL). These vQTL might then be used in a causal inference paradigm () to resolve directed regulatory interactions in virus-host networks and, for the fraction of vQTLs that are also AD risk-associated, evaluate whether viral abundance is an authentic risk factor for AD. Causal inference approaches like this have been used to elucidate molecular networks impacted by DNA loci across biological contexts as diverse as cardiometabolic disease (), COPD (), and meditation ().

We extended our analysis to identify significant associations between virus level and viral gene level RNA abundance and AD-relevant clinical and neuropathological traits (“AD traits”) ( Figure 3 Table S3 ), including a consensus-based clinical dementia rating score () (CDR), multi-regional neuritic amyloid plaque density () (APD), and Braak and Braak score () (Braak score). We identified several viral genes that significantly associate (FDR < 0.1) with multiple AD traits, including the HHV-7 DR1 gene and HHV-6A unique region, both demonstrating a positive association with all three AD traits within the APFC.

We looked for evidence of viral DNA in whole-exome sequencing (WES) data that was generated on STG samples (n = 286) in the MSBB cohort, applying a similar procedure to that used in evaluating viral RNA abundance ( Figure S1 ). We detected viral DNA for multiple viruses, and identified an increased abundance of HHV-6A ( Figures 2 C and 2D, Table S2 ). This was primarily due to reads mapping to HHV-6A Region 8009-151234, which comprises the “unique” region of HHV-6A (and is consistent with our findings in the RNA sequences). Chromosomal integration of HHV-6A into host subtelomeric regions is well described (), occurring via a mechanism involving homologous recombination between telomeric repeats and the DR. Excision and reactivation of integrated HHV-6A is associated with preferential loss of the entire DR, facilitating viral circularization and rolling circle replication (). This may indicate that the HHV-6A DNA that we find as more abundant in AD reflects HHV-6A that has undergone reactivation from a chromosomally integrated form, although we have not evaluated this directly.

We utilized the PA and PSP samples available within the MAYO TCX cohort to perform comparisons against these additional neurodegenerative disorders, to understand whether our findings were specific to AD or perhaps reflect a more general feature of neurodegenerative processes. In a comparison of AD versus PA, we found an increased abundance of HHV-6A and HHV-7 ( Figure S2 ). When we compared AD versus PSP, we found increased HHV-7 in AD, but reduced HHV-6A. Taken together, these observations suggest that elevated HHV6-A and HHV-7 are not ubiquitous features of neurodegenerative disease, although HHV-6A may also be relevant to other diseases such as PSP.

To understand whether our observations of altered viral abundance in AD would also be preserved in additional cohorts, we incorporated post-mortem brain RNA-seq data from three additional, independent consortium studies: (1) Religious Orders Study(ROS), a longitudinal clinico-pathological study comprising 300 samples from the dorsolateral prefrontal cortex (DLPFC) of individuals with AD and healthy controls, (2) Memory and Aging Project () (MAP) comprising 298 samples from the DLPFC of individuals with AD, as well as healthy controls, and (3) Mayo Temporal Cortex () (MAYO TCX) comprising 278 samples from the temporal cortex of individuals with AD, pathological aging (PA) (), progressive supranuclear palsy (PSP), and healthy controls. Our goal was to process these additional samples using the procedure described above and integrate the results ( Table S2 ) into a meta-analysis of viral abundance in AD ( Figures 2 E and 2F). Our main finding was a consistently increased abundance of HHV-6A and HHV-7, driven mainly by the “unique” region of each virus (the full viral sequence excluding the ∼15 kb flanking DRand DRterminal repeats). We also observed an increased abundance of the HSV-1 latency-associated transcript (LAT), supporting an increased rate of HSV-1 infection (latent or otherwise) in AD. These additional validations support the main findings originally observed in the MSBB cohort, that multiple viruses are at increased abundance in AD, across multiple brain tissues, with prominent roles for Roseoloviruses HHV-6A and HHV-7.

We identified differential abundance of multiple viral species in the APFC and STG ( Figure 2 A, Table S2 ). The most consistent difference we saw was an AD-associated increase in the abundance of two closely related Roseoloviruses, HHV-6A and HHV-7, across the APFC and STG. The third Roseolovirus, HHV-6B had a discordant profile with increased abundance in AD in the STG, and reduced in the APFC. Viral gene-level differential expression ( Figure 2 B, Table S2 ) identified increased abundance across the APFC and STG regions of the HHV-6A U3/U4 genes (positional homologs of human cytomegalovirus [HCMV] US22, with transactivating effects on other viral species []) and the HHV-7 direct repeat terminal gene, DR1.

Multiregional comparison between AD versus controls of viral RNA and DNA, summarized to the level of full viral sequences (A and C) and viral genomic features (B and D). Meta-analysis of differential abundance of viral RNA in AD (E and F), incorporating post-mortem brain RNA-seq data from the MSBB, ROS, MAP, and MAYO TCX consortium studies also revealed increased HHV-6A and HHV-7 in AD. p values shown in cells with FDR < 0.1, features with a meta-analysis FDR < 0.1 shown in (E) and (F).

We estimated viral abundance at two levels. We summarized RNA reads to the level of the entire viral sequence, with the aim of estimating “total viral transcription.” We then summarized the RNA reads to the level of individual genomic features based on counting reads that overlap any of the genomic features included in the NCBI annotations for that viral sequence (). Throughout this study, we have evaluated viral abundance according to these two levels separately.

To evaluate differential viral abundance in AD, we initially performed RNA-seq on samples from a cohort of AD and Control brains from the Mount Sinai Brain Bank (MSBB) and quantified differences in viral sequences between groups. We began by profiling MSBB sample transcriptomes across four brain regions, (superior temporal gyrus [STG], n = 137, anterior prefrontal cortex [APFC], n = 213, inferior frontal gyrus [IFG], n = 186, and parahippocampal gyrus [PHG], n = 107), which we used to quantify the presence and abundance of 515 viral species known or suspected to infect humans as a primary host (). We applied a viral mapping approach ( Figure S1 ), based on a modified ViromeScan workflow (), optimized for detection specificity, rather than sensitivity to allow us to discriminate between viruses with highly homologous regions, and to ensure we were not falsely including human-derived transcripts when summarizing viral abundance.

Patterns of miRNA target enrichments identified by the differential network analysis of preclinical AD versus CON, as well as multiregional differential gene expression changes in clinical AD samples, offered an additional line of evidence for virally mediated network activity. We looked for significant overlap between experimentally validated gene targets of human miRNAs () and multiregional differential gene expression signatures from preclinical AD samples (), clinical AD samples (), as well as “Lost in preclinical AD” and “Gained in preclinical AD” drivers. This identified a number of miRNA with gene networks overlapping many of these AD contexts, particularly miR-155, a multifunctional miR with associations to malignancy, innate immunity, and DNA virus activity. Interactions between miR-155 and viral biology are well established, including perturbation of miR-155 by EBV to stabilize viral latency (), inhibition of miR-155 by HHV-6A, () and coding of a miR-155 functional ortholog by Kaposi’s sarcoma-associated herpesvirus (KSHV) () and Marek’s Disease Virus (). Considering these findings, we sought to directly evaluate viral DNA and RNA sequences in the context of clinical AD. We performed this investigation using four large, independent multiomic datasets from individuals with clinical AD as well as neuropathologically and cognitively normal controls.

Identification of strong C2H2-TFs and G4 functional patterns in the differential network analysis suggested a potential role for virus-mediated network activities in AD ( Figure 1 D, see Table S1 ). Enrichments among the network drivers (gained and lost) in preclinical AD implicated viral infection susceptibility risk genes, as well as host differential gene expression changes associated with viral infection. We also noted findings around C2H2-TF and G4 sequences that are implicated in a range of proviral and antiviral contexts, including SP1: (1) binding with Epstein-Barr Virus (EBV) protein Rta to regulate host and viral SP1 target genes (); (2) regulating Human Immunodeficiency Virus 2 (HIV-2) LTR transcription (); and (3) mediating antiviral effects against Human Cytomegalovirus (HCMV) (). In addition, G4 sequences are recognized as important regulatory features for viral pathogens such as EBV and HSV-1(), dynamically impacting translation of viral proteins ().

We constructed, mapped, and compared differences between distinct gene regulatory networks to investigate functional molecular changes underlying the etiology of AD ( Figure 1 ). We used laser-captured neuronal gene expression data to construct probabilistic causal networks (PCN) representing preclinical AD and also healthy control (individuals without cognitive impairment or neuropathology) () states. We focused on samples derived from brain regions associated with the most profound neuronal loss: the entorhinal cortex () (EC) and the hippocampus () (HIP). We built each PCN using a modified “inductive causation with latent variables” procedure (), constructing separate preclinical AD and control (CON) networks including paired samples from EC and HIP for each donor, and for each network, nominated genes that regulate the expression of unexpectedly large subnetworks as “network drivers” (see STAR Methods and Table S1 ). Edge reproducibility in PCNs has been demonstrated to be strongly dependent on sample size; however, detection of highly connected nodes is more robust across a range of sample sizes (). We therefore focused initially on characterizing the set of drivers present only in the CON network (“Lost in preclinical AD”) and those present only in the preclinical AD network (“Gained in preclinical AD”) as a means to prioritize differences between preclinical AD and CON states. We found that promoters of both sets of drivers are strongly enriched (compared with the rest of the network) for a shared set of C2H2 zinc finger transcription factor (C2H2-TF) binding motifs ( Figure 1 E), especially SP1, MAZ, NRF1, and EGR1. This prompted us to evaluate other co-regulatory features associated with C2H2-TF activity that might explain a general shift in their collective activity. We found that G-quadruplex (G4) sequences are strongly enriched among the promoters of both sets of drivers in the network, but that the “Lost in preclinical AD” drivers have significantly more G4 motifs within their introns, exons, and 3′-UTR on both coding and non-coding strands ( Figures 1 F and 1G). We concordantly found a strong negative correlation between gene G4 density (G4 motif count, normalized by gene length), and gene expression in the EC in preclinical AD and clinical AD samples ( Figure 1 H). Given the complex roles for G4 in dynamically regulating mRNA transcription, stability, translation, and localization (), we hypothesized that a global shift in G4 regulation or stability could explain the differences in C2H2-TF regulatory programs—for example, changes in expression and network influence of genes with especially high G4 density in locations like the 3′-UTR that are associated with alternative polyadenylation and miRNA-mediated regulation ().

(I–L) To investigate the possibility of an AD-associated virome, we incorporated clinical, neuropathological, RNA, DNA, and proteomic data from individuals with clinical AD (and controls) (I) across four brain regions to identify and characterize viral activity associated with AD biology (J and K), highlighting Roseoloviruses HHV-6A and HHV7 (L).

(F and G) G-quadruplex sequence motifs were strongly enriched in the promoters of the drivers “lost in preclinical AD,” and those “gained in preclinical AD” (F); however, drivers “lost in preclinical AD” had much higher G-Quadruplex density in introns, exons, and 3′-UTR (G).

(B–E) Regulatory networks built across EC and HIP samples showed differences in gene drivers for preclinical AD versus control networks (B) including many AD associated genes (C), although drivers exclusive to each network shared functional characteristics, such as association with viral biology (D), and promoter enrichments for the same C2H2 zinc finger transcription factors binding motifs (E).

(A) This study was performed in two phases, an initial exploration into the earliest network changes associated with preclinical AD, which identified multiple shifts in network biology consistent with viral perturbations and prompted a systematic, multiomic evaluation of viral biology on larger datasets focused on clinical AD.

Discussion

Zhang et al., 2013 Zhang B.

Gaiteri C.

Bodea L.G.

Wang Z.

McElwee J.

Podtelezhnikov A.A.

Zhang C.

Xie T.

Tran L.

Dobrin R.

et al. Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Developing a sophisticated understanding of the causal basis for AD is complicated by its protracted preclinical course, and the inability to routinely sample brain tissue. Distinguishing the earliest drivers of disease from the “opportunistic passengers” of a multi-decade neurodegenerative process is especially formidable given the profound changes in transcriptomic, proteomic, and histopathological profiles ().

We report a multi-stage study that aims to reconcile nascent changes in preclinical AD with findings made in the context of AD. Our strategy began by examining transcriptomes from brain regions that undergo the earliest changes in AD with the goal of identifying novel biology that could offer a frame for understanding the more dramatic changes seen in later stages of AD. Exploration of the preclinical AD networks sensitized us to pervasive dysregulation of C2H2-TF, G4 sequences, and a possible role for viral perturbations in driving network changes. This informed a focused evaluation of viral perturbations in clinical AD. We examined four, large multi-omic datasets that included next-generation sequence data that enabled direct examination of viral sequences. We observed the presence of many viral species in the aging brain and linked multiple viral species with AD biology, including regulation of AD genetic risk networks, AD gene expression changes, and association with clinical dementia rating and neuropathology burden. We found prominent roles for Roseoloviruses HHV-6A and HHV-7, both implicated across multiple domains, and in 3 independent cohorts. Importantly, the inclusion of the MAYO TCX dataset allowed us to perform comparisons between AD and other neuropathological controls. These additional comparisons suggest that HHV-6A and HHV-7 are not ubiquitous features of neuropathology and appear at least partly specific to AD. Comparison against additional neuropathological diseases and brain regions would help clarify this specificity, for instance although PSP is associated with accumulation of cortical neurofibrillary tangles, the most severe manifestations are typically seen throughout the basal ganglia, brainstem, and cerebellar structures, regions that were not profiled in this study, yet which might harbor different viral species and abundances than those observed.

Lin et al., 2002 Lin W.R.

Wozniak M.A.

Cooper R.J.

Wilcock G.K.

Itzhaki R.F. Herpesviruses in brain and Alzheimer’s disease. Carbone et al., 2014 Carbone I.

Lazzarotto T.

Ianni M.

Porcellini E.

Forti P.

Masliah E.

Gabrielli L.

Licastro F. Herpes virus in Alzheimer’s disease: relation to progression of the disease. Agostini et al., 2016 Agostini S.

Mancuso R.

Baglio F.

Cabinio M.

Hernis A.

Guerini F.R.

Calabrese E.

Nemni R.

Clerici M. Lack of evidence for a role of HHV-6 in the pathogenesis of Alzheimer’s disease. Additional focused sequencing efforts are required to gain further resolution on the role of HHV-6 and HHV-7 in AD. Previous reports suggest an increased prevalence of HHV-6 in AD based on PCR of post-mortem brain tissue () and seroprevalence (), though subsequent serological studies did not find increased HHV-6 seropositivity (). Importantly, not all of the methods employed in these studies distinguished between HHV-6A and HHV-6B, which may account for some discrepancy. We found considerable inter-regional variability in AD-associated viral abundance, finding differences in four of the six regions assayed. Additionally, we found a discordant profile for HHV-6B (increased in STG, reduced in APFC), indicating the importance of distinguishing HHV-6B from HHV-6A (increased in the STG, APFC, DLPFC, and TCX). Given the near universal seropositivity for HHV-6 in the general population, seropositivity is likely too non-specific to reliably distinguish AD-relevant states of viral activity within the brain.

The miR-155 network offers a targeted area for further investigation of viral activity and neuronal loss in AD. We first identified miR-155 during investigation of the preclinical AD networks and later identified that HHV-6A negatively regulates neuronal fraction in the STG and that the cis-eQTLs of the host genes that mediate this are enriched for AD risk loci, leading to the re-emergence of miR-155 (via MIR155HG) in our analysis. Our finding that 4-month-old miR-155-KO x APP/PS1 mice develop increased cortical amyloid plaque density and increased levels of Aβ oligomers provides further evidence linking miR-155 to AD pathology. These findings support the view of miR-155 as a regulator of complex anti- and pro-viral actions, offer a mechanism linking viral activity with AD neuropathology, and support the hypothesis that viral activity contributes to AD.

The integrated findings of this study suggest that AD biology is impacted by a complex constellation of viral and host factors acting across different timescales and physiological systems ( Figure 8 B). This includes host mucosal defense and modulation of innate immune response by virus and host. It also includes disturbance of core biological processes, including some that are well described in AD (e.g., APP processing, cytoskeletal organization, mitochondrial respiration, protein synthesis, and cell-cycle control) and some that are less well characterized (e.g., widespread shifts in G4 activity and C2H2-TF regulatory programs). We note potential mechanisms (and candidate molecular mediators) that we find perturbed by viral species and that have known impacts on these altered processes, for instance, virally driven changes in protein synthesis machinery, tRNA synthetase activity, and nucleotide pool maintenance, which collectively exert complex effects on G4 regulation and C2H2-TF activity.

Our interpretation of the changes seen in the preclinical AD networks rests partly on the true disease relevance of neuropathology in the absence of cognitive impairment. We cannot readily discriminate molecules involved in disease progression from molecules that are responsible for resilience and for maintaining brain function in the face of advanced AD pathology. Despite the uncertainty around the eventual health trajectory of these donors, our reasoning was that by that conditioning our analysis on changes in AD vulnerable brain regions, we might still find instructive biological themes, even if uncertainty remained around whether those changes were disease associated, opportunistic, or somehow adaptive. Despite this reasoning, and supportive circumstantial evidence (e.g., miR-155), we have not yet confirmed in an equivalent dataset whether the viral findings associated with clinical AD have predecessors in the preclinical AD context.

Tanaka-Taya et al., 2004 Tanaka-Taya K.

Sashihara J.

Kurahashi H.

Amo K.

Miyagawa H.

Kondo K.

Okada S.

Yamanishi K. Human herpesvirus 6 (HHV-6) is transmitted from parent to child in an integrated form and characterization of cases with chromosomally integrated HHV-6 DNA. Prusty et al., 2017 Prusty B.K.

Gulve N.

Rasa S.

Murovska M.

Hernandez P.C.

Ablashi D.V. Possible chromosomal and germline integration of human herpesvirus 7. Investigating the subcellular distribution of viral DNA, especially for the key species of interest (HHV-6A and HHV-7) would add valuable context to these findings. Unlike most viruses discussed in this study, HHV-6 () and HHV-7 () can integrate into subtelomeric regions of host chromosomes during latent infection and are excised into an episomal form during lytic infection and replication. Characterizing the extent and distribution of integrated versus episomal Roseolovirus in AD would be an important step in further understanding the mechanisms that connect viral abundance with molecular aspects of AD biology and would have implications for therapeutic targeting of latent viral reservoirs relevant to AD.

It is important to note that the findings reported in this study are not sufficient to definitively demonstrate that viral activity causally contributes to the onset or progression of AD, which would be most naturally established in a prospective, intervention-based study. We do report on multiple streams of indirect evidence, however, that enabled us to partially address this with the available data, including: (1) causal inference testing that supports a role for HHV-6A in contributing to neuronal loss in AD, (2) AD GWAS risk loci enrichments in virus-host network eQTLs, (3) emergence of molecules such as miR-155 from preclinical AD networks and virus-host networks, and (4) relative specificity of HHV-6A and HHV-7 for AD, compared with other neurodegenerative diseases. Follow-up studies that evaluate the onset and progression of AD phenotypes in virally infected AD model systems would be one approach to better delineate the causal and mechanistic relationships that link pathogen activity with the evolution of AD-associated behavioral, molecular, and neuropathological changes.

In summary, we find evidence that links the activity of specific viral species with molecular, genetic, clinical, and neuropathological aspects of AD. Interpretation of these findings in light of the disturbances in G4 and C2H2-TF regulation in the preclinical AD samples that prompted our evaluation of viral activity is supportive of an important role for viral activity, especially Roseoloviruses HHV-6A and HHV-7, in the development and progression of AD.