Human subjects

All studies were conducted following approval by the Institutional Review Board of UConn Health Center (IRB Number: 14-194J-3). Following informed consent, blood samples were obtained from 172 healthy volunteers residing in the Greater Hartford, CT, USA region recruited by the UConn Center on Aging Recruitment and Community Outreach Research Core (http://health.uconn.edu/aging/research/research-cores/). For older adults 65 years and older, recruitment criteria were selected to identify individuals who are experiencing “usual healthy” aging and are thus representative of the average or typical normal health status of the local population within the corresponding age groups5 Selecting this type of cohort is in keeping with the 2019 NIH Policy on Inclusion Across the Lifespan (NOT-98-024)39, increasing the generalizability of our studies and the likelihood that these findings can be translated to the general population40. Subjects were carefully screened in order to exclude potentially confounding diseases and medications, as well as frailty. Individuals who reported chronic or recent (i.e., within two weeks) infections were also excluded. Subjects were deemed ineligible if they reported a history of diseases such as congestive heart failure, ischemic heart disease, myocarditis, congenital abnormalities, Paget’s disease, kidney disease, diabetes requiring insulin, chronic obstructive lung disease, emphysema, and asthma. Subjects were also excluded if undergoing active cancer treatment, prednisone above 10 mg day, other immunosuppressive drugs, any medications for rheumatoid arthritis other than NSAIDs or if they had received antibiotics in the previous 6 months. Beyond these steps to exclude specific chronic conditions we also undertook further additional efforts to exclude older adults with any significant frailty. Since declines in self-reported physical performance are highly predictive of frailty, subsequent disability and mortality41, all subjects were also questioned as to their ability to walk ¼ mile (or 2–3 city blocks). For those who self-reported an inability to walk ¼ mile41, the “Timed Up and Go” (TUG) test was performed and measured as the time taken to stand up from the sitting position, walk 10 feet and return to sitting in the chair42. Scoring TUG > 10 s was considered an indication of increased frailty and resulted in exclusion from the study43. Information on medications in Table S1 illustrates that as expected medication usage did increase with age. Nevertheless, these medications all reflected their use for common and controlled chronic conditions unlikely to influence our findings such as hypertension, hyperlipidemia, hypothyroidism, degenerative joint disease, seasonal allergies, headaches, atrial fibrillation, depression, anxiety, or ADHD (attention deficit hyperactivity disorder). Finally, smoking history data are not typically collected in these studies—including ours—since smoking is a rare habit among older adults.

Ethics

The study was conducted following approval by the Institutional Review Board of UConn Health Center (IRB Number: 14-194J-3). All study participants provided written informed consent at baseline using institutional review board-approved forms. Individual-level human genomic data (ATAC-seq and RNA-seq) are shared in a federal controlled-access database through dbGaP. Our study complies with Tier 1 characteristics for “Biospecimen reporting for improved study quality” (BRISQ) guidelines.

Flow cytometry data generation and analyses

PBMCs were isolated from fresh whole blood using Ficoll-Paque Plus (GE) density gradient centrifugation. For the analysis of the frequencies of Naive T cells (CD45RA+CCR7+), Central Memory T cells (CM; CD45RA−CCR7+), Effector Memory T cells (EM; CD45RA−CCR7−), and Effector Memory RA (EMRA; CD45RA+CCR7−), B cells and Monocytes, PBMCs were stained with fluorochrome-labeled antibodies specific for CD3 (UCHT1) Biolegend-1:100 (cat# 300436), CD4 (RPA-T4) Biolegend 1:80 (cat# 558116), CD8 (SCFI21Thy2D3) Beckman Coulter 1:80 (cat# 6604728), CD45RA (HI100) BD biosciences 1:80 (cat# 560674), CD19 (HIB19) BD biosciences 1:100 (cat#-555415), CD14 (MSE2) BD biosciences 1:80 (cat# 557923), CCR7 (150503) BD biosciences 1:20 (cat# 561271). The stained cells were acquired with BD Fortessa and analyzed with FlowJo software (TreeStar). Flow data is analyzed using GLM to quantify the association between cell proportions and age, sex, and their interaction (age and sex). For the analyses of major cell populations, we used age (continuous variable) as a covariate, whereas for T cell subsets we used age group (old vs. young) as a variable. We excluded one outlier individual (subject 104) from downstream analyses.

ATAC-seq library generation and processing

ATAC-seq44 data was generated from fifty thousand unfixed nuclei using Tn5 transposase (Illumina, Nextera DNA sample prep kit) for 30 min at 37 °C. The resulting library fragments were purified using Qiagen MinElute kit (Qiagen). Libraries were amplified by 10–12 PCR cycles, purified using a Qiagen PCR cleanup kit (Qiagen), and finally sequenced on an Illumina HiSeq 2500 with a minimum read length of 75 bp to a minimum depth of 30 million reads per sample. At least two technical replicates (average = 2.4 replicates) were processed per biological sample. Table S2 summarizes the depth, peak number, and fragments in reads (FrIP) scores for ATAC-seq samples. ATAC-seq sequences were quality-filtered using trimmomatic45, and trimmed reads were mapped to the GRCh37 (hg19) human reference sequence using bwa-mem46. After alignment, technical replicates were merged and all further analyses were carried out on these merged data. For peak calling, MACS247 was used with no-model, 100 bp shift, 200 bp extension, and bampe option. Only peaks called with a peak score (q-value) of 1% or better were kept from each sample, and the selected peaks were merged into a consensus peak set using Bedtools multiinter tool48. Only peaks called on autosomal chromosomes were used in this study. We further filtered consensus peaks to avoid likely false positives by only including those peaks overlapping more than 20 short reads in at least one sample, and peaks for which the maximum read count did not exceed 500 counts per million (cpm) to account for regions that are potential artifacts. Finally, we excluded peaks overlapping ENCODE blacklisted regions downloaded from http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeMapability/. An additional quality-control step was developed to filter out samples with a consistently poor signal, consisting of an algorithm to discover and characterize a series of relatively invariant benchmark peaks, defined as a set of peaks expected to be called in all samples. Samples that consistently miss calls for a significant portion of these benchmark peaks are flagged as having poor quality. A benchmark peak is defined based on three criteria, namely (1) that it remains approximately invariant between the two groups of interest (i.e., young and old samples), (2) that it captures a substantial number of reads, and (3) that it is called in most samples. For each peak, the absolute value of the log of the ratio of healthy old to healthy young mean normalized read counts (log fold change, logFC) was used to assess the first criteria, whereas the maximum read count over all samples (maxCt) is used to assess the second one. In this study, a peak was considered apt for benchmarking when (1) its absolute logFC was in the bottom quartile of the distribution over all peaks, (2) its maxCt was in the top decile of the distribution over all peaks, and (3) the peak was called in at least 90% of the samples. Using these parameters, 640 (out of 86,300) peaks were selected as benchmark; only samples for which at least 92.5% of these peaks were called were selected for analyses, which excluded 8 samples from further analyses. We examined the effects of each of these parameter choices and found that the same samples were consistently chosen as poor quality for a range of values chosen to assess the benchmark criteria. After re-applying the peak selection criteria to the remaining 100 samples, we arrived at a peak count of 86,145 peaks. Among the samples that passed the QC step, we studied the distribution of FRIP scores, depth of sequencing, and peak numbers (that are correlated measures), which revealed a wide range of values. Neither of these measures were correlated significantly with age in linear regression models (FRIP: Pearson r = 0.11, p = 0.29; Depth: Pearson r = 0.081, p = 0.42; Peak number: Pearson r = 0.1, p = 0.32). However, on the average samples from men had higher values compared to samples from women (FRIP: 0.218 vs. 0.178; Depth: 103 M vs. 72 M; Peak number: 37,046 vs. 29,170). To account for these differences, prior to statistical analyses, ATAC-seq read counts were normalized to each sample’s effective library size (i.e., the sum of reads overlapping peaks) using the trimmed mean of M-values normalization method (TMM)49. In addition, we used the effective library size and significant surrogate variables as co-variates in differential analyses (see Differential analyses for details). Accordingly, we noted that SV1 in these analyses correlate significantly with library depth (Pearson r = −0.68 p = 6.8e−15) and number of peaks (Pearson correlation r = −0.31, p = 0.0016) in old-young comparisons and with library depth (Pearson r = −0.56, p = 1.7e−09) and FRIP score (Pearson r = 0.2, p = 0.044) in men-women comparisons.

RNA-seq library generation and processing

Total RNA was isolated from PBMCs using the Qiagen RNeasy (Qiagen) or Arcturus PicoPure (Life Technologies) kits following manufacturer’s protocols. During RNA isolation, DNase I treatment was additionally performed using the RNase-free DNase set (Qiagen). RNA quality was checked using an Agilent 2100 Bioanalyzer instrument, together with the 2100 Expert software and Bioanalyzer RNA 6000 pico assay (Agilent Technologies). RNA quality was reported as a score from 1 to 10, samples falling below threshold of 8.0 being omitted from the study. cDNA libraries were prepared using either the TruSeq Stranded Total RNA LT Sample Prep Kit with Ribo-Zero Gold (Illumina) or KAPA Stranded mRNA-Seq Library Prep kit (KAPA Biosytems) according to the manufacturer’s instructions using 100 ng or 500 ng of total RNA. Final libraries were analyzed on a Bioanalyzer DNA 1000 chip (Agilent Technologies). Paired-end sequencing (2 × 100 bp) of stranded total RNA libraries was carried out in either Illumina NextSeq500 using v2 sequencing reagents or the HiSeq2500 using SBS v3 sequencing reagents. Quality control (QC) of the raw sequencing data was performed using the FASTQC tool, which computes read quality using summary of per-base quality defined using the probability of an incorrect base call50. According to our quality criteria, reads with more than 30% of their nucleotides with a Phred score under 30 are removed, whereas samples with more than 20% of such low-quality reads are dropped from analyses. Benchmarking is also applied on RNA-seq data using the same benchmark parameters as ATAC-seq, which resulted in 304 benchmark genes, none of the RNA-seq samples were dropped due to poor quality. Reads from samples that pass the quality criteria were quality-trimmed and filtered using trimmomatic45. High-quality reads were then used to estimate transcript abundance using RSEM51. Finally, to minimize the interference of non-messenger RNA in our data, estimate read counts were re-normalized to include only protein-coding genes. Table S2 summarizes the quality control measures for our PBMC RNA-seq samples.

Differential analysis

To identify differentially open chromatin regions from ATAC-seq and differentially expressed genes from RNA-seq data, the R package edgeR was used to fit a GLM to test for the effect of aging between healthy young and healthy old samples by sex, as well as the effect of sex by age group. In addition to sex and age group (old vs. young), our models included the base-2 log of effective library size to ensure peakwise normalization. We isolated a batch effect correlated to time period whereby samples were collected and libraries were prepared, and used ComBat to adjust the data for this effect. Finally, we used surrogate variable analysis (SVA52) to capture unknown sources of variation (e.g., localized batch effects, subject-level heterogeneity, variation in library preparation techniques) statistically independent from age group assignments. SVA decomposes the variation that is not accounted for by known factors like age group or sex, into orthogonal vectors that can then be used as additional covariates when fitting a model to test for differential accessibility or expression. Using the built-in permutation-based procedure in the R package sva, we choose to retain one SV to include as covariate in the GLM model for PBMC ATAC-seq and none for RNA-seq data analyses53. GLM models where implemented using a negative binomial link function, including both genome-wide and peak-specific dispersion parameters, estimated using edgeR’s “common,” “trended,” and “tagwise” dispersion components, calculated using a robust estimation option. Benjamini-Hochberg p-value correction was used to select differentially open peaks at a false discovery rate (FDR) of 5%. To generate a set of model-adjusted peak estimates of chromatin accessibility (i.e., batch-, and SV-adjusted) for downstream analyses and visualization, we used edgeR to fit a “null” model excluding the sex and age group factor, and then subtracted the resulting fitted values from this model from the original TMM-normalized reads.

Peak annotation and downstream analyses

Multiple data sources were used to annotate ATAC-seq peaks with regard to functional and positional information. HOMER54 was used to annotate peaks as “promoter” (i.e., within 2 kb of known TSS), “intergenic”, “intronic”, and other positional categories. For functional annotation of peaks, we used a simplified scheme integrating public chromatin states calculated for major PBMC subpopulations with ChromHMM from Roadmap Epigenomics20, Blueprint Epigenome, and a third reference data55: First, we intersected the chromHMM-generated states with our set of consensus peaks, and solved conflicting cases where multiple chromatin states overlap the same ATAC-seq peak so that each peak was assigned a single annotation, according to the following priority rules: if a peak overlaps both an active TSS and enhancer region, which state takes priority depends on whether the peak is proximal (i.e., within 1000 bp of the nearest TSS), in which case it is annotated as a promoter, or distal (distance to nearest TSS greater than 1000 bp), when it is annotated as an enhancer instead. For all other states, rules apply as follows: Active Enhancer > Genic Enhancer > Bivalent TSS > Weak Enhancer > Bivalent Enhancer > PolyComb repressed > TSS Flanking > Transcription > ZNF Genes and repeats > Heterochromatin > Quiescent/Low signal. Then, to facilitate interpretation and visualization, we simplified the original sets of chromatin states to a scheme with 6 pooled meta-states, namely (1) TSS, collecting active, flanking, and bivalent TSS states; (2) Enhancer, pooling active, weak and bivalent enhancer states; (3) Repressed PolyComb, combining both weak and strong PolyComb states; (4) Transcription, including both weak and strong transcription states, (5) the quiescent chromHMM state; and (6) other states (ZNF, heterochromatin) combined together. To annotate peaks as cell-specific for a given subset obtained from one of the three datasets listed above, we determined for each peak whether it was annotated as an active promoter or an active enhancer in a single-cell population or lineage, and in such cases labeled the peak accordingly as cell- or lineage specific. For example, if a peak is annotated as an active enhancer in both naive and memory CD4+ T cells but as another state (e.g., repressed) in every other subset, then the peak would be considered CD4+ T-specific. For gene-based analyses, HOMER was used to assign each ATAC-seq peak to the nearest TSS, as measured from the peak center. To improve confidence on these annotations, gene-based analyses were further restricted to include only peaks located within 100 kb of their corresponding TSS. ATAC-seq peaks were also annotated using gene sets provided by curated immune function-related co-expression modules13. These gene sets comprise 28 modules defined from multiple compiled transcriptomic data sets, which were originally annotated based on expert knowledge of representative functions of the gene ensemble in each module. In this study, we have preserved and used these annotations to test for enrichment of these modules in gene sets of interest, such as the set of genes associated to chromatin peaks gaining or losing accessibility with aging. We assessed enrichment using the hypergeometric test followed by Benjamini-Hochberg FDR adjustment for P-values. Further functional enrichment analyses were carried out using Wikipathways pathways56, immune modules22, gene sets from DICE database10 and single-cell RNA-seq data in PBMCs. Gene sets from PBMC scRNA-seq data is driven from one-vs-all cell cluster comparisons. First, we identified 17 clusters from PBMC scRNA-seq data (n = 26 samples) using the Louvain clustering in the ScanPy toolkit57. These clusters were manually inspected and assigned to different cell types by studying the expression of known marker genes. For each cluster, differentially expressed genes were identified using one-vs-all approach based on T-test followed by FDR. Marker genes for a cluster (or cell population) were found by using FDR ≤ 5% and logFC > log2(1.25) using differential analyses results. Similarly, marker genes from the DICE database are obtained using their differential analyses results and the same cutoffs for cell-specific genes (FDR ≤ 5% and logFC > log2(1.25)). Gene sets used in our differential analyses are provided as a supplementary table (Table S14). Similarly, peak sets from ChromHMM states used in the enrichment analyses are provided as a supplementary table (Table S15). For each gene set, we tested for enrichment using the hypergeometric test, against a background defined by the set of genes that are expressed, as determined by RNA-seq data, or potentially expressed, as given by promoter accessibility, in PBMCs. We used the Benjamini-Hochberg FDR multiple test correction to assess significance of hypergeometric p-values.

ATAC-seq and RNA-seq comparisons

Gene expression (RNA-seq, see above) data were generated for a subset of subjects with ATAC-seq profiles (summarized in Table S1). These data were normalized to protein-coding transcripts, and annotated to ENSEMBL GRCh37 gene symbols. Genes for which at least three normalized reads per million were obtained in at least two samples were considered as expressed, all others removed prior to analysis. This resulted in a total estimate of 14,157 expressed genes in PBMCs. We built a data set comprising paired ATAC-seq and RNA-seq samples by matching promoter peaks to nearest gene (TSS) annotations, defining promoters as the regions within 1000 bp flanks of each TSS. Whenever multiple expressed genes were matched to the same promoter peak, the peak with the maximum fold change was chosen for visualization.

Inferring chronological aging trends

We used ARIMA models as implemented in R on age-ordered normalized chromatin accessibility and expression data separately for each sex, to select genomic features (ATAC-seq peaks or mRNA-seq transcripts) to uncover significant chronological trends, as opposed to unstructured “noise” fluctuations undistinguishable from stationary data. Input data from all subjects in the study (see Table S1) were normalized as described above in “Differential Analyses”, i.e., corrected for known batch effects using ComBat, and model-adjusted to correct for library size and unknown batch effects (i.e., surrogate variables) using EdgeR. Thus, ARIMA-modeled data contains variation due to age, sex, and error variance unaccounted for by batch and library size effects. For model fitting, genomic features were converted into z-ordered objects (using R package zoo) after averaging values obtained for the same sex and chronological age (in years), and into time series objects. Formatted time series were then analyzed using auto.arima (from the forecast R package), using stepwise search and Akaike Information Criterion (AIC) to select the best model. This algorithm scans a wide range of possible ARIMA models and selects the one that satisfies the optimality criterion, i.e., the smallest AIC. Only non-seasonal models with a maximum difference (D) of 2 were considered in this study, such that a non-stationary trend the data can be modeled after integrating a stationary series once or twice. Overall, best-fitting ARIMA models based on chromatin accessibility or gene expression data included models with 0–7 AR, 0–5 MA, and 0–7 AR + MA parameters for accessibility and 0–4 AR, 0–3 MA, and 0–4 AR + MA parameters for expression. From these models, we chose for inspection only the ones carrying a “significant” trend, defined as those where (1) the difference order from stationary was higher than zero, and (2) at least one AR or MA coefficient was included in the model (i.e., D > 0 ∩ AR + MA > 0). These criteria allow sampling a wide variety of trend patterns if they were present in the data. For accessibility data, 14,594 (18.7%) and 13,591 (17.4%) peak models had D = 1 (none had D = 2) in females and males respectively, out of which 13,297 (17%) and 13,295 (17%), respectively, also were estimated to have AR + MA > 0, and hence chosen for further analyses. Out of the chosen peaks, 12,941 and 11,566 had only MA (9082 in females, 8696 in males) or AR (2,562 in females, 2574 in males) coefficients, whereas only 4196 (31.6%) and 4,181 (31.4%) were estimated to have two or more non-zero (AR + MA) coefficients in females and males, respectively. A total of 3197 (4.1%) peaks were chosen in both sexes, whereas the remaining peaks were treated as sex-specific. For expression data, 1367 (11.1%) and 1665 (13.5%) models had D = 1 (none had D = 2) in females and males respectively, out of which 1068 (8.6%) and 1471 (11.9%), respectively, also were estimated to have AR + MA > 0, and hence used in further analyses. Out of these, 1359 and 832 had only MA (511 in females, 903 in males) or AR (480 in females, 372 in males) coefficients, whereas only 503 (36.8%) and 554 (33.3%) were estimated to have two or more non-zero (AR + MA) coefficients in females and males, respectively. A total of 180 (1.5%) peaks were chosen in both sexes, whereas the remaining peaks were treated as sex-specific.

Temporal peak/gene analyses

After selecting genomic features with a significant trend, we used their estimated parameters to fit the original data to reduce noise and facilitate the discovery of common trend patterns across multiple features. Fitted values were computed at the same chronological ages observed in the data for each sex, converted to z-scores for comparability, and then submitted to k-means clustering for aggregation into k sets of similar chronological aging patterns. Chosen features for each sex were clustered separately, and additionally features chosen as common to both sexes were concatenated as a single sequence of two age-ordered ages prior to normalization. This concatenation approach was designed to facilitate the detection of features with consistent chronological patterns across sexes. For clustering, we used the R-stats version of the k-means algorithm (kmeans), starting from nstart = 1000 random sets of k cluster centers and running each for max.iter = 100 iterations. To choose k, we opted for a biologically informed criterion as opposed to a purely statistical one, after noting that criteria such as maximization of within- vs. between-cluster variance tended to select a large (>20) number of slightly different clusters, likely emphasizing small differences in ARIMA model parameters with little biological significance. Instead, we computed optimal clusters for values of k between 2 and 15 and calculated enrichment statistics for cell-specific features (i.e., cell-specific chromatin states for ATAC-seq, and cell-specific expressed genes for RNA-seq data) for each cluster relative to all trending features, and asked whether adding clusters to the k = 2 cluster case resulted in a gain of information demonstrated by cluster enrichment patterns qualitatively different from the patterns observed in absence of the extra clusters. In some cases, finding a cluster configuration satisfying this condition of biological discrimination required applying hierarchical clustering to collapse distinct k-means clusters with the same biological signal. For chromatin accessibility, conservative application of these criteria led to the selection of k = 5 and k = 6 clusters for females and males, respectively, which were collapsed into three clusters in both cases (one closing, two opening with aging), and k = 3 for peaks trending in both sexes (one closing and one opening in both sexes, plus one opening in females while closing in males). The same numbers of final clusters were defined based on gene expression.

TF motif enrichment analyses

To further explore functional associations of these clusters, we tested for enrichment of known and de novo TF motifs in each chromatin accessibility cluster for female, male, and sex-pooled peak data, relative to all trending clusters for the same set of peaks. We used the software HOMER54 to both detect de novo motifs and to test for enrichment of 1388 motifs obtained from JASPAR CORE 2018 database58 and from Jolma et al.59 SELEX-derived position weight matrices (PWMs), corresponding to 680 distinct TFs. We used HOMER to assign a TF to each de novo motif found, based on sequence similarity to the 1388 known motifs. From the pooled set of known and de novo motifs tested for each cluster, we used Benjamini-Hochberg method on enrichment p-values to estimate FDR, and applied a 5% FDR significance threshold to select for the final set of enriched motifs. Since some motifs have similar consensus sequences and PWMs, prior to visualization of these results we grouped motifs into “families” defined as sets of motifs with nearly identical sequences, as determined by pairwise comparison of all 1388 motif using MEME/TOMTOM60, which tests whether two motif sequences are more different than expected by chance. Using stringent 0.001% q-value and 5 E-value cutoffs, we identified 429 motif families with extremely similar consensus sequences, 319 of which were singletons (maximum family size = 226 motifs).

Breakpoint analyses

We investigated systemic chronological signatures in temporal peaks by testing, in each cluster, for the existence of “breakpoints,” i.e., short age intervals characterized by significant differences in accessibility in the intervals preceding and following the age interval. For each age t in the sampled age interval from t min to t max , we tested for mean difference in accessibility between subjects with ages in the intervals t min -w vs. t min + w, where w represents a variable window span parameter, and plotted the observed p-values (i.e., −log10 P, or loginvp) as a function of age (t) to identify maxima that suggested the presence of discrete breakpoints. These tests were carried out on normalized and model-adjusted accessibility data corresponding to the ATAC-seq peaks associated to each sex-specific and common cluster identified as trending by ARIMA as described above. Since there were many more peaks than subjects for any given comparison, we used PCA to reduce the dimensionality of each cluster to n = 3 PCs, and used MANOVA on these three dimensions to compute p-values at each tested value of t. For any given value of w, offset values for t min and t max were adjusted to match the age of available samples in the study. For example, a window span of 5 years required a t min = 27 if the youngest available subject was 22 years old, and a t max = 83 if the oldest available subject was 90 years old. For a given value of w, results from tests contrasting younger vs. older intervals would vary depending on sample size volume and imbalance, with statistical power increasing with the size of the window span. To take advantage of this effect, we deployed a multi-scale algorithm where we carried out tests using w values ranging from 10 to 20 years in order to identify breakpoints that were maximally supported under multiple window spans. Due to sample sparsity and variation, however, tests carried out under varying values of w may be unevenly affected by edge effects and influencing outlier points, which may result in strong significance of a comparison because of the presence of outliers and the partial overlap of a sampled interval with a breakpoint, limiting the ability of the method to precisely discover where such breakpoints may lie. To limit these effects and increase the robustness of the tests, we smoothed the loginvp distributions by fitting LOESS regressions to each comparison (i.e., each set of tests with the same w value) under a range of smoothing bandwidth parameters (i.e., bw = 0.25, 0.30, …, 0.70, 0.75). We combined the resulting 11 p-values at every sampled age using the Fisher’s method, reapplied LOESS smoothing to the resulting distribution, and used numerical differentiation to determine whether each age was predicted to be minimum or a maximum. Finally, we marked every maximum as a significant breakpoint candidate if it satisfied both a parametric criterion, i.e. significance of the Fisher method-combined p-values (χ2 test), and a heuristic criterion, namely whether the distance between this local maximum and the nearest minimum equaled or exceeded 25% of the value of the global maximum. The procedure described above results in a smoothed loginvp distribution for each w value, each comprising a series of points including maxima and minima, such that slightly different maxima can be estimated for different w values. Finally, we used Gaussian mixture modeling on the distribution of these maxima, as implemented in R-Mclust package, to group loginvp maxima obtained from different window spans into cohesive breakpoint intervals, whose medians and ranges we report herein for each cluster. Since breakpoints are independently calculated for each cluster, observed overlaps are likely the result of aging-related events with a genome-wide impact.

Analyses of 500FG and MI data

We obtained publicly available ELISA data measuring serum protein levels by the 500 Human Functional Genomics (500FG) consortium11. We only retained individuals who are matching our cohort in terms of the age span, which resulted in data from 267 individuals. These individuals are grouped together using the age brackets defined in our study: Young 22–40 years old, middle-aged: 41–64 years old, older: 65 years old. We compared data from (1) men and women at all age brackets; (2) young men to old men; and (3) young women to old women using Wilcoxon Rank Sum non-parametric test (two sided). Note that flow cytometry data from this same cohort was not publicly available; hence cannot be included into the analyses. Similarly, publicly available flow cytometry data from Milieu Intérieur Consortium12 cohort was obtained. We used data that is already processed by this study and just used individuals whose ages are matching our cohort, which resulted in data from 892 individuals. We built linear models using R (lm function) to quantify the association between each flow cytometry measurement to age group (young, middle-aged, older), sex (F, M) and their interaction (age*sex). Significant associations with age (at FDR 5%) and with sex (at FDR 10%) are plotted in Figs. S7D-E. No significant associations were detected with the interaction of sex and age at FDR 10%.

R Shiny application

The R Shiny package61 was used to create a webpage for the interactive visualization of the data from this study. Plots were generated using ggplot262, using graph esthetics used throughout the manuscript figures. Statistics for box plots were calculated using the wilcox.test function in R and presented without correction for the multiple (n = 5) comparisons (two sided). Statistics for scatterplots were calculated by fitting a linear model with the lm function in R using the formula (measurement ~ age:sex + sex). For ATAC-seq data, only the peak closest to the gene TSS was considered.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.