The CellAge database

The CellAge website can be accessed at http://genomics.senescence.info/cells/. Figure 1a presents the main CellAge data browser, which allows users to surf through the available data. The browser includes several columns with information that can be searched and filtered efficiently. Users can search for a comma-separated gene list or for individual genes. Once selected, a gene entry page with more detailed description of the experimental context will open.

Fig. 1 a The CellAge database of CS genes. The main data browser provides functionality to filter by multiple parameters like cell line and senescence type, and select genes to view details and links with other aging-related genes on the HAGR website. b Breakdown of the effects all 279 CellAge genes have on CS, and the types of CS the CellAge genes are involved in. Genes marked as “Unclear” both induce and inhibit CS depending on biological context. Numbers above bars denote the total number of genes inhibiting, inducing, or having unclear effects on CS. c Functional enrichment of the nonredundant biological processes involving the CellAge genes (p < 0.05, Fisher’s exact test with BH correction) (Additional file 1: Table S3). GO terms were clustered based on semantic similarities Full size image

CellAge was compiled following a scientific literature search of gene manipulation experiments in primary, immortalized, or cancer human cell lines that caused cells to induce or inhibit CS. The first CellAge build comprises 279 distinct CS genes, of which 232 genes affect replicative CS, 34 genes affect stress-induced CS, and 28 genes affect oncogene-induced CS. Of the 279 total genes, 153 genes induce CS (~ 54.8%), 121 inhibit it (~ 43.4%), and five genes have unclear effects, both inducing and inhibiting CS depending on experimental conditions (~ 1.8%) (Fig. 1b). The genes in the dataset are also classified according to the experimental context used to determine these associations.

We have also performed a meta-analysis to derive a molecular signature of replicative CS and found 526 overexpressed and 734 underexpressed genes [32]. These gene signatures are also available on the CellAge website. Of the 279 CellAge genes, 44 genes were present in the signatures of CS (15.8%). This overlap was significant (p value = 1.62e−08, Fisher’s exact test). While 13 of the CellAge inducers of CS significantly overlapped with the overexpressed signatures of CS (8.5%, p = 2.06e−06, Fisher’s exact test), only 7 overlapped with the underexpressed signatures (4.6%, p = 5.13e−01, Fisher’s exact test). The CellAge inhibitors of CS significantly overlapped with both the overexpressed signatures of CS (n = 7, 5.8%, p = 4.08e−02, Fisher’s exact test) and underexpressed signatures of CS (n = 17, 14%, p = 2.06e−06, Fisher’s exact test).

CellAge gene functions

High-quality curated datasets enable systematic computational analyses [33, 34]. Since we are interested in learning more about the underlying processes and functionality shared by human CS genes, we started by exploring functional enrichment within the CellAge dataset.

Using the database for annotation, visualization and integrated discovery—DAVID Version 6.8 [35, 36], we found that genes in CellAge are enriched with several clusters associated with Protein Kinase Activity, Transcription Regulation, DNA-binding, DNA damage repair, and Cell cycle regulation in cancer. In particular, genes that induce senescence were more associated with promoting transcription, while genes that inhibit senescence were more associated with repressing transcription. Furthermore, we found that inducers of senescence were significantly associated with VEGF and TNF signalling pathways (p < 0.01, Fisher’s exact test with Benjamini-Hochberg correction) (Additional file 1: Table S1 and S2). WebGestalt 2019 was used to determine which nonredundant biological processes the CellAge genes are involved in, and REVIGO was used to cluster related processes (p < 0.05, Fisher’s exact test with BH correction) [37, 38]. A total of 298 categories were significantly enriched and clustered: Signal transduction by p53 class mediator; Aging; Protein localization to nucleus; DNA-templated transcription, initiation; Epithelial cell proliferation; Cell growth; Rhythmic process; Cellular carbohydrate metabolism; Reactive oxygen species metabolism; Cytokine metabolism; Adaptive thermogenesis; Organic hydroxy compound metabolism; Methylation; Generation of precursor metabolites and energy (Fig. 1c; Additional file 1: Table S3).

Evolutionary conservation of CellAge genes in model organisms

Next, we looked at the conservation of CellAge genes across a number of mammalian and non-mammalian model organisms with orthologues to human CellAge genes using Ensembl BioMart (Version 96) [39] in order to understand the genetic conservation of CS processes. There was a significantly higher number of human orthologues for CellAge genes than for other protein-coding genes in mouse, rat, and monkey, while non-mammalian species did not show significant conservation of CellAge genes (two-tailed z-test with BH correction) (Additional file 1: Table S4; Additional file 2: Fig. S1A). Interestingly, previous studies have found that longevity-associated genes (LAGs) are substantially overrepresented from bacteria to mammals and that the effect of LAG overexpression in different model organisms was mostly the same [40]. It remains unclear what the evolutionary origin of most of the CellAge genes is or why they are not present in more evolutionarily distant organisms. Unique evolutionary pressures could have played an important role in the evolution of CellAge genes in mammals. However, somatic cells in C. elegans and Drosophila are post mitotic and lack an equivalent CS process, which could explain why the CellAge genes are not conserved. We further compared the conservation of CellAge inducers and inhibitors of CS and found that while the inducers were significantly conserved in the mammal model organisms, the inhibitors were not (Additional file 2: Fig. S1B).

We also report the number of orthologous CellAge genes present in 24 mammal species using the OMA standalone software v. 2.3.1 algorithm [41] (Additional file 2: Fig. S1C). From 279 CellAge genes, we report 271 orthogroups (OGs) (Additional file 3). Twenty-two OGs were conserved in the 24 mammals, including the following genes: DEK, BRD7, NEK4, POT1, SGK1, TLR3, CHEK1, CIP2A, EWSR1, HDAC1, HMGB1, KDM4A, KDM5B, LATS1, MORC3, NR2E1, PTTG1, RAD21, NFE2L2, PDCD10, PIK3C2A, and SLC16A7 (Additional file 1: Table S5). Within the long-lived mammalian genomes analyzed (human, elephant, naked mole rat, bowhead whale, and little brown bat), we found 128 OG CellAge genes (Additional file 3; genomes available in Additional file 1: Table S6). However, finding OGs is dependent on genome quality and annotations, and higher-quality genomes would likely yield more OGs.

For the evolutionary distances, we found that the long-lived species had similar distances to the other species, meaning the branch lengths for long-lived species are distributed throughout the phylogeny as expected in a random distribution (Additional file 2: Fig. S1D). This was the case when we analyzed the concatenated tree for the 271 CellAge OGs as well as when we analyzed the 22 individual CellAge genes conserved among all 24 mammalian species (Additional file 4).

CellAge vs human orthologues of longevity-associated model organism genes

To understand how senescence is linked to the genetics of aging processes, we looked at the intersection of CellAge genes and the 869 genes in the human orthologues of model organisms’ longevity-associated genes (LAGs) dataset, collected based on quantitative changes in lifespan [34]. Like CellAge, where genes are classified based on whether their upregulation induces, inhibits, or has an unknown impact on CS, the longevity orthologues dataset also provides information on the effect of upregulation of its genes, namely whether it promotes (pro, 421) or inhibits (anti, 448) longevity (Additional file 1: Table S7; Additional file 2: Fig. S2).

The CS inducers statistically overlapped with the anti-longevity genes and not with the pro-longevity genes (anti: n = 9, ~ 6%, p = 1.42e−02; pro: n = 6, ~ 4%, p = 1.40e−01, Fisher’s exact test with BH correction). We noted an inverse result with the inhibitors of CS, where there was a much greater overlap between the CellAge inhibitors and the pro-longevity genes, resulting in the smallest p value of all the overlaps (n = 18, ~ 15%, p = 2.61e−10, Fisher’s exact test with BH correction). However, there was also a significant overrepresentation of genes inhibiting the CS process within the anti-longevity genes (n = 7, ~ 6%, p = 2.41e−02, Fisher’s exact test with BH correction). It is possible that some of the pathways the CS inhibitors are associated with increase longevity, whereas other pathways have anti-longevity effects. Overall, these results highlight a statistically significant association between CS and the aging process and suggest a potential inverse relationship between CS and longevity, at least for some pathways. Gene overlaps are available in Additional file 1: Table S8.

CellAge genes differentially expressed with age

In another work, we performed a meta-analysis to find molecular signatures of aging derived from humans, rats, and mice [42]. To investigate how the expression of CellAge genes changes with age, we looked for CellAge genes which either induce (153) or inhibit (121) senescence within the list of aging signatures. The genes overexpressed with age (449) had a significant overlap with the CellAge genes (CS inducers: n = 17, ~ 11%, p = 6.58e−07; CS inhibitors: n = 9, ~ 7%, p = 6.35e−03, two-tailed Fisher’s exact test with BH correction) while the genes underexpressed with age (162) did not (CS inducers: n = 0, p = 8.57e−01; CS inhibitors: n = 3, ~ 3%, p = 1.64e−01). The overexpressed genetic signatures of replicative CS (526) also significantly overlapped with the overexpressed signatures of aging (n = 60, ~ 11%, p = 1.18e−23), but not the underexpressed signatures of aging (n = 3, ~ 1%, p = 8.79e−01). Finally, the underexpressed signatures of replicative CS (734) did not significantly overlap with the overexpressed (n = 18, ~ 3%, p = 8.79e−01) or underexpressed (n = 9, ~ 1%, p = 3.26e−01) signatures of aging.

Given that 112 (40%) of CellAge genes have only been confirmed to control CS in fibroblasts, we repeated the above analyses using a subgroup of CellAge genes that have been shown to affect CS in other cell types. A total of 91 CellAge inducers of CS and 72 inhibitors were overlapped with the signatures of aging. The same overlaps were still significant after FDR correction, indicating that the differential expression of CellAge genes with age cannot exclusively be attributed to fibroblast idiosyncrasies (CS inducers overexpressed: n = 10, ~ 11%, p = 1.50e−04; underexpressed: n = 0, p = 1. CS inhibitors overexpressed: n = 6, ~ 8%, 1.34e−02; underexpressed: n = 2, ~ 3%, p = 1.98e−01).

Using all protein-coding genes from the meta-analysis as a background list [42], we further examined the CS inducers overexpressed with age for functional enrichment using WebGestalt 2019 to determine if specific biological processes were enriched [38]. In parallel, we performed this analysis using the genes which overlapped between CellAge inhibitors and genes overexpressed with age. In total, 71 GO terms were significantly enriched for the overlap between CellAge senescence inducers and age upregulated genes (p < 0.05 Fisher’s exact test with BH correction) (Additional file 1: Table S9). Because many of the enriched GO terms were redundant (e.g., wound healing and response to wound healing, regulation of cytokine production and cytokine production), they were clustered based on semantic similarity scores using REVIGO [37]. We found groups enriched for regulation of apoptotic processes, response to lipid, epithelium development, rhythmic process, circadian rhythm, cytokine metabolism, and cell-substrate adhesion (Additional file 2: Fig. S3A). A total of 71 enriched GO terms for the overexpressed signatures of CS overexpressed with age were clustered using REVIGO, resulting in enriched terms relating to regulated exocytosis, aging, response to beta-amyloid, and cell proliferation (Additional file 1: Table S10; Additional file 2: Fig. S3B). No GO terms were significantly enriched for the inducers of CS underexpressed with age, the inhibitors of CS differentially expressed with age, the underexpressed signatures of CS differentially expressed with age, or the overexpressed signatures of CS underexpressed with age.

Tissue-specific CS gene expression and differential expression of CS genes in human tissues with age

The Genotype-Tissue Expression (GTEx) project contains expression data from 53 different tissue sites collected from 714 donors ranging from 20 to 79 years of age, grouped into 26 tissue classes [43]. We asked if CellAge genes and differentially expressed signatures of CS were expressed in a tissue-specific manner [42] and determined how CS gene expression changes across different tissues with age [32].

We first examined tissue-specific CS expression and found that CellAge genes were either expressed in a tissue-specific manner less than expected by chance, or in line with expectations; in other words, the majority of CellAge genes tended to be expressed across multiple tissues (Additional file 1: Table S11; Additional file 2: Fig. S4A). Testis was the only tissue with significant differences between the actual and expected number of tissue-specific CellAge genes expressed (less tissue-specific genes than expected by chance, p < 0.05, Fisher’s exact test with BH correction). The underexpressed signatures of CS were significantly less tissue-specific in the testis and liver, while the overexpressed signatures of CS were significantly less tissue-specific in the brain, liver, pituitary, and skin, and more tissue-specific in blood. We also compared the ratio of tissue-specific to non-tissue-specific genes in the CS datasets to all protein-coding genes. While ~ 25% of all protein-coding genes are expressed in a tissue-specific manner, only ~ 10% of CellAge genes and ~ 11% of signatures of CS are expressed in a tissue-specific manner (Additional file 2: Fig. S4B), significantly less than expected by chance (p = 2.52e−12 and 3.93e−48 respectively, Fisher’s exact test with BH correction).

Then, we examined the differential expression of CS genes with age in different tissues. Using a previously generated gene set of differentially expressed genes (DEGs) with age in 26 tissues on GTEx [32, 43], we found overlaps with 268 CellAge inducers and inhibitors of CS present in the gene expression data (Fig. 2a). The process of finding DEGs with age filters out lowly expressed genes, which explains the 11 missing CellAge CS regulators. Overall, senescence inducers were overexpressed across different tissues with age, although none of the overlaps were significant after FDR correction (Fisher’s exact test with BH correction, p < 0.05) (Additional file 1: Table S12). There was the opposite trend in the inhibitors of CS, where there was noticeably less overexpression of CS inhibitors with age, although these overlaps were also not significant after FDR correction. A total of 1240 differentially expressed signatures of CS were also overlapped with the GTEx aging DEGs in 26 human tissues, including 9 tissues previously analyzed (Fig. 2b) [32]. The overexpressed signatures of CS were significantly overexpressed across multiple tissues with age, and only significantly underexpressed with age in the brain and uterus (p < 0.05, Fisher’s exact test with BH correction) (Additional file 1: Table S13). Furthermore, the underexpressed signatures of CS trended towards being overexpressed less than expected by chance across multiple tissues with age, although these overlaps were only significant after FDR adjustment in the colon and nerve, while the underexpressed signatures of CS were significantly overexpressed more than expected in the uterus. Finally, the underexpressed signatures of CS were underexpressed with age more than expected by chance in the colon, lung, and ovary, and underexpressed with age less than expected by chance in the brain. We also compared the ratio of differentially expressed to non-differentially expressed CS genes in at least one tissue with age to the equivalent ratio in all protein-coding genes (Additional file 2: Fig. S5A and S5B) (see Overlap Analysis in Methods). We found that ~ 64% of all protein-coding genes did not significantly change expression with age in any human tissues, while ~ 19% were overexpressed and ~ 17% were underexpressed (~ 7% were both overexpressed and underexpressed across multiple tissues) (Additional file 1: Table S14 and S15). For the CellAge genes, the number of inducers of CS significantly overexpressed with age in at least one tissue was significantly higher than the genome average (n = 50, ~ 30%, p = 1.5e−3, Fisher’s exact test with BH correction). The inducers of CS underexpressed with age and the inhibitors of CS differentially expressed with age were not significantly different from the protein-coding average. We also compared the number of signatures of CS differentially expressed with age in at least one tissue to the protein-coding genome average. The overexpressed signatures of CS were significantly differentially expressed with age compared to all protein-coding genes, whereas the number of underexpressed signatures of CS was underexpressed with age more than expected by chance.

Fig. 2 Differential expression of a CellAge inducers and inhibitors of CS and b differentially expressed signatures of CS in human tissues with age. Red values indicate that there were more genes differentially expressed with age than expected by chance (−log 2 (p-val)). Blue values indicate that there were less genes differentially expressed with age than expected by chance (log 2 (p-val)). Asterisks (*) denote tissues with significantly more CS genes differentially expressed with age (p < 0.05, Fisher’s exact test with BH correction, abs(50*log 2 FC) > log 2 (1.5)) (Additional file 1: Table S12 and S13). c Comparison of the median log 2 FC and distribution of log 2 FC with age between the CS genes and all protein-coding genes in human tissues. Red tiles indicate that the median log 2 FC of the CellAge and CS genes is higher than the median log 2 FC of all protein-coding genes for that tissue, while blue tiles indicate that the median log 2 FC of the CS genes is lower than the median genome log 2 FC. Asterisks (*) indicate significant differences between the log 2 FC distribution with age of CS genes and the log 2 FC distribution with age of all protein-coding genes for that tissue (p < 0.05, Wilcoxon rank sum test with BH correction) (Additional file 1: Table S16). d CellAge genes differentially expressed in at least two tissues with age. Gray tiles are genes which had low basal expression levels in the given tissue and were filtered out before the differential gene expression analysis was carried out [32]. Colored tiles indicate significant differential expression with age (p < 0.05, moderated t-test with BH correction, abs(50*log 2 FC) > log 2 (1.5)). Numbers by gene names in brackets denote the number of tissues differentially expressing the CellAge gene with age. Red gene names specify that the CellAge gene was significantly overexpressed with age in more tissues than expected by chance, while blue gene names show the CellAge genes significantly underexpressed with age in more tissues than expected by chance (p < 0.05, random gene expression tissue overlap simulations) (Additional file 1: Table S17 – S20). Liver, pancreas, pituitary, spleen, small intestine, and vagina did not have any significant CS DEGs with age Full size image

The overall fold change (FC) with age of the CS genes was also compared to the FC with age of all protein-coding genes for each tissue in GTEx (Fig. 2c; Additional file 1: Table S16). The median log 2 FC with age of the CellAge CS inducers and the overexpressed signatures of CS was greater than the genome median for the majority of tissues on GTEx, although the difference in log 2 FC distribution with age between the inducers of CS and all protein-coding genes was only significant in seven tissues (Wilcoxon rank sum test with BH correction, p < 0.05). The median log 2 FC with age of the CellAge inhibitors of CS and the underexpressed signatures of aging was smaller than the genome median in the majority of tissues, showcasing the opposite trend to the inducers of CS and overexpressed signatures of CS. However, the only tissues with significantly different distributions of log 2 FC with age for the inhibitors of CS were the skin and esophagus, where the median log 2 FC distribution was significantly less than the genome average, and the salivary gland, where the median log 2 FC distribution was significantly more than the genome average. We also found that the distribution of log 2 FC with age of the differentially expressed signatures of CS significantly changed in opposite directions with age in 14 tissues. Interestingly, this trend was present even in the adrenal gland and uterus, where the signatures of CS changed with age in the opposite direction to the majority of other tissues.

The expression of the majority of CS genes does not change with age (Additional file 2: Fig. S5A), yet a significant number of CS genes trend towards differential expression with age across multiple tissues in humans (Fig. 2). We ran 10,000 simulations on the GTEx RNA-seq data to determine the likelihood of a CS gene being differentially expressed with age in more than one tissue by chance (see Simulation of CS Gene Expression in Human Aging in Methods) (Additional file 2: Fig. S5C; Additional file 5). The likelihood of a CellAge gene being overexpressed with age in more than three tissues and underexpressed with age in more than two tissues by chance was less than 5% (CS gene expression simulations) (Fig. 2d; Additional file 1: Table S17; Additional file 2: Fig. S5C). CS inducers overexpressed in significantly more tissues with age than expected by chance included CDKN2A, NOX4, CPEB1, IGFBP3. ABI3, CDKN1A, CYR61, DDB2, MATK, PIK3R5, VENTX, HK3, SIK1, and SOX2, while PTTG1, DHCR24, IL8, and PIM1 were underexpressed in significantly more tissues (Additional file 1: Table S18; Additional file 2: Fig. S5D). ZMAT3 and EPHA3 were the two CS inhibitors overexpressed in significantly more tissues with age than expected by chance, while CDK1, AURKA, BMI1, BRCA1, EZH2, FOXM1, HJURP, MAD2L1, SNAI1, and VEGFA were underexpressed in significantly more tissues. We also performed simulations to determine the likelihood of gene expression signatures of CS being differentially expressed with age in multiple human tissues by chance (Additional file 1: Table S19): less than 5% of the genes in the CS signatures are expected by chance to be overexpressed with age in more than three tissues or underexpressed with age in more than two tissues. A total of 46 CS signature genes (29 overexpressed, 17 underexpressed) were overexpressed with age in significantly more tissues than expected by chance, and 139 CS signature genes were underexpressed in more tissues than expected by chance (26 overexpressed genes in CS, 113 underexpressed genes in CS) (Additional file 1: Table S20).

Do CS and longevity genes associate with aging-related disease genes?

A previous paper [34] grouped 769 aging-related diseases (ARDs) into 6 NIH Medical Subject Heading (MeSH) classes [44] based on data from the Genetic Association Database [45]: cardiovascular diseases (CVD), immune system diseases (ISD), musculoskeletal diseases (MSD), nutritional and metabolic diseases (NMD), neoplastic diseases (NPD), and nervous system diseases (NSD). The same approach was used to build the HAGR aging-related disease gene selection tool (http://genomics.senescence.info/diseases/gene_set.php), which we used to obtain the ARD genes for each disease class and overlap with the CellAge genes.

There were links between the CellAge genes and NPD genes, which is expected given the anti-tumor role of senescence (Additional file 1: Table S21). Without accounting for publication bias (i.e., some genes being more studied than others), all ARD classes are significantly associated with CellAge genes, with lower commonalities with diseases affecting mostly non-proliferating tissue such as NSD. NPD genes are even more overrepresented in the GenAge human dataset, which could suggest commonality between aging and senescence through cancer-related pathways. Both the strong association of NPD genes with GenAge and senescence, and the strong link between GenAge and all ARD classes is interesting. Indeed, longevity-associated genes have been linked to cancer-associated genes in previous papers [46]. Considering age is the leading risk factor for ARD [47, 48], the results from GenAge support the previously tested conjecture that there are (i) at least a few genes shared by all or most ARD classes; and (ii) those genes are also related to aging in general [34]. We also looked for genes that are shared across multiple disease classes and are also recorded as CS genes. CellAge genes shared across multiple ARD classes included VEGFA and IFNG (5 ARD classes), SERPINE1, MMP9, and AR (4 ARD classes), and CDKN2A (3 ARD classes). Results are summarized in Additional file 2: Fig. S6.

Are CS genes associated with cancer genes?

Cellular senescence is widely thought to be an anti-cancer mechanism [49]. Therefore, the CellAge senescence inducers and inhibitors of senescence were overlapped with oncogenes from the tumor-suppressor gene (TSG) database (TSGene 2.0) (n = 1018) [50] and the ONGene database (n = 698) [51] (Additional file 1: Table S22 – S27). The number of significant genes overlapping are shown in Fig. 3a, while the significant p values from the overlap analysis are shown in Fig. 3b (p < 0.05, Fisher’s exact test with BH correction).

Fig. 3 a Overlap between CellAge inducers and inhibitors, and oncogenes and tumor-suppressing genes. b Adjusted p value and odds ratio of the overlap analysis. The number of overlapping genes in each category was significant (p < 0.05, Fisher’s exact test with BH correction). p values are shown in gray writing for each comparison. Data available in Additional file 1: Table S22 – S27 Full size image

The significant overlap between CellAge genes and cancer indicates a close relationship between both processes. Specifically, the overlap between CellAge inhibitors and oncogenes, and the overlap between CellAge inducers and TSGs were more significant, with lower p values and larger odds ratios (Fig. 3) [52]. This analysis was repeated after filtering out CellAge genes that were only shown to induce senescence in fibroblasts. The overlaps were still significant after FDR correction, indicating that the overlap between CellAge and cancer genes is not specific to genes controlling CS in fibroblasts (CS inducers with oncogenes: n = 10, p = 9e−05; with TSGs: n = 23, p = 4e−12. CS inhibitors with oncogenes: n = 17, 1e−12; with TSGs: n = 8, p = 9e−04, p < 0.05, Fisher’s exact test with BH correction) (Additional file 2: Fig. S7).

Gene ontology (GO) enrichment analyses were performed using WebGestalt to identify the function of the overlapping genes [38]. Overlapping genes between CellAge senescence inducers and TSGs were enriched in GO terms related to p53 signalling and cell cycle phase transition (Additional file 2: Fig. S8A). The enriched functions of overlapping genes between CellAge senescence inducers and oncogenes were mainly related to immune system processes and response to stress (Additional file 2: Fig. S8B). Overlapping genes between CellAge senescence inhibitors and TSGs were enriched in only 5 terms, which are cellular response to oxygen-containing compound, positive regulation of chromatin organization, and terms relating to female sex differentiation (Additional file 2: Fig. S8C). Finally, overlapping genes between CellAge senescence inhibitors and oncogenes were related to processes such as negative regulation of nucleic acid-templated transcription, cellular response to stress, and cell proliferation (Additional file 2: Fig. S8D). All of the functional enrichment data can be found in Additional file 1: Table S28 – S31.

Network analyses

The CellAge genes form both protein-protein and gene co-expression networks. The formation of a protein-protein interaction (PPI) network is significant in itself given that only ~ 4% of the genes in a randomly chosen gene dataset of similar size are interconnected [53]. In order to have a more holistic view of CS, we were interested in the topological parameters of the networks that CS genes form. For this, several types of networks were constructed using the CellAge genes as seeds: the CS PPI network, along with two CS gene co-expression networks built using RNA-seq and microarray data. Biological networks generally have a scale-free topology in which the majority of genes (nodes) have few interactions (edges), while some have many more interactions, resulting in a power law distribution of the node degree (the number of interactions per node) [31, 54]. As expected, the node-degree distribution of the above networks does confirm a scale-free structure (Additional file 2: Fig. S9). Additional file 1: Table S32 presents the network summary statistics for the resulting networks.

The network parameters we looked at were as follows: Degree, Betweenness Centrality (BC), Closeness Centrality (CC), and Increased Connectivity (IC). The degree is the number of interactions per node and nodes with high degree scores are termed network hubs. BC is a measure of the proportion of shortest paths between all node pairs in the network that cross the node in question. The nodes with high BC are network bottlenecks and may connect large portions of the network which would not otherwise communicate effectively or may monitor information flow from disparate regions in the network [31]. CC is a measure of how close a certain node is to all other nodes and is calculated with the inverse of the sum of the shortest paths to all other nodes. Lower CC scores indicate that nodes are more central to the network, while high CC scores indicate the node may be on the periphery of the network and thus less central. The IC for each node measures the statistical significance for any overrepresentation of interactions between a given node and a specific subset of nodes (in our case CellAge proteins) when compared to what is expected by chance. Taken together, genes that score highly for degree, BC, CC, and IC within the senescence networks are likely important regulators of CS even if up until now they have not been identified as CS genes.

Looking at the topology of CS networks, the PPI network, microarray-based co-expression network, and RNA-seq co-expression network all possess comparable scale-free structures. However, gene co-expression data is less influenced by publication bias. This is particularly important considering published literature often reports positive protein-protein interactions over protein interactions that do not exist [55]. The lack of negative results for protein interaction publications complicates the interpretation of PPI networks even more, as the absence of edges in networks does not necessarily mean they do not exist. On the other hand, RNA-seq and microarray co-expression data, while not influenced by publication bias, does not give indications of actual experimentally demonstrated interactions (physical or genetic). Furthermore, RNA read counts do not directly correlate to protein numbers, with previous studies reporting that only 40% of the variation in protein concentration can be attributed to mRNA levels, an important aspect to consider when interpreting RNA-seq data [56]. Finally, the microarray network was constructed using the COXPRESdb (V6), which contains 73,083 human samples and offered another degree of validation [57]. Although RNA-seq reportedly detects more DEGs including ncRNAs [58], GeneFriends [59] contains 4133 human samples, far less than the microarray database from COXPRESdb.

The protein-protein interaction network associated with CS

We only used interactions from human proteins to build the CellAge PPI network. The network was built by taking the CellAge genes, their first-order partners and the interactions between them from the BioGrid database. The CellAge PPI network comprised of 2487 nodes across four disjointed components, three of which only comprised of two nodes each, and the main component containing 2481 nodes.

The genes with the highest degree scores were TP53, HDAC1, BRCA1, EP300, and MDM2. These same genes also ranked in the top five CC. Expectedly, several of these genes also possessed the highest BC: TP53, BRCA1, HDAC1, and MDM2 (with BAG3, a gene with a slightly smaller degree also within the top 5). On the other hand, the genes ranked by top 5 IC were CCND1, CCND2, CDKN2A, SP1, and EGR1. Of note among these nodes, EP300, MDM2, CCND2, and EGR1 were not already present in CellAge. Additional file 2: Fig. S10 summarizes the gene intersection across the computed network parameters, while Additional file 1: Table S33 identifies potential senescence regulators not already present in CellAge from the PPI network. We found that from the top 12 PPI candidates, 11 have been recently shown to regulate senescence in human cell lines and will be added to CellAge build 2.

Within the main PPI network component, a large portion of CS genes and their partners formed a single large module with 1595 nodes. Using DAVID version 6.8, we found the terms enriched within the module; the top five are: Transcription, DNA damage & repair, cell cycle, Proteasome & ubiquitin, and ATP pathway [35, 36] (Additional file 1: Table S34). These results are all in line with previously described hallmarks of cellular senescence [60].

It is prudent to note that centrality measures in PPI networks must be interpreted with caution due to publication bias that can be an inherent part of the network [61, 62]. The top network genes identified from the PPI network are likely to be heavily influenced by publication bias [63]. Looking at the average PubMed hits of the gene symbol in the title or abstract revealed a mean result count of approximately 2897 per gene, far higher than the genome average (136) or existing CellAge genes (712) (Additional file 2: Fig. S11).

Unweighted RNA-Seq co-expression network

We used CellAge genes that induce and inhibit CS and their co-expressing partners to build a cellular senescence co-expression network. The network consists of a main connected network with 3198 nodes, and a number of smaller “islands” that are not connected to the main network (Fig. 4a).

Fig. 4 a Cluster analysis of the RNA-Seq Unweighted Co-expression Network. The 171 seed nodes obtained from CellAge and their first order interactors. The colours represent the breakdown of the network into clusters. The algorithm revealed 52 distinct clusters, of which we color and order the 19 clusters with the best rankings for modularity, or in the case of module 17–19, size. The CellAge nodes are colored in dark purple, appearing throughout the network. Larger nodes have higher betweenness centrality. In order of decreasing modularity, the main function clusters of the modules were related to; Spermatogenesis (Module 1), Synapse (Module 2), Cardiac muscle contraction (Module 3), Cell Cycle (Module 4), Secreted (Module 5), Tudor domain (Module 6), ATP-binding (Module 7), Symport (Sodium ion transport) (Module 8), DNA damage and repair (Module 9), transit peptide: Mitochondrion (Module 10), Steroid metabolism (Module 11), Transcription regulation (Module 12), Protein transport (Module 13), Mitochondrion (Module 14), Heme biosynthesis (Module 15), Innate immunity (Module 16), Signal peptide (Module 17), Keratinocyte (Module 18), and Transcription repression (Module 19) (Enrichment results in Additional file 1: Table S35, genes in Additional file 1: Table S36). b RNA-Seq Unweighted Co-expression Network, local clustering. Red/Orange represents nodes with high clustering coefficient, whereas pale green represents nodes with lower clustering coefficient. Degree is also weighted using node size. CellAge nodes are colored purple, and GenAge Human nodes are also shown and highlighted in bright green. The right-hand panel is an enlarged view of the left-hand panel Full size image

The main interconnected network included 130 CellAge genes. Among these, we also found that 14% of them are also human aging-related genes, reported in GenAge - Human dataset, whereas the remainder of the smaller networks only comprised of 1.6% longevity genes [64]. Next, we looked at a number of centrality parameters to see how CellAge genes are characterized compared to the entire network. CellAge genes had a mean BC of 0.00363, whereas the remainder of the genes had a BC of 0.00178, revealing that if CellAge genes are removed, modules within the network may become disconnected more easily. While nodes scoring highly for BC in PPI networks are likely bottleneck regulators of gene expression, this is not necessarily true for co-expression networks. In this case, nodes can also have high BC scores if they are co-activated via various signalling pathways. Although BC alone is not enough to determine which genes are regulating CS, taking BC into account with other network topological parameters can be a good indicator of gene function. Aside from high BC, CellAge genes also had a lower local clustering coefficient of 0.58, compared to a mean of 0.76 across non-CellAge genes, indicating that locally, CellAge genes connect to other genes less than the average for the network. This can also be seen at the degree level, where CellAge genes averaged only 53 connections, compared to an average of 103 connections in non-CellAge genes. Finally, the mean CC score was not significantly different between CellAge nodes and other genes in the network (0.148 in CellAge vs 0.158). CellAge genes were therefore more likely to be bottlenecks in signalling across different modules and occupy localized areas with lower network redundancy, suggesting that perturbations in their expression might have a greater impact on linking different underlying cellular processes.

The topological analysis of the main network component as a whole revealed a more modular topology than the PPI network, resulting in genes tending not to appear in multiple measures of centrality. There were 23 nodes with significant IC with senescence-related genes, including PTPN6, LAPTM5, CORO1A, CCNB2 and HPF1. No node from the top 5 IC was present in the top 5 genes with high BC, CC, or Degree. Overall, the primary candidates of interest included KDM4C, which had a significant IC and was at the top 1% of CC and top 5% of BC, along with PTPN6, SASH3 and ARHGAP30, which all had significant IC values and were at the top 5% of BC. We found that KDM4C and PTPN6 have been shown to regulate CS in human cell lines, and will be added to build 2 of CellAge [65, 66].

Previous studies have advocated that measures of centrality are generally important to identify key network components, with BC being one of the most common measures. However, it has also been postulated mathematically that intra-modular BC is more important than inter-modular BC [67]. Therefore, by isolating network clusters of interest and identifying genes with high BC or centrality within submodules, we propose to identify new senescence regulators from the co-expression network.

Using the CytoCluster app (see Networks in Methods) [68], we found 54 clusters in the network, of which we represent the top clusters colored according to modularity (Module 1–16) or size (Module 17–19) (Fig. 4a). Reactome pathway enrichment for all main clusters highlighted cell cycle and immune system terms in the two largest clusters [35, 36]. The largest cluster of 460 nodes (17 CellAge nodes, Module 4), possessed a high modularity score and was strongly associated with cell cycle genes, including the following general terms: Cell Cycle; Cell Cycle, Mitotic; Mitotic Prometaphase; Resolution of Sister Chromatid Cohesion; and DNA Repair. The second largest cluster (Module 16), however, had weak modularity (ranking 26); it comprised of 450 nodes (19 CellAge nodes) and was enriched for immune-related pathways including: Adaptive Immune System; Innate Immune System; Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell; Neutrophil degranulation; and Cytokine Signaling in Immune system. Cluster 4 and Cluster 5 were not enriched for Reactome Pathways. A visual inspection showed a number of bottleneck genes between Module 1 and Module 16, consistent with the role of the immune system in clearance and surveillance of senescence cells and the secretion of immunomodulators by senescent cells [69] (Additional file 1: Table S35).

We were also interested in visualizing areas in the network with a high local clustering coefficient, as this parameter represents areas with many neighborhood interactions and, therefore, more robust areas in the network. It was found that the two clusters of interest, enriched for cell cycle terms and immune system terms, overlapped with regions of lower clustering coefficient, potentially implying parts of the biological system with less redundancy in the underlying process. Figure 4b depicts regions of high local clustering coefficient in the network (orange) and regions less well connected locally (green).

Unweighted microarray co-expression network

We also made an unweighted microarray co-expression network built from the COXPRESdb database of microarray gene co-expression (V6) [57] (Additional file 2: Fig. S12). Compared with the RNA-seq co-expression network, the microarray network is significantly smaller, and only included 34% of the CellAge genes (Additional file 1: Table S32). However, we found that SMC4 was an important bottleneck in the microarray network, being in the top 5% CC and IC (Additional file 2: Fig. S12D and S12E). SMC4 was not independently associated with senescence despite being part of the condensing II complex, which is related to cell senescence [70]. Furthermore, SMC4 is associated with cell cycle progression and DNA repair, two key antagonist mechanisms of cell senescence development [71, 72]. SMC4 has been linked to cell cycle progression, proliferation regulation, and DNA damage repair, in accordance to the most significantly highlighted functional clusters in the module 2 and in the whole Microarray network (Additional file 1: Table S39 and S40; Additional file 2: Fig. S13) [73, 74]. There was limited overlap between the microarray co-expression network and the RNA-seq co-expression network, although this is not surprising considering the higher specificity and sensitivity, and ability to detect low-abundance transcripts of RNA-seq [75].

Experimental validation of senescence candidates

We set out to test if candidate genes from our network analyses are indeed senescence inhibitors using a siRNA-based approach, whereby knockdowns enable the p16 and/or the p21 senescence pathway to be induced, leading to senescence [76]. We tested 26 potential senescence inhibitor candidates, 20 of which were chosen using GeneFriends, a guilt-by-association database to find co-expressed genes [59]. For this, we used the CellAge CS inhibitors as seed genes, with the assumption that genes co-expressed with senescence inhibitors would also inhibit senescence, and generated a list of the top co-expressed genes with CS inhibitors based on RNA-seq data (Additional file 1: Table S41). Furthermore, CellAge has multiple ways of partitioning genes, including the type of senescence the genes are involved in (Fig. 1b). We decided to look for genes co-expressed with stress-induced premature senescence (SIPS) inhibitors. We generated a list of genes that are co-expressed with the CellAge SIPS genes (Additional file 1: Table S42). We chose to validate five additional genes that were both co-expressed with the CellAge SIPS and are present as underexpressed in our signature of CS [32]. Finally, we chose SMC4 from the microarray network due to its interaction with other senescence genes within the network, its association with cell cycle progression, and the fact that it is underexpressed in senescent cells, indicating it may be inhibiting senescence in replicating cells. The genes chosen, along with experimental validation results are shown in Fig. 5, while the justification for our validation and Z-scores are shown in Additional file 1: Table S43 and S44 respectively.

Fig. 5 Experimental validation of 26 senescence candidates. a–e Representative images of fibroblasts following transfection with cyclophilin B siRNA (top row), CBX7 siRNA (middle row), or GFT3C4 siRNA (bottom row). a DAPI (blue) and Ki67 (green). b DAPI (blue) and Cell Mask (red). c DAPI (blue), p16 (green) and p21 (red). d DAPI (blue) and IL-6 (red). e Brightfield images following staining for SA-β-galactosidase. Size bar, 100 μm. f Heatmap of multiparameter analysis of proliferation markers (cell number and % Ki67 positive), senescence-associated morphology (cellular and nuclear area) and senescence markers (% p16 positive, p21 intensity, perinuclear IL-6 and perinuclear SA-β-galactosidase). Colors illustrate the number of Z-scores the experimental siRNA is from the cyclophilin B (cycloB) negative control mean. Data are ranked by whether or not the siRNA is a top hit (siRNAs between the thick horizontal lines), and then by the cell number Z-score. Red values indicate Z-scores that are “senescence-associated measures.” The CBX7 positive control is also shown for comparison. Data presented are from at least two independent experiments each performed with a minimum of three replicates. All Z-scores are available in Additional file 1: Table S44 Full size image

Next, we performed transient siRNA transfections of normal human fibroblasts using the 26 candidates and identified those siRNAs that generated the induction of a senescence phenotype, using multiparameter analysis of morphological measures and a panel of senescence markers. Senescence induction is associated with a loss of proliferation, as measured by a decrease in Ki67 index and cell number, and changes in cellular morphology, as measured by an increase in cell and nuclear area. We also quantitated changes in p16 and p21 (key senescence effectors [76]), interleukin 6 (IL-6, a common SASP marker) and SA-β-galactosidase. Knockdown of cyclophilin B, a housekeeper, acted as a negative control [2], while knockdown of CBX7, a potent senescence inhibitor, was included as a positive control for senescence induction [77]. Of the 26 genes tested, 80.7% (21/26) resulted in a decrease in Ki67 positive nuclei greater than 1 Z-score (i.e., direction of change also observed for the CBX7 siRNA positive control, Fig. 5; Additional file 1: Table S44); 80.7% (21/26) increased p16; 96.2% increased p21 (25/26); 65.4% increase IL-6; and 65.4% (17/26) increase SA-β-galactosidase. Of the siRNAs that resulted in a decrease in Ki67 index, 61.9% (13/21) were classified as top hits as they concomitantly decreased cell number and altered at least one morphological measure. 92.3% (12/13) of the top hits activated both the p16 and p21 pathway, 84.6% (11/13) upregulated the SASP factor IL-6, while 61.5% (8/13) generated an increase in the percentage of SA-β-galactosidase positive cells. In general, we have shown the power of networks in predicting gene function, with 13 “top hits” (GTF3C4, C9orf40, HAUS4, MCM7, TCEB3, CDC25A, CDCA4, CKAP2, MTHFD2, NEK2, IMMT, MYBL2, and NIPA2).