Building the skeletal muscle data compendium

From ArrayExpress [18], microarray datasets from human skeletal muscle biopsies were selected and manually curated based on the original publications, including available supplemental data (see “Methods” section). The selected experiments contain data from 2852 microarrays from 20 different array platforms (Figure S1 in Additional file 1). Affymetrix-manufactured arrays dominate, represented by 11 different array types and in total 2475 arrays. Using a controlled vocabulary, sample and experimental parameters selected for re-annotation were defined. We retrieved the original papers along with supplemental material to re-annotate each microarray using our newly defined parameters and their value ranges (Table 1). To define a generic control set, representing a normal, healthy population, a set of 1188 “super controls” were selected. In this group, samples were excluded if the individual had any kind of disease, was obese (BMI > 30), or was subjected to any severe intervention.

To avoid the strong bias introduced by differences in individual probe sequences when combining data from different array platforms [24], we restricted this study to data from each platform independently. We used a subset of the compendium based on the two most common platforms: 568 arrays from the Affymetrix HG-U133A, and 1174 arrays from the HG-U133 + 2 platform. The probe effects were addressed by normalizing each dataset with RPA [19, 20], which models the affinity of each individual probe, assuming it to be a stochastic variable with a normal distribution with probe set-specific mean and variance rather than a constant, as in many other normalization methods including RMA and MAS5. To avoid biases introduced by genetic diversity in the studied individuals, we removed all probes mapping to known human SNPs with a minor allele frequency higher than 5 % in a Western European population. Out of 604,258 probes on the HG-U133 + 2 array, 4840 probes were removed; on the HG-133A array, 2157 out of 247,965 probes were removed. Oligonucleotide probes were summarized to gene level probe sets rather than transcript specific ones, also to minimize biases introduced by probe sequences and their representation on different arrays. After quality control [21], 1236 arrays from the two platforms remained: 758 from HG-U133 + 2 and 485 from HG-U133A. The two resulting data matrices contain data for 19,597 genes tested on the HG-U133 + 2 array and a subset of 11,926 of these on the HG-U133A array. The two resulting cross-study data matrices are also available from ArrayExpress, accession number E-MTAB-1788.

These comprehensive data sets represent comparable human skeletal muscle expression data over a vast array of different experimental conditions. In order to identify constitutively expressed genes, we analyzed the variance of expression only removing the study effect. The genes with most stable expression are presented in Table S1 (see Additional file 1) and are not likely to be influenced by the experimental conditions. The most stable genes found were myoglobin (MB), GAPDH, and alpha 1 actin (ACTA1) and could serve as candidates for “housekeeping” endogenous control genes in quantitative real-time PCR experiments.

Expression levels of 957 genes are associated with age

We selected a subset of 361 arrays from the compendium to study the effect of aging on gene expression, i.e., 211 arrays from the HG-133A array and 150 from the HG-U133 + 2 array that had specific annotation of age and gender, ranging from <1 year up to 83-year-old individuals (Figure S2 in Additional file 1). Using a linear model with sex and study ID as covariates, 957 genes were significantly associated with age (p < 0.05, Benjamini-Hochberg correction for multiple testing) (top-50 genes are shown in Table 2). Of these, 484 were upregulated and 473 downregulated with increasing age. We verify the removal of study effects by principal component analysis (PCA) before and after study adjustment. Whereas samples from the same study primarily cluster together in the PCA of the unadjusted dataset, this effect is removed in the adjusted one (Fig. 1). Similarly, we use PCA to verify the absence of gender biases in the dataset after the adjustment for study effects (Figure S3 in Additional file 1). A significant overlap (N = 13, p = 1.0 × 10−5) and complete concordance for the direction of the expression for all 13 genes found in our data set out of the 73 genes detected in the multispecies de Magalhaes study [6] were observed. Twenty-five of the 957 genes are reported in the GenAge database of 288 genes linked to aging, an overlap of the two lists which is significant at p = 0.0020 using a hypergeometric test. The GenAge database has also collected and curated a list of genes in loci detected in genome-wide association studies for longevity. They report 886 genes, 353 of which were significantly associated in the original studies. Out of this list, we detect an overlap of 25 out of 957 genes (p = 0.024). Seventeen of the 957 aging genes have been previously reported in the top 250 genes by the Zahn study [5] (overlap p = 0.065). Using publicly available RNA sequencing data (n = 157) from the GTEx project (http://www.gtexportal.org/), 91 genes out of the 957 were found associated with age, a significant overlap at p = 2.2 × 10−16.

Table 2 Top 50 genes significantly associated with age across 361 samples Full size table

Fig. 1 Principal component analysis of the 211 HG-U133A arrays, before (a) and after (b) removal of study effects and corresponding plots for the 150 HG-U133 + 2 arrays (c and d). Different studies (identified by database accession numbers) are represented with different colors, and increasing age of the individual from whom the biopsy was taken is indicated by the increasing dot size. The relatively strong study effect seen even after normalization is removed after the adjustment Full size image

The genes with the strongest association to aging in the current study are H3F3B (p = 3.4 × 10−13) and AHNAK (p = 6.9 × 10−12), both showing increased expression with increased age. AHNAK is a large protein localized to the sarcomere of skeletal muscle and increased expression of AHNAK with aging is in line with the study by de Magalhaes et al. [6]. Increased expression of AHNAK has previously also been associated with a low VO 2MAX and poor muscle fitness [25]. The two genes most strongly showing reduced expression with increasing age are homeobox B2 (HOXB2, p = 1.0 × 10−11) and deleted in lymphocytic leukemia 1 (DLEU1, p = 8.6 × 10−10). The full list of 957 genes is available in Table S2 (see Additional file 2).

Aging significantly alters genes involved in inflammation and mitochondrial metabolism

We analyzed the lists of 484 genes upregulated and 473 downregulated with aging for enrichment of specific pathways and processes using gene set enrichment analysis (GSEA) (Table 3) [9]. We find enrichment of genes with increased expression in six pathways, connected to the complement system, indicating an inflammatory response with aging (p < 0.05, Bonferroni adjusted). This is in line with Zahn et al. that also reported increased expression of genes in the complement system with aging [5]. Thirteen pathways were enriched in genes with decreased expression associated with increasing age (p < 0.05, Bonferroni adjusted). Eight of these represent mitochondrial components, supporting a perturbation of mitochondrial function with aging. Four groups connected to metabolism were also found, indicating reduced expression of genes in the electron transport chain (ETC)/OXPHOS pathway, in the citric acid/tricarboxylic acid cycle (TCA), and in pyruvate metabolism. We observe significant downregulation with increasing age of all four major complexes of the ETC located in the inner mitochondrial membrane (Figure S4 in Additional file 1). Subunits of NADH dehydrogenase (NDUFAF5, p = 1.0 × 10−4; NDUFS3, p = 8.8 × 10−5), cytochrome c reductase (UQCR10, p = 2.6 × 10−4; UQCR11, p = 7.7 × 10−4) and oxidase (COX10, p = 4.3 × 10−5; COX4I1, p = 1.7 × 10−4; COX7B, p = 3.1 × 10−4; COX7C, p = 6.1 × 10−4; COX5B, p = 1.1 × 10−3), and ATP synthase (ATP5G3, p = 1.1 × 10−6; ATPAF1, p = 3.6 × 10−5; ATP5G1, p = 8.0 × 10−5; ATP5C1, p = 2.0 × 10−4; COX10, p = 4.3 × 10−5) are all downregulated with increasing age. We also find that the expression of C-x(9)-C motif containing 2 (CMC2) is decreased with aging (p = 1.0 × 10−7). CMC2 is required for respiratory growth, and mutants with CMC2 deletion are unable to assemble the cytochrome c oxidase complex [26]. In the pyruvate dehydrogenase complex (PDC), proteins A1, B, and X (p = 4.1 × 10−6; 4.2 × 10−5; 1.3 × 10−3) show reduced expression with aging. In connection to carbohydrate metabolism, the expression of 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 1 (PFKFB1) is increased with aging (p = 9.4 × 10−5). Furthermore, glucose uptake through GLUT4 in skeletal muscle is known to be dependent on TBC1 Domain Family Member 1 (TBC1D1) [27, 28], the expression of which is decreased with aging (p = 3.6 × 10−4).

Table 3 Gene set enrichment analysis for the 484 genes upregulated with age and the 473 downregulated Full size table

High physical capacity counteracts age-related changes in muscle expression

Decline in muscle mass and performance with aging can be prevented by exercise. Therefore, we explored whether expressions of any of the genes associated with aging also were influenced by physical capacity assessed as VO 2MAX .

In a subset of 116 samples with information on VO 2MAX , we find 39 genes associated with physical capacity (FDR < 0.05, Benjamini-Hochberg, Table 4), but given the relatively low number of samples included in the VO 2MAX analysis, we restricted our analysis to the 957 genes associated with age. Of them, 21 were also associated with VO 2MAX (Table 5). It is striking, but not unexpected, that aging and increased physical capacity affects gene expression in opposite directions for 20 of the 21 genes. Two of these, the suppressor of cytokine signaling 2 (SOCS2) and the fasciculation and elongation protein zeta 2 (FEZ2) are also associated with BMI (FEZ2: p BMI = 5.9 × 10−4, SOCS2: p BMI = 7.3 × 10−4); increasing BMI affects gene expression in the same direction as increasing age (FEZ2: p age = 2.8 × 10−8, SOCS2: p age = 5.5 × 10−7) and in the opposite direction with increased physical capacity (FEZ2: p VO2max = 3.5 × 10−5, SOCS2: p VO2max = 6.3 × 10−8) (Fig. 2).

Table 4 Thirty-nine genes genome-wide significantly associated with physical capacity across 116 samples Full size table

Table 5 Of the 957 aging genes, 21 were significantly associated with physical capacity across 116 samples Full size table