Sampling

Bat capture and blood sampling were implemented in accordance with the permits and ethical guidelines issued by ‘Arrêté’ by the Préfet du Morbihan and the University College Dublin ethics committee. The sampling procedures are exhaustively described in ref. 23. Briefly, female M. myotis bats, aged 0 to 7+ years (for example, first marked with transponders as adult 6 years before subsequent recapture), were captured in Brittany, France. These individuals have been marked with unique transponders since 2010, and the initial ages of bats when first captured were determined by examining the epiphyseal cartilage in their finger bones. Individuals were recorded as juvenile (0 years old) if the epiphyseal plates in their finger bones were open; otherwise, they were recorded as adult (1+ years old, true age unknown). Each year, individuals recaptured were identified by their transponder number and new captures (juvenile and adult) were transponded. For each individual, a volume of ~50–200 μl of blood, depending on the weight of the individual, was collected from the uropatagial vein using a sterile needle (26 gauge). Blood samples were pipetted into cryotubes (2 ml, Nalgene labware) and immediately flash-frozen in liquid nitrogen. Haemostatic gel (Bloxang, Bausch & Lomb) was applied to the puncture to prevent further bleeding. The bats were rapidly released after being offered water and food. Blood samples were maintained at −150 °C for long-term storage before RNA extraction.

RNA extraction

Total RNA was extracted from whole blood using a RNAzol BD kit (No. RB192, Molecular Research Center, Inc.), following the manufacturer’s instructions with minor modifications. RNA extraction is described in ref. 23. The quantity and quality of RNA were assessed using a NanoDrop Spectrophotometer (Thermo Scientific) and a Bioanalyzer 2100 (Aligent Technologies), respectively. All samples that met the criteria of having >2 μg of total RNA and an RNA Integrity Number score >8.0 were chosen for mRNA-Seq Illumina sequencing library preparation. For small RNA library preparation, 3 μg of total RNA per sample was required.

Library preparation and Illumina sequencing

For both RNA-Seq and small RNA library preparation, all qualified RNA samples were initially purified using a Turbo DNA-free kit (No. AM1907, Ambion) to remove residual DNA. The Globin-Zero Gold rRNA Removal kit (Epicentre Illumina) was further employed to deplete unwanted rRNA and globin mRNA. Nevertheless, this kit was specifically designed for human and mouse, and performed less effectively on other species. A large amount of ribosomal RNA (mostly 7 s rRNA) still remained after depletion based on the MiSeq result (Fasteris SA). Therefore, an additional step of 7 s rRNA depletion was carried out (Fasteris SA). The probes specifically designed for 7 s rRNA removal are 5′-TCCTTAGGCAACCTGGTGG-3′, 5′-GGGAGGTCACCATATTGATG-3′ and 5′-GGCAACCTGGTGGCCCCCCGCTCCCGGGAGG-3′. To assess the efficiency of the 7 s rRNA probes, we constructed two libraries with and without 7 s rRNA depletion from the same sample, HWJ-8. The MiSeq titration run demonstrated that the percentage of short reads that mapped to 7 s rRNA reduced from 31.65 to 17.78 after depletion (Supplementary Fig. 8), implying that the 7 s rRNA depletion step performed effectively.

RNA-Seq libraries were prepared using the TruSeq stranded mRNA kit (Illumina), following the manufacturer’s protocols. Sequencing was performed on Illumina HiSeq 2500 platforms, with the sequencing depth to a minimum of 50 million and 125-bp paired-end reads per sample. Small RNA libraries were constructed using the TruSeq small RNA kit (Illumina) and were further sequenced on Illumina HiSeq 2500 platforms, to generate a minimum of 8 million 50-bp single-end reads per sample.

Summary of sequenced samples

We deep-sequenced 100 transcriptomes and 50 miRNomes from 150 blood samples which were collected from 71 female M. myotis bats (Supplementary Table 1). These individuals were caught in five colonies in Brittany between 2013 and 2016 (Supplementary Table 2). Among these individuals, two were caught four times in consecutive years, two were caught three times and 20 were captured twice, with the remainder being caught only once. Of all samples, 49 had the transcriptome and corresponding miRNome sequenced simultaneously (Supplementary Table 1). The statistics derived from these samples are summarized in Supplementary Tables 3 and 7.

Individual transcriptome assembly

The quality control pipeline is extensively described in ref. 23. To enable a rapid genome-mapping process required for the referenced-based assembly, the identical quality control reads, which consume large amounts of computational resources but make little contribution to the complexity of transcriptome assembly, were removed using FastUniq45, leaving only one pair represented. These adaptor-free, high-quality and non-redundant paired-end reads were further used as inputs for assembly. We employed both de novo and reference-based methods to assemble individual transcriptomes, and the optimal strategy was selected on the basis of the assembly quality. For the reference-based method, since the M. myotis genome is not yet available, the M. lucifugus genome was used as a reference as this is the phylogenetically closest, well-assembled genome available. However, due to the genetic divergence (~30 million years) between these two species46, we carried out a series of tests to determine the optimal number of mismatches per alignment for the genome-mapping step. In brief, the filtered RNA-Seq data were aligned to the M. lucifugus genome (MyoLuc 2.0) using Tophat2 (v.2.1.0)47, with different numbers of mismatches (~2–10). For all RNA-Seq samples, the mapping rates were categorized into different groups based on the number of mismatches, and Student’s t-tests (all data followed normal distribution, Shapiro–Wilk test, P > 0.05) were conducted between groups N 2 and N 3 , groups N 3 and N 4 and so on, up to groups N 9 and N 10 . Group N 6 (six mismatches per alignment) was selected as the optimal mismatch number, because there were no significant differences observed between groups N 6 and N 7 based on the mapping rates (P > 0.05) (Supplementary Fig. 9). To exclude any ambiguous mappings, we retained only the unique and concordant alignments using Samtools48. The resulting BAM file was further used to reconstruct transcripts by Cufflinks (v.2.2.1)49, with the parameters –GTF–guide and –F 0.1. –GTF–guide allows the assembly of unannotated regions in the genome, while –F 0.1 removes dubious isoforms with extremely low expression. The final assembly was determined by keeping those transcripts with fragment per kilobase of transcript per million (FPKM) mapped reads higher than zero. De novo assembly was carried out using Trinity (v.2.1.1)50 in the strand-specific and paired-end mode with the default parameters. For each sample, the quality of the reference-based and de novo assemblies was assessed using CEGMA (v.2.5)51. On average, the completeness of the reference-based assembly was 15% higher than that of the de novo assembly (Supplementary Fig. 10). Given their better quality, the reference-based assemblies were used to construct the ‘super’ transcriptome reference.

‘Super’ transcriptome assembly and annotation

The reference-based assemblies from all 100 samples were merged using Cuffmerge (v.2.2.1)49. The parameter —min-isoform-fraction 0.5 was used to discard the unreliable isoforms poorly supported by low sequencing coverage. The redundancy of the merged assembly was removed using CD-HIT52 with a sequence identity threshold of 95% (-c 0.95), and FrameDP53 was employed to correct the misassembled transcripts with unexpected ‘indels’ or stop codons.

Functional characterization of the super transcriptome assembly was carried out using the following pipeline. Briefly, all transcripts were categorized into three groups: protein-coding RNA, lncRNA and miscRNA. To annotate the protein-coding transcripts, the open reading frame (ORF) of each transcript was predicted using FrameDP53. The transcripts with potential ORFs were queried against the Uniprot and Nr databases using BLASTX54, with an E-value <10−6. The transcripts, with the effective hits which shared at least 60% similarity and 70% sequence coverage with the entries, were accordingly annotated as protein-coding transcripts. The ORF potential transcripts with no effective hits in either database were regarded as miscRNA. lncRNA were predicted from the transcripts with no obvious ORFs defined by FrameDP53. Firstly, these transcripts were compared to the Pfam database55 to search for conserved domains using BLASTX54 with an E-value <10−6. Those transcripts that contained or overlapped with any conserved motifs were excluded. In addition, those transcripts with coding potential scores >0.3, as evaluated by CPAT56, were also discarded. The threshold of the coding potential score is suggested in ref. 56. The final set of lncRNAs was determined by removing those filtered transcripts with lengths shorter than 200 bp. The remainder of the uncharacterized transcripts were considered as miscRNA.

Transcript expression analyses

For each RNA-Seq sample, adaptor-free, high-quality reads (without redundancy removal) were used to quantify the reference transcripts using Salmon (v.0.9)57 with the parameters –ISF and –F 31, indicating the library type (I, first-forward; S, strand-specific; F, paired-end) and k-mer length, respectively. Before expression analysis, RNA-Seq samples with a CEGMA index of the reference-based assembly <0.85, were removed. A transcript was considered expressed if it had a cumulative raw read count >100 across all samples. This led to a matrix containing expression data of 48,749 transcripts across 88 samples.

To examine the complexity of the bat blood transcriptome, the contribution of each transcript to total transcriptional output was measured in each sample. To achieve this, all transcript per million (TPM) values from each sample were sorted from most to least and their relative fractions were calculated by dividing them by the sum of all TPM values respectively. A functional enrichment analysis of the top 100 highly expressed transcripts was performed using Metascape58. We also explored the global transcriptomic similarity across 88 samples. Based on the expression data of 48,749 transcripts, pairwise Spearman’s rank correlation coefficient tests were carried out using the R package cor (v.3.0). Before the analysis, TPM values were log 2 -transformed (log 2 (TPM + 1)). The average linkage method was applied to all correlation tests.

Gene expression variance analyses

In this study, we focused largely on gene-level analysis and investigated only 31,460 protein-coding transcripts, as most lncRNAs and miscRNAs are currently functionally unknown. To obtain gene expression estimates, raw expression counts of the transcripts which had the same BLAST hits in the Uniprot or Nr databases were accordingly aggregated to the gene-level counts using the Bioconductor R package tximport59. Thus, 31,460 protein-coding transcripts corresponded to 12,263 protein-coding genes. All downstream analyses were conducted based on these 12,263 genes.

For the age cohort analyses, samples with unambiguous ages (0, 1, 2, 3, 4, 5, 6) were employed, together with 7+ regarded as the oldest age cohort (Supplementary Fig. 1a). Gene expression counts (Supplementary Data 5) were trimmed mean of M-values (TMM)-normalized, log 2 -transformed and further converted to z-scores. A linear mixed model (LMM) was used to evaluate the contribution of potential variables (see below) to gene expression variation. Normalized gene counts (n = 12,263) were considered as dependent variables, whereas age was considered as an explanatory variable together with other variables including the individual, colony, year of capture and sequencing batch effect. With the exception of age being modelled as a fixed effect, all other variables were modelled as random effects. The LMM was implemented using the Bioconductor R package variancePartition60 with the following formula:

$$\mathrm{Gene} \, \mathrm{Expression} \sim \mathrm{Age} \,+\, (1\vert\mathrm{Individual}) \,+\, (1\vert\mathrm{Colony}) \,+\, (1\vert\mathrm{Year-of-capture}) + (1\vert\mathrm{Batch} \mathrm{effect})$$

Gene co-expression network analyses

To identify gene expression changes with age, pairwise differential gene expression analyses were performed using the R packages DESeq2 (ref. 61) and EdgeR62. For both methods, FDR < 0.05 and an absolute value of log 2 (fold change) > 0.5 were used to define differentially expressed genes. To reduce the rate of false-negatives and obtain a wide range of age-related candidates for pattern analysis, we maximized the number of differentially expressed genes from both methods. All network analyses were implemented in R. Gene co-expression analysis was performed using the R package WGCNA (v.1.63)63. Genes that exhibited no differential expression between any pairs of age cohorts were excluded from this analysis because these non-varying genes usually represent noise for pattern detection. The differentially expressed genes from pairwise differential expression analyses were clustered into different modules, and the correlation between each module and age was calculated (Supplementary information). After Benjamini–Hochberg correction, modules with P < 0.05 were considered to have significant correlation with age. For each module, a functional enrichment analysis was carried out using Metascape58. In particular, we established a gene product network encompassing 107 genes that were enriched in DNA repair (Gene Ontology: 0006281) using STRING64. These genes were clustered in modules that exhibited a positive correlation with age in M. myotis. The interaction was predicted based on different sources of evidence, including curated databases, functional experiments and gene neighbourhood, co-occurrence and co-expression. Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis was performed using STRING64.

Comparative transcriptomic analyses across mammals

To ascertain how the longevity-associated genes and pathways observed in M. myotis changed with age in other mammals, we took advantage of existing ageing blood transcriptome datasets available from three independent cohort studies on other mammals and compared their ageing transcriptomic signatures to those of bats. These comparative datasets included human (H. sapiens, n = 147, age ~25–75 years)24, mouse (Mus musculus, n = 25, ~0.2–2.5 years)15 and wolf (Canis lupus, n = 26, ~1–9 years)65. For human and mouse, raw gene expression counts were obtained from refs. 15 and 24, respectively, while for wolf we analysed the raw RNA-Seq data to generate gene expression estimates using the pipeline established for M. myotis, as described above. For human and mouse, gene ontology term enrichment analyses of the top 100 highly expressed genes were performed using Metascape55, and the percentage of age-related gene expression variance was estimated using the same method as for M. myotis.

To gain a global view of transcriptomic shifts during ageing, for each species PCA was performed based on expression data of all expressed genes using the R package prcomp (Supplementary Fig. 5). For each species, gene expression data were normalized using TMM and further transformed to z-score. To compare age-related gene expression patterns across species, for each gene we measured the Spearman’s rank correlation coefficient between its expression and age across all four mammals. Since the age cohorts in human, mouse and wolf datasets started from adult, samples from juveniles (0 years old) were excluded from correlation analyses in M. myotis. We plotted the distribution of Spearman’s correlation coefficient for each species and investigated the top 200 genes (100 positive and 100 negative) that exhibited the strongest correlation with age (Fig. 4a). Functional enrichment analyses of these top genes were performed using Metascape58.

To investigate the expression pattern at the pathway level, within each species we employed the median z-scores of all genes under each enriched gene ontology term to represent their overall pattern with age. The z-scores were transformed from Spearman’s correlation coefficient within a species. These enriched gene ontology terms were selected on the basis of their distinct expression patterns with age in M. myotis (Fig. 2a–c) and were associated with ageing processes. The full list of gene ontology terms that were used for comparison is given in Supplementary Data 3.

In addition, we also ascertained the expression levels of 307 genes curated in the GenAge database (Build 19) that are highly associated with human ageing16, across these four mammals. Only 207 genes were commonly expressed, so their correlation with age was investigated (Supplementary Data 4). For the top 20 most age-correlated genes (Fig. 4b), we further investigated their functional association with 207 human ageing-associated genes using the STRING database64 (Supplementary Table 8).

miRNA expression analyses

Additionally, we also sequenced 50 small RNA libraries from M. myotis whole-blood samples, 49 of which originated from the same samples that had corresponding transcriptomes previously sequenced (Supplementary Table 1). For each library, miRNA profiling and quantification were analysed using the miRDeep2 pipeline66. The detailed steps and parameters used are described fully in ref. 19. By searching the miRBase (v.32)67, mature miRNA candidates were categorized into the ‘known’ and ‘novel’ groups. Here we focused only on the ‘known’ miRNAs, as these are well documented and experimentally validated. The novel miRNAs were usually expressed at a lower level, thus having limited impact on gene regulation. For miRNA expression analysis, conserved miRNAs across all libraries were extracted as described in ref. 19. Only miRNAs that were presented in at least 80% of all samples were used in the expression analysis.

The samples from individuals 0, 1, 2, 3, 4, 5, 6 and 7+ years of age were used for age cohort analysis. Quantile normalization was applied to the raw miRNA expression counts (Supplementary Data 6). The normalized values were further log 2 -transformed and converted to z-scores. PCA was performed using prcomp (v.3.0) in R68. Pairwise differential miRNA expression analysis was conducted using DESeq2 (ref. 61), and differentially expressed miRNAs were defined according to the criteria stated above.

miRNA–mRNA expression correlation analyses

miRNA–mRNA interaction was investigated to elucidate the mechanisms of miRNA-directed regulation. First, 3′-UTR sequences were initially extracted from the assembled protein-coding transcripts (n = 31,460) using ExUTR44. miRNA targets were predicted using miRanda69, with the parameters -strict and -en -20. For each miRNA–mRNA pair predicted above, Spearman’s rank correlation coefficient was calculated. To achieve this, samples that had both transcriptome and miRNome sequenced were selected. The raw expression counts were quantile-normalized and log 2 -transformed. For each pair, the correlation coefficient was computed using cor (v.3.0) in R68. Negatively correlated miRNA–mRNA pairs were determined according to the threshold of FDR < 0.1. Functional enrichment analysis of miRNA-mediated mRNA was performed using Metascape58. In particular, we investigated those genes that were involved in cell cycle regulation and DNA repair pathways and that are simultaneously regulated by miRNA in M. myotis.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.