Study design

To construct a comprehensive transcriptomic BodyMap of mouse, we generated an RNA-seq data set using a total of 72 samples isolated from 2 female and 2 male C57BL/6JJcl mice of 6-weeks old. Fourteen non-sexual tissues (adrenal gland, bone marrow, brain, forestomach, heart, kidney, liver, large intestine, lung, muscle, small intestine, spleen, stomach, and thymus) and two sexual tissues (ovary and uterus for females, or testis and vesicular gland for males) with two replicate samples of brain, liver, kidney and testis for each mouse and one sample of the other tissues per mouse were subjected to the RNA-seq analysis as follows (Table 1). Total RNA was extracted from each tissue by using the miRNeasy Mini Kit (Qiagen) according to the manufacturer’s protocol, and the RNA quality was evaluated with an Agilent Bioanalyzer. Then we used Illumina HiScan platform to sequence each cDNA library except four kidney and two testis samples for some reasons (Supplementary Table S1). On average, about 10 million 101 * 101 bp pair-ends reads were generated per sample, yielding a total of 1.4 billion reads for the study. Based on the quality reports of the raw data generated by fastQC22, mainly per base GC contents and per base sequence quality, we performed sequence trimming using Trimmomatic23 under default parameter settings. On average, 1.05 million reads (~10.6%) were eliminated from each sample (Supplementary Table S2). All subsequent analyses were based on the remaining reads after trimming.

Table 1 Mice and tissue sampling. Full size table

Overview of the landscape of the mouse transcriptome

With all the biological samples combined, a raw data matrix of fragments per kilobase of exon per million mapped fragments (FPKM) values consisting of 43,628 genes annotated in the Ensembl GRCm38 mouse reference genome across 72 samples was generated at first. It must be noted that the hierarchical cluster analysis of all the 72 samples (Supplementary Fig. S1) showed that one of the bone marrow samples was not clustered together with the other three bone marrow samples, indicating that there may be some problems such as a mislabeling with this sample. Since we were unable to trace down what exactly happened to this sample during the experiment, we removed all the four bone marrow samples hereafter, resulting in 68 remaining samples from 17 tissues for this study to enhance the reliability of our analyses. Of all the 43,628 genes, 22,196, 8,936, 8,060, and 4,436 belong to protein coding genes (PCGs), noncoding RNA (ncRNA), pseudogenes, and other genes (except the above three types), respectively. As reported in a previous study8, when the threshold of expression was set to FPKM > = 0.1 in at least one sample for a given gene, most PCGs (20,261/22,196, 91%) were expressed in some mouse tissues, whereas only a much smaller portion of the ncRNA genes (2,939/8,936, 33%) were expressed. That is, most ncRNA genes were typically tissue-specific or not expressed in any of the tissues studied. The percentages of expressed and non-expressed pseudogenes were almost even (52% vs 48%). Interestingly, the expression of the remaining 4,436 genes belonging to other categories showed a similar pattern as the PCGs, i.e., most of them were expressed, implying that these genes may play a role like PCGs in the activities of cells (Fig. 1a). This complementary expression pattern between PCGs and ncRNA was also observed under different thresholds (0.1 and 1) when considering the number of expressed genes given the number of tissues (Supplementary Fig. S2). Our results were similar to those found by the Genotype-Tissue Expression (GTEx) project8. However, the percentage of ncRNAs detected in at least one sample in our study was lower than that of the GTEx project (33% compared to 71%). It may be partly due to much fewer samples and tissues in our study compared to the GTEx project.

Figure 1 Landscape of the mouse transcriptome. (a) Percentage of expressed (red, FPKM > = 0.1 in at least one sample) and non-expressed (blue, FPKM < 0.1 across all samples) genes by different gene types. PCGs: protein-coding genes; ncRNA: non-coding RNA genes; Others: other genes except PCGs, ncRNA, and pseudogenes. (b) Number of expressed (FPKM > 0.1) genes by tissues. Numbers were averaged over all biological samples for a given tissue. (c) Hierarchical clustering analysis of gene expression profiles of 68 samples with 31,687 genes. Each column indicated a sample, whereas each row indicated a gene. Each tissue symbol was shown upon the color bar for each cluster. The mouse information where each sample came from was also labeled upon the color bar. Mouse 1 and Mouse 2 were two male mice (in blue text), while Mouse 3 and Mouse 4 were two female mice (in red text) as shown in Table 1. Full size image

The number of genes expressed in a given tissue ranged from 15,578 to 22,295. On average 18,634 (42.7%) of the 43,628 genes were defined as expressed (FPKM > = 0.1) per tissue. Differences among tissues in the numbers of genes expressed were observed. The testis expressed the most genes, followed by ovary and lung, which was consistent with findings in human9 and rat10, whereas the liver expressed the least number of genes (Fig. 1b). In addition, liver, heart, and kidney expressed relatively fewer genes than other tissues in all the three species. It proved in a way that the same tissues from different species exhibit high similarity. We also identified 11,017 genes and 10,926 isoforms ubiquitously expressed (FPKM > = 0.1 across all the 68 samples) which should play a vital role in basic biological functions of cells.

To analyze the gene expression profiles in a logarithmic form, genes with FPKM values of zero across all 68 samples were removed, and the value of 1 was added to each FPKM value before log2 transformation. Thus, a final data matrix of 31,687 genes and 68 samples was generated for subsequent analyses. Of the remaining 31,687 genes, 20,387 and 2,985 belong to PCGs and ncRNA, respectively. Hierarchical cluster analysis (Fig. 1c) as well as principal component analysis (Supplementary Fig. S3) of all the 68 samples demonstrated that different tissue types exhibited unique gene-expression profiles. Morphologically or functionally similar tissue pairs, such as heart versus muscle, stomach versus forestomach, and ovary versus uterus, were clustered more closely to each other, indicating that these tissue pairs shared a more consistent expression profile than other tissues. Both brain and liver showed relatively large differences in gene-expression profiles with the other tissues (Supplementary Fig. S3).

Tissue-dependent differentially expressed genes

We conducted a pairwise comparison using a t-test (FDR < 0.05, fold change (FC) > = 2 or <= 0.5) to identify differentially expressed genes (DEGs) between any two tissues (for all of the 17 tissues). The numbers of DEGs were largely dependent on the two types of tissues being compared. DEGs in the heart, liver, and muscle were generally under-expressed compared with other tissues, whereas DEGs were generally over-expressed compared with other tissues in adrenal gland, brain, lung, spleen, testis, and thymus (Fig. 2a). Our results were similar to a previous study on the rat transcriptome10. We also conducted separate pairwise comparison analyses for PCGs or ncRNA only (Supplementary Fig. S4). It was found that the PCGs had a quite similar pattern with the whole set of genes, whereas the ncRNA genes showed less divergence between tissues. Only brain, testis, and thymus had relatively high number of over-expressed ncRNA genes compared with other tissues.

Figure 2 Tissue-dependent differentially expressed genes. (a) Number of differentially expressed genes (FDR < 0.05, fold change >2 or <0.5) for each pairwise organs. For each tissue in each column, the figures indicated the numbers of down-regulated genes comparing this tissue to the other tissues. For each tissue in each row, the figures indicated the numbers of up-regulated genes comparing this tissue to the other tissues. The tissues are in the same order as in Fig. 1c. (b) Expression profiles of 5,035 tissue-specific genes across 68 samples were arranged by tissue types (in decreasing order based on the number of organ-enriched genes). Testis expressed the most tissue-specific genes among all tissues. Full size image

We also identified tissue-specific genes that are defined as those with more than four-fold higher expression in a given tissue over any other tissues. Totally, we identified 5,035 tissue-specific genes, with a relatively small number for each tissue except testis and brain (Fig. 2b). The number of tissue-specific genes varied from tissue to tissue. Testis expressed the most tissue-specific genes (2,496), closely followed by brain (708) and liver (280). The numbers of tissue-specific genes identified in the other ten tissues ranged from 41 to 223 (Supplementary Fig. S5).

DAVID online tools24 were used to reveal the biological meaning behind these tissue-specific genes. Generally, these genes were highly correlated with the particular biological processes or the development of a specific tissue (Supplementary Table S3). For example, the brain-specific genes were mainly associated with neural processes such as the transmission of nerve impulse and synaptic transmission as well as the KEGG25 pathway of neuroactive ligand-receptor interaction. The heart-specific and muscle-specific genes were involved in the biological processes of heart development and muscle tissue development, respectively. Furthermore, processes involved in the metabolism and acute inflammatory response were identified in liver-specific genes, whereas processes related to transport were enriched with kidney-specific genes. Genes specifically expressed in the four sexual tissues were mainly enriched in gene ontology categories associated with sex-related biological processes such as sexual reproduction, reproductive structure development, gamete generation, embryonic limb morphogenesis, and mating plug formation.

Sex-dominated genes expressed in non-sexual tissues

We also conducted comparisons between female and male mice to identify sex-dominated genes for the 13 non-sexual tissues (Supplementary Fig. S6). Overall, we identified 1,682 female-dominated genes (genes with expression level in female more than two-fold higher than in male) and 2,377 male-dominated genes (genes with expression level in male more than two-fold higher than in female) across all 13 tissues. Some tissues such as heart and spleen showed relatively large difference between female and male in the number of sex-dominated genes. Meanwhile, different patterns were observed in the differences between the numbers of female and male dominated genes within a specific tissue across all 13 tissues. Some tissues expressed an equivalent number of sex-dominated genes for both sexes (such as adrenal gland, brain, kidney, liver, and thymus), whereas forestomach, heart, large intestine, lung, muscle, and spleen expressed more male-dominated genes, and small intestine and stomach expressed more female-dominated genes (Supplementary Fig. S7). Though each tissue had only two replicates (except brain and liver which had four replicates) for each sex, our results can somehow show the sex differences in non-sexual tissues.

Identification of housekeeping genes

Housekeeping genes are typically constitutive genes that are required for the maintenance of basic cellular functions, and are expressed in all cells of an organism under different conditions26,27,28. Thus, housekeeping genes are widely used as internal controls for experiments as well as for normalizing gene expression data29,30,31. To identify housekeeping genes in mouse, we used similar criteria as a previous study32 (see Methods).

Hence, we identified 4,781 candidate housekeeping genes satisfying the above criteria. The vast majority of these housekeeping genes (4,662) belonged to protein-coding genes, whereas the rest were classified as lncRNA (30), processed pseudogene (24), processed transcript (23), antisense (23), and so on. The distribution of these genes varied across chromosomes (Fig. 3a). Chromosome 2 harbored the most housekeeping genes (420), closely followed by chromosome 11 (416), while chromosome X had the least housekeeping genes (107) without taking account for chromosome Y because no housekeeping genes were found on chromosome Y. However, after normalizing the data by the number of all the genes on each chromosome, we found chromosome 5 had the highest percentage of housekeeping genes, followed by chromosome 8 and 19. To assess the reliability of our results, we compared our gene list to a list of 3,804 human housekeeping genes32 based on human and mouse homologous genes33. Of the 3,506 human housekeeping genes having a homology to mouse genes, 2,608 (74.4%) were found in our candidate mouse housekeeping genes. Conversely, among the 4,781 candidate mouse housekeeping genes from our study having a homology to human genes, 2,608 (54.5%) were also found in the list of human housekeeping genes (Fig. 3b). The discrepancy may be due to the true differences between species or the types of tissues included in the respective studies.

Figure 3 4,781 genes were identified as mouse candidate housekeeping genes. (a) Chromosome distribution of the 4,781 housekeeping genes. The exact number for each chromosome was highlighted in blue figure. Upper panel: the absolute number of housekeeping genes identified on each chromosome; lower panel: percentage of housekeeping genes on each chromosome (divided by the total number of genes on each chromosome). (b) 2,608 genes were both identified as housekeeping genes in mouse (our study, red circle) and human (Eisenberg’s study, blue circle). (c) Expression profiles of the 4,781 genes by groups. Red: genes also identified as housekeeping genes in human; Green: genes not identified in human. (d) Expression of four well-known housekeeping genes (Gapdh, Actb, Hprt, and B2m). Only Hprt was found in our list of housekeeping genes. Full size image

According to the comparison results, we divided the candidate mouse housekeeping genes into two groups: genes with high confidence (the 2,608 genes that are in common with the human housekeeping genes) and relatively low confidence (the remaining 2,173 genes). Interestingly, the expression level of the former group was always higher than the latter one within the corresponding tissue (Fig. 3c). The higher measurement variability for the relatively lower expression genes might account for the differences between human and mouse.

As expected, the housekeeping genes are enriched in biological processes or pathways related to basic cellular activities, such as RNA processing, translation, protein localization, protein transport, and so on. Separate analyses of the two groups of housekeeping genes demonstrated that they shared 186 and 12 significantly enriched (p-value < 0.05) GO terms and KEGG pathways (Supplementary Fig. S8) mostly associated with fundamental activities of cells, respectively.

Furthermore, we particularly focused on the expression profiles of four well-known housekeeping genes, Gapdh, Actb, Hprt, and B2m, which were widely used as internal references to normalize RNA expression levels (Fig. 3d). However, only Hprt was found in our list of candidate housekeeping genes, because the other three genes were highly variable among tissues and exhibited more than 41, 13, and 77 fold maximal variability for Gapdh, Actb, and B2m, respectively. Our observations are consistent with results from previous studies32, 34, indicating that Gapdh, Actb, and B2m are probably not suitable for the reference purposes.

Consequently, we proposed a list of 27 genes that are consistently and highly expressed across all tissues included in this study (Table 2). The expression profiles of these 27 genes (Supplementary Table S5) are much more consistent than currently commonly used housekeeping genes such as Gapdh, Actb, and B2m. Thus, we recommend using the 27 candidate genes as the reference, e.g., for the normalization of the gene-expression studies.