Significant 3’UTR shortening in spermatogenesis

To examine how APA is regulated during spermatogenesis, we extracted RNA from mouse testes at different time points after birth, namely 1 week (w), 2w, 3w, 4w, 5w and 6w. These time points correspond to key phases in the first wave of spermatogenesis (indicated in Fig. 1a), during which cells are largely synchronized [49, 50]. RNA was subjected to 3’ region extraction and deep sequencing (3’READS), a deep sequencing method we recently developed to examine the expression of poly(A) + RNAs and identify their pAs [12].

We first compared the relative expression of the top two most abundant 3’UTR-APA isoforms of each gene at different time points. Based on the location of their pAs relative to the coding region, they were named proximal pA (Prx-pA) isoform and distal pA (Dis-pA) isoform, respectively (illustrated in Fig. 1b). Using relative expression (RE) values between Prx-pA and Dis-pA isoforms (Fig. 1b), we found that the relative expression of Dis-pA isoforms to Prx-pA isoforms were significantly lower in 4–6 week samples than in 1–3 week samples (Fig. 1c). Consistently, 1–3 week samples could be separated from 4–6 week samples using cluster analysis based on RE values (Fig. 1c).

We next applied significant analysis of alternative polyadenylation (SAAP), our recently developed method to statistically identify regulated APA events using a bootstrapping approach [26], and global analysis of alternative polyadenylation (GAAP), a method for comparing the extent of APA regulation between samples using a read sampling method [26]. With sequencing depth controlled across all samples (1.5 M) and false discovery rate (FDR) set at 5 %, we found that 3’UTRs began to shorten between 3w and 2w, and significant shortening took place between 4w and 3w as well as between 5w and 4w, as indicated by the ratio of number of genes with shortened 3’UTRs (Sh) to that with lengthened 3’UTRs (Le) (Fig. 1d, right). While 4w vs. 3w and 5w vs. 4w were similar in the extent of regulation (log2(Sh#/Le#) = 4.4 and 4.3, respectively), the former involved much more APA events than the latter (by ~4-fold, Fig. 1d, left). To examine 3’UTR length changes more directly, we calculated the 3’UTR size for each gene in each sample using weighted mean of all 3’UTR-APA isoforms, with weight being the read number in the sample. In line with other analysis results, the median 3’UTR size decreased progressively from 949 nt at 1w to 431 nt at 6w (Fig. 1e). An example gene Eif4h is shown in Fig. 1f, which had two detected pA isoforms and displayed substantial 3’UTR shortening in spermatogenesis (weighted mean of 3’UTR size = 1,470 nt and 239 nt at 1w and 6w, respectively). Consistent with the global trend, the period between 3w and 4w involved a switch-like change of isoform expression (Fig. 1f). This is also supported by the relative expression difference (RED) value, which is the difference between RE values from two adjacent time points (Fig. 1f).

There are several cell types in the testis, including spermatogonia, spermatocytes, spermatids, mature sperms and Sertoli cells. Since 2w testes are highly enriched with spermatocytes and 4w testes with spermatids, our result suggests significant 3’UTR shortening during maturation of spermatocytes into spermatids. To confirm cell specificity of APA, we analyzed a strand-specific RNA-seq data set with different cell types purified from the testis [30]. Because RNA-seq data do not directly reveal APA isoforms, we used our 3’READS data to divide 3’UTRs into two portions (Fig. 2a): the region before the first pA, named common UTR (cUTR), and the rest of 3’UTR, named alternative UTR (aUTR). We then calculated the ratio of RNA-seq read density in aUTR to that in cUTR, shown as log2(aUTR/cUTR), to reflect the expression level of long 3’UTR isoforms relative to the short 3’UTR isoforms (Fig. 2a). Consistent with the 3’READS data, the median log2(aUTR/cUTR) ratio progressively decreased from spermatogonia to spermatocytes to spermatids (Fig. 2b). The value for Sertoli cells was the highest, indicating much longer 3’UTRs in these cells. An example gene Cep57 is shown in Fig. 2c, which displayed fewer RNA-seq reads mapped to the downstream region of the first 3’UTR pA in spermatids compared to other cell types (Fig. 2c, bottom). Note that the 3’READS data provided a much more precise picture of pAs and their regulation (Fig. 2c, top), attesting to the superiority of the method over a regular RNA-seq method for APA analysis. Taken together with the 3’READS result, these data indicate that 3’UTRs progressively shorten during spermatogenesis, with the shortest 3’UTRs being expressed in spermatids.

Fig. 2 3’UTRs are the shortest in spermatids. a Schematic showing analysis of 3’UTR-APA using RNA-seq reads. Three 3’UTR-APA isoforms are shown. The common 3’UTR portion is called cUTR, and the alternative portion aUTR. RNA-seq reads mapped to aUTR was normalized to those mapped to cUTR to infer the relative expression of long vs. short 3’UTR isoforms. b Cumulative distribution function (CDF) curves of log2Ratio for RNA-seq reads mapped to aUTRs vs. those to cUTRs in different purified cells from testis. Median values are shown on the right. The data set used [NCBI GEO:GSE43717] was based on a strand-specific RNA-seq method. c An example gene Cep57. The 3’READS data indicating 3’UTR-pA sites are shown on the top and RNA-seq data are shown at the bottom. Weighted mean of 3’UTR size based on the 3’READS data is shown for each time point Full size image

3’UTR shortening is coupled with upregulation of gene expression that is important for sperm maturation

We next asked whether 3’UTR shortening in spermatogenesis is related to gene expression changes. To this end, we first summed up all 3’UTR-APA isoforms to represent gene expression and examined changes between adjacent time points. Compared to genes without 3’UTR changes, genes with shortened 3’UTRs were significantly upregulated in spermatogenesis (blue vs. gray lines in Fig. 3a). This difference could be observed between any adjacent time points after 2 weeks, with 3w vs. 2w and 4w vs. 3w being the most significant ones (P <2 × 10−16, Kolmogorov–Smirnov, or K–S, test). To corroborate this result, we analyzed a RNA-seq data set, for which RNAs from testes were sequenced by a strand-specific method [44]. We used only reads mapped to CDS to eliminate the possible influence of 3’UTR change on gene expression calculation (Fig. 3b). Consistent with the 3’READS result, genes with shortened 3’UTRs were significantly upregulated from 15 dpp to 40 dpp, as compared to genes without 3’UTR regulation or with lengthened 3’UTRs (Fig. 3b).

Fig. 3 Genes with shortened 3’UTRs are more likely to have upregulated expression. a Gene expression changes at different stages of spermatogenesis. Genes were divided into three groups based on 3’UTR regulation between comparing samples (FDR = 5 %, SAAP analysis), namely shortened, unchanged and lengthened (shown in blue, gray and red, respectively). 3’READS data were used for the analysis, with all APA isoforms of a gene combined to represent the overall expression of the gene. The median value for each group is indicated by a dotted vertical line. P values (K–S test) indicating difference in expression between genes with shortened or lengthened 3’UTRs and genes with 3’UTRs unchanged are shown in each graph (in blue or red, respectively). b Gene expression changes between different stages of spermatogenesis based on RNA-seq reads [NCBI GEO:GSE42004] mapped to coding sequences (CDS). Because only CDS reads were used, gene expression analysis was not affected by 3’UTR changes. As in (a), genes were divided into three groups based on APA regulation using 3’READS data with the closest time points. Dpp, day post partum. c Gene expression levels (log2(RPKM)) for genes with shortened, unchanged or lengthened 3’UTRs at different stages of spermatogenesis. The plot is based on the RNA-seq data used for (b). 3’UTR regulation is based on 4w vs. 2w comparison. d Testis-specific genes tend to have greater 3’UTR shortening than ubiquitously expressed genes. The number of genes for each group is indicated. The P value (K–S test) indicates difference in relative expression difference (RED, 4w vs. 2w, see Fig. 1f for definition) between two groups Full size image

Interestingly, we found that the absolute expression level, as indicated by CDS read density (log2(RPKM)), of the genes with 3’UTR shortening was significantly higher than that of genes without 3’UTR changes, particularly at later stages of spermatogenesis (P = 0 for 28 dpp and 40 dpp samples, K–S test, Fig. 3c), suggesting that genes with shortened 3’UTRs are important for functions in spermatogenesis. Consistently, the top gene ontology (GO) terms associated with genes with 3’UTR shortening were highly relevant to sperm maturation, such as “protein ubiquitination”, “calcium ion import”, “centrosome”, “ciliary part”, etc. (Table 1). Notably, the protein ubiquitination pathway, which has been implicated in playing key roles in spermiogenesis, such as nucleosome removal and morphogenesis of the sperm [51], was also found to be highly significant by the ingenuity pathway analysis (Additional file 1: Table S1). In further support of the functional relevance of APA to sperm development, we found that testis-specific genes tend to have significantly greater 3’UTR shortening (4w vs. 2w) than genes ubiquitously expressed across different tissues (P = 4.1 × 10−13, K–S test, Fig. 3d). Taken together, these data indicate that 3’UTR shortening coordinates with upregulation of genes that play important roles in sperm maturation.

Table 1 Gene ontology terms enriched for genes with significant 3’UTR shortening in spermatogenesis Full size table

3’UTR shortening correlates with high RNA polymerase II signals and open chromatin state

We next asked whether the regulated 3’UTR-APA sites had certain sequence and genomic features different from other sites. Interestingly, we noticed that genes with shortened 3’UTRs tended to have a significantly shorter cUTR and a longer aUTR than those with lengthened or unchanged 3’UTRs (P <1 × 10−15, K–S test, Fig. 4a). In addition, by dividing genes into five groups based on aUTR size (Fig. 4b), we found that the longer the aUTR the greater the extent of 3’UTR shortening, as indicated by the mean RED values of all genes. This trend was particularly noticeable for the 4w vs. 3w comparison (Fig. 4b), with P = 1.3 × 10−41 (Wilcoxon rank-sum test) for the difference between gene bins 1 (with the shortest aUTRs) and 5 (with the longest aUTRs). Since the aUTR size is relevant to competition between two 3’UTR pAs for usage [26], this result suggests that the 3’ end processing activity is generally regulated in spermatogenesis. Previous studies showed that the 3’UTR length inversely correlates with expression of proliferation genes [21] and C/P-related factors [20]. However, no obvious global change of expression was observed with either group in spermatogenesis (Additional file 1: Figure S1), suggesting that trans factor expression as a whole may not be the reason for APA regulation.

Fig. 4 Shortening of 3’UTR by APA correlates with high transcriptional activity and open chromatin. a cUTR and aUTR sizes of genes with different 3’UTR-APA regulations. Left, boxplot showing cUTR size of genes with shortened (Sh), lengthened (Le), or unchanged (Uc) 3’UTRs. Right, same as the plot on the left except that aUTR size is plotted. 3’UTR regulation by APA was based on the SAAP analysis (4w vs. 2w, FDR = 5 %). P values comparing different gene groups were based on the K–S test. b Genes with longer aUTRs tend to have greater 3’UTR shortening in spermatogenesis. Genes are divided into five bins based on aUTR size, as indicated. The mean RED values between adjacent time points of all genes are plotted. P values (Wilcoxon rank-sum test) comparing gene bins 1 and 5 are shown. c Log2Ratio of RNAPII ChIP-seq levels between spermatids and pachynema (pachytene stage spermatocytes) for three groups of genes, namely 3’UTR shortened, unchanged and lengthened, around the TSS (+/− 1 kb, left), in gene body (middle) and around the last pA (+/− 1 kb, right). 3’UTR regulation was based on comparison of 4w and 2w samples. P values (K–S test) comparing genes with 3’UTR shortened (blue) or lengthened (red) with 3’UTR unchanged are indicated. d Log2Ratio of H3K4me3 ChIP-seq levels between spermatids and spermatocytes for three groups of genes, namely 3’UTR shortened, unchanged and lengthened, around the TSS (+/− 1 kb, left), in gene body (middle) and around the last pA (+/− 1 kb, right). 3’UTR regulation was based on comparison of 4w and 2w samples. P values (K–S test) comparing genes with 3’UTR shortened (blue) or lengthened (red) with 3’UTR unchanged are indicated Full size image

We next examined cis elements near the regulated pAs. Because proximal and distal pAs are surrounded by distinct cis elements [9, 52], we analyzed proximal and distal pAs separately (Additional file 1: Figure S2A). As such, upregulated proximal pAs of genes with shortened 3’UTR were compared with proximal pAs of other genes, and so were downregulated distal pAs. Overall, very few 4-mers were found to be significantly enriched in the −40 to +100 nt region around the regulated pAs (Additional file 1: Figure S2B). The top ones were UUGU and UUUU in the −40 to −1 nt region and the +1 to +100 nt region of distal pAs, respectively (Additional file 1: Figure S2B). More significant 4-mers were found in the −100 to −41 nt regions of proximal pAs, such as CGAC, AAGA and CACC, and of distal pAs, such as UAUU, UUUU and AUUU (Additional file 1: Figure S2B). However, their significance of enrichment is substantially lower than what we previously observed with APA regulation by C/P factors [26], and they appear to be related to mRNA stability regulation (see below). Thus, while it remains possible that some cis elements near the pA may regulate pA usage through binding to certain RBPs, this result does not support a global, cis element-based APA mechanism in spermatogenesis.

Prompted by the correlation between upregulation of gene expression and 3’UTR shortening (Fig. 3), we next asked whether APA changes in spermatogenesis were related to transcriptional regulation, as we previously observed across different tissue types and cell conditions [53]. To this end, we analyzed a chromatin immunoprecipitation and deep sequencing (ChIP-seq) data set for RNA polymerase II (RNAPII) [54], which included data for spermatids and pachytene stage spermatocytes (Additional file 1: Figure S3A). We found that genes with shortened 3’UTRs tended to have significantly higher RNAPII signals around the transcription start site (TSS), in the gene body and around the last pA, as compared to genes without 3’UTR-APA changes (P = 2.6 × 10−9, 2.0 × 10−12 and 5.7 × 10−5, respectively, K–S test, Fig. 4c). This result indicates that 3’UTR shortening is coupled with transcriptional upregulation in spermatogenesis.

We next asked whether chromatin structure, which is substantially remodeled in spermatogenesis and has been implicated in regulation of RNAPII activities [30], was related to 3’UTR-APA changes. To this end, we analyzed a ChIP-seq data set for H3K4 tri-methylation (H3K4me3) levels in spermatids and spermatocytes [32]. As expected, H3K4me3 levels were high around the TSS and downstream of the last pA (Additional file 1: Figure S3B). Importantly, the H3K4me3 level was significantly higher in spermatids vs. spermatocytes for genes with significant 3’UTR shortening than other genes (Fig. 4d) around the TSS, in the gene body and around the last pA (P = 7.3 × 10−12, 5.8 × 10−63 and 4.7 × 10−8, respectively, K–S test), indicating that APA regulation correlates with the H3K4me3 level. Since high H3K4me3 levels represent open chromatin state, this result suggests that chromatin structure change may lead to more efficient 3’ end processing, resulting in preferential usage of proximal pAs in 3’UTRs.

3’UTR shortening eliminates destabilizing elements that are highly potent in spermatogenesis

Because 3’UTR is important for mRNA metabolism, we next wanted to examine how 3’UTR shortening in spermatogenesis impacts cis elements in 3’UTRs. To this end, we first examined cis elements in shortened and lengthened 3’UTRs. To simplify analysis, we focused on two most abundant 3’UTR isoforms per gene and examined cis elements in cUTRs and aUTRs. As such, for each gene, the short isoform contained cUTR and the long isoform contained both cUTR and aUTR. Using APA events regulated between 4w and 2w samples (Fig. 5a) and 4-mers as indicators of cis elements, we found significant enrichments of U-rich, AU-rich and UG-rich elements in the aUTRs of shortened 3’UTRs, such as UUUU, AUUUU, UUUA, UUUG and UUGU (Fig. 5b). By contrast, the significance of 4-mers enriched in cUTRs of shortened 3’UTRs was rather modest, with the top 4-mers being CCAC, AAGA, CUCG, GCCG and GGAC (Fig. 5b). Interestingly, the cUTRs and aUTRs of lengthened 3’UTRs were also enriched with different 4-mers, including UG-rich and U-rich elements for cUTRs, such as UUGU, UGUU, UUUG, GUUU and UUUU; and G-rich elements, such as GCGG, CGGG and CAGG, for aUTRs (Fig. 5c). These results indicate that cis elements are highly biased in different portions of regulated 3’UTRs, suggesting potential impacts of 3’UTR cis elements on the APA profile.

Fig. 5 3’UTR cis elements contribute to mRNA abundance changes and APA profiles in spermatogenesis. a Scatterplot showing genes with lengthened, shortened and unchanged 3’UTRs between 4w and 2w samples. Significant 3’UTR-APA events were based on the SAAP analysis (FDR = 5 %) using the top two most abundant 3’UTR-pA isoforms of each gene. b Significant 4-mers enriched in cUTRs and aUTRs of genes with shortened 3’UTRs. Values are − log 10 (P), where P was based on the Fisher’s exact test examining the enrichment of 4-mers in cUTR (left) or aUTR (right) regions of genes with shortened 3’UTRs (4w vs. 2w). c As in (b) except that genes with lengthened 3’UTRs were analyzed. d Top, schematic showing cis element analysis using genes with only one pA, i.e., having a single 3’UTR (sUTR). Middle, upregulated and downregulated sUTR genes in the 4w vs. 2w comparison, corresponding to >1.4 fold change in RPM value, were selected for 3’UTR analysis. Bottom, top enriched 4-mers for sUTRs of downregulated genes (left) or upregulated genes (right). As in (b), values are − log10(P), where P was based on the Fisher’s exact test examining 4-mer enrichment. e Comparison of 4-mer enrichment in cUTR as calculated in (b) and (c) with that in sUTR as calculated in (d). X-axis values are − log 10 (P)*S, where P is the 4-mer enrichment P value for upregulated vs. downregulated genes, and S is 1 if a 4-mer is more enriched for sUTRs of upregulated genes, or −1 otherwise. Y-axis values are − log 10 (P)*S, where P is for enrichment of 4-mer in cUTRs of shortened or lengthened 3’UTRs, whichever is greater, and S is 1 if a 4-mer is more enriched for cUTRs of shortened 3’UTRs or −1 otherwise. f As in (e) except that y-axis is for enrichment of 4-mers in aUTRs of shortened or lengthened 3’UTRs Full size image

To address if 3’UTR cis elements can regulate mRNA abundance in spermatogenesis, we selected genes with a single 3’UTR, or sUTR. As such, their expression could not be affected by APA. Interestingly, U-rich and UG-rich elements, such as UUUU, GUUU and UUGU, were significantly enriched in the sUTRs of downregulated genes (4w vs. 2w, Fig. 5d), indicating that these elements correlate with transcript abundance changes. Since these elements have previously been shown to play roles in mRNA stability [55, 56], it is likely that there exist potent mechanisms during spermatogenesis that degrade mRNAs containing these elements.

We next used the significance score (SS, derived from P value from the Fisher’s exact test, see Methods for detail) for each 4-mer to indicate its significance of enrichment in one set of sequences vs. another. We found that the SS values of 4-mers for sUTR regulation were positively correlated with those derived from cUTR analysis (r = 0.65, Pearson correlation, Fig. 5e), i.e., those associated with downregulated sUTR genes were also enriched in cUTRs of lengthened 3’UTRs, and those with upregulated sUTR genes were also enriched in cUTRs of shortened 3’UTRs (top 4-mers were highlighted in red in Fig. 5e). By contrast, a negative correlation was observed between the SS values of 4-mers associated with sUTR gene expression and those derived from aUTR analysis (r = −0.49, Pearson correlation, Fig. 5f). Overall, U-rich UG-rich and UA-rich 4-mers were highly enriched for sUTRs of downregulated genes, cUTRs of lengthened genes and aUTRs of shortened genes (Fig. 5e and f). It is also noteworthy that analyses using 6-mers gave very similar results (Additional file 1: Figure S4). Taken together, these results indicate that cis elements in 3’UTRs play a significant role in regulating mRNA abundance in spermatogenesis, likely through control of mRNA stability, and hence contribute to the APA profiles. Conversely, our data also suggest that shortening of 3’UTR can significantly remove destabilizing elements, impacting gene expression.

3’UTR shortening substantially removes transposable elements in the transcriptome

Recent studies have indicated that pachytene piRNAs can degrade target mRNAs through imperfect base-pairing [45]. This mechanism has been implicated in degradation of transcripts containing transposable elements (TEs). We found that TE sequences detected by the RepBase database [57, 58] accounted for a sizable fraction of the 3’UTR repertoire in the mouse genome: for genes with a single pA, TEs accounted for 12 % of their 3’UTRs; for genes with APA, TEs accounted for 6 % and 16 % of their cUTR and aUTR sequences, respectively (Fig. 6a). Consistently, 3’UTR shortening in spermatogenesis led to ~5-fold decrease of the total number of transcripts with TEs (Fig. 6b). Using genes with a single pA, we found TEs in 3’UTRs could lead to significant downregulation of mRNA transcripts between 3w and 2w (P = 6 × 10−7, K–S test) and between 4w and 3w (P = 1 × 10−11, K–S test) (Fig. 6c), confirming the negative impact of 3’UTR TEs on mRNA expression.

Fig. 6 3’UTR shortening eliminates TEs. a TE contents in 5’UTR, CDS and 3’UTR of all genes. aUTR, alternative 3’UTR; cUTR, common 3’UTR; sUTR, single 3’UTR (gene without APA). The fraction of mRNA sequence related to TEs was based on the number of nucleotides of the TEs annotated by the RepeatMasker track of UCSC Genome Browser (mm9). b Change of TE-containing mRNAs in spermatogenesis. The fraction of the transcripts containing 3’UTR TEs at each time point is plotted. c Gene expression changes for genes with or without 3’UTR TEs. Only genes with a single 3’UTR (sUTR) were used for this analysis. Expression was based on the 3’READS data. P values (K–S test) indicating difference between the two gene groups are shown. d Expression changes of four types of genes. A type 1 gene has 3’UTR shortened and contains TEs only in aUTR; a type 2 gene also has 3’UTR shortened but contains TEs in cUTR only; a type 3 gene has 3’UTR unchanged and contains TEs in aUTR; a type 4 gene has a single 3’UTR (no APA) and there are TEs in the 3’UTR. All APA regulation was based on the 4w vs. 2w comparison. Bottom, gene expression was analyzed using 3’READS data (4w vs. 2w, left), or CDS reads of RNA-seq data (28 dpp vs. 15 dpp, right). P values (K–S test) indicating difference between type 1 genes and others are indicated on the top. e Gene expression difference in Miwi−/− vs. Miwi+/− at the early round spermatid stage for the four gene types described in (d) except that the APA analysis was based on 3w vs. 1w comparison. Gene expression was based on RNA-seq reads mapped to CDS. The type 1 gene set was compared with other types using the K–S test. P values are all significant (<1 × 10−3) Full size image

To examine how 3’UTR shortening impacts TE-mediated mRNA degradation, we divided genes into four groups based on both 3’UTR regulation between 4w and 2w and TE location in the 3’UTR (Fig. 6d, top): 1) genes with 3’UTRs shortened and with TEs in aUTR (TEs can be eliminated by 3’UTR shortening); 2) genes with 3’UTRs shortened and with TEs in cUTR (TEs cannot be eliminated despite 3’UTR shortening); 3) genes with unchanged 3’UTR and with TEs in aUTR (no APA regulation and TEs cannot be eliminated); 4) genes with a single 3’UTR which contains TEs (no APA and TEs cannot be eliminated). Both 3’READS data and RNA-seq data showed that genes with shortened 3’UTRs and with TEs in aUTR (group 1) tended to have significantly higher expression levels than genes in other groups between 4w and 2w (P <1 × 10−3, K–S test, Fig. 6d, bottom). This result suggests that elimination of TEs in 3’UTRs by 3’UTR shortening leads to higher transcript abundance.

To further address whether the TE-mediated gene regulation is due to mRNA degradation by the piRNA/Miwi pathway, we analyzed RNA-seq data from Miwi−/− and Miwi−/+ mice [44]. Focusing on the four groups of genes described above, we found that group 1 genes were significantly less activated in Miwi−/− vs. Miwi−/+ than the other three groups of genes (P <0.05, K–S test, Fig. 6e), indicating that downregulation of gene groups 2–4 in wild type mice were mediated by Miwi. This result further supports the notion that 3’UTR shortening helps avoid TE/piRNA/Miwi-mediated mRNA degradation in spermatogenesis. It is also notable that TE regulation was independent of the U-rich elements described above, because U-rich elements could still be detected after excluding TE sequences in analysis (Additional file 1: Figure S5A) and TE-mediated regulation identified by the Miwi−/− vs. Miwi−/+ comparison did not involve U-rich elements (Additional file 1: Figure S5B).

Widespread regulation of C/P events in introns

Over ~40 % of mouse genes express mRNA isoforms using pAs in introns or exons upstream of the 3’-most exon [12] (illustrated in Fig. 7a). These APA events are commonly called CDS-APA because they lead to APA isoforms with different CDS. Regulation of CDS-APA can be controlled by both splicing and C/P activities [26]. Comparing CDS-pA isoform expression with that of 3’UTR-APA isoforms, we found samples could be clustered into two groups (Fig. 7b), i.e., 1–3 weeks and 4–6 weeks, similar to using the 3’UTR-APA data (Fig. 1c). Using SAAP and GAAP analyses, we found that both 3w vs. 2w and 4w vs. 3w comparisons involved a large number of regulated CDS-APA events. Interestingly, while similar numbers of genes displayed upregulated and downregulated isoforms in 3w vs. 2w, a much greater number of genes showed upregulation of CDS-pA isoforms in 4w vs. 3w, indicating distinct mechanisms involved in CDS-APA in these two phases. Consistently, we found that intronic pAs close to the 5’ end of a gene were more likely to be activated in 4w vs. 3w (Fig. 7d), a trend not observed with 3w vs. 2w (Fig. 7d). In addition, gene groups regulated by CDS-APA in 4w vs. 3w were associated with distinct GO terms than those with CDS-APA regulated in 3w vs. 2w (Table 2). By contrast, other features, including intron size, 5’ splice site strength, or 3’ splice site strength, did not appear to be significantly associated with CDS-APA regulation at either stage (Additional file 1: Figure S6).

Fig. 7 Regulation of CDS-APA in spermatogenesis. a Schematic of CDS-APA. CDS-APA isoforms are those using pAs in introns or non-3’-most exons. b Heatmap showing relative expression (RE) values of CDS-APA isoforms vs. 3’-most exon isoforms. RE values are mean-centered and clustered using hierarchical clustering with Pearson correlation coefficient as metric. c Normalized number of genes with significant regulation of CDS-APA as identified by GAAP and SAAP analyses (FDR = 5 %) (see Methods for details). The ratio of number of genes with upregulated (UP) CDS-APA isoforms to that with downregulated (DN) isoforms is indicated. d Regulation of isoforms using pAs in different introns. Introns were divided into five groups, i.e., the first and second introns (+1 and +2, respectively), the last and second to last introns (−1 and −2, respectively) and middle introns (M). The expression change of isoform is based on RPM values. e Gene expression changes vs. CDS-pA regulation. Genes were divided into three indicated groups based on CDS-APA regulation between comparing samples (SAAP analysis, FDR = 5 %). The median value for each group is indicated by a dotted vertical line. The P value (K–S test) for difference between genes with upregulated or downregulated CDS-APA and those with unchanged CDS-APA is shown in each graph (in red or blue, respectively). RNA-seq data with CDS reads were used for gene expression analysis. f ChIP-seq analysis of H3K4me3 levels on genes with CDS-APA regulation. Log2Ratio of ChIP-seq levels between spermatids and spermatocytes for the three gene groups is shown. CDS-APA regulation was based on comparison of 4w and 2w samples, corresponding to spermatid and spermatocyte stages, respectively. P value (K–S test) comparing genes having downregulated (blue) or upregulated (red) CDS-pA isoforms with genes having CDS-APA unchanged is indicated Full size image

Table 2 Gene ontology terms associated with genes with regulated CDS-APA in spermatogenesis Full size table

Compared to genes with unchanged CDS-APA, genes with downregulated CDS-APA tended to be significantly upregulated (P = 2 × 10−155 and 2 × 10−21 for 3w vs. 2w and 4w vs. 3w, respectively, K–S test, Fig. 7e), whereas genes with upregulated CDS-APA tended to be either mildly downregulated in 3w vs. 2w, or unchanged in 4w vs. 3w (Fig. 7e). In addition, genes with downregulated CDS-APA tended to have increased H3K4me3 signals around the TSS and in the gene body (Fig. 7f), whereas those with upregulated CDS-APA tended to have mildly decreased levels, as compared to genes with unchanged CDS-APA (Fig. 7f). Thus, the usage of upstream pAs appears to be inhibited when gene expression is upregulated and chromatin becomes more open, a trend that is opposite to the usage of proximal pA in 3’UTRs (see above). Whether this difference is due to the involvement of splicing activity, which plays a major role in CDS-APA [26], remains to be examined in the future.

Significant activation of bidirectional transcription in spermatogenesis

A large fraction of RNAPII promoters in mammalian cells are bidirectional, leading to both sense and antisense transcripts (reviewed in [59, 60] and illustrated in Fig. 8a). The antisense transcripts upstream of the TSS are typically called PROMPTs or uaRNAs. Recent studies have indicated that C/P is involved in termination of uaRNAs [28, 29], and PABPN1 and the exosome complex regulate their abundance [26]. Consistent with our previous data using mouse C2C12 cells [26], we found uaRNA pAs in testis samples were widely distributed within 2 kb from the TSS, peaking around −700 nt (Fig. 8b). Importantly, their expression was significantly upregulated during spermatogenesis, with the most significant phase for upregulation being 3w vs. 2w (Fig. 8b). We found that there was a general correlation between uaRNA expression changes and regulation of their sense strand transcripts (Fig. 8c), with r = 0.30 and r = 0.36 (Pearson correlation) for 3w vs. 2w and 4w vs. 3w, respectively. Further analysis of the H3K4me3 ChIP-seq data showed that the TSS regions of genes with upregulated uaRNAs had much higher H3K4me3 signals in both spermatocytes and spermatids (P = 5 × 10−206 and 3 × 10−153, respectively, Wilcoxon rank-sum test). However no difference could be discerned between spermatocytes and spermatids (data not shown). This result indicates that open chromatin may be a prerequisite for uaRNA regulation, but the extent of its regulation is governed by other factors, possibly the activity of the corresponding bidirectional promoter.