cis-Encoded antisense RNAs (asRNAs) are widespread along bacterial transcriptomes. However, the role of most of these RNAs remains unknown, and there is an ongoing discussion as to what extent these transcripts are the result of transcriptional noise. We show, by comparative transcriptomics of 20 bacterial species and one chloroplast, that the number of asRNAs is exponentially dependent on the genomic AT content and that expression of asRNA at low levels exerts little impact in terms of energy consumption. A transcription model simulating mRNA and asRNA production indicates that the asRNA regulatory effect is only observed above certain expression thresholds, substantially higher than physiological transcript levels. These predictions were verified experimentally by overexpressing nine different asRNAs in Mycoplasma pneumoniae. Our results suggest that most of the antisense transcripts found in bacteria are the consequence of transcriptional noise, arising at spurious promoters throughout the genome.

The catalog of bacteria-encoded RNAs has recently undergone a vast expansion. The canonical mRNAs and known noncoding RNAs [ribosomal RNAs (rRNAs), transfer RNAs (tRNAs), transfer-messenger RNA (tmRNA), and others] are now accompanied by a handful of new transcript categories. Small, non–protein-coding RNAs or sRNAs are one of these new categories. The numbers of initially reported sRNAs ranged from dozens to hundreds in different species (1, 2). These include cis-encoded sRNAs, which overlap functionally defined genes, either in sense or antisense (thus named asRNAs), and trans-encoded sRNAs, which are separated from their target genes. These sRNAs span a wide range of lengths: from dozens to a few thousand base pairs (2). However, recent improvements in techniques for analysis of transcription have revealed that noncoding transcription in prokaryotes is pervasive throughout the genome (3–5). Still, only a few sRNAs have been functionally characterized (6–8), most of which correspond to the category of trans-encoded sRNAs. Examples of these are the ones associated with bacterial virulence (9–11). The most common mechanism of action of sRNAs is via complementary base pairing with coding sequences (fig. S1A). RNA duplex formation between sRNA and mRNA can change mRNA stability, inducing degradation or stabilization of the duplex. This duplex may also induce or repress mRNA translation by affecting the ribosome binding site (2, 12). Another asRNA regulatory mechanism is transcriptional interference, occurring if two RNA polymerases transcribing in convergent directions collide (13). Other types of RNA having a regulatory role by “nonstandard” mechanisms should not be disregarded. For instance, if there were a Dicer-like mechanism in bacteria, as occurs in eukaryotes (14), low-abundance RNAs could exert a strong influence on complementary, more abundant mRNAs. In this respect, bacteria have the CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system, in which crRNAs (CRISPR RNAs), even if not abundant, target the enzyme against foreign DNA (15) and/or RNA sequences (16).

There is an ongoing discussion in both eukaryotes and prokaryotes as to what extent this plethora of sRNAs provides a crucial layer of transcriptional and translational regulation, or whether a large part of them are the result of transcriptional noise, arising from spurious promoters (17, 18). Bacterial promoters are characterized by low information content, and their major landmark is the Pribnow motif, which has the consensus sequence “5′-TANAAT-3′” (19). Other features include (i) the −35 box, although this has been shown not to be essential (especially in Firmicutes) and can be replaced by other elements (20), and (ii) low melting energies, which ultimately depend on the AT composition of the promoter region. Such low information content implies that promoters could easily arise by random mutations in bacterial genomes, especially given the presumptive bias toward G/C nucleotides mutating to A/T (21). If sRNAs are the product of transcriptional noise due to spurious 5′-TANAAT-3′ boxes, we predict that the number of sRNAs in bacteria will correlate strongly with the AT content of their genomes in an exponential manner (fig. S2A). Because of the stochastic nature of transcription and the short half-life of RNAs in bacteria, low levels of random production of asRNA from these spurious Pribnow boxes would not affect the levels of the sense mRNA (fig. S1B).

RESULTS

To investigate these hypotheses, we annotated sRNAs de novo in the genomes of Buchnera aphidicola, Mycoplasma hyopneumoniae, and Mycoplasma mycoides subspecies capri (tables S1 to S3 and fig. S3), following the approach we used for Mycoplasma pneumoniae (22). We also considered the sRNAs annotated using deep sequencing data in 17 other bacterial genomes and a chloroplast genome (table S4). These 21 genomes span an AT content ranging from 28 to 80%, and their genome sizes range from 416 kb (B. aphidicola Cc) to 9.02 Mb (Streptomyces avermitilis). Investigating the number of canonical Pribnow boxes in these genomes, we found an exponential dependency of the number of boxes on the AT content, qualitatively similar to our theoretical expectations (fig. S2A). Moreover, comparison of the number of these boxes upstream of open reading frames (ORFs) and sRNAs showed that the proportion of sRNAs with Pribnow boxes is similar to or higher than the proportion of ORFs having them (fig. S2B). This supports the hypothesis that an increase in AT content also results in an increase in spurious Pribnow boxes.
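The expectation for randomly occurring Pribnow boxes can be illustrated with a simple null model. The sketch below is not the analysis used in this study. It assumes independent bases with P(A) = P(T) = AT/2, so that the probability of a TANAAT match starting at a given position is (AT/2)^5, and it counts matches on both strands of a sequence. The function names `expected_boxes_per_kb` and `count_boxes` are hypothetical.

```python
# Assumed null model: bases are independent, P(A) = P(T) = AT/2.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expected_boxes_per_kb(at_fraction):
    """Expected TANAAT matches per kb, counting both strands."""
    p = at_fraction / 2.0           # probability of A (or of T)
    per_position = p ** 5           # five fixed A/T positions; N is free
    return 2 * 1000 * per_position  # two strands, 1000 start positions


def count_boxes(seq):
    """Count TANAAT (N = any base) on both strands, overlaps allowed."""
    def one_strand(s):
        return sum(
            1
            for i in range(len(s) - 5)
            if s[i:i + 2] == "TA" and s[i + 3:i + 6] == "AAT"
        )

    seq = seq.upper()
    revcomp = seq.translate(COMPLEMENT)[::-1]
    return one_strand(seq) + one_strand(revcomp)


# Expected density at the extremes of the AT range covered by the 21 genomes.
for at in (0.28, 0.50, 0.80):
    print(f"AT = {at:.2f}: {expected_boxes_per_kb(at):.3f} boxes per kb")
```

Under this assumed model the expected density is a power of the AT fraction rather than a strict exponential, but it still rises by roughly two orders of magnitude between 28% and 80% AT.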

We found that the number of sRNAs normalized by genome size depends exponentially on the AT content in the studied bacterial species (Fig. 1A), similar to the number of TANAAT motifs expected at random for a given AT% (fig. S2A). The exponential trend observed for the sRNAs is conserved when the species whose sRNAs were annotated de novo are omitted (R² = 0.814), indicating that it is not an artifact of the method used to identify them (see fig. S3 and Materials and Methods). In contrast to the observed sRNA trend, the number of coding genes normalized by genome size shows no dependency on AT content, and this trend is invariant with respect to genome size (Fig. 1B). We tested whether the AT dependency held true for both asRNAs and trans-encoded sRNAs. asRNAs follow an exponential dependency on the AT content (fig. S4A), whereas trans-encoded sRNAs behave similarly to coding genes and are uncorrelated with the AT content of the intergenic regions (even when considering a minimal size larger than that of an average asRNA; fig. S4B). These results support the transcriptional noise hypothesis: random mutations in coding genes could create spurious antisense 5′-TANAAT-3′ boxes, at a frequency related to the genome AT content, and these boxes could drive the expression of asRNAs.
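One common way to test for an exponential dependency of this kind is to fit a straight line to the logarithm of the normalized counts. The sketch below assumes that approach; the fitting procedure is not specified in this section, so this should not be read as the method behind the reported R² values. The function name `fit_exponential` is hypothetical, and the input values in the usage lines are synthetic, generated from an exact exponential purely to exercise the function. They are not data from this study.

```python
import math


def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on ln(y).

    Returns (a, b, r_squared), with R^2 computed in log space.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    ln_y = [math.log(v) for v in y]  # requires strictly positive y
    n = len(x)
    mean_x = sum(x) / n
    mean_ly = sum(ln_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, ln_y))
    b = sxy / sxx
    ln_a = mean_ly - b * mean_x
    ss_res = sum((li - (ln_a + b * xi)) ** 2 for xi, li in zip(x, ln_y))
    ss_tot = sum((li - mean_ly) ** 2 for li in ln_y)
    return math.exp(ln_a), b, 1.0 - ss_res / ss_tot


# Synthetic check: points drawn from y = 2 * exp(5 * x), not real counts.
xs = [0.30, 0.40, 0.50, 0.60, 0.70]
ys = [2.0 * math.exp(5.0 * x) for x in xs]
a, b, r2 = fit_exponential(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r2:.4f}")
```

With real data, `x` would hold the genomic AT fraction of each species and `y` the number of sRNAs divided by genome size.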

Fig. 1 Different genomic features show distinct dependency on the genomic AT content. The number of features was divided by the genome size for normalization and represented versus the genomic AT content. The following genomes are represented: Atu, Agrobacterium tumefaciens; Bcc, Buchnera aphidicola (str Cc); Bsu, Bacillus subtilis; Cgl, Corynebacterium glutamicum; Chl, chloroplast (Arabidopsis thaliana); Cje, Campylobacter jejuni; Eco, Escherichia coli; Hpy, Helicobacter pylori; Mge, Mycoplasma genitalium; Mhy, Mycoplasma hyopneumoniae; Mmy, Mycoplasma mycoides; Mpn, Mycoplasma pneumoniae; Mtu, Mycobacterium tuberculosis; Pau, Pseudomonas aeruginosa; Sav, Streptomyces avermitilis; Sco, Streptomyces coelicolor; Sme, Sinorhizobium meliloti; Sth, Salmonella typhimurium; Sve, Streptomyces venezuelae; Syn, Synechocystis spp.; Vch, Vibrio cholerae. (A) Number of total sRNAs in different bacteria. Total sRNAs have an exponential dependency on the AT content (R² = 0.88) and do not correlate with genome size. (B) Genome compaction (that is, number of ORFs normalized by genome size) versus AT content. Genome compaction in the different bacterial genomes analyzed shows no dependency on the AT content. Instead, the number of ORFs in bacterial genomes correlates with the genome size (R = 0.99).

Regarding expression levels, it has been shown that essential ORFs show higher mRNA levels, suggesting that elements with essential roles are more highly transcribed (23). Therefore, we compared transcript levels of ORFs and asRNAs in eight of the bacteria in our study. In all cases, average asRNA levels were lower than average mRNA levels (fig. S5A). This suggests that the majority of the asRNAs are nonessential. Indeed, a recent study on the essentiality of the M. pneumoniae genome revealed that only 5% of all sRNAs are essential (23). We also compared the expression of each asRNA to that of its overlapping mRNA. The asRNA-mRNA expression ratios are below 1 in most cases (fig. S5B). For three of the species in our study (M. pneumoniae, M. mycoides, and Bacillus subtilis), we compared asRNA levels at exponential and stationary growth phases (fig. S5C). Most of the asRNAs remain unchanged, which rules out an effect of the growth phase at which the bacteria were analyzed. Additionally, asRNA and trans-encoded sRNA levels were compared in five species (B. aphidicola, Mycoplasma genitalium, M. pneumoniae, M. mycoides, and M. hyopneumoniae), and we found that asRNA expression is significantly lower than trans-encoded sRNA expression in all cases (Welch’s two-sample t test, P < 0.05).

We estimated the energy consumed by the cells in transcribing these asRNAs in M. pneumoniae, considering the number of noncoding RNAs, their length, and their transcription rate, compared to those of mRNAs, tRNAs, and rRNAs (see Materials and Methods). M. pneumoniae spends ~5000 adenosine triphosphate (ATP) units per cell per second in transcribing mRNAs, tRNAs, and rRNAs (24). This amount is proportional to the transcription rate of these molecules, their length, and their copy number in the cell. Taking into account these parameters for sRNAs, we estimate that M. pneumoniae spends 2.94% of the energy of RNA transcription in synthesizing sRNAs, equivalent to ~147 ATP units per cell per second. This number represents 0.24% of the total ATP generated per cell per second (24). Thus, according to our calculations, the energetic impact of spurious transcription is not high even in bacteria with a large number of asRNAs.
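The arithmetic behind these figures can be retraced in a few lines. The sketch below uses only the three numbers quoted in the paragraph above and assumes that the 2.94% is taken relative to the ~5000 ATP per cell per second spent on mRNAs, tRNAs, and rRNAs, which is the reading that reproduces ~147 ATP. The total ATP production is back-calculated from the 0.24% share and is an implied value, not one reported here. All variable names are hypothetical.

```python
# Values quoted in the text (per cell per second).
RNA_ATP_PER_S = 5000.0        # ATP spent transcribing mRNAs, tRNAs, rRNAs
SRNA_FRACTION = 0.0294        # sRNA cost relative to that transcription budget
SRNA_SHARE_OF_TOTAL = 0.0024  # sRNA cost relative to total ATP generated

# ATP spent on sRNA transcription (assumes the 2.94% refers to the 5000 ATP).
srna_atp_per_s = RNA_ATP_PER_S * SRNA_FRACTION

# Total ATP production implied by the 0.24% figure (derived, not reported).
implied_total_atp_per_s = srna_atp_per_s / SRNA_SHARE_OF_TOTAL

print(f"sRNA transcription: ~{srna_atp_per_s:.0f} ATP per cell per second")
print(f"Implied total ATP:  ~{implied_total_atp_per_s:.0f} ATP per cell per second")
```

On these assumptions, the quoted percentages are mutually consistent with a total production on the order of 6 × 10^4 ATP per cell per second.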

asRNAs have been proposed to play a role in transcription regulation complementing the role of transcription factors (25). Should this be the case, we would expect a negative dependency with the number of transcription factors in the different bacteria analyzed here. The number of transcription factors, as reported in the P2TF database (26), shows a linear trend with genome size as previously described (27) (fig. S6A). However, this trend does not exist for asRNAs (fig. S6B). To determine if there is a negative dependency between transcription factors and asRNAs, we considered groups of genomes with approximately similar AT content and different numbers of transcription factors. We found no negative relationship between the number of transcription factors and the number of asRNAs per genome having similar AT content (>60%) (fig. S6C). For bacteria with high AT content, there is a positive correlation, contrary to what we would expect (R = 0.94). This can be explained by the fact that for this group, larger genomes present both more transcription factors and more asRNAs. Indeed, for bacteria with similar AT content, the number of asRNAs correlates with the number of genes, indicative of genome size (fig. S6D).

As we indicated in fig. S1B, an asRNA expressed at low levels would rarely encounter its sense mRNA, given the stochastic nature of transcription. Therefore, no effect on mRNA half-life or translation would be expected. To see if this is the case, we constructed a mathematical model of transcription and translation of a gene in the bacterium M. pneumoniae. We modeled three possible effects of the asRNA: (i) the binding of the asRNA to the mRNA induces degradation of the duplex, (ii) the binding of the asRNA to the mRNA induces degradation of the mRNA, and (iii) the binding of the asRNA to the mRNA is stable but prevents translation (fig. S1A). In all cases, binding of the mRNA to the ribosome prevents degradation of the mRNA. Parameters for this model were determined from experimental data (see Materials and Methods). Other possible effects, such as transcriptional interference, were not considered because the low transcription rates in M. pneumoniae make the collision of transcribing polymerases very unlikely. We scanned the parameter space of the mRNA and the asRNA transcription rates, from typical wild-type levels to ~100-fold overexpression (Fig. 2 and fig. S7). We found that for the three cases modeled, the region with low concentrations of both asRNA and mRNA shows no changes with respect to the control simulations. This can be explained by the fact that in this region, RNA copy numbers are below 1 per cell, and thus the chance that an mRNA and an asRNA occur simultaneously in the same cell is negligible (fig. S1B). Remarkably, most of the RNAs in different bacteria are present at concentrations that yield no asRNA effect (28), although some exceptions have been described, showing that some asRNAs can have a regulatory role (29–31) (Fig. 2A). This mathematical model can be a valuable resource to identify putative functional asRNAs in a given organism according to their expression levels. By determining the concentrations of all asRNAs in M. pneumoniae, we can compile a list of potential functional asRNA candidates. In this bacterium, asRNAs are insufficiently expressed to trigger an effect on their overlapping mRNAs, according to our simulations. It has to be noted, though, that the values of decay rates used in these simulations represent the average values determined for M. pneumoniae. Individual transcripts with decay rates that differ significantly from the average should be analyzed on a case-by-case basis. With adequate parameters, the model could be extended to other bacteria, provided that the mechanism of action of the asRNAs is known beforehand.
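The threshold behavior described above can be illustrated with a much reduced stochastic simulation. The sketch below is not the model used in this study. It covers only case (i), in which pairing removes both the asRNA and the mRNA; it omits translation and ribosome protection; and it uses Gillespie's direct method with made-up rate constants (per minute) chosen only so that the mRNA copy number averages below 1 per cell. The function name `simulate` and all parameter values are hypothetical.

```python
import random


def simulate(k_m, k_a, d_m, d_a, k_bind, t_end, rng):
    """Gillespie simulation; returns the time-averaged mRNA copy number.

    Requires k_m > 0 so that the total reaction rate is never zero.
    """
    t, m, a = 0.0, 0, 0
    m_area = 0.0  # integral of the mRNA copy number over time
    while True:
        rates = (
            k_m,             # mRNA synthesis
            k_a,             # asRNA synthesis
            d_m * m,         # mRNA decay
            d_a * a,         # asRNA decay
            k_bind * m * a,  # pairing followed by loss of the duplex
        )
        total = sum(rates)
        dt = rng.expovariate(total)  # waiting time to the next reaction
        if t + dt >= t_end:
            m_area += m * (t_end - t)
            break
        m_area += m * dt
        t += dt
        r = rng.random() * total  # pick a reaction in proportion to its rate
        if r < rates[0]:
            m += 1
        elif r < rates[0] + rates[1]:
            a += 1
        elif r < rates[0] + rates[1] + rates[2]:
            m -= 1
        elif r < rates[0] + rates[1] + rates[2] + rates[3]:
            a -= 1
        else:
            m -= 1
            a -= 1
    return m_area / t_end


# Hypothetical parameters: mean mRNA level of 0.5 copies per cell without asRNA.
params = dict(k_m=0.05, d_m=0.1, d_a=0.1, k_bind=1.0, t_end=20000.0)
control = simulate(k_a=0.0, rng=random.Random(1), **params)
low = simulate(k_a=0.001, rng=random.Random(2), **params)   # rare asRNA
high = simulate(k_a=10.0, rng=random.Random(3), **params)   # abundant asRNA
print(f"control {control:.3f}, low asRNA {low:.3f}, high asRNA {high:.4f}")
```

In this toy setting, a rarely transcribed asRNA leaves the average mRNA level close to the control, because the two molecules seldom coexist in the same cell, whereas a strongly overexpressed asRNA depletes the mRNA.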

Fig. 2 Simulation of the effect of the asRNAs, assuming that the asRNA-mRNA pairing causes duplex degradation. Parameters for the simulations are detailed in the Supplementary Materials. Each point of the heat maps represents the average change in concentration over 100 simulations of 1000 min each, for specific asRNA and mRNA transcription rates. All other parameters are held constant across simulations. The axes represent the mRNA and asRNA concentrations in the control experiments for the corresponding transcription rates scanned. (A) Changes in the mRNA concentration after 1000 min of simulation. Blue circles represent experimental data from the overexpression of asRNAs in M. pneumoniae, whereas green circles represent data from studies in Gram-negative bacteria (29–31). The green ellipse delimits the region of the concentrations of most transcripts in E. coli (28). (B) Changes in the protein concentration after 1000 min of simulation. Blue circles represent experimental data from the overexpression of asRNAs in M. pneumoniae.

To verify these results, we overexpressed nine asRNAs in the bacterium M. pneumoniae (up to sixfold; Fig. 2 and table S5). These asRNAs were selected such that they overlap different regions of their corresponding mRNA partners (5′ end, 3′ end, or center), to test different possible action mechanisms. Additionally, asRNAs with different expression levels were chosen. Shotgun proteomics of the clones revealed no significant changes in the protein levels of the overlapping genes (Fig. 3A and table S6). Also, RNA-seq (RNA sequencing) revealed no significant changes in the mRNA levels (Fig. 3B and table S7). Thus, our simulations and our experimental data do not support the hypothesis that asRNAs have a general regulatory role in bacteria replacing the function of transcription factors. Only in those exceptions in which both asRNA and mRNA are expressed over a certain threshold can a regulatory behavior be expected.

Fig. 3 Effect of the overexpression of asRNAs in their overlapping genes, measured by RNA-seq and shotgun proteomics. (A) Protein levels of the genes overlapping each asRNA under control conditions and in the strains transformed with the antisense constructs. Error bars represent the SD of the samples. Two of the proteins, MPN056 and MPN305, were not detected in any of the strains of M. pneumoniae. (B) mRNA levels of the genes overlapping each asRNA under control (wild-type) conditions and in the strains overexpressing the antisense transcripts. Error bars represent the SD of the samples.

Our findings support the idea that most of the asRNAs are a consequence of transcriptional noise, rather than of tightly regulated events. The distribution of asRNAs in bacteria with distinct AT content and their inability to replace transcription factors support this idea. Probably, the bias toward AT mutations in bacteria (21) generates spurious promoter sequences that are able to trigger transcription. However, spurious expression of asRNAs is not incompatible with some of them being functional, as described elsewhere (1, 2, 6–8, 12). Indeed, asRNAs claimed to be functional are expressed at much higher rates than the average (28–31). Despite the observed general trend, we should not ignore that, in some bacteria, there are proteins [such as the RNA chaperone Hfq (32)] that help to stabilize asRNAs or the duplexes they form with mRNAs. In such cases, even lowly expressed asRNAs may exert a regulatory function. Nevertheless, this protein is not conserved throughout the bacteria in our study, and although it is conserved in some species, it is not essential. Therefore, we cannot expect such a mechanism to be general but rather an adaptation for specific cases. This suggests that asRNAs may accumulate in bacterial genomes because of transcriptional noise and a lack of negative selection, probably due to the low energy needed for their transcription and the absence of deleterious effects. Some of these asRNAs may afterward gain a function. Additionally, pervasive noncoding transcription may also have unspecific functional roles, such as buffering the RNA polymerase levels inside the bacterial cell. Our results are likely to be valid throughout the bacterial kingdom, and according to a recent study (33), they may also apply to eukaryotes.