Here, we develop an approach that combines ribosome profiling (Ribo‐seq) with quantitative RNA‐seq to enable the high‐throughput characterization of endogenous sequences and synthetic genetic parts controlling transcription and translation in absolute units. Ribo‐seq provides position‐specific information on translating ribosomes through sequencing of ribosome‐protected fragments (RPFs; approximately 25–28 nt). This allows genome‐wide protein synthesis rates to be calculated with accuracy similar to quantitative proteomics (Li et al, 2014). By supplementing the sequencing data with other experimentally measured cell parameters, we generate transcription and translation profiles that capture the flux of both RNA polymerases (RNAPs) and ribosomes governing these processes. We apply our method to Escherichia coli and demonstrate how local changes in these profiles can be interpreted using mathematical models to infer the performance of three different types of genetic part in absolute units. Finally, we demonstrate how genome‐wide shifts in transcription and translation can be used to dissect the burden that synthetic genetic constructs place on the host cell and the role played by competition for shared cellular resources, such as ribosomes.

The past decade has seen tremendous advances in sequencing technologies. This has resulted in continuously falling costs and a growing range of information that can be captured (Goodwin et al, 2016). Sequencing also offers several advantages over fluorescent probes for characterizing and debugging genetic parts and circuits. First, it does not require any modification of the circuit DNA. Second, it provides a more direct measurement of the processes being controlled (e.g., monitoring transcription of specific RNAs). Third, it captures information regarding the host response and consequently its indirect effects on a part's function. Recently, RNA sequencing (RNA‐seq) has been used to characterize every transcriptional component in a large logic circuit composed of 46 genetic parts (Gorochowski et al, 2017). While successful in demonstrating the ability to characterize genetic part function, observe internal transcriptional states, and find the root cause of circuit failures, the use of RNA‐seq alone restricts the method to purely transcriptional elements and does not allow for quantification in physically meaningful units.

Fluorescent proteins and probes are commonly used to characterize the function of genetic parts (Jones et al, 2014; Hecht et al, 2017) and debug the failure of genetic circuits (Nielsen et al, 2016). For circuits that use transcription rate (i.e., RNAP flux) as a common signal between components (Canton et al, 2008), debugging plasmids containing a promoter responsive to the signal of interest have been used to drive expression of a fluorescent protein to track the propagation of signals and reveal the root cause of failures (Nielsen et al, 2016). Alternatively, any genes whose expression is controlled by the part of interest can be tagged with a fluorescent protein (Snapp, 2005). Such modifications allow for a readout of protein level but come at the cost of alterations to the circuit. This is problematic as there is no guarantee the fluorescent tag itself will not affect a part's function (Baens et al, 2006; Margolin, 2012).

The construction of a genetic circuit requires the assembly of many DNA‐encoded parts that control the initiation and termination of transcription and translation. A major challenge is predicting how a part will behave when assembled with many others (Cardinale et al, 2013). The sequences of surrounding parts (Poole et al, 2000), interactions with other circuit components or the host cell (Cardinale et al, 2013; Ceroni et al, 2015; Gyorgy et al, 2015; Gorochowski et al, 2016), and the general physiological state of the cell (Wohlgemuth et al, 2013; Gorochowski et al, 2014) can all alter a part's behavior. Although biophysical models have been refined to capture some contextual effects (Salis et al, 2009; Seo et al, 2013; Espah Borujeni et al, 2014), and new types of part created to insulate against these factors (Moon et al, 2012; Daniel et al, 2013; Mutalik et al, 2013; Siuti et al, 2013; Yang et al, 2014; Shendure et al, 2017), we have yet to reach a point where large and robust genetic circuits can be reliably built on our first attempt. A crucial step toward this goal will be to better understand how the many parts of large genetic circuits function in concert. However, approaches to simultaneously measure the performance of many parts within the context of a circuit are currently lacking.

Gene expression is a multi‐step process involving the transcription of DNA into messenger RNA (mRNA) and the translation of mRNAs into proteins. To fully understand how a cell functions and adapts to changing environments and adverse conditions (e.g., disease or chronic stress), quantitative methods to monitor these processes are required (Belliveau et al, 2018). Gene regulatory networks (also known as genetic circuits) control when and where these processes take place and underpin many important cellular phenotypes. Recently, there has been growing interest in building synthetic genetic circuits to understand the function of natural gene regulatory networks through precise perturbations and/or creating systems de novo (Smanski et al, 2016; Wang et al, 2016).

Results

Generating transcription and translation profiles in absolute units

To enable quantification of both transcription and translation in absolute units, we modified the RNA‐seq protocol and extended the Ribo‐seq protocol with quantitative measurements of cellular properties (red elements in Fig 1A). For RNA‐seq, we introduced a set of RNA spike‐ins to our samples at known molar concentrations before the random alkaline fragmentation of the RNA (left panel, Fig 1A). The RNA spike‐ins span a wide range of lengths (250–2,000 nt) and concentrations and share no homology with the transcriptome of the host cell (Appendix Fig S1). Using RNA spike‐ins with known concentrations, the mapped reads can be converted to absolute molecule counts and then normalized by cell counts to give absolute transcript copy numbers per cell (Mortazavi et al, 2008; Bartholomäus et al, 2016) (Materials and Methods). The total number of transcripts per cell was ~8,200, which agrees well with earlier measurements of ~7,800 mRNA copies per cell using a single spike‐in (Bartholomäus et al, 2016). Similar overall copy numbers have been theoretically predicted (Bremer et al, 2003) and experimentally determined for another E. coli strain (Taniguchi et al, 2010). For Ribo‐seq, we directly ligated adaptors to the extracted ribosome‐protected fragments (RPFs) (Guo et al, 2010) to capture low‐abundance transcripts (Del Campo et al, 2015). Sequencing was also complemented with additional measurements of cell growth rate, count, and protein mass (right panel, Fig 1A).

Figure 1. Overview of the workflow. (A) Major steps involved when quantifying transcription (RNA‐seq) and translation (Ribo‐seq) and the additional cellular features measured. Elements required for quantification in absolute units are highlighted in red. (B) Model for calculating the translation initiation rate of a ribosome binding site, see equation 2. (C) Model for calculating translation termination efficiency of a stop codon, see equation 6. Star denotes the location of the stop codon. (D) Model for calculating translational frameshifting efficiency between two coding regions “A” and “B” in zero and −1 reading frames, respectively, see equation 7.

A previous method was further developed to generate transcription profiles that capture the number of RNAPs passing each nucleotide per unit time across the entire genome (i.e., the RNAP flux). The previous method assumes that RNA levels within the cells have reached a steady‐state and that all RNAs have a fixed degradation rate (0.0067/s), so that RNA‐seq data, which capture a snapshot of relative abundances of RNAs, can be used to estimate relative RNA synthesis rates (Gorochowski et al, 2017). Because each RNA is synthesized by an RNAP, these values are equivalent to the relative RNAP flux. Since mRNA degradation rates can vary significantly across the transcriptome, we relaxed this assumption by incorporating experimentally measured transcript‐specific degradation rates using previously published data (Chen et al, 2015). Finally, by using the known molar concentrations of the RNA spike‐ins and their corresponding RNA‐seq reads from our modified protocol (Appendix Fig S1), we are able to convert the transcription profiles into RNAP/s units. Existing mathematical models of promoters and terminators were then used to interpret changes in the transcription profiles and quantify the performance of these parts in absolute units.

To generate translation profiles that capture the ribosome flux per transcript, we first took each uniquely mapped RPF read from the Ribo‐seq data and, considering the architecture of a translating ribosome, estimated the central nucleotide of each codon in the ribosomal P site, i.e., the peptidyl‐tRNA site (Materials and Methods; Appendix Fig S6) (Mohammad et al, 2016). By summing these positions for all reads at each nucleotide x, we computed the RPF coverage N(x).
If each ribosome translates at a relatively constant speed, then at a given point in time the RPF coverage is proportional to the number of ribosomes at each nucleotide. This captures relative differences in ribosome flux, i.e., more heavily translated regions will have a larger number of ribosomes than lowly translated segments and so accrue a larger number of RPF reads in the Ribo‐seq snapshot. However, the translation rate of individual codons can vary, causing an enrichment in RPF reads at slowly translating codons (Woolstenhulme et al, 2015). Therefore, we divide N(x) by the translation time of the codon (Fluitt et al, 2007) with a central nucleotide at position x to give the weighted RPF coverage W(x). This weighting corrects for position‐specific variations. Moreover, the approach can be extended to include other factors that may cause variations in translation speed, e.g., local mRNA secondary structure (Del Campo et al, 2015; Gorochowski et al, 2015) or interaction of some nascent chain segments with the ribosomal exit tunnel (Charneski & Hurst, 2013).

We next convert the weighted RPF coverage into a translation profile whose height corresponds directly to the ribosome flux across each nucleotide in ribosomes/s units. By assuming that each weighted RPF read corresponds to an actively translating ribosome that synthesizes a full‐length protein product, and that the cellular proteome is at steady‐state, the protein copy number for gene i is given by $n_i = (f_i/f_t)(m_t/m_i)$. Here, $f_t$ is the weighted total number of mapped RPF reads, $m_t$ is the total protein mass per cell, and $f_i$ and $m_i$ are the weighted number of mapped RPF reads and the protein mass of gene i, respectively. We measured $m_t$ directly (Fig 1A) and calculated $m_i$ from the amino acid sequence of gene i (Materials and Methods).
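The weighting and copy‐number steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the function names and the per‐nucleotide `codon_times` input are hypothetical, and the copy‐number expression assumes that a gene's share of weighted RPF reads equals its share of total protein mass, i.e., $n_i = (f_i/f_t)(m_t/m_i)$.

```python
from typing import List, Sequence


def weighted_coverage(coverage: Sequence[float],
                      codon_times: Sequence[float]) -> List[float]:
    """Return W(x) = N(x) / t(x) for every nucleotide x.

    coverage[x]    : RPF coverage N(x) (P-site assigned read counts)
    codon_times[x] : translation time of the codon centred at x
                     (hypothetical input, e.g. taken from a codon-time table)
    """
    if len(coverage) != len(codon_times):
        raise ValueError("coverage and codon_times must have equal length")
    return [n / t for n, t in zip(coverage, codon_times)]


def protein_copy_number(f_i: float, f_t: float,
                        m_t: float, m_i: float) -> float:
    """Return n_i = (f_i / f_t) * (m_t / m_i).

    Assumes the fraction of weighted RPF reads on gene i equals the
    fraction of total protein mass contributed by that gene.
    f_i, f_t : weighted RPF reads for gene i and in total
    m_t, m_i : total protein mass per cell and mass of one protein i
               (both in the same mass unit)
    """
    return (f_i / f_t) * (m_t / m_i)
```

Slowly translated codons receive a larger `codon_times` value and are therefore down‐weighted, which is the correction the text describes for pause‐site enrichment.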
Because proteins are synthesized by incorporating individual amino acids during the translocation cycle (i.e., by ribosome translocating from the A to P site), the replication of the entire proteome requires $r_t = \sum_i n_i a_i$ ribosome translocations, where $a_i$ is the number of amino acids in the protein encoded by gene i. Assuming that cells are growing at a constant rate with doubling time $t_d$, the total ribosome flux across the entire transcriptome per unit time is given by $q = 3r_t/t_d$. The factor of three accounts for ribosomes translocating at three‐nucleotide registers (i.e., 1 codon/s = 3 nt/s). These calculations also assume that active protein degradation has a small contribution compared to dilution by cell division, which is reasonable in most cases. For example, > 93% of the Escherichia coli proteome is not subject to rapid degradation, with protein half‐lives being well beyond cell doubling times during exponential growth and even starvation conditions (Nath & Koch, 1971).

Finally, the translation profile for nucleotide x is calculated by multiplying the total ribosome flux $q$ by the fraction of active ribosomes $W(x)/f_t$ at that position and normalizing by the number of transcripts per cell of the gene being translated, $m_x$, computed from the RNA‐seq data (Fig 1A). This gives,

$$R(x) = \frac{q\,W(x)}{f_t\,m_x} \qquad (1)$$

Importantly, because both the transcription and translation profiles are given in absolute units (RNAP/s and ribosomes/s, respectively), they can be directly compared across samples without any further normalization.
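As a rough sketch of this last conversion, the snippet below computes the total ribosome flux $q = 3r_t/t_d$ and then scales a weighted coverage track into a per‐transcript translation profile, $q\,W(x)/(f_t\,m_x)$. The function names are hypothetical and the inputs are assumed to be already available as plain Python sequences; it is meant to make the arithmetic explicit, not to reproduce the authors' implementation.

```python
from typing import List, Sequence


def total_ribosome_flux(copy_numbers: Sequence[float],
                        protein_lengths_aa: Sequence[int],
                        doubling_time_s: float) -> float:
    """Return q = 3 * r_t / t_d, with r_t = sum_i n_i * a_i.

    copy_numbers[i]       : protein copies per cell of gene i (n_i)
    protein_lengths_aa[i] : amino acids in protein i (a_i)
    doubling_time_s       : cell doubling time t_d in seconds
    """
    r_t = sum(n * a for n, a in zip(copy_numbers, protein_lengths_aa))
    return 3.0 * r_t / doubling_time_s


def translation_profile(weighted_cov: Sequence[float], q: float,
                        f_t: float, m_x: float) -> List[float]:
    """Scale W(x) into ribosome flux per transcript: q * W(x) / (f_t * m_x).

    f_t : weighted total number of mapped RPF reads
    m_x : transcripts per cell of the gene covering these positions
    """
    return [q * w / (f_t * m_x) for w in weighted_cov]
```

Because `m_x` appears in the denominator, the profile reports flux per transcript, so a gene with few transcripts but heavy translation can show a higher profile than an abundant, weakly translated one.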

Characterizing genetic parts controlling translation

Genetic parts controlling translation alter ribosome flux along a transcript, and these changes are captured by the translation profiles. We developed new mathematical models to interpret these signals and quantify the performance of RBSs, stop codons, and translational recoding (e.g., ribosome frameshifting) in open reading frames (ORFs) at stable secondary structures.

In prokaryotes, RBSs facilitate translation initiation and cause a jump in the translation profile after the start codon of the associated gene due to an increase in ribosome flux originating at that location (Fig 1B). If initiation is rate limiting (Li et al, 2014), then the translation initiation rate of an RBS (in ribosomes/s units) is given by the increase in ribosome flux downstream of the RBS (equation 2). Here, $x_0$ is the start point of the RBS, and $x_s$ and $x_e$ are the start and end points of the protein‐coding region associated with the RBS, respectively (Fig 1B). By averaging the translation profile over the length of the protein‐coding region, we are able to smooth out small localized fluctuations that might affect the measurement. A window of n = 30 nt (10 codons) was also used to average fluctuations in the translation profile upstream of the RBS; the averaging window is equal to the approximate length of a ribosome footprint. If the transcription start site (TSS) of the promoter expressing this transcript fell in the upstream window, then the start point ($x_0 - n$) was adjusted to the TSS to ensure that the incoming ribosome flux is not underestimated. A similar change was made if the coding sequence was within an operon and the end of an upstream protein‐coding region fell in this window. In this case, the start point was adjusted to 9 nt (3 codons) downstream of the stop codon of the overlapping protein‐coding region.

We also included correction factors to remove the effect of translating ribosomes upstream of the RBS that are not in the same reading frame as the RBS‐controlled ORF and therefore may not fully traverse the coding sequence due to out‐of‐frame stop codons. These are given by equations 3–5, where $s_{-}$ and $s_{+}$ are the positions of the first out‐of‐frame stop codon downstream of $x_0 - n$ in the −1 and +1 reading frame, respectively. $C_{-}$ and $C_{+}$ capture the average out‐of‐frame ribosome flux in the region upstream of the RBS in the −1 and +1 reading frame, respectively, and $C(x)$ calculates the total sum of these ribosome fluxes that would reach nucleotide x downstream of the RBS.
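The core of the RBS model, the jump in average ribosome flux across the RBS, can be illustrated with the simplified sketch below. It is an assumption‐laden reduction, not the authors' expression: it omits the out‐of‐frame correction terms entirely, uses 0‐based half‐open Python indexing, and the function and argument names are hypothetical.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    """Arithmetic mean; an empty window contributes zero flux."""
    return sum(values) / len(values) if values else 0.0


def rbs_initiation_rate(profile: Sequence[float], x0: int,
                        xs: int, xe: int, n: int = 30) -> float:
    """Estimate an RBS initiation rate as a jump in ribosome flux.

    profile : translation profile in ribosomes/s, indexed by nucleotide
    x0      : start of the RBS
    xs, xe  : start and end (exclusive) of the associated coding region
    n       : upstream averaging window in nt (30 nt = 10 codons)

    Simplification (labelled assumption): out-of-frame corrections for
    upstream ribosomes are NOT applied here.
    """
    upstream = profile[max(x0 - n, 0):x0]   # incoming flux before the RBS
    downstream = profile[xs:xe]             # flux across the coding region
    return _mean(downstream) - _mean(upstream)
```

In practice the upstream window start would also be clipped to the TSS or to just after an upstream stop codon, as described in the text; here that adjustment is left to the caller through the choice of `x0` and `n`.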
Ribosomes terminate translation and dissociate from a transcript when a stop codon (TAA, TAG or TGA) is encountered. This leads to a drop in the translation profile at these points (Fig 1C). Although this process is typically efficient, there is a rare chance that some ribosomes may read through a stop codon and continue translating downstream (Arribere et al, 2016). Assuming that all ribosomes translating the protein‐coding region are in‐frame with the associated stop codon and do not frameshift prior to it, the translation termination efficiency of the stop codon (i.e., the fraction of ribosomes terminating) is given by equation 6. Here, $x_0$ and $x_1$ are the start and end nucleotides of the stop codon, respectively, $x_s$ is the start of the coding region associated with this stop codon, and n = 30 nt (10 codons) is the window, with the same width as described above, used to average fluctuations in the translation profile downstream of the stop codon (Fig 1C). If additional stop codons are present in the downstream window, the end point of this window ($x_1 + n$) was adjusted to ensure that the translation termination efficiency of only the first stop codon was measured. A similar adjustment was made if a transcript generated by an upstream promoter ends within this window.

Translation converts the information encoded in mRNA into protein whereby each triplet of nucleotides (a codon) is translated into a proteinogenic amino acid. Because of the three‐nucleotide periodicity in the decoding, each nucleotide could be either in the first, second, or third position of a codon, thus defining three reading frames for every transcript. Consequently, a single mRNA sequence can encode three different proteins. Although synthetic biology rarely uses multiple reading frames, natural systems exploit this feature in many different ways (Tsuchihashi & Kornberg, 1990; Condron et al, 1991a; Giedroc & Cornish, 2009; Bordeau & Felden, 2014). In our workflow, the RPFs used to generate the translation profiles were aligned to the middle nucleotide of the codon residing in the ribosomal P site, providing the frame of translation. To characterize genetic parts that cause translational recoding through ribosomal frameshifting, we compared regions directly before and after the part. Strong frameshifting will cause the fraction of RPFs to shift from the original frame to a new one when comparing these regions, with the frameshifting efficiency given by equation 7. Here, $x_0$ is the nucleotide at the start of the region where frameshifting occurs, and $x_1$ is the end nucleotide of the stop codon for the first coding sequence (Fig 1D).
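Since each RPF is assigned to the middle nucleotide of its P‐site codon, the reading frame of a read follows from that position modulo three. The sketch below shows one way to tally frame fractions for a region; the function name, the 0‐based coordinate convention, and the frame labels are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable


def frame_fractions(p_site_middles: Iterable[int],
                    cds_start: int) -> Dict[str, float]:
    """Fraction of RPFs in the zero, +1 and -1 reading frames.

    p_site_middles : position of the MIDDLE nucleotide of the P-site
                     codon for each read (0-based, assumed convention)
    cds_start      : position of the first nucleotide of the start codon

    An in-frame codon starting at cds_start + 3k has its middle
    nucleotide at cds_start + 3k + 1, hence the "+ 1" offset below.
    """
    labels = {0: "0", 1: "+1", 2: "-1"}
    counts = Counter(labels[(pos - (cds_start + 1)) % 3]
                     for pos in p_site_middles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return {label: counts.get(label, 0) / total
            for label in ("0", "+1", "-1")}
```

Pooling reads over a whole region before computing fractions, rather than working codon by codon, keeps the estimate usable when coverage is sparse.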

Measuring genome‐wide translation initiation and translation termination in Escherichia coli

We applied our approach to E. coli cells harboring a lacZ gene whose expression is induced using isopropyl β‐D‐1‐thiogalactopyranoside (IPTG) (Fig 2A). After induction for 10 min, lacZ expression reached 14% of the total cellular protein mass (Appendix Table S1). Samples from non‐induced and induced cells were subjected to the combined sequencing workflow (Fig 1A). Sequencing yielded between 41 and 199 million reads per sample (Appendix Table S2) with no measurable bias across RNA lengths and concentrations (Appendix Fig S1), and a high correlation in endogenous gene expression between biological replicates (R² > 0.96; Appendix Fig S2). Distributions of mRNA copy numbers and RPF densities per gene were similar across conditions, with RPF densities showing a broader spread than mRNA copy numbers (Appendix Fig S5).

Figure 2. Measuring translation initiation and translation termination signals across the E. coli transcriptome. (A) Genetic design of the LacZ reporter construct whose expression is activated by the inducer IPTG. (B) Normalized RPF count profile averaged for all E. coli transcripts. Profiles generated for cells grown in the absence and presence of IPTG (1 mM). Start and stop codons are shaded. (C) Bar chart of all measured RBS initiation rates ranked by their strength. Strong RBSs with initiation rates > 1 ribosome/s are highlighted in red. (D) Bar chart of all measured translation termination efficiencies at stop codons ranked by their strength. Stop codons with translation termination efficiency > 0.99 are highlighted in red. (E) Distribution of initiation rates for cells grown in the absence and presence of IPTG (1 mM). (F) Distribution of translation termination efficiencies for cells grown in the absence and presence of IPTG (1 mM).
Transcription and translation profiles were generated from these data and used to measure translation initiation rates of RBSs and translation termination efficiencies of stop codons across the genome. To remove the bias due to the RPF enrichment at the 5′‐end of coding regions (Ingolia et al, 2009) (Fig 2B), $x_s$ was adjusted to 51 bp (17 codons) downstream of the start codon when estimating average ribosome flux across a coding region in equations 2 and 6. To determine whether translation rates were fairly constant across each gene, we compared the number of RPFs mapping to the first and second half of each coding region. If the ribosomes traverse the coding sequence at a constant speed, then the two halves of a transcript should have a near identical RPF coverage. We found a high correlation between both halves for cells with non‐induced and induced lacZ expression, with less than ± 1.5‐fold difference for 80% of all genes (Appendix Fig S3). This suggests a relatively constant speed of the ribosomes across each coding sequence but does not allow for comparisons between genes due to potential gene‐wide biases, e.g., an enrichment in rare codons for a particular gene.

We characterized chromosomal RBSs in E. coli by assuming that each covered a region spanning 15 bp upstream of the start codon. Like background RPF levels, the correction factors in equation 5 applied during characterization of the RBSs were small, on average 0.06% and 0.1% of the ribosome flux through the coding region with and without induction of lacZ expression by IPTG, respectively. The translation initiation rates of the 779 RBSs we measured varied over two orders of magnitude with a median initiation rate of 0.18 ribosome/s (Fig 2C; Dataset EV1). This closely matches previously measured rates for single genes (Kennell & Riezman, 1977).
A few RBSs of transcripts mostly related to stress response functions (tabA, hdeA, uspA, uspG), the ribosomal subunit protein L31 (rpmE), and some genes with unknown function (ydiH, yjdM, yjfN, ybeD) reached much higher rates of up to 3.4 ribosomes/s.

To estimate translation termination efficiency at stop codons, we analyzed regions that spanned 9 nt up and downstream of the stop codon (Fig 2B). We excluded overlapping genes and those bearing internal sites that promote frameshifting (Baggett et al, 2017), both of which break the assumptions of our model. In total, the translation termination efficiency of 750 stop codons was measured, and their median translation termination efficiency across the transcriptome was found to be 0.974, with 249 of them (33% of all measured) having translation termination efficiencies > 0.99 (Fig 2D; Dataset EV2). Similar performance for both RBSs (R² = 0.81) and stop codons (R² = 0.45) was found between cells with non‐induced and induced lacZ expression (Fig 2E and F; Datasets EV1 and EV2).
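For the termination model described earlier, one plausible reading of "the fraction of ribosomes terminating" is one minus the ratio of average ribosome flux just downstream of the stop codon to the average flux across the coding region. The sketch below implements that reading only; the exact expression, the function name, and the 0‐based inclusive/exclusive index conventions are assumptions made for illustration.

```python
from typing import Sequence


def termination_efficiency(profile: Sequence[float], xs: int,
                           x0: int, x1: int, n: int = 30) -> float:
    """Estimate the fraction of ribosomes terminating at a stop codon.

    profile : translation profile in ribosomes/s, indexed by nucleotide
    xs      : start of the coding region used for averaging
    x0, x1  : first and last nucleotide of the stop codon (inclusive)
    n       : downstream averaging window in nt

    Assumed form: 1 - mean(flux after stop) / mean(flux in coding region).
    """
    coding = profile[xs:x0]                    # in-frame flux before the stop
    downstream = profile[x1 + 1:x1 + 1 + n]    # read-through flux after it
    if not coding or not downstream:
        raise ValueError("averaging windows must not be empty")
    coding_mean = sum(coding) / len(coding)
    if coding_mean == 0:
        raise ValueError("no ribosome flux across the coding region")
    return 1.0 - (sum(downstream) / len(downstream)) / coding_mean
```

If a second stop codon or a transcript end falls inside the downstream window, the caller would shorten `n` accordingly, mirroring the window adjustments described for the model.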

Quantifying differences in transcription and translation of endogenous and synthetic genes

The quantitative measurements produced by our methodology allow both transcription and translation to be monitored simultaneously. To demonstrate this capability, we first focused on differences in the contributions of transcription and translation to overall protein synthesis rates of endogenous genes in E. coli. For each gene, we calculated the protein synthesis rate by multiplying the transcript copy number by the RBS‐mediated translation initiation rate per transcript. We found a strong correlation with previously measured synthesis rates (Li et al, 2014) (Fig 3A). We also extracted the transcription and translation profiles of three genes (uspA, ompA, and gapA) whose protein synthesis rate was similar, but whose expression was controlled differently at the levels of transcription and translation (Fig 3B). Quantification of the promoters and RBSs for these three genes showed more than an order of magnitude difference in their transcription and translation initiation rates; uspA was weakly transcribed and highly translated, ompA was highly transcribed and weakly translated, and gapA was moderately transcribed and translated (Fig 3C).

Figure 3. Simultaneous quantification of transcription and translation of endogenous genes and a synthetic genetic construct. (A) Comparison of protein synthesis rate of endogenous E. coli genes measured using Ribo‐seq from this study (in molecules/s units) and from that by Li et al (2014). (B) Transcription (bottom) and translation (top) profiles for uspA, ompA, and gapA, computed from the RNA‐seq and Ribo‐seq data without induction. Positions of the genetic parts and gene are shown below the profiles. (C) Promoter strengths in RNAP/s units and RBS initiation rates in ribosome/s units. (D) Transcription (bottom) and translation (top) profiles for lacZ. Profiles are shown for cells in the absence and presence of IPTG (1 mM). Position of genetic parts and gene is shown below the profiles. RBS is omitted from the genetic design due to its size. (E) Measured promoter strength in RNAP/s units, RBS initiation rate in ribosomes/s units, and the transcriptional terminator and translation termination efficiency for lacZ. Data shown for cells in the absence and presence of IPTG (1 mM).

Because we measure transcription and translation initiation rates in absolute units, it was also possible to determine their ratio (RNAP/ribosome) for each gene and assess whether there was a preference for high/low relative synthesis rates for transcription/translation given a gene's overall protein expression level. This analysis revealed a trend where weakly expressed genes exhibited low RNAP/ribosome ratios, while strongly expressed genes saw higher RNAP/ribosome ratios (Fig 3A). These different modes of gene expression can have a major influence on the efficiency of protein synthesis (Ceroni et al, 2015) and affect the variability in protein levels between cells (Raser & O'Shea, 2005). For example, a metabolically efficient way to strongly express a protein of interest in bacteria is by producing high numbers of transcripts (e.g., with high transcription initiation rate and high stability) with a relatively weak RBS (e.g., low translation initiation rate). This ensures that each ribosome initiating on a transcript has a very low probability of colliding with others, guaranteeing efficient translation elongation (Cambray et al, 2018; Gorochowski & Ellis, 2018). We observe that this strategy is adopted for strongly expressed endogenous genes (Fig 3A).

We next sought to demonstrate the ability to measure dynamic changes in the function of regulatory parts using the LacZ construct. We quantified the inducible promoter and terminator controlling transcription, and the RBS and stop codon controlling translation when the inducer IPTG was absent and present.
The transcription and translation profiles clearly showed the beginning and end of both the transcript and protein‐coding region, with sharp increases and decreases at transcriptional/translational start and stop sites (Fig 3D). Induction caused a large increase in the number of lacZ transcripts from 0.18 to 110 copies per cell, which was directly observed in the transcription profiles. In contrast, the translation profiles remained stable across conditions. The $P_{tac}$ promoter has a transcription initiation rate of 0.0004 RNAP/s in the absence and 0.3 RNAP/s in the presence of IPTG (1 mM) (Fig 3E). This closely matches the previously measured transcription initiation rate of 0.33 RNAP/s for the $P_{lac}$ promoter (Kennell & Riezman, 1977), from which the $P_{tac}$ promoter is derived (De Boer et al, 1983). The RBS for the lacZ gene had consistent translation initiation rates of between 0.13 and 0.14 ribosomes/s across both conditions (Fig 3E). It may seem counterintuitive to observe translation without IPTG induction because very few transcripts will be present. However, leaky expression from the $P_{tac}$ promoter was sufficient to capture enough RPFs during sequencing to generate a translation profile. It should be noted that the translation profile represents the ribosome flux per transcript; thus, its shape was nearly identical to that when the $P_{tac}$ promoter was induced. Like the RBS, both the transcriptional terminator and stop codon showed similar efficiencies across conditions of 0.93–0.95 and 0.96–0.99, respectively (Fig 3E).
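To make the link between these measurements concrete, the sketch below applies the rule stated earlier in this section (protein synthesis rate = transcript copy number × translation initiation rate per transcript) and computes the RNAP/ribosome ratio. The function names are hypothetical, and the example simply plugs in the lacZ values reported above as an illustration, not as an additional result.

```python
def protein_synthesis_rate(transcripts_per_cell: float,
                           initiation_rate: float) -> float:
    """Proteins/s per cell = transcript copies x ribosomes/s per transcript."""
    return transcripts_per_cell * initiation_rate


def rnap_ribosome_ratio(transcription_init_rate: float,
                        translation_init_rate: float) -> float:
    """Ratio of promoter strength (RNAP/s) to RBS strength (ribosomes/s)."""
    return transcription_init_rate / translation_init_rate


# Illustrative use with the induced lacZ values quoted in the text:
# 110 transcripts per cell and an RBS rate within 0.13-0.14 ribosomes/s.
induced_low = protein_synthesis_rate(110, 0.13)
induced_high = protein_synthesis_rate(110, 0.14)

# Fold change in promoter strength on induction (0.0004 -> 0.3 RNAP/s).
fold_induction = 0.3 / 0.0004
```

With these inputs the induced construct would produce roughly 14–15 LacZ molecules per second per cell, and the promoter strength changes by about 750‐fold, while the per‐transcript RBS rate stays essentially constant.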

Characterizing a synthetic pseudoknot that induces translational recoding

Pseudoknots (PKs) are stable tertiary structures that regulate gene expression. They are frequently combined with slippery sequences in compact viral genomes to stimulate translational recoding and produce multiple protein products from a single gene (Tsuchihashi & Kornberg, 1990; Brierley et al, 2007; Giedroc & Cornish, 2009; Sharma et al, 2014). The percentage of recoding events generally reflects the stoichiometry of the translated proteins (e.g., capsule proteins for virus assembly) and helps overcome problems where the stochastic nature of transcription and translation makes maintenance of specific ratios difficult (Condron et al, 1991a). PKs are the most common type of structure used to facilitate mostly −1 frameshifting (Atkins et al, 2016) and in much rarer cases +1 frameshifting (e.g., in eukaryotic antizyme genes) (Ivanov et al, 2004). PKs consist of a hairpin with an additional loop that folds back to stabilize the hairpin via extra base pairing (Fig 4A). In addition to stimulating recoding events, PKs regulate translational initiation, where they interfere with an RBS through antisense sequences that base pair with the RBS (Unoson & Wagner, 2007; Bordeau & Felden, 2014). They also act as an evolutionary tool, reducing the length of sequence needed to encode multiple protein‐coding regions and therefore acting as a form of genome compression.

Figure 4. Characterization of a synthetic pseudoknot construct that induces translational frameshifting. (A) Genetic design of the PK‐LacZ construct. Expanded sequence shows the PK secondary structure with the slippery site underlined, as well as the two genes (gene10 and lacZ) in differing reading frames. (B) Translation profiles for the PK‐LacZ construct in cells cultured in the absence (bottom) and presence (top) of IPTG (1 mM). The gene10, middle, and lacZ regions are labeled above the profiles. Shaded region denotes the PK, and dashed lines denote the start codon and stop codons of gene10 and LacZ. (C) Fractions of the total RPFs and mRNA reads in each reading frame for the gene10, PK or middle, and lacZ regions. Data shown separately for cells cultured in the absence and presence of IPTG (1 mM). (D) Violin plots of the distributions of fractions of total RPFs and mRNA reads in each reading frame for all E. coli transcripts. Median values shown by horizontal bars. Data from two biological replicates. *P = 0.049; **P = $1.6 \times 10^{-9}$ (Mann–Whitney U test).

Two elements signal and stimulate frameshifting. The first is a slippery site consisting of a heptanucleotide sequence of the form XXXYYYZ which enables out‐of‐zero‐frame pairing in the A or P site of the ribosome, facilitating recoding events. The second is a PK situated 6–8 nt downstream of the slippery site. In bacteria, the distance between the slippery site and the 5′‐end of the PK positions mRNA in the entry channel of the 30S ribosomal subunit, enabling contact with the PK which pauses translation and provides an extended time window for frameshifting to occur (Giedroc & Cornish, 2009). To demonstrate our ability to characterize this process, we created an inducible genetic construct (referred to as PK‐LacZ) that incorporated a virus‐inspired PK structure within its natural context (gene10) fused to lacZ in a −1 frame (Fig 4A) (Tholstrup et al, 2012). Gene10 ends with a stop codon such that translation of lacZ requires frameshifting at the PK. We specifically chose a PK variant (22/6a), which exhibits much lower frameshifting efficiency (~3%) (Tholstrup et al, 2012) compared to the wild‐type PK (~10%) in its natural context (Condron et al, 1991a), but is known to heavily sequester and stall ribosomes and induce a significant stress response (Tholstrup et al, 2012).
With our approach, we sought to provide a complementary quantification of the frameshifting efficiency and, more importantly, to explore why such significant cellular stress was caused. The slippery site UUUAAAG preceded the PK. Gene10 of bacteriophage T7 produces two proteins, one through translation in the zero‐frame and one through a −1 frameshift; both protein products are constituents of the bacteriophage capsid (Condron et al, 1991b).

We generated translation profiles to assess ribosome flux along the entire construct (Fig 4B). These showed high levels of translation up to the PK, a major drop of 80–90% from the PK to the end of the gene10 coding region, and a further drop of ~97% after this region (Fig 4B). To analyze frameshifting within gene10, we divided the construct into three regions: (i) the gene10 segment up to the slippery site, (ii) the middle region, which covers the slippery site along with the PK up to the gene10 stop codon, and (iii) the downstream lacZ gene in a −1 frame. The large drops in the translation profiles at both the PK and the gene10 stop codon led to low numbers of RPFs across the lacZ gene and caused high levels of noise in the translation profiles (Fig 4B). This made direct comparisons of frame‐specific expression levels at codon resolution impossible. Therefore, for each of the three regions, we pooled the RPFs and calculated the fraction of RPFs in each frame relative to the total across all three possible frames. We found that the zero and −1 frames dominated the gene10 and lacZ regions, respectively, with >46% of all RPFs in each region found in the respective frame (top row, Fig 4C). The middle region showed a greater mix of all three frames, and the zero‐frame fraction dropped further in the lacZ region. This is likely due to a combination of ribosomes that have passed the PK successfully and terminated in the zero‐frame at the end of gene10 and those that have frameshifted. Similar results were found with and without induction by IPTG (Fig 4C).
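The pooling step described above can be sketched in a few lines of Python. This is a minimal illustration that assumes each RPF has already been assigned a single 0-based nucleotide position (for example, an inferred P‐site), which is not specified in this section; the function name, arguments, and example positions are hypothetical.

```python
from collections import Counter

# Map the position modulo 3 (relative to the first base of a zero-frame codon)
# to a frame label. A -1 shift is equivalent to +2 modulo 3.
FRAME_LABELS = {0: "0", 1: "+1", 2: "-1"}


def frame_fractions(positions, region_start, region_end, frame_origin):
    """Pool RPF positions in [region_start, region_end) and return the
    fraction of reads in each of the three reading frames."""
    counts = Counter()
    for pos in positions:
        if region_start <= pos < region_end:
            counts[FRAME_LABELS[(pos - frame_origin) % 3]] += 1
    total = sum(counts.values())
    if total == 0:  # no reads in the region: report zeros, not a division error
        return {label: 0.0 for label in FRAME_LABELS.values()}
    return {label: counts[label] / total for label in FRAME_LABELS.values()}


# Invented positions: four reads in the zero frame, one in +1, two in -1.
example = frame_fractions([0, 3, 6, 9, 1, 2, 5], 0, 12, frame_origin=0)
print(example)
```

Pooling over a whole region trades codon‐level resolution for robustness to the low read counts seen downstream of the PK.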
An identical analysis of the reading frames from the RNA‐seq data revealed no preferred frame, with approximately equal fractions of reads in each (bottom row, Fig 4C). This suggests that the reading frames recovered for the RPFs were not influenced by any sequencing bias. We further tested whether the major translation frame could be recovered genome‐wide by measuring the fraction of each frame across every gene. The correct zero‐frame dominated in most cases (Fig 4D).

Finally, to calculate the efficiency of PK‐induced frameshifting, we compared the density of RPFs per nucleotide for the middle and lacZ regions. Because the PK causes ribosome stalling, the assumption of constant ribosome speed is broken for the gene10 region upstream of the PK. Therefore, when calculating the frameshifting efficiency using Equation 7, x_s and x_0 were set to the start and end nucleotides of the middle region, directly downstream of the PK, where pausing was not expected to occur. We found that the PK caused 2–3% of ribosomes to frameshift. This closely matched the previous measurement of ~3% for the same PK variant (22/6a), obtained by monitoring radioactive methionine incorporation (Tholstrup et al, 2012).
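As a rough illustration of the density comparison, the sketch below estimates frameshifting efficiency as the ratio of RPF density in the lacZ region to that in the middle region. Equation 7 is not reproduced in this section, so this simple ratio is an assumption and may differ from it in detail; the function names and all read counts and coordinates are invented for the example.

```python
def rpf_density(n_rpfs, start, end):
    """RPFs per nucleotide over the half-open interval [start, end)."""
    length = end - start
    if length <= 0:
        raise ValueError("region must have positive length")
    return n_rpfs / length


def frameshift_efficiency(middle_rpfs, middle_span, lacz_rpfs, lacz_span):
    """Fraction of ribosomes in the middle region that continue into lacZ,
    assuming constant ribosome speed across both regions."""
    middle = rpf_density(middle_rpfs, *middle_span)
    if middle == 0:
        raise ValueError("no RPFs in the middle region")
    return rpf_density(lacz_rpfs, *lacz_span) / middle


# Invented numbers: 3,000 RPFs over a 300-nt middle region (10 per nt) and
# 750 RPFs over a 3,000-nt lacZ region (0.25 per nt).
eff = frameshift_efficiency(3000, (0, 300), 750, (300, 3300))
print(f"{eff:.1%}")  # 2.5%
```

Restricting the denominator to the region downstream of the PK avoids inflating the upstream density with stalled ribosomes, which would otherwise lead to an underestimate of the efficiency.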