Significance Phylogenetic evidence suggests that a factor in the emergence of the ancestral eukaryotic cell may have been selection pressure resulting from invasion and proliferation of retroelements. Here we experimentally determine the effects of a retroelement invasion on genetically simple host organisms, and we demonstrate theoretically that these effects are sufficient to explain the observed rarity of retroelements in bacteria. We also show that nonhomologous end-joining (NHEJ), a mechanism of DNA repair found in all extant eukaryotes but in only some bacteria, significantly enhances the efficiency of retrotransposition and the effects of retroelements on the host. We hypothesize that the interplay of NHEJ and retroelements may have played a previously unappreciated role in the evolution of advanced life.

Abstract Phylogenetic evidence suggests that the invasion and proliferation of retroelements, selfish mobile genetic elements that copy and paste themselves within a host genome, was one of the early evolutionary events in the emergence of eukaryotes. Here we test the effects of this event by determining the pressures retroelements exert on simple genomes. We transferred two retroelements, human LINE-1 and the bacterial group II intron Ll.LtrB, into bacteria, and found that both are functional and detrimental to growth. We find, surprisingly, that retroelement lethality and proliferation are enhanced by the ability to perform eukaryotic-like nonhomologous end-joining (NHEJ) DNA repair. We show that the only stable evolutionary consequence in simple cells is maintenance of retroelements in low numbers, suggesting how retrotransposition rates and costs in early eukaryotes could have been constrained to allow proliferation. Our results suggest that the interplay between NHEJ and retroelements may have played a fundamental and previously unappreciated role in facilitating the proliferation of retroelements, elements of which became the ancestors of the spliceosome components in eukaryotes.

The complexity of eukaryotes relative to bacteria and archaea is a consequence of the increased connectivity and plasticity of networks and interactions, rather than an increase in the amount of coding DNA (1). Such complexity is mediated by several mechanisms: one is the spliceosome, a complex molecular machine present in eukaryotes that operates on nascent mRNAs to generate mature transcripts. In some animals, for example, the spliceosome can generate multiple mRNAs through alternative splicing of a single primary transcript, allowing access to additional complexity without a concomitant increase in the amount of coding DNA. The spliceosome’s primary role is the removal of introns, intervening sequences that disrupt the coding regions of eukaryotic genes and make up, for example, ∼24–37% of the human genome (2). Conversely, bacteria and archaea lack a spliceosome, and intervening sequences are present only in limited numbers as retrotransposable elements called group II introns.

Group II introns are found in only ∼30% of sequenced bacterial species and are generally present in low copy numbers of ∼1–10 per individual in those species where they exist (3). Conversely, retroelements in eukaryotes are vastly more abundant. For example, retrotransposons in humans comprise another ∼45% of the genome in addition to introns and make up the majority of so-called “junk DNA” (2, 4). The human retroelement LINE-1 (or “L1”) alone makes up ∼17% of the genome, with ∼500,000 total integrants and ∼80–100 complete and active, or hot (L1H), copies per individual (5, 6). L1 activity contributes significantly to human genetic heterogeneity, disease, development, and evolution (7–10), and its known mechanisms of transposition show significant similarity to those of bacterial group II introns such as Ll.LtrB (11). This motivates their classification together as target-primed retrotransposons (12).

On the basis of manifold sequence, structural, and mechanistic similarities among bacterial group II introns, the spliceosome, eukaryotic spliceosomal introns, and autonomous eukaryotic retrotransposons, it has been hypothesized that an invasion of group II introns from an endosymbiotic eubacterial organelle contributed to the proliferation of introns within eukaryotic genomes before the last eukaryotic common ancestor (13, 14). If so, the resulting disruption to protein coding sequences could be alleviated by, among other contributing factors, consolidation of intron maturase splicing activity within the centralized spliceosome complex (3, 15) and the spatial decoupling of transcription and translation by a nuclear envelope (16, 17), although the order in which these developments occurred remains unclear. However, what enabled the proliferation of retroelements in eukaryotes and the evolutionary pressures and mechanisms limiting proliferation of retroelements in bacteria and archaea remain poorly understood and the subject of speculation (13, 18), particularly in light of the horizontal transfer of proliferative autonomous retroelements from humans to bacteria, as in the case of the recent transfer of L1 to the pathogen Neisseria gonorrhoeae (19).

To illuminate the changes in cellular machinery and tolerance of retroelements that would have been necessary to go from simple bacterial-like systems to eukaryotic ones, it is important to understand precisely how retroelements may produce deleterious effects (20), what limits their activity in simple genomes, and what may have enabled their proliferation in eukaryotic genomes. To this end, we have constructed a bacterial version of L1 to quantitatively assess the function and effects of retroelement expression in the bacteria Escherichia coli and Bacillus subtilis, and we compare its effects with those of the bacterial group II intron Ll.LtrB. We find that L1 is functional in E. coli, successfully integrating into its genome. We demonstrate that retroelement expression is severely detrimental to both E. coli and B. subtilis, with wild-type B. subtilis in particular unable to tolerate any retroelement expression. We find that the capacity of the host to perform nonhomologous end-joining (NHEJ) repair of DNA double-strand breaks increases retrotransposition rates by approximately three orders of magnitude, and that, surprisingly, NHEJ also strongly enhances bacterial sensitivity to the activity of retroelements. We show that these results imply that retroelement activity generally leads to low copy numbers or extinction, as seen in bacteria and archaea, and that proliferation of retroelements in eukaryotes and subsequent addition of complexity to the eukaryotic genome may have been enabled by precise tuning of parameters, leading to suppression of growth defects and enhancement of integration efficiency.

Discussion That expression of both human L1H and bacterial Ll.LtrB results in an exponential decrease in growth rate suggests a simple universal underlying mechanism: each retroelement mRNA transcript has some probability of integrating into and disrupting an essential gene, thereby affecting growth. In the simplest model of this type, the probability that a cell survives is the binomial probability of zero disruptive integration events, which leads to an exponential decrease in growth rate with transcript number; including variable integration rates and physiological responses does not significantly affect the resulting behavior (SI Appendix, Supplementary Analysis). As a consequence, in bacteria, the growth defect is a monotonically increasing function of the integration rate. To further understand how retrotransposons proliferate within a host genome, we constructed a simple model of retroelement activity, motivated by the existing body of work on retroelement activity (20, 35–41), and analyzed its dynamics (SI Appendix, Supplementary Analysis). Populations of asexually multiplying cells were simulated on the basis of measured integration rates and growth defects and allowed to evolve over 10,000 generations. The resulting phase diagrams are shown in Fig. 5 for retrohoming (reflective boundary conditions) and retrotransposition (absorbing boundary conditions), respectively. We find that retrohoming generally leads to low but stable numbers of retroelements, whereas the parameters with which retrotransposition occurs must be finely tuned to achieve long-lived states with proliferation of retrotransposons in the host.

Fig. 5. Phase diagram of retrotransposon dynamics. We simulated the model of retrotransposon dynamics, SI Appendix, Eq. 2.7 (SI Appendix, Supplementary Analysis), using a total system size [defined as the number of available empty sites in the environment plus the (effective) number of individuals in the population] of $\Omega = 10^{9}$, with an initial population of $\psi_1 = 0.1$ and all other states empty. This initial state was allowed to evolve for 10,000 generations with $\Delta = 10^{-8}\ \text{retrotransposon}^{-1}\cdot\text{cell}^{-1}\cdot\text{generation}^{-1}$ and $\beta = 10^{-2}\ \text{cell}^{-1}\cdot\text{generation}^{-1}$, at the conclusion of which we calculated the average number of retrotransposons per cell over the extant population. Results are shown for (A) reflecting boundary conditions with $x_{\max} = 4$ and (B) absorbing boundary conditions with $x_{\max} = -\ln(0.1)/b$.

The phase portrait in Fig. 5B shows that there exists a small set of parameter values (a low growth defect, $b$, of less than 0.01 and a high integration rate, $\mu$, of $\sim 10^{-3}\ \text{retrotransposon}^{-1}\cdot\text{cell}^{-1}\cdot\text{generation}^{-1}$) where retrotransposons can proliferate to high numbers. Coupling of the integration rate and growth defect implies that increases in the integration rate inexorably push bacteria toward the upper right of the phase diagram, and thus toward extinction. Hence, the bacterial phase space is highly constrained, and bacteria are unlikely to be found within this small proliferative regime. To demonstrate this, we performed simulations using absorbing boundary conditions across parameter values, and for each, we recorded the number of generations required for the retrotransposon to go extinct. The result is shown in Fig. 6. From this analysis, we see that the time required for a retrotransposon to go extinct can vary by more than ∼7 orders of magnitude, depending on its dynamics and effects. For those parameter regimes corresponding to the aggressive autonomous retrotransposon L1 ($b \geq 10^{-2}$, $\mu \geq 10^{-2}\ \text{retrotransposon}^{-1}\cdot\text{cell}^{-1}\cdot\text{generation}^{-1}$), extinction of retroelements is rapid, occurring in ∼100–10,000 generations.
Conversely, for parameter regimes corresponding to the group II intron Ll.LtrB ($10^{-3} \leq b \leq 10^{-2}$, $10^{-9} \leq \mu \leq 10^{-6}\ \text{retrotransposon}^{-1}\cdot\text{cell}^{-1}\cdot\text{generation}^{-1}$), retroelements can persist in low copy numbers (∼1 per cell) for millions to tens of millions of generations. We also see that, in the small parameter regime in which retrotransposons can proliferate to high copy numbers ($b \leq 10^{-2}$, $\mu \sim 10^{-4}$ to $10^{-3}\ \text{retrotransposon}^{-1}\cdot\text{cell}^{-1}\cdot\text{generation}^{-1}$), they persist for hundreds of thousands to millions of generations, and could be maintained longer with the inclusion of horizontal gene transfer.

Fig. 6. Time to extinction of retrotransposons in a bacterial population. Simulations of the model SI Appendix, Eq. 2.7 (SI Appendix, Supplementary Analysis), with an absorbing boundary condition at $x_{\max} = -\ln(0.1)/b$, a system size of $\Omega = 10^{9}$, $\Delta = 10^{-8}\ \text{retrotransposon}^{-1}\cdot\text{cell}^{-1}\cdot\text{generation}^{-1}$, $\beta = 10^{-2}\ \text{cell}^{-1}\cdot\text{generation}^{-1}$, and an initial population of $\psi_1 = 0.1$ with all other states empty. Color indicates the number of generations required for the average number of retrotransposons per cell to drop below $1/\Omega$. Solid contour lines indicate major decade divisions; dashed contour lines indicate half-decade divisions.

Hence, this simple model suggests that for retroelements to proliferate to high numbers within asexual populations, the coupling of integration rate and growth defect must be weakened. In addition, increases in retrotransposition efficiency by NHEJ, present in all extant eukaryotes, must also be compensated for by suppression of the growth defect to enable proliferation. Indeed, it is hypothesized that many eukaryotic features arose specifically to mitigate the effects of retroelements (3, 13, 16, 17, 42, 43). For example, the nuclear membrane allows the spliceosome to complete intron excision before nuclear export and translation (16, 17).
Furthermore, important spliceosomal components are derived from group II introns, and consolidation of splicing activity into the spliceosomal complex may facilitate efficient intron removal (3, 13). With the spliceosome, further complexity added to the eukaryotic genome by retroelements could then be exploited for benefit through, for example, alternative splicing by exon-skipping in some eukaryotes.

In summary, proliferation of retroelements plays a dual role. On the one hand, group II introns create genome instability and negative physiological effects. On the other hand, by duplicating themselves, copies of group II introns are free to diversify and become the ancestors of both the spliceosome and spliceosomal introns (13, 14).

We hypothesize that NHEJ enhances retrotransposition by directly joining the newly reverse-transcribed retroelement with the remaining free end of the endonuclease-induced break. Without NHEJ, this break can only be repaired through homologous recombination, generally leading to removal of the integrant and apparent low retrotransposition efficiencies, as observed in NHEJ-deficient E. coli. However, it is surprising that minimal, two-protein bacterial NHEJ systems interact with and enhance human L1 retrotransposition efficiency. Intriguingly, NHEJ proteins also heavily associate with telomeres and are required for proper telomere length regulation and end protection (44, 45). Furthermore, the reverse transcriptase activity of telomerase likely shares a common ancestor with group II introns, and in some organisms (e.g., Drosophila), telomere maintenance is performed by retroelements rather than telomerase (13). In light of these observations and our results, we conjecture that NHEJ systems, together with retroelement proliferation, were implicated in the unexplained evolutionary transition from generally circular bacterial chromosomes to linear eukaryotic chromosomes (13, 42, 45).
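The two modeling ideas used in the Discussion, survival as the binomial probability of zero disruptive integrations and extinction of a costly element in an asexual population, can be illustrated with a deliberately simplified sketch. This is not the model of SI Appendix, Eq. 2.7, which is not reproduced here. The function names, the discrete-generation update order, the use of relative frequencies in place of the system size Ω, and the reading of `delta` as a per-copy loss rate are all assumptions made for illustration only.

```python
import math


def survival_probability(n_transcripts, p_disrupt):
    """Binomial probability of zero disruptive integrations when each of
    n_transcripts independently disrupts an essential gene with p_disrupt."""
    return (1.0 - p_disrupt) ** n_transcripts


def step(freqs, b, mu, delta, absorbing):
    """Advance copy-number class frequencies by one generation.

    freqs[x] is the fraction of cells carrying x copies (x = 0..x_max).
    Hypothetical update order: growth defect, then gain, then loss.
    """
    x_max = len(freqs) - 1
    nxt = [0.0] * (x_max + 1)
    for x, f in enumerate(freqs):
        f *= math.exp(-b * x)                  # exponential growth defect
        up = min(1.0, mu * x) * f              # cells gaining one copy
        down = min(1.0, delta * x) * (f - up)  # cells losing one copy
        nxt[x] += f - up - down
        if x > 0:
            nxt[x - 1] += down
        if x < x_max:
            nxt[x + 1] += up
        elif not absorbing:
            nxt[x] += up                       # reflecting: capped at x_max
        # absorbing: cells that would exceed x_max are removed
    total = sum(nxt)
    return [f / total for f in nxt] if total > 0 else nxt


def mean_copies(freqs):
    """Average number of copies per cell."""
    return sum(x * f for x, f in enumerate(freqs))


def absorbing_x_max(b):
    """Copy number at which growth falls to 10%, as in x_max = -ln(0.1)/b."""
    return int(-math.log(0.1) / b)


def generations_to_extinction(b, mu, delta, x_max, absorbing=True,
                              threshold=1e-3, max_generations=100_000):
    """Generations until the mean copy number drops below threshold.

    Starts with every cell carrying one copy; returns None if the
    threshold is not reached within max_generations.
    """
    freqs = [0.0] * (x_max + 1)
    freqs[1] = 1.0
    for generation in range(1, max_generations + 1):
        freqs = step(freqs, b, mu, delta, absorbing)
        if mean_copies(freqs) < threshold:
            return generation
    return None


if __name__ == "__main__":
    # Zero-hit binomial survival versus its exponential approximation.
    print(survival_probability(1000, 1e-3), math.exp(-1000 * 1e-3))
    # delta is set far above the paper's 1e-8 only to keep the demo fast.
    for b in (0.1, 0.01):
        t = generations_to_extinction(b, mu=0.0, delta=1e-3,
                                      x_max=absorbing_x_max(b))
        print(f"b={b}: extinct after {t} generations")
```

In this toy version, element-free cells arise only through loss and then outgrow carriers, so a larger growth defect `b` should clear the element sooner. Reproducing the phase diagrams of Figs. 5 and 6 would require the full model and parameters given in the SI Appendix.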

Methods Strains and Media. Manipulation of constructs was performed with E. coli strain NEBTurbo (New England Biolabs). Experiments assaying effects of retroelement expression in E. coli were performed in the strain BL21(DE3). B. subtilis experiments were performed with strain 168, as well as ΔykoU (WN1080/BFS1845), ΔykoV (WN1081/BFS1846), and ΔykoU ΔykoV (WN1082/BFS1847) knockout strains (31).

Plasmid Construction. See SI Appendix for descriptions of plasmid constructs.

B. subtilis Transformation. B. subtilis transformation was performed as described in ref. 46, with modifications (SI Appendix, Supplementary Methods).

LacZ Measurements. B. subtilis 168 pHCMC05-lacZYAX was inoculated into RDM glucose and, when the OD600 of the culture reached ∼0.3–0.5, 0.5 mL of culture was added to 0.5 mL Z-buffer + 0.1% SDS with 100 μL toluene. This mixture was vortexed and incubated in a 37 °C water bath for 30 min. The LacZ assay was then performed as previously described (SI Appendix, Fig. S1) (47, 48).

Growth Rate Determination. Detailed methods of growth rate determination can be found in the SI Appendix.

Microscopy. To perform fluorescence microscopy, 50 μL samples of culture were spread onto 1% agarose pads prepared on glass slides, covered with a #1.5 glass coverslip, and imaged; see SI Appendix for details.

Quantitative RT-PCR. Methods for qRT-PCR can be found in SI Appendix.

Ll.LtrB Retrotransposition Frequency Assays. Retrotransposition efficiency of Ll.LtrB with and without NHEJ expression was determined by the protocol of ref. 11, with modifications; see SI Appendix, Supplementary Methods.
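For readers who want a concrete picture of a growth rate determination, the following is a generic sketch and not the protocol used here, which is described only in the SI Appendix. It assumes exponential-phase OD600 readings and estimates the growth rate as the least-squares slope of ln(OD600) against time. The function names and the synthetic readings are hypothetical.

```python
import math


def fit_growth_rate(times_h, od600):
    """Least-squares slope of ln(OD600) versus time, in units of 1/hour."""
    if len(times_h) != len(od600) or len(times_h) < 2:
        raise ValueError("need at least two matched (time, OD600) points")
    logs = [math.log(v) for v in od600]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    if sxx == 0.0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    return sxy / sxx


def doubling_time(rate_per_h):
    """Doubling time in hours for an exponential growth rate."""
    return math.log(2.0) / rate_per_h


if __name__ == "__main__":
    # Synthetic exponential-phase readings, generated rather than measured.
    times = [0.0, 0.5, 1.0, 1.5, 2.0]
    ods = [0.02 * math.exp(0.8 * t) for t in times]
    rate = fit_growth_rate(times, ods)
    print(f"rate = {rate:.3f} per hour, doubling = {doubling_time(rate):.2f} h")
```

A growth defect could then be expressed by comparing the fitted rate of an induced culture with that of an uninduced control, although the exact definition used in this work should be taken from the SI Appendix.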

Acknowledgments We thank Prof. Douglas Mitchell (University of Illinois Urbana–Champaign) for the gift of B. subtilis 168 and plasmids, Wayne L. Nicholson (University of Florida) for the gift of B. subtilis NHEJ knockout strains, and Marlene Belfort (University of Albany, State University of New York) for the gift of Ll.LtrB constructs and sequence information. This work was supported by the NSF Center for the Physics of Living Cells (Grant PHY 1430124), the Alfred P. Sloan Foundation (Award FG-2015-65532), and the Institute for Universal Biology, through partial support by the NASA Astrobiology Institute (NAI) under Cooperative Agreement No. NNA13AA91A issued through the Science Mission Directorate. G.L. is supported by the National Science Foundation Graduate Research Fellowship Program under Grant DGE-1144245. All work was reviewed and approved by the University of Illinois Urbana–Champaign Institutional Review Board and Institutional Biosafety Committee.

Footnotes Author contributions: N.A.S., N.H.K., N.G., and T.E.K. designed research; G.L., N.A.S., E.R., D.K., N.U., K.M.M., C.X., N.G., and T.E.K. performed research; N.A.S., K.M.M., C.X., N.G., and T.E.K. analyzed data; and G.L., N.A.S., D.K., K.M.M., C.X., N.G., and T.E.K. wrote the paper.

Reviewers: E.V.K., National Institutes of Health; M.L., Arizona State University; and S.W.R., San Francisco State University.

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1807709115/-/DCSupplemental.