CFPS from extracts of a genomically recoded organism

To benchmark CFPS activity, we first compared sfGFP yields in extracts from C321.∆A and BL21 Star (DE3), the standard commercial protein expression strain (Fig. 1a). Combined transcription (TX)–TL reactions were carried out in 15 µL volumes for 24 h at 30 °C. Protein yields from BL21 Star (DE3) extracts were > 3-fold higher than those from C321.∆A (Fig. 1b), highlighting the need to improve protein synthesis yields to take advantage of the benefits of RF1 removal for making modified proteins with ncAAs for preparative purposes.

Fig. 1 CFPS from extracts of a genomically recoded organism. a Schematic of the production and utilization of crude extract from genomically recoded organisms with plasmid overexpression of orthogonal translation components for cell-free protein synthesis (CFPS). CFPS reactions are supplemented with the necessary substrates (e.g., amino acids, NTPs, etc.) required for in vitro transcription and translation as well as purified orthogonal translation system (OTS) components to help increase the ncAA incorporation efficiency. aaRS, aminoacyl tRNA synthetase; ncAA, non-canonical amino acid; T7P, T7 RNA polymerase; UAG, amber codon. b Time course of superfolder green fluorescent protein (sfGFP) synthesis catalyzed by extracts derived from a genomically recoded organism, C321.∆A, and a commercial strain, BL21 Star (DE3). Three independent batch CFPS reactions (n = 3) were performed at 30 °C for each time point over 24 h. Error bar = 1 SD.

Previously, genomic modifications to the extract source strain that stabilize the DNA template35, preserve amino acid supply36, and reduce protein degradation37 have improved CFPS yields from other source strains. For example, we engineered a partially recoded strain of E. coli (rEc.E13.∆A) by disrupting genes encoding nucleases (MCJ.559 (endA− csdA−)) to improve protein synthesis yields > 4-fold relative to the parent strain34. Building on this knowledge, we hypothesized that the genomic disruption of negative protein effectors in C321.∆A extracts would help stabilize essential substrates in cell-free reactions, extend reaction durations, and increase CFPS yields.

Strain engineering for improved CFPS performance

We targeted the functional inactivation of five nucleases (rna, rnb, mazF, endA, and rne), two proteases (ompT and lon), and eight targets shown previously to negatively impact amino acid, energy, and redox stability (gdhA, gshA, sdaA, sdaB, speA, tnaA, glpK, and gor) in C321.∆A individually and in combination (Supplementary Table 1). Our effort followed a five-step approach. First, we generated a library of single mutant strains in which we used MAGE to insert an early TL termination sequence into the open reading frames of gene targets to functionally inactivate them, as we have done before34 (Fig. 2a and Supplementary Tables 2 and 3). Second, we confirmed gene disruptions using multiplex allele-specific PCR and DNA sequencing. Third, we measured the growth rate for each of the MAGE-modified strains, noting that the average doubling time was 9 ± 9% higher than that of the parent strain (Supplementary Table 4). Fourth, cell extracts from each strain were generated using a high-throughput and robust extract generation procedure38. Fifth, we tested the strains in CFPS to assess their overall protein synthesis capability. We observed that seven single functional inactivation mutations increased CFPS yields more than 50% relative to the wild-type C321.∆A strain; namely, rne−, mazF−, tnaA−, glpK−, lon−, gor−, and endA− (Fig. 2b). These results suggested that some of the protein effectors targeted for inactivation were deleterious to CFPS activity. They also demonstrated the difficulty associated with predicting CFPS productivities from engineered strains. For example, some mutations identified in previous screens (e.g., rnb− in rEc.E13.∆A)34 were not beneficial in the C321.∆A context, others that reduced cellular fitness enhanced CFPS activity (e.g., lon−), and yet others with no impact on cell growth (e.g., ompT−) led to poor extract performance (Fig. 2b).
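The hit criterion used in the screen (a yield gain of at least 50% over the parent strain, with p < 0.01 by Student's t-test, as stated in the Fig. 2b legend) can be sketched as follows. This is only an illustrative sketch: the yield values and mutant names are hypothetical and are not data from the study, and the hard-coded critical t value assumes two groups of n = 3 (4 degrees of freedom) and a two-tailed test.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical t value for p < 0.01 with 4 degrees of freedom
# (two groups of n = 3), taken from a standard t table. Assumption: the
# comparison is an equal-variance, two-tailed Student's t-test.
T_CRIT_DF4_P01 = 4.604


def student_t(a, b):
    """Pooled-variance (Student's) t statistic for two independent samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def is_hit(mutant, parent, min_gain=0.50, t_crit=T_CRIT_DF4_P01):
    """True if the mutant mean is >= 50% above the parent mean and |t| exceeds t_crit."""
    gain = mean(mutant) / mean(parent) - 1.0
    return gain >= min_gain and abs(student_t(mutant, parent)) > t_crit


# Hypothetical active sfGFP yields in mg/L (NOT data from the study).
parent = [400.0, 390.0, 410.0]
screen = {
    "mutantA": [650.0, 640.0, 660.0],  # large, reproducible gain
    "mutantB": [430.0, 420.0, 440.0],  # reproducible but small gain
    "mutantC": [900.0, 300.0, 700.0],  # large mean gain, but too noisy
}

hits = [name for name, yields in screen.items() if is_hit(yields, parent)]
print(hits)
```

With these made-up numbers only the first mutant passes both filters; the third shows why a mean gain alone is not sufficient when replicates scatter widely.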

Fig. 2 Engineering C321.∆A variants for enhanced CFPS. a Schematic of design-build-test cycles employing multiplex automated genome engineering (MAGE) to disrupt putative negative protein effectors (Supplementary Table 1) in engineered C321.∆A strains for producing extracts with enhanced cell-free protein synthesis (CFPS) yields. b Cell extracts derived from C321.∆A and genomically engineered strains containing a single putative negative effector inactivation were screened for sfGFP yields. Beneficial mutations that increase active yields ≥ 50% relative to C321.∆A are highlighted with an * (p < 0.01, Student’s t-test). c C321.∆A.542 (endA−) was chosen as the next base strain and the following beneficial disruptions were pursued in combination: rne, mazF, tnaA, glpK, lon, and gor. d C321.∆A.709 (endA− gor−) was selected as the subsequent base strain for triple and quadruple mutant construction. C321.∆A.759 (endA− gor− rne− mazF−) yielded the highest level of CFPS production. Total sfGFP concentration was measured by counting radioactive 14C-Leucine incorporation and active protein was measured using fluorescence. Three independent batch CFPS reactions were performed for each sample at 30 °C for 20 h (n = 3). Error bar = 1 SD.

With improvements in hand from single mutant strains, we next set out to identify synergistic benefits to CFPS productivity by combining highly productive mutations. We introduced the rne−, mazF−, tnaA−, glpK−, lon−, and gor− mutations to the best performing strain from our initial screen, strain C321.∆A.542 (endA−) (Fig. 2c). The combination of endA− and gor− mutations resulted in an extract capable of synthesizing 1,620 ± 10 mg/L of active sfGFP (strain C321.∆A.709). We then used C321.∆A.709 to generate six additional strains with combined mutations. Although we did not observe synergistic enhancements, our top performing extract chassis strain (C321.∆A.759 (endA− gor− rne− mazF−)) resulted in total yields of 1,780 ± 30 mg/L (Fig. 2d), representing a 4.5-fold increase in sfGFP yield relative to the progenitor strain (C321.∆A). In addition, we tested 12 combinatorial mutants generated throughout our MAGE screening, and although a few demonstrated CFPS yields > 1 g/L of active sfGFP, none surpassed the CFPS yields observed from C321.∆A.759 (Supplementary Table 5). Lastly, we determined that CFPS improvements seen in C321.∆A.759 brought on by genomic modifications could not be obtained by simply supplementing C321.∆A-based reactions with RNase inhibitors (Supplementary Fig. 1). Final strains were fully sequenced to verify functional targeted modifications in the genome. Whole-genome sequences for strains C321.∆A, C321.∆A.542, C321.∆A.705, C321.∆A.709, C321.∆A.740, and C321.∆A.759 have been deposited in the NCBI SRA collection under accession code PRJNA361365. Each of the targeted mutations was achieved. MAGE has previously been shown to induce mutations throughout the genome, and we observed a number of accumulated polymorphisms in the extract chassis strains. These polymorphisms, along with a specific list of protein-coding genes bearing mutations, are shown in Supplementary Tables 6 and 7. In the future, we seek to better understand the systems-level impact of the non-targeted mutations.
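Because the text reports improvements both as fold changes (e.g., 4.5-fold) and as percent increases (e.g., more than 50%), a small conversion helper may make the two scales easier to compare. The sketch below is illustrative only; the function names are hypothetical, and the back-calculated baseline is simple arithmetic on the reported numbers, not a value taken from the study.

```python
def fold_to_percent_increase(fold):
    """A 1.5-fold yield corresponds to a 50% increase over the reference."""
    return (fold - 1.0) * 100.0


def percent_increase_to_fold(percent):
    """A 50% increase corresponds to 1.5-fold the reference yield."""
    return 1.0 + percent / 100.0


def implied_reference_yield(final_yield, fold):
    """Reference yield implied by a final yield and a reported fold change."""
    return final_yield / fold


# 4.5-fold relative to the progenitor is a 350% increase.
print(fold_to_percent_increase(4.5))
# The >= 50% screening threshold is the same as >= 1.5-fold.
print(percent_increase_to_fold(50.0))
# Back-calculated progenitor yield implied by 1,780 mg/L at 4.5-fold
# (an arithmetic estimate, not a reported measurement).
print(round(implied_reference_yield(1780.0, 4.5)))
```

On this arithmetic, 1,780 mg/L at 4.5-fold implies a progenitor yield of roughly 396 mg/L, which is an estimate rather than a reported figure.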

Based on our previous studies using rEc.E13.∆A[34], we hypothesized that the beneficial mutations in C321.∆A.759 reduced messenger RNA degradation and stabilized the DNA template. To test mRNA stability, we performed TL-only reactions using extracts derived from C321.∆A.759 and C321.∆A. Purified mRNA template coding for sfGFP was used to direct protein synthesis. We observed a twofold increase in mRNA and ~ 90% increase of active sfGFP using C321.∆A.759 extracts relative to C321.∆A extracts after a 120 min cell-free reaction (Supplementary Fig. 2). To test DNA stability, TX-only reactions were used. Specifically, plasmid DNA containing the modified red fluorescent protein–Spinach aptamer gene (Supplementary Table 3) was pre-incubated with cell extract and a fluorophore molecule, 3,5-difluoro-4-hydroxybenzylidene imidazolinone (DFHBI), for 0, 60, and 180 min. Then, CFPS reagents were added and mRNA was synthesized, then quantified by measuring the fluorescence of DFHBI-bound Spinach aptamer mRNA. After 180 min of pre-incubation, nearly 50% of Spinach aptamer mRNA was synthesized in C321.∆A.759 (endA−) extracts relative to the 0 min control. In contrast, the extract with endonuclease I (C321.∆A) decreased the maximum mRNA synthesis level by ~ 75% (Supplementary Fig. 3). Together, our data support the hypothesis that inactivating nucleases in the extract chassis strain stabilized DNA and mRNA to improve CFPS yields.

In addition to confirming added DNA and mRNA stability, we also assessed potential changes in energy and amino acid substrate stability that may have occurred in C321.∆A.759– relative to C321.∆A–based CFPS. Similar trends in ATP levels (Supplementary Fig. 4), adenylate charge (Supplementary Fig. 5), and amino acid concentrations (Supplementary Fig. 6) were observed in CFPS reactions derived from both strains. Supplemental feeding with the amino acids found to be most rapidly depleted did not improve yields (Supplementary Fig. 6f). The similar amino acid and energy stability profiles in C321.∆A.759 compared with C321.∆A suggest that our strain engineering efforts did not modulate the availability of these substrates.
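For readers unfamiliar with the metric, the adenylate charge compared above is assumed here to be the standard adenylate energy charge, (ATP + 0.5 ADP) / (ATP + ADP + AMP). The sketch below shows the calculation; the function name and example concentrations are hypothetical and are not measurements from the study.

```python
def adenylate_energy_charge(atp, adp, amp):
    """Adenylate energy charge: (ATP + 0.5*ADP) / (ATP + ADP + AMP).

    Concentrations may be in any unit as long as all three share it.
    The result ranges from 0 (all AMP) to 1 (all ATP).
    """
    total = atp + adp + amp
    if total <= 0:
        raise ValueError("total adenylate concentration must be positive")
    return (atp + 0.5 * adp) / total


# Hypothetical concentrations in mM (NOT data from the study).
print(adenylate_energy_charge(atp=1.0, adp=0.3, amp=0.1))
```

Comparing this ratio over time between two extracts, rather than ATP alone, accounts for ATP that has been converted to ADP or AMP but remains in the adenylate pool.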

To generalize CFPS improvements in C321.∆A.759, we next expressed four model proteins that have been previously synthesized in CFPS systems and compared productivities to BL21 Star (DE3). We observed a 31–63% increase in soluble and total protein synthesis of sfGFP, chloramphenicol acetyltransferase (CAT), dihydrofolate reductase (DHFR), and modified murine granulocyte-macrophage colony-stimulating factor (mGM-CSF) in our engineered C321.∆A.759 extracts as compared to BL21 Star (DE3) extracts (Supplementary Fig. 7a). Autoradiograms of proteins produced using C321.∆A.759 extract show production of full-length sfGFP, CAT, DHFR, and mGM-CSF (Supplementary Fig. 7b and 7c). In addition, we observed disulfide bond formation in the model mGM-CSF under an oxidizing CFPS environment (– DTT), as has been previously shown (Supplementary Fig. 7c)39,40. In sum, the development of enhanced extract source strains by MAGE enabled a general and high-yielding CFPS platform.

Multi-site ncAA incorporation into proteins in CFPS

We next aimed to test site-specific ncAA incorporation into proteins using our high-yielding CFPS platform from C321.∆A.759-derived extracts and compare these results to reactions using extracts from BL21 Star (DE3) (containing RF1) and a partially recoded RF1-deficient engineered strain MCJ.559 based on rEc.E13.∆A. To do so, we transformed each organism with the pEVOL-pAcF plasmid, which expresses both orthogonal pAcF synthetase (pAcFRS) and tRNA (o-tRNAopt)41. Then, we quantitatively assessed the incorporation of pAcF into sfGFP variants with up to five in-frame amber codons. CFPS reactions were supplemented with additional OTS components based on our previous work27. Specifically, we added 10 µg/mL of linear DNA encoding optimized orthogonal tRNA in the form of a transzyme (o-tRNAopt) for in situ synthesis of the tRNA. The orthogonal pAcFRS was overproduced, purified as previously described, and added at a level of 0.5 mg/mL. The ncAA, in this case pAcF, was supplied at a level of 2 mM in each CFPS reaction. Total protein yields were quantified by 14C-leucine radioactive incorporation. Production of wild-type and modified sfGFP containing one UAG codon (sfGFP-UAG) increased by 77% and 92% in C321.∆A.759 extracts as compared with BL21 Star (DE3), and by 120% and 145% as compared with MCJ.559, respectively (Fig. 3a and Supplementary Fig. 8a). Moreover, we observed that sfGFP-UAG was expressed at 90% of the level of wild-type sfGFP. Owing to the absence of RF1 competition, the major protein produced was full-length sfGFP using extracts derived from C321.∆A.759 and MCJ.559, whereas truncated sfGFP was visible in reactions catalyzed by BL21 Star (DE3) extract, presumably due to RF1 competition (Fig. 3a and Supplementary Fig. 8a)30,42. Similar results were obtained with a second model protein, CAT with an in-frame amber codon at position 112 (CAT-UAG) (Fig. 3a and Supplementary Fig. 8b). When expressing CAT-UAG using MCJ.559 extract, similar levels of truncated CAT relative to BL21 Star (DE3) were observed; however, this is most likely due to an upregulation of rescue mechanisms for ribosome stalling in the partially recoded strain34. Single pAcF incorporation into CAT-UAG using C321.∆A.759 lysate demonstrated only full-length product. Therefore, our completely recoded, genomically engineered C321.∆A.759 strain provides benefits for efficient ncAA incorporation without detectable levels of truncation product.

Fig. 3 Multi-site incorporation of pAcF into proteins. Cell-free p-acetyl-l-phenylalanine (pAcF) incorporation was compared using extracts derived from BL21 Star (DE3), MCJ.559, and C321.∆A.759 strains containing the pEVOL-pAcF vector. The pEVOL-pAcF vector harbors the orthogonal translation machinery necessary for pAcF incorporation. a Total protein yields for wild-type (WT) and 1 UAG versions of superfolder green fluorescent protein (sfGFP) and chloramphenicol acetyl transferase (CAT) are shown along with an autoradiogram of the resulting protein product. Supplementary Fig. 8 shows the entirety of the autoradiogram along with a molecular weight marker. b Multi-site incorporation of pAcF into sfGFP variants as quantified by active protein produced. The sfGFP variants used were wild-type (WT), sfGFP containing a single pAcF corresponding to the position of T216 (1 UAG), sfGFP containing two pAcFs (2 UAG), and sfGFP containing five pAcFs (5 UAG). Three independent batch CFPS reactions were performed for each sample at 30 °C for 20 h (n = 3). Error bar = 1 SD. c Spectrum of the 28+ charge state of sfGFP, obtained by top-down mass spectrometry and illustrating site-specific incorporation of pAcF at single and multiple sites. Experimental (Exper) and theoretical (Theor) mass peaks for each sfGFP variant are shown. Major peaks (color) in each spectrum coincide with the theoretical peaks for each species (see also Supplementary Fig. 11). Smaller peaks immediately to the right of the major peaks are due to oxidation of the protein, a common electrochemical reaction occurring during electrospray ionization. Experimentally determined masses are within 1 p.p.m. of the theoretical masses. Owing to the size of pAcF, misincorporation would result in peaks present at lower m/z values relative to the colored theoretical peak.

We then evaluated the ability of our high-yielding CFPS platform to facilitate incorporation of up to five identical ncAAs into sfGFP. For ease of analysis, a fluorescence assay was used, which indicated increased production of sfGFP in extracts from C321.∆A.759 (Fig. 3b). Results for BL21 Star (DE3) extract displayed an exponential decrease in active sfGFP synthesized with an increasing number of UAG codons, leading to the production of no detectable active protein for sfGFP-5UAG. Active protein yields from C321.∆A.759 extract were ~ 2-fold greater than those from MCJ.559 extract, suggesting that benefits observed in increased yield can be extended to multi-site ncAA incorporation for our enhanced, fully recoded strain. Furthermore, we examined the ability to incorporate consecutive pAcFs into a single protein. Protein gel and autoradiogram analysis of sfGFP with eight and nine consecutive amber codons indicated that this is possible, with the percent of full-length product being ~ 75% and 60%, respectively (Supplementary Fig. 9).
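One simple way to rationalize the exponential decrease seen with BL21 Star (DE3) extract is a toy model in which each UAG codon is read through independently with a fixed probability, so that the full-length fraction falls as that probability raised to the number of UAG codons. The sketch below illustrates this; the model and the per-site probabilities are assumptions for illustration, not values fitted to the data in this study.

```python
def full_length_fraction(per_site_readthrough, n_uag):
    """Fraction of full-length product if each UAG is read through
    independently with probability per_site_readthrough (toy model)."""
    if not 0.0 <= per_site_readthrough <= 1.0:
        raise ValueError("per_site_readthrough must be between 0 and 1")
    return per_site_readthrough ** n_uag


# Hypothetical per-site probabilities: lower values mimic strong
# competition from RF1, higher values mimic an RF1-deficient extract.
for p in (0.5, 0.9, 0.99):
    fractions = [round(full_length_fraction(p, n), 3) for n in (0, 1, 2, 5)]
    print(p, fractions)
```

Under this model, even a modest per-site loss compounds quickly across five sites, whereas a per-site probability close to 1 keeps multi-site yields near the wild-type level.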

In addition, batch reactions catalyzed by C321.∆A.759 extracts could also be scaled 17-fold without loss of productivity provided that a proper surface area to volume ratio is maintained (Supplementary Fig. 10)43. Of note, we believe our reactions could be further scaled to a wide range of volumes to produce larger amounts of protein, provided that surface area to volume effects are accounted for. For example, Sutro Biopharma has applied E. coli-based CFPS platforms to clinical manufacturing of therapeutics at the 100 L scale44, with an expansion factor of 10^6. In terms of cost, although we use a phosphoenolpyruvate (PEP)-based CFPS system here, cellular metabolism could be used to fuel cost-effective, high-level protein synthesis suitable for manufacturing applications45,46.
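The surface area to volume constraint can be illustrated with a simple geometric sketch. It assumes, purely for illustration, that the reaction sits in a flat-bottomed cylindrical vessel and that only the top liquid surface is exposed to air; the vessel radius is hypothetical and the actual reaction vessels are not specified in this section.

```python
import math


def surface_to_volume_per_mm(volume_ul, radius_mm):
    """Exposed (top) surface area divided by liquid volume, in 1/mm.

    Assumes a flat-bottomed cylinder; 1 uL equals 1 mm^3.
    """
    area_mm2 = math.pi * radius_mm ** 2
    return area_mm2 / volume_ul


def radius_for_same_ratio(radius_mm, scale_factor):
    """Vessel radius that keeps the surface-to-volume ratio unchanged
    when the volume is multiplied by scale_factor (liquid height constant)."""
    return radius_mm * math.sqrt(scale_factor)


small_volume_ul = 15.0  # reaction volume used in the batch reactions
scale = 17.0            # scale-up factor mentioned in the text
radius_mm = 2.0         # hypothetical vessel radius

same_vessel = surface_to_volume_per_mm(small_volume_ul * scale, radius_mm)
wider_vessel = surface_to_volume_per_mm(
    small_volume_ul * scale, radius_for_same_ratio(radius_mm, scale)
)
print(surface_to_volume_per_mm(small_volume_ul, radius_mm), same_vessel, wider_vessel)
```

In this geometry the ratio equals the inverse of the liquid height, so scaling the volume in the same vessel lowers the ratio by the scale factor, while widening the vessel by the square root of the scale factor preserves it.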

After demonstrating benefits for protein expression, we carried out top-down mass spectrometry (i.e., MS analysis of whole intact proteins) to detect and provide semi-quantitative data for the incorporation efficiency of pAcF into sfGFP using extract derived from C321.∆A.759. Figure 3c shows the 28+ charge state of sfGFP and clearly illustrates mass shifts corresponding to the incorporation of one, two, and five pAcF residues. Site-specific incorporation of pAcF, as detected by MS, was ≥ 98% in all samples, with ≤ 1 p.p.m. difference between experimental and theoretical protein masses (Supplementary Fig. 11). In other words, efficient and high-yielding site-specific pAcF incorporation into sfGFP was observed when using C321.∆A.759 extract. We went on to further show that extracts generated from C321.∆A.759 are compatible with multiple OTSs, showing the incorporation of p-propargyloxy-l-phenylalanine and p-azido-l-phenylalanine (pAzF) (Supplementary Fig. 12).
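The two quantities used in this analysis, the m/z of a given charge state and the p.p.m. mass error, follow from standard definitions and can be sketched as below. The neutral mass in the example is a hypothetical round number, not the measured mass of any sfGFP variant, and the function names are illustrative.

```python
PROTON_MASS_DA = 1.007276466  # mass of a proton in daltons


def mz(neutral_mass_da, charge):
    """m/z of a positive ion formed by adding `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


def ppm_error(experimental, theoretical):
    """Signed mass error in parts per million."""
    return (experimental - theoretical) / theoretical * 1e6


# Hypothetical neutral protein mass (NOT a measured sfGFP mass).
neutral_mass = 28000.0
print(mz(neutral_mass, 28))

# A mass difference of delta Da between two species appears as
# delta / 28 on the m/z axis at the 28+ charge state.
delta = 42.0
print(mz(neutral_mass + delta, 28) - mz(neutral_mass, 28))

# An observed value 0.001 above a theoretical value of 1000 is ~1 p.p.m.
print(ppm_error(1000.001, 1000.0))
```

Because the shift per residue is divided by the charge, species differing by a single substitution remain resolvable at 28+ only if the instrument's mass accuracy is high, which is consistent with reporting errors in the p.p.m. range.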

Multi-site ncAA incorporation into ELPs

We next explored the synthesis of sequence-defined biopolymers containing tens of site-specifically introduced ncAAs using our efficient and tunable CFPS system. As a model biopolymer, we selected ELPs. ELPs are biocompatible and stimuli-responsive biopolymers that can be applied for drug delivery and tissue engineering47,48. Typically, ELPs consist of repeats of the pentapeptide sequence VPGVG, which is known to be a key component in elastin and exhibits interesting self-assembly behavior (random coil to helix) above its transition temperature. The structure and function of elastin are maintained as long as the glycine and proline residues are present; however, the second valine residue is permissive for any amino acid except proline and is therefore also permissive to ncAAs20. Previously, ncAAs have been introduced into ELPs by substituting natural amino acids with structurally similar ncAAs in CFPS systems49. Conticello and colleagues50 have also previously produced imperfect ELPs containing up to 22 ncAAs in vivo using an E. coli strain with an attenuated activity of RF1. We previously incorporated up to 30 ncAAs into ELPs by evolving orthogonal synthetases in vivo with enhanced specificities20. In this study, we constructed and tested in CFPS three ELP constructs containing 20, 30, and 40 UAG codons, as well as control proteins with tyrosine codons substituted for UAGs.
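The design idea described above, placing the ncAA at the permissive second valine of selected VPGVG repeats, can be sketched as a small sequence builder. The spacing of guest positions and the total repeat counts below are hypothetical and are not the actual construct designs used in this study; "X" simply marks a position encoded by a UAG codon.

```python
PENTAPEPTIDE = "VPGVG"
GUEST_INDEX = 3  # zero-based index of the second valine in VPGVG


def build_elp(n_repeats, guest_every=1, guest="X"):
    """Concatenate VPGVG repeats, replacing the second valine with `guest`
    in every `guest_every`-th repeat (starting with the first)."""
    if guest == "P":
        raise ValueError("proline is not tolerated at the guest position")
    units = []
    for i in range(n_repeats):
        if i % guest_every == 0:
            unit = PENTAPEPTIDE[:GUEST_INDEX] + guest + PENTAPEPTIDE[GUEST_INDEX + 1:]
        else:
            unit = PENTAPEPTIDE
        units.append(unit)
    return "".join(units)


# Hypothetical layouts giving 20, 30, and 40 guest (UAG-encoded) positions.
for n_sites in (20, 30, 40):
    seq = build_elp(n_repeats=3 * n_sites, guest_every=3)
    print(n_sites, len(seq), seq.count("X"), seq[:15])

# Control analogous to the tyrosine-substituted constructs in the text.
control = build_elp(n_repeats=60, guest_every=3, guest="Y")
print(control.count("Y"))
```

Swapping the guest symbol for "Y" gives a sequence analogous to the tyrosine-substituted controls mentioned in the text.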

Before characterizing ELP yields, we first carried out a series of optimization experiments to enhance CFPS yields of sfGFP with 5 UAG codons, as expression yields for this construct were reduced in our initial studies (Fig. 3b). By testing total and soluble protein yields, we determined that the reduction in yield was a result of loss in sfGFP solubility and activity (Supplementary Fig. 13). However, a 31% increase in sfGFP-5UAG production was observed upon increasing pAcFRS levels 2-fold, pAcF levels 2.5-fold, and o-tz-tRNAopt 3-fold (Supplementary Fig. 14). Upon application of these optimized conditions, called OTSopt, to the synthesis of ELP-UAGs containing 20, 30, and 40-mers, total yields increased by 40%, 33%, and 26%, respectively, as compared with supplementing with OTS levels optimized for 1 ncAA incorporation (Supplementary Fig. 15). ELP-UAG products were visualized using an autoradiogram, which demonstrated the high percentage of full-length protein and whose band intensities corroborate total yields measured (Supplementary Fig. 15b).

We next applied OTSopt to the synthesis of ELP-UAGs with 20, 30, and 40-mers in the presence and absence of pAcF to demonstrate specificity of incorporation. ELP-UAGs were only synthesized in the presence of pAcF without any clear indication of truncation products, whereas no protein was observed in the absence of pAcF (Fig. 4). We anticipated that yields would decrease as the number of UAG codons increased due to the higher demand for pAcF-charged o-tRNA. In contrast, near wild-type yields of ~ 100 mg/L were obtained for all UAG constructs. We then carefully examined the efficiency of multi-site ncAA incorporation using top-down liquid chromatography (LC)-MS of intact ELPs. LC-MS analysis showed ≥ 98% site-specific pAcF incorporation in ELP-UAG constructs of 20, 30, and 40-mers (Fig. 4).