From gene cloning to protein purification, the cellular and molecular tools needed at every step of the process are widely accessible, and many alternatives are available. Still, failure to obtain a functional recombinant protein is not uncommon, often because the protein is toxic to the host or aggregates in inclusion bodies. That is why there is continuous interest in novel approaches that optimize recombinant protein production in E. coli. Numerous reviews have covered different aspects of the topic in detail. 2-5 In this review, we cover advances reported in the last 5 years in the areas of host engineering, expression vector design, and culture conditions. The newly developed tools show much promise, and we expect them to spread rapidly through the scientific community. Lastly, for those about to enter the fascinating world of heterologous protein expression, we advise reading not only this review but also our previous one, as the two are complementary. 4

The study of proteins or their use in biotechnological applications often requires their isolation from other cellular components. Purification can be performed from the natural source of the protein; however, this approach is cumbersome and inefficient for most proteins. Alternatively, the coding sequence for the protein of interest can be inserted into an appropriate expression vector and transformed into a prokaryotic host, such as the bacterium Escherichia coli. Using E. coli as a microbial cell factory for producing recombinant proteins lowers the costs of production and improves the yield. Nowadays, many proteins of commercial interest are produced in E. coli. 1 In the lab, the recombinant production of proteins in E. coli is the method of choice for their structural and functional study.

The features above tipped the scale in favor of the B line for the production of recombinant proteins. A major breakthrough came with the generation of the derivative BL21(DE3) by Studier and Moffatt. 16 The BL21(DE3) strain carries a copy of the phage T7 RNA polymerase (T7RNAP) gene under control of the lacUV5 promoter. Genes of interest are cloned under control of a T7 promoter in expression plasmids, and protein production begins upon addition of the gratuitous inducer isopropyl β‐D‐1‐thiogalactopyranoside (IPTG). This system provides the user with full control of the induction of protein synthesis with high selectivity and activity. All of these attributes established BL21(DE3) as the preferred host. Its genome has been sequenced and recently curated (GenBank entry CP001509.3). 8 Later derivatives improved other aspects inherent to the production of foreign proteins in E. coli, such as rare codon usage and disulfide bond formation. However, the system based on T7RNAP and its promoter has remained virtually the same since its inception.

Accumulation of any given protein is a fine balance between its biosynthesis and its degradation. Deficiency in key proteolytic systems in E. coli B can extend the lifetime of recombinant proteins in certain cases. An IS186 insertion in the promoter of the lon gene eliminates this major protease. 13 Also, a deletion in the gene for the outer membrane protease OmpT can lead to less proteolysis during purification, as cell disruption causes a massive release of this protease. 14

Compared to K‐12, E. coli B produces less acetate during cultivation with glucose as the sole carbon source. 10 The pH of cultivation media is typically around 7.0, so acetate is in equilibrium with acetic acid, which in turn diffuses into the cell, disturbing internal pH control and impairing cell viability. 11 Also, acid production during cultivation further lowers the pH, causing more acetic acid to accumulate. In line with this, Wang et al. showed that cultivation at pH 7.5–8.5 improves the production of recombinant proteins by lowering acetate stress. 12
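The pH dependence of the acetate/acetic acid equilibrium described above can be illustrated with the Henderson-Hasselbalch relation. The following is a minimal sketch, not taken from the cited studies; it assumes a pKa of about 4.76 for acetic acid (the standard textbook value at 25 °C), and the function name is hypothetical.

```python
# Fraction of total acetate present as undissociated (membrane-permeant)
# acetic acid at a given pH, from the Henderson-Hasselbalch relation.
# Assumption: pKa of acetic acid is about 4.76 (textbook value, not from the source).
ACETIC_ACID_PKA = 4.76


def protonated_fraction(ph: float, pka: float = ACETIC_ACID_PKA) -> float:
    """Return [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # Compare a typical medium pH with the more alkaline range mentioned in the text.
    for ph in (6.5, 7.0, 7.5, 8.5):
        print(f"pH {ph:.1f}: {protonated_fraction(ph):.4%} of acetate is acetic acid")
```

Under this assumption, each unit increase in pH lowers the undissociated fraction roughly tenfold once the pH is well above the pKa, which is consistent with the idea that mildly alkaline cultivation reduces acetate stress.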

In rich media, common E. coli strains have a doubling time of about 20 minutes. Differences in growth rate are more noticeable in minimal media: B cells typically grow faster than other lines (such as the cloning strain K‐12 and its derivatives) under these conditions. 9 Many researchers rediscover that B cells are nonmotile: when culture vessels are left unattended on the bench, B cells sink to the bottom, while cultures of cloning strains remain turbid. Due to a large deletion in the fli genes required for the biosynthesis of flagellar proteins, B cells do not have flagella. 6 Some authors have proposed that this in part explains the fast growth of B cells, as flagellar biosynthesis and assembly is an energy‐intensive process. 9

Amid the biotechnological revolution that occurred in the last decades of the 20th century, different E. coli lines were tested for their characteristics in the production of recombinant proteins. The B line emerged as the winner given its salient features. One derivative, BL21(DE3), has become the preferred host for recombinant protein production. Genome sequencing of strains of the B line has helped understand the molecular basis of useful phenotypes for heterologous protein synthesis. 6-8

The process is greatly facilitated if additional strong selection or screening methods are used, particularly in cases where expression of the protein does not impair growth. For example, the recombinant protein can be fused to an antibiotic selection marker or a fluorescent protein. 27-29 The underlying assumption is that if the marker protein is functional, then the heterologous protein must be correctly folded as well. Although some examples can be found (reviewed in Schlegel et al. 30), the mutations in the isolated over‐producing strains are typically not characterized. Also, successful outcomes can be protein‐dependent; that is, when the production of other proteins (even homologues) is tested, the isolated strain may fail to produce significant amounts. 31 This limits the applicability of the isolated strains and explains why, apart from C41(DE3) and C43(DE3), mutagenesis approaches have not yielded major advances in the isolation of B‐line strains with superior expression capabilities. Notably, the de Gier lab reported the isolation and characterization of the E. coli membrane protein production strain Mutant56(DE3), which gave better yields of membrane proteins than C41(DE3) and C43(DE3). 32 In this strain, the mutation responsible for the improved protein production changes a single amino acid in T7RNAP, weakening its binding to the T7 promoter.

In a discovery‐driven approach, mutagenesis with chemical agents or transposons, directed evolution methods, or spontaneous mutations under strong selection pressure are used to alter the bacterial genome. Then, selections and screening assays are performed to pick strains displaying better protein production than the wild type. In this strategy, previous knowledge of the mechanisms that hamper protein production is not necessary. Discovery‐driven approaches work well for optimizing the synthesis of toxic proteins. Because the expression of toxic recombinant proteins causes cell death or poor growth, the growth of IPTG‐resistant colonies may indicate that a genome modification in that strain allows expression of the gene for the offending protein. The popular C41(DE3) and C43(DE3) strains were isolated in this way. 25 Protein production was tolerated because homologous recombination between the lacUV5 promoter of the T7RNAP gene in the original BL21(DE3) strain and the lac promoter of the lac operon resulted in a weaker promoter (placWeak). Thus, T7RNAP levels are lower than in BL21(DE3), and the recombinant protein accumulates to sublethal amounts. 26

Coexpression of proteins requires cloning the corresponding genes in compatible plasmids. Alternatively, the Duet series of plasmids from Novagen allows for cloning two coding sequences under separate T7 promoters in the same plasmid. Up to eight proteins can be coexpressed using compatible Duet plasmids. However, stoichiometric control of protein levels is almost impossible: in the Duet system, all coding sequences are driven by the same type of promoter (T7), which can pose a problem if the levels of the components of a heterologous metabolic pathway need to be fine‐tuned for optimum yield. In an impressive work combining genetic engineering, rational design, and directed evolution, Meyer et al. created the Marionette strains. 24 In these strains (available in MG1655, DH10B, and BL21 backgrounds), the genes for 12 different genetic regulators were inserted in the genome. The chosen transcription factors control gene expression by binding to (positive regulators) or detaching from (negative regulators) their cognate promoters in response to chemical inducers added to the medium. Theoretically, up to 12 genes of interest can be cloned in plasmids (the authors tested a five‐enzyme lycopene biosynthetic pathway), with each coding sequence under the control of a different inducible promoter of the sensor array. In this way, the level of each recombinant protein can be manipulated at will by adding the proper inducer, with little cross‐reactivity, high specificity, low leakiness, and ample dynamic range.

The problem of codon bias in recombinant protein production can be addressed by supplementing extra copies of genes coding for rare tRNAs. These extra copies were included in the pRIL and pRARE plasmids, leading to the strains BL21(DE3) CodonPlus and Rosetta, respectively. Interestingly, Lipinszki et al. integrated the six least abundant tRNA species into a ribosomal operon in the chromosome of E. coli BL21(DE3). 23 In this way, expression of rare tRNAs is coupled to the actual needs for translational capacity. Additionally, the burden of a second plasmid and supplementation of its corresponding selection antibiotic are avoided. The resulting strain, named SixPack, outperformed (or performed as well as) both BL21(DE3) and Rosetta2(DE3) in the production of recombinant proteins.
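Before choosing a rare‐tRNA‐supplemented host, it can help to check how many rare codons a coding sequence actually contains. The sketch below counts in‐frame occurrences of a rare‐codon set. The set used here (AGA, AGG, ATA, CTA, CCC, GGA) is an assumption based on the codons commonly associated with rare‐tRNA supplementation in E. coli; it is not listed in the text above, and the function names are hypothetical.

```python
from collections import Counter

# Assumption: codons commonly cited as rare in E. coli; illustrative only,
# not taken from the source. Adjust the set for the host of interest.
RARE_CODONS = frozenset({"AGA", "AGG", "ATA", "CTA", "CCC", "GGA"})


def count_rare_codons(cds: str, rare=RARE_CODONS) -> Counter:
    """Count in-frame occurrences of rare codons in a coding sequence."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return Counter(codon for codon in codons if codon in rare)


def rare_codon_fraction(cds: str, rare=RARE_CODONS) -> float:
    """Return the fraction of codons in the sequence that belong to the rare set."""
    n_codons = len(cds) // 3
    if n_codons == 0:
        return 0.0
    return sum(count_rare_codons(cds, rare).values()) / n_codons


if __name__ == "__main__":
    example = "ATGAGAAGGCCCTAA"  # hypothetical toy sequence: ATG AGA AGG CCC TAA
    print(count_rare_codons(example))
    print(rare_codon_fraction(example))
```

Only codons in the reading frame are counted, so a rare triplet that spans two adjacent codons is ignored.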

Another recent development pertains to tighter control of basal expression and to tunability of protein production through dual transcriptional‐translational control aided by riboswitches. For extremely toxic proteins, leaky expression can lead to cell death. The BL21(DE3) pLysS strain contains a plasmid (pLysS) carrying the gene for T7 lysozyme, a natural inhibitor of T7RNAP. 19 This provides an efficient mechanism to inhibit the small amount of T7RNAP synthesized in the absence of inducer as a result of stochastic transcription from the lacUV5 promoter. However, even with this tighter control, leaky expression is still known to occur. Moreover, IPTG is a potent inducer of gene expression, and fine‐tuning its concentration to reduce the expression of a toxic gene product is a laborious task. In such cases, tunable systems allowing for easy and predictable optimization of inducer concentration are highly desirable. The Dixon lab devised the RiboTite system, consisting of the BL21(IL3) strain (or, more recently, the BL21[LV2] strain 20) and the pETORS plasmid. 21 BL21(IL3) carries the T7RNAP gene in a configuration very similar to that in BL21(DE3). However, an orthogonal riboswitch sequence is present in the 5′ untranslated region of the T7RNAP gene and also upstream of the gene of interest cloned in the pETORS expression plasmid. A riboswitch is a segment of a messenger RNA that folds into an intricate structure; here, that structure blocks gene expression by interfering with translation. Binding of an effector molecule induces a conformational change, permitting post‐transcriptional regulation of expression. In RiboTite, the riboswitch is a modified version of the adenine‐sensing add A‐riboswitch from Vibrio vulnificus, engineered to bind the effector pyrimido‐pyrimidine‐2,4‐diamine (PPDA). 22 Thus, expression of the foreign coding sequence can occur only in the presence of both IPTG and PPDA, which effectively reduces leaky expression to almost undetectable levels. Moreover, the amount of recombinant protein can be modulated by tuning the PPDA concentration.
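The logic of requiring two inducers can be illustrated with a toy model in which a transcriptional layer (responding to IPTG) and a translational layer (responding to PPDA) are each described by a leaky Hill function and then multiplied. This is only a qualitative sketch of AND‐like gating; it is not the model used by the RiboTite authors, and every parameter value and name below is a hypothetical placeholder.

```python
# Toy AND-like gate: output = transcriptional layer x translational layer.
# All constants are hypothetical placeholders, not measured values.
K_IPTG_UM = 50.0   # assumed half-maximal IPTG concentration (micromolar)
K_PPDA_UM = 100.0  # assumed half-maximal PPDA concentration (micromolar)
TX_LEAK = 0.05     # assumed basal transcription without IPTG (fraction of max)
TL_LEAK = 0.02     # assumed basal translation without PPDA (fraction of max)


def hill(conc: float, k_half: float, n: float = 2.0, leak: float = 0.0) -> float:
    """Leaky Hill activation: returns `leak` at zero inducer and approaches 1."""
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    activation = conc ** n / (k_half ** n + conc ** n)
    return leak + (1.0 - leak) * activation


def relative_expression(iptg_um: float, ppda_um: float) -> float:
    """Relative output when both layers must be active for expression."""
    transcription = hill(iptg_um, K_IPTG_UM, leak=TX_LEAK)
    translation = hill(ppda_um, K_PPDA_UM, leak=TL_LEAK)
    return transcription * translation


if __name__ == "__main__":
    for iptg, ppda in ((0, 0), (1000, 0), (0, 1000), (1000, 1000)):
        print(f"IPTG={iptg} uM, PPDA={ppda} uM -> {relative_expression(iptg, ppda):.4f}")
```

In this sketch the uninduced output is the product of the two leak terms, which is why stacking two imperfect controls can give a much lower basal level than either one alone. Varying only the PPDA concentration scales the output smoothly, mirroring the tunability described above.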

Popular strains such as BL21(DE3) pLysS (for control of basal expression), CodonPlus or Rosetta (for codon bias correction), Origami, SHuffle or CyDisCo (for correct disulfide bond formation), Tuner or Lemo21 (for tunable induction), and many others were rationally constructed using this approach. 4 In the last 5 years, several strains were described in which known molecular systems were exploited to enhance protein production. For example, protein secretion via the Tat pathway allows for the export of fully folded proteins (fused to a TorA signal peptide) of up to 150 kDa in molecular mass. This system can be an excellent alternative to the Sec pathway; however, the low abundance of the Tat machinery may result in poor yields. 17 Browning et al. engineered BL21 by placing the strong inducible promoter ptac upstream of the chromosomal tatABCD operon. The resulting strain, dubbed TatExpress BL21, was shown to secrete 30 mg L⁻¹ of recombinant human growth hormone into the periplasm. 18

As explained above for BL21(DE3) and derivatives, strain engineering has advanced the capabilities of E. coli as a cell factory (Table 1). Genome manipulation for maximizing heterologous protein production can be undertaken by two different approaches: hypothesis‐driven and discovery‐driven. A hypothesis‐driven strategy aims to manipulate molecular components of a known pathway or process and thereby directly tackle the problem that hampers protein production. In a discovery‐driven strategy, cells are mutagenized and then screened or selected for increased protein production.

Advances in Plasmid Design

The sequence encoding the protein of interest is generally cloned in an expression plasmid. The plasmid must contain at least a promoter and a translation initiation region to direct the expression of the coding sequence, a selectable marker, and replication elements. Additionally, the vector may contain other genetic elements to facilitate the detection, purification, or solubilization of the protein, such as sequences encoding affinity tags and fusion partners. In the last few years, there have been many advances in all of these features, which are described below.

Promoters and translation initiation regions

Promoters influence protein yield by modulating two key aspects of the expression of heterologous genes: stringency of repression before induction (low stringency leading to high levels of leaky expression) and rate of transcription after inducer addition. An “ideal” promoter should not allow basal expression and should permit the synthesis of high amounts of messenger RNA (“strong” promoter) after induction. The ability to manipulate RNA levels by adjusting inducer concentration (tunability) is another desirable trait. As already explained, the T7 promoter is the most widely used for recombinant protein production. It is present in the pET series of vectors and is probably one of the strongest promoters known. However, when using the BL21(DE3)/pET vector system, the levels of recombinant proteins are difficult to manipulate, as IPTG is a potent inducer even at very low concentrations. For this reason, alternatives such as the tunable araBAD promoter may be more suitable. 33 The list of promoters used for recombinant protein production is long and includes more than 10 different options, from which the user can select strong or weak, tunable or constitutive, and chemically or thermally inducible promoters. 4, 34 In the last few years, tools have been developed for identifying the best promoter for a given coding sequence. Yang et al. designed a vector suite for the screening of 10 IPTG‐inducible promoters (T7, A3, lpp, tac, pac, Sp6, lac, npr, trc, and syn). These promoters are contained in plasmids with a PLICing position, so that a target previously amplified with phosphorothioated primers can be cloned into the 10 vectors in a single step without using restriction enzymes. 35 Similarly, Cheng et al. devised a method for rapid promoter replacement, called ReToAd (“retreat to advance”). 36 Seven promoters were cloned in a specific region of the vector containing the gene of interest by whole‐plasmid amplification and touchdown polymerase chain reaction (PCR) in a single reaction. Then, colonies were screened for optimized protein production. PLICable promoters and ReToAd are interesting methods to select the best promoter for any given protein. Undoubtedly, their dissemination and the inclusion of other promoters are warranted. Of note, Anilionyte et al. discovered that variants of the promoter PthrC are self‐inducible by growth phase transition (specifically, when the culture reaches an optical density of around 0.5, a common value used for inducer addition). 37 The use of PthrC‐derived promoters in expression vectors eliminates the need to monitor cell density and induce manually, which would be useful in high‐throughput trials of protein production.

Translation initiation regions contain a Shine‐Dalgarno sequence and a linker region to the start codon. These sequences are optimized for protein production in expression vectors. However, cloning procedures may generate new, suboptimal sequences between the promoter and the start of the gene of interest (“cloning scars”). Mirzadeh et al. proposed a PCR‐based method to generate a library of the vector–coding sequence junction. 38 Briefly, the cloning scar sequence was changed in all possible combinations by PCR with degenerate primers. The library was transformed into E. coli, and protein production was screened via cell sorting, as the coding sequence of interest was fused to green fluorescent protein (or, more recently, by translationally coupling the coding sequence to an antibiotic resistance gene 39). The authors reported a 1000‐fold difference between low and high expression vectors, which highlights the importance of optimizing the translation initiation region. Nevertheless, new restriction‐free cloning methods are becoming increasingly popular and permit cloning of the gene without generating cloning scars. 40
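The size of a library built with degenerate primers grows quickly with the number of randomized positions, which determines how many clones must be screened. The sketch below expands an IUPAC‐coded template into every concrete sequence and reports the library size. The template used in the example (six fully degenerate positions followed by ATG) is hypothetical, since the text above does not state how many positions were randomized, and the function names are illustrative.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def library_size(template: str) -> int:
    """Number of distinct sequences encoded by a degenerate template."""
    size = 1
    for symbol in template.upper():
        size *= len(IUPAC[symbol])
    return size


def expand(template: str):
    """Yield every concrete sequence encoded by a degenerate template."""
    choices = (IUPAC[symbol] for symbol in template.upper())
    for combination in product(*choices):
        yield "".join(combination)


if __name__ == "__main__":
    junction = "NNNNNNATG"  # hypothetical: six randomized bases before the start codon
    print(library_size(junction))
    print(list(expand("RNATG")))  # small example that is easy to inspect by eye
```

Each additional fully degenerate position multiplies the library size by four, so coverage of the library by the transformation and screening steps should be checked before adding positions.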

Selection markers

Plasmid maintenance is ensured by including a selection marker. Under the selection pressure, only cells containing the plasmid survive. In protein‐overproducing strains, accumulation of the heterologous protein causes a metabolic burden, and nonproducing cells eventually overtake the culture, resulting in yield decline over time. 41 Commonly, plasmids contain genes conferring antibiotic resistance. Media supplementation with antibiotics is simple, cost‐effective, and convenient and is by far the most common strategy for protein synthesis at lab scale. However, at larger scales, the use of antibiotics is discouraged because of the associated costs, environmental pollution, and regulatory restrictions. Much progress has been made in developing antibiotic‐free selection systems, most importantly in the areas of plasmid addiction and nutrient prototrophies. Recently, Ali et al. designed expression vectors containing the gene encoding the enzyme enoyl‐acyl carrier protein reductase from Vibrio cholerae (fabV), which confers resistance to triclosan, a nonantibiotic biocide (a polychloro phenoxy phenol). 42 Protein production levels were the same as in cells carrying the same expression plasmid with a β‐lactamase gene for selection. Also, the construction of high‐copy number expression plasmids with increased stability has been described. Primelles Eguia et al. used a par locus, a cis‐acting locus that allows stable plasmid inheritance, to ensure retention of the plasmid pAR‐KanI. 43 After 8 hours of induction, almost all cells maintained the vector in the absence of antibiotic. In contrast, if cells bearing the commercial pET28 vector are grown in the absence of antibiotic, only 5% retain the plasmid.

The systems mentioned above select for plasmid‐containing cells, but they do not select for protein‐overproducing ones. Even if cells contain the expression vector, protein production can diminish over time due to accumulated mutations and the insertion of mobile sequences in the coding sequences of interest. 41, 44 Rugbjerg et al. reported a very interesting approach that rewards overproducing cells. The system relies on product addiction for mevalonate production, so only overproducing cells can survive. 45 This feat was achieved by placing two nonconditionally essential genes under the control of the pBAD promoter in a mevalonate‐producing E. coli strain. Then, an engineered mevalonate‐responsive AraC variant was provided so that cells became product‐addicted. The resulting strain showed high‐yield and stable mevalonate production over 95 generations. Another advantage is that the system does not rely on external inputs, that is, supplementation of media. Although the system was developed for this particular case (mevalonate production), this strategy and others relying on similar principles 46, 47 pave the way for future advances.
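The takeover of an unselected culture by plasmid‐free cells, mentioned earlier in this section, can be illustrated with a crude two‐population model. In each step (one doubling of plasmid‐free cells), plasmid‐bearing cells grow more slowly because of the metabolic burden, and a small fraction of their progeny loses the plasmid. This is a deliberately simplified sketch; the loss rate and burden values are hypothetical and are not taken from the cited studies.

```python
def plasmid_bearing_fraction(
    generations: int,
    loss_per_division: float = 0.01,  # hypothetical segregational loss rate
    burden: float = 0.2,              # hypothetical fractional growth-rate penalty
    start_fraction: float = 1.0,
) -> float:
    """Fraction of plasmid-bearing cells after growth without selection.

    One step corresponds to one doubling of plasmid-free cells; in the same
    time, plasmid-bearing cells multiply by 2**(1 - burden).
    """
    bearing = start_fraction
    free = 1.0 - start_fraction
    growth_bearing = 2.0 ** (1.0 - burden)
    for _ in range(generations):
        progeny = bearing * growth_bearing
        new_bearing = progeny * (1.0 - loss_per_division)
        new_free = free * 2.0 + progeny * loss_per_division
        total = new_bearing + new_free
        # Renormalize so the two fractions always sum to 1.
        bearing, free = new_bearing / total, new_free / total
    return bearing


if __name__ == "__main__":
    for g in (0, 10, 25, 50):
        print(f"{g:>3} generations: {plasmid_bearing_fraction(g):.3f} plasmid-bearing")
```

In this sketch the decline is driven mostly by the growth advantage of plasmid‐free cells rather than by the loss rate itself, which is why stabilizing elements and selection schemes that penalize plasmid‐free or nonproducing cells are attractive.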