Over the past decade, falling costs of chemical DNA synthesis and improved methods for assembling DNA fragments have enabled researchers to scale up synthetic biology to the level of entire chromosomes and genomes. Previously, synthetic DNA had been constructed at scales of up to about one million base pairs, notably a set of chromosomes from the yeast Saccharomyces cerevisiae and several versions of the genome of the bacterium Mycoplasma mycoides1,2. Now, writing in Nature, Fredens et al.3 report the completion of a 4-million-base-pair synthetic version of the Escherichia coli genome. This is a landmark in the emerging field of synthetic genomics, and finally brings the technology to the laboratory's workhorse bacterium.

Read the paper: Total synthesis of Escherichia coli with a recoded genome

Synthetic genomics offers a new way of understanding the rules of life, while also moving synthetic biology towards a future in which genomes can be written to design. The pioneers in the field, researchers at the J. Craig Venter Institute in Rockville, Maryland, have used genome synthesis to better define the minimal set of genes required for a free-living cell. By redesigning genome segments on a computer, chemically synthesizing the fragments and then assembling them, these pioneers succeeded2 in reducing the size of the M. mycoides genome by around 50%. Achieving the same with genome-editing tools alone would be far more laborious, as past work with E. coli demonstrates: there, gene-deletion methods have removed, at best, only 15% of the genome4.

Fredens and colleagues used this reduced E. coli genome as the template for a synthetic genome designed with a different kind of minimization in mind: codon reduction. The genetic code is inherently redundant, with 64 codons (triplets of 'letters', or bases) encoding just 20 amino acids plus the 'start' and 'stop' signals that mark the beginning and end of a protein-coding sequence. This redundancy means, for example, that six codons encode the amino acid serine, and three act as stop codons. Through design, synthesis and assembly, Fredens et al.3 have constructed an E. coli genome that uses only 61 of the 64 available codons in its protein-coding sequences, replacing two serine codons and one stop codon with synonyms (codons that are 'spelt' differently but give the same instruction). Past work using genome-editing tools had already produced a recoded E. coli that uses just 63 of the 64 codons, but that required changing only the TAG stop codons (of which there were just 321 in the genome) to an alternative stop codon5. Reduction to 61 codons demanded that a whopping 18,214 codons be changed, necessitating a genome-synthesis approach.
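The recoding rule itself can be sketched in a few lines of Python. This is a toy illustration, not Fredens and colleagues' design software: it builds the standard genetic code, confirms the redundancy counts quoted above, and applies the three synonymous swaps listed in Fig. 1 (TCG→AGC, TCA→AGT, TAG→TAA) to a hypothetical open reading frame, checking that the encoded protein is unchanged. The names `CODON_TABLE`, `RECODING`, `recode_orf` and the example sequence are illustrative assumptions.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# The three swaps described for the 61-codon genome (hypothetical variable name).
RECODING = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}

# Sanity check: every swap is synonymous under the standard code.
assert all(CODON_TABLE[old] == CODON_TABLE[new] for old, new in RECODING.items())


def codons(orf: str) -> list[str]:
    """Split an in-frame ORF into codons."""
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def translate(orf: str) -> str:
    """Translate an in-frame ORF with the standard code."""
    return "".join(CODON_TABLE[c] for c in codons(orf))


def recode_orf(orf: str) -> str:
    """Replace every target codon with its synonym; other codons are untouched."""
    return "".join(RECODING.get(c, c) for c in codons(orf))


usage = Counter(CODON_TABLE.values())
print(usage["S"], usage["*"])                  # 6 3
print(len(set(CODON_TABLE) - set(RECODING)))   # 61

orf = "ATGTCGAAATCAGGCTAG"  # hypothetical ORF encoding M S K S G, then stop
new = recode_orf(orf)
print(new)                                     # ATGAGCAAAAGTGGCTAA
print(translate(orf) == translate(new))        # True
```

Scaled up to a whole genome, the same swap has to be applied at every one of the 18,214 target codons while keeping each reading frame intact.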

Fredens and colleagues built their synthetic E. coli genome using large-scale DNA-assembly and genome-integration methods that they had developed previously6 to probe the limits of codon changes in E. coli. In their approach (Fig. 1), DNA is computationally designed, chemically synthesized and assembled into 100-kilobase fragments on vectors in S. cerevisiae; these vectors are then taken up by E. coli, and the synthetic DNA is integrated into the genome directly in place of the equivalent natural region. Iterating this process five times replaced 500-kilobase sections of the natural genome with synthetic versions. Eight E. coli strains were produced in this way, each harbouring a synthetic section that covered a different region of the genome. These sections were then combined using conjugation methods to make the complete synthetic genome.
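The round numbers above fit together as follows (actual fragment sizes are likely to have varied somewhat around these values):

```latex
\underbrace{5}_{\text{iterations}} \times 100\ \text{kb} = 500\ \text{kb per strain},
\qquad
\underbrace{8}_{\text{strains}} \times 500\ \text{kb} \approx 4\ \text{Mb}
```

which accounts for the whole 4-million-base-pair genome.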

Figure 1 | Design and construction of a recoded genome. a, Fredens et al.3 recoded three base triplets (codons) — TCG and TCA, which encode the amino acid serine, and TAG, a stop codon that marks the end of a protein-coding sequence — to alternatives that have the same functions (AGC, AGT and TAA respectively) in the genome of the bacterium Escherichia coli. b, In some genomic locations, open reading frames (ORFs; protein-coding regions) overlap, and a change in the codons of one ORF might produce an unwanted change in the overlapping region. Fredens et al. ‘refactored’ these ORFs to separate them, as illustrated for ORF1 and ORF2 (the two ORFs on the left are ‘read’ in the same direction; the two on the right are read in opposite directions). c, Redesigned DNA was synthesized and assembled into 100-kilobase fragments in the yeast Saccharomyces cerevisiae; fragments were then combined into sections and integrated into the E. coli genome. The sections were brought together to generate the complete functional synthetic genome.

The large-scale construction was impressively successful, with very low rates of off-target mutation, but it was not without its challenges. Many genes in the E. coli genome partially overlap with others, and in 91 cases the overlapping regions contained codons that needed to be changed. This is complex, because a synonymous change in one protein-coding sequence might alter the amino acids encoded by the overlapping one. To tackle this, the team 'refactored' 79 locations in the genome, duplicating the shared sequence so that the overlapping coding sequences were separated into individual recoded ones (Fig. 1). Although this approach was generally successful, it required careful debugging in a few cases in which refactoring also altered gene regulation.
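Why overlaps are a problem, and how duplicating the shared sequence helps, can be illustrated with a toy Python example. The 11-base sequence, the choice of reading frames (same direction, offset by one base) and the helper names are hypothetical; real refactoring must also preserve regulatory elements, as the debugging cases above show, which this sketch ignores.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
RECODING = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}


def frame_starts(seq: str, offset: int) -> range:
    """Start positions of the complete codons read from `offset`."""
    end = offset + (len(seq) - offset) // 3 * 3
    return range(offset, end, 3)


def translate_frame(seq: str, offset: int) -> str:
    """Translate the codons in one reading frame, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in frame_starts(seq, offset))


def recode_frame(seq: str, offset: int) -> str:
    """Apply the synonymous swaps to codons in one reading frame only."""
    out = list(seq)
    for i in frame_starts(seq, offset):
        codon = seq[i:i + 3]
        if codon in RECODING:
            out[i:i + 3] = RECODING[codon]
    return "".join(out)


shared = "ATGTCGGCAAT"  # hypothetical overlap: ORF1 read in frame 0, ORF2 in frame 1
print(translate_frame(shared, 0), translate_frame(shared, 1))   # MSA CRQ

# Recoding ORF1 in place is silent for ORF1 but changes ORF2 (here, a premature stop).
edited = recode_frame(shared, 0)
print(translate_frame(edited, 0), translate_frame(edited, 1))   # MSA *AQ

# Refactoring: give each ORF its own copy of the shared bases, then recode each separately.
orf1 = recode_frame(shared[0:9], 0)
orf2 = recode_frame(shared[1:10], 0)
print(translate_frame(orf1, 0), translate_frame(orf2, 0))       # MSA CRQ
```

Once each ORF owns its sequence, a swap in one can no longer leak into the other.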

The final strain proved viable and grew under a range of typical laboratory conditions, albeit a little less vigorously than its natural counterpart. It no longer uses the stop codon TAG or the serine codons TCG and TCA, so the cellular machinery that recognizes these codons can now be either deleted or reassigned to recruit 'non-canonical' amino acids beyond the usual 20 used by most living cells. Such recruitment has already proved useful in the 63-codon E. coli in two ways. For biotechnology, non-canonical amino acids can be encoded at chosen positions in a protein to provide residues that take part in chemical reactions that natural proteins cannot. For biosafety, the natural transfer of readable DNA-encoded information into and out of the recoded E. coli is limited, because the cell operates with a slightly different genetic code from the rest of the natural world5. Expect all of these applications to expand in the new 61-codon E. coli, which could encode more than one non-canonical amino acid and provide a more stringent genetic firewall (because three codons, rather than one, are no longer read in the standard way).
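A minimal sketch of the firewall and reassignment ideas, assuming a simplified host in which the three freed codons are either left unread or given a new meaning. It translates a hypothetical natural gene under three codes: the standard one, a 61-codon code with TCG, TCA and TAG unassigned (shown as '?'), and a code in which TCG is reassigned to a placeholder non-canonical amino acid 'X'. The gene, the '?' and 'X' symbols and the choice of TCG for reassignment are illustrative assumptions; in a real cell an unread codon might stall translation or be misread rather than simply leave a gap.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

FREED = {"TCG", "TCA", "TAG"}  # codons no longer used in the recoded genome's ORFs

# Hypothetical host codes: freed codons dropped, then TCG given a new meaning.
recoded_code = {c: aa for c, aa in CODON_TABLE.items() if c not in FREED}
reassigned_code = dict(recoded_code, TCG="X")  # 'X' = placeholder non-canonical amino acid


def translate(orf: str, code: dict[str, str]) -> str:
    """Translate in frame; '?' marks a codon the given code cannot read."""
    usable = len(orf) - len(orf) % 3
    return "".join(code.get(orf[i:i + 3], "?") for i in range(0, usable, 3))


natural_gene = "ATGTCAGGTTCGCTGTAA"  # hypothetical gene from a wild-type organism
print(len(recoded_code))                          # 61
print(translate(natural_gene, CODON_TABLE))       # MSGSL*
print(translate(natural_gene, recoded_code))      # M?G?L*
print(translate(natural_gene, reassigned_code))   # M?GXL*
```

Only the standard code reads the natural gene faithfully, which is the sense in which the recoded cell and natural organisms stop sharing a fully readable language; each additional freed codon is another point at which foreign genes are likely to fail.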

Synthesis of a 4-million-base-pair genome and reduction of the genetic code to 61 codons are new records for synthetic genomics, but they might not stand for long. The international Sc2.0 consortium is closing in on synthesizing all 16 chromosomes of the 12-million-base-pair S. cerevisiae genome, which would be the first synthetic genome of a eukaryote (the group of organisms that includes plants, animals and fungi), and the synthesis of a 57-codon E. coli genome is also under way1,7. A genome of the bacterium Salmonella Typhimurium that has two fewer codons than the natural organism is also being constructed8. Such a strain could one day enable bacteria with synthetic genomes to be used as cell-based technologies in the human gut.

From a technological standpoint, the most interesting aspect of all these projects is that their workflows for synthetic-genome construction are remarkably similar: kilobase-scale sections of synthesized DNA are assembled into 50- to 100-kilobase pieces in yeast cells by homologous recombination, and these pieces are then used to replace natural sequences inside the target organism by selectable recombination methods. Standardization of methods will enable steps to be automated and more research groups to enter the field. Genome minimization and codon reduction are just the first uses of this new technology, which could one day give us functionally reorganized genomes and genomes custom-designed to direct cells to perform specialized tasks.
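The assembly step in this shared workflow depends on neighbouring DNA pieces sharing identical sequence at their ends, which homologous recombination in yeast uses to join them in the right order. The string-level analogy below is only a sketch of that principle, not a model of recombination itself: the target sequence, fragment size and overlap length are made-up values, and `split_with_overlap` and `assemble` are hypothetical helpers.

```python
def split_with_overlap(seq: str, size: int, overlap: int) -> list[str]:
    """Cut seq into ordered pieces of length `size` whose neighbours share `overlap` bases."""
    if not 0 < overlap < size:
        raise ValueError("need 0 < overlap < size")
    step = size - overlap
    return [seq[i:i + size] for i in range(0, max(len(seq) - overlap, 1), step)]


def assemble(fragments: list[str], overlap: int) -> str:
    """Join ordered fragments by matching each one's start to the previous one's end."""
    assembled = fragments[0]
    for frag in fragments[1:]:
        if assembled[-overlap:] != frag[:overlap]:
            raise ValueError("neighbouring fragments do not share the expected overlap")
        assembled += frag[overlap:]
    return assembled


target = "ATGCCGTTAGGCATTACGGATCC"  # hypothetical 23-base 'section'
pieces = split_with_overlap(target, size=10, overlap=4)
print(pieces)
print(assemble(pieces, overlap=4) == target)  # True
```

Because the join is dictated by the shared ends rather than by any piece-specific chemistry, the same procedure works unchanged whatever the sequence, which is part of why the workflow lends itself to standardization and automation.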