ADAPTED FROM AN IMAGE BY DENNIS SUN, MEZARQUE DESIGN

A synthetic base pair, first reported in 2014, can now not only replicate inside living cells but also encode and produce proteins containing atypical amino acids, according to a report in Nature today (November 29). This proof-of-principle advance sets the stage for biochemists to generate proteins with forms and functions entirely different from those that natural organisms can create, say the authors.

“What a beautiful paper,” says chemical and biological engineer Michael Jewett of Northwestern University, who was not involved in the study. “What’s so special about the work is that the authors have captured the entire information flow of the central dogma (information storage, retrieval, and, ultimately, translation into a functional output) using this expanded genetic alphabet.”

In all forms of life on earth, genetic information is composed of a four-letter alphabet—the nucleotides G, C, A, and T, which form the base pairs G-C and A-T. But three years ago, chemistry professor Floyd Romesberg of the Scripps Research Institute in California and colleagues extended this alphabet, reporting the creation of additional artificial nucleotides, X and Y, that could pair up within DNA and take part in replication within a living bacterial cell.

See “Augmenting the Genetic Alphabet”

From that point on, Romesberg explains, the team’s ambition was “to get the molecules to function with polymerases and with ribosomes, in a cell”—that is, work with the cellular machinery that transcribes DNA into RNA and translates RNA into protein. Now, that ambition has been achieved.

First, the researchers introduced the artificial base pair, X-Y, into the gene for green fluorescent protein (GFP), switching a codon in a non-critical part of the gene from TAC (which encodes the amino acid tyrosine) to AXC. Next, they created a transfer RNA that contained the corresponding anti-codon, GYT, and that carried a non-canonical amino acid called PrK, a researcher-supplied amino acid that is rarely found in any natural proteins. The team then expressed these two genes inside specialized bacteria, called semi-synthetic organisms, that support the retention of synthetic nucleotides within their DNA. Lo and behold, the microbes produced GFP proteins containing the non-standard amino acid.
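The relationship between the AXC codon and the GYT anti-codon can be checked with a few lines of Python. This is only an illustrative sketch, not the researchers' method: it assumes the article's DNA-style lettering (T rather than U), treats X and Y as a strictly complementary pair, and uses a hypothetical helper name, `anticodon`.

```python
# Pairing rules: the four natural bases plus the synthetic X-Y pair.
# Assumption: X pairs only with Y, mirroring how G pairs with C and A with T.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G", "X": "Y", "Y": "X"}


def anticodon(codon: str) -> str:
    """Return the triplet that pairs with `codon`, read in the same 5'-to-3' direction.

    The two strands run antiparallel, so the codon is reversed
    before each base is swapped for its partner.
    """
    return "".join(PAIRS[base] for base in reversed(codon))


print(anticodon("TAC"))  # natural tyrosine codon
print(anticodon("AXC"))  # synthetic codon described in the article

# Three-letter codons available with four letters versus six letters.
print(4 ** 3, 6 ** 3)
```

Under these assumptions the synthetic codon AXC maps to GYT, matching the anti-codon named in the article, and the count of possible three-letter codons rises from 64 to 216 when two letters are added to the alphabet.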

“It’s an engineering feat,” says biologist and biochemist Eugene Wu of the University of Richmond in Virginia, who did not participate in the research.

“It’s a surprise that everything works so well,” adds biological chemist Nigel Richards of Cardiff University in the U.K. “It’s such a complicated system and there’s so many places where it could have failed.”

But it didn’t. The team went on to show that transcription and translation could occur with an alternative synthetic codon—GXC—and result in the inclusion of yet another non-canonical amino acid called pAzF. They used several assays, including mass spectrometry and click chemistry, to confirm the presence of the non-canonical amino acids within the proteins.

The artificial X-Y base pair is formed via hydrophobic attraction between the two elements, rather than hydrogen bonding, which normally forms the connections between the natural Watson-and-Crick base pairs. But the X and Y nucleotides are otherwise similar—sharing the sugar-phosphate-base composition of normal nucleotides.

“It’s really interesting that you don’t need hydrogen bonding to control information transfer,” says Richards.

“What it tells me,” says Wu, “is that being able to approximate the shape of base pairs is enough.”

However, the unusual pairing chemistry likely limits the number of such artificial base pairs that could be included in a DNA molecule, says Richards. “You get distortions in the helix” where these base pairs occur, he explains. So, a single base pair may be accommodated because “the surrounding Watson-and-Crick base pairs almost certainly compensate. . . . But if you had three in a row, then now it’s not so clear that you could maintain the helical structure, or that the enzymes would actually work.”

Other researchers, including Steven Benner of the Foundation for Applied Molecular Evolution in Florida, have created a number of novel base pairs that do utilize hydrogen bonding. These integrate into DNA without disrupting the double helix and can be present within the DNA in long stretches. However, so far, these nucleotides are only capable of replication, transcription, and translation in vitro, Benner explains in an email to The Scientist.

The most likely application for Romesberg’s artificial base pair is to “incorporate unnatural amino acids into specific parts of proteins,” says Wu, thereby vastly increasing the possibilities for biochemists to create proteins with new functions. Having a unique amino acid within a protein could, for example, enable the attachment of a drug or other molecule of interest to a specific point on a specific protein. And for those sorts of applications, the limitation imposed by hydrophobic bonding probably “doesn’t matter,” says Richards.

Ultimately, the X-Y engineers wanted “to get molecules that function in a cell. . . . That was our focus,” says Romesberg. Before this paper, he adds, “every protein produced in any living cell has been produced by decoding a four-letter alphabet. We have now reported the decoding of proteins with a six-letter alphabet. . . . That still makes the hair on the back of my neck stand up.”

Y. Zhang et al., “A semi-synthetic organism that stores and retrieves increased genetic information,” Nature, doi:10.1038/nature24659, 2017.