“I’m more and more inclined to think that we can actually penetrate at least some of the steps by which nature invented the code.” — Charles Carter

The genetic code is one of biology’s few universals*, but rather than being the result of some deep underlying logic, it’s often said to be a “frozen accident” — the outcome of evolutionary chance, something that easily could have turned out another way. This idea, though it’s often repeated, has been challenged for decades. The accumulated evidence shows that the genetic code isn’t as arbitrary as we might naively think. And more importantly, this evidence also offers some tantalizing clues to how the genetic code came to be.

The origins of the genetic code have long been a research focus of University of North Carolina biophysicist Charles Carter and his UNC enzymologist colleague Richard Wolfenden. They authored a pair of recent papers suggesting that behind the genetic code are actually two codes, reflecting key steps in its evolution. Dr. Carter kindly agreed to answer some questions about the papers, which present some interesting results that add to the growing pile of evidence that the genetic code is much less accidental than it may seem.

These papers deal with the machinery that implements the genetic code. Conceptually the code is simple: it is a set of dictionary entries or key-value pairs mapping codons to amino acids. But to make this mapping happen physically, you need, as Francis Crick correctly hypothesized back in 1958, an adapter. That adapter, as most of our readers know, is tRNA, a nucleic acid molecule that is “charged” with an amino acid.
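To make the "dictionary" picture concrete, here is a minimal Python sketch of the standard genetic code as key-value pairs. The names `CODON_TABLE` and `translate` are mine, chosen for illustration; the amino acids are written in their one-letter codes, with `*` marking a stop codon. The table is built from the usual compact layout, in which codons are listed in U, C, A, G order with the first base changing slowest.

```python
from itertools import product

# Bases in the conventional table order, and the amino acid (one-letter code)
# assigned to each of the 64 codons in that order. "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# The code itself: a plain dictionary from codon to amino acid.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read an mRNA string three bases at a time until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate("AUGGCCUGGUAA")` returns `"MAW"`: methionine, alanine, tryptophan, then a stop. Nothing in the dictionary itself explains why AUG should mean methionine rather than anything else, and that is exactly what the frozen accident idea takes for granted.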

But the existence of tRNAs creates another coding problem: how does the right tRNA get paired with the correct amino acid? The answer to this question is at the heart of the origin of the genetic code, and it’s the subject of these two recent papers. More about this story, as well as the first part of my interview with Dr. Carter, is below the fold.

So how do you get the correct codon/amino acid pairings on a tRNA? This, as you’ll remember from your biochemistry courses, is accomplished through a set of enzymes called aminoacyl-tRNA synthetases, which “charge” each tRNA with its corresponding amino acid. These synthetases are central to the secret of how the genetic code evolved. As Dr. Carter noted in a piece called “Thawing the ‘Frozen Accident’,” “The emergence of the genetic code was inseparable from the ancestry of the RNA adaptors and protein catalysts that implement it now.” He’s been interested in exactly that problem: how early RNAs and protein catalysts developed into the universal coding system we have today.

The striking result in the recent work by Carter and Wolfenden has to do with how tRNAs are recognized by certain tRNA synthetases. Their results indicate that tRNAs carry two codes: the well-known one in the anti-codon (the part that directly matches genetic code codons), and a second one in the “acceptor stem.” (See the figure below.)

As it turns out, these two codes aren’t arbitrary, contrary to what you might expect from a purely frozen accident perspective. Instead, the nucleic acid sequences of the acceptor stem and the anti-codon each code for distinct physical properties of amino acids. In other words, the codon/amino acid pairings reflect the different physical roles that different amino acids play in the structure of full, folded proteins. Or as Carter and Wolfenden put it:

These and other results suggest that genetic coding of 3D protein structures evolved in distinct stages, based initially on the size of the amino acid and later on its compatibility with globular folding in water.

And now, I’ll hand the mic over to Dr. Carter, who explains how he has approached this question. He also discusses some of the research into the origins of the genetic code conducted over the past several decades, citing some key papers that would make a great start for those who want to dig deeper. On Monday, we’ll dive into the details of the latest paper by Carter and Wolfenden, and present the second half of our interview.

MW: How did you come to this project – what led you to investigate the connection between the physical properties of amino acids and the sequences of their corresponding tRNAs? Is this a question you’ve focused on before, or did this emerge out of a different line of inquiry?



CC: I’ve been interested in the roles of polypeptide and RNA structure in the origin of life since my 1974 paper describing a stereochemical model for interactions of antiparallel extended beta polypeptide chains and RNA. That model led to my interest in the aminoacyl-tRNA synthetases, and hence to the existence of two such families that appeared to be unrelated to each other. Three earlier observations raised my curiosity about the coding properties of tRNA acceptor stem bases:

a. Others noted that synthetases and their cognate tRNAs both have two recognizable interacting modules, and had demonstrated that intact synthetases would acylate “minihelices” derived from tRNA acceptor stems. They suggested that synthetase catalytic domains might have functioned during an earlier stage of evolution to acylate the tRNA acceptor stems. They called this type of recognition an “operational RNA code”. I wondered how that code might have worked, and realized that the first step was to see if acceptor stem bases formed a code related to the properties of the amino acids.

b. My own work on aminoacyl-tRNA synthetases produced “Urzymes”, which are modified forms of the structurally invariant cores shared by all members of an enzyme superfamily that can be expressed separately and that retain major fractions of the catalytic activity of the modern enzymes. Urzymes from Class I and II aminoacyl-tRNA synthetases are roughly 10-30% the size of the full-length enzymes. This means that they are too small to recognize the tRNA anticodon, even though they acylate cognate tRNAs. That observation validated the suggestion that there was an operational code in the acceptor stem.

c. Richard Giegé had published a rather extensive survey of the specific bases in tRNA that were recognized by synthetases, and most of these “identity elements” were located either in the acceptor stem or in the anticodon. Thus, the database necessary to pose the question of how tRNA coding discriminates between different amino acids was suitably complete. I had become adept at using the regression methods necessary to look for coding relationships between amino acid properties and the identity elements summarized by Giegé. I began simply by tabulating all properties I could find of the 20 amino acids. My initial discovery was that the acceptor stem bases, which I had hoped would be correlated with hydrophobicity, were instead correlated strongly with amino acid masses, whereas the anticodon was closely correlated with their hydrophobicities. It became clear that the physical properties of the amino acids studied by my colleague, Dick Wolfenden, furnished a compelling and experimentally based pair of independent attributes. In particular, he and I discovered that the vapor-to-cyclohexane transfer free energies were tightly correlated with amino acid masses (i.e., via their volumes). The questions I wanted to address were thus a natural fit with my curiosity, aptitudes, resources, and colleagues.



MW: It’s not obvious to me why the genetic code shouldn’t be almost completely arbitrary. It’s often been cited as an example of a frozen accident: nearly universal among organisms, but simply the result of a chance evolutionary outcome. Aside from the redundancy of synonymous codons, which reduces the impact of mutations, from a naive perspective we wouldn’t expect the DNA codon sequence or the tRNA sequence to be related to the physical properties of amino acids.



And yet your work, building on previous studies, shows that there is a strong relationship: both the anti-codon and the acceptor stem sequences correlate with the role of amino acids in folded proteins. Why is nucleic acid sequence so closely related to the physical properties of amino acids?

CC: There are lots of challenging questions wrapped up inside this one, and we’re beginning, I think, to be able at least to think about how to go about answering them. Of course, the evolution of life is indeed a probabilistic process—a game of chance—and for that reason it is a “frozen accident” at some level. However, I’m more and more inclined to think that we can actually penetrate at least some of the steps by which nature invented the code.

a. At a basic level one should appreciate the fact that the purpose of the genetic code is to code for protein structures. Thus, it should not be surprising that Wolfenden first identified correlations between the physical properties of amino acids, protein folding, and the genetic code.

b. Michael Yarus has used the selection of oligonucleotides from complex combinatorial libraries to demonstrate the existence of RNA aptamers that bind to specific amino acids, and to an intriguing extent, these short RNA molecules often contain either the appropriate codons or anticodons. These correlations appear with frequencies much in excess of that expected for random correlations, so they must be related in some fashion to the genesis of the code. However, there are two puzzling aspects of this work: (i) Cognate triplet associations have been identified for 7 different amino acids activated by Class I synthetases, but only 1 amino acid activated by a Class II synthetase. One might have expected a more balanced result. (ii) Codons and anticodons are identified with essentially equal frequency for the 8 amino acids studied. That ambiguity points toward a role for double-stranded RNA in the stereochemical stage of code development, much as the sense/antisense coding of the two aminoacyl-tRNA synthetases does.

c. Marc Delarue had published a remarkable analysis of how the universal genetic code might have become settled by a series of binary choices in successive bases of the anticodon, leading at each step to specification of codons for one new Class I and one new Class II amino acid. That paper furnishes a paradigm that avoids the frozen accident to some extent, and was quite influential in how I thought about the problem. In particular, the redundancy of synonymous codons may have resulted from the successive recruitment of groups of tRNAs to the same amino acid from earlier stages of lower specificity as the code became defined. See my earlier commentary on this point.

d. James Zull identified a curious and fundamental aspect of the code by noting that the codons for amino acids that contribute to cores in folded proteins are actually always anticodons for amino acids associated with the surfaces of proteins. This inversion symmetry implies that proteins coded by opposite strands of the same gene are in some sense “inside out” (Chandrasekaran, et al., 2013).

e. Wolfenden and I have now actually thrown something of a monkey wrench into the notion described in (b) by pointing out that the coding properties in the acceptor stem likely preceded those in the anticodon bases. How that conundrum is eventually resolved should be fun to witness.

Stay tuned for part II.

tRNA image by Yikrazuul via Wikimedia Commons.

*OK, the genetic code, like everything else in biology, has exceptions, but these are clearly derivatives of the original code.