About 4 billion years ago, molecules began to make copies of themselves, an event that marked the beginning of life on Earth. A few hundred million years later, primitive organisms began to split into the different branches that make up the tree of life. In between those two seminal events, some of the greatest innovations in existence emerged: the cell, the genetic code and an energy system to fuel it all. All three of these are essential to life as we know it, yet scientists know disappointingly little about how any of these remarkable biological innovations came about.

“It’s very hard to infer even the relative ordering of evolutionary events before the last common ancestor,” said Greg Fournier, a geobiologist at the Massachusetts Institute of Technology. Cells may have appeared before energy metabolism, or perhaps it was the other way around. Without fossils or DNA preserved from organisms living during this period, scientists have had little data to work from.

Fournier is leading an attempt to reconstruct the history of life in those evolutionary dark ages — the hundreds of millions of years between the time when life first emerged and when it split into what would become the endless tangle of existence.

He is using genomic data from living organisms to infer the DNA sequence of ancient genes as part of a growing field known as paleogenomics. In research published online in March in the Journal of Molecular Evolution, Fournier showed that the last chemical letter added to the code was a molecule called tryptophan — an amino acid most famous for its presence in turkey dinners. The work supports the idea that the genetic code evolved gradually.

Using similar methods, he hopes to decipher the temporal order of more of the code — determining when each letter was added to the genetic alphabet — and to date key events in the origins of life, such as the emergence of cells.

Dark Origins

Life emerged so long ago that even the rock formations covering the planet at that time have been destroyed — and with them, most chemical and geological clues to early evolution. “There’s a huge chasm between the origins of life and the last common ancestor,” said Eric Gaucher, a biologist at the Georgia Institute of Technology in Atlanta.

Scientists do know that at some point in that time span, living creatures began using a genetic code, a blueprint for making complex proteins. It is those proteins that carry out the vital functions of the cell. (The structure of DNA and RNA also enables genetic information to be replicated and passed on from generation to generation, but that’s a separate process from the creation of proteins.) The components of the code and the molecular machinery that assembles them “are some of the oldest and most universal aspects of cells, and biologists are very interested in understanding the mechanisms by which they evolved,” said Paul Higgs, a biophysicist at McMaster University in Hamilton, Ontario.

How the code came into being presents a chicken-and-egg problem. The key players in the code (DNA, RNA, amino acids and proteins) are chemically complicated structures that work together to make proteins. But in modern cells, proteins are themselves needed to make the components of the code. So how did a highly structured code emerge before the proteins it produces existed?

Most researchers believe that the code began simply with basic proteins made from a limited alphabet of amino acids. It then grew in complexity over time, as these proteins learned to make more sophisticated molecules. Eventually, it developed into a code capable of creating all the diversity we see today. “It’s long been hypothesized that life’s ‘standard alphabet’ of 20 amino acids evolved from a simpler, earlier alphabet, much as the English alphabet has accumulated extra letters over its history,” said Stephen Freeland, a biologist at the University of Maryland, Baltimore County.

The earliest amino acid letters in the code were likely the simplest in structure, those that can be made from purely chemical means, without the assistance of a protein helper. (For example, the amino acids glycine, alanine and glutamic acid have been found on meteorites, suggesting they can form spontaneously in a variety of environments.) These are like the letters A, E and S — primordial units that served as the foundation for what came later.

Tryptophan, in comparison, has a complex structure and is comparatively rare in the protein code, like a Y or Z, leading scientists to theorize that it was one of the latest additions to the code.

That chemical evidence is compelling but circumstantial. Enter Fournier. He suspected that by extending his work on paleogenomics, he would be able to test more directly whether tryptophan was the last letter added to the code.

The Last Letter

Scientists have been reconstructing ancient proteins for more than a decade, primarily to figure out how they differed from modern ones: what they looked like and how they functioned. But these efforts have focused on the period of evolution after the last universal common ancestor (or LUCA, as researchers call it). Fournier’s work delves further back than any previous effort. To do so, he had to move beyond the standard application of comparative genomics, which analyzes the differences between branches on the tree of life. “By definition, anything pre-LUCA lies beyond the deepest split in the tree,” he said.

Fournier started with two related proteins, TrpRS (tryptophanyl tRNA synthetase) and TyrRS (tyrosyl tRNA synthetase), which help decode RNA letters into the amino acids tryptophan and tyrosine, respectively. TrpRS and TyrRS are more closely related to each other than to any other protein, indicating that they evolved from the same ancestor protein. Sometime before LUCA, that parent protein mutated slightly to produce these two new proteins with distinct functions. Fournier used computational techniques to decipher what that ancestral protein must have looked like.

He found that the ancestral protein contained every amino acid except tryptophan, suggesting that tryptophan’s addition was the finishing touch to the genetic code. “It shows convincingly that tryptophan was the last amino acid added, as has been speculated before but not really nailed as has been done here,” said Nigel Goldenfeld, a physicist at the University of Illinois, Urbana-Champaign, who was not involved in the study.
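To give a feel for what inferring an ancestral sequence from its descendants involves, here is a minimal sketch in Python. It is not Fournier's method: the article does not say which computational techniques he used, and this toy swaps in Fitch parsimony, a far simpler approach than the statistical models usually applied to real data. The four five-letter sequences and the shape of the tree are invented for illustration, with W standing for tryptophan in the standard one-letter amino acid code.

```python
# Toy ancestral reconstruction using Fitch parsimony (an assumption made
# for illustration; not the technique used in the study).
# A tree is either a leaf (an aligned sequence string) or a pair of subtrees.

def fitch_set(tree, col):
    """Return the candidate residues at this node for one alignment column."""
    if isinstance(tree, str):
        return {tree[col]}              # a leaf contributes its observed residue
    left = fitch_set(tree[0], col)
    right = fitch_set(tree[1], col)
    shared = left & right
    # Fitch rule: keep the intersection if non-empty, otherwise the union.
    return shared if shared else left | right

def root_candidates(tree, length):
    """Candidate residues at the root for every column of the alignment."""
    return [fitch_set(tree, col) for col in range(length)]

# Hypothetical aligned sequences; tryptophan (W) appears in only one lineage.
family_a = ("GAWKE", "GAFKE")
family_b = ("GAFRE", "GAFKE")
tree = (family_a, family_b)

candidates = root_candidates(tree, 5)
ancestor_may_have_trp = any("W" in residues for residues in candidates)

print(candidates)
print("Tryptophan possible in ancestor:", ancestor_may_have_trp)
```

In this invented example the inferred root carries no tryptophan at any position, because W shows up in only one descendant. Real reconstructions work with many long sequences and weigh the probability of each substitution rather than simply counting changes.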

Fournier now plans to use tryptophan as a marker to date other major pre-LUCA events such as the evolution of metabolism, cells and cell division, and the mechanisms of inheritance. These three processes form a sort of biological triumvirate that laid the foundation for life as we know it today. But we know little about how they came into existence. “If we understand the order of those basic steps, it creates an arrow pointing to possible scenarios for the origins of life,” Fournier said.

For example, if the ancestral proteins involved in metabolism lack tryptophan, some form of metabolism probably evolved early. If proteins that direct cell division are studded with tryptophan, it suggests those proteins evolved comparatively late.
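The logic of that comparison can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the analysis Fournier plans: the two "ancestral" sequences are made up, the names `tryptophan_fraction` and `classify` are hypothetical, and the rule that any tryptophan at all marks a protein as a later arrival is a deliberate oversimplification.

```python
def tryptophan_fraction(sequence: str) -> float:
    """Fraction of residues that are tryptophan (one-letter code W)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return seq.count("W") / len(seq)

def classify(sequence: str) -> str:
    """Crude label: no tryptophan is consistent with an early origin."""
    if tryptophan_fraction(sequence) == 0:
        return "possibly pre-tryptophan"
    return "post-tryptophan"

# Invented stand-ins for reconstructed ancestral proteins.
ancestors = {
    "metabolic_enzyme_ancestor": "MKGAVLEDGS",
    "division_protein_ancestor": "MWKGAWLEDW",
}

labels = {name: classify(seq) for name, seq in ancestors.items()}
for name, label in labels.items():
    print(name, label)
```

A sequence with no tryptophan is only consistent with an early origin. It does not establish one, since a protein could also have lost or never needed the amino acid.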

Different models for the origins of life make different predictions for which of these three processes came first. Fournier hopes his approach will provide a way to rule out some of these models. However, he cautions that it won’t definitively sort out the timing of these events.

Fournier plans to use the same techniques to figure out the order in which other amino acids were added to the code. Of the tryptophan result, Paul Schimmel, a professor of molecular and cell biology at the Scripps Research Institute who was not involved in the study, said, “It really reinforces the idea that evolution of the code itself was a progressive process. It speaks to the refinement and subtlety that nature was using to perfect these proteins and the diversity it needed to form this vast tree of life.”

This article was reprinted on ScientificAmerican.com.