Proteins traverse the length and breadth of cells to carry signals and cargo from one end to another, package and replicate DNA, build scaffolds to give cells their shapes, break down and take up nutrients, and so much more. But how often do we stop to ask: How did these diverse and sophisticated molecular machines come to be?

Despite proteins' profound impact on life, their origin is not well understood. What caused a string of amino acids to start doing something? Or are strings of amino acids inherently programmed to do things? These are questions with which researchers in the protein-origin field have been grappling.

Researchers have a better grasp of the processes of selection and evolution once a function appears in a peptide. “Once you have identified an enzyme that has some weak, promiscuous activity for your target reaction, it’s fairly clear that, if you have mutations at random, you can select and improve this activity by several orders of magnitude,” says Dan Tawfik at the Weizmann Institute in Israel. “What we lack is a hypothesis for the earlier stages, where you don’t have this spectrum of enzymatic activities, active sites and folds from which selection can identify starting points. Evolution has this catch-22: Nothing evolves unless it already exists.”



Where’s the starting point?

For more than a decade, researchers have been probing the protein-origin question using molecular biology and computer models. The group led by Michael Hecht of Princeton University has made libraries of proteins designed from scratch rather than derived from existing proteins, which have been shaped by millennia of Darwinian selection. One library that Hecht and colleagues made contained more than a million polypeptide chains composed of hydrophobic and hydrophilic residues. They demonstrated that, after being expressed in Escherichia coli, the simple polypeptides were capable of folding.

With these folded sequences in hand, Hecht and colleagues next tested whether they could perform any biochemical function, such as binding small molecules and cofactors or catalyzing reactions. “They don’t do them well, but they do them well above background noise,” says Hecht.

After that, Hecht’s group turned to E. coli strains deleted for genes that provide essential functions for survival. The investigators transformed these strains with their peptide library and found that a couple of their polypeptides were able to rescue the E. coli and let them grow on minimal medium. “Our proteins — made from scratch and never (having) been through evolution — can provide a life-sustaining function,” Hecht says.

In silico experiments complement data from bench-based experiments. Jeffrey Skolnick and Mu Gao at the Georgia Institute of Technology designed homopolypeptides and collapsed them using a structure prediction algorithm. They then selected sequences at random that were proteinlike when matched to folds found in the Protein Data Bank. They found that each cavity in the artificial structures had a match in real proteins. Moreover, the number of distinct cavities was small. The cavities had the inherent capacity to bind small molecules and other ligands. “You show in a system, which was simply proteinlike but there is no selection for function, that you got a lot of properties: the binding sites, the geometries, the protein-protein interfaces. This would suggest the system fundamentally has the capacity to engage in function. Maybe it’s crummy function, but it’s still function,” says Skolnick. “This is telling you the systems are primed to do biochemistry.”



“If you don’t have a driver for functionality, you will not get complexity”

But Jack Szostak at Harvard University and Andrei Lupas at the Max Planck Institute for Developmental Biology in Germany say these experiments don’t go back far enough in time. Both think that function had to come well before polypeptide chains became long enough to fold. “Functionality must come before complexity, because something must drive the emergence of complexity,” Lupas says. “If you don’t have a driver for functionality, you will not get complexity” in the form of structure.

Getting function in the first place is tough going. Szostak did an experiment with Anthony Keefe in 2001. They tested 6 trillion peptides, each with 80 randomly selected amino acids, for ATP binding. “We were able to select out small, single-domain proteins that did bind ATP. But they were rare, on the order of one in 10^11 sequences,” says Szostak. “Getting function from randomness is hard.” For selection to start happening to peptides, there has to be that spark of function. How that spark appears remains the big, elusive question in the field of protein origin.
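To put those numbers in perspective, here is a back-of-the-envelope sketch in Python. It reads the quoted hit rate as one in 10^11 and assumes the random peptides were drawn from the standard 20 amino acids, which the account above does not state. The constant and function names are illustrative only and do not come from the study.

```python
from fractions import Fraction

# Figures quoted in the article.
LIBRARY_SIZE = 6 * 10**12          # "6 trillion peptides"
HIT_RATE = Fraction(1, 10**11)     # "on the order of one in 10^11 sequences"
PEPTIDE_LENGTH = 80                # "80 randomly selected amino acids"

# Assumption (not stated in the article): the standard 20-letter alphabet.
ALPHABET_SIZE = 20


def expected_hits(library_size, hit_rate):
    """Expected number of functional sequences in a library of this size."""
    return library_size * hit_rate


def sequence_space(alphabet_size, length):
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length


def fraction_sampled(library_size, alphabet_size, length):
    """Share of all possible sequences that the library could cover at most."""
    return Fraction(library_size, sequence_space(alphabet_size, length))


if __name__ == "__main__":
    hits = expected_hits(LIBRARY_SIZE, HIT_RATE)
    space = sequence_space(ALPHABET_SIZE, PEPTIDE_LENGTH)
    print(f"Expected binders: about {float(hits):.0f}")
    print(f"Possible sequences: about 10^{len(str(space)) - 1}")
```

Under these assumptions, a library of 6 trillion sequences would be expected to hold only about 60 ATP binders, and it samples a vanishingly small share of the roughly 10^104 possible 80-residue sequences.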

Lupas says that evolution of peptides and proteins cannot be considered in isolation. He says it’s conceivable that RNA, considered to be the first biologically active molecule in the primordial soup, co-opted short abiotic peptides. These abiotic peptides, perhaps no more than five amino acids in length, were recruited to carry out some processes that ribozymes are unable to do, such as redox reactions with free radicals. Furthermore, ribozymes are not very thermostable and are easily hydrolyzed. Lupas says it’s possible that ribozymes partnered with abiotic peptides that were able to stabilize them.

As ribozymes picked up these abiotic peptides, the pool of these useful short peptides started to dwindle. “There wouldn’t be enough, so there will be a competitive situation, which would reward those ribozymes that could string up amino acids by themselves,” says Lupas. “If your ribozyme-based organism develops the ability to ligate amino acids, it will have an advantage over others because it doesn’t have to scavenge for the peptides.” This would lead to selection of peptides that have desirable functions. Once those functions were in place, the peptide could grow larger and more complex and begin to adopt folds and cavities.

Lupas thinks that function had to precede structure, because producing a complex structure is an incredibly hard job. “After 3.5 billion years of evolution, nature still has a substantial folding problem,” he states. He points out that, under normal circumstances, about one-third of a modern cell’s resources is devoted to protein quality control and turnover. “We’re not talking about a few proteases here and there. We’re talking about substantial resources of the cell just for this routine maintenance,” says Lupas. “You wouldn’t have to commit this amount of resources if protein folding was not problematic.” While Szostak agrees the hypothesis is elegant, he says there isn’t much experimental evidence to bear it out.

Szostak says that the origin of protein function also brings up the question of how many amino acids were around for making the first proteins. “There is pretty good evidence that at least some of the standard 20 amino acids came in late” in evolution, says Szostak. “Some of the simple, easy-to-make ones, like glycine and aspartate, were probably there right from the beginning.” The reduced number of amino acids plays into the folding issue, because there may be constraints in folding peptides made from a smaller number of amino acids.

Overall, what the field of protein evolution needs is a set of plausible, solid hypotheses to explain how random sequences of amino acids turned into the sophisticated entities that we recognize today as proteins. Until that happens, the phenomenon of the rise of proteins will remain, as Tawfik says, “something like close to a miracle.”