The two loops, shown in gold near the bottom of the protein structure, delimit the pocket where the heme subunit resides and where oxygenation occurs. Credit: Gustavo Caetano-Anollés

Proteins are more than a dietary requirement. This diverse set of molecules powers nearly all of the cellular operations in a living organism. Scientists may know the structure of a protein or its function, but haven't always been able to link the two.

"The big problem in biology is the question of how a protein does what it does. We think the answer rests in protein evolution," says University of Illinois professor and bioinformatician Gustavo Caetano-Anollés.

Geologists have found remnants of life preserved in rock billions of years old. In some cases, microbes and tissues have been preserved so well that microscopic cellular structures once associated with specific proteins can be detected. This geological record gives scientists a hidden connection to the evolutionary history of protein structures over incredibly long time periods. But until now, it hasn't always been possible to link function with those structures and learn how proteins behaved in cells billions of years ago, compared with today.

"For the first time, we have traced evolution onto a biological network," Caetano-Anollés notes.

Caetano-Anollés and graduate students Fayez Aziz and Kelsey Caetano-Anollés used networks to investigate the linkage between protein structure and molecular function. They built a timeline of protein structures spanning 3.8 billion years across the geological record, but needed a way to connect the structures with their functions. To do that, they looked at the genetic makeup of hundreds of organisms.

"It turns out that there are little snippets in our genes that are incredibly conserved over time," Caetano-Anollés says. "And not just in human genomes. When we look at higher organisms, such as plants, fungi and animals, as well as bacteria, archaea, and viruses, the same snippets are always there. We see them over and over again."

The research team found that these tiny gene segments tell proteins to produce "loops," which are the tiniest structural units in a protein. When loops come together, they create active sites, or molecular pockets, which give proteins their function. For example, hemoglobin, the protein that carries oxygen in blood, has two loops which create the active site that binds oxygen. The loops combine to create larger protein structures called domains.

Remarkably, the new study shows that loops have been repeatedly recruited to perform new functions and that the process has been active and ongoing since the beginning of life.

"This recruitment is important for understanding biological diversity," Caetano-Anollés says.

One important aspect of the study relates to the actual linkage between domain structure and functional loops. The researchers found that this linkage is characterized by an unanticipated property that unfolds in time, an "emergent" property known as hierarchical modularity.

"Loops are cohesive modules, as are domains, proteins, cells, organs, and bodies," Caetano-Anollés explains. "We are all made of cohesive modules, including our human bodies. That's hierarchical modularity: the building of small cohesive parts into larger and increasingly complex wholes."

Hierarchical modularity also exists in manmade networks, such as the internet. For example, each router represents a "node" that pushes information to different computers. When millions of computers interact with each other online, larger and more complex entities emerge. Caetano-Anollés suggests that the evolution of manmade networks could be mapped in the same way as the evolution of biological networks.

"From a computer science point of view, few people have been exploring how to track networks in time. Imagine exploring how the internet grows and changes when new routers are added, are disconnected, or network with each other. It's a daunting task because there are millions of routers to track and internet communication can be highly dynamic. In our study, we are showcasing how you can do it with a very small network," Caetano-Anollés explains.

The methods developed by Caetano-Anollés and his team now have the potential to explain how change is capable of structuring systems as varied as the internet, social networks, or the collective of all proteins in an organism.
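
To make the idea of tracking a network in time concrete, here is a minimal sketch in Python. It is an illustration only, not the method used in the study: the event format, the `replay_network` function, and the router names are all hypothetical. It replays a list of timestamped "link" and "unlink" events and records how many connected nodes, how many links, and what maximum number of connections the network has at each time step.

```python
from collections import defaultdict


def replay_network(events):
    """Replay timestamped link events and return one snapshot per time step.

    events: iterable of (time, action, a, b) tuples, where action is
    "link" or "unlink". These names are assumptions made for this sketch.
    """
    # Group events by time so each time step yields exactly one snapshot.
    by_time = defaultdict(list)
    for time, action, a, b in events:
        by_time[time].append((action, a, b))

    neighbors = defaultdict(set)  # node -> set of directly connected nodes
    snapshots = []
    for time in sorted(by_time):
        for action, a, b in by_time[time]:
            if action == "link":
                neighbors[a].add(b)
                neighbors[b].add(a)
            elif action == "unlink":
                neighbors[a].discard(b)
                neighbors[b].discard(a)

        degrees = [len(nbrs) for nbrs in neighbors.values()]
        snapshots.append({
            "time": time,
            # Count only nodes that still have at least one connection.
            "nodes": sum(1 for d in degrees if d > 0),
            # Each link is stored twice, once per endpoint.
            "edges": sum(degrees) // 2,
            "max_degree": max(degrees, default=0),
        })
    return snapshots


# Hypothetical history of a four-router network.
events = [
    (1, "link", "r1", "r2"),
    (2, "link", "r1", "r3"),
    (2, "link", "r2", "r3"),
    (3, "unlink", "r1", "r2"),
    (4, "link", "r4", "r1"),
]

for snapshot in replay_network(events):
    print(snapshot)
```

Comparing snapshots from one time step to the next shows when nodes join, when links disappear, and how the busiest node changes. That is the kind of bookkeeping that becomes daunting at the scale of millions of routers.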

More information: M. Fayez Aziz et al, The early history and emergence of molecular functions and modular scale-free network behavior, Scientific Reports (2016). DOI: 10.1038/srep25058