We know how to get snapshots of what proteins look like. These static pictures tell us where all the atoms of a protein reside within a crystal, which gives us a sense of the protein's structure and lets us design drugs that fit neatly within that structure, altering the protein's activity.

But, in actual cells, proteins are nothing like the static, rigid structures found in crystals. Instead they writhe, buffeted by Brownian motion and constantly shifting among similar energy states. Until we develop a microscope that can resolve all this motion, the best we can do is to run molecular simulations on our computers. Unfortunately, most proteins have a lot of atoms to keep track of, which makes those simulations extremely computationally expensive.

Now, some researchers have figured out how to run the simulations on Google's cloud computing architecture. Although each of the individual simulations is short, they can be aggregated to provide a picture of long-term behavior. And, with this method of aggregating them in place, the system should be able to work with just about any cloud service available.

Typically, it's difficult to split a molecular simulation up into smaller jobs. Distant parts of a protein remain physically connected through a series of chemical bonds, and the structure can involve folds and turns that bring distant parts of the protein close together in space. As a result, each step of the simulation typically has to consider the entire protein at once, and each step depends on the output of the one before it. That essentially makes any simulation a single large computation. Simulating more than a few milliseconds of a protein's behavior takes a lot of computational power.

And it's important to get more than a few milliseconds. The proteins may shift back and forth between hundreds of states, and the differences between any two can determine whether the protein is active or inactive, susceptible to a drug or not. So, you need to run the simulation longer to make sure that it has time to sample a lot of these states.

Or maybe you don't. Some researchers at Stanford, collaborating with a pair of Googlers, have moved the simulation code over to Google's Exacycle cloud computing system. But the cloud still can't run a single simulation as a massively parallel computation. So, instead, the system ran tens of thousands of simulations at once, each of them for a relatively short amount of time. Individually, they were short, but collectively, they added up to about two milliseconds of simulated time and covered a lot of the potential energy landscape the protein explores. The trick was writing the code that would merge all the individual simulations into a single picture of the protein's behavior.

Why does this work? Because a protein typically has a limited number of highly stable states and tends to shift back and forth between those and other states that are only occupied briefly. As one of the authors, Russ Biagio Altman, put it to Ars, "there is not a very long 'memory' of previously visited states." There are very few cases where an important structural state is reached by a series of unlikely intermediates, so cutting the simulation short doesn't miss all that much. In fact, Altman said that the system could be set up to prioritize any simulations that come across a rare state.

In this case, the authors looked at a G protein-coupled receptor (GPCR), a member of a huge class of proteins that's involved in a lot of key biological processes and implicated in a number of diseases. The crystal structures for the active and inactive states have been solved, and the authors started a number of simulations from each. Many of these evolved into a common intermediate state, and all three of these states made short excursions into other temporary states. The authors also showed that the simulations could incorporate things like drugs or the normal chemicals that interact with the receptor.

It's still not clear that this system will work for every protein; some may have transitions that are simply slow enough that the short simulations won't capture important behavior. But it seems likely that a lot of proteins could be handled with this technique. And the approach should be more general than simply running it on Google's system. Altman says it would work on dedicated computing clusters, and "we should be able to do this on other cloud infrastructures going forward."

Nature Chemistry, 2013. DOI: 10.1038/nchem.1821.