What do you do if you need more than 150,000 CPU cores but don't have millions of dollars to spend on a supercomputer? Go to the Amazon cloud, of course.

For the past few years, HPC software company Cycle Computing has been helping researchers harness the power of Amazon Web Services when they need serious computing power for short bursts of time. The company has completed its biggest Amazon cloud run yet, creating a cluster that ran for 18 hours, hitting 156,314 cores at its largest point and a theoretical peak speed of 1.21 petaflops. (A petaflop is one quadrillion floating point operations per second, or a million billion.)

To get all those cores, Cycle's cluster ran simultaneously in Amazon data centers across the world, in Virginia, Oregon, Northern California, Ireland, Singapore, Tokyo, Sydney, and São Paulo. The bill from Amazon ended up being $33,000.

USC chemistry professor Mark Thompson needed the cluster to design materials that might be well-suited to converting sunlight into energy.

"For any possible material, just figuring out how to synthesize it, purify it, and then analyze it typically takes a year of grad student time and hundreds of thousands of dollars in equipment, chemicals, and labor for that one molecule," Cycle Computing CEO Jason Stowe wrote in a blog post today.

Instead of doing that, Thompson uses simulation software made by Schrödinger. With that software running on Amazon, Thompson was able to simulate 205,000 molecules and do the equivalent of 2.3 million hours of science (counting the compute time for each core separately). The cluster ran only last week, so it's too early to find out what its impact on solar science will be. Still, from a computing standpoint, it's impressive.
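For a rough sense of scale, the figures above can be turned into a cost per core-hour. The following is a back-of-the-envelope sketch in Python, assuming the $33,000 Amazon bill covers all 2.3 million core-hours and nothing else; the variable names are illustrative and do not come from Cycle or Amazon.

```python
# Figures reported in the article.
AMAZON_BILL_USD = 33_000
CORE_HOURS = 2_300_000          # compute time counted per core
HOURS_PER_YEAR = 24 * 365       # assumption: a non-leap year

# Average price paid for one core running for one hour.
cost_per_core_hour = AMAZON_BILL_USD / CORE_HOURS

# The same compute time expressed as years on a single core.
core_years = CORE_HOURS / HOURS_PER_YEAR

print(f"Cost per core-hour: ${cost_per_core_hour:.4f}")
print(f"Single-core equivalent: {core_years:.0f} years")
```

Under those assumptions, the run works out to roughly 1.4 cents per core-hour and a little over 260 years of single-core computing.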



That’s a “petaflop,” not a petaflop

While Stowe says the Amazon cluster hit 1.21 petaflops, that's the theoretical peak speed rather than the actual performance. In the Linpack benchmark used to test supercomputer speeds, the theoretical peak is always reported, but the real-world results are what count when ranking the world's fastest machines.

The Cycle cluster on Amazon would have a much lower real-world max on the Linpack benchmark. To score high, you need machines that are physically close to each other to reduce latency, Stowe said. Cycle's cluster was spread around the world and did not require a blazing-fast interconnect because the calculations could be performed independently by each virtual machine.

Supercomputing applications tend to require cores to work in concert with each other, which is why IBM, Cray, and other companies have built incredibly fast interconnects. Cycle's work with the Amazon cloud has focused on HPC workloads without that requirement.

"There are whole categories of problems that are pleasantly parallel, and in those cases the [Linpack maximum] number is really not as important because we did intentionally make use of the entire 1.2 petaflops because they were all concurrently executing the workloads," Stowe told Ars. "Maybe there does need to be a different metric for analytics and big data and genomics—and all these pleasantly parallel workloads that are becoming more pervasive."
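What "pleasantly parallel" means in practice can be shown with a minimal Python sketch. It is not Cycle's software or Schrödinger's: `score_molecule` is a hypothetical stand-in for one independent simulation, and a local thread pool stands in for thousands of virtual machines. The point is only that no task needs data from any other task, so no fast interconnect is required.

```python
from concurrent.futures import ThreadPoolExecutor


def score_molecule(molecule_id: int) -> int:
    """Hypothetical stand-in for simulating one candidate molecule.

    It depends only on its own input, never on another task's result.
    """
    return (molecule_id * 2654435761) % 1000


def run_independent_tasks(molecule_ids, workers: int = 4) -> list[int]:
    """Run every task on a pool of workers and collect results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_molecule, molecule_ids))


results = run_independent_tasks(range(8))
print(results)
```

Because the tasks never communicate, adding workers (or, in the cloud, more instances in more regions) only changes how quickly the list is finished, not what it contains.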

Amazon itself used its cloud to create a cluster that ranks 127th on the most recent Top 500 list of the world's fastest supercomputers, with 17,024 cores, a real-world max of 240.1 teraflops, and a theoretical peak of 354.1 teraflops, nearly a third of the peak number claimed by Stowe.
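The comparison can be checked with a little arithmetic. This Python sketch uses only the numbers quoted in this article, converting Stowe's 1.21 petaflops to teraflops; it says nothing about what Cycle's cluster would actually score on Linpack, which was not measured.

```python
# Figures quoted in the article, all in teraflops.
CYCLE_THEORETICAL_PEAK = 1.21 * 1000   # 1.21 petaflops
AMAZON_THEORETICAL_PEAK = 354.1
AMAZON_LINPACK_MAX = 240.1

# Amazon's Top 500 peak as a fraction of the peak Stowe cited.
peak_ratio = AMAZON_THEORETICAL_PEAK / CYCLE_THEORETICAL_PEAK

# Share of its theoretical peak that Amazon's cluster achieved on Linpack.
linpack_efficiency = AMAZON_LINPACK_MAX / AMAZON_THEORETICAL_PEAK

print(f"Amazon peak vs. Cycle peak: {peak_ratio:.1%}")
print(f"Amazon Linpack efficiency: {linpack_efficiency:.1%}")
```

Amazon's entry has about 29 percent of the theoretical peak Stowe cited, and it delivered about 68 percent of its own theoretical peak in the real-world benchmark.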

Building the cluster

The Cycle cluster's 156,314 cores were spread across 16,788 instances, an average of 9.3 cores per virtual machine.

Cycle kept costs down by mostly using Amazon's auction-style spot marketplace, buying up a variety of instance types. Cycle used a mix of compute-optimized and general-purpose instances, including Amazon's cc2.8xlarge instance with 32 cores; the cr1.8xlarge with 32 cores; m3.xlarge with four cores; and m3.2xlarge with eight cores.

"To deploy this cluster, our software [CycleCloud] automated bidding, acquiring, testing, and assembling this large environment, plus distributing the data and the workload," Stowe wrote.

Cycle also used Opscode's Chef software as well as a new task distribution system Cycle developed, called Jupiter. Jupiter can schedule work for massive numbers of compute cores across regions and data centers, and this work can continue running even when Amazon virtual machines, availability zones, or regions fail.

"In order to reliably move workload tasks between different cloud computing regions on AWS, we needed to build software with low overhead that would be resilient to failure and able to scale to massive sizes," Stowe wrote. "We needed something that supported millions of cores doing tens of millions of tasks. Jupiter was designed to do just this."

Cycle gave Thompson's research team a university discount: it charged no fee for building the cluster beyond the $33,000 the team owed Amazon. Most other customers who want Cycle's help with cloud supercomputing will have to pay the usual prices, though.