When Andrew Ng trained Google's army of computers to identify cat videos using artificial intelligence, he hit a few snags.

Google's worldwide network of data centers housed more computers than he needed for the job, but harnessing all that power wasn't easy. When a server broke down—an everyday occurrence when you're using 1,000 machines at a time—it slowed down his calculations.

According to Ng, this is one of the big unreported stories in the world of deep learning, the hottest trend these days in big data and artificial intelligence: it's not necessarily suited to cloud computing—i.e. the techniques the Googles and the Amazons and the Facebooks typically use to run software across tens of thousands of machines.

Not long after Ng's AI experiment, a Stanford University researcher named Adam Coates came up with a better way to do things. He used a different type of microprocessor, called a graphics processing unit, or GPU, to string together a three-computer system that could do the work of Google's 1,000-computer cloud. It was a remarkable accomplishment.

"That big disparity in resources that you needed to run these experiments is because on the one hand the GPUs are a lot faster themselves, but also because once you have a much smaller system that's much more tightly integrated, there are sort of economies of scale," says Coates, who now works for Andrew Ng at Chinese search giant Baidu.

Gamers know about GPUs because they often buy special GPU cards to speed up their video games. But even before Ng was experimenting at Google, academics knew about GPUs too. That's because GPUs have exceptional math-crunching abilities, which make them ideal for deep learning. Initially, researchers only wrote deep learning software for single-computer systems. What Coates did was show how to build deep learning networks across many GPU-based computers. And his work is now trickling down to many others. Google and Facebook are using GPUs, but so are the labs that run some of the world's biggest supercomputers: Oak Ridge National Laboratory and Lawrence Livermore National Laboratory. There, researchers hope to take advantage of these powerful chips and the kind of ultrafast networking gear that has become widely used in supercomputers.

Supercomputers Meet Deep Learning

On the east side of Oak Ridge National Laboratory's Tennessee campus, there's an 80-acre research facility called the Spallation Neutron Source, or SNS. Built in 2006, it blasts the world's most intense beams of neutrons at materials to help physicists and chemists understand the inner structure of materials and how they are formed.

The SNS produces hundreds of terabytes of data, too much to be completely analyzed. But Oak Ridge scientists believe that deep learning algorithms, which specialize in identifying patterns, could help them find patterns in the data more quickly and improve their analysis.

The issue is widespread. It's not uncommon for scientific simulations to produce 700 terabytes of data each time they are run. That's more than all of the information housed in the Library of Congress. "In the science world there is a big data problem," says Robert Patton, an Oak Ridge computer scientist. "Scientists are now doing simulations that are simply producing too much data to analyze."

But GPU-powered deep learning could change things, especially when fused with the super-fast networking capabilities of high-performance computers such as Oak Ridge's Titan supercomputer. Titan is a little different from a Google cloud. It too spans thousands of machines, but it can more quickly swap data in and out of each machine's memory and push it to another machine. So, at Oak Ridge, researchers have resolved to run deep learning algorithms on Titan.

Facebook uses GPUs too, but its lead deep-learning researcher, Yann LeCun, isn't writing off the CPU entirely. "We use a GPU-based infrastructure to train our deep learning models. Conventional CPUs are just too slow," he says. "But new CPU chips—with many cores—may approach the performance of GPUs in the near future."

The Big Rewrite

Before they can realize their AI ambitions, the Oak Ridge supercomputer geeks must write deep-learning software that works on their towering supercomputers. And that will likely take years of work, the researchers say.

Andrew Ng's original Google work built models of cat videos that had 1 billion parameters in them, helping the algorithms build an almost human understanding of the subtleties of the images in the videos and distinguish between, for example, a YouTube cat video and one featuring a chinchilla.

Over at Lawrence Livermore National Laboratory, researchers have built software that includes 15 billion parameters, 15 times as many as the original Google experiment, and they intend to go even higher. "We hope by the end of this project to have built the world's largest neural network training algorithm that's enabled by high performance computing," says Barry Chen, a Knowledge Systems and Informatics group leader at the lab.

The Google Way

What's Google doing? Well, it's moving to GPUs too. But it's taking a different route. The tech giant has already built a new and remarkable deep learning system called DistBelief, and it can run on either GPUs or CPUs within its sprawling cloud.

Google splits up the number-crunching job among hundreds of smaller clusters of between 1 and 32 machines, each of which gradually tweaks the data model that Google has assembled. It's a giant compute operation that gradually gives Google's software the ability to distinguish between things like a chair and a stool, or the word "shift" and the word "ship."

So machines can fail inside Google's data center—that's inevitable—but when they do, the consequences aren't so severe. In fact, the entire system is designed so that Google's researchers won't even notice when there is a failure, says Greg Corrado, a Google research scientist.
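
The general idea described above, many small groups of machines each nudging one shared model so that losing a group costs little, can be illustrated with a toy simulation. What follows is a minimal sketch under stated assumptions, not Google's DistBelief code: the names `ParameterStore`, `make_shards`, `shard_gradient` and `train` are hypothetical, the model is a tiny linear regression on synthetic data, and the workers take turns in a simple loop rather than running in parallel on separate machines.

```python
import random


def make_shards(num_shards, rows_per_shard, true_weights, seed=0):
    """Build synthetic (inputs, target) rows and split them across workers."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_shards):
        rows = []
        for _ in range(rows_per_shard):
            x = [rng.gauss(0.0, 1.0) for _ in true_weights]
            y = sum(w * xi for w, xi in zip(true_weights, x))
            rows.append((x, y))
        shards.append(rows)
    return shards


class ParameterStore:
    """Holds the shared model that every worker reads and updates."""

    def __init__(self, size):
        self.weights = [0.0] * size

    def fetch(self):
        # Workers train against a copy, never the live model.
        return list(self.weights)

    def apply(self, gradient, learning_rate):
        # Each update is a small tweak to the shared model.
        self.weights = [
            w - learning_rate * g for w, g in zip(self.weights, gradient)
        ]


def shard_gradient(weights, rows):
    """Mean squared-error gradient computed on one worker's data only."""
    gradient = [0.0] * len(weights)
    for x, y in rows:
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            gradient[i] += error * xi / len(rows)
    return gradient


def train(shards, size, rounds=200, learning_rate=0.1,
          failed_workers=frozenset()):
    """Let each healthy worker take turns tweaking the shared model."""
    store = ParameterStore(size)
    for _ in range(rounds):
        for worker_id, rows in enumerate(shards):
            if worker_id in failed_workers:
                continue  # a failed worker just contributes no updates
            snapshot = store.fetch()
            store.apply(shard_gradient(snapshot, rows), learning_rate)
    return store.weights


if __name__ == "__main__":
    true_weights = [2.0, -1.0, 0.5]
    shards = make_shards(4, 50, true_weights)
    print("all workers healthy:", train(shards, 3))
    print("worker 2 failed:   ", train(shards, 3, failed_workers={2}))
```

In this toy setup, both runs end up very close to the weights used to generate the data. Because no worker waits on any other, a failed worker only removes its share of the updates instead of stalling the whole job. A real system would also have to handle stale updates, network delays and models too large for one machine, none of which this sketch attempts.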

"The larger question of cloud computing vs HPC [high performance computing] is a matter of taste, company culture, and available resources," he says. "I've done both. I'm very happy with Google's internal system of course."