Google operates what is surely the largest computer network on Earth, a system that comprises custom-built, warehouse-sized data centers spanning 15 locations on four continents. But about six years ago, as the company embraced a new form of voice recognition on Android phones, its engineers worried that this network wasn't nearly big enough. If each of the world's Android phones used the new Google voice search for just three minutes a day, these engineers realized, the company would need twice as many data centers.

At that time, Google was just beginning to drive its voice recognition services with deep neural networks, complex mathematical systems that can learn particular tasks by analyzing vast amounts of data. In recent years, this form of machine learning has rapidly reinvented not just voice recognition, but image recognition, machine translation, internet search, and more. In moving to this method, Google saw error rates drop a good 25 percent. But the shift required a lot of extra horsepower.

Rather than double its data center footprint, Google instead built its own computer chip specifically for running deep neural networks, called the Tensor Processing Unit, or TPU. "It makes sense to have a solution there that is much more energy efficient," says Norm Jouppi, one of the more than 70 engineers who worked on the chip. In fact, the TPU outperforms standard processors by 30 to 80 times in the TOPS/Watt measure, a metric of efficiency.
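
To make the TOPS/Watt measure concrete, here is a minimal Python sketch of how such a ratio is computed: trillions of operations per second divided by the power drawn. The function name and all of the throughput and power figures are hypothetical, chosen only so the result lands inside the 30-to-80-times range quoted above. They are not measurements of the TPU or of any other chip.

```python
def tops_per_watt(ops_per_second: float, watts: float) -> float:
    """Tera-operations per second delivered per watt of power drawn."""
    return (ops_per_second / 1e12) / watts


# Hypothetical figures for illustration only, not real measurements.
baseline = tops_per_watt(ops_per_second=2e12, watts=100.0)      # general-purpose chip
accelerator = tops_per_watt(ops_per_second=60e12, watts=75.0)   # specialized chip

# How many times more work per watt the specialized chip does.
ratio = accelerator / baseline
print(f"baseline:    {baseline:.3f} TOPS/W")
print(f"accelerator: {accelerator:.3f} TOPS/W")
print(f"advantage:   about {ratio:.0f}x")
```

With these made-up inputs, the specialized chip does roughly 40 times as much work per watt. The point of the metric is that raw speed and power draw are folded into one number, which is what matters when the alternative is building more data centers.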

A Neural Network Niche

Google first revealed this custom processor last May, but gave few details. Now, Jouppi and the rest of his team have released a paper detailing the project, explaining how the chip operates and the particular problems it solves. Google uses the chip solely for executing neural networks, running them the moment when, say, someone barks a command into their Android phone. It's not used to train the neural network beforehand. But as Jouppi explains, even that still saves the company quite a bit. It didn't have to build, say, an extra 15 data centers.

The chip also represents a much larger shift in the world of computer processors. As Google, Facebook, Microsoft, and other internet giants build more and more services using deep neural networks, they've all needed specialized chips both for training and executing these AI models. Most companies train their models using GPUs, chips that were originally designed for rendering graphics for games and other highly visual applications but are also suited to the kind of math at the heart of neural networks. And some, including Microsoft and Baidu, the Chinese internet giant, use alternative chips when executing these models as well, much as Google does with the TPU.

The difference is that Google built its own chip from scratch. As a way of reducing the cost and improving the efficiency of its vast online empire, the company builds much of its own data center hardware, including servers and networking gear. Now, it has pushed this work all the way down to individual processors.

In the process, it has also shifted the larger market for chips. Since Google designs its own, for instance, it's not buying other processors to accommodate the extra load from neural networks. Google going in-house even for specialized tasks has wide implications; like Facebook, Amazon, and Microsoft, it's among the biggest chip buyers on Earth. Meanwhile the big chip makers—including, most notably, Intel—are building a new breed of processor in an effort to move the market back in their direction.

Focused But Versatile

Jouppi joined Google in late 2013 to work on what became the TPU, after serving as a hardware researcher at places like HP and DEC, a kind of breeding ground for many of Google's top hardware designers. He says the company considered moving its neural networks onto FPGAs, the kind of programmable chip that Microsoft uses. That route wouldn't have taken as long, and the adaptability of FPGAs means the company could reprogram the chips for other tasks as needed. But tests indicated that these chips wouldn't provide the necessary speed boost. "There's a lot of overhead with programmable chips," he explains. "Our analysis showed that an FPGA wouldn't be any faster than a GPU."

In the end, the team settled on an ASIC, a chip built from the ground up for a particular task. According to Jouppi, because Google designed the chip specifically for neural nets, it can run them 15 to 30 times faster than general purpose chips built with similar manufacturing techniques. That said, the chip is suited to any breed of neural network, at least as they exist today, including everything from the convolutional neural networks used in image recognition to the long short-term memory networks used to recognize voice commands. "It's not wired to one model," he says.

Google has used the TPU for a good two years, applying it to everything from image recognition to machine translation to AlphaGo, the machine that cracked the ancient game of Go last spring. Not bad, especially considering all the data center construction it helped avoid in the process.