IBM researchers Tayfun Gokmen and Yurii Vlasov published a paper proposing the concept for a new chip, called a Resistive Processing Unit (RPU), that could accelerate Deep Neural Network training by up to 30,000x compared to conventional CPUs.

A Deep Neural Network (DNN) is an artificial neural network with multiple hidden layers that can be trained in an unsupervised or supervised way, resulting in machine learning (or artificial intelligence) that can “learn” on its own.

This is similar to what Google’s AlphaGo AI used to learn to play Go. AlphaGo combined a tree-search algorithm with two deep neural networks, each with multiple layers containing millions of neuron-like connections. One, called the “policy network,” predicts which move is most likely to help the AI win the game, and the other, called the “value network,” evaluates a board position to estimate the chance of winning from it, so the search doesn’t have to play every line out to the end of the game.

Many machine learning researchers have begun focusing on deep neural networks because of their promising potential. However, even Google’s AlphaGo still needed thousands of chips to achieve its level of intelligence. IBM researchers are now working to power that level of intelligence with a single chip, which means thousands of them put together could lead to even more breakthroughs in AI capabilities in the future.

“A system consisted of a cluster of RPU accelerators will be able to tackle Big Data problems with trillions of parameters that is impossible to address today like, for example, natural speech recognition and translation between all world languages, real-time analytics on large streams of business and scientific data, integration and analysis of multimodal sensory data flows from massive number of IoT (Internet of Things) sensors,” noted the researchers in their paper.

The authors talked about how in the past couple of decades, machine learning has benefited from the adoption of GPUs, FPGAs, and even ASICs that aim to accelerate it. However, they believe further acceleration is possible by utilizing the locality and parallelism of the algorithms. To do this, the team has borrowed concepts from next-generation non-volatile memory (NVM) technologies such as phase change memory (PCM) and resistive random access memory (RRAM).

The acceleration for Deep Neural Networks achieved with this type of memory alone reportedly ranges from 27x to 2,140x. However, the researchers believe the acceleration could be increased further if some of the constraints on how NVM cells are designed were removed. If they could design a new chip based on non-volatile memory, but to their own specifications, the researchers estimate the acceleration factor could reach close to 30,000x.

"We propose and analyze a concept of Resistive Processing Unit (RPU) devices that can simultaneously store and process weights and are potentially scalable to billions of nodes with foundry CMOS technologies. Our estimates indicate that acceleration factors close to 30,000 are achievable on a single chip with realistic power and area constraints," said the researchers.
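To make the idea of a device that can "simultaneously store and process weights" more concrete, here is a minimal software sketch in Python. It is only an analogy, not the researchers' design: the class name `CrossbarArray` and its methods are hypothetical, and the code assumes the weights sit in a grid that is used in place for the three steps of training (forward pass, backward pass, and weight update). On a conventional processor these loops run one step at a time and the weights are fetched from memory; the appeal of the proposed hardware is that the stored weights would not have to be moved and the array elements could work in parallel.

```python
# Illustrative software analogy only; names and structure are assumptions,
# not taken from the IBM paper.

class CrossbarArray:
    """A grid of weights that is read and updated in place."""

    def __init__(self, rows, cols, initial=0.0):
        self.rows = rows
        self.cols = cols
        # One stored weight per (row, column) crossing point.
        self.weights = [[initial for _ in range(cols)] for _ in range(rows)]

    def forward(self, x):
        # Forward pass: y = W x (inputs on the columns, outputs on the rows).
        return [sum(w * xj for w, xj in zip(row, x)) for row in self.weights]

    def backward(self, delta):
        # Backward pass: z = W^T delta (errors on the rows, outputs on the columns).
        return [
            sum(self.weights[i][j] * delta[i] for i in range(self.rows))
            for j in range(self.cols)
        ]

    def update(self, x, delta, learning_rate):
        # Weight update: W += learning_rate * outer(delta, x).
        # Each weight changes using only the signals on its own row and column.
        for i in range(self.rows):
            for j in range(self.cols):
                self.weights[i][j] += learning_rate * delta[i] * x[j]


if __name__ == "__main__":
    array = CrossbarArray(rows=2, cols=3)
    array.weights = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(array.forward([1.0, 0.0, 1.0]))
    print(array.backward([1.0, 1.0]))
    array.update([1.0, 0.0, 1.0], [1.0, -1.0], learning_rate=0.5)
    print(array.weights)
```

Note that the update for each weight depends only on its own row and column signals. That is the kind of locality the researchers say they want to exploit.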

As this sort of chip is only in the research phase, and because next-generation non-volatile memory such as PCM and RRAM hasn’t reached the mainstream market yet, it’s probably going to be a few years before we begin to see something like it on the market. However, the research seems promising, and it may attract the attention of companies such as Google, which wants to accelerate its AI research as much as possible. IBM itself is also interested in solving Big Data challenges in healthcare and other domains, so the company’s own businesses should benefit from this research in the future.

Lucian Armasu is a Contributing Writer for Tom's Hardware. You can follow him at @lucian_armasu.
