Neural network processing and AI workloads are both hot topics these days, driving multiple companies to announce their own custom silicon designs or to plug their own hardware as a top-end solution for these workloads. But one problem with neural networks is that they tend to be extremely power intensive, and not necessarily suited to mobile devices or the kind of low-power “smart” speakers that have recently become so popular.

MIT is claiming to have developed a neural network processor that addresses these problems, with an overall power reduction of up to 95 percent. If true, this could change the game for these kinds of applications. Instead of being forced to rely on cloud connectivity to drive AI (and using power to keep the modem active), SoCs could incorporate these processors and perform local calculations.

“The general processor model is that there is a memory in some part of the chip, and there is a processor in another part of the chip, and you move the data back and forth between them when you do these computations,” said Avishek Biswas, an MIT graduate student in electrical engineering and computer science, who led the new chip’s development:

Since these machine-learning algorithms need so many computations, this transferring back and forth of data is the dominant portion of the energy consumption. But the computation these algorithms do can be simplified to one specific operation, called the dot product. Our approach was, can we implement this dot-product functionality inside the memory so that you don’t need to transfer this data back and forth?

A typical neural network is organized into layers. Each node connects to nodes in the layers above and below it, and each connection between nodes carries its own weight. Weight, in this context, refers to how much impact the output of one node has on the calculations performed in the nodes it connects to. A node receiving input from multiple nodes above it multiplies each input by the weight of its connection and sums the results; that sum is the dot product. If the dot product exceeds a certain threshold, it gets sent along to nodes farther down the chain. But this process is extremely memory intensive: each dot-product calculation requires memory accesses to retrieve the weights, the results have to be stored, and each input to a node has to be processed independently.
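The per-node arithmetic described above can be sketched in a few lines of Python, with hypothetical input and weight values chosen purely for illustration:

```python
# Hypothetical numbers for one node: three inputs from the layer above,
# each arriving over a connection with its own weight.
inputs = [0.8, 0.2, 0.5]
weights = [0.4, -0.9, 0.7]

# The dot product: multiply each input by its connection's weight, then sum.
dot = sum(x * w for x, w in zip(inputs, weights))

# The node only passes a result down the chain if it clears a threshold.
threshold = 0.3
output = dot if dot > threshold else 0.0
```

Every one of those multiplies and adds implies fetching a weight from memory, which is exactly the traffic MIT's design tries to eliminate.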

What MIT has done is create a chip that more closely mimics the way the human brain works. Input values are converted to electrical voltages and multiplied by the appropriate weights; only the combined voltages are converted back into a digital representation and stored for further processing. The prototype chip can calculate 16 dot products simultaneously. And by storing all of its weights as either 1 or -1, the system can be implemented as a simple set of switches, losing only 2 to 3 percent accuracy compared with vastly more expensive conventional neural networks.
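To see why binary weights allow such simple hardware, consider this illustrative sketch (not MIT's actual circuit): once every weight is +1 or -1, a "multiplication" is just a sign choice, so the dot product collapses into additions and subtractions.

```python
# Hypothetical inputs and binarized weights, for illustration only.
inputs = [0.8, 0.2, 0.5, 0.1]
binary_weights = [1, -1, 1, -1]

# With weights restricted to +1/-1, skip multiplication entirely:
# add inputs whose weight is +1, subtract those whose weight is -1.
dot = sum(x if w == 1 else -x for x, w in zip(inputs, binary_weights))

# The result is identical to a full multiply-accumulate,
# but no multiplier hardware is needed -- only switches that
# steer each input toward addition or subtraction.
full = sum(x * w for x, w in zip(inputs, binary_weights))
assert abs(dot - full) < 1e-12
```

That add-or-subtract selection is the kind of operation a simple set of analog switches can perform directly on voltages.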

Not bad for an approach that can reduce power consumption by up to 95 percent. And it's a promising concept for a future in which the benefits of the cloud and AI aren't limited to those with robust internet service on their mobile devices or at home.

Top image credit: Chelsea Turner/MIT