AI algorithms require significant resources to run. What hardware can handle running them on the edge?

This week, Intel announced a new neuromorphic system using its Loihi chips that mimics the computing processes of eight million neurons. Combined with yesterday's news that the University of Michigan has developed a memristor array chip that may allow localized AI (artificial intelligence), this seems like an opportune time to talk about AI hardware, which is becoming more and more commonplace.

The University of Michigan's memristor array chip seated in a larger custom chip. Photo by Robert Coelius, Michigan Engineering Communications & Marketing.

But what are the major differences between neural nets and neuromorphic systems? And what hardware is available for developers today?

Here's a quick rundown on non-cloud-based AI systems and three examples of hardware designed to bring AI into the palm of your hand.

The Push for Edge Computing for AI

While AI is still in its early days, we can safely say that some form of AI is regularly being included in everyday products. The advantages of including AI in a project are often overlooked: AI can drastically improve user interaction, predict behavior and thereby reduce response times (such as in cars), and improve efficiency (such as on production lines).

But AI is not without its concerns, and there is a growing number of users who do not want potentially sensitive data (such as conversations) sent to and stored in a datacenter. Since cloud-based AI systems have other inherent issues, such as latency and dependence on internet availability, there is a push to move AI to the edge in the form of “edge computing”, whereby local devices process data and run AI algorithms themselves.

Neuromorphic vs Neural Nets

There are a number of ways in which AI can be implemented, but the two main approaches taking the stage are neuromorphic systems and neural networks.

A neuromorphic AI system is one that closely resembles how brain neurons work: circuitry and software are combined to produce neurons that can trigger other neurons, form connections, and respond to stimuli. One example of hardware that could replicate this process is a combination of microcontrollers and FPGAs. Microcontrollers behave much like neurons in that they can process incoming data and produce an output, while FPGAs behave much like the connections between neurons in that they can create, break, and reroute those connections.

The key feature of neuromorphic systems is that they operate on the same principle as neurons in the brain: incoming signals can fire a neuron, which in turn sends signals to other neurons.
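The fire-and-reset behavior described above is often modeled with a leaky integrate-and-fire (LIF) neuron. The sketch below is purely illustrative; the function name, threshold, and leak constant are hypothetical values chosen for clarity, not taken from any particular chip.

```python
# Illustrative leaky integrate-and-fire (LIF) neuron: a common simplified
# model of the fire-and-propagate behavior that neuromorphic hardware emulates.

def lif_neuron(inputs, threshold=1.0, leak=0.9):
    """Integrate weighted input over time; emit a spike (1) when the
    membrane potential crosses the threshold, then reset the potential."""
    potential = 0.0
    spikes = []
    for current in inputs:
        potential = potential * leak + current  # leaky integration
        if potential >= threshold:
            spikes.append(1)   # neuron fires...
            potential = 0.0    # ...and resets
        else:
            spikes.append(0)
    return spikes

print(lif_neuron([0.4, 0.4, 0.4, 0.0, 0.9, 0.3]))  # → [0, 0, 1, 0, 0, 1]
```

Note that the neuron only produces an event when enough input has accumulated; between spikes it is effectively idle, which is the property neuromorphic hardware exploits.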

A neural net, by contrast, is a series of nodes connected by weighted links, loosely resembling the neurons in a brain. Unlike neuromorphic systems, however, a neural net does not have neurons that fire and send pulses to other nodes. Instead, every node is evaluated: each takes its inputs, multiplies them by weights, sums the results, applies an activation function, and produces an output.
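The per-node computation can be sketched in a few lines of plain Python. The weights, bias, and sigmoid activation below are hypothetical examples, not from any real trained network:

```python
import math

def node_output(inputs, weights, bias):
    """One neural-net node: weight each input, sum, add a bias,
    then squash the total through a sigmoid activation function."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation

# A tiny two-node "layer" processing the same three inputs:
inputs = [0.5, -1.0, 2.0]
layer = [([0.1, 0.4, 0.2], 0.0),    # (weights, bias) for node 1
         ([-0.3, 0.8, 0.05], 0.1)]  # (weights, bias) for node 2
outputs = [node_output(inputs, w, b) for w, b in layer]
```

Every node runs this multiply-accumulate regardless of the input values, which is why dedicated hardware for fast, parallel multiply-accumulates pays off for neural nets.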

AI Efficiency in Execution

One of the biggest hurdles for AI is its complexity and the difficulty of executing AI algorithms efficiently. Running AI systems that can recognize objects, determine who is speaking, and respond to their environment can require a lot of resources, which is one of the prime reasons many AI systems run in cloud-based datacenters.

Since there is a large demand for AI services on embedded devices that may or may not have an internet connection, the need for dedicated AI hardware is becoming apparent. Dedicated AI hardware can offload work from the main processor, which is better left handling user input and updating graphical interfaces, resulting in a more responsive system (i.e., less waiting around for Alexa to respond after being asked a question).

AI comes down to two main tasks: learning and execution. In the learning phase, an AI system is presented with data and learns what that data is and how it should behave. For example, an AI system designed to recognize cats needs to be shown many pictures of cats as well as many pictures containing no cats.

The second task, execution, is where the AI system is fed data and processes it to produce an appropriate response. In the cat example, this would be the AI system looking at random objects and determining whether they are cats. The learning phase can be done at a datacenter, and the resulting AI algorithm (such as a trained neural net) can then be downloaded to a device for use.

So a neural network that has been taught to recognize cats can be transferred to a smaller device, on which the net is executed. However, the execution phase is still a mammoth task, which is why developers are starting to produce dedicated AI hardware.
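This learn-then-deploy split can be sketched end to end with a toy perceptron. Everything here is hypothetical and deliberately tiny: the "datacenter" trains a two-input model, serializes the learned weights, and the "edge device" runs only the cheap inference step with no training code at all.

```python
import json

def train(samples, epochs=20, lr=0.1):
    """Toy perceptron training loop (the expensive 'learning' task)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = 1 if x[0] * w[0] + x[1] * w[1] + b > 0 else 0
            err = target - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return {"w": w, "b": b}

# "Datacenter": learn a simple AND-like rule and export the model weights.
model_blob = json.dumps(train([([0, 0], 0), ([0, 1], 0),
                               ([1, 0], 0), ([1, 1], 1)]))

# "Edge device": load the weights and execute (the cheap 'execution' task).
def infer(blob, x):
    m = json.loads(blob)
    return 1 if x[0] * m["w"][0] + x[1] * m["w"][1] + m["b"] > 0 else 0

print(infer(model_blob, [1, 1]))  # classify a new input on-device
```

Real deployments follow the same shape at a vastly larger scale: training happens on big iron, and only the frozen weights travel to the device.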

Google Coral — A Neural Network Dev Board

Neural net hardware is starting to be integrated into modern embedded systems. One of the biggest examples is the Google Coral range of products.

The dev board is a single-board computer that includes all the typical SBC hardware, such as a quad-core Arm Cortex-A53 processor and 1GB of RAM, but it also includes an AI coprocessor.

The bottom of the Google Coral dev board. Image from Google

The coprocessor is the Google Edge TPU, a device dedicated to offloading TensorFlow AI algorithms from the main processor. It can speed up AI applications both by freeing the CPU of AI data processing and by running those tasks on hardware purpose-built for them. Google's TPUs (tensor processing units) have been around for several years as part of an initiative to offer scalable cloud processing that can handle large amounts of data for AI. About a year ago, TPU 3.0 was a key element in delighting and/or spooking audiences at the 2018 Google I/O keynote, where Project Duplex sounded uncannily human in a demonstration of AI-generated speech.

In addition to the dev board, Google has also released the USB Accelerator, a TensorFlow coprocessor housed in a USB device. It provides TensorFlow capabilities to the connected system, which not only helps reduce processor usage but also removes the need for drastic hardware changes (for example, a Raspberry Pi can be upgraded simply by plugging in the USB Accelerator).

BrainChip — Neuromorphic Computing SoC

While neuromorphic hardware is not as commonplace as neural net hardware, there are some examples. One recent example is BrainChip, which utilizes the spiking neural network concept to provide AI execution. Unlike in conventional neural networks, only neurons that are actually in use draw power, which means that a spiking neural system can be significantly more efficient than its neural net counterpart.
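A back-of-envelope comparison shows where that efficiency comes from. The numbers below are hypothetical, chosen only to illustrate the scaling: a conventional layer does work for every weight regardless of input, while an event-driven spiking layer only does work for inputs that actually fired.

```python
def dense_ops(n_inputs, n_outputs):
    # Conventional layer: every input contributes to every output,
    # so the cost is always n_inputs * n_outputs multiply-accumulates.
    return n_inputs * n_outputs

def spiking_ops(spikes, n_outputs):
    # Event-driven layer: only inputs that spiked (non-zero events)
    # trigger synaptic updates, so cost scales with activity.
    return sum(spikes) * n_outputs

spikes = [1, 0, 0, 0, 1, 0, 0, 0]  # 2 of 8 inputs active this timestep
print(dense_ops(len(spikes), 4))   # 32 operations regardless of input
print(spiking_ops(spikes, 4))      # 8 operations: work tracks activity
```

With realistic spike rates of a few percent, this activity-proportional cost is the basis of the efficiency claims made for spiking hardware.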

BrainChip's resulting NSoC (neuromorphic system-on-chip) has 1.2 million neurons with 10 billion synapses and is about 100 times more efficient than the Intel Loihi neuromorphic chip.

The BrainChip Akida NSoC. Image from BrainChip

BrainChip's products include all the common interfaces one would expect from an embedded coprocessor, such as PCIe, USB 3.0, SPI, UART, and I3S. The BrainChip NSoC, however, also has a conversion complex, which includes a pixel-to-spike converter as well as a generic data-to-spike converter for efficiently converting incoming data into a form compatible with the spiking neural network.

Qualcomm Snapdragon AI — Neural Network SoC

Qualcomm is another company that has recognized the importance of AI in embedded design. As a result, it has integrated dedicated hardware into its devices and released SDKs that make more efficient use of them. For example, the Snapdragon 855 includes a Qualcomm Kryo 485 CPU, a Qualcomm Adreno 640 GPU, and a Qualcomm Hexagon 690 DSP, which work together to execute AI neural nets more effectively.

The Qualcomm Snapdragon 855 SoC. Image from Qualcomm

The Snapdragon 845 SoC mobile platform from 2017 was already geared toward heavy data-processing tasks, but the 855 platform is reportedly specifically suited to mobile AI applications and is being touted as an important upgrade for 5G devices.


While AI hardware is still in its infancy, it is becoming commercially available, with the majority centered around convolutional neural networks, such as the Coral range of products and the Qualcomm Snapdragon range.

However, spiking neural networks (SNNs) could be the key to the future, especially in edge computing, if they really are more efficient in both execution time and energy consumption. On the other hand, their increased complexity may make them a more expensive option.

Either way, AI hardware is already here, and it is only a matter of time before even microcontrollers include some rudimentary AI engine or peripheral that can speed up trivial AI tasks.