AUSTIN, Texas — Arm sketched the inner workings of its machine-learning core at a press and analyst event here. Engineers are nearly finished with the design's RTL and hope to secure a commitment within weeks for use in a premium smartphone in 2019 or later.

Analysts generally praised the architecture as a flexible but late response to a market that is already crowded with dozens of rivals. However, Arm still needs to show detailed performance and area numbers for the core, which may not see first silicon until next year.

The first core is aimed at premium smartphones that are already using AI accelerator blocks from startup DeePhi (Samsung Galaxy), Cambricon (Huawei Kirin), and in-house designs (iPhone). The good news for Arm is that it’s already getting some commercial traction for the neural-networking software for its cores, released as open source, that sits under frameworks such as TensorFlow.


Winning the hearts and minds of software developers is increasingly key in getting design wins for hardware sockets, said Dennis Laudick, a vice president of marketing for Arm’s machine-learning group. He helped build partnerships around Arm’s Mali GPU cores, once a crowded market led by others but now dominated by Arm.

Long-term, deep-learning accelerators could be even more significant than graphics processors. “This is kind of the start of software 2.0,” said Laudick. “For a processor company, that is cool. But it will be a slow shift, there’s a lot of things to be worked out, and the software and hardware will move in steps.”

In a sign of Arm’s hunger to unseat its rivals in AI, the company has “gone further than we normally would, letting [potential smartphone customers] look under the hood” of the core’s design, he said.

At least one smartphone maker is already kicking the tires of the beta RTL. “A couple [of premium smartphone makers] aren’t interested [in the core], a couple are very interested, and a couple are somewhere in between,” said Laudick, adding that a production release of the RTL is on track for mid-year.

The first core targets 4.6 tera operations/second (TOPS) and 3 TOPS/W at 7 nm for high-end handsets. Arm plans simpler variants using less memory for mid-range phones, digital TVs, and other devices.
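A quick back-of-envelope check puts those targets in perspective. The sketch below uses only Arm's stated figures (4.6 TOPS peak, 3 TOPS/W); the implied power draw is derived arithmetic, not an Arm specification:

```python
# Back-of-envelope arithmetic on Arm's published targets for its first ML core.
# The peak-throughput and efficiency numbers are Arm's stated targets at 7 nm;
# the power figure below is simply derived from them, not an official spec.

peak_tops = 4.6               # tera-operations per second (Arm's stated target)
efficiency_tops_per_w = 3.0   # TOPS per watt (Arm's stated target)

# Peak throughput divided by efficiency gives the implied power envelope.
implied_power_w = peak_tops / efficiency_tops_per_w
print(f"Implied power at peak: {implied_power_w:.2f} W")  # ≈ 1.53 W
```

That roughly 1.5 W envelope is consistent with the thermal budget of a smartphone accelerator block, which is presumably why the first core targets high-end handsets rather than larger form factors.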

Theoretically, the design scales from 20 GOPS to 150 TOPS, but the demand for inference in the Internet of Things will pull it first to the low end. Arm is still debating whether it wants to design a core for the very different workloads of the data center that includes training.

“We are looking at [a data center core], but it’s a jump from here,” said Laudick, and it’s still early days for thoughts on a design specific to self-driving cars.

Arm’s ML core marries MACs, SRAM, and a streamlined controller on each of up to 16 slices. (Images: Arm)



