Back in September, Arm announced the new Cortex A76AE CPU with a focus on automotive applications. Today, Arm adds to its portfolio by formally announcing a next-gen processor with simultaneous multithreading, the new Cortex A65AE.

The previously announced A76AE CPU was Arm’s first to feature the “split-lock” technology, which lets two CPU cores operate in a configurable lock-step mode: the units process the same tasks in parallel and compare results for discrepancies, achieving the required level of functional safety for the software run on the system.

The Cortex A76AE is focused on heavy compute tasks and thus takes full advantage of the high performance delivered by Arm’s new microarchitecture from the Austin design centre. During the reveal of the A76AE, there were mentions of a “Helios” CPU core – until now it wasn’t clear what this was meant to be, but it very much seemed like a new class of core meant to accompany the A76AE.

The Arm Cortex A65AE

Today, Arm formally announces the new Cortex A65AE, and although today’s material isn’t a full technical disclosure of the new CPU core, it does finally shed a little light on what Helios is meant to be.

Much like during the Cortex A76AE release, Arm talked quite a bit about the needs of the automotive market and how cars are becoming increasingly demanding in terms of their need for compute power. Every single part of a car is becoming increasingly computerised, and advances in ADAS and future autonomous applications will explode the amount of processing power needed.

While the Cortex A76AE is focused on applications where high performance is needed, the Cortex A65AE is focused on high-throughput applications. The difference is, in a sense, the one between demanding single-threaded workloads and numerous, highly parallel multi-threaded workloads. For the latter scenario, Arm emphasises the requirements of sensor processing in autonomous driving. The number of sensors in a car is said to be increasing massively, and with it the need for higher-throughput processing power.

Arm’s First SMT CPU Microarchitecture

Today’s announcement is a bit of an odd one in that we're talking about something as important as Arm's first SMT microarchitecture as part of a more mundane automotive IP announcement, and yet the use-case presented here is the perfect fit for it. The Cortex A65AE is Arm’s first multi-threaded CPU core, allowing two threads to be executed per core. At the moment Arm is being very tight-lipped about the details of the microarchitecture, but they were able to share a bit of background on the core.

As we’ve explained in the past, Arm usually has three main design centres which design the Cortex-A lineup of cores: the Cambridge team (A53, A55), the Sophia-Antipolis team (A73, A75), and the Austin team (A57, A72, and the new A76 family). The most interesting aspect of the Cortex A65AE is its heritage: although it was initially started by the Cambridge team, it then became a joint project and was finally finished to production quality by Arm’s newest team in their Chandler design centre in Arizona, making this effectively the first project coming out of this new team.

The reason I dug into where the core came from is that it gives us greater perspective on what the microarchitecture might look like. Arm was able to disclose that this is indeed an out-of-order CPU core with SMT, but that’s about it in terms of what they were willing to reveal. The fact that the design started in Cambridge very much hints that this is somehow related to previous little cores such as the Cortex A53 and A55 – but the addition of OoO and SMT does make it seem more of a distant cousin than a successor.

The only performance figure publicised during the presentation is that the new CPU core is advertised as having 3.5x higher throughput than the prior-generation core in the same market segment – in this case a Cortex-A53. Arm usually makes performance projections based on the process node that an IP will typically be built on; in this case that would be 7nm. Even assuming a best-case scenario of 1.8-2x increased throughput through SMT, that still leaves quite a hefty difference that would have to be accounted for by frequency increases from the process node, or simply by IPC improvements to the microarchitecture.
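To make that arithmetic concrete, here is a rough Python sketch of how much of the claimed 3.5x would be left over once SMT is taken out. It assumes the individual gains multiply cleanly, which is my own simplification rather than anything Arm has stated, and `residual_speedup` is a hypothetical helper name.

```python
# Back-of-the-envelope split of Arm's claimed 3.5x throughput gain.
# Assumption (not from Arm): total gain = SMT gain * everything else.

CLAIMED_TOTAL = 3.5  # advertised throughput vs. Cortex-A53


def residual_speedup(total: float, smt_gain: float) -> float:
    """Return the gain left for frequency and IPC after removing SMT."""
    if smt_gain <= 0:
        raise ValueError("smt_gain must be positive")
    return total / smt_gain


if __name__ == "__main__":
    # Best-case SMT range quoted in the text: 1.8x to 2x.
    for smt in (1.8, 2.0):
        rest = residual_speedup(CLAIMED_TOTAL, smt)
        print(f"{smt:.1f}x SMT -> {rest:.2f}x from frequency and IPC")
    # prints:
    # 1.8x SMT -> 1.94x from frequency and IPC
    # 2.0x SMT -> 1.75x from frequency and IPC
```

Even under the most generous SMT assumption, roughly 1.75x would still have to come from clocks and IPC, which is why the core is unlikely to be a lightly modified A53 or A55.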

Again, the main benefit of the inclusion of SMT comes from the fact that in the primary automotive use-case of the Cortex A65AE, we’ll be seeing a load of sensors all communicating simultaneously to the central control unit of a car.

Arm’s SMT implementation also looks to be unique in terms of its functional safety features: much like “Split-Lock” mode on the Cortex A76AE, where two physical cores can operate in lock-step with each other, the Cortex A65AE can do this not only at the physical core level but also at the thread level. Here a Cortex A65AE core can effectively have two threads operate in lock-step on the same core with two other threads on a physical shadow core. The instruction stream and each instruction output are checked for discrepancies at the hardware level, all transparent to the operating software (obviously, in the case of a failure, an exception would be generated).
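As a purely conceptual illustration of the lock-step idea, the Python sketch below runs the same instruction stream on a "primary" and a "shadow" unit and compares the result after every instruction. This is a software toy model under my own assumptions, not Arm's hardware implementation; `run_lockstep`, `LockstepMismatch` and the `fault_step` fault-injection parameter are all hypothetical names.

```python
from typing import Callable, Optional, Sequence

# Each "instruction" is modelled as a function from old state to new state.
Instruction = Callable[[int], int]


class LockstepMismatch(Exception):
    """Raised when primary and shadow results diverge."""


def run_lockstep(program: Sequence[Instruction],
                 state: int = 0,
                 fault_step: Optional[int] = None) -> int:
    """Execute `program` on two redundant units, checking every step.

    `fault_step` optionally flips one bit in the primary's result at that
    step, to simulate a transient hardware error.
    """
    primary = shadow = state
    for step, instr in enumerate(program):
        primary = instr(primary)
        shadow = instr(shadow)
        if fault_step == step:
            primary ^= 1  # injected single-bit upset on the primary only
        if primary != shadow:
            # Real hardware would raise a fault to the system at this point.
            raise LockstepMismatch(f"divergence at instruction {step}")
    return primary


if __name__ == "__main__":
    program = [lambda x: x + 1, lambda x: x * 3, lambda x: x - 2]
    print(run_lockstep(program))  # fault-free run
    try:
        run_lockstep(program, fault_step=1)
    except LockstepMismatch as err:
        print("caught:", err)
```

The point of the model is that the software being run never sees the comparison; it only ever observes either a normal result or an exception.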

In a practical example of an envisioned system, we would see different clusters of Cortex cores dedicated to different workload tasks. In the above diagram, we would see multiple Cortex A65AE cores in a cluster operate independently in “Split” mode, maximising their throughput when working on sensor data collection.

The data processing would then be passed on to different clusters for perception and decision tasks: Here the cores would require higher levels of functional safety, and thus the CPU cores would be operating in lock-step mode. Arm also emphasised its flexibility in terms of the configuration of the Split-Lock layout of the hardware; it’s something that would be determined on a firmware level, and vendors would be able to reconfigure with a software update if they so wished.
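The throughput trade-off between the two modes can be sketched in a few lines of Python. The sketch assumes two threads per core (as stated for the A65AE) and that lock mode pairs cores one-to-one with a shadow; the `Cluster` class, the mode labels and the core counts are hypothetical and only for illustration.

```python
from dataclasses import dataclass

THREADS_PER_CORE = 2  # per the article: two threads per Cortex A65AE core


@dataclass(frozen=True)
class Cluster:
    name: str
    physical_cores: int
    mode: str  # "split" or "lock" (hypothetical labels)

    def usable_threads(self) -> int:
        """Threads visible to software in the configured mode."""
        if self.mode == "split":
            # Every core runs independently.
            return self.physical_cores * THREADS_PER_CORE
        if self.mode == "lock":
            # Assumption: half the cores act as shadows that only check.
            if self.physical_cores % 2:
                raise ValueError("lock mode needs an even number of cores")
            return (self.physical_cores // 2) * THREADS_PER_CORE
        raise ValueError(f"unknown mode: {self.mode}")


if __name__ == "__main__":
    for cluster in (Cluster("sensor-collection", 8, "split"),
                    Cluster("decision", 8, "lock")):
        print(cluster.name, cluster.mode, cluster.usable_threads())
```

Under these assumptions, the same eight cores expose 16 threads in split mode and 8 in lock mode, which is the throughput-versus-safety choice a vendor would be making in firmware.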

The Cortex A65AE is Arm’s second dedicated core meant for the automotive market. Beyond the key aspect that this is an ASIL D compliant microarchitecture, the most interesting aspect of today’s announcement is that it is a new microarchitecture that we have yet to see in Arm’s traditional mobile and embedded markets. It seems very much a derivative of Arm’s Cambridge line-up of small CPU cores, and today’s presentation does position the core as the more “traditional little core” alongside the bigger Cortex A76AE.

The Cortex A65AE is also Arm’s first SMT core, which is undoubtedly going to generate some talk among our readers. My view on this is still unchanged: SMT doesn’t make much sense in mobile workloads, as the key focus in this market is energy efficiency. From an electrical engineering perspective, an SMT core will never be more efficient than simply spreading workloads across more physical cores and clock gating functional blocks when they are underutilised.

Arm first let it slip that it was planning on introducing SMT during its Neoverse infrastructure IP announcement. There, SMT makes significantly more sense, as the workloads and throughput requirements are very different. If there is to be a traditional “Cortex-A65” non-AE variant of this core, it will be very interesting to see how Arm positions it and what markets it will target. For now, we’ll have to remain patient until further disclosures of the microarchitecture.

Arm envisions first silicon products with the Cortex A65AE in 2020.
