ARM is announcing a curious new core called the A12 today, a new 32-bit part for a 64-bit world. At first glance it looks quite out of place but once you think about it, the A12 has a large niche to fill.

The upcoming A12 is the spiritual successor to the A15 and ARM is claiming a 40% boost in speed over the A9. If you take a look at the chart below, you can see it is aimed at the meat of the market, between the high-end A57 and the low-end A53. Unlike those other two paired 64-bit parts, A12 is a 32-bit plus device, 40-bit PAE for 1TB of addressable memory to be exact, paired with a 32-bit ISA.

The future as seen by ARM

Looking at the positioning you have to question ARM’s decision to use a 32-bit ISA on the A12. The A57 above it and the A53 below are both 64-bit and bracket it for die area so why make the middle so memory limited? The response from ARM was a surprise but it did make sense. Traditionally in compute devices, a high-end device from the previous generation becomes the mid-range of the next generation. When the next generation comes out this cycle repeats.

Cell phones are changing the traditional paradigm though. Users don’t want the older devices; they want the latest and greatest without sacrificing anything. Waterfalling an older A15 or first generation A57 down to the mid-range devices would take more area while offering fewer features than a new core designed for the market. If you understand how silicon is designed and manufactured, this strategy has merit. You can usually do more with less area in a newer core, although it isn’t always a clear win.

ARM obviously ran the numbers and decided that a 32-bit core with a refreshed feature set would satisfy the market better than a 64-bit one. The area that 64-bit support would have taken is put to use for single threaded performance and features instead. Given the common apps on a phone and tablet you could make a case for raw single threaded performance being more of a win than 64 bits, but the A53 below it does make it a much harder sell than it would otherwise be. Color us not completely convinced but willing to listen when the details are released.

A vague view of the core

ARM didn’t have many details to give out on A12 for the time being, but there was enough data to get a clear picture of what they are aiming for. The execution units are dual issue with OoO execution and an 11-stage pipeline. Since it is a superset of the A15 ISA it has NEON and a full FPU, virtualization, and TrustZone capabilities. It will also pair with an A7 in a big.LITTLE configuration, and this more than anything tells you what markets ARM is aiming for.

On the memory side ARM claims they optimized the L1 and L2 sub-systems, hopefully fixing a notable weak point of current cores. LPAE is also extended from 36 bits to 40 bits for 1TB of addressable memory, but the 32-bit ISA limits each process to 32 bits, or 4GB, of address space. Going above 36 bits seems more theoretically useful than practically useful.
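For anyone who wants to check the address-space figures quoted above, here is a minimal Python sketch of the arithmetic. The helper names (`addressable_bytes`, `human`) are made up for illustration and do not come from ARM. The sizes are reported in binary units, so the article's 1TB and 4GB appear as 1 TiB and 4 GiB.

```python
# Address-space sizes implied by the bit widths quoted in the text.
# Helper names are hypothetical; this is plain arithmetic, not an ARM API.

def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits


def human(n: int) -> str:
    """Format a power-of-two byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


for label, bits in [
    ("per-process limit with a 32-bit ISA", 32),
    ("36-bit physical addressing", 36),
    ("40-bit LPAE physical addressing", 40),
]:
    print(f"{label}: {human(addressable_bytes(bits))}")
```

The gap between the 32-bit and 40-bit lines is why the extra physical address bits only help a system running many processes, not any single one of them.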

On the interface side we come to a few old favorites: a 128-bit AMBA bus for CPUs and GPUs, an Accelerator Coherency Port (ACP) for IO DMA, and a Peripheral Port for low-latency IO. All but the Peripheral Port are coherent, so expect some interesting co-processors to be available with this generation, GPUs being the least of them.

An equally vague Mali T622 block diagram

Speaking of GPUs, ARM is introducing a new one this time too, the Mali T622. Once again ARM isn’t giving out many details but is claiming a 50% improvement in energy efficiency over the Mali T644. The GPU supports Renderscript Compute and OpenCL 1.1 Full Profile, but for a 2015 time frame product you would hope for 1.2 if not 2.0 by then. That is counterbalanced by OpenGL ES 3.0, a nice touch that, along with the OpenCL version, screams “I am not a high-end part”. Luckily ARM doesn’t pretend that it is a high-end part, just a mid-range device that is better than the existing mid-range devices.

Mali V500 in equally vague form

One really new core is the Mali V500 video engine. ARM hasn’t done a video encode or decode engine prior to this so there isn’t anything to compare it to, but there are many other IP houses offering video cores for this class of device. They claim that a single V500 core will encode or decode 1080p60 video and that it will scale up to 8 cores in a device. The high-end V500 is said to do 4K120 video, or enough pixels to see more detail than you want to on most movie stars.
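The 1080p60-per-core and 4K120 claims above line up neatly if you do the pixel math. The Python sketch below is a back-of-the-envelope check and makes two assumptions that ARM has not confirmed: that "4K" means 3840x2160 UHD, and that throughput scales linearly with core count. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope pixel throughput check for the V500 claims.
# Assumptions (not confirmed by ARM): "4K" = 3840x2160, and throughput
# scales linearly with the number of cores.

def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second for a given resolution and frame rate."""
    return width * height * fps


PER_CORE_RATE = pixel_rate(1920, 1080, 60)   # claimed single-core 1080p60
MAX_CORES = 8                                # claimed maximum core count
TARGET_RATE = pixel_rate(3840, 2160, 120)    # assumed meaning of "4K120"

# Ceiling division: whole cores needed to reach the target rate.
cores_needed = -(-TARGET_RATE // PER_CORE_RATE)

print(f"per core: {PER_CORE_RATE:,} pixels/s")
print(f"4K120:    {TARGET_RATE:,} pixels/s")
print(f"cores needed: {cores_needed} of {MAX_CORES} available")
```

Under those assumptions 4K120 is exactly eight times the 1080p60 pixel rate, which matches the 8-core ceiling and leaves no headroom.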

At GDC ARM was talking about AFBC (ARM Frame Buffer Compression), which they claim saves 50% memory bandwidth with fewer losses than most traditional methods. The demos that were shown looked decent enough but as usual the proof will be in independent testing. That said, it looks good. One other feature that ARM is quick to point out is that a V500 combined with TrustZone gives a nice secure video path for DRM infections to travel. Studios like this, users don’t, but it will be there in V500s no matter what.

A lot of people don’t know much about how ARM goes from IP to devices, and that is where the Artisan IP division comes in. Artisan basically offers pre-rolled cores and modules optimized for a particular foundry. For A12 Artisan has two partners, GlobalFoundries on GF28-SLP and TSMC on their 28HPM process. TSMC also has a Mali T622 on the same process and others are sure to follow. The idea is that you license these cores for a foundry and in theory it just works; all the hard work has been done and everything is pre-debugged. If you are wondering how all these no-name CPU vendors are popping up out of nowhere, Artisan can be a big part of the process.

In the end, ARM has a new core called A12 to replace the aging A9 in 2015. The A57 is the official successor to the A15 so on paper the A12 is not that, but since it is a superset of the A15 and pairs with the A7, let’s just say a certain song comes to mind. The idea of trading off power, area, and 64-bit functionality to get a smaller, faster, more efficient A15 is an interesting one. As long as people don’t require 64-bit software by the time this core comes out, it has a solid market. If they do, it will just be a more economical A9/A15 for whatever uses those cores are still apt for. It will be interesting to see how this plays out. S|A