Today’s premium smartphones and tablets are pushing the limits of small form factor graphics processing units (GPUs), boasting console-quality graphics at display resolutions greater than most living room TVs. But it’s not just the high-end mobile space that requires dedicated graphics hardware these days. Growing markets for smartwatches and compact Smart-TV boxes also make use of GPUs. One of the most prevalent mobile GPU ranges is ARM’s Mali, and we were fortunate enough to be given a closer look at the future plans for the Mali GPU range at ARM’s Tech Day 2015 last week.

Most recently, ARM announced its energy-efficient Mali-T880 and T860 for high-end mobile devices, and its T820 and T830 designs for cost-efficient implementations. The T880 boasts 1.8 times the peak performance of its Mali-T760 design, along with a 40 percent reduction in energy for the same workloads and support for ultra-high resolution 4K content.

At the low end, which is typically bound by silicon costs, the T830 and T820 aim to reduce die area by up to 50 percent over the T622, offer scaling for a variety of applications, and still support up-to-date graphics and compute APIs, such as OpenGL ES 3.1 and Microsoft’s DirectX 11.1. In fact, the Mali-T820 is now the smallest OpenGL ES 3.0 compliant design that ARM has.

Despite the introduction of new GPU designs, legacy chips like the Mali-450 are still well suited to less performance demanding applications such as wearables. With support well established, this design could stick around for quite a while. ARM hasn’t ruled out a modified Mali-450 design for low power wearables either, if OEMs demand it.

Midgard Architecture overview

ARM’s latest designs are still all built on its Midgard Tri-pipe architecture, which houses most but not all of the key GPU components inside the “shader core”, allowing performance to be scaled by simply adjusting the number of cores. Most other GPU designs do not scale in this way, but this approach allows ARM to target a range of use cases with quite similar designs.

At the high end, the Mali-T880 features 3 ALUs per shader core, compared with the T860 and T760’s 2 ALUs per core, along with the load/store and texture units. This extra ALU offers up to a 50 percent improvement in compute performance per core. Both the T880 and T860 designs can be scaled from a single core up to 16 coherent cores, depending upon the level of performance required.

With mobile, the biggest limiting factors to performance and power come from the memory. Quite simply, the bandwidth available is much lower than in console or desktop graphics equivalents, meaning that performance can be bottlenecked by memory. To overcome this problem, ARM makes use of ASTC, AFBC, Smart Composition, and Transaction Elimination techniques, optimizes its architecture for common workloads such as user interface tasks, and tries to cut down the number of memory transactions by packing more useful information into each one. This is also why ARM implements tile-based rendering, as the active tile of the frame is kept in local memory as long as possible, rather than being pushed to slower main memory.

Jargon Buster: ALU – Arithmetic logic units are digital circuits used to perform integer math and bitwise logic.



Tiled Rendering – breaks a scene down into smaller tiles, which can then be rendered separately to on-chip memory.

Transaction Elimination – reduces processing by skipping duplicate tiles from the previous frame.

AFBC – ARM Frame Buffer Compression saves on memory bandwidth by storing a frame using lossless compression.
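
To make the tiling and Transaction Elimination entries above more concrete, here is a minimal Python sketch of the general idea: split a frame into tiles, keep a signature per tile, and only write out the tiles whose signature changed since the previous frame. This is an illustration, not ARM’s implementation. The 16-pixel tile size, the use of CRC32 as the signature, and the names `split_into_tiles` and `write_frame` are all assumptions made for the example.

```python
import zlib

TILE = 16  # tile edge in pixels; an assumed value for illustration only


def split_into_tiles(frame, width, height, tile=TILE):
    """Yield ((x, y), tile_bytes) for each tile of a row-major, 8-bit frame."""
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            rows = [
                frame[y * width + tx : y * width + min(tx + tile, width)]
                for y in range(ty, min(ty + tile, height))
            ]
            yield (tx, ty), b"".join(rows)


def write_frame(frame, width, height, signatures):
    """Return the tile positions that actually need writing to memory.

    `signatures` maps tile position -> signature from the previous frame and
    is updated in place. A tile with an unchanged signature is skipped.
    CRC32 is an assumed stand-in for whatever signature hardware would use.
    """
    written = []
    for pos, data in split_into_tiles(frame, width, height):
        sig = zlib.crc32(data)
        if signatures.get(pos) != sig:
            signatures[pos] = sig
            written.append(pos)
    return written


# Hypothetical usage: a blank 64x32 frame, then the same frame with one pixel changed.
width, height = 64, 32
frame_a = bytes(width * height)
frame_b = bytearray(frame_a)
frame_b[5 * width + 40] = 255  # change the pixel at x=40, y=5

signatures = {}
first = write_frame(frame_a, width, height, signatures)
second = write_frame(bytes(frame_b), width, height, signatures)
print(len(first), second)
```

In this toy run the first frame writes all eight tiles, while the second writes only the single tile containing the changed pixel, which is where the bandwidth saving comes from on mostly static content such as a user interface.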

Not only that, but constantly writing to and reading from memory is a power-hungry task, consuming somewhere around 100 mW of power for 1Gbps of bandwidth with LPDDR4. Instead, ARM suggests that silicon manufacturers spend a little more die space on cache to reduce power consumption and help keep as much data as possible on the GPU.
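
As a rough back-of-the-envelope illustration of why this matters, the sketch below applies the quoted figure of roughly 100 mW per 1Gbps to the traffic from writing out an uncompressed framebuffer. It assumes 1Gbps means 10^9 bits per second, 4 bytes per pixel, 60 frames per second, and one write per pixel per frame; it ignores reads, texture fetches, and any compression, so treat the result as an order-of-magnitude estimate only. The function names are hypothetical.

```python
MW_PER_GBPS = 100.0  # approximate figure quoted in the article for LPDDR4


def framebuffer_gbps(width, height, bytes_per_pixel=4, fps=60):
    """Bandwidth in Gbps (10^9 bits/s) to write each pixel once per frame."""
    bits_per_second = width * height * bytes_per_pixel * 8 * fps
    return bits_per_second / 1e9


def memory_power_mw(gbps, mw_per_gbps=MW_PER_GBPS):
    """Estimated memory power in milliwatts at the assumed cost per Gbps."""
    return gbps * mw_per_gbps


for name, (w, h) in {"1080p": (1920, 1080), "4K UHD": (3840, 2160)}.items():
    gbps = framebuffer_gbps(w, h)
    print(f"{name}: {gbps:.2f} Gbps, about {memory_power_mw(gbps):.0f} mW")
```

Under these assumptions a 4K frame costs four times as much as a 1080p frame, which is why techniques that avoid or compress framebuffer writes become more valuable as resolutions climb.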

Speaking of power, ARM has also done a lot of work to optimize its latest graphics processors for energy efficiency while performing the most common tasks. Most of this falls under pushing pixels around as the user moves through the UI, which, believe it or not, requires graphics processing. Those smooth homescreen transitions aren’t free.

The lower-end T830 and T820 inherit many of these high-end features, but the pipelines with scalar units have been removed from the ALU. The T830 features 2 ALUs per core, while the T820 features just one, and both can be scaled up to 4 shader cores.
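
The per-core ALU counts and maximum core counts for the four new designs can be tabulated in a few lines of Python. This is only a sketch: it uses 3 ALUs per core for the T880, 2 for the T860 and T830, and 1 for the T820, and it treats total ALU count as a crude proxy for peak compute, which ignores clock speed, the differences between high-end and low-end ALUs, and memory limits. The helper names are hypothetical.

```python
ALUS_PER_CORE = {"T880": 3, "T860": 2, "T830": 2, "T820": 1}
MAX_CORES = {"T880": 16, "T860": 16, "T830": 4, "T820": 4}


def total_alus(model, cores):
    """Total ALUs for a given configuration, checking the core-count limit."""
    if not 1 <= cores <= MAX_CORES[model]:
        raise ValueError(f"{model} supports 1 to {MAX_CORES[model]} cores")
    return ALUS_PER_CORE[model] * cores


def relative_alu_gain(new, old):
    """Fractional increase in ALUs per core when moving from `old` to `new`."""
    return ALUS_PER_CORE[new] / ALUS_PER_CORE[old] - 1.0


for model in ALUS_PER_CORE:
    top = MAX_CORES[model]
    print(f"Mali-{model}: up to {top} cores, {total_alus(model, top)} ALUs")

print(f"T880 vs T860 per core: +{relative_alu_gain('T880', 'T860'):.0%}")
```

The final line reproduces the "up to 50 percent" per-core figure: going from 2 to 3 ALUs is a 50 percent increase in arithmetic units, assuming everything else stays equal.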

Much like the new ARM Cortex-A72 CPU, the latest iteration of Mali is clearly focused on energy efficiency and extracting more performance whilst sticking within the tight power and thermal constraints of mobile platforms. By reducing memory and power requirements, silicon partners should be free to pack in additional GPU cores and thereby increase performance over previous generations.

The future of Mali

On the manufacturing side, the move to 16nm FinFET processes is also sure to result in decent gains for GPU designs. With power consumption and design sizes both shrinking, ARM’s high-end silicon partners will be able to squeeze additional shader cores into their SoC designs, as we have already seen with Samsung’s eight Mali-T760 core 14nm Exynos 7420. In the lower cost market, GPUs with smaller footprints could be used either to increase the core count or to save on increasingly expensive silicon.

We’ve also previously covered the need for additional memory bandwidth for high resolution cameras and displays, but this extra bandwidth and its associated power consumption could be a big drain on our batteries. ARM’s memory saving techniques and general optimizations could therefore pay dividends as mobile markets push towards even higher resolution content.

With ARM offering complete POP-IP packages already designed for 16nm FinFET manufacturing, we could well see some more energy efficient and powerful Mali-based SoCs hit the market around the turn of 2016.