Exciting times for graphics on iOS 8!

At its recent Worldwide Developers Conference, Apple introduced Metal, a new graphics API that’s low-overhead, highly efficient, and designed specifically for the A7 chip. It provides a way for game makers to take full advantage of iOS hardware and achieve far greater realism, detail, and interactivity in their games than ever before.

We’ll be adding support for Metal soon, but in the meantime we wanted to take you through the new technology and explain why it is such a big deal.

Metal at a glance

Metal has several key ideas in it that enable lower overhead, more predictable performance and better programmability:

Create and validate as much state up-front as possible. Shaders can be compiled and partially optimized offline. Everything related to rendering pipeline state (shaders, vertex layout, blending modes, render target formats, etc.) can be created and validated before rendering even starts. This means no more state checks on every draw call, and a lot of CPU processing power freed up.

Enable much more versatile multi-threading. Resources can be created from any thread and there are several ways to prepare draw call submission from multiple threads in parallel.

All iOS devices have shared memory for CPU & GPU. There’s no need to pretend that data from the CPU has to be “copied” into some video memory anymore. When you create a buffer, you just get a pointer to it, and that’s the same memory that the GPU sees.

Let the user (engine) handle synchronization. OpenGL ES has to jump through lots of hoops and do lots of guesswork in order to behave correctly in every imaginable scenario. In Metal, synchronization of data between CPU & GPU is the user’s responsibility. The engine has much better knowledge of what it is trying to do, after all!

All GPUs in iOS devices use a Tile-Based Deferred Rendering architecture. This is explicitly reflected in the Metal API, particularly when it comes to render targets. The API does not try to guess anything anymore – all framebuffer-related actions, such as tile loads & stores and anti-aliasing resolves, are done explicitly.

All the points above translate to much lower CPU overhead and much more predictable performance.

A new C/C++11-based language is introduced for both graphics & compute shaders. This also means that iOS can do compute shaders, atomics, arbitrary buffer writes and similar fancy sounding tricks on the GPU now.

No legacy baggage: the API is very simple & streamlined. Oh, and it also has a super-helpful optional “debug layer” that does extra validation and notifies you of any errors or mistakes you make.

Now let’s go into more detail!

The Draw Call Problem

If you’re making games, particularly mobile games, you’re probably aware of The Draw Call Problem. Each and every object that is rendered in the game has some CPU cost, and realistically on mobile right now you cannot afford more than a few hundred objects being rendered. In a real game, you also very much want to use the CPU for other things – gameplay logic, physics, AI, animations and so on. Unity has some measures to minimize the number of draw calls being made – static & dynamic batching, occlusion culling, LOD and distance-based layer culling; you can also merge nearby objects together and put textures into atlases to reduce the number of materials.

A good question is: why does there have to be a CPU cost to render something? After all, it’s the GPU that is doing the actual work.

Some of the overhead is on “the engine” side – the CPU has to iterate over visible objects, figure out which shader passes need to be rendered, which lights affect which objects, which material parameters to apply and so on. Some of that is cached; some of that is multi-threaded; and generally this is platform-independent code. In each Unity release, we try to optimize this part, and Metal generally does not affect it.

However, the other part of the CPU overhead is in the “graphics API & driver” part. Depending on the game, this part can be significant. Metal is an attempt to make this part virtually go away, by being a much better match for modern hardware, somewhat lower level, and doing massively less guesswork than OpenGL ES used to do. Up-front rendering state creation & validation; explicit actions related to render target loads & stores; no synchronization dances done on the API side – all these things contribute to much lower CPU overhead.
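To make the “validate up-front” idea concrete, here is a minimal C++ sketch. It is not Metal API code and not how Unity implements anything; the names PipelineDesc, PipelineState and Encoder, as well as the specific validation rules, are hypothetical and only there for illustration. The point it tries to show is that all the expensive checks run once when the state object is created, so the per-draw path has nothing left to verify (apart from an optional debug-only assert, loosely in the spirit of a debug layer).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

enum class BlendMode { Opaque, Alpha, Additive };
enum class PixelFormat { RGBA8, BGRA8, Depth32F };

// Hypothetical description of everything a draw call depends on.
struct PipelineDesc {
    std::string vertexShader;    // names of precompiled shader functions
    std::string fragmentShader;
    int vertexStride = 0;        // bytes per vertex
    BlendMode blend = BlendMode::Opaque;
    PixelFormat colorFormat = PixelFormat::RGBA8;
};

// Immutable once constructed: holding a PipelineState means the checks passed.
class PipelineState {
public:
    // All validation happens here, once, before any rendering starts.
    // The rules below are illustrative assumptions, not real API rules.
    explicit PipelineState(PipelineDesc desc) : desc_(std::move(desc)) {
        if (desc_.vertexShader.empty() || desc_.fragmentShader.empty())
            throw std::invalid_argument("pipeline needs both shaders");
        if (desc_.vertexStride <= 0 || desc_.vertexStride % 4 != 0)
            throw std::invalid_argument("vertex stride must be a positive multiple of 4");
        if (desc_.colorFormat == PixelFormat::Depth32F)
            throw std::invalid_argument("color target cannot use a depth format");
        ++validationCount();
    }

    const PipelineDesc& desc() const { return desc_; }

    // Counts how many times the expensive validation ran (for demonstration).
    static int& validationCount() { static int count = 0; return count; }

private:
    PipelineDesc desc_;
};

// Hypothetical command encoder: the hot path does no state validation.
struct Encoder {
    const PipelineState* bound = nullptr;
    int drawCalls = 0;
    int vertices = 0;

    void setPipeline(const PipelineState& pipeline) { bound = &pipeline; }

    void draw(int vertexCount) {
        assert(bound != nullptr);  // debug-only sanity check, compiled out in release
        ++drawCalls;
        vertices += vertexCount;
    }
};
```

With this shape, creating one PipelineState and issuing a thousand draw calls through an Encoder runs the validation exactly once, while an invalid description fails at creation time rather than somewhere in the middle of a frame.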

Based on our testing so far, we have seen API+driver overhead shrink to just a few percent of CPU time. That is a tremendous improvement compared to the 15-40% of CPU time that it used to take! That means the majority of the remaining CPU overhead is in our own code now. I guess we’ll have to continue optimizing that :)

We’re also looking forward to using Metal’s ability to do rendering submissions from multiple threads; this opens up very interesting optimization opportunities as well.
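As a rough illustration of what “preparing submissions from multiple threads” can look like, here is a small C++ sketch using plain std::thread. It is a generic pattern, not Metal API code and not a description of Unity’s renderer; DrawCommand, CommandList, recordInParallel and submit are hypothetical names. Each worker records commands for its own slice of the visible objects into its own list, so no locking is needed while recording, and a single thread then submits the lists in a fixed order.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical recorded command: just remembers which object to draw.
struct DrawCommand { int objectId; };
using CommandList = std::vector<DrawCommand>;

// Each worker records draws for one contiguous slice of the visible objects
// into its own command list, so recording needs no synchronization.
std::vector<CommandList> recordInParallel(const std::vector<int>& visibleObjects,
                                          std::size_t workerCount) {
    std::vector<CommandList> lists(workerCount);
    std::vector<std::thread> workers;
    const std::size_t n = visibleObjects.size();

    for (std::size_t w = 0; w < workerCount; ++w) {
        const std::size_t begin = n * w / workerCount;
        const std::size_t end = n * (w + 1) / workerCount;
        workers.emplace_back([&lists, &visibleObjects, w, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                lists[w].push_back(DrawCommand{visibleObjects[i]});
        });
    }
    for (auto& t : workers) t.join();  // wait until every list is complete
    return lists;
}

// Submission happens on one thread and walks the lists in worker order,
// so the final draw order is deterministic regardless of thread timing.
std::vector<int> submit(const std::vector<CommandList>& lists) {
    std::vector<int> submitted;
    for (const auto& list : lists)
        for (const auto& cmd : list)
            submitted.push_back(cmd.objectId);
    return submitted;
}
```

Because every worker owns its list and the slices do not overlap, the recorded result is the same as a single-threaded pass over the objects; only the recording work is spread across cores.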

The Compute Opportunity

With Metal, the GPU can be used for doing computation outside of the typical vertex and fragment shader pipeline, using what are known as “compute shaders”. Basically, this is the ability to run any kind of “parallel computation” on the many little processors inside a GPU. Compute shaders also have a concept of “local storage” – a very fast piece of dedicated on-GPU memory that can be used to share data between these parallel work items. This particular piece of memory enables using the GPU for things that aren’t easily expressible in ye olde vertex and fragment shaders.
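To give a feel for why local storage matters, here is a CPU-side C++ simulation of a classic pattern: a group of work items summing values through a shared scratch buffer. This is deliberately not Metal shading language code; reduceGroup and reduceAll are hypothetical names, the loops over “items” stand in for work items that would run in parallel on a GPU, and the comments mark where a real compute shader would need a barrier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simulates one work group summing up to groupSize values via "local storage".
// groupSize is assumed to be a power of two.
float reduceGroup(const float* input, std::size_t count, std::size_t groupSize) {
    assert(groupSize > 0 && (groupSize & (groupSize - 1)) == 0);
    std::vector<float> local(groupSize, 0.0f);  // stands in for on-GPU local storage

    // Step 1: every work item loads one element (or zero past the end).
    for (std::size_t item = 0; item < groupSize; ++item)
        local[item] = item < count ? input[item] : 0.0f;
    // (a real compute shader would need a barrier here)

    // Step 2: each round, the first `stride` items add in a partner's value,
    // halving the number of active items until one value remains.
    for (std::size_t stride = groupSize / 2; stride > 0; stride /= 2) {
        for (std::size_t item = 0; item < stride; ++item)
            local[item] += local[item + stride];
        // (and another barrier here, before the next round)
    }
    return local[0];
}

// Splits the data into groups and adds up the per-group results.
float reduceAll(const std::vector<float>& data, std::size_t groupSize) {
    float total = 0.0f;
    for (std::size_t base = 0; base < data.size(); base += groupSize) {
        const std::size_t count = std::min(groupSize, data.size() - base);
        total += reduceGroup(data.data() + base, count, groupSize);
    }
    return total;
}
```

The work items cooperate only through the shared buffer, which is the kind of data sharing that has no natural equivalent in a vertex or fragment shader.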

There are tons of interesting areas to use compute shaders for — optimized post-processing effects, particle systems, shadow and light culling and so on.

While we aren’t using compute shaders much in Unity just yet, we’re looking forward to using them for more and more stuff. Exciting times ahead!

FAQ

When can I get this?

We can’t wait to ship this, but would like to avoid promising any actual dates. We have done a lot already, but some things still remain before it is “shippable”. Our current plan is to first integrate all of the bits of Metal that provide the huge boosts to CPU-side performance, hopefully in Unity 5.0. Later on, we’ll add compute shader support (which is somewhat more involved on our side).

What would be the platform requirements?

Metal requires iOS 8 and an A7-based device (iPhone 5S, iPad Air, Retina iPad Mini).

What would I have to do to take advantage of Metal’s lower CPU overhead?

Generally, nothing. Once we add support for Metal in Unity, using it should be very transparent. All your existing projects, all your shaders and graphics effects should just work. Just enjoy your lower CPU usage!

But what about shaders, since Metal has a different shading language?

We’ll take care of that. Right now you generally write shaders in Cg/HLSL, which we convert into GLSL for OpenGL ES behind the scenes. For Metal, we’ll convert them in a very similar way.

What can I do with lower CPU overhead, again?

Have better physics, AI or more complex gameplay logic. Put more objects on the screen. Or just enjoy lower battery usage. It’s all up to you!