The following blog post, unless otherwise noted, was written by a member of Gamasutras community.

The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.

A smooth gameplay is built upon the foundations of a smooth frame rate and hitting the 60 frames per second target on the standard iPhone and iPad devices was a significant goal during the development of our upcoming action platformer game, Shadow Blade. (http://shadowblade.deadmage.com)

The following is a summary from the things we had to consider and change in the game in order to increase the performance and reach the targeted frame rate during the intense optimization sessions.

Once the basic game functionalities were in place, it was time to make sure the game performance would meet its target. Our main tool for measuring the performance was the built-in Unity profiler and the Xcode profiling tools. Being able to profile the running code on the device using the Unity profiler proved to be an invaluable feature.

So here goes our summary and what we learned about the results of this intense measuring, tweaking and re-measuring journey which paid out well at the end and resulted in a fixed 60fps for our target devices.

1 – Head to head with a ferocious monster called the Garbage Collector.

Coming from a C/C++ game programming background, we were not used to the specific behaviors of the garbage collector. Making sure your unused memory is cleaned up automatically for you is nice at first but soon the reality kicks in and you witness regular spikes in your profiler showing the CPU load caused by the garbage collector doing what it is supposed to do, collecting the garbage memory. This proved to be a huge issue specifically for the mobile devices. Chasing down memory allocations and trying to eliminate them became priority number one and here are some of the main actions we took:

Remove any string concatenation in code since this leaves a lot of garbage for the GC to collect. Replace the “foreach” loops with simple “for” loops. For some reason, every iteration of every “foreach” loop generated 24 Bytes of garbage memory. A simple loop iterating 10 times left 240 Bytes of memory ready to be collected which was just unacceptable Replace the way we checked for game object tags. Instead of “if (go.tag == “Enemy”)” we used “if (go.CompareTag (“Enemy”)”. Calling the tag property on an object allocates and copies additional memory and this is really bad if such a check resides in an inner loop. Object pools are great, we made and used pools for all dynamic game objects so that nothing is ever allocated dynamically during the game runtime in the middle of the levels and everything is recycled back to the pool when not needed. Not using LINQ commands since they tended to allocate intermediate buffers, food for the GC.

2 – Careful with the communication overhead between high level scripts and native engine C++ code.

All gameplay code written for a game using Unity3D is script code which in our case was C# that was handled using the Mono runtime. Any requirements to communicate with the engine data would require a call into the native engine code from the high level scripting language. This of course has its own overhead and trying to reduce such calls in game code was the second priority.

Moving objects around in the scene requires calls from the script code to the engine code and we ended up caching the transformation requirements for an object during a frame in the gameplay code and sending the request to the engine only once to reduce the call overhead. This pattern was used in other similar places other than the needs to move and rotate an object. Caching references to components locally would eliminate the need to fetch a component reference using the “GetComponent” method on a game object every time which is another example for a call into the native engine code.

3 – Physics, Physics and more Physics.

Setting the physics simulation timestep to the minimum possible. For our case we could not set it lower than 16 milliseconds. Reducing calls to character controller move commands. Moving the character controller happens synchronously and every call can have a significant performance cost. What we did was to cache the movement requests per frame and apply them only once. Modifying code to not rely on the “ControllerColliderHit” callbacks. It proved that these callbacks are not handled very quickly. Replacing the physics cloth with a skinned mesh for the weaker devices. The cloth parameters can play important roles in performance also and it pays off to spend some time to find the appropriate balance between aesthetics and performance. Ragdolls were disabled so that they were not part of the physics simulation loop and only enabled when necessary. “OnInside” callbacks of the triggers need to be assessed carefully and in our case we tried to model the logic without relying on them if possible. Layers instead of tags! Layers and tags can be assigned to objects easily and used for querying specific objects, however, layers have a definite advantage at least performance wise when it comes to working with collision logic. Quicker physics calculations and less unwanted newly allocated memory are the basic reasons. Mesh colliders are definitely a no-no. Minimize collision detection requests like ray casts and sphere checks in general and try to get as much information from each check.

4 – Let's make the AI code faster!

We use artificial intelligence for the enemies that try to block our main ninja hero and fight with him. The following topics needed to be covered regarding AI performance issues:

A lot of physical queries are generated from AI logic like visibility checks. The AI update loop could be set to something much lower than the graphics update loop to reduce CPU load.

5 – Best performance is achieved from no code at ALL!

When nothing happens, performance is good. This was the base philosophy for us to try and turn anything not necessary at the moment off. Our game is a side scroller action game and so a lot of the dynamic level objects can be turned off when they are not visible in the scene.

Enemy AI was turned off when far away using a custom level of detail scheme. Moving platforms and hazards and their physics colliders were turned off when far away. Built in Unity “animation culling” system was used to turn off animations on objects not being rendered. Same disabling mechanism used for all in level particle systems.

6 – Callback! How about empty callbacks?

The Unity callbacks needed to be reduced as much as possible. Even the empty callbacks had performance penalties. There is no reason for having empty callbacks but they just get left in the code base sometimes in between a lot of code rewrite and refactoring.

7 – The mighty Artists to the rescue.

Artists can always magically help out the hair-pulling programmer trying to go for a few more frames per second.

Sharing materials for game objects and making them static in Unity causes them to be batched together and the resulting reduced draw calls are critical for good mobile performance. Texture atlases helped a lot especially for the UI elements. Square textures and power of two with proper compression was a must. Being a side-scroller enabled our artists to remove all far background meshes and convert them to simple 2D planes instead. Light maps were highly valuable. Our artists removed extra vertices during a few passes. Proper texture mip levels were a good decision especially for having a good frame rate on devices with different resolutions. Combining meshes was another performance friendly action by the artists. Our animator tried to share animations between different characters if it was possible. A lot of iterations on the particles were necessary to find the aesthetic/performance balance. Reducing number of emitters and trying to reduce transparency requirements were among the major challenges.

8 – The memory usage needs to be reduced, now!

Using a lot of memory of course has negative performance related effects but in our case we experienced a lot of crashes on iPods due to exceeding memory limits which was a much more critical problem. The biggest memory consumers in our game were the textures.

Different texture sizes were used for different devices, especially textures used in UI and large backgrounds. Shadow Blade uses a universal build but different assets get loaded when the device size and resolution is detected upon startup. We needed to make sure un-used assets were not loaded in memory. We had to find out a little late in the project that any asset that was only referenced by an instance of a prefab and never instantiated was fully loaded in memory. Stripping out extra polygons from meshes helped. We needed to re-architect the lifecycle management of some assets a few times. For example tweaking the load/unload time for the main menu assets or end of level assets or game music. Each level needed to have its specific object pool tailored to its dynamic object requirements and optimized for the least memory needs. Object pools can be flexible and contain a lot of objects during development, however, they need to be specific once the game object requirements are known. Keeping the sound files compressed in memory was necessary.

Game performance enhancement is a long and challenging journey and we had a fun time experiencing a small part of this voyage. The vast amount of knowledge shared by the game development community and very good profiling tools provided by Unity were what made us reach our performance targets for Shadow Blade.

Here is the game trailer for our game, Shadow Blade:

http://youtu.be/tgSXLVAwZJs

Game website: shadowblade.deadmage.com