During Ludum Dare #34 we made a game where you play as the mayor of a procedurally generated town.

You could just play it here, or read on as I discuss in more depth the issues we faced with draw calls and SkinnedMeshRenderers in Unity (and then play it).

Introduction

“Premature optimization is the root of all evil” – Donald Knuth

In the early stages of development I pay little heed to the cost of any algorithms and whatnot. I tend to go for the simplest solution to any given problem and fix up what needs optimizing later. Doing this speeds up development immensely and results in much cleaner code. At the end of development we found that the graphical assets were the only bottleneck in our game, and optimizing that bit would prove to be quite difficult. This blog post details the steps I took to optimize the rendering, covering each iteration from start to finish.

First iteration – Using the non-animated mesh

The most obvious problem is the sheer number of meshes that keep getting added as the game progresses. Each city block contains roughly 15 meshes. As the number of city blocks reaches the hundreds, the number of draw calls becomes too much for most hardware to handle. Unity is able to batch meshes together, resulting in fewer draw calls. But all the meshes are animated, so they use the SkinnedMeshRenderer component rather than the MeshRenderer component. Unity can’t batch SkinnedMeshRenderers, so I needed a workaround.

All the buildings and trees are only animated as they appear and fade away. When they’ve emerged they’re completely still, so what about converting them to a MeshRenderer during that time? My first idea was to use the non-animated mesh with a MeshRenderer and toggle between those two when appropriate, but I quickly found that a few non-animated meshes didn’t match up with the final pose of the SkinnedMeshRenderer, so using this technique was not an option.



Image showing the discrepancy between the final pose of a SkinnedMeshRenderer and the mesh it has been animating.

Second Iteration – SkinnedMeshRenderer.BakeMesh()

The SkinnedMeshRenderer has a rather interesting function called BakeMesh(). It creates a snapshot of the mesh in its current state of animation, outputting a new Mesh to be used elsewhere. This would solve the problem splendidly, as I could pass that mesh to the MeshRenderer and bask in the glory of automatic batching. As Admiral Ackbar so boldly noted, it wouldn’t be that easy.

I started by creating a dictionary that would contain each different type of mesh. This is so the MeshRenderers can share a single mesh, allowing Unity to batch them. Whenever a new kind of mesh was found, its mesh would be baked and added to the dictionary.

Issues of scale

The BakeMesh() function outputs a mesh that looks exactly like the current pose of the SkinnedMeshRenderer. This means that if the SkinnedMeshRenderer is scaled, the baked mesh is scaled as well. So if you use the baked mesh in a MeshRenderer attached to a GameObject with the same world scale as the SkinnedMeshRenderer, the scale gets applied twice: once in the baked vertices and once by the transform. In addition, this oddly scaled mesh would be shared by all the buildings of that type, many of which do not have that exact scale.
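The double scaling described above can be made concrete with a small plain C# sketch that has no Unity dependency. The class and method names (`DoubleScaleDemo`, `BakedX`, `RenderedX`) are made up for illustration, and the sketch only models the behaviour as described in this post: the bake already contains the scale, and the transform of the MeshRenderer's GameObject applies it once more.

```csharp
using System;

static class DoubleScaleDemo
{
    // Models the bake as described in the post: the stored vertex
    // coordinate already includes the scale that was active while baking.
    public static float BakedX(float vertexX, float scaleDuringBake)
    {
        return vertexX * scaleDuringBake;
    }

    // Models rendering: the transform of the GameObject that holds the
    // MeshRenderer scales the stored vertex coordinate again.
    public static float RenderedX(float storedX, float transformScale)
    {
        return storedX * transformScale;
    }
}
```

With a vertex at x = 1 and a building scale of 3, `RenderedX(BakedX(1f, 3f), 3f)` gives 9 instead of the intended 3, while baking at a scale of one, `RenderedX(BakedX(1f, 1f), 3f)`, gives 3.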



Issues of scale illustrated

The way to fix this is to temporarily set the world scale of the SkinnedMeshRenderer to 1, bake, and then set it back to its original value. This allows us to later use the same scale for the MeshRenderer GameObjects as that of the SkinnedMeshRenderer GameObjects, ensuring a seamless transition between the meshes.

/StateCapital/blob/9f409ed19f4f5defc57eb2b98c038aed7e7cf404/Assets/CityBlockState.cs

    // Here we store our already baked poses
    private static Dictionary<string, Mesh> batchedMeshes = new Dictionary<string, Mesh>();

    ...

    private void Freeze() // This is called when all the animations in the block has finished
    {
        // Note that this function is only called once for a city block
        // Get all the SkinnedMeshRenderers that belong to this block
        SkinnedMeshRenderer[] renderers = GetComponentsInChildren<SkinnedMeshRenderer>();

        for (int i = 0; i < renderers.Length; i++)
        {
            string key = renderers[i].sharedMesh.name; // assume name is unique for each mesh

            if (!batchedMeshes.ContainsKey(key)) // If no baked pose exists, make one!
            {
                Mesh m = new Mesh();
                Vector3 lScale = renderers[i].GetComponentInParent<Animator>().transform.localScale;

                // Ensure the scale of the mesh is one before baking as this will
                // be used with other buildings with different scale
                renderers[i].GetComponentInParent<Animator>().transform.localScale = Vector3.one;

                // Bake the current pose
                renderers[i].BakeMesh(m);
                batchedMeshes.Add(key, m);

                renderers[i].GetComponentInParent<Animator>().transform.localScale = lScale;
            }

            // Create a new GameObject to house our mesh
            GameObject staticMesh = new GameObject("StaticMeshInstance");

            // Setup its transforms
            staticMesh.transform.parent = renderers[i].transform.parent;
            staticMesh.transform.localPosition = renderers[i].transform.localPosition;
            staticMesh.transform.localRotation = renderers[i].transform.localRotation;
            staticMesh.transform.localScale = Vector3.one;

            // Setup the rendering components
            staticMesh.AddComponent<MeshFilter>().sharedMesh = batchedMeshes[key];
            staticMesh.AddComponent<MeshRenderer>().sharedMaterial = renderers[i].sharedMaterial;

            // Disable shadows so Unity can get batching
            staticMesh.GetComponent<MeshRenderer>().receiveShadows = false;
            staticMesh.GetComponent<MeshRenderer>().shadowCastingMode = UnityEngine.Rendering.ShadowCastingMode.Off;

            // Turn off the SkinnedMeshRenderer
            renderers[i].enabled = false;
        }
    }

    private void UnFreeze() // Called when the block needs to be animated again for disappearing
    {
        // The block will be destroyed soon, so we simply toggle the visibility of the two renderers
        foreach (var m in GetComponentsInChildren<MeshRenderer>())
        {
            m.enabled = false;
        }
        foreach (var m in GetComponentsInChildren<SkinnedMeshRenderer>())
        {
            m.enabled = true;
        }
    }

The code for the second iteration. [View on GitHub]



This worked splendidly, as the swap was completely seamless. It came at a price, though, as Unity doesn’t batch MeshRenderers that use shadows. Turning off shadows led to the game looking flat, so some other solution would be preferable.



Much more optimized due to batching, but shadows would be a nice thing to have

Third Iteration – Combining Meshes

What if we took a slightly more manual approach to batching? A Mesh can be combined with other meshes, resulting in one large mesh. This has its restrictions: the meshes must use the same material, and the number of triangles allowed inside a single mesh is limited. Luckily the buildings and trees only use three different materials in total, so we should be able to combine an entire block into a maximum of three meshes.

The CombineMeshes() function requires you to use the CombineInstance type, which holds the mesh and the transform of each mesh to be combined. The mesh part is easy, as we can simply use the BakeMesh() function to create all our meshes and add each to a CombineInstance. The tricky part lies with the transform.

The CombineInstance accepts the transform as a Matrix4x4, a matrix that represents position, rotation and scale. Working with matrices is a bit lower level than I’m used to, but fortunately part of the work has been done before by other people. Getting position and rotation working was easy enough, but accounting for scale was quite a different matter. After trying every single combination of matrix operations I finally landed on the solution of using a matrix solely to negate the scale imposed on the mesh.

The resulting mesh gets added to its own GameObject, which will be removed when the block fades away.

/StateCapital/blob/834fbd7d4df324c60bac4e22af4c3db71f069363/Assets/CityBlockState.cs

    private void Freeze() // This is called when all the animations in the block has finished
    {
        SkinnedMeshRenderer[] renderers = GetComponentsInChildren<SkinnedMeshRenderer>();

        // Here we'll store all our CombineInstances, sorted by the material they use
        Dictionary<Material, List<CombineInstance>> combines = new Dictionary<Material, List<CombineInstance>>();

        for (int i = 0; i < renderers.Length; i++) // Need to create a CombineInstance for each mesh
        {
            if (!combines.ContainsKey(renderers[i].sharedMaterial))
                combines.Add(renderers[i].sharedMaterial, new List<CombineInstance>());
            List<CombineInstance> combList = combines[renderers[i].sharedMaterial];

            Mesh m = new Mesh();
            renderers[i].BakeMesh(m); // We want to keep scale here

            CombineInstance combine = new CombineInstance();
            combine.mesh = m; // easy part done

            // Get the base transform matrix
            Matrix4x4 trans = transform.worldToLocalMatrix;

            // fiddle with scale
            Vector3 scale = renderers[i].transform.parent.localScale;
            Vector3 scaleMesh = renderers[i].transform.localScale;
            scale.x = 1 / scale.x / scaleMesh.x;
            scale.y = 1 / scale.y / scaleMesh.y;
            scale.z = 1 / scale.z / scaleMesh.z;
            Matrix4x4 scaler = Matrix4x4.TRS(Vector3.zero, Quaternion.Euler(Vector3.zero), scale);

            // Maths...
            combine.transform = trans * renderers[i].localToWorldMatrix * scaler;
            combList.Add(combine);

            renderers[i].enabled = false; // Disable the SkinnedMeshRenderer
        }

        // CombineInstances has been created, time to combine them all that share material
        foreach (var mat in combines.Keys)
        {
            // Setup the GameObject that will contain the combined mesh
            GameObject o = new GameObject("Combined Mesh - " + mat.name);
            o.transform.parent = transform;
            o.transform.localPosition = Vector3.zero;
            o.transform.localScale = Vector3.one;
            o.transform.localRotation = Quaternion.Euler(Vector3.zero);

            // Setup the MeshFilter
            MeshFilter filter = o.AddComponent<MeshFilter>();
            filter.mesh = new Mesh();

            // Do the combine
            // Note that a mesh can only contain a maximum of
            // 65535 triangles or vertices. I'm fine in my case, but you
            // probably want to do some checks.
            filter.mesh.CombineMeshes(combines[mat].ToArray(), true, true);

            // Setup the MeshRenderer
            MeshRenderer render = o.AddComponent<MeshRenderer>();
            render.material = mat;

            // Shadows!
            if (Game.useShadow)
            {
                render.shadowCastingMode = UnityEngine.Rendering.ShadowCastingMode.On;
                render.receiveShadows = true;
            }
            else
            {
                render.shadowCastingMode = UnityEngine.Rendering.ShadowCastingMode.Off;
                render.receiveShadows = false;
            }
        }
    }

    private void UnFreeze() // Remains unchanged
    {
        foreach (var m in GetComponentsInChildren<MeshRenderer>())
        {
            m.enabled = false;
        }
        foreach (var m in GetComponentsInChildren<SkinnedMeshRenderer>())
        {
            m.enabled = true;
        }
    }

Function that combines all meshes in a block that share the same material. [View on GitHub]

Getting this to work was a pain. There were so many factors in play, each with its own weird effect on the result, that I resorted to trying every single combination possible. The result was well worth it, though, as the game now has a good framerate with shadows! If you’re trying to implement this yourself, don’t expect it to work out of the box.
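The listing's comment suggests checking the mesh size limit before combining. One way to do that, sketched here in plain C# without any Unity types, is to group the meshes greedily so that each group stays under the limit, and then combine each group separately. The class `VertexBudget` and its `Split` method are hypothetical names, and the 65535 figure is simply taken from the comment in the listing.

```csharp
using System;
using System.Collections.Generic;

static class VertexBudget
{
    // Limit quoted in the listing's comment.
    public const int MaxVertices = 65535;

    // Greedily groups mesh indices so that the vertex total of each
    // group never exceeds the limit. The input order is preserved.
    public static List<List<int>> Split(IReadOnlyList<int> vertexCounts, int limit = MaxVertices)
    {
        var groups = new List<List<int>>();
        var current = new List<int>();
        int used = 0;

        for (int i = 0; i < vertexCounts.Count; i++)
        {
            if (vertexCounts[i] > limit)
                throw new ArgumentException("Mesh " + i + " exceeds the limit on its own.");

            // Start a new group when this mesh would overflow the current one.
            if (used + vertexCounts[i] > limit)
            {
                groups.Add(current);
                current = new List<int>();
                used = 0;
            }

            current.Add(i);
            used += vertexCounts[i];
        }

        if (current.Count > 0)
            groups.Add(current);

        return groups;
    }
}
```

In Unity the counts would presumably come from the `vertexCount` of each CombineInstance's mesh, with one CombineMeshes() call and one GameObject per returned group. For example, counts of 40000, 30000 and 20000 end up as two groups: the first mesh alone, then the other two together.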



The GameObject which has the combined meshes of the red houses is selected. Also, shadows.

Fourth Iteration – The next level

A design feature of the game is that the final block for each faction will remain once transitioned to. So if those blocks never change, why not combine several of them together? At first I tried combining every single one, but I soon ran into the aforementioned restriction on mesh size. By splitting the city up into a grid we can combine the meshes of several blocks into a single mesh, allowing for an even greater reduction in draw calls.

/StateCapital/blob/e387533bd3c2e8a774f55e8ddb775bba966fec87/Assets/CityBlockState.cs

    private static int bigBatchSizeX = 1;
    private static int bigBatchSizeZ
    {
        get
        {
    #if UNITY_WEBGL
            // Webgl seems to have a harder time with combining, so we'll make them smaller
            return 2;
    #else
            return 4;
    #endif
        }
    }

    // Contains our multi-block meshes
    private static Dictionary<Material, List<GameObject>> bigBatches = new Dictionary<Material, List<GameObject>>();

    ...

    if (rightTransitionPrefab != null && leftTransitionPrefab != null)
    {
        // One or more transition direction is possible, combine normally
        ...
    }
    else
    {
        // Cross-block combination hoy!
        foreach (var mat in combines.Keys)
        {
            GameObject o = null;
            if (bigBatches.ContainsKey(mat))
            {
                foreach (var go in bigBatches[mat])
                {
                    // See if any GameObject close enough exists
                    Vector3 dist = (transform.position - go.transform.position) * (1.0f / (7.0f));
                    if (0.0f <= Mathf.Abs(dist.x) && Mathf.Abs(dist.x) < bigBatchSizeX &&
                        0.0f <= Mathf.Abs(dist.z) && Mathf.Abs(dist.z) < bigBatchSizeZ)
                    {
                        o = go;
                        break;
                    }
                }
            }
            else
            {
                bigBatches.Add(mat, new List<GameObject>());
            }

            if (o == null)
            {
                // Add new cross-block mesh container
                Vector3 pos = transform.position;
                pos.x = ((int)pos.x / (7 * bigBatchSizeX));
                pos.z = ((int)pos.z / (7 * bigBatchSizeZ));
                o = new GameObject("Big_Combined_Mesh-" + mat.name + "_[" + (int)pos.x + ", " + (int)pos.z + "]");
                pos.x *= 7.0f * (float)bigBatchSizeX;
                pos.z *= 7.0f * (float)bigBatchSizeZ;
                o.transform.parent = transform.parent.parent.parent;
                o.transform.position = pos;
                o.transform.localScale = Vector3.one;
                o.transform.localRotation = Quaternion.Euler(Vector3.zero);
                bigBatches[mat].Add(o);
            }

            MeshFilter filter = o.GetComponent<MeshFilter>();
            if (filter == null)
                filter = o.AddComponent<MeshFilter>();
            MeshRenderer render = o.GetComponent<MeshRenderer>();
            if (render == null)
                render = o.AddComponent<MeshRenderer>();

            for (int i = 0; i < combines[mat].Count; i++)
            {
                // Modify each CombineInstance to account for the offset in position.
                CombineInstance comb = combines[mat][i];
                comb.transform = Matrix4x4.TRS(-o.transform.position, Quaternion.Euler(Vector3.zero), Vector3.one)
                    * transform.localToWorldMatrix * comb.transform;
                combines[mat][i] = comb;
            }

            // Add the already existing mesh of the cross-block mesh container to the list of combines
            Matrix4x4 matrix = Matrix4x4.TRS(
                filter.transform.position - o.transform.position,
                filter.transform.rotation,
                filter.transform.localScale);
            CombineInstance inst = new CombineInstance();
            inst.mesh = filter.mesh;
            inst.transform = matrix;
            combines[mat].Add(inst);

            filter.mesh = new Mesh();

            // Combine the meshes
            // Note that a mesh can only contain a maximum of
            // 65535 triangles or vertices. I'm fine in my case, but you
            // probably want to do some checks.
            filter.mesh.CombineMeshes(combines[mat].ToArray(), true, true);

            // Shadow! This should probably be moved in retrospect
            render.material = mat;
            if (Game.useShadow)
            {
                render.shadowCastingMode = UnityEngine.Rendering.ShadowCastingMode.On;
                render.receiveShadows = true;
            }
            else
            {
                render.shadowCastingMode = UnityEngine.Rendering.ShadowCastingMode.Off;
                render.receiveShadows = false;
            }
        }
    }

Special addition that combines meshes across several blocks. [View on GitHub]

This is obviously an incredibly game-specific optimization, but I wanted to show that it is something to consider, as this fairly simple trick reduced the number of draw calls by up to a factor of 7.
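The grid lookup at the heart of this iteration can be sketched in isolation with plain C#. The names `BatchGrid`, `CellOf` and `OriginOf` are hypothetical, and the sketch assumes that the literal 7 in the listing is the width of one block in world units. Unlike the listing, which truncates with an int cast, this version uses a floor so that negative coordinates get their own cells; that is a choice made for the example, not something taken from the game.

```csharp
using System;

static class BatchGrid
{
    // Returns the grid cell (x, z) that a block at the given world
    // position falls into, for cells spanning blocksX by blocksZ blocks.
    public static (int X, int Z) CellOf(float worldX, float worldZ,
                                        float blockSize, int blocksX, int blocksZ)
    {
        int x = (int)Math.Floor(worldX / (blockSize * blocksX));
        int z = (int)Math.Floor(worldZ / (blockSize * blocksZ));
        return (x, z);
    }

    // Returns the world-space corner of a cell, which is where the
    // container GameObject for that cell would be placed.
    public static (float X, float Z) OriginOf((int X, int Z) cell,
                                              float blockSize, int blocksX, int blocksZ)
    {
        return (cell.X * blockSize * blocksX, cell.Z * blockSize * blocksZ);
    }
}
```

With a block size of 7 and cells of 1 by 4 blocks, a block at (7, 27) lands in cell (1, 0) and a block at (7, 28) in cell (1, 1). Every block that maps to the same cell and material can then be merged into that cell's container mesh.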



A GameObject containing a subcolumn of meshes spanning several blocks

Result

If we assume each block contains 15 SkinnedMeshRenderers and the game has generated the maximum of 500 blocks, the different solutions would yield:

No optimization: 7500 drawcalls

Iteration 1: N/A (but presumably similar to iteration 2)

Iteration 2: varies, but far less than without optimization. No shadows

Iteration 3: 1500 drawcalls

Iteration 4: 215 – 1500 drawcalls, depending on number of blocks that have reached max-level.
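For readers who want to plug in their own numbers, the figures above can be reproduced with a few lines of plain C#. This is my reading of where the numbers come from rather than anything measured: it assumes the three materials mentioned in the third iteration and the best-case factor of 7 mentioned in the fourth. The class and method names are made up for the example.

```csharp
using System;

static class DrawCallEstimate
{
    // One draw call per SkinnedMeshRenderer.
    public static int Unoptimized(int blocks, int meshesPerBlock)
    {
        return blocks * meshesPerBlock;
    }

    // One combined mesh per material per block.
    public static int PerBlockCombined(int blocks, int materials)
    {
        return blocks * materials;
    }

    // Best case for cross-block combining: every block has reached its
    // final level, so the per-block count shrinks by the given factor.
    public static int CrossBlockBest(int blocks, int materials, int reductionFactor)
    {
        return (int)Math.Ceiling(blocks * materials / (double)reductionFactor);
    }
}
```

`Unoptimized(500, 15)` gives 7500, `PerBlockCombined(500, 3)` gives 1500, and `CrossBlockBest(500, 3, 7)` gives 215, matching the list above.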

Ending Notes

So I suppose the essence of this post boils down to four points:

Don’t optimize too early; you can always do it later.

Find the loopholes! If something doesn’t work with X, can you temporarily make it into a Y?

Use the design; assumptions buy performance.

Reuse materials! Using fewer materials allows you to combine more meshes.

Thanks for reading!

View the project at GitHub.

Play the Game!