One of my pet peeves with our game is that it’s a 2D game, yet its rendering performance is horrendous. Take a game scene that looks like this:

This doesn’t cover the game’s whole tilemap yet, but it already has lots of objects in it. When I run this through the profiler, this is what it looks like:

The timeline for Camera.Render hovers around 20-30ms. That’s a lot! Remember that a frame must finish in about 16.7ms (1000ms ÷ 60) to achieve 60fps. The rendering alone takes more than that, leaving no CPU budget for game logic.

It’s not that I didn’t make an effort to optimize rendering and encourage batching. I did a lot. See here, here, here, and here. The game does perform well when the player zooms into the map, because the camera sees fewer objects and the majority are culled. However, in a simulation game like Academia, players usually play zoomed out because they want to see how the school is doing. Thus, Camera.Render usually processes more objects and runs slower.

I keep thinking about those 3D games that render meshes with hundreds to thousands of polygons and still run at 60fps. My hypothesis is that Camera.Render spends too much time culling and batching quad meshes. That got me thinking that maybe I don’t need culling for this game at all. Modern GPUs should be able to chew through thousands of polygons easily. I already have a tool that collects sprites into a single mesh and renders that one mesh instead. There should be no problem if I just dump these monolithic meshes to the GPU for rendering.

So I made a separate project to test whether my hypothesis is correct. I improved my custom sprite manager to handle static sprites more efficiently and simulated the hypothetical maximum number of sprites in our game: 5 layers of static sprites with 16,380 sprites each, for a total of 81,900 static sprites, plus one layer of non-static sprites with 4,500 units. My custom renderer can render all of these in 7-10ms.
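My guess (not stated in the post) as to where the 16,380-per-layer figure comes from: with 16-bit mesh indices, a single mesh can address at most 65,535 vertices, and each sprite quad needs 4, so a layer baked into one mesh tops out just above that count. A quick arithmetic check:

```python
# Assumption: each static layer is baked into one mesh with 16-bit indices.
MAX_16BIT_VERTICES = 65_535   # largest vertex index representable in 16 bits
VERTS_PER_QUAD = 4

max_quads_per_mesh = MAX_16BIT_VERTICES // VERTS_PER_QUAD  # 16,383
quads_per_layer = 16_380      # the layer size used in the stress test
layers = 5

assert quads_per_layer <= max_quads_per_mesh
total_static = quads_per_layer * layers
print(total_static)  # 81900, matching the figure above
```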

Note that this is already the hypothetical maximum. The game scene I showed isn’t even half of it. My custom renderer should do much better than Camera.Render given the same scene.

I decided to integrate my custom renderer into our game, which required a lot of refactoring. I started the integration around late September 2018 and I’m not finished yet. I haven’t applied it to every object in the game, but I’ve already covered the majority, especially those that are used most frequently. I profiled the game again while loading the same scene, and this is how it looks:

You can see here that my custom renderer runs at 1.725ms. On the right, you can also see that I’ve reduced the running time of Camera.Render to 2.01ms. Most objects no longer use the standard Unity renderers like SpriteRenderer and MeshRenderer, so they’re no longer “processed” by the camera. Adding the running times of my custom renderer and Camera.Render, total rendering now hovers around 3-5ms. That’s a huge difference from the original 20-30ms.

Notes:

All profiling was done using a development build (not in the editor)

My custom renderer uses Unity’s ECS with the Burst compiler turned on