This is a problem that I’ve struggled with for quite some time while developing a strong structural base for BioGrid – even if you can simulate a large dynamic world, how do you render it efficiently?

The easy way out would be to only show a small, zoomed-in section of the world with a global minimap – there are a lot of working examples of games doing just that, mostly using some form of object pooling for optimization. There are a lot of options in that space, and it’s generally a pretty manageable solution. But I feel a proper god-game should strive for something more. I really wanted the ability to smoothly zoom from grass roots to the clouds, and everything in between!

This decision immediately forced me out of my comfort zone, and I really needed to start from scratch here. A straightforward but ultimately messy solution would have been to start thinking about LOD levels and visual abstractions to cut down on the amount of rendering and unique entities. But sacrificing visual fidelity and introducing pop-in or LOD wasn’t something I felt comfortable with either. I wanted to fully render at least a 1024×1024 map, which translates to a whopping maximum of 1,048,576 sprites on screen at once! Here are a few of the several stages of grief I went through, trying to reach this goal:

Comedy Option – Rendering a GameObject for every tile

This was an abject and immediate failure – the overhead and instantiation costs become untenable even on a 128×128 map. Clearly a rendered tile should be as lightweight and dumb as possible. We don’t really need a unique object for each tile, since most of the rendered sprites will be identical anyway.

Attempt 2 – Rendering a quad for every tile

This actually worked out better than I had hoped. On every simulation tick, I’d assemble all the quads in a script, and assign UVs based on the sprite I needed to display. Then I’d stuff this quad soup in a mesh and assign that to a MeshRenderer. Due to Unity only handling a maximum of 65535 vertices in a mesh, I had to partition my world into 64×64 chunks. This had an unexpected benefit – the chunks that were offscreen were automatically culled by the rendering engine! To my surprise, rendering all those quads was actually acceptably fast – modern GPUs are absolute beasts when it comes to raw vertex-crunching.
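To make the chunk arithmetic concrete, here is a minimal CPU-side sketch in Python rather than Unity C#. The function names, the square atlas layout and the `sprite_at` callback are all assumptions made for illustration, not the code BioGrid actually uses.

```python
MAX_VERTICES = 65535   # per-mesh vertex limit mentioned above
VERTS_PER_QUAD = 4

def chunk_count(map_size, chunk_size):
    """Number of square chunks needed to cover a square map."""
    per_axis = -(-map_size // chunk_size)  # ceiling division
    return per_axis * per_axis

def build_chunk_quads(chunk_size, sprite_at, atlas_columns):
    """Build the quad soup for one chunk.

    sprite_at(x, y) returns a sprite index (hypothetical callback);
    atlas_columns is the number of sprites per row in a square atlas.
    """
    assert chunk_size * chunk_size * VERTS_PER_QUAD <= MAX_VERTICES
    cell = 1.0 / atlas_columns
    vertices, uvs, triangles = [], [], []
    for y in range(chunk_size):
        for x in range(chunk_size):
            index = sprite_at(x, y)
            u0 = (index % atlas_columns) * cell
            v0 = (index // atlas_columns) * cell
            base = len(vertices)
            # four corners of the tile, counter-clockwise from bottom-left
            vertices += [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]
            uvs += [(u0, v0), (u0 + cell, v0),
                    (u0 + cell, v0 + cell), (u0, v0 + cell)]
            # two triangles per quad, indexed in clockwise order
            triangles += [base, base + 2, base + 1, base, base + 3, base + 2]
    return vertices, uvs, triangles
```

With 64×64 chunks each mesh holds 16,384 vertices, comfortably under the limit, and a 1024×1024 map is covered by `chunk_count(1024, 64)` = 256 chunks.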

What killed this approach was the mesh upload cost. I prepared my quads in a background thread, and this worked well enough, but actually assigning the vertices and UVs to Unity’s mesh needs to be done in the main thread, and apparently there’s no way around it. Frustratingly, it introduces an extremely noticeable and annoying stutter for large meshes. The GPU could render this pretty much brute-force solution with relative ease, but I couldn’t get the data to it quickly and smoothly enough. So, how to reduce the amount of data I’d need to pass through Unity’s mesh modification bottleneck?

Attempt 3 – Rendering a static grid of tiles

This was an incremental update over the previous version – I decided to construct a quad for every tile at startup, and control them via vertex colors. Previously I was updating 4 large arrays on every tick (the vertices, triangles, UVs and normals), but now I was uploading only 2: an array of vertex colors and an array of sprite UVs! The controlling shader was pretty simple, since it’s possible to modify the vertex positions in a vertex shader based on the vertex color. Here’s the relevant piece in its entirety:

    void vert(inout appdata_full v, out Input o)
    {
        UNITY_INITIALIZE_OUTPUT(Input, o);
        o.normal = v.normal;
        v.vertex.x += (v.color.r * 2.0f - 1.0f) * _PositionOffset;
        v.vertex.y += (v.color.g * 2.0f - 1.0f) * _PositionOffset;
        v.vertex.xy += v.normal.xy * (1.0f - v.color.b) * _ScaleOffset;
        #if defined(PIXELSNAP_ON)
        v.vertex = UnityPixelSnap(v.vertex);
        #endif
    }

As you can see, the R and G channels of the vertex color control the offset of the sprite in its cell. This is used for finer positioning of the sprites – while they are associated with a particular cell, their position in that cell can range from -1 to 1 (the 0…1 color value is remapped by the `* 2.0f - 1.0f` in the shader). This is mainly used to hide the tile-based nature of the world and add some randomness – creatures don’t just jump from cell to cell when moving about, they smoothly transition between them, plants don’t have to grow in neat geometric rows and so on.

As for the B channel – that controls the size of the sprite. This is useful to show plant growth, but was mainly used for hiding the empty quads while they were not being used. It requires specially constructed normals for each vertex in a quad – something like this:

With the normals constructed like this, we can scale the quad down to an invisibly tiny point when it’s not being actively used to display something and also scale it up to arbitrary sizes if we so desire:
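A rough CPU-side sketch of the trick, in Python, mirrors the maths of the `vert` function for a single quad. It assumes the per-corner "normals" point from each corner towards the quad centre. That layout, and the helper names `corner_normals` and `displace`, are guesses for illustration only.

```python
def corner_normals(half_size=0.5):
    """Quad corners plus a per-corner 'normal' aimed at the quad centre.

    The inward-pointing layout is an assumption, not taken from the original.
    """
    corners = [(-half_size, -half_size), (half_size, -half_size),
               (half_size, half_size), (-half_size, half_size)]
    normals = [(-cx, -cy) for cx, cy in corners]
    return corners, normals

def displace(corner, normal, color, position_offset, scale_offset):
    """Apply the same arithmetic as the vertex shader to one vertex.

    color is (r, g, b) with each channel in 0..1.
    """
    r, g, b = color
    x, y = corner
    x += (r * 2.0 - 1.0) * position_offset   # R moves the sprite along x
    y += (g * 2.0 - 1.0) * position_offset   # G moves the sprite along y
    x += normal[0] * (1.0 - b) * scale_offset  # B slides corners along the normal
    y += normal[1] * (1.0 - b) * scale_offset
    return x, y
```

Under these assumptions, B = 1 leaves the quad untouched, B = 0 with a scale offset of 1 collapses all four corners onto the centre, and a negative scale offset pushes the corners outwards instead.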

This approach already bypassed a large part of the mesh updating bottleneck, and it was mostly stutter-free on 512×512 maps. But…I had no performance headroom left for actually adding more logic to my world. In addition, I had planned for drawing several sprites in a single cell – with this system, animals couldn’t be rendered on top of grass for example, it was strictly one sprite per tile. And if I wanted to render several layers, I’d double the cost of mesh modifications and overload the GPU with the sheer number of triangles. I needed something faster and more scalable.

Current approach – Texture bombing

This is a kind of obscure GPU-based technique I’d happened to play with several years ago and mentally filed away as a potentially useful tool. There’s a nice article about texture bombing in GPU Gems, and another about a variation called Tile-Based Texture Mapping. Generally speaking, it involves partitioning your UV space into a regular grid of cells, and rendering many copies of a separate glyph texture into those cells.

The technique can be extended to draw several glyphs from an atlas texture and even rotate them, even though I don’t really need rotated sprites at this point. Best of all – it’s all done on a single quad, and the cost is constant – it depends on the screen resolution, not the density of the sprites. Sounds great on paper, but of course, there are quite a few gotchas and pitfalls involved in using this method…

The GPU Gems articles give a pretty nice overview of the general algorithm itself using a noise texture, but in practical cases we’d like more control over what we’re drawing and where. The solution is to encode the necessary data into a separate texture. This is conceptually very similar to the previous approach, the only difference being that we’re not packing data into the vertices of the quads, but into the pixels of a texture. Updating large textures every frame isn’t blazingly fast either, but luckily it’s reasonably quick compared to mesh modification. Here’s what that texture might look like:

As before, the R and G channels encode the positional offset. The value in the B channel is used for picking the correct sprite from an atlas. As that channel’s value ranges from 0…255, this gives us a plentiful limit of 256 different sprites per sprite layer. The alpha channel is used for sprite scaling, like in the previous approach.
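As a minimal sketch of that packing, here is how one cell could be written into and read back from four bytes on the CPU side. It is plain Python, and the names `encode_cell`, `decode_cell` and `build_data_texture` are hypothetical. The row-major RGBA byte layout is also an assumption; a real project would hand the buffer to whatever texture upload call its engine provides.

```python
def encode_cell(offset_x, offset_y, sprite_index, scale):
    """Pack one cell into (R, G, B, A) bytes.

    offset_x / offset_y are in -1..1, sprite_index in 0..255, scale in 0..1.
    """
    if not 0 <= sprite_index <= 255:
        raise ValueError("sprite_index must fit in one byte")

    def to_byte(value):  # value expected in 0..1
        return max(0, min(255, int(round(value * 255.0))))

    return (to_byte(offset_x * 0.5 + 0.5),
            to_byte(offset_y * 0.5 + 0.5),
            sprite_index,
            to_byte(scale))

def decode_cell(rgba):
    """Recover the values roughly the way the shader sees them."""
    r, g, b, a = (channel / 255.0 for channel in rgba)
    return (r * 2.0 - 1.0, g * 2.0 - 1.0, int(round(b * 255.0)), a)

def build_data_texture(width, height, cells):
    """Fill a row-major RGBA byte buffer; untouched cells keep alpha 0 (no sprite)."""
    data = bytearray(width * height * 4)
    for (x, y), cell in cells.items():
        start = (y * width + x) * 4
        data[start:start + 4] = bytes(encode_cell(*cell))
    return data
```

Note that `decode_cell` rounds when recovering the sprite index, whereas a plain float-to-`uint` conversion in a shader truncates. If indices ever come out one too low, a small bias before the conversion may be worth trying.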

Tip: for accurate results, make sure your data texture is stored in an uncompressed format, using point filtering and encoded in linear color space!

Now, on to the actual shader! This time, we’re doing nothing in the vertex stage (it’s just a single large quad, after all), all the action is happening in the fragment stage. I’d rather leave the basic explanation of the algorithm to the really very good articles linked earlier, but here’s a commented version of the shader I’m using for reference:

Texture bombing shader

    void surf(Input IN, inout SurfaceOutputStandard o)
    {
        // compute cell UV
        float scale = _DataTex_TexelSize.z; // width of the data texture in pixels
        float2 scaledUV = IN.uv_DataTex * scale;
        int2 cell = floor(scaledUV); // current fragment belongs into this discrete region - a cell
        float2 offset;

        fixed4 c = fixed4(0, 0, 0, 0); // this is where rendered color will be accumulated to
        fixed4 image; // sprites
        fixed4 dataTex; // sprite control data

        // create UV space without discontinuities introduced by flooring
        float2 derivativeUV = scaledUV * _SpriteLOD;
        float2 texDdx = ddx(derivativeUV);
        float2 texDdy = ddy(derivativeUV);

        int i = -1, j = -1;

        // current fragment might also be affected by neighbouring cells, not only its containing cell
        // loops are necessary only because my sprites can partially overlap into neighbouring cells
        // if a sprite always fits into its cell, we could skip neighbour checking entirely
        for (i = -1; i <= 1; i++)
        {
            for (j = -1; j <= 1; j++)
            {
                dataTex = tex2D(_DataTex, IN.uv_DataTex + float2(i, j) / scale); // gets a pixel from the data texture associated with the current cell
                offset = scaledUV - (cell + float2(i, j));

                uint index = dataTex.b * 255.0f; // get the sprite's index from the data texture
                uint xIndex = index % _SpriteCount; // X-index in sprite atlas
                uint yIndex = index / _SpriteCount; // Y-index in sprite atlas

                offset.x -= (dataTex.r * 2.0f - 1.0f) * _PositionOffset; // offset the sprite's position along x-axis
                offset.y -= (dataTex.g * 2.0f - 1.0f) * _PositionOffset; // offset the sprite's position along y-axis

                // scale the sprite sampling window according to data texture
                float scaleFactor = _ScaleOffset / dataTex.a;
                offset.x = offset.x * scaleFactor + (_SpriteSize - (_SpriteSize * scaleFactor));
                offset.y = offset.y * scaleFactor + (_SpriteSize - (_SpriteSize * scaleFactor));

                offset.x = saturate(offset.x) * _SpriteSize;
                offset.y = saturate(offset.y) * _SpriteSize;

                // move the sampling window to correct position on sprite atlas
                offset.x += xIndex * _SpriteSize;
                offset.y += yIndex * _SpriteSize;

                // sample from texture atlas using a gradient without discontinuities
                image = tex2Dgrad(_MainTex, offset, texDdx, texDdy);

                // draw the fragment only if data texture says there should be a sprite in this cell
                if (dataTex.a > 0.0 && image.a > 0.0)
                {
                    // blend overlapping sprites correctly
                    float blendLeft = 1.0f - c.a;
                    c += image * image.a * blendLeft;
                }
            }
        }

        o.Albedo = c.rgb * _Color.rgb * 2.0f;
        o.Metallic = _Metallic;
        o.Smoothness = _Glossiness;
        o.Alpha = c.a * _Color.a;
    }
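Two small pieces of the shader are easy to check on the CPU: the atlas index split and the accumulation step inside the neighbour loop. The Python sketch below copies that arithmetic one-to-one. The function names are made up for illustration, and colors are plain `(r, g, b, a)` tuples.

```python
def atlas_cell(index, sprites_per_row):
    """Split a linear sprite index into (column, row) in the atlas,
    like the xIndex / yIndex computation in the shader."""
    return index % sprites_per_row, index // sprites_per_row

def accumulate(c, image, data_alpha):
    """One step of `c += image * image.a * (1 - c.a)`.

    c is the color accumulated so far, image is the sampled sprite texel,
    data_alpha is the alpha read from the data texture for that cell.
    """
    if data_alpha <= 0.0 or image[3] <= 0.0:
        return c  # empty cell or fully transparent texel: nothing to draw
    blend_left = 1.0 - c[3]          # how much coverage is still available
    weight = image[3] * blend_left
    return tuple(cc + ic * weight for cc, ic in zip(c, image))
```

Because each contribution is weighted by the remaining coverage `1 - c.a`, a cell visited earlier in the loop ends up on top. Once an opaque texel has been accumulated, later neighbours contribute nothing.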

You might wonder what’s up with the tex2Dgrad instruction. If we were to use a regular tex2D sampler with the partitioned UV space, we’d get these hairline artifacts at the tile borders:

But only if the tile atlas is using mipmapping…what’s going on here?

You’ll certainly run into this issue whenever you’re doing some sort of GPU atlasing or clamping regions of your UV space. This artifact appears due to the way GPUs compute mipmapping – a small kernel of neighbouring pixels is analysed, and their UV coordinates are compared. This is all fine and good if you’re not doing anything weird – if you squeeze a huge texture into a tiny dot on the screen, there’s a large discontinuity between the few pixels rendered on the screen and their underlying UV coordinates, so naturally a smaller mip level should be selected.

But due to the fact we’re clamping the UV space into discrete cells, and more than 1 pixel is sampled when doing mipmapping, some texels fall outside the cell boundary!

This introduces a huge discontinuity into UV space, and the border texels are dropped to the smallest mip level. Luckily, there’s a pretty nice fix for this – we supply our own continuous UV derivatives to the tex2Dgrad sampler, computed with the ddx and ddy instructions.
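The effect is easy to reproduce with a toy one-dimensional model in Python. It assumes a simplified mip selection rule (log2 of the number of texels stepped over per screen pixel) and uses a finite difference between neighbouring pixels in place of the GPU's 2×2 quad derivatives. The numbers chosen (8 screen pixels per cell, a sprite covering a quarter of a 1024-texel atlas) are arbitrary.

```python
import math

def mip_level(duv, texture_size):
    """Simplified LOD pick: log2 of texels stepped over per screen pixel."""
    texels_per_pixel = abs(duv) * texture_size
    if texels_per_pixel <= 1.0:
        return 0.0
    return math.log2(texels_per_pixel)

def wrapped_u(pixel, pixels_per_cell, sprite_size):
    """Atlas coordinate after wrapping the screen position into its cell."""
    cell_pos = pixel / pixels_per_cell
    return (cell_pos - math.floor(cell_pos)) * sprite_size

def levels_along_row(num_pixels, pixels_per_cell, sprite_size,
                     texture_size, continuous):
    """Mip level picked between each pair of neighbouring pixels in a row."""
    levels = []
    for p in range(num_pixels - 1):
        if continuous:
            # derivative of the coordinate before wrapping (the tex2Dgrad route)
            duv = sprite_size / pixels_per_cell
        else:
            # difference of the wrapped coordinate (what plain tex2D would see)
            duv = (wrapped_u(p + 1, pixels_per_cell, sprite_size)
                   - wrapped_u(p, pixels_per_cell, sprite_size))
        levels.append(mip_level(duv, texture_size))
    return levels

naive = levels_along_row(16, 8, 0.25, 1024, continuous=False)
fixed = levels_along_row(16, 8, 0.25, 1024, continuous=True)
```

In this model every interior pixel pair lands on mip level 5, but in `naive` the pair straddling the cell border jumps to a much smaller mip, which is the hairline. In `fixed` the level stays constant across the border.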

There’s another thing to consider – preparing your atlas texture for correct mipmaps. Tightly packed sprites will start bleeding into each other at smaller mip levels. There are at least two options here – we could either leave more empty space between the sprites or provide a manually generated mip chain that respects sprite boundaries. I chose to go with looser packing:

And that’s basically it – it was a bit of an incremental slog, but I’ve successfully moved from mesh-based approaches to purely GPU-based tilemapping with no culling or LOD-based optimizations. This approach probably won’t fit most games, but still, something to consider next time you’re planning on making a grid-based game!

Here are some rendering stats on that 1024×1024 map I was striving for – the individual sprites are reduced to single pixels at this height, but they’re there, I promise:

Next time I’ll write a bit about the underlying grid simulation itself – it’s multithreaded with a twist – there are no concurrent collections and nary a lock to be found either…stay tuned!