Posted by posila on 2020-02-07

Hello,

We released 0.18.4 this week, same old same old, more bugfixes, more bugs, more changes. At this stage of development, not many interesting things are happening, we are just polishing what we have.

Minor terrain render optimization posila

Just a couple days before the release of 0.18.0 I had an epiphany about a terrain rendering problem that was bugging me for a really long time. When rendering terrain, we reuse the texture from the previous frame. How this was always done, is that we would render the texture shifted to the new position, fill up the gap, and then copy the final result back into the texture for reuse in next frame. So what was bugging me about this? This simple operation would result in rasterizing 2 screens worth of pixels. While that is not a problem for at least half decent GPUs from the past decade, it's a significant work load for integrated GPUs, which in general have an order of magnitude lower memory bandwidth than dedicated GPUs. It could also be equally bad for old low-end dedicated GPUs.

One of the extreme examples is the Intel HD Graphics 3000 - an integrated GPU on the Sandy Bridge CPU architecture. When you sit still and the terrain can be reused without shifting, it would take 'just' 2 milliseconds to copy it to the game view. But when you started to move, the GPU time to render the terrain could go up to 5 milliseconds. And that is only at 1600x900 resolution. Not even 1080p. So, it was bothering me we were spending nearly 1/3rd of a frame time (16.66 ms) to render the terrain, when the engine has much more work to do to render the rest of the game (for comparison GeForce GTX 750Ti or Radeon R7 360 would do the same under 0.5 ms at 1080p).

The realization I had, was that I can 'scroll' the buffer texture. If I remember the offset of the top left corner, I can un-scroll it to the game view, and then instead of copying all the terrain back to the buffer, we can just adjust the offset and update the parts that changed. So, the number of pixels copied is proportional to how much the terrain scrolled. It is so simple I am embarrassed not to have figured this out years ago.

Mp4 playback not supported on your device.

Most people could not have noticed this optimization, as most GPUs people have nowadays did the un-optimal thing in a fraction of a millisecond already. But it still made me very happy to be finally able to remove this inefficiency. Contemporary integrated GPUs are also significantly faster, and while it might not be as much of a challenge to render the game for them, they do share some resources with the CPU - be it the last level of cache, or CPU cooler, so the integrated GPU working hard may cause the CPU to slow down.

However, the point I wanted to illustrate by this post is how broad a range of GPUs there is. People see a 2D game and expect to be able to play it on essentially anything. If we want to live up to that expectations, we have to impose a lot of limitations on ourselves, because 'anything' also includes a couple orders of magnitude slower GPU than is found in an average gaming computer of today. CPUs got a lot faster in the last decade too, but mostly due to increasing the number of cores and adding wider vector computation units. They didn't get that much faster when executing serial code, which is unfortunately most of Factorio's game code. So if you play the game on a laptop with a Core 2 Duo and GeForce 320M, you'll run into framerate issues due to the weak GPU much sooner than a UPS slowdown due to the old CPU.

Side note: You might ask, why do we bother with caching the terrain in the first place and not just re-render it from scratch every frame. Short answer is - because Factorio's terrain rendering is insane due to its complicated tile transition rules, and re-rendering it every frame is just not fast enough.