…on the third frame the tree on the “slower” video is significantly ahead of its counterpart on the correct video (circled in red). You can also notice that this frame apparently took a longer time (circled in yellow).

Wait, wait, wait… if a video is “slower”, and the frame “took more time”, how can it be ahead?

Well, to explain this, you have to understand how games and other 3D interactive applications actually handle their animation and rendering nowadays. (Experienced developers will excuse me if I’m boring them with things they already know, but I have to make sure all the gamers who might be interested in this can follow the text.)

A brief history of frame timing

A long time ago, in a galaxy far, far away… when developers made the first video games, they would normally design for the exact frame rate the display ran at. In NTSC regions, where TVs run at 60 Hz, that meant 60 fps; in PAL/SECAM regions, where TVs run at 50 Hz, it meant 50 fps. The thought of perhaps “dropping a frame” would never even occur to them.

Most games were very streamlined, simplified affairs running on fixed hardware — usually an arcade machine, or a well-known home microcomputer like the ZX Spectrum, C64, Atari ST, Amstrad CPC 464, Amiga, etc. Basically, one designed, implemented and tested for a particular machine and a particular frame rate, and could be 100% sure that it would never drop a frame anywhere.

Velocities of objects were also stored in “frame” units: you wouldn’t say how many pixels per second a character moves, but how many pixels per frame. In Sonic the Hedgehog for the Sega Genesis, for example, the rolling speed is known to be exactly 16 pixels per frame. Many games even had separate versions for the PAL and NTSC regions, with animations hand-drawn specifically for 50 fps and 60 fps respectively. Running at any other frame rate simply was not an option.
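In that scheme, movement code never consults a clock at all. A minimal sketch (the function names here are invented for illustration; only the 16 px/frame figure comes from the text above):

```cpp
#include <cassert>  // for quick sanity checks when exercising this sketch

// Classic fixed-frame-rate movement: velocity is stored in pixels per frame,
// so advancing the world is a single addition -- no clock is ever consulted.
// 16 px/frame is the rolling speed mentioned above; at a rock-solid 60 fps
// that works out to 960 pixels per second on screen.
constexpr int kRollSpeedPxPerFrame = 16;

int advance(int positionPx, int frames) {
    return positionPx + kRollSpeedPxPerFrame * frames;
}

// Converting a speed hand-tuned for NTSC (60 fps) to a PAL (50 fps) release:
// to keep the same on-screen speed, the per-frame velocity must grow by 60/50.
double palSpeedPxPerFrame(double ntscSpeedPxPerFrame) {
    return ntscSpeedPxPerFrame * 60.0 / 50.0;
}
```

This is why separate PAL and NTSC builds were needed: the same per-frame numbers produce different on-screen speeds at 50 Hz and 60 Hz.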

As games started running on more varied machines — notably PCs, with their expandable and upgradeable hardware — one couldn’t be sure which frame rate a game would run at anymore. Compounding this, games became more complicated and unpredictable: 3D games especially can have large variances in scene complexity, sometimes even player-driven ones. E.g. everyone loves shooting at a stack of fuel barrels, causing a huge explosion, nice fireworks… and an inevitable frame drop. But we don’t mind the frame drop there, because it’s so much fun.

So it can be hard to predict how long it will take to simulate and render one frame. (Note that on consoles today, we still have fixed hardware, but the games themselves are often quite unpredictable and complex anyway.)

If you cannot be sure which frame rate the game will be running at, you have to measure the current frame rate and continually adapt the game’s physics and animation speed. If one frame takes 1/60th of a second (16.67 ms) and your character runs at 10 m/s, then it moves 1/6th of a meter each frame. But if frames are no longer 1/60th of a second, and suddenly start taking 1/30th of a second (33.33 ms), you have to start moving the character by 1/3rd of a meter (twice as “fast”) per frame, so that it continues moving at the same apparent speed on screen.

How does a game do this? Basically, it measures the time at the start of one frame, then at the start of the next one, and calculates the difference. It’s quite a simple method, and it works very well. Sorry, it used to work very well. Back in the ’90s (remember those “35 fps speeds for serious competitive netplay” from the beginning), people were more than happy with this method. But at that time, a graphics card (remember, they weren’t even called GPUs then) was a very “thin” piece of hardware, and the main CPU had direct control over when things got to the screen. If you didn’t have a 3D accelerator, the CPU was even drawing things directly. So it knew exactly when they would end up on screen.
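In code, that classic method amounts to roughly the following. This is a minimal sketch, not Serious Engine’s actual code; the loop variables and the 10 m/s speed are taken from the example above:

```cpp
#include <cassert>
#include <chrono>

// Variable-timestep movement: everything is scaled by the measured delta, so
// 10 m/s over 16.67 ms moves ~1/6 m, and over 33.33 ms moves ~1/3 m.
double integrate(double positionM, double speedMps, double dtSeconds) {
    return positionM + speedMps * dtSeconds;
}

// The surrounding loop, sketched (running/render are hypothetical names):
//
//   auto prev = std::chrono::steady_clock::now();
//   while (running) {
//       auto now = std::chrono::steady_clock::now();
//       double dt = std::chrono::duration<double>(now - prev).count();
//       prev = now;
//       position = integrate(position, 10.0, dt);
//       render(position);
//   }
```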

What is actually going on today

Over time, as we started having more complex GPUs, those GPUs became more and more “asynchronous”. That means that when the CPU gives the GPU a command to draw something on the screen, the GPU just stores that command in a buffer, so that the CPU can go on with its own business while the GPU is rendering. That ultimately results in a situation where the CPU tells the GPU “this is the end of the frame” and the GPU just stores this as another nice piece of data. But it doesn’t treat it as anything urgent. How could it — it is still processing some of the previously issued commands. It will show the frame on screen when it’s done with all the work it was given before.
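One way to picture this: the CPU’s submit times jitter, but the buffered GPU still presents each frame on the display’s steady vblank cadence. A toy model (all names and numbers are illustrative, not any real API):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Toy model of an asynchronous GPU: a frame appears at the first vblank at or
// after its (jittery) CPU submit time, and at most one frame per vblank.
// With a bit of buffering, uneven submit times still land on steady vblanks.
std::vector<double> presentTimes(const std::vector<double>& submitMs,
                                 double refreshMs) {
    std::vector<double> out;
    double prev = 0.0;
    for (double s : submitMs) {
        double t = std::ceil(s / refreshMs) * refreshMs;  // next vblank
        if (t <= prev) t = prev + refreshMs;              // one per vblank
        out.push_back(t);
        prev = t;
    }
    return out;
}
```

Feed it submits at 5, 30, 40 and 60 ms (CPU-side gaps of 25, 10 and 20 ms) with a 16.67 ms refresh, and every on-screen gap still comes out as exactly one refresh interval.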

So, when a game is trying to calculate the timing by subtracting timestamps at the start of two successive frames, the relevance of that is, to be blunt… quite dubious. Let’s get back to our example from those short videos. We had those frames with camera panning across some trees:

Six consecutive frames from the comparison video, with precise timing. Top is correct, bottom is heartbeat stutter.

Now recall this business of timing and movement. In the first two frames, the frame timing was 16.67 ms (which is 1/60th of a second), and the camera moves by the same amount in the top and bottom cases, so the trees are in sync. On the third frame (in the bottom, stuttering case), the game saw a frame time of 24.8 ms (more than 1/60th of a second), so it thinks the frame rate has dropped and rushes to move the camera a bit more… only to find that the next, fourth frame takes just 10.7 ms, so it moves the camera a bit less there, and the trees are more or less back in sync. (They don’t completely recover until about two frames later, when everything finally reconsolidates.)

What happens here is that the game measures what it thinks is the start of each frame, and those frame times sometimes oscillate due to various factors, especially on a busy multitasking system like a PC. So at some points the game thinks it didn’t make 60 fps, and generates animation frames slated for a slower frame rate. But due to the asynchronous nature of GPU operation, the GPU actually does make it in time for 60 fps on every single frame in this sequence.
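The frame-by-frame numbers above can be reproduced in a few lines: feed the measured CPU deltas into the usual speed-times-delta rule, and compare against a camera stepped by a true 16.67 ms every displayed frame (the 1 px/ms camera speed is arbitrary, chosen only for illustration):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// The game advances the camera by speed * measured_dt each frame, even though
// every frame actually reaches the screen a fixed 16.67 ms apart. Feeding in
// the jittery measured deltas reproduces the "ahead, then catching up" effect.
std::vector<double> cameraPositions(const std::vector<double>& measuredDtMs,
                                    double speedPxPerMs) {
    std::vector<double> positions;
    double p = 0.0;
    for (double dt : measuredDtMs) {
        p += speedPxPerMs * dt;  // the game believes dt elapsed
        positions.push_back(p);
    }
    return positions;
}
```

With measured deltas {16.67, 16.67, 24.8, 10.7} against a steady {16.67, 16.67, 16.67, 16.67}, the jittery camera ends up about 8 px ahead on the third frame and back within about 2 px by the fourth, matching the trees pulling ahead and re-syncing in the video.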

This is what we see as stutter — animation generated for a varying frame rate (the heartbeat) being displayed at the actual, correct, fixed frame rate.

So, basically, there’s no problem whatsoever — everything is running smoothly, it’s just that the game doesn’t know it.

This brings us to the point from the beginning of the article. When we finally figured out that this is what caused the problem (actually, it’s an illusion of a problem — there’s no problem in fact, right?), here’s what we did for a test:

First we observe the “heartbeat” and then we use a little trick to make it go away.

In the first part of the video above, you can see the heartbeat issue from the beginning. Then we change a “magic” option and after that — everything becomes perfectly smooth!

What’s the magic option? In Serious Engine, we call it sim_fSyncRate=60. In layman’s terms it basically means: “completely ignore all these timing shenanigans and pretend we are always measuring a steady 60 fps”. And it makes everything run smoothly — only because it was always running smoothly to begin with! The only reason it ever looked like it was stuttering is that the timing used for animation was wrong.
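What such an override amounts to can be sketched in a couple of lines. The variable name comes from the article; the surrounding code is invented, not Serious Engine’s actual implementation:

```cpp
#include <cassert>
#include <cmath>

// Forced sync rate: when enabled (rate > 0), the measured delta is discarded
// and the simulation always steps by exactly 1/rate seconds -- no stutter,
// but also no adaptation if the frame rate really drops. When disabled,
// it falls back to the measured, possibly jittery, delta.
double effectiveDt(double measuredDtSec, double forcedSyncRateHz) {
    if (forcedSyncRateHz > 0.0) {
        return 1.0 / forcedSyncRateHz;  // e.g. a steady 1/60 s
    }
    return measuredDtSec;
}
```

Even a jittery 24.8 ms measurement comes back as a clean 16.67 ms step when the rate is forced to 60.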

So that’s it? We just do that and everything is great?

Is the solution that simple?

Unfortunately… nope. That was only a developer test. If we stopped measuring the frame rate in real-world situations and just assumed it is always 60, then when it does drop below 60 — and on a PC it will drop sooner or later, for various reasons: the OS running something in the background, power-saving or overheating protection down-clocking the GPU/CPU… who knows — then everything would slow down.

So if we measure frame time, it stutters; if we don’t, everything can slow down at some point. What then?

The real solution would be to measure not when the frame started or finished rendering, but when its image was actually shown on the screen.
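If the OS or driver reported when each frame actually appeared on screen, the animation delta would simply be derived from those display timestamps instead of CPU-side clocks. A sketch, assuming such timestamps were available as input (they are hypothetical here; as the next section explains, no current API hands them out):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Given the times at which past frames *actually* appeared on screen,
// derive animation deltas from display time rather than CPU submit time.
// On a steady 60 Hz display these come out as a clean 16.67 ms each,
// regardless of how much the CPU-side timestamps jittered.
std::vector<double> displayDeltas(const std::vector<double>& actualPresentMs) {
    std::vector<double> dts;
    for (std::size_t i = 1; i < actualPresentMs.size(); ++i) {
        dts.push_back(actualPresentMs[i] - actualPresentMs[i - 1]);
    }
    return dts;
}
```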

So, how can the game know when a frame’s image is actually shown on screen? You might be surprised to learn that, in the current situation, there’s no way to do it!

Shocking, I know. One would expect this would be a basic feature of every graphics API. But it turns out that as things have been changing slowly here and there, everyone basically dropped the ball on this issue. We all forgot about the fine details of what is going on, kept doing basically what we were doing all the time, and the graphics APIs have evolved in all other aspects but this one: There’s no way for the application to know for sure when a frame was actually displayed on the screen. You can know when it finished rendering. But not when it got displayed.

What now?

Worry not, it’s not all that grim. Many people in the graphics ecosystem are currently hard at work implementing support for proper frame timing, under various names in different APIs. The Vulkan API already has an extension called VK_GOOGLE_display_timing, which was shown to be useful in a proof-of-concept implementation, but it is available only for a limited range of hardware, mostly on Android and Linux.

Work is now underway to provide such facilities, and better ones, hopefully in all the major graphics APIs. When? It’s hard to say, because the problem cuts quite deep into various OS subsystems.

I can promise you, though, that we at Croteam are advocating tirelessly for this problem to be fixed as soon as possible — and everyone in the interactive graphics ecosystem is very understanding and helpful.

We are looking forward to having this available to a broader public, and when that happens, we will provide an update for The Talos Principle that implements this feature.