Introduction

Have you ever been on an overseas phone call or VoIP (voice over internet protocol) chat and noticed a delay between exchanges? Protip: if you want a rough sense of this delay, you can try the following experiment with your chat partner. You say “one”, and as soon as the other person hears it, they say “two”, to which you respond “three”, and so on, until you both get a feel for the delay.

The key is to have the other person react as fast as possible. That way, you can get a sense of how much of the delay is due to the phone line. It’s a trick I’ve used for years, and it can be useful for giving you both a realistic expectation of how much you can push the pace of whatever conversation you’re planning on having. You can think of the total delay here as resulting from three separate sources:

The speed of sound: how long it takes for a voice to reach the phone receiver, and how long it takes for the sound from the speaker to reach the ear.

The latency of the phone line: this can include a myriad of different processes, but we’ll keep it simple and lump them together.

Physiology: how fast a person reacts to a particular sentence. This includes mechanical processes in the inner ear, processing in the auditory cortex, cognitive processes in other brain regions, and finally a well-coordinated stream of motor signals sent to the muscles that control the face, tongue, larynx, and lungs, to name just a few.
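To make the three sources concrete, here is a rough sketch of how the counting experiment could be turned into an estimate of the line latency. Everything in it is an assumption for illustration: the function names are made up, the 200 ms reaction time is only a placeholder (spoken responses are likely slower than that), and 343 m/s is the approximate speed of sound in room-temperature air.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at roughly 20 °C


def acoustic_delay_ms(distance_m):
    """Time for sound to travel distance_m (e.g. mouth to mic plus speaker to ear)."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


def estimate_line_latency_ms(total_seconds, exchanges,
                             reaction_ms=200.0, acoustic_ms=1.0):
    """Rough one-way line latency from the counting experiment.

    Each exchange (hearing a number, then saying the next one) is assumed to
    cost one reaction time, one acoustic delay, and one trip down the line.
    The default reaction_ms and acoustic_ms are placeholders, not measurements.
    """
    if exchanges <= 0:
        raise ValueError("exchanges must be positive")
    per_exchange_ms = total_seconds * 1000.0 / exchanges
    # Clamp at zero: a negative latency just means the placeholders were too large.
    return max(per_exchange_ms - reaction_ms - acoustic_ms, 0.0)


if __name__ == "__main__":
    # Hypothetical run: 20 exchanges took 12 seconds in total.
    acoustic = acoustic_delay_ms(0.3)  # about 30 cm of air in total
    print(f"acoustic delay: {acoustic:.2f} ms")
    print(f"line latency:   {estimate_line_latency_ms(12.0, 20, 200.0, acoustic):.0f} ms")
```

Notice how small the acoustic term is compared to the other two: at phone distances it is under a millisecond, so the estimate is dominated by whatever reaction time you plug in.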

In most phone conversations, and indeed in many online gaming situations, the real bottleneck is physiology. Even in the simplest visual detection task, such as pressing a button or key when a visual signal appears (e.g. pressing the fire button the moment you see an enemy pop out from behind a wall), human reaction times average roughly 200 milliseconds (ms), which is a fifth of a second. With a fast computer, a good internet connection, and a rapidly updated display, the latency of the chain of events from button press to pixels lighting up can be on the order of milliseconds, or even a few hundred microseconds (there are 1000 microseconds in a millisecond).

So if the timescales of human reaction are so large compared to how long it takes for the computer to update the display, why should we care about a few milliseconds? Does it really matter whether our display adds 5 or 10 ms of latency to the chain? We’ll try to answer this question, but before we do, let’s take a closer look at human performance.

Human Reaction Times

If you look up information about human reaction time, the values you find are often averages, and it’s important to be aware of the method of testing. Often, we’re interested in simple reaction time, which is how long it takes to respond in a predefined manner to a change in the environment. For example, in simple visual reaction tasks, an observer may be asked to press a button once a light appears on the screen.

What’s important here is that they don’t have to expend effort (and time) analyzing the stimulus to see whether they should respond or not. If the task is to respond only if a green light appears (and not a red light), then reaction times will be slower. The same goes if they have to respond with one key for a red light and another key for a green light.

To see a classic example of a simple visual reaction task, try your hand at the Human Benchmark reaction time test. One of the cool things about that site is that you can see some of the statistics. The all-time average (median) is around 270 ms, but this includes input lag. In particular, it includes the delay between the moment the program instructs the display to change color (which is presumably when the timer starts) and the moment the pixels change color on your display. It also includes the delay between the moment you press the mouse button and the moment the program receives the signal from your button press (which is when the timer stops).

If you look at the distribution, you can see that some people are performing at around 150 ms. These are probably younger folks who have excellent reflexes and are on good hardware (it’s also possible that some people are using clever methods to cheat). Still, 150 ms does seem to be in the ballpark of the limit of human reaction time to a visual stimulus, at least when it comes to pressing buttons with a finger, although there may be a few people who can push this limit lower.

We also respond faster to acoustic stimuli than to visual stimuli. Here’s a great explanation from reddit. Basically, the idea is that converting photons to neural signals takes longer than converting pressure waves to neural signals. Because of this, acoustic reaction times are around 30 ms faster.

In many of these experiments, the way the reaction is actually measured can have an impact on the final result. For example, in the Human Benchmark test, there is a slight delay between the moment your finger actually starts to move and the moment the button actually “clicks”. And depending on the USB polling rate of the mouse, there could be as much as an 8 ms delay between the moment the button clicks and the moment the signal from the mouse is registered.
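The polling delay is easy to work out: a mouse that is polled N times per second can make a click wait up to 1000/N ms before the computer sees it. The sketch below shows the arithmetic. The function name and the list of rates are illustrative assumptions, and the “average” figure assumes clicks land at random points within a polling interval.

```python
def polling_interval_ms(polling_rate_hz):
    """Worst-case wait (in ms) before a click is picked up at a given polling rate."""
    if polling_rate_hz <= 0:
        raise ValueError("polling_rate_hz must be positive")
    return 1000.0 / polling_rate_hz


if __name__ == "__main__":
    # Assumed example rates; 125 Hz is the one that yields the 8 ms figure.
    for rate_hz in (125, 500, 1000):
        worst_ms = polling_interval_ms(rate_hz)
        # Assumption: clicks are uniformly distributed within the interval,
        # so the typical wait is about half the worst case.
        print(f"{rate_hz:>4} Hz: up to {worst_ms:g} ms, about {worst_ms / 2:g} ms on average")
```

So the 8 ms worst case corresponds to a 125 Hz polling rate, while a mouse polled at 1000 Hz would add at most 1 ms.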

To get a more accurate picture of reaction times, many labs use specialized equipment to pin down when the finger (or other body part) actually starts to move. For example, passive optical tracking systems (where retroreflective markers are placed on the target object, such as a finger) and tiny inertial sensors attached to the object are two ways to measure the position of an object across time. Some studies use surface electromyography (EMG) to measure the electrical activity in the muscles themselves.

Surface EMG is an accurate way to measure when a reaction begins, but it doesn’t take into account the time between the moment the muscles activate and the moment that force is actually produced across the joint in question (this time is called the electromechanical delay). So while measuring EMG reaction times is a great way to get rid of the “noise” involved in measuring things like button clicks, it doesn’t give us a true picture of how long it takes someone to produce a useful reaction, that is, one that actually produces a physical response to the environment around us.

One of the posters on the Blur Busters forums (‘flood’, who is also responsible for designing this gem) has reported human benchmark scores of around 140 ms, and can regularly get scores below 160 ms. Here is a video of him sniping bots in CSGO.

The original action was captured at 100 frames per second (fps) while he was playing. By counting the number of frames between the moment the first pixel of the enemy appears, and the moment he fires, you can get an estimate of his reaction time. Across those 9 shots (ignoring the fourth shot) his reaction time averaged 180 ms, and five of these were around 170 ms.

This is a fairly conservative estimate, as his visual system might not have noticed anything until a good chunk of the enemy had slid into view, which may have been two or three more frames after the first pixel came into view (i.e. his actual reaction time may have been around 140 ms in some of those shots).
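The frame-counting arithmetic is simple enough to sketch. At 100 fps each frame lasts 10 ms, so the estimate is only good to about one frame either way. The function and parameter names below are made up for illustration, and `unnoticed_frames` stands for the guess that the enemy was not noticeable during the first few frames.

```python
def reaction_time_ms(frames_counted, capture_fps, unnoticed_frames=0):
    """Reaction time estimated by counting frames in a recording.

    frames_counted:   frames between the first enemy pixel and the shot
    capture_fps:      frame rate of the recording
    unnoticed_frames: assumed frames before the enemy became noticeable
    """
    if capture_fps <= 0:
        raise ValueError("capture_fps must be positive")
    if not 0 <= unnoticed_frames <= frames_counted:
        raise ValueError("unnoticed_frames must be between 0 and frames_counted")
    frame_ms = 1000.0 / capture_fps  # 10 ms per frame at 100 fps
    return (frames_counted - unnoticed_frames) * frame_ms


if __name__ == "__main__":
    print(reaction_time_ms(18, 100))     # 18 frames at 100 fps
    print(reaction_time_ms(17, 100))     # 17 frames at 100 fps
    print(reaction_time_ms(17, 100, 3))  # same shot, 3 frames assumed unnoticed
```

A 17-frame shot works out to 170 ms, and knocking off three frames for the enemy sliding into view gives the 140 ms figure.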

This is a pretty spectacular level of performance, considering these are responses to visual, and not auditory cues. I have reason to believe, however, that in some situations, people can react even faster. But before we delve into that, we should take an important detour.

Pushing the Limits—the Startle Response

Many animals, including humans, have a very useful set of reflexes which together form the startle response. This response is much faster than regular responses and occurs in the presence of highly charged stimuli, such as a sudden loud noise. When you react to a potentially threatening stimulus, you’ll often notice that you react before you’re even aware of what caused your reaction. This is because the circuits involved in the response run through more primitive and ancient parts of the brain, such as the brainstem and amygdala, and do not involve advanced recognition and decision mechanisms in higher cortical areas.

For a wonderful take on this idea, watch this three minute segment from a talk by Jordan Peterson. The startle response can manifest in a number of different bodily areas, all the way from the eyes, to the head, jaw, shoulder, hips, knees, and feet.

You can see why a strong, whole body response can be useful in a situation like this or this. Studies have measured people blinking as early as 30-40 ms after a loud acoustic stimulus, and the jaw can react even faster. The legs take longer to react, as they’re farther away from the brain and may have a longer electromechanical delay due to their larger size.

According to the IAAF, if an athlete reacts within 100 ms of the start signal (which, in the Olympics, is a loud gunshot noise that is produced by a loudspeaker behind each runner), it is considered a false start. In other words, if an athlete starts moving 90 ms after the start signal, it is considered more likely that this was due to a lucky guess than an actual reaction. So much more likely, in fact, that a decision is made to disqualify the athlete rather than give him or her the benefit of the doubt.

The reaction times in these events are measured with force transducers in the starting blocks, which measure the force produced by the legs. The loud noise signal in a sprint race can produce a startle response, and the IAAF limit of 100 ms seems to be based on the assumption that humans cannot react (with their legs) to an auditory signal in under 100 ms.

However, it’s not clear whether these assumptions take into account two important facts: First, a startle response improves reaction times, and second, there are humans out there who are on the extreme end of the spectrum and who may not have been represented in most studies of human reaction time. It turns out that this 100 ms limit is based on wrong assumptions.

In 2007, there was a study that accurately measured sprint reaction times of nine athletes under very well controlled conditions (using loudspeakers and a force transducer in starting blocks just like in the Olympics). One of these athletes had an average reaction time of 87 ms! In 2009, in an IAAF-commissioned study, reaction times of seven Finnish national-level sprinters were measured. Reaction times were measured for legs and arms (force transducers were placed beneath the hands to measure arm reaction times).

One of the sprinters (female) had an average reaction time in the legs of 79 ms, and her fastest was 42 ms! Her average arm reaction time was 75 ms and fastest arm reaction time was 42 ms. Another sprinter (male) had an average leg reaction time of 73 ms (fastest was 58 ms), and an average arm reaction time of 51 ms (fastest was 40 ms).

Even if we are conservative and assume that the fastest times for some of these sprinters were based on guessing (the study says that each sprinter performed between 5 and 8 trials, so it’s not clear whether some trials were excluded), the averages do not lie. The authors of this study strongly recommended that the IAAF revise the limit to as low as 80 ms. Also note that the reaction times were faster in the arms than in the legs. This is important to remember as it has implications for gaming.
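To see what the two limits would mean for the athletes above, here is a small sketch that applies a false start threshold to the average leg reaction times reported in those studies. The function name, the labels, and the choice to treat a time exactly at the limit as legal are assumptions made for illustration.

```python
CURRENT_LIMIT_MS = 100   # the IAAF limit described above
PROPOSED_LIMIT_MS = 80   # the lower bound recommended by the 2009 study's authors


def is_false_start(reaction_ms, limit_ms=CURRENT_LIMIT_MS):
    """True if a reaction is faster than the limit.

    Assumption: a reaction exactly at the limit counts as legal.
    """
    return reaction_ms < limit_ms


# Average leg reaction times (ms) reported in the studies mentioned above.
reported_averages_ms = {
    "2007 study, fastest athlete": 87,
    "2009 study, female sprinter": 79,
    "2009 study, male sprinter": 73,
}

if __name__ == "__main__":
    for label, avg_ms in reported_averages_ms.items():
        at_100 = is_false_start(avg_ms, CURRENT_LIMIT_MS)
        at_80 = is_false_start(avg_ms, PROPOSED_LIMIT_MS)
        print(f"{label}: {avg_ms} ms, false start at 100 ms: {at_100}, at 80 ms: {at_80}")
```

Under the 100 ms limit, a typical start from any of these three athletes would be flagged. Even at 80 ms, the two Finnish sprinters’ averages would still fall on the wrong side of the line.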

To get a sense of these timescales, here is what 200 ms sounds like. When you click play, you’ll hear clicks that are spaced 200 ms apart. So you can imagine one click as the stimulus, and the next click as the 200 ms response to that stimulus.

200 ms (5 Hz):

And here are a few more…

100 ms (10 Hz):

75 ms (13.33 Hz):

50 ms (20 Hz):

Oh, and if you think these response times are impressive, consider that the long-legged fly has a reaction time of under 5 ms to a flash of light! This was measured in a particularly cool way. Researchers aimed a camera at a fly that was perched on a leaf. The shutter speed was set to 1/200 of a second (5 ms), and a flash was synchronized with the opening of the shutter.

They then took a photo of the fly. The flash startled the fly and it responded by briefly jumping into the air. Since the shutter was only open for 5 ms, this means that if the image captured the insect in mid air, the insect must have reacted within that 5 ms window. The researchers estimated that the actual reaction time may have been as low as 2 ms, which is an absolutely insane level of performance!

If you’re curious, this is what 5 ms sounds like:

5 ms (200 Hz):

You can explore these sounds in more detail with an online tone generator (choose the sawtooth wave). Notice that at a certain point, the brain can’t hear the individual clicks, and instead interprets the increasing frequency as a rise in pitch.
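The labels on the clips above come from a simple conversion: clicks spaced T ms apart repeat 1000/T times per second. Here is a minimal sketch (the function name is just for illustration) if you want to work out the frequency for other spacings before trying them in a tone generator.

```python
def clicks_per_second(interval_ms):
    """Frequency in Hz of clicks spaced interval_ms apart."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return 1000.0 / interval_ms


if __name__ == "__main__":
    # The spacings used for the clips above.
    for interval_ms in (200, 100, 75, 50, 5):
        print(f"{interval_ms} ms -> {clicks_per_second(interval_ms):.2f} Hz")
```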