Modern audio compression algorithms rely on observations about auditory perception. For instance, we know that a low-frequency tone can render a higher tone inaudible. This effect, known as masking, is exploited to save space by removing the tones we expect will be inaudible. But our expectations are complicated by the physics of waves and by our models of how human auditory perception works.

This problem has been highlighted in a recent Physical Review Letters paper, in which researchers demonstrated that the vast majority of humans can perceive certain aspects of sound far more accurately than a simple reading of the laws of physics would allow. Given that many encoding algorithms begin their compression with operations based on that simple physical picture, the researchers believe it may be time to revisit audio compression.

Time and frequency: Two sides of the same coin

You'll notice I didn't say, "human hearing violates the laws of physics," even though it was very tempting. The truth is that nothing violates the laws of physics, though many things violate the simplified models we use to approximate them.

Take a tone, played continuously for ever and ever. The frequency of the tone is very well-defined, but it has no start or end point, so the time at which the note was played is entirely uncertain. Conversely, when we beat a drum, the sound has a very sharp temporal definition, but the tone is actually a broad spectrum of individual frequencies all added together. These two properties, the timing of a tone and its frequency, are linked: the precision with which one can be measured limits the precision of the other. This relationship is called the Fourier uncertainty principle.
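You can see this tradeoff numerically with a few lines of numpy. The sketch below (sample rate, tone frequency, and click length are arbitrary choices for illustration) compares the spectral width of a long, steady tone against that of a drum-like click: the tone's energy sits in a single narrow peak, while the click's energy is smeared across a wide band of frequencies.

```python
import numpy as np

fs = 8000  # sample rate in Hz (arbitrary choice for illustration)
t = np.arange(0, 1.0, 1.0 / fs)

def spectral_width(signal):
    """RMS width of the power spectrum, in Hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    mean = np.sum(freqs * spectrum) / np.sum(spectrum)
    var = np.sum((freqs - mean) ** 2 * spectrum) / np.sum(spectrum)
    return np.sqrt(var)

# A long, steady 440 Hz tone: frequency well-defined, timing not.
tone = np.sin(2 * np.pi * 440 * t)

# A drum-like click (8 samples long): sharp timing, broad spectrum.
click = np.zeros_like(t)
click[:8] = 1.0

print(spectral_width(tone))   # tiny: a single narrow spectral peak
print(spectral_width(click))  # hundreds of Hz: many frequencies added together
```

Shortening the click spreads its spectrum even wider; lengthening the tone's duration is what lets its frequency be pinned down so tightly.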

In between our infinitely long note and the drum beat are short, sharp packets of sound whose frequency and timing are both as precisely defined as they can be. To have a better-defined frequency, any individual packet would have to last longer; to have a sharper temporal structure, it would have to contain more frequency components. These bits of sound are often called Fourier-limited pulses, since they possess temporal and frequency uncertainties that are, together, as small as possible.
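The textbook example of a Fourier-limited pulse is a tone under a Gaussian envelope, which saturates the bound Δt·Δf = 1/(4π). A minimal numpy sketch (the 1 kHz carrier and 10 ms envelope width are arbitrary assumptions) measures both uncertainties and checks their product:

```python
import numpy as np

fs = 44100  # sample rate in Hz (assumed for illustration)
t = np.arange(-0.5, 0.5, 1.0 / fs)

def rms_width(x, weights):
    """RMS spread of x under the given (non-negative) weights."""
    w = weights / weights.sum()
    mean = (x * w).sum()
    return np.sqrt(((x - mean) ** 2 * w).sum())

# Gaussian envelope on a 1 kHz carrier: the classic Fourier-limited pulse.
sigma = 0.01  # envelope width in seconds
pulse = np.exp(-t**2 / (2 * sigma**2)) * np.cos(2 * np.pi * 1000 * t)

dt = rms_width(t, pulse ** 2)              # temporal uncertainty
spectrum = np.abs(np.fft.rfft(pulse)) ** 2
freqs = np.fft.rfftfreq(len(t), 1.0 / fs)
df = rms_width(freqs, spectrum)            # frequency uncertainty

print(dt * df)  # ≈ 1/(4π) ≈ 0.0796, the Fourier limit
```

Any other pulse shape (a rectangular burst, say) gives a strictly larger product, which is what makes the Gaussian packet the natural yardstick for the listening tests described below.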

Humans, are you nonlinear?

These pulses of sound represent the ultimate limits for linear measurements. If human hearing uses a linear form of frequency and temporal sound perception, we should expect that we cannot perceive timing and frequency differences smaller than these ultimate limits.

To test this, a pair of physicists from Rockefeller University gave a group of subjects a series of tests in which they were asked to perceive frequency differences between Fourier-limited sound packets, to perceive timing differences between them, and to do both simultaneously. The tests were run with distracting high notes playing in the background.

They found that humans certainly do not perceive sound in a linear fashion. Indeed, one subject was able to determine the relative timing of notes to an accuracy of about one oscillation period. This high temporal precision came at the cost of frequency precision, but even taking the decreased frequency acuity into account, the combined precision was still much better than the limit set by a linear model. Likewise, another subject showed extraordinary frequency perception at the cost of temporal resolution and still beat the uncertainty limit.

Most subjects clocked in with a combined uncertainty about 10 times smaller than a linear model would allow, with musicians, composers, and conductors performing best.

Why, yes you are nonlinear

The obvious conclusion, of course, is that humans don't perceive sound linearly. To a large extent, this was already known: we know volume is perceived nonlinearly, but we didn't know much about temporal and frequency perception. Researchers suspected that this, too, was nonlinear (the brain is anything but linear), but they didn't know which model would accurately represent what goes on in the brain. Researchers and sound engineers have continued to work with linear models because they don't really know what else to use.

As the researchers point out, their results go a long way toward eliminating many nonlinear models, because most don't predict the combined temporal and frequency resolution found in humans. They also point out the importance of this work for audio encoding. Even now, one of the first steps in many encoders is to use a linear algorithm to break an audio track into a 2D time-frequency map (a spectrogram), which is then used as input for the actual encoding.
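That first, linear step is typically a windowed Fourier transform: slice the track into overlapping windows and take the spectrum of each. Here's a bare-bones sketch in numpy (window size and hop are arbitrary assumptions, and real encoders use more sophisticated filter banks); note how the window length hard-codes exactly the linear time/frequency tradeoff the paper is questioning:

```python
import numpy as np

def spectrogram(signal, window_size=1024, hop=512):
    """Break a signal into overlapping windows and take the FFT of each.

    The result is a 2D (time x frequency) map, the "soundscape" an encoder
    works from. The window length fixes the tradeoff: long windows give
    fine frequency bins but coarse timing, and vice versa.
    """
    window = np.hanning(window_size)
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = signal[start:start + window_size] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)  # shape: (num_frames, window_size // 2 + 1)

fs = 44100  # sample rate in Hz (assumed for illustration)
t = np.arange(0, 1.0, 1.0 / fs)
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)
```

A 440 Hz tone shows up as a single bright column in this map; with a 1024-sample window, each frequency bin is about 43 Hz wide and each frame about 23 ms long, limits a human listener apparently isn't bound by.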

I don't have a lot of time for audiophiles with gold-coated connectors and "unidirectional" coaxial cable, but this data is something I could buy into.

Physical Review Letters, 2013, DOI: 10.1103/PhysRevLett.110.044301