Spectral analysis of a live blues band recording made by John Atkinson, showing content up to 40kHz, from "What's Going On Up There?"

With a few exceptions, audiophiles have long advocated high-rez music formats, believing that music should be recorded and presented in the highest fidelity possible, for our pleasure and posterity.

And yet a few people in our community, and many more outside it, have long maintained that CD is good enough: that, indeed, as a general principle, music at CD resolution is indistinguishable from high-rez music. And until recently, scientific evidence for the audibility of high-rez music, while not negligible, was thin.

It was not for lack of trying. Quite a few articles published in the relevant journals sought evidence of the audibility of higher bit depths and sampling rates. The results were mixed. In 2016, Joshua Reiss of Queen Mary University of London published a meta-analysis combining results from 18 such studies, involving more than 400 participants and 12,500 trials, concluding that the experiments "showed a small but statistically significant ability of test subjects to discriminate high resolution content, and this effect increased dramatically when test subjects received extensive training." Some, though, thought the studies he chose were cherry-picked and so found the evidence weak (footnote 1).

At the October 2019 Audio Engineering Society convention in New York (just concluded as I write), Yuki Fukuda and Shunsuke Ishimitsu, both of Hiroshima City University, presented results that show quite clearly that listeners can distinguish sounds encoded and reproduced at different sampling frequencies. Their trials differed from the previous ones in one important way: Instead of exposing test subjects to music at different resolutions, they used test tones.

Specifically, they used two forms of spectrally "flat" signals: white noise and (Gaussian) impulse signals. White noise and impulse signals both have broadband content; they differ from each other in that in an impulse, all the frequency components are correlated in time, whereas for white noise the phase of the various frequency components is random.
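The difference between the two signal types can be illustrated numerically. The sketch below is not the researchers' code: the signal length, the Gaussian width, and the `phase_coherence` measure are all assumptions chosen for illustration. It removes the impulse's time delay from each frequency bin and then asks how closely the remaining phases agree, where 1.0 means every component peaks at the same instant and a value near 0 means the phases are scattered.

```python
import cmath
import math
import random

N = 64        # samples per test signal (illustrative, not from the paper)
CENTER = 32   # sample index where the impulse peaks
SIGMA = 0.5   # Gaussian width in samples (assumed; a narrow pulse is broadband)

def half_spectrum(x):
    """Plain DFT for bins 0..N/2 of a real signal (slow but dependency-free)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]

def phase_coherence(x, center):
    """Length of the average unit phasor after removing a delay of `center` samples."""
    n = len(x)
    total = 0j
    count = 0
    for k, c in enumerate(half_spectrum(x)):
        if k == 0 or abs(c) == 0.0:
            continue  # skip DC and any empty bin, whose phase is meaningless
        aligned = c * cmath.exp(2j * math.pi * k * center / n)
        total += aligned / abs(aligned)
        count += 1
    return abs(total) / count

# Gaussian impulse: one narrow pulse centered on CENTER.
impulse = [math.exp(-((t - CENTER) ** 2) / (2 * SIGMA ** 2)) for t in range(N)]

# White noise: independent Gaussian samples, seeded so the run is repeatable.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]

impulse_coherence = phase_coherence(impulse, CENTER)
noise_coherence = phase_coherence(noise, CENTER)
print(f"Gaussian impulse: {impulse_coherence:.3f}")
print(f"White noise:      {noise_coherence:.3f}")
```

The impulse should score essentially 1.0, because all of its components line up at the pulse center, while the noise should score far lower even though both signals spread energy across the band.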

The Japan Electronics and Information Technology Industries Association defines the "CD format" as having sampling frequencies up to 48kHz and a bit depth of 16. Anything higher, in bit depth or sampling frequency, is considered high resolution. For these experiments, the two researchers used a bit depth of 16 in all trials so that their lowest sampling rate data would be classified as CD-rez, the others as high-rez.
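As a rough sketch of that classification rule as described here (the function names are hypothetical, and the thresholds are taken from the paragraph above rather than from the JEITA document itself), the three sampling rates used in the experiments sort as follows. The Nyquist frequency, half the sampling rate, is included because it sets the highest frequency each format can encode.

```python
CD_MAX_SAMPLE_RATE_HZ = 48_000  # per the definition quoted in the article
CD_MAX_BIT_DEPTH = 16

def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2

def classify(sample_rate_hz, bit_depth):
    """Label a format CD-rez or high-rez using the article's description of the rule."""
    if sample_rate_hz > CD_MAX_SAMPLE_RATE_HZ or bit_depth > CD_MAX_BIT_DEPTH:
        return "high-rez"
    return "CD-rez"

# The three rates used in the experiments, all at 16 bits.
for rate in (48_000, 96_000, 192_000):
    label = classify(rate, 16)
    print(f"{rate // 1000}kHz/16-bit: {label}, bandwidth to {nyquist_hz(rate) / 1000:g}kHz")
```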

Fukuda and Ishimitsu employed Gaussian impulse and white noise test signals at 48kHz, 96kHz, and 192kHz. The test had seven subjects, all young: The average age was right at 22. ABX comparisons were made between 48kHz and 96kHz signals, 48kHz and 192kHz signals, and 96kHz and 192kHz signals. In a different round of testing, they applied a different methodology: MUSHRA, for Multiple Stimuli with Hidden Reference and Anchor. This test, too, involved seven subjects. Two women participated in the ABX round and one woman in the MUSHRA round. For each methodology, both headphones and loudspeakers were employed.

The setups were modest. Loudspeakers were Eclipse TD-M by Japanese manufacturer Fujitsu Ten, a single-driver (hence intrinsically time-aligned) powered desktop loudspeaker. The Eclipse, which is out of production but last sold in the US for $1300/pair, accepts digital data via USB and Wi-Fi, but in these experiments appeared to be used via the analog input: a 3.5mm stereo jack. The researchers employed a Fostex HP-A4BL D/A converter, which retails for $600 and is widely available at a street price under $500.

A fast-roll-off linear-phase filter was employed to minimize aliasing distortion. Headphones were the Sennheiser HD 650. Tests were carried out in an anechoic chamber.

Applying the binomial test, the common p<0.05 assessment (footnote 2), all but one of the ABX tests yielded decisive positive results, with p values well below 0.05. The most successful tests were those with loudspeakers and the Gaussian impulse; in those experiments, p was less than 0.0001 in all comparisons. The results of the MUSHRA study, evaluated using a two-way ANOVA, were more tenuous but still supported a conclusion that "there is a possibility of a discrimination between Hi-Res and non-Hi-Res audio data."
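For readers unfamiliar with how an ABX score becomes a p value, here is a minimal sketch. It assumes a one-sided binomial test against pure guessing (a 50% chance of a correct answer on each trial); the paper's exact procedure is not described here, and the trial counts below are hypothetical, chosen only to show the arithmetic.

```python
from math import comb

def abx_p_value(correct, trials, p_guess=0.5):
    """Probability of scoring at least `correct` out of `trials` by guessing alone."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical scores, not taken from the paper.
p_12_of_16 = abx_p_value(12, 16)
p_20_of_20 = abx_p_value(20, 20)
print(f"12 of 16 correct: p = {p_12_of_16:.4f}")
print(f"20 of 20 correct: p = {p_20_of_20:.2e}")
```

With these made-up numbers, 12 correct out of 16 gives a p of about 0.038, which just clears the 0.05 bar, while a perfect 20 out of 20 lands below 0.0001, the level the article reports for the loudspeaker and Gaussian impulse comparisons.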

In a brief email, Fukuda, the corresponding author on the paper, told me that next they intend to study whether people can distinguish among bit depths of 16, 24, and 32 bits at a 48kHz sampling frequency.

Beyond that, it would be good to see the test repeated with older listeners (with, presumably, less acute high-frequency hearing) and with a variety of loudspeakers, to determine which loudspeaker characteristics are most important for hearing such differences; the implications could be very important for loudspeaker design. Next, experiments could be carried out in a regular room, to determine whether such differences are audible in a domestic environment or whether the anechoic chamber is essential. I'd love to see the results of interviews with the test subjects, to hear their subjective listening impressions: What, precisely, did they hear from the high-rez? What are the subjective characteristics of high-rez test signals, relative to their lower-rez counterparts?

Finally, subjects trained using these test signals could then be exposed to carefully chosen music samples (maybe music with wood block or other percussion, resembling those Gaussian impulse signals but with more going on) and then gradually move on to other kinds of music. Only then will we be in a position to identify the subjective characteristics of high-resolution music in a way that should satisfy objectivists.

Footnote 1: See John Atkinson's discussion of the Reiss analysis and the subsequent comments.

Footnote 2: A p<0.05 means that, if the listeners were merely guessing, the probability of a result at least this strong occurring by chance would be less than 5%.