The author demonstrating stereo microphone techniques at an English audio show in 1981.

For most people the terms hi-fi and stereo are synonymous, and yet it is clear that there is still a great deal of confusion over what the word "stereo" actually means. There isn't even a consensus of opinion amongst producers of records, designers of hi-fi equipment, audio critics and music lovers as to the purpose of stereo, and considering that the arguments show no sign of diminishing in intensity, it is instructive to realise that 1981 sees both the 100th anniversary of Clement Ader's first stereo experiments and the 50th anniversary of Alan Blumlein's classic patent on stereo.

Ader placed telephone microphones in two groups, left and right, on the stage of the Paris Opera, and subscribers listened on headphones to the twin signals transmitted over telephone lines. Blumlein's work, though also experimental, was more theoretical in that it examined exactly what directional information needs to be preserved on a two-channel system in order that an accurate aural picture can be recreated using two loudspeakers.

Sound-Source Location

Before wading deeper into the morass of conflicting opinion, it is worth a look at how a human being perceives the direction of real-life sound-sources. Although the eyes undoubtedly play a major role in determining such directions, the ears provide an essential and evolutionarily desirable backup. Any caveman out of his cave on a dark night, and incapable of hearing where that quiet lip-smacking noise (curiously like that made by a hungry sabre-toothed tiger) was coming from, wouldn't stand much chance of passing his genes on to future generations. And so the sabre-toothed tiger, having encouraged the existence of a human hearing mechanism to determine direction, could pass happily into extinction, its destiny fulfilled.

When the wavefront emitted by a sound-source, such as our extinct tiger, reaches the head, it is obvious that unless that source lies in the plane bisecting the head at right-angles to the ear axis, it will reach one ear before it reaches the other. The further away from the median plane the sound-source, the greater the interaural delay, until it reaches a maximum of around 0.7ms (the time taken for sound to traverse the ear-ear distance) when the source is to one side along the ear-ear axis (fig.1). For transient-type signals the brain probably acts directly on this time delay to derive the directional information, but for a continuous waveform with a frequency below approximately 700Hz, for which the ear-ear distance represents a half-wavelength, the brain interprets the time delay as a phase difference between the signals picked up by the two ears (fig.2) and correlates this phase difference with direction. For higher frequencies, however, it can be seen (fig.3) that there is more than one direction which will appear to give the same interaural phase difference, and thus the detection of direction by phase correlation becomes ambiguous.
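The two figures quoted here, the 0.7ms maximum delay and the roughly 700Hz half-wavelength frequency, follow directly from the geometry. A minimal sketch, assuming a nominal ear-to-ear path of 0.24m and a speed of sound of 343m/s (round-number assumptions, not figures from the article):

```python
# Illustrative arithmetic behind the interaural-delay figures.
# Both constants below are assumed nominal values.
SPEED_OF_SOUND = 343.0   # m/s, at roughly room temperature
EAR_SPACING = 0.24       # m, assumed acoustic path between the ears

# Maximum interaural delay: source directly on the ear-ear axis.
max_delay_ms = EAR_SPACING / SPEED_OF_SOUND * 1000
print(f"Maximum interaural delay: {max_delay_ms:.2f} ms")   # ~0.70 ms

# Frequency at which the ear spacing equals a half-wavelength,
# above which phase correlation becomes ambiguous.
half_wave_freq = SPEED_OF_SOUND / (2 * EAR_SPACING)
print(f"Half-wavelength frequency: {half_wave_freq:.0f} Hz")  # ~715 Hz
```

With these assumed dimensions the numbers land close to those in the text, which is why the phase mechanism runs out of steam in the region of 700Hz.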

Above this critical frequency, fortunately, another mechanism starts to take over: the head increasingly casts an acoustic "shadow" when its size becomes of the order of, or larger, than the wavelength of the sound. The presence of this shadow means that an amplitude difference is introduced between the sounds perceived by the ears, enabling the brain to deduce the sound-source direction from the ratio of the two amplitudes. The pinnae further modify this amplitude difference with frequency, giving a direction-dependent spectral change which "sharpens up" the mechanism.

Obviously the amplitude and phase mechanisms will overlap over a range of frequencies dependent on head size, and reinforce each other until the frequency is such that the phase difference becomes totally ambiguous, apparently at around 1.2kHz for an average head. Above this frequency one has to rely on the amplitude mechanism alone for steady-state sounds, which has been shown to be less precise. However, if a transient occurs in an otherwise continuous high frequency waveform, then this is equivalent to dropping in an audio "marker" to give the brain some additional time delay information. The tiger treads on a stick and our caveman immediately has an unambiguous clue as to the tiger's direction, and lives to pass on the relevant hearing mechanism to his descendants. Without the transients, the brain has to try somehow to reinforce the weak amplitude difference clues and in fact the head is in constant slight motion, its side-to-side scanning enabling the brain to superimpose information about the rate-of-change of amplitude differences upon those same differences.

When the sound-source lies exactly on the median plane (the vertical central plane between the ears), all the primary mechanisms mentioned cause the brain to come to the same conclusion, ie, that the sound-source is dead central. Whether it is above or below, in front or behind, is somewhat harder to resolve, and the brain has to interpret the spectral information from the pinnae and secondary clues such as reverberation to determine this aspect. The brain is relatively good at "in front or behind?" decisions (provided the source isn't exactly on the median plane), but apparently no good at "above or below?" This isn't particularly important, as one doesn't normally depend on hearing alone, the eyes being the main source of such information.

Amplitude Stereo

The genius of Alan Blumlein lay in his recognition that if the interaural phase differences are reproduced as amplitude differences between the signals fed to two loudspeakers, this alone is sufficient to define direction completely, provided the listener is equidistant from the two loudspeakers. If the listener is not equidistant, the resulting additional time delay gives conflicting information, with confusing and ambiguous results. John Crabbe covered the subject of off-centre stereo listening in great detail in his series of "Broadening the Stereo Seat" articles (HFN/RR, June/July/September 1979), and to avoid unnecessary complexity the use of the word "stereo" throughout this article will imply "central listener" exclusively.

So, to précis Blumlein, for a central listener the perceived position of any sound-source can be represented by a precise ratio of the voltages fed to the two (identical) loudspeakers. If the voltages are equal, then we have the "double-mono" situation where the sound should appear to come from a point halfway between the two speakers. As a ratio is a dimensionless entity, the image produced by any such voltage-ratio should not occupy any space, but should be perceived as a point-source situated somewhere on the line joining the acoustic centres of the two speakers. Ideally, ignoring room effects, there would be no reason for the position of this point, or its lack of width, to change with frequency. As long as the program has been recorded in such a fashion that positions are faithfully represented by inter-channel voltage ratios (and there lies the rub), a central listener will perceive discrete images correctly positioned all the way along the line (actually an arc centred on the listener) joining the speakers.
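One conventional model of how a voltage ratio maps to a perceived direction for a central listener is the stereophonic "sine law", sin(image angle) = (L − R)/(L + R) × sin(speaker angle). The article itself only asserts that position is fixed by the L:R ratio; the sine law and the 30° speaker layout below are assumptions for illustration:

```python
import math

# Assumed typical layout: speakers 30 degrees either side of centre.
SPEAKER_ANGLE = 30.0

def image_angle(left_v, right_v, speaker_angle=SPEAKER_ANGLE):
    """Predicted image direction in degrees (+ve = left) for the given
    loudspeaker drive voltages, using the stereophonic sine law."""
    ratio = (left_v - right_v) / (left_v + right_v)
    return math.degrees(math.asin(ratio * math.sin(math.radians(speaker_angle))))

print(image_angle(1.0, 1.0))   # equal drive: 0.0, the "double-mono" centre
print(image_angle(1.0, 0.0))   # left channel only: 30.0, at the left speaker
```

Note that the function depends only on the ratio of the two voltages, not their absolute levels, which is exactly the dimensionless property the text describes.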

Apart from a small percentage of human beings who can't be fooled by Blumlein's "amplitude for phase-differences" trick, two information channels can completely define a lateral stage, the second dimension, image depth, being provided by recorded reverberation, the brain automatically interpreting the presence of reverberation as evidence that a sound-source is further away. Whether this depth is subjectively convincing depends totally on the relationship between the recorded reverberation and the primary lateral images. Only a wavefront-sampling mike technique will preserve that relationship accurately, but more on that subject later.

Note the use of the word "completely." For a central listener, one has an absolute yardstick for assessing the quality of stereo imaging, without any reference to musical debate, the "real thing," direct/reverberant ratios, concert hall acoustics, the subjective experience, emotion quotient, or any other philosophical red herrings. Once we have our two information channels, as long as there is no crosstalk between channels (which will modify the voltage ratios), and provided the loudspeakers and their interactions with the listening room don't introduce any "widening" or "smearing" of the point images produced, then the sum of all those point images will form a continuum which accurately represents the recorded stereo image. As long as the narrow central image produced by a "double-mono" signal remains narrow and central at all frequencies, then the system must be inherently accurate as far as stereo is concerned. Any deficiencies then heard can only be related to the program. Likewise, philosophical discussions can then only apply to the manner in which the program was reduced to the two information channels, and the relationship of that program with the original live event, and not to the loudspeakers themselves.

Imaging Accuracy

Take, for instance, the argument put forward by Julian Hirsch in the October 1979 issue of Stereo Review. While agreeing that if a sound originates from a certain direction in space, then an ideal stereo recording would preserve that direction, he writes: "I do not experience this sort of definite localisation of sound when I attend a concert...I can usually tell if the source is at the right or left of the stage, or perhaps in the centre...Even when I have spotted the soloist visually, closing my eyes blurs his physical relationship to the rest of the orchestra."

Many writers have commented on this imaging problem in the concert hall, and although the degree of uncertainty varies according to the listener, it is nevertheless a real attribute of live sound. But to develop from this observation an argument that the ability of a loudspeaker to reproduce the point images discussed earlier is unnecessary, is spurious. For instance, to quote Hirsch again: "Often when I receive speakers for testing, the manufacturer emphasises the stereo-imaging qualities of his product...I cannot comment on these qualities, in most cases because I do not find their presence or absence to have much to do with how 'good' I find the speaker's sound to be. It is very easy to hear differences between speakers and many of them could probably be described as 'stereo-imaging' qualities. It is not easy to decide which, if any, of these qualities is the most accurate or realistic" (my italics). Hirsch goes on from there in another article (Stereo Review April 1980) to conclude that sonic imaging can only be a matter of individual preference.