7.3.1.1. Hemispheric Differences

It was apparent from the earliest PET studies of speech perception that when subjects heard speech, relative to a baseline of silence, activity was almost equally distributed between the left and right superior temporal gyri (e.g., Petersen et al., 1988; Wise et al., 1991). This absence of asymmetry indicated that a crude subtractive methodology was not going to replicate the asymmetry evident from clinical observations, namely that impaired comprehension after aphasic stroke was a consequence of left-hemisphere lesions, particularly those centered on the lateral (Sylvian) sulcus (Caplan, 1987). Yet it was hardly surprising that the early PET studies did not demonstrate anything “special” about the response to heard words of the left superior temporal gyrus (STG), which contains unimodal primary and association auditory cortex. Spoken language is the most complex sound that we routinely encounter: across the range of spectral and temporal detail conveyed by speech, we can detect phonemes, syllables, stress, and variations in amplitude and pitch. These carry verbal information, most obviously in the form of phonetic cues and features, but also non-linguistic information that both supports comprehension of the verbal message and allows the listener to deduce the affect, sex, age, and individual identity of the speaker. Further, the categorical perception of a sequence of sounds as a word, whether the “perceptual unit” is the phoneme or the syllable, is remarkably robust, and we can tolerate considerable distortion of speech before it becomes totally incomprehensible. This redundancy in the speech signal suggests that many separate cues and features are processed in parallel, and perception and comprehension are further assisted by top-down processing: we hear next what we expect to hear, given the sense of what has gone before.
Moreover, evidence from neurological cases suggests that although pure word deafness is often observed only after bilateral superior temporal lesions, left-sided lesions alone can impair speech perception, whereas purely right-sided lesions do not (Griffiths et al., 1999).

The hypothesis, therefore, was that a more refined study design would show with functional imaging that left-hemisphere activity predominates over right-hemisphere activity during speech perception and comprehension. As Scott and Johnsrude (2003) emphasized, the choice of baseline condition is critical. Several PET studies have used a variety of non-linguistic acoustic stimuli as the baseline condition: for example, pure tones (Demonet et al., 1992); signal-correlated noise, that is, the time-amplitude envelopes of speech filled with white noise, which retains some temporal but no spectral information (e.g., Zatorre et al., 1992; Mummery et al., 1999); reversed speech, that is, speech played backwards (Crinion et al., 2003); and spectrally rotated speech (Scott et al., 2000). The advantages and disadvantages of these baseline stimuli are reviewed by Scott and Wise (2004). One problem with non-linguistic baseline stimuli is that, even when they closely match speech in acoustic complexity, they invariably distort or abolish affective prosody and information about the speaker. A contrast of speech against one of these baseline stimuli will therefore include responses to both the verbal and the non-verbal information carried by the speech signal. The review also discusses the use of unfamiliar foreign languages. Because these retain prosodic and speaker information, they might appear to be the best unintelligible baseline against which to contrast intelligible native speech; nevertheless, they introduce the confounds of unfamiliar phonemes and different rules for combining phonemes. For example, Japanese word structure is predominantly CVCV (C, consonant; V, vowel), whereas English allows a CCCVCCC structure. What influence these confounds have on the activity observed in a functional imaging study is largely unknown. Given such subtle considerations, it is easy to see why fMRI, with the considerable acoustic noise generated by the scanner, might compound these problems.
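Two of the baseline manipulations listed above can be sketched directly on a sampled waveform. The Python below is a minimal illustration under stated assumptions, not the procedure used in the cited studies, whose filtering and rotation parameters are not reproduced here. Signal-correlated noise is produced by randomly inverting the sign of each sample, which is equivalent to multiplying the waveform by a white sequence of +1 and -1 values: every sample keeps its magnitude, so the envelope survives while the spectrum is flattened. Spectral rotation assumes speech that has already been low-pass filtered below 4 kHz and sampled at a hypothetical 8 kHz, so that negating every other sample mirrors the spectrum about 2 kHz. The function names are hypothetical.

```python
import math
import random


def signal_correlated_noise(samples, seed=None):
    """Randomly invert the sign of each sample.

    abs(output[n]) == abs(input[n]) for every n, so the time-amplitude
    envelope is kept exactly, while the independent random signs make the
    spectrum flat (white), removing spectral detail.
    """
    rng = random.Random(seed)
    return [x if rng.random() < 0.5 else -x for x in samples]


def spectrally_rotate(samples):
    """Mirror the spectrum of a real signal about a quarter of its sampling rate.

    Multiplying sample n by (-1)**n is modulation by a carrier at fs/2, which
    moves a component at frequency f (0 <= f <= fs/2) to fs/2 - f. At an
    assumed fs = 8 kHz, speech band-limited below 4 kHz is rotated about 2 kHz.
    """
    return [-x if n % 2 else x for n, x in enumerate(samples)]


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate (Hz)
    tone_500 = [math.cos(2 * math.pi * 500 * n / fs) for n in range(800)]
    rotated = spectrally_rotate(tone_500)
    # After rotation, a 500 Hz tone should coincide with a 3500 Hz tone.
    tone_3500 = [math.cos(2 * math.pi * 3500 * n / fs) for n in range(800)]
    print(max(abs(r - t) for r, t in zip(rotated, tone_3500)))
    scn = signal_correlated_noise(tone_500, seed=1)
    print(all(abs(s) == abs(x) for s, x in zip(scn, tone_500)))
```

The 500 Hz to 3500 Hz check shows the rotation directly: low frequencies become high and vice versa, so the acoustic complexity of the input is retained while its spectral profile, and hence its intelligibility, is scrambled.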

Nevertheless, left-lateralized responses during speech perception and comprehension have been increasingly observed. One of the first PET studies to demonstrate clear lateralization used intelligible and unintelligible sentences in a 2 × 2 factorial design (Scott et al., 2000). Sentences presented as clear speech were acoustically matched with the same sentences after spectral rotation (inversion), which renders them unintelligible. A further set of sentences was distorted by a technique known as noise-vocoding (Shannon et al., 1995), whereby temporal information is largely preserved but spectral information is reduced to a few broad frequency bands (six in this study). Perceptually, this distorted speech, which simulates the acoustic information reaching the auditory nerve through a cochlear implant, sounds like a harsh whisper, and it becomes intelligible after a brief period of familiarization. The “matched” baseline stimuli for the noise-vocoded sentences were made by spectral inversion. The data demonstrated that the left STG responded equally to clear speech, rotated speech, and noise-vocoded speech, each relative to rotated noise-vocoded speech. This was interpreted as a response to phonetic cues and features, present in both versions of the intelligible sentences and also in the unintelligible rotated speech, but absent from the rotated noise-vocoded sentences. Intelligibility, confined to the clear and noise-vocoded speech, activated a left anterior region centered on the superior temporal sulcus. Across contrasts, the main response of the right temporal lobe was to clear speech and its spectrally rotated version, stimuli that carry a strong sense of pitch and intonation. This study therefore demonstrated a left–right asymmetry in the responses to speech and to stimuli derived from speech. It also demonstrated a rostral–caudal asymmetry, with intelligibility activating the anterolateral left temporal cortex.
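Noise-vocoding can be sketched in a few steps: split the signal into frequency bands, extract each band's amplitude envelope, use that envelope to modulate noise, confine the modulated noise to the same band, and sum the bands. The pure-standard-library Python below is a minimal sketch under assumptions of our own. The second-order band-pass filters (standard audio-cookbook biquads), the log-spaced band edges between 100 Hz and 3.8 kHz, the 30 Hz envelope smoothing, and the function names (`noise_vocode`, `bandpass_biquad`, `apply_biquad`, `envelope`) are illustrative choices, not the processing parameters of Shannon et al. (1995) or Scott et al. (2000). Only the default of six bands follows the study described above.

```python
import math
import random


def bandpass_biquad(f_center, q, fs):
    """Band-pass biquad coefficients (0 dB peak gain), normalized by a0."""
    w0 = 2 * math.pi * f_center / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def apply_biquad(signal, b, a):
    """Filter a list of samples with a direct-form I biquad."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def envelope(signal, cutoff_hz, fs):
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    k = 1 - math.exp(-2 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for x in signal:
        state += k * (abs(x) - state)
        env.append(state)
    return env


def noise_vocode(speech, fs, n_bands=6, f_lo=100.0, f_hi=3800.0,
                 env_cutoff=30.0, seed=0):
    """Keep each band's temporal envelope but replace its fine structure with noise."""
    if not speech:
        return []
    if f_hi >= fs / 2:
        raise ValueError("f_hi must lie below the Nyquist frequency")
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in speech]  # white-noise carrier
    # Log-spaced band edges: n_bands + 1 edges from f_lo to f_hi.
    edges = [f_lo * (f_hi / f_lo) ** (i / n_bands) for i in range(n_bands + 1)]
    output = [0.0] * len(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        center = math.sqrt(lo * hi)  # geometric band center
        b, a = bandpass_biquad(center, center / (hi - lo), fs)
        env = envelope(apply_biquad(speech, b, a), env_cutoff, fs)
        # Modulate broadband noise by the envelope, then confine it to the band.
        carrier = apply_biquad([e * z for e, z in zip(env, noise)], b, a)
        output = [o + c for o, c in zip(output, carrier)]
    # Match the overall RMS level of the input (silence stays silent).
    rms_in = math.sqrt(sum(x * x for x in speech) / len(speech))
    rms_out = math.sqrt(sum(y * y for y in output) / len(output))
    if rms_out > 0:
        output = [y * rms_in / rms_out for y in output]
    return output


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate (Hz)
    # Hypothetical test signal: a 1 kHz tone amplitude-modulated at 4 Hz.
    test = [math.sin(2 * math.pi * 1000 * n / fs)
            * (1 + 0.5 * math.sin(2 * math.pi * 4 * n / fs)) for n in range(fs)]
    vocoded = noise_vocode(test, fs)
    print(len(vocoded))
```

Stimulus-generation software would typically use steeper band-pass filters than these second-order sections; the biquads are used here only to keep the sketch short and dependency-free.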