Abstract
Autism Spectrum Conditions (ASC) are characterized by heterogeneous impairments of social reciprocity and sensory processing. Voices, similar to faces, convey socially relevant information. Whether voice processing is selectively impaired in ASC remains undetermined. This study involved recording mismatch negativity (MMN) while presenting emotionally spoken syllables dada and acoustically matched nonvocal sounds to 20 participants with ASC and 20 healthy matched controls. The participants with ASC exhibited no MMN response to emotional syllables and reduced MMN to nonvocal sounds, indicating general impairments of affective voice and acoustic discrimination. Weaker angry MMN amplitudes were associated with more autistic traits. Receiver operating characteristic (ROC) analysis revealed that angry MMN amplitudes yielded an area under the curve (AUC) of 0.88 (p<.001). The results suggest that people with ASC may process emotional voices atypically as early as the automatic, preattentive stage. This processing abnormality could facilitate the diagnosis of ASC and the prediction of social deficits in people with ASC.

Citation: Fan Y-T, Cheng Y (2014) Atypical Mismatch Negativity in Response to Emotional Voices in People with Autism Spectrum Conditions. PLoS ONE 9(7): e102471. https://doi.org/10.1371/journal.pone.0102471

Editor: Piia Susanna Astikainen, University of Jyväskylä, Finland

Received: March 24, 2014; Accepted: June 19, 2014; Published: July 18, 2014

Copyright: © 2014 Fan, Cheng. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper and its Supporting Information files.

Funding: The study was funded by the Ministry of Science and Technology (MOST 103-2401-H-010-003-MY3), National Yang-Ming University Hospital (RD2014-003), Health Department of Taipei City Government (10301-62-009), and Ministry of Education (Aim for the Top University Plan). The funders had no role in the study design, data collection and analyses, decision to publish, or preparation of the manuscript.

Competing interests: Yawei Cheng is an Editorial Board member of PLOS ONE; this does not alter the author's adherence to PLOS ONE Editorial policies and criteria.

Introduction
In Autism Spectrum Conditions (ASC), abnormalities in social skills usually coexist with atypical sensory processing and aberrant attention. Social deficits are characterized by difficulty in understanding others' mental states, including the recognition of emotional expressions through voices [1], [2]. Sensory dysfunction includes abnormalities in auditory processing, indicative of hyposensitivity or hypersensitivity to sounds [3], [4]. Aberrant attention typically manifests as a shift of orientation from social to nonsocial stimuli [5]. To comprehensively understand the pathophysiology of autism, it is necessary to determine whether voice processing is selectively impaired in people diagnosed with ASC and whether this impairment is associated with sensory dysfunction and attention abnormalities.

Previous studies have suggested that ASC causes difficulty in encoding and representing the sensory features of physically complex stimuli [6]. Such a deficit places people with autism at a disadvantage when processing social information, because affective facial and vocal expressions are multifaceted. However, certain types of complex auditory processing, such as music perception and loudness and pitch discrimination, remain intact in ASC [7], [8], [9]. Furthermore, people with ASC are considered to exhibit fragmented mental representations and a lack of causative association because of slow voluntary attention shifting [10], [11]. A highly dynamic and interactive social realm should be highly susceptible to such impairments. However, studies on social-stimulus-specific deficits resulting from ASC have neither distinguished sensory from attentional processes nor evaluated the effects of physical stimulus complexity on the associated brain responses [5], [12].

Voice communication, a part of social interaction, is critical for survival [13], [14]. During the first few weeks following birth, infants can recognize the intonational characteristics of the languages spoken by their mothers [15], [16]. Typically developing infants can discriminate affective prosodies at 5 months of age [17] and react to affective components in vocal tones by 6 months of age [18]. However, young children with ASC do not show a preference for their mother's voice over other auditory stimuli [12], [19]. Adults with ASC exhibit difficulty in extracting mental state inferences from voices [1] and prosodies [20]. In a study of adults with ASC, the superior temporal sulcus, a voice-selective region, failed to activate in response to vocal sounds, although it exhibited a normal activation pattern in response to nonvocal sounds [21]. Neurophysiological processing of emotional voices is thus atypical among people with ASC [22], [23].

Owing to their superior temporal resolution, electroencephalographic event-related brain potentials (ERPs) enable the distinct stages of sensory and attentional processing to be examined. Mismatch negativity (MMN), which is elicited by perceptibly distinct sounds (deviants) in a sequence of repetitive sounds (standards), can be used to investigate the neural representation underlying automatic central auditory perception [24], [25]. Compared with standard stimuli, deviant stimuli evoke a more pronounced response at 100 to 250 ms after stimulus onset, with maximal amplitudes over frontocentral regions [24]. The amplitude and latency of MMN indicate how effectively sound changes are discriminated from the auditory background [26], [27], [28].
Recent studies have reported that MMN can be used as an index of the salience of emotional voice processing [29], [30], [31], [32]. Previous MMN findings regarding ASC are mixed [33]. When children with ASC were exposed to pitch changes, the reported MMN responses included shortened peak latencies [34], enhanced amplitudes [35], reduced amplitudes [36], and no abnormality [11], [37], [38]. MMN was preserved when children with ASC attended to the stimuli, but decreased under unattended conditions [39]. When presented with frequency deviants in streams of synthesized vowels, children with high-functioning ASC yielded MMN amplitudes comparable with those of controls [10]. MMN was preserved in response to nonspeech sounds, but diminished in response to speech syllables [19]. When elicited by one-word utterances, MMN in response to the neutral syllable as the standard, compared with the commanding, sad, and scornful deviants, was diminished in adults with Asperger's syndrome [23], whereas MMN elicited by commanding relative to tender voices in boys with Asperger's syndrome yielded the opposite result [22]. These discrepant findings may be related to population characteristics, stimulus features, and task designs. In particular, the corresponding acoustic parameters have not been adequately controlled.

P3a, which follows MMN, is an ERP index of attentional orienting [40]. If deviants are perceptually salient, an involuntary attention switch is generated, eliciting P3a responses [10]. In a previous study, people with ASC exhibited P3a amplitudes similar to those of people with mental retardation and controls when inattentively listening to pure tones [34], [35]. Children with ASC exhibited P3a responses to nonspeech sounds comparable with those of controls [41], but diminished responses to speech sounds [10], [11], [42]. Impaired attention orienting to speech-sound changes might affect social communication [10]. Thus, ASC may entail speech-specific deficits in involuntary attention switching alongside normal orienting to nonspeech sounds.

To quantitatively control physical stimulus complexity, we presented meaningless emotionally spoken syllables, dada, and acoustically matched nonvocal sounds, representing the most and least complex stimuli, respectively, in a passive oddball paradigm to people with ASC and matched controls. We hypothesized that if general deficits in auditory processing are present, people with ASC would produce impaired MMN responses to both emotional syllables and nonvocal sounds. If the deficits are selective for voices, MMN responses would be diminished for emotional syllables but not for nonvocal sounds. If involuntary attention orienting in people with ASC is speech-sound specific, P3a would become atypical for emotional syllables but not for nonvocal sounds. In addition, to examine the relationship between electrophysiological responses and autistic traits, we conducted correlation analyses to determine the extent to which emotional MMN covaried with the Autism Spectrum Quotient (AQ), as well as receiver operating characteristic (ROC) analyses to evaluate the diagnostic utility of emotional MMN.

Materials and Methods

Participants
Twenty-two people with ASC and 21 matched controls participated in this study. Because of poor electroencephalogram (EEG) quality, such as excessive eye movements and blink artifacts, 20 people with ASC and 20 controls were included in the data analysis. The participants with ASC, aged between 18 and 29 years (21.5±3.8 y, one female participant), were recruited from a community autism program. We reconfirmed the diagnosis of Asperger's syndrome and high-functioning autism by using Diagnostic and Statistical Manual of Mental Disorders (DSM)-IV diagnostic criteria as well as the Autism Diagnostic Interview-Revised (ADI-R) [43]. The participants in the age-, gender-, intelligence quotient (IQ)-, and handedness-matched control group, aged between 18 and 29 years (22.0±3.7 y, one female participant), were recruited from the local community and screened for major psychiatric illness through structured interviews. The participants did not participate in any intervention or drug programs during the experimental period. Participants with a comorbid psychiatric or medical condition, a history of head injury, or a genetic disorder associated with autism were excluded. All of the participants exhibited normal peripheral hearing bilaterally (pure tone average thresholds <15 dB HL) at the time of testing. All of the participants or their parents provided written informed consent for this study, which was approved by the Ethics Committee of Yang-Ming University Hospital and conducted in accordance with the Declaration of Helsinki.

Auditory Stimuli
The stimulus materials were divided into two categories: emotional syllables and acoustically matched nonvocal sounds (Table S1 and Figure S1 in File S1). For the emotional syllables, a female speaker from a performing arts school produced the meaningless syllables dada with three sets of emotional (neutral, angry, happy) prosodies. Within each set, the speaker produced the syllables dada more than ten times (see [29], [30], [31], [32] for validation). The emotional syllables were edited to be equally long (550 ms) and loud (min: 57 dB; max: 62 dB; mean: 59 dB) using Sound Forge 9.0 and Cool Edit Pro 2.0. Each set was rated for emotionality on a 5-point Likert scale. Two emotional syllables that were consistently identified as ‘extremely angry’ and ‘extremely happy’ and one neutral syllable rated as the most emotionless were selected as the stimuli. The Likert-scale ratings (mean ± SD) of the angry, happy, and neutral syllables were 4.26±0.85, 4.04±0.91, and 2.47±0.87, respectively. To create a set of control stimuli that retained acoustic correspondence, we synthesized nonvocal sounds using Praat [44] and MATLAB (The MathWorks, Inc., Natick, MA, USA). The fundamental frequencies (f0) of the emotional (angry, happy, neutral) syllables were extracted, used to produce sine waveforms, and then multiplied by the syllable envelope. In this way, the nonvocal sounds retained the temporal and spectral features of the emotional syllables. All of the stimuli were controlled with respect to their length (550 ms) and loudness (min: 57 dB; max: 62 dB; mean: 59 dB).
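The synthesis step can be illustrated in a few lines of code. The following is a minimal Python sketch, not the authors' Praat/MATLAB pipeline: it assumes the f0 contour has already been extracted (e.g., with Praat) and interpolated to the audio sampling rate, and it uses the Hilbert-transform magnitude as one common way to obtain the syllable envelope.

```python
import numpy as np
from scipy.signal import hilbert

def synthesize_nonvocal(voice, f0_contour, fs=44100):
    """Build a nonvocal control sound: a sine carrier that follows the
    syllable's fundamental frequency (f0), scaled by the syllable's
    amplitude envelope so that temporal and spectral features are retained.

    voice      : 1-D array, waveform of the emotional syllable
    f0_contour : 1-D array, f0 in Hz at every sample (interpolated pitch track)
    fs         : sampling rate in Hz (illustrative assumption)
    """
    # Integrate the instantaneous frequency to obtain the carrier phase.
    phase = 2.0 * np.pi * np.cumsum(f0_contour) / fs
    carrier = np.sin(phase)
    # Amplitude envelope of the original syllable (Hilbert magnitude).
    envelope = np.abs(hilbert(voice))
    control = carrier * envelope
    # Match RMS energy so both stimulus categories are equally loud.
    control *= np.sqrt(np.mean(voice**2) / np.mean(control**2))
    return control
```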
Procedures
Before the EEG recordings were performed, each participant completed a self-administered questionnaire, the AQ, used for assessing autistic traits [45]. During the EEG recordings, the participants watched a silent movie with Chinese subtitles while task-irrelevant emotional syllables or nonvocal sounds were presented in oddball sequences. The passive oddball paradigm for emotional syllables employed happy and angry syllables as deviants and neutral syllables as standards. The corresponding nonvocal sounds were applied in the same paradigm but presented as separate blocks. Each stimulus category comprised two blocks, the order of which was counterbalanced and randomized among the participants. Each block consisted of 600 trials, of which 80% were neutral syllables or tones, 10% were angry syllables or tones, and the remaining 10% were happy syllables or tones. The sequences of blocks and stimuli were quasirandomized such that blocks of an identical stimulus category and deviant stimuli were not presented successively. The stimulus-onset asynchrony was 1200 ms, comprising a stimulus length of 550 ms and an interstimulus interval of 650 ms.

Electroencephalography Apparatus and Recordings
The EEG was continuously recorded at 32 scalp sites; please refer to the Supplementary Materials (File S1) for details. The number of accepted standard and deviant trials did not differ significantly between groups for either emotional syllables (ASC – Neutral: 750±149, Happy: 81±15, Angry: 83±11; Controls – Neutral: 746±112, Happy: 85±11, Angry: 83±13) or nonvocal sounds (745±189, 78±15, 76±17; 781±170, 78±11, 80±10). The paradigm was edited using MATLAB. Each event in the paradigm was associated with a digital code that was transmitted to the continuous EEG, enabling offline segmentation and averaging of selected EEG periods for analysis. The ERPs were processed and analyzed using Neuroscan 4.3 (Compumedics Ltd., Australia). MMN source distributions were qualitatively explored using current source density (CSD) mapping (http://psychophysiology.cpmc.columbia.edu/software/CSDtoolbox/index.html). The CSD method, a measure of the strength of the extracellular current generators underlying the recorded EEG potentials [46], computes the surface Laplacian of the scalp potentials, reflecting dipole sources oriented normally to the local skull [31], [47].

Statistical Analysis
The MMN and P3a amplitudes were analyzed as averages within a 100-ms time window surrounding the peak latency at electrode sites Fz, Cz, and Pz, in line with previous studies [31], [32], [48]. The MMN peak was defined as the highest negativity in the subtraction of the standard from the deviant sound ERPs during a period of 150 to 250 ms after sound onset. Only the standards immediately preceding the deviants were included in the analysis. The P3a peak was defined as the highest positivity during a period of 300 to 450 ms. Statistical analyses were conducted separately for each category (emotional syllables or nonvocal sounds) using a mixed ANOVA with deviant type (angry, happy) and electrode (Fz, Cz, Pz) as within-subject factors and group (ASC vs. control) as the between-subject factor, with additional a priori group by deviant type ANOVA contrasts calculated within each electrode site [49]. The dependent variables were the mean amplitudes and peak latencies of the MMN and P3a components. Cohen's d was calculated to estimate the effect size (i.e., the standardized difference between means). Degrees of freedom were corrected using the Greenhouse-Geisser method. Bonferroni post hoc testing was conducted only when significant main effects were observed.
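The quasirandomization constraint described under Procedures (deviants never presented successively) can be implemented constructively rather than by trial-and-error shuffling. The sketch below is a hypothetical Python illustration using the paper's trial counts; the gap-insertion scheme is our assumption, not the authors' MATLAB code.

```python
import random

def quasirandom_block(n_trials=600, p_deviant=0.10):
    """Build one oddball block: 80% neutral standards plus 10% angry and
    10% happy deviants, with no two deviants occurring in succession."""
    n_dev = int(n_trials * p_deviant)              # 60 angry + 60 happy
    deviants = ['angry'] * n_dev + ['happy'] * n_dev
    random.shuffle(deviants)
    n_std = n_trials - len(deviants)               # 480 standards
    # Drop each deviant into a distinct "gap" between standards, which
    # guarantees at least one standard separates any two deviants.
    gaps = sorted(random.sample(range(n_std + 1), len(deviants)))
    block, d = [], 0
    for i in range(n_std + 1):
        if d < len(deviants) and gaps[d] == i:
            block.append(deviants[d])
            d += 1
        if i < n_std:
            block.append('standard')
    return block
```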
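Likewise, the MMN measures defined under Statistical Analysis (the peak of the deviant-minus-standard difference wave between 150 and 250 ms, then the mean amplitude within a 100-ms window around that peak) translate directly into code. This Python sketch assumes averaged single-electrode ERPs and an illustrative 500-Hz sampling rate; it reconstructs the stated definitions rather than reproducing the authors' Neuroscan processing. A P3a analogue would instead search for the most positive point between 300 and 450 ms.

```python
import numpy as np

def mmn_measures(deviant_erp, standard_erp, fs=500.0, t0=-0.1,
                 search=(0.150, 0.250), win=0.100):
    """Return (peak latency in s, mean amplitude in µV) of the MMN at one
    electrode, following the definitions in the Statistical Analysis section.

    deviant_erp, standard_erp : 1-D arrays of averaged voltages (µV)
    fs : sampling rate in Hz (assumed); t0 : epoch start re: sound onset (s)
    """
    diff = deviant_erp - standard_erp                  # difference wave
    times = t0 + np.arange(diff.size) / fs
    in_window = np.flatnonzero((times >= search[0]) & (times <= search[1]))
    # MMN peak = most negative point of the difference wave in 150-250 ms.
    peak_idx = in_window[np.argmin(diff[in_window])]
    # Mean amplitude within a 100-ms window centered on the peak.
    half = int(round(win / 2 * fs))
    mean_amp = diff[max(0, peak_idx - half):peak_idx + half + 1].mean()
    return times[peak_idx], mean_amp
```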
To determine whether electrophysiological responses were associated with the severity of autistic traits, we conducted Pearson correlation analyses between MMN amplitudes and AQ scores. To examine the degree to which the MMN and P3a amplitudes could be used to differentiate between the participants with ASC and the controls, we conducted ROC analyses, which can identify optimal thresholds in diagnostic decision making.
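As an illustration of these two analyses, the following Python sketch correlates angry-MMN amplitudes with AQ totals and runs an ROC analysis on the amplitudes. The use of scipy and scikit-learn, and of Youden's J statistic as the rule for picking the "optimal threshold," are our assumptions; the paper does not specify its statistical software.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import roc_curve, roc_auc_score

def trait_and_diagnostic_stats(angry_mmn_amp, aq_total, is_asc):
    """Relate angry-MMN mean amplitudes (µV) to autistic traits and
    evaluate their diagnostic utility.

    angry_mmn_amp : per-participant values; MMN is negative-going, so
                    weaker responses are *less* negative (larger values)
    aq_total      : AQ total scores
    is_asc        : 1 for participants with ASC, 0 for controls
    """
    r, p = pearsonr(angry_mmn_amp, aq_total)       # trait correlation
    # Less-negative (weaker) MMN is expected in ASC, so the raw
    # amplitude itself serves as the classification score.
    auc = roc_auc_score(is_asc, angry_mmn_amp)
    fpr, tpr, thresholds = roc_curve(is_asc, angry_mmn_amp)
    best = np.argmax(tpr - fpr)                    # Youden's J
    return {'pearson_r': r, 'p_value': p, 'auc': auc,
            'threshold_uV': thresholds[best],
            'sensitivity': tpr[best], 'specificity': 1.0 - fpr[best]}
```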

Discussion
This study investigated whether people with ASC exhibit selective deficits during emotional voice processing. The results indicated that people with ASC failed to differentiate between angry MMN and happy MMN. By contrast, in response to acoustically matched nonvocal sounds, people with ASC showed only weak differentiation between angry-derived and happy-derived MMN. P3a specific to emotional voices was reduced in people with ASC, indicating atypical involuntary attention switching. The significant correlation between the MMN amplitudes elicited by angry syllables and the total scores on the AQ indicated that angry MMN amplitudes were associated with autistic traits. ROC analyses revealed that angry MMN amplitudes yielded an AUC value of 0.88 (p<.001) for diagnosing ASC.

People with ASC failed to exhibit a negativity bias in their responses to emotional voices. In a previous study involving the same paradigm, we determined that the negativity bias to affective voices emerges early in life [30]. Angry prosodies elicited a more negative-going ERP and stronger activation in the temporal voice area than did happy or neutral prosodies among infants [50]. Angry and fearful syllables evoked greater MMN than did happy or neutral syllables among adults and infants [30], [51]. A recent visual MMN study determined that an early difference occurred during 70 to 120 ms after stimulus onset for only fearful deviants under unattended conditions [52]. From an evolutionary perspective, threat-related emotion processing (e.g., anger and fear) is particularly strong and largely independent of attention [53]. The negativity bias in affective processing occurs as early as evaluative categorization into valence classes does [54]. In this study, the stronger amplitudes observed in angry MMN compared with happy MMN among the controls were absent among the people with ASC.

The human voice not only contains speech information but also carries a speaker's identity and emotional state [55]. One MMN study determined that MMN amplitudes were higher in response to intensity changes in vocal sounds than in response to intensity changes in corresponding nonvocal sounds: although vocal intensity deviants may call for sensory and attentional resources regardless of whether they are loud or soft, comparable resources are recruited for nonvocal intensity deviants only when they are loud [56]. Thus, emotional syllables are considered more complex than nonvocal sounds, conveying information beyond low-level acoustic features [29], [30], [31], [32]. Because emotional MMN, rather than MMN to the corresponding nonvocal sounds, exhibited a correlation with autistic traits and a positive predictive value for ASC, we speculate that the social impairments in people with ASC cannot be ascribed completely to low-level sensory deficits.

In addition to lacking differentiation between angry and happy MMN, people with ASC exhibited reduced MMN in response to nonvocal sounds. The discrepancy between the results of this study and those of previous reports may reflect the heterogeneous characteristics of clinical participants, auditory stimuli, and task designs [11], [34], [35]. For example, people with low-functioning autism might exhibit different MMN from those with high-functioning autism [35]. In one MMN study, basic acoustic features in the stimuli, specifically emotionally neutral standards and emotion-laden deviants, were not controlled [23].
Furthermore, using one-word utterances or vowels as the auditory stimuli might introduce variable familiarity or meaning, thus exerting potentially confounding effects on MMN responses [10], [22].

Involuntary attention orienting to emotional voices was atypical in people with ASC, as indicated by diminished P3a amplitudes to angry syllables. P3a reflects the involuntary capture of attention by salient environmental events [57]. In a previous study, vowels, compared with corresponding nonvocal sounds, produced stronger P3a [10]. The attention-eliciting effect may be particularly pronounced when threat-related social information is involved [58]. We detected P3a only for emotional syllables, not for acoustically matched nonvocal sounds. Consistent with the results of previous studies [10], [59], [60], [61], our results indicated weaker P3a to emotional syllables among people with ASC compared with controls, suggesting that attention orienting in people with ASC is more selectively impaired for social stimuli than for physical stimuli.

Consistent with previous MMN studies [31], [62], our explorative CSD analyses suggested that the major contribution to the deviant-standard difference responses comes from the bilateral auditory cortex. Furthermore, a slight trend toward posterior enhancement observed in ASC for angry and angry-derived deviants could reflect an additional posterior temporal source. The posterior lateral nonprimary auditory cortex could be sensitive to emotional voices, as indicated by functional neuroimaging [63]. However, given the known inaccuracies of EEG source localization, these CSD findings need to be confirmed with more accurate source localization approaches.

ROC analyses revealed that the amplitudes of angry MMN yielded a sensitivity of 95% and a specificity of 50% for diagnosing ASC. Strong amplitudes of angry MMN were coupled with low total scores on the AQ when the ASC and control groups were combined. MMN changes can be reliably observed in people with autism [34], [64]. The AQ is a valuable instrument for rapidly determining where any given person is situated on the continuum from autism to normality [45]. AQ scores have been determined to be associated with the ability to recognize the mental states of others from voices and eyes [65]. Thus, emotional MMN, particularly in response to angry syllables, is potentially useful as a neural marker for diagnosing autism.

Two limitations of this study must be acknowledged. First, regarding sample homogeneity, the generalizability of the results may be limited because people with low-functioning autism were not included. Second, although the nonvocal stimuli were pure tones spectrally matched to the fundamental frequency envelope of the emotional syllables [29], [30], [31], [32], the stimuli did not quantitatively parameterize physical stimulus complexity, which may limit the selectivity of emotional MMN. This may not be the optimal design, and future studies that recruit people with severe autism and include larger sample sizes and stimuli with greater acoustic correspondence are warranted.

Conclusions
This study revealed that ASC involves general impairments in affective voice discrimination as well as in low-level acoustic discrimination. In addition to exhibiting reduced MMN amplitudes in response to acoustically matched nonvocal sounds, people with ASC failed to differentiate between angry and happy syllables. Weak amplitudes of angry MMN were coupled with more severe autistic traits. The ROC analysis revealed that the amplitude of angry MMN is suitable for predicting whether a person has a clinical diagnosis of ASC. The ability to determine the likelihood of an infant developing autism by using simple neurobiological measures would constitute a critical scientific breakthrough [66]. Considering the advantages of such measures for clinical population assessment [67] and the presence of an emotional mismatch response in the human neonatal brain [30], future studies must examine the ability of emotional MMN to facilitate the early diagnosis of infants at risk for ASC.

Supporting Information
File S1. Electroencephalography apparatus and recordings, Figure S1, and Tables S1–S3.
Figure S1. Acoustic properties of the stimulus materials.
Table S1. Physical and acoustic properties of the stimuli.
Table S2. Mean amplitudes and peak latencies of MMN to emotional syllables and nonvocal sounds within a time window of 150 to 250 ms at predefined electrodes in each group (Mean ± SEM).
Table S3. Mean amplitudes of P3a to emotional syllables within a time window of 300 to 450 ms at predefined electrodes in each group (Mean ± SEM).
https://doi.org/10.1371/journal.pone.0102471.s001 (DOC)

Acknowledgments
The authors deeply thank the participants and their parents for taking part in the study.

Author Contributions Conceived and designed the experiments: YC YTF. Performed the experiments: YTF. Analyzed the data: YTF YC. Contributed reagents/materials/analysis tools: YC. Contributed to the writing of the manuscript: YTF YC.