Participants

Sixteen young adults aged between 21 and 34 years old (M=26.2±4.7, 8 females) and 16 older adults aged between 65 and 75 years old (M=70.4±3.5, 9 females) participated in the study. All participants gave written informed consent. The study was approved by the University of Toronto and Baycrest Hospital Human Subject Review Committee. All participants were native English speakers and right-handed. Pure-tone hearing levels for both groups of participants are shown in Fig. 1a. Data from all participants entered analyses.

Stimuli and task

The stimuli were four naturally produced English consonant-vowel syllables (/ba/, /ma/, /da/ and /ta/), spoken by a female talker (standardized UCLA version of the Nonsense Syllable Test; ref. 39). Each syllable token was 500 ms in duration and matched in terms of average root-mean-square sound pressure level. The vowel was always /a/ (as in father) because its formant structure provides a superior SNR relative to the MRI scanner spectrum. The four phonemes were chosen for their balanced features on place of articulation (labial /b/ and /m/ versus alveolar /d/ and /t/). A 500-ms white-noise segment (4-kHz low-pass cutoff, 10-ms rise-decay envelope) starting and ending simultaneously with the syllables was used as the masker. Sounds were presented via circumaural MRI-compatible headphones (HP SI01, MR Confon, Magdeburg, Germany), acoustically padded to suppress scanner noise by 25 dB. The intensity level of the syllables was fixed at 85 dB, and the noise level was set to 97, 94, 91, 87, 77 or 0 dB, yielding five levels of SNR (−12, −9, −6, −2 and 8 dB) and the NoNoise condition. SNR was thus inversely related to the overall sound level. The SNR levels were chosen on the basis of a pilot behavioural study with five young adults, which revealed a quasi-linear relationship between SNR and accuracy for all four syllables.
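The relation between the fixed syllable level, the masker levels and the resulting SNRs can be checked with a short sketch. The levels are those reported above; the variable and function names are illustrative only.

```python
# Levels reported in the text (dB); the 0 dB masker is the NoNoise condition
# and is therefore left out of the SNR computation.
SIGNAL_DB = 85
NOISE_DB = [97, 94, 91, 87, 77]


def snr_db(signal_db, noise_db):
    """SNR in dB is the signal level minus the noise level."""
    return signal_db - noise_db


snr_levels = [snr_db(SIGNAL_DB, noise) for noise in NOISE_DB]
print(snr_levels)  # [-12, -9, -6, -2, 8]
```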

Before scanning, syllables were presented individually without noise (four trials per syllable), and participants identified them by pressing one of four keys on a parallel four-button pad with the fingers of their right hand (index to little finger for /ba/, /da/, /ma/ and /ta/, respectively), reaching an accuracy of 94% or better. During scanning, 80 noise-masked syllables (four trials per syllable per SNR) and 20 syllables alone (five trials per syllable) were presented in random order in each block with an average inter-stimulus interval of 4 s (2–6 s, 0.5-s steps), and five blocks were given in total. Participants were asked to listen carefully and identify the syllables as fast as possible by pressing the corresponding keys on the four-button pad with their right-hand fingers, as trained outside the scanner. Finger-syllable associations were not counterbalanced across participants.
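The composition of one block can be sketched as follows. The trial counts and the 2–6 s interval range come from the text; drawing the intervals uniformly from the nine possible values (which gives an expected mean of 4 s) is an assumption, as are all names used here.

```python
import random

SYLLABLES = ["ba", "da", "ma", "ta"]
SNRS_DB = [-12, -9, -6, -2, 8]
# Possible inter-stimulus intervals: 2.0, 2.5, ..., 6.0 s (mean 4.0 s).
ISI_CHOICES = [2.0 + 0.5 * k for k in range(9)]


def make_block(rng):
    """Return one shuffled block of (syllable, condition, isi_seconds) trials."""
    # Four trials per syllable per SNR: 4 x 5 x 4 = 80 noise-masked trials.
    trials = [(s, snr) for s in SYLLABLES for snr in SNRS_DB for _ in range(4)]
    # Five trials per syllable without noise: 4 x 5 = 20 trials.
    trials += [(s, "NoNoise") for s in SYLLABLES for _ in range(5)]
    rng.shuffle(trials)
    # Assumption: each interval is drawn uniformly from ISI_CHOICES.
    return [(s, cond, rng.choice(ISI_CHOICES)) for s, cond in trials]


block = make_block(random.Random(0))
print(len(block))  # 100
```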

Behavioural data analysis

Both the percentage of trials correctly identified and RT (using both correct and incorrect trials) were computed for each syllable at each noise condition. To reduce the influence of the restricted range of percent accuracy (0 to 100%), statistics were applied to the percent accuracy after arcsine transformation (ref. 22)

where y is the arcsine-transformed accuracy and x is the percent accuracy.
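For illustration, a minimal sketch of an arcsine transformation is given below. It assumes the common form y = arcsin(√(x/100)) with x in percent; other variants (for example, with a leading factor of 2) exist, and the exact form used in this study is not confirmed by the text. The function name is illustrative.

```python
import math


def arcsine_transform(percent_correct):
    """Arcsine-square-root transform of a percentage (assumed variant)."""
    proportion = percent_correct / 100.0
    return math.asin(math.sqrt(proportion))


# The transform stretches values near the floor and ceiling of the scale,
# so equal steps in percent accuracy are no longer compressed at 0 and 100%.
for x in (0, 50, 75, 100):
    print(x, round(arcsine_transform(x), 3))
```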

Arcsine-transformed accuracy and RT across syllables were then each subjected to a mixed ANOVA with age as the between-subject factor and SNR as the within-subject factor. Older adults’ overall accuracies and mean pure-tone thresholds were additionally subjected to a Pearson’s correlation to reveal the relationship between peripheral hearing level and performance.

MRI acquisition and data pre-processing

Participants were scanned using a Siemens Trio 3T magnet with a standard 12-channel ‘matrix’ head coil. T2*-weighted functional images were collected with a continuous echo-planar imaging sequence (30 slices, matrix size=64 × 64, 5-mm thick, TR=2,000 ms, TE=30 ms, flip angle=70°, FOV=200 mm, voxel size=3.125 × 3.125 × 5 mm). High-resolution T1-weighted anatomical images were acquired after three functional runs using SPGR (axial orientation, 160 slices, 1-mm thick, TR=2,000 ms, TE=2.6 ms, FOV=256 mm).

The fMRI data were pre-processed using Analysis of Functional Neuroimages software (AFNI 2011; ref. 40). In the pre-processing stage, fMRI data were spatially co-registered to correct for head motion using a 3D Fourier transform interpolation. For each run, images acquired at each point in the time-series were aligned volumetrically to a reference image acquired during the scanning session using the 3dvolreg plugin in AFNI. The pre-processed images were then concatenated and analysed using a univariate General Linear Model (GLM) and MVPA.

GLM analysis

Single-subject multiple-regression modelling was performed using the AFNI program 3dDeconvolve. Data were fit with different regressors for the four syllables and six SNRs. The predicted activation time course was modelled as a ‘gamma’ function convolved with the canonical hemodynamic response function. For each noise level, the four syllables were grouped and contrasted against the baseline (silent inter-trial intervals), as the GLM revealed similar activity across syllables. Individual contrast maps were normalized to Talairach stereotaxic space, re-sampled (voxel size=3 × 3 × 3 mm), and spatially smoothed using a Gaussian filter (FWHM=6.0 mm).

Individual maps at each noise level were then subjected to separate mixed ANOVAs with age as the between-subject factor to test the random effects for each group as well as the age difference in BOLD activity at each SNR. Since the accuracy at −6 (75.4±2.8%) and −2 dB (88.4±2.3%) SNRs in young adults matched the accuracy at −2 (75.2±3.1%) and 8 dB (87.6±2.4%) SNRs in older adults, respectively, the mean activity at −6 and −2 dB SNRs in young adults and the mean activity at −2 and 8 dB SNRs in older adults were subjected to an additional mixed ANOVA to reveal the age difference in BOLD activity under equal performance. To correct for multiple comparisons, a cluster spatial extent threshold was applied using AlphaSim with 1,000 Monte Carlo simulations and contrast-specific smoothness of residual errors. For activity at the NoNoise condition in both groups, this procedure yielded a family-wise error-corrected P (PFWE)<0.01 with an uncorrected P<0.001 and removal of clusters smaller than 15 voxels. For group difference maps, it yielded PFWE<0.01 with an uncorrected P<0.01 and a cluster size ≥16 voxels for the NoNoise condition and ≥35 voxels for the equal performance condition. To display statistics at the group level, the statistic of interest was projected onto a cortical inflated surface template using surface mapping with AFNI (SUMA).

Four 8-mm radius spherical ROIs in the left POp (−50, 14, 18), left preCG/postCG (−43, −16, 45), left STG/MTG (−51, −20, −6) and right STG/MTG (50, −14, −4) were centred at the peak voxels showing a significant age difference in activity under equal performance (PFWE<0.01). The preCG/postCG ROI occupied part of both areas, as did each STG/MTG ROI. To reveal the relationship between activity in those ROIs and performance under noise masking, individuals’ mean activities across the −12 to 8 dB SNRs in each ROI and mean accuracies across syllables and SNRs (−12 to 8 dB) were subjected to a Pearson’s correlation for each group separately. Multiple comparisons were corrected at an FDR of q=0.05 using the Benjamini–Hochberg procedure (ref. 41). For each ROI, the correlation coefficients from the two groups were also converted into z-scores using Fisher’s r-to-z transformation (ref. 42) and compared using the following formula (ref. 22):

where z1 and z2 are the Fisher’s z-scores of each group’s correlation and n is the sample size of each group. This test yielded a z-value indicating whether the difference between the two correlation coefficients was statistically significant.
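A minimal sketch of such a comparison is given below, assuming the standard test for two independent correlations, z = (z1 − z2)/√(1/(n1 − 3) + 1/(n2 − 3)), with a two-sided P value from the normal distribution. The function names and the example correlations are illustrative and are not results of this study.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation: z = atanh(r)."""
    return math.atanh(r)


def compare_correlations(r1, r2, n1, n2):
    """Return (z, two-sided p) for the difference between two independent r values."""
    z1, z2 = fisher_z(r1), fisher_z(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided p from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical correlations for two groups of 16 participants each.
z_value, p_value = compare_correlations(0.6, 0.1, 16, 16)
print(round(z_value, 3), round(p_value, 3))
```

With 16 participants per group the standard error is √(2/13), so only large differences between correlations reach significance.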

MVPA

Given the likelihood of high inter-subject anatomical variability and fine spatial scale of phoneme representations, we trained pattern classifiers to discriminate neural patterns associated with different phonemes and then tested these classifiers on independent test trials within anatomically defined ROIs. To do so, we first used the AFNI program 3dLSS (Least Square Sum regression43) to estimate univariate trial-wise β-coefficients for all brain voxels from the concatenated data.

We then used the automatic anatomical labelling algorithm (‘aparc2009’; ref. 45) of Freesurfer (version 5.3; ref. 44) to define a set of 148 cortical and subcortical ROIs from each individual’s high-resolution anatomical scan. For each noise level, MVPA was carried out within each anatomical ROI using shrinkage discriminant analysis (ref. 23) as implemented in the R package ‘sda’. Shrinkage discriminant analysis is a form of linear discriminant analysis that estimates shrinkage parameters for the variance-covariance matrix of the data, making it suitable for high-dimensional classification problems. To evaluate classifier performance, we used five-fold cross-validation in which each training fold consisted of the β-regression weights of four of the five runs, with the remaining run held out for testing. The shrinkage discriminant classifier produces both a categorical prediction (that is, the label of the test case) and a continuous probabilistic output (the posterior probability that the test case is of label x). The continuous outputs were used to compute the AUC metrics, and the AUC scores were used as an index of classification performance because they are robust to class imbalances and are better able to incorporate the relationship between probabilistic classifier output and discrete category membership. Because the experiment had four phoneme categories, we used a multiclass AUC measure that was computed as the average of all the pairwise two-class AUC scores.
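The multiclass AUC can be illustrated with a short sketch. It assumes that each pairwise score is computed in the manner of Hand and Till, averaging the two directions of a class pair; whether the analysis used exactly this variant is not stated in the text. The function names and the toy posteriors are hypothetical.

```python
from itertools import combinations


def binary_auc(pos_scores, neg_scores):
    """Probability that a positive case outscores a negative case (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def multiclass_auc(labels, posteriors):
    """Average pairwise AUC.

    labels: true class of each test trial.
    posteriors: one dict per trial mapping class name to posterior probability.
    """
    classes = sorted(set(labels))
    pair_aucs = []
    for a, b in combinations(classes, 2):
        rows_a = [p for y, p in zip(labels, posteriors) if y == a]
        rows_b = [p for y, p in zip(labels, posteriors) if y == b]
        # Assumption: average the AUC for "a versus b" and "b versus a".
        auc_a = binary_auc([p[a] for p in rows_a], [p[a] for p in rows_b])
        auc_b = binary_auc([p[b] for p in rows_b], [p[b] for p in rows_a])
        pair_aucs.append((auc_a + auc_b) / 2.0)
    return sum(pair_aucs) / len(pair_aucs)


PHONEMES = ["ba", "da", "ma", "ta"]
# Toy posteriors: an uninformative classifier assigns 0.25 to every class.
chance_posteriors = [{c: 0.25 for c in PHONEMES} for _ in PHONEMES]
print(multiclass_auc(PHONEMES, chance_posteriors))  # 0.5
```

With four phoneme categories there are six class pairs, and an uninformative classifier yields the chance level of 0.5 regardless of how many trials each class contributes.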

We then limited the statistical analyses to ROIs known to be sensitive to tasks involving the production and perception of speech. We used Neurosynth (ref. 46) to create a meta-analytic mask using the search term ‘speech’. This resulted in a coordinate-based activation mask constructed from 424 studies and encompassing the language-related areas in the temporal and frontal lobes. We intersected this meta-analytic mask with the Freesurfer aparc 2009 ROI mask as defined in MNI space. If any of the intersected ROIs had ≥10 voxels, we included that ROI in our analyses. To ensure hemispheric symmetry, if a left hemisphere ROI was included, so was its right hemisphere homologue. This resulted in an ROI mask consisting of 38 ROIs (19 left and 19 right, Fig. 3).
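The selection rule can be sketched with plain sets of voxel coordinates. The 10-voxel criterion is taken from the text; the ‘lh.’/‘rh.’ naming scheme, the function names and the application of the symmetry rule in both directions are assumptions made for illustration.

```python
MIN_VOXELS = 10  # overlap criterion stated in the text


def homologue(name):
    """Swap the hemisphere prefix of an ROI name (hypothetical 'lh.'/'rh.' scheme)."""
    prefix, region = name.split(".", 1)
    return ("rh." if prefix == "lh" else "lh.") + region


def select_rois(roi_voxels, mask_voxels):
    """Keep ROIs overlapping the mask by >= MIN_VOXELS, plus their homologues.

    roi_voxels: dict mapping ROI name to a set of voxel coordinates.
    mask_voxels: set of voxel coordinates in the meta-analytic mask.
    """
    kept = {name for name, voxels in roi_voxels.items()
            if len(voxels & mask_voxels) >= MIN_VOXELS}
    # Assumption: symmetry is enforced in both directions.
    kept |= {homologue(name) for name in kept if homologue(name) in roi_voxels}
    return sorted(kept)


# Toy example: only the left STG overlaps the mask sufficiently.
mask = {(x, 0, 0) for x in range(12)}
rois = {
    "lh.STG": {(x, 0, 0) for x in range(12)},
    "rh.STG": {(x, 0, 0) for x in range(3)},
    "lh.IPS": {(x, 0, 0) for x in range(5)},
    "rh.IPS": {(x, 0, 0) for x in range(5)},
}
print(select_rois(rois, mask))  # ['lh.STG', 'rh.STG']
```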

Because MVPA was performed in anatomically defined ROIs specific to each participant, no spatial normalization was applied. Since the AUC score did not differ across phonemes in selected ROIs (POp, HG, STG and PT in the left hemisphere; all F(3,45)<2.46, P>0.075, repeated-measures ANOVAs), significance of classification at the group level in each of the 38 ROIs at each noise level was evaluated by a one-sample t-test on individuals’ phoneme-averaged AUC scores, where the null hypothesis assumed a theoretical chance AUC of 0.5. The effect size was estimated using Cohen’s d (ref. 47). Multiple comparisons were corrected at an FDR of q=0.05 using the Benjamini–Hochberg procedure (ref. 41). The AUC scores were also subjected to a mixed ANOVA with ROI as the within-subject factor, group as the between-subject factor and SNR as the covariate to evaluate the group difference in classification. To display statistics at the group level, the statistic of interest was projected onto the parcellated (aparc 2009; ref. 45) cortical inflated map associated with the Freesurfer average template (‘fsaverage’) using SUMA.
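The group-level test against chance can be sketched as follows. The chance level of 0.5 is taken from the text; the function name and the AUC values are hypothetical, and the sketch returns only the t statistic, the one-sample Cohen’s d and the degrees of freedom, since the P value requires a t distribution that is not part of the Python standard library.

```python
import math
import statistics

CHANCE_AUC = 0.5  # theoretical chance level under the null hypothesis


def one_sample_t(values, mu=CHANCE_AUC):
    """Return (t, cohens_d, degrees_of_freedom) for a one-sample t-test against mu."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu) / (sd / math.sqrt(n))
    d = (mean - mu) / sd  # one-sample Cohen's d
    return t, d, n - 1


# Hypothetical phoneme-averaged AUC scores for a few participants in one ROI.
auc_scores = [0.55, 0.52, 0.58, 0.49, 0.56, 0.53]
t_value, cohens_d, df = one_sample_t(auc_scores)
print(round(t_value, 2), round(cohens_d, 2), df)
```

For a one-sample design the two statistics are related by t = d × √n, which makes the effect size easy to recover from reported t values and group sizes.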

MVPA was performed within anatomical ROIs rather than a moving ‘searchlight’ (ref. 48) because we wished to preserve borders between spatially adjacent areas (for example, IFG and STG) that exhibited differential phoneme specificity under noisy conditions (ref. 14). This approach would also improve classification sensitivity for certain regions (for example, STG) that showed distributed phonological representations (ref. 49). For the left preCG, regional MVPA may not be optimal for disentangling speech- and response-related activities in the articulatory and hand areas of the left premotor/motor cortex, respectively. Although the classifiers were trained to discriminate speech-related rather than response-related activities, the classification may capture button/finger decoding in addition to phoneme category decoding in the left preCG. Indeed, the classification on responses using all the incorrect trials across SNRs was significant in the left preCG (t(15)=3.5, P=0.003, one-sample t-test, Supplementary Fig. 1), suggesting reliable button/finger decoding in the left preCG. Also, the classification performance on stimuli and/or responses using all the correct trials was higher than the classification on responses using all the incorrect trials in the left preCG, although the difference was not significant (t(15)=1.447, P=0.168, paired t-test). This supports the possibility of a button/finger decoding component in stimulus-based classification in the left preCG. Note that we do not emphasize the classification performance in the left preCG in our study, and the contamination of phoneme classification by button/finger decoding was not found in other regions. For instance, in the hand-control area (right preCG), the adjacent somatosensory cortex (left postCG) and the four core regions of the sensorimotor integration model (left POp, HG, STG and PT), the classification on stimuli using correct trials was significant (all t(15)>3, P<0.01), but the classification on stimuli or responses using incorrect trials was not significant (all t(15)<1, P>0.1).

To reveal how sensorimotor integration as a function of noise differed with age, AUC scores in frontal POp and three auditory ROIs (HG, STG and PT) in the left hemisphere, core regions of the sensorimotor mapping model (Fig. 5a), were tested by mixed ANOVAs with ROI (2 levels: POp and one of the auditory ROIs) and SNR as the within-subject factors and group as the between-subject factor. AUC difference scores between pairwise ROIs were further subjected to mixed ANOVAs with SNR as the within-subject factor and group as the between-subject factor to evaluate the group difference in sensorimotor mapping function. This was followed by one-way (SNR) repeated-measures ANOVAs and one-sample t-tests to reveal the pattern of sensorimotor integration function for each group separately.

To determine whether differences in phoneme classification between regions or between age groups were related to differences in BOLD activity, the mean activities across syllables in two critical anatomical ROIs (left POp and left STG) were extracted for each noise level and each group. A mixed ANOVA with ROI and SNR as the within-subject factors and group as the between-subject factor was used to test the main effects and interactions.

Finally, the relationships between activity in the left POp spherical ROI (−50, 14, 18; 8-mm radius, defined as showing age-related upregulation of activity with age-equivalent performance), phoneme specificity in four core regions (the left POp, HG, STG and PT) and behavioural accuracy were investigated to unravel the nature of age-related frontal upregulation. For each group, individuals’ mean AUC scores across −12 and 8 dB SNRs in each of the four ROIs were correlated with mean activities across those SNRs in the left POp spherical ROI and the mean behavioural accuracies across those SNRs by Pearson’s correlations followed by FDR correction41. For each ROI, the correlation coefficients from two groups after Fisher’s r-to-z transformation were also compared and corrected for FDR41.
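The Benjamini–Hochberg step-up procedure used for FDR correction can be sketched in a few lines. The threshold q=0.05 is the value used in this study; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which tests survive FDR correction."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= q * k / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            max_rank = rank
    # All tests ranked at or below max_rank are declared significant.
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            keep[idx] = True
    return keep


# Hypothetical P values from four correlations.
print(benjamini_hochberg([0.001, 0.03, 0.035, 0.5]))  # [True, True, True, False]
```

In the example, the second P value (0.03) exceeds its own threshold of 0.025 but is still retained, because the third P value (0.035) falls below its threshold of 0.0375 and the procedure accepts every test ranked at or below the largest passing rank.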

Data availability

Data that support the findings of this study are available from the corresponding author on request.