Subjects

Fifteen healthy right-handed university students (8 men and 7 women) participated in this study as unpaid volunteers and earned academic credit for their participation. Their mean age was 22.8 years (range 20–27 years). All had normal or corrected-to-normal vision and reported no history of neurological illness or drug abuse. Right-handedness and right ocular dominance were confirmed using the Italian version of the Edinburgh Handedness Inventory, a laterality preference questionnaire. All experiments were conducted with the understanding and written consent of each participant, and no participant was excluded for technical reasons. The experimental protocol was approved by the ethics committee of the University of Milano-Bicocca.

Stimuli and materials

The stimulus set consisted of 300 complex ecological scenes. The pictures were downloaded from Google Images (the examples shown in Fig. 6 are custom-made and copyright-free). The two classes of stimuli (sound and non-sound) were matched for size (350 × 350 pixels), luminance (41.92 cd/m²), affective value and the presence of animals or persons. Half of the images (150) evoked a strong auditory image (sound stimuli), whereas the other half were not linked to any particular sound (non-sound stimuli). The stimulus set was selected from a larger pool of images by presenting them to a group of 20 judges (10 men and 10 women), who scored the auditory association evoked by each picture on a 3-point scale (with 2, 1 and 0 indicating strong, weak and absent auditory content, respectively).

Figure 6. Example images of stimuli in the sound and non-sound categories.

To provide a clear distinction between the sound and non-sound stimulus groups, pictures with an average score of 0.5–2 were placed in the sound category, whereas pictures with a score of 0 were placed in the non-sound category. A t-test applied to the 2 groups confirmed that their auditory-content scores differed significantly (Sound = 1.41, SE = 0.37; Non-sound = 0; t-value = 46.58; p < 0.05). Three hundred images (150 sound and 150 non-sound) meeting these criteria were then selected to create the final stimulus set; example images are shown in Fig. 6.
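
As an illustration only, the selection rule and the group comparison described above can be sketched in a few lines of Python; the rating matrices below are hypothetical placeholders rather than the judges' actual data, and the software actually used for this analysis is not specified in the text.

```python
# Hedged sketch (not the authors' code): assigning images to the sound and
# non-sound categories from mean judge ratings and comparing the two groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sound_ratings = rng.integers(1, 3, size=(150, 20)).astype(float)  # placeholder: 150 images x 20 judges
nonsound_ratings = np.zeros((150, 20))                             # placeholder: images with no auditory content

sound_scores = sound_ratings.mean(axis=1)        # mean auditory-content score per image
nonsound_scores = nonsound_ratings.mean(axis=1)

# Selection criteria from the text: mean score of 0.5-2 -> sound, mean score of 0 -> non-sound
sound_selected = sound_scores[(sound_scores >= 0.5) & (sound_scores <= 2)]
nonsound_selected = nonsound_scores[nonsound_scores == 0]

# Independent-samples t-test on the mean auditory-content scores of the two groups
t_val, p_val = stats.ttest_ind(sound_selected, nonsound_selected)
print(f"Sound mean = {sound_selected.mean():.2f}, t = {t_val:.2f}, p = {p_val:.3g}")
```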

The stimuli in the 2 classes were also matched for affective value by presenting the pictures to a group of 10 judges (5 men and 5 women), different from those used above, who evaluated the stimuli in terms of affective content on a 3-point scale (with 2, 1 and 0 indicating strong, weak and null affective value, respectively). A t-test applied to the 2 groups confirmed that their affective values did not differ significantly (Sound = 0.76; Non-sound = 0.66; t-value = 1.68; p = 0.09).

Twenty-five additional photos depicting a cycle race were included in the stimulus set so that the subjects could perform a secondary task (described below); these images were similar to the other images in average luminance, size and spatial distribution. The sound and non-sound images were presented in random order together with the 25 cycle-race photos. The stimulus size was 14.2 × 14.2 cm, subtending a visual angle of 6°43′01″. Each image was presented for 1000 ms against a dark grey background at the center of a computer screen, with an interstimulus interval (ISI) of 1500–1900 ms.

Task and procedure

The participants were comfortably seated in a darkened test area that was acoustically and electrically shielded. A high-resolution VGA computer screen was placed 120 cm in front of their eyes. The subjects were instructed to gaze at the center of the screen (where a small circle served as a fixation point) and to avoid any eye or body movement during the recording session. The stimuli were presented in random order at the center of the screen in 6 randomly mixed short runs, each lasting approximately 2 minutes and 40 seconds. To keep the subjects focused on the visual stimuli, the task consisted of responding as accurately and quickly as possible to photos displaying cycle races by pressing a response key with the index finger of the left or right hand; all other photos were to be ignored. The left and right hands were used alternately throughout the recording session, and the order of the hand and task conditions was counterbalanced across subjects. In each experimental run, the number of target stimuli varied between 3 and 7, and the presentation order differed among subjects.
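
The presentation software is not named in the text; purely as a sketch of the trial structure (1000-ms stimulus, 1500–1900-ms ISI, key-press responses to cycle-race targets), a PsychoPy-style loop might look like the following, with the image file names and response key being hypothetical.

```python
# Illustrative PsychoPy-style trial loop; the actual presentation software is not
# specified in the study. Image file names and the response key are hypothetical.
import random
from psychopy import visual, core, event

win = visual.Window(fullscr=True, color=(-0.6, -0.6, -0.6), units='deg')  # dark grey background
fixation = visual.Circle(win, radius=0.1, fillColor='white', lineColor='white')

trials = ['sound_001.jpg', 'nonsound_001.jpg', 'cycle_target_001.jpg']  # hypothetical paths
random.shuffle(trials)

for image_path in trials:
    stim = visual.ImageStim(win, image=image_path, size=(6.72, 6.72))  # ~6 deg square
    stim.draw()
    win.flip()
    core.wait(1.0)                           # 1000-ms stimulus duration
    fixation.draw()
    win.flip()
    core.wait(random.uniform(1.5, 1.9))      # 1500-1900-ms interstimulus interval
    keys = event.getKeys(keyList=['space'])  # any key presses made to cycle-race targets

win.close()
core.quit()
```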

EEG recording and analysis

The EEG data were continuously recorded from 128 scalp sites at a sampling rate of 512 Hz. Horizontal and vertical eye movements were also recorded, and linked ears served as the reference lead. The EEG and electro-oculogram (EOG) were filtered with a half-amplitude band-pass of 0.016–100 Hz. Electrode impedance was maintained below 5 kΩ. EEG epochs were synchronized with the onset of stimulus presentation. Computerized artifact rejection was performed prior to averaging to discard epochs in which eye movements, blinks, excessive muscle potentials or amplifier blocking occurred. The artifact rejection criterion was a peak-to-peak amplitude exceeding 50 μV, and it resulted in a rejection rate of ∼5%. Event-related potentials (ERPs) from 100 ms before through 1000 ms after stimulus onset were averaged off-line. ERP components (including the site and latency at which they reached maximum amplitude) were identified and measured with respect to the baseline voltage, which was averaged over the interval from −100 ms to 0 ms.
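
For readers who want to reproduce a comparable pipeline with open tools, a minimal MNE-Python sketch of the epoching, baseline correction, peak-to-peak artifact rejection and averaging described above is given below; the file name, trigger codes and reference channel labels are assumptions made for illustration, not details taken from the study.

```python
# Minimal MNE-Python sketch of the epoching/averaging pipeline described above.
# File name, trigger codes and reference channel labels are assumed for illustration.
import mne

raw = mne.io.read_raw_fif('subject01_raw.fif', preload=True)  # hypothetical recording
raw.set_eeg_reference(['M1', 'M2'])                            # linked-ears reference (assumed labels)

events = mne.find_events(raw)
event_id = {'sound': 1, 'non_sound': 2}                        # assumed trigger codes

epochs = mne.Epochs(
    raw, events, event_id,
    tmin=-0.1, tmax=1.0,        # 100 ms pre- to 1000 ms post-stimulus
    baseline=(-0.1, 0),         # baseline voltage averaged over -100 to 0 ms
    reject=dict(eeg=50e-6),     # discard epochs exceeding 50 uV peak-to-peak
    preload=True,
)

evoked_sound = epochs['sound'].average()
evoked_nonsound = epochs['non_sound'].average()
```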

The peak amplitude and latency of the sensory P1 response were measured at mesial occipital (O1, O2) and lateral occipital (POO9h, POO10h) electrode sites in the 80–120-ms time window. The mean amplitudes of the frontal N1 and N2 components were measured at the left and right central (C1, C2, C3, C4), frontal (F1, F2, F3 and F4) and fronto-central (FC1, FC2, FC3 and FC4) electrode sites in the 100–120-ms and 200–275-ms time windows, respectively. The mean amplitude of the temporal P3 component was measured at the posterior temporal and temporo-parietal (T7, T8, TTP7h and TTP8h) electrode sites in the 600–800-ms time window. Multifactorial repeated-measures ANOVAs were applied to the ERP data, with the following within-subjects factors: stimulus category (Sound, Non-Sound), electrode (according to the ERP component of interest) and hemisphere (Left, Right). Multiple comparisons of means were performed with post-hoc Tukey tests. Alpha inflation due to violations of the sphericity assumption was corrected by means of the Greenhouse-Geisser correction; the accordingly modified degrees of freedom are reported, together with ε and the corrected probability level.
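
Continuing the sketch above, the window-based amplitude and latency measures could be obtained roughly as follows; the helper functions are hypothetical, and the channel subsets follow the electrode sites listed in the text.

```python
# Sketch of the amplitude/latency measures, using the `evoked_sound` object from the
# previous example; helper functions are illustrative, not the authors' procedure.
import numpy as np

def mean_amplitude(evoked, picks, tmin, tmax):
    """Mean amplitude (in microvolts) over the given channels and time window."""
    data = evoked.copy().pick(picks).crop(tmin, tmax).data
    return data.mean() * 1e6

def peak_amplitude_latency(evoked, picks, tmin, tmax):
    """Most positive value (in microvolts) across channels/window and its latency (s)."""
    ev = evoked.copy().pick(picks).crop(tmin, tmax)
    ch_idx, t_idx = np.unravel_index(np.argmax(ev.data), ev.data.shape)
    return ev.data[ch_idx, t_idx] * 1e6, ev.times[t_idx]

# P1: peak amplitude and latency at occipital sites, 80-120 ms
p1_amp, p1_lat = peak_amplitude_latency(evoked_sound, ['O1', 'O2', 'POO9h', 'POO10h'], 0.08, 0.12)

# N2: mean amplitude at fronto-central sites, 200-275 ms
n2_amp = mean_amplitude(evoked_sound, ['FC1', 'FC2', 'FC3', 'FC4'], 0.200, 0.275)
```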

Low-Resolution Electromagnetic Tomography (LORETA) was performed on the ERP waveforms at the latency stages at which the sound/non-sound difference was greatest, namely, at the N1, N2 and P3 levels. LORETA [22] is a discrete linear solution to the inverse EEG problem and corresponds to the 3D distribution of neuronal electrical activity that has maximally similar (i.e., maximally synchronized) orientation and strength between neighboring neuronal populations (represented by adjacent voxels). In this study, an improved version of standardized weighted LORETA was used; this version, called swLORETA, incorporates a singular value decomposition-based lead-field weighting method. The source-space properties were a grid spacing (the distance between two calculation points) of 5 points and an estimated signal-to-noise ratio of 3 (the SNR defines the regularization; a higher value indicates less regularization and therefore less blurred results). swLORETA was performed on the group data and identified statistically significant electromagnetic dipoles (p < 0.05), with larger magnitudes corresponding to stronger activation. A realistic boundary element model (BEM) was derived from a T1-weighted 3D MRI data set by segmentation of the brain tissue. This BEM consisted of one homogeneous compartment comprising 3,446 vertices and 6,888 triangles. The head model was used for intracranial localization of surface potentials. Both the segmentation and the generation of the head model were performed using the ASA software program.
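
swLORETA as implemented in the ASA package has no public scripting interface; purely as an open-source analogue (and not the method actually used here), the closely related sLORETA solution can be computed in MNE-Python as sketched below, where the forward model `fwd` and the noise covariance `noise_cov` are assumed to have been prepared beforehand from the BEM head model.

```python
# Open-source analogue only: distributed source estimation with sLORETA in MNE-Python,
# not the swLORETA/ASA analysis used in the study. `fwd` and `noise_cov` are assumed
# to exist (forward model from a BEM head model, noise covariance from the baseline).
from mne.minimum_norm import make_inverse_operator, apply_inverse

inv = make_inverse_operator(evoked_sound.info, fwd, noise_cov)
snr = 3.0                                   # regularization set from an assumed SNR of 3
stc = apply_inverse(evoked_sound, inv, lambda2=1.0 / snr ** 2, method='sLORETA')

# Restrict the source estimate to a window where the sound/non-sound difference was
# measured, e.g. the N2 range (200-275 ms), before inspecting the strongest sources.
stc_n2 = stc.copy().crop(0.200, 0.275)
```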