The goal of the present study was to test whether musical expertise can modify the on-line neural correlates of speech segmentation. Both musicians and nonmusicians showed a progressively emerging fronto-central negative component in the 350–550 ms latency band. Nonetheless, while musicians showed an inverted U-shaped N400 curve, nonmusicians showed a rather linear N400 curve (see Figure 4). Interestingly, the level of performance in the linguistic test could be predicted from the time bin at which the N400 reached its maximum amplitude; participants for whom the N400 reached its maximum in an early time bin performed better than those for whom it reached its maximum later (see Figure 5).
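
To make these two analyses concrete, the sketch below illustrates, on synthetic data, how a linear versus an inverted U-shaped N400 trajectory can be contrasted, and how the time bin of maximum amplitude can be related to behavioural accuracy. Everything in it (bin count, amplitudes, participant data) is an invented assumption for illustration; it is not the analysis pipeline used in the study.

```python
# Illustrative sketch on synthetic data; not the study's actual analysis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_bins = 6                     # hypothetical number of exposure time bins
bins = np.arange(n_bins)

# Two synthetic single-participant N400 amplitude trajectories (arbitrary units).
inverted_u = -(bins - 2.0) ** 2 + 5 + rng.normal(0, 0.3, n_bins)  # musician-like
linear = 0.8 * bins + rng.normal(0, 0.3, n_bins)                  # nonmusician-like

for label, curve in [("inverted U", inverted_u), ("linear", linear)]:
    # A quadratic fit clearly outperforms a linear one only for the U-shaped curve.
    rss_lin = np.sum((curve - np.polyval(np.polyfit(bins, curve, 1), bins)) ** 2)
    rss_quad = np.sum((curve - np.polyval(np.polyfit(bins, curve, 2), bins)) ** 2)
    print(f"{label}: peak bin = {curve.argmax()}, "
          f"RSS linear = {rss_lin:.2f}, RSS quadratic = {rss_quad:.2f}")

# Relating the bin of maximum N400 amplitude to test accuracy across
# synthetic participants: an earlier peak goes with higher accuracy.
peak_bin = rng.integers(0, n_bins, 20)
accuracy = 0.9 - 0.05 * peak_bin + rng.normal(0, 0.03, 20)
rho, p = stats.spearmanr(peak_bin, accuracy)
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")
```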

The behavioural results confirm our previous studies with adults and children [37], [38], [49] as well as other recent evidence showing that musicians outperform nonmusicians in implicit segmentation tasks [35], [50], [51], possibly due to a greater sensitivity to the statistical properties of the auditory input stream in experts than in nonexperts [52]. We found no evidence of learning in either group in the music condition. This is probably due in part to a lack of musical significance in the stream and, most importantly, to greater interference in the musical test from the foils (spanning word boundaries), which compete strongly with the word melodies because pitch sequences are encoded relatively (as intervals). This lack of learning in the musical dimension in both groups is important because it supports the notion that the learning effect in musicians in the language dimension was not driven by musical characteristics of the words.

Of great interest here is the fact that the participants who were most accurate on the linguistic test were those showing maximum N400 amplitude early in the exposure phase. Moreover, neither the maximum amplitude of the N400 nor the increase in N400 amplitude predicted the level of performance in this test. These results are important for two reasons. First, they show that musicians and nonmusicians not only have different segmentation abilities, but that these skills rely on different neural dynamics as estimated from EEG during the exposure phase. Second, N400 modulations are a powerful predictor of success in the subsequent test. This means that a completely implicit and non-interfering measure such as the dynamics of the N400 during passive exposure can be a valuable indicator of speech segmentation competence. This finding may in turn have strong implications for fundamental and clinical research, for instance when working with babies, young children or pathological populations (e.g. patients with executive function or speech disorders). Finally, the different patterns of ERP modulations found in these two groups extend our knowledge of general theories of learning, such as the time-dependent hypothesis of learning.

Faster word extraction in musicians than in nonmusicians

Modulations of the amplitude of early ERP components (N1 and/or P2) during exposure have previously been described in nonmusicians using similar paradigms [44], [53]. Recently, an effect of musical practice was found on the P50 component using a stream of tones [54]. In the present study, although musicians seemed to show a larger N1 than nonmusicians during the first minute of exposure (first time bin), this difference did not reach significance. This discrepancy with previous research may be due to the acoustic features of the stimuli used in our study; the set of consonants we used had heterogeneous attack times, probably resulting in larger ERP latency variability than in studies using, for instance, piano tones. Future experiments will be needed to confirm the involvement of these early ERP components in the segmentation process and their interactions with musical expertise.

Nonetheless, despite the lack of significant effects on the early ERP components, the dynamic patterns of N400 modulation across the exposure phase clearly differentiated the two groups before the behavioural test: musicians showed an inverted U-shaped N400 amplitude curve, while a linear N400 amplitude curve was observed in nonmusicians. A previous study using both EEG source reconstruction and fMRI with a similar artificial language learning (ALL) paradigm described the middle temporal gyrus as a possible generator of this fronto-central component [43]. The fact that no learning-related modulations were found on auditory ERP components whereas we found modulations of the N400 component suggests that the difference between the two groups goes beyond the auditory cortices, possibly to the level of the superior temporal plane [55] and middle temporal gyrus [43].

Musicians showed a significant increase in N400 amplitude as early as the second time bin of the exposure phase (i.e. between 1'20'' and 2'40''). Previous studies using similar artificial language learning paradigms with speech and tone streams have reported a similarly steep increase in N400 amplitude after 2 minutes of exposure, but only in the group of good learners [43]. This N400 increase has been interpreted as reflecting the building of proto-lexical representations. While the parsing unit is initially probably the syllable, the statistical properties of the material lead the three syllables composing a given word to be progressively perceived as a single pattern: a new word candidate. Thus, the faster N400 increase in musicians points to a faster ability to exploit the statistical structure of the stream to segment the words. Interestingly, the superior temporal plane seems to be sensitive to the statistical regularities of the input [55], and metabolic activity within this region is positively related to participants' ability to recognize words during the behavioural test of a similar artificial language learning experiment [56]. Importantly, at the structural level, musicians show a larger planum temporale than nonmusicians [57], [58]. Thus, the anatomo-functional reorganization induced by musical practice within this region may well be at the origin of musicians' superiority in speech segmentation. Additionally, because the speech stream used was sung, musicians might have been more sensitive than nonmusicians to the pitch patterns contained in the stream. However, as previously mentioned, the lack of learning in the musical dimension supports the notion that the learning effect reported in musicians in the language dimension was not driven by musical characteristics of the words. Rather, musicians may take advantage of their rhythmic skills, which may allow them to orient attention toward the most salient time points of the stream (word boundaries). In other words, as long as attention remains "entrained" at the syllable level, words are not segmented; as soon as attention is oriented toward longer time windows (here, three syllables), words may start to pop out of the stream.
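
As a concrete illustration of the statistical cue at stake, the sketch below computes transitional probabilities between adjacent syllables in a toy stream built from invented trisyllabic words (the items are placeholders, not the study's material): probabilities are high within words and drop at word boundaries, which is precisely the structure a learner can exploit.

```python
# Illustrative sketch: transitional probabilities (TPs) between adjacent
# syllables are high within words and drop at word boundaries. The words
# below are invented placeholders, not the study's actual material.
import random
from collections import Counter

words = ["tudaro", "pigola", "bimeku"]           # hypothetical trisyllabic words
syllables = lambda w: [w[i:i + 2] for i in range(0, len(w), 2)]

random.seed(0)
stream = [s for w in random.choices(words, k=300) for s in syllables(w)]

pair_counts = Counter(zip(stream, stream[1:]))
first_counts = Counter(stream[:-1])

# TP(B|A) = count(A, B) / count(A); low values mark likely word boundaries.
for (a, b), n in sorted(pair_counts.items()):
    print(f"TP({b} | {a}) = {n / first_counts[a]:.2f}")
```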

The steep increase in N400 amplitude was immediately followed by a 2-minute asymptote that could reflect the saturation of the network. This N400 plateau could reflect the consolidation of word memory traces within a fronto-temporal network allowing for later word recognition. One may hypothesize that increasing the duration of the exposure phase for nonmusicians would result in a similar but delayed asymptote. In other words, the neural mechanisms of this type of learning are probably not fundamentally different in musicians and nonmusicians. Differences would simply be quantitative, with musicians segmenting faster than nonmusicians; comparing musicians to nonmusicians who are equally good language learners, one would expect similar learning curves. Interestingly, this is the case for the one nonmusician with good behavioural performance (72% correct), who also showed a peak of N400 amplitude at the second time bin. This again suggests that the inverted U-shaped curve predicts learning to some extent.
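
One way to make this "quantitative difference only" hypothesis explicit is to model both groups' N400 trajectories with the same saturating function and let only the rate differ. The sketch below does this on synthetic data; the study itself fit no such model, and the function, parameters and data are our own assumptions (the late decrease seen in musicians is deliberately ignored here).

```python
# Illustrative sketch of the "same mechanism, different rate" hypothesis:
# fit the same saturating curve a * (1 - exp(-t / tau)) to both groups'
# N400 trajectories and compare time constants. Data are synthetic.
import numpy as np
from scipy.optimize import curve_fit

def saturating(t, a, tau):
    return a * (1.0 - np.exp(-t / tau))

rng = np.random.default_rng(1)
t = np.arange(1.0, 7.0)                      # exposure time bins
musician_like = saturating(t, 5.0, 1.0) + rng.normal(0, 0.1, t.size)
nonmusician_like = saturating(t, 5.0, 4.0) + rng.normal(0, 0.1, t.size)

for label, y in [("musician-like", musician_like),
                 ("nonmusician-like", nonmusician_like)]:
    (a, tau), _ = curve_fit(saturating, t, y, p0=(4.0, 2.0))
    # Same asymptote, different speed: the slower group simply needs more exposure.
    print(f"{label}: asymptote = {a:.2f}, time constant = {tau:.2f} bins")
```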

An alternative explanation of this asymptote could rely on the involvement of the working memory system, and in particular its articulatory rehearsal subcomponent, which has been shown to play an important role in speech segmentation and word learning [43], [56]. Indeed, disrupting the rehearsal mechanism with an articulatory suppression procedure throughout the exposure phase leads to unsuccessful word segmentation [59]. Interestingly, a recent study revealed that musicians have better functioning and faster updating of working memory than nonmusicians [60]. In the same vein, it has been shown that, compared to nonmusicians, musicians can hold more information, and for longer periods, in their auditory memory [61]. Thus, musicians may have relied more on an articulatory rehearsal mechanism than nonmusicians, leading to better word segmentation. Because there is now evidence of greater working memory capacity in musicians [60], [61], future research will need to bridge working memory and segmentation abilities and determine the extent to which inter-individual differences in working memory drive differences in segmentation abilities.

Finally, the last 2 minutes of the exposure phase showed a decrease in N400 amplitude in musicians but not in nonmusicians. A similar decrease has been reported in two previous studies on ALL and on tone stream segmentation [43], [45]. Additionally, when a word is known, its familiarity and repetition typically engender a reduction in N400 amplitude [39], [40], [62]. In the case of ALL experiments, a decrease in N400 amplitude has also been interpreted as reflecting a phonemic template pattern matching/recognition process, probably involving the Inferior Frontal Gyrus/Premotor Cortex complex (IFG/PMC) [43], [63]. Interestingly, this area is also involved in harmonic music perception [64], [65] and shows increased gray matter density and volume in musicians compared to nonmusicians [66].

Lastly, musical practice has been shown to increase both structural and functional connectivity within the speech-processing network in patients recovering from stroke [67] and in children [68]. Both adult musicians and 8-year-old children who followed 2 years of musical training show a more developed arcuate fasciculus than nonmusicians [68]–[70]. This fiber bundle is crucial for mapping speech sounds onto articulatory gestures, connecting the posterior part of the Superior Temporal Gyrus to the IFG/PMC [71], [72]. Lesions of the arcuate fasciculus induce impairments not only in phonological and word repetition but also in verbal short-term memory [73]–[75]. Interestingly, a recently published study revealed that the arcuate fasciculus is crucial in mediating word learning [76]. Thus, increased connectivity between auditory and motor regions might lead to better segmentation skills.

To conclude, the present results bring new evidence that musicians are not only better but also faster than nonmusicians at segmenting an artificial language. The modulations of the purported neural correlates of learning were evident earlier in the exposure phase in musicians than in nonmusicians, suggesting that musicians achieved word segmentation more quickly. The different patterns of ERP modulation during exposure, as well as the significant correlation with behaviour in a subsequent test, provide additional support for the time-dependent hypothesis, which states that increasing activation of the network sustaining a specific learning process should be limited to the initial learning periods and should not be visible after learning is accomplished [46].