In recent years, numerous studies have provided converging evidence that word meaning is partially stored in modality-specific cortical networks. However, little is known about the mechanisms supporting the integration of this distributed semantic content into coherent conceptual representations. In the current study we aimed to address this issue by using EEG to examine the spatial and temporal dynamics of feature integration during word comprehension. Specifically, participants were presented with two modality-specific features (i.e., visual or auditory features such as silver and loud) and asked to verify whether these two features were compatible with a subsequently presented target word (e.g., WHISTLE). Each pair of features described properties from either the same modality (e.g., silver, tiny = visual features) or different modalities (e.g., silver, loud = visual, auditory). Behavioral and EEG data were collected. The results show that verifying features that are putatively represented in the same modality-specific network is faster than verifying features across modalities. At the neural level, integrating features across modalities induces sustained oscillatory activity in the theta range (4–6 Hz) in the left anterior temporal lobe (ATL), a putative hub for integrating distributed semantic content. In addition, enhanced long-range network interactions in the theta range were observed between the left ATL and a widespread cortical network. These results suggest that oscillatory dynamics in the theta range could be involved in integrating multimodal semantic content by creating transient functional networks that link distributed modality-specific networks to multimodal semantic hubs such as the left ATL.

Introduction

The embodied framework of language suggests that lexical-semantic knowledge (i.e., word meaning) is stored in part in modality-specific networks that are distributed across the cortex [1]–[4]. For example, words denoting colors (e.g., red, green) have been shown to engage parts of the ventral visual stream [5], while words denoting actions (e.g., kick, pick) engage the dorsal motor network [6]. In recent years, considerable work has addressed the automaticity, flexibility, and reliability of the link between action/perception and word meaning [5], [7]–[10]. The current study extends this body of literature by addressing the question of how distributed lexical-semantic features are integrated during word comprehension.

Although ample evidence for the link between word meaning and perception/action systems exists, the bulk of research in this field has reduced lexical-semantic information to one dominant modality (e.g., vision for red and action for kick). The motivation for concentrating on single modalities is largely methodological: words with a strong association to a single modality yield clear, empirically testable hypotheses. However, words clearly refer to items that are experienced through multiple modalities in the real world (e.g., a football is associated with both a specific visual form and a specific action), and embodied accounts of language have done little to address how multimodal information interacts during the processing of word meaning. The one exception to this rule has been the attempt to understand how lexical-semantic processing can be focused flexibly on information from one modality versus another. For example, van Dam and colleagues [10] demonstrated that words denoting objects that are strongly associated with both action and visual information (e.g., tennis ball) reliably activate both motor and visual pathways in the cortex. Interestingly, motor pathways also responded more strongly when participants were asked to indicate what to do with the object rather than what it looks like. Likewise, Hoenig and colleagues [8] have shown that even for objects with dominant modality-specific features (e.g., actions for artifacts), the pattern of activation in visual and motor networks is differentially modulated depending on whether a dominant (action) or non-dominant (visual) feature is primed. Notably, modality-specific networks show a stronger response to the target if the prime was not a dominant feature. Taken together, the studies by van Dam et al. [10] and Hoenig et al. [8] suggest that word meaning is partially stored in a network of areas that are recruited in a modality-specific and flexible way. However, it should also be pointed out that most of this evidence is correlational. As yet, little is known about the causal role of modality-specific networks in lexical-semantic processing, or about how they relate to more abstract semantic knowledge [11], [12].

While studies highlighting the flexible recruitment of different types of modality-specific information confirm that single words are associated with multiple types of perceptual experience, it is still unknown how information from multiple sources in the brain (e.g., visual and action features) is united into a coherent concept that is both visual and motoric. Cross-modal integration has been studied extensively with respect to object perception [13]–[16]. However, its role in forming lexical-semantic representations has been largely neglected, even within the embodied framework. Several theoretical perspectives have argued for the existence of amodal integration ‘hubs’ or foci at which information relevant for lexical-semantic processing is combined [17], [18]. Neuropsychological data have provided compelling evidence that the anterior temporal lobes (ATL) may be a good candidate for such a hub [18], [19]. Thus, it is generally accepted that information from distributed modality-specific networks is integrated in some way, somewhere in the brain. However, virtually no research has examined the neural mechanisms that might underlie semantic integration, either within these hub regions or more widely across the brain.

One way to investigate the mechanisms underlying integration across cortical areas is to study modulations in oscillatory power in EEG and MEG signals, which have been related to network interactions at different cortical scales [20], [21]. Specifically, low-frequency modulations (<20 Hz) are often reported when tasks require the retrieval and integration of information from distant cortical sites, as is generally the case for memory and language [22]–[25]. In contrast, modulations in high-frequency bands (>30 Hz) are observed when tasks require local, modality-specific network interactions, such as saccade planning or visual object binding [26], [27]. According to this framework, the specific network dynamics underlying the integration of lexical-semantic features across different modalities should be reflected in modulations of low-frequency power.
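To make the logic of such power analyses concrete, the following minimal Python sketch illustrates the approach that underlies most EEG/MEG time-frequency analyses: convolving a single-channel epoch with a complex Morlet wavelet and taking the squared magnitude as a power estimate at a given frequency. This is an illustration only, not the analysis pipeline of the present study; the sampling rate, cycle count, and the 5 Hz / 40 Hz example frequencies are arbitrary assumptions chosen to contrast a low (theta) and a high (gamma) band.

```python
import numpy as np

def morlet_power(signal, sfreq, freq, n_cycles=5):
    """Power envelope of `signal` at `freq` via complex Morlet wavelet convolution."""
    sigma_t = n_cycles / (2 * np.pi * freq)              # wavelet width in seconds
    t = np.arange(-4 * sigma_t, 4 * sigma_t, 1 / sfreq)  # wavelet support
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))     # unit-energy normalization
    analytic = np.convolve(signal, wavelet, mode='same') # complex analytic signal
    return np.abs(analytic) ** 2                         # instantaneous power

# Toy epoch (assumed values): 1 s of signal with a theta (5 Hz) and a gamma (40 Hz) component.
sfreq = 500.0
times = np.arange(0, 1, 1 / sfreq)
epoch = np.sin(2 * np.pi * 5 * times) + 0.3 * np.sin(2 * np.pi * 40 * times)

theta_power = morlet_power(epoch, sfreq, freq=5).mean()   # low-frequency estimate
gamma_power = morlet_power(epoch, sfreq, freq=40).mean()  # high-frequency estimate
print(f"theta: {theta_power:.3f}, gamma: {gamma_power:.3f}")
```

In practice such estimates are computed per trial and channel across a grid of frequencies and then contrasted between conditions; the framework above predicts that a cross-modal integration demand should modulate the low-frequency end of that grid.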

The aim of the current study was to investigate which mechanisms underlie the integration of semantic features across modalities. This question was addressed in two experiments using a dual property verification task. Participants were asked to indicate whether a feature pair (e.g., silver, loud) is consistent with a target word (e.g., WHISTLE). Critically, the feature pair could either be from the same modality (e.g., both visual) or from different modalities (e.g., visual and auditory). In Experiment 1 we analyzed verification times for cross-modal and modality-specific feature contexts to investigate whether integrating multimodal semantic content, that is, content represented in distributed semantic networks, incurs a processing cost. Specifically, we hypothesize that integrating features represented within a single modality-specific network is faster than integrating features across modalities. In Experiment 2, we used EEG to measure changes in oscillatory neuronal activity during the target word when participants were asked to integrate features from the same or different modalities. Oscillatory neuronal activity could be a neural mechanism that contributes to semantic integration by linking modality-specific networks to multimodal convergence zones such as the ATL. In line with this idea, we hypothesize that integrating semantic information from multiple modalities will be reflected in enhanced low-frequency oscillatory activity in multimodal convergence zones, as well as in enhanced network interactions between these regions and a widespread cortical network.
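As an illustration of how such long-range network interactions can be quantified, the sketch below computes the phase-locking value (PLV) between two channels in the 4–6 Hz theta band, a common trial-based measure of phase coupling between distant sites. It is a schematic example under stated assumptions (synthetic random data, placeholder channel roles, an assumed fourth-order Butterworth filter), not the connectivity analysis reported in this study.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_plv(epochs_a, epochs_b, sfreq, band=(4.0, 6.0)):
    """Phase-locking value across trials between two channels.

    epochs_a, epochs_b: arrays of shape (n_trials, n_times), one channel each.
    Returns PLV per time point (0 = no phase consistency, 1 = perfect locking).
    """
    # Band-pass filter in the theta range, then extract instantaneous phase.
    b, a = butter(4, np.array(band) / (sfreq / 2), btype='bandpass')
    phase_a = np.angle(hilbert(filtfilt(b, a, epochs_a, axis=-1), axis=-1))
    phase_b = np.angle(hilbert(filtfilt(b, a, epochs_b, axis=-1), axis=-1))
    # Consistency of the phase difference across trials.
    return np.abs(np.mean(np.exp(1j * (phase_a - phase_b)), axis=0))

# Toy data (assumed): 40 trials x 1000 samples of noise for two hypothetical sensors.
rng = np.random.default_rng(0)
sfreq, n_trials, n_times = 500.0, 40, 1000
chan_atl = rng.standard_normal((n_trials, n_times))  # e.g., a left anterior sensor
chan_far = rng.standard_normal((n_trials, n_times))  # e.g., a distant posterior sensor

plv = theta_plv(chan_atl, chan_far, sfreq)
print(plv.mean())  # near 0 for independent noise; higher under genuine coupling
```

If the hypothesis above holds, a measure of this kind should yield stronger theta-band coupling between left anterior temporal sites and the wider cortical network in the cross-modal than in the modality-specific condition.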