Most language users agree that some words sound harsh (e.g. grotesque) whereas others sound soft and pleasing (e.g. lagoon). While this prominent feature of human language has always been creatively deployed in art and poetry, it is still largely unknown whether the sound of a word in itself makes any contribution to the word’s meaning as perceived and interpreted by the listener. In a large-scale lexicon analysis, we focused on the affective substrates of words’ meaning (i.e. affective meaning) and words’ sound (i.e. affective sound), both measured on a two-dimensional space of valence (ranging from pleasant to unpleasant) and arousal (ranging from calm to excited). We tested the hypothesis that the sound of a word possesses affective iconic characteristics that can implicitly influence listeners when evaluating the affective meaning of that word. The results show that a significant portion of the variance in affective meaning ratings of printed words depends on a number of spectral and temporal acoustic features extracted from these words after converting them to their spoken form (Study 1). To test the affective nature of this effect, we independently assessed the affective sound of these words using two different methods: through direct rating (Study 2a), and through acoustic models that we implemented based on pseudoword materials (Study 2b). In line with our hypothesis, the estimated contribution of words’ sound to ratings of words’ affective meaning was indeed associated with the affective sound of these words, with a stronger effect for arousal than for valence. Further analyses revealed crucial phonetic features potentially causing the effect of sound on meaning: for instance, words with short vowels, voiceless consonants, and hissing sibilants (as in ‘piss’) feel more arousing and negative.
Our findings suggest that the process of meaning making is not solely determined by arbitrary mappings between formal aspects of words and concepts they refer to. Rather, even in silent reading, words’ acoustic profiles provide affective perceptual cues that language users may implicitly use to construct words’ overall meaning.

Introduction

Human language has generally been considered to be entirely symbolic in that words convey meaning through conventional and arbitrary links to concepts they refer to [1]. From this perspective, phonemes (i.e. the speech sounds that constitute words) have no inherent semantic content nor have they any stand-alone contribution to words’ meaning. Nevertheless, even a naïve reader—without prior knowledge of such literary devices as cacophony or euphony—would experience how, for instance, in Poe’s verse “…Hear the loud alarum bells—Brazen bells!—What tale of terror, now, their turbulency tells!” [2], the explosive consonant /t/ and other harsh and discordant sounds (e.g. hissing sibilants /s/ and /z/) evoke a feeling of “terror” provoked by “brazen” bells.

Within literary studies, many have noted that poetry achieves much of its affective aesthetic impact through sound manipulation, and that phonological structure has a semantic function beyond the decorative [3–5]. In a similar fashion, swear words usually possess specific phonological patterns that can potentially amplify the negative emotional response that they are meant to evoke [6]. A look at the famous seven words that, according to American comedian George Carlin, “you can never say on television” [7] reveals that all of them contain voiceless stops (/t/ and /k/) or hissing sibilants (/s/ and /ʃ/). These are fortis consonants, articulated with greater oral pressure and relatively higher muscular force than their lenis counterparts.

However, despite the fact that influential linguists and experimental psychologists throughout the last century promoted the idea that the sound of a word may have a synchronic, productive effect on overall meaning construction [8–10], the notion of the arbitrariness of the linguistic sign [1] has generally dominated research on human language.

More recently, a growing body of research has challenged the idea of absolute arbitrariness by providing evidence for non-arbitrary sound-to-meaning correspondences (see [11–13] for reviews), including some universal patterns across various languages of the world [14]. These results assign a supplementary function to sound-to-meaning correspondences that structure vocabulary [15,16] and play an important role in both phylogenetic language evolution [16–18] and ontogenetic language development [18,19]. Nonetheless, despite the increasing number of studies examining sound-to-meaning associations, to the best of our knowledge, there has been no empirical study examining whether specific properties in the sound of a real word contribute to its overall meaning. With the present study, we aimed to address this research question. By focusing on the ‘affective meaning’ of words, and by providing reliable quantitative measures for the ‘affective sound’ of words, we investigated how the sound of a word potentially contributes to its meaning as perceived and evaluated by the listener. A further goal of this study was to explore the affective acoustic cues and their underlying phonetic features that may implicitly influence language users when evaluating words’ affective meaning.

Motivation for the present study

Our approach was motivated by a number of limitations evident in previous work. Experimental research based on behavioral data has hitherto investigated only the links between some selective, rather isolated attributes of meaning (e.g. the physical size of the referent) and some aspects of sound (e.g. intrinsic pitch of vowels), mainly by using nonword stimuli (see [20] for evidence supporting a graded relationship between sound and meaning, and [21] for an evolutionary perspective on the phenomenon). Such approaches exhibit three major limitations that we aimed to address in the present study.

The first limitation relates to the focus on semantic effects of phonemes in nonwords instead of natural words. Such studies are motivated by the fact that natural words in a language are linked to predetermined semantic concepts that are automatically activated during word recognition. In order to disentangle the effect of phonology from that of semantics, the majority of previous studies therefore relied on nonword stimuli, usually presented in a forced-choice paradigm, thus limiting the generalizability of the results to real words. For instance, the phonemes /ɑ/ and /ɪ/, when used in experimentally manipulated nonwords (as in “mal” and “mil” in the seminal study by Sapir [9]), have repeatedly been suggested to denote big and small objects, respectively [11,12]. However, in a natural language like English, they appear in the words for the corresponding concepts the other way around: /smɑl/ and /bɪɡ/. This raises the question of to what extent the results of these studies can be linked to natural language processing, and whether the assumed quality of phonemes has any effect on the evaluation of meaning for real words.

A second issue relates to the problem of deciphering the likely cause(s) of sound-to-meaning correspondences. Proposals on non-arbitrariness of language distinguish between two types of motivations for such sound-meaning mappings [12]: iconicity, which is based on perceptual similarities between sound and meaning (e.g. onomatopoeia), versus systematicity, which is based on statistical regularities in language that link specific patterns of sound to specific semantic or grammatical concepts [22,23]. Besides some familiar and straightforward examples of iconicity, such as onomatopoetic words, research in this field still faces the question of whether existing findings on the relationship between sound and meaning are caused by specific distributions of phonemes in a language (i.e. systematicity), or by perceptual qualities that phonemes inherently convey (i.e. iconicity). The phonaestheme /sn-/, which appears as an initial sound cluster in many English words related to ‘mouth’ or ‘nose’, may serve to illustrate this dilemma [24]. In this case, there has been no empirical evidence showing whether there is a specific (nasal) quality in the sound of /sn-/ that is linked with the concepts of ‘mouth’ or ‘nose’, or whether the vocabulary is instead organized in such a way that this specific sound cluster appears disproportionately often in words related to these concepts.

The third and presumably most important issue is that the operationalization of meaning in this field of research has so far been restricted to only some selective aspects of sensorimotor information (e.g. shape, movement). The role of affect as a most basic human experience shaping the learning, representation, and processing of language [25–29] has been surprisingly neglected. Indeed, affective dimensions of words, in particular valence and arousal, are essential features defining a two-dimensional semantic space that allows for a very basic and potentially the most relevant distinction between different concepts, as empirically established by the semantic differential technique [30].
In an attempt to provide a quantitative measure for words’ meaning, Osgood [30] defined 100 different lexical dimensions and asked participants to locate the meaning of words on each dimension along an experiential continuum defined by a pair of polar terms (e.g. soft/hard, long/short, angular/rounded). Factor analyses conducted on the wide variety of verbal judgments indicated that most of the variance was accounted for by three major semantic dimensions: the two primary dimensions of ‘valence’ and ‘arousal’, and a third dimension of ‘dominance’ or ‘control’ that explained less of the variance [31]. Therefore, these factors have been considered basic dimensions of the semantic space within which the meaning of any concept can be specified. Moreover, the expression and perception of affective states are fundamental aspects of human communication [32,33] that have been proposed as the original impetus for language evolution, with mimetic vocalization of emotional sounds supposedly allowing early hominids to efficiently share biologically significant information [32,34,35]. Therefore, we would expect the effect of iconicity to be most evident in the communication of affect and in the relationship between words’ affective sound (i.e. how emotional words sound) and words’ affective meaning (i.e. their position in the two-dimensional affective space of lexical valence and arousal). Thus, iconicity can serve as an interface for meeting the need to map linguistic form to human affective experience as a vital part of meaning making.

An embodied view on affective meaning

It is important to consider that the notion of “affective meaning” may not be shared by all theories of linguistic meaning. Our approach in this work is based on an embodied view of language, which proposes that meaning is grounded in the behavior (perception and action) and neural circuitry of the producer or the interpreter of linguistic signs [25,28,36–40]. Ultimately, part of the meaning of any utterance is its effect on the (physical and emotional) well-being of the person saying or hearing it, and everything that matters is represented in each individual person’s brain and its neurophysiological systems. Presumably, the most fundamental such system is affect: in order to make meaning, we need to know which objects or events in our environment require us to react with alertness or to keep calm, to approach or to withdraw. Moreover, the ability to distinguish between such affective contexts or reactions is linked to attention systems that select specific sensory input for further processing, and also to motor systems that select specific actions for output. Both systems (i.e. sensory and motor) provide crucial information for the construction of meaning by language users. Findings on the role of affective meaning in modulating various cognitive processes, such as learning, memory, attention or language processing [25,26,28,41], support the idea that affective meaning is intertwined with other lexico-semantic aspects and makes an essential and basic contribution to the process of meaning making.