Communicating emotions to conspecifics (emotion expression) allows the regulation of social interactions (e.g. approach and avoidance). Moreover, when emotions are transmitted from one individual to the next, leading to state matching (emotional contagion), information transfer and coordination between group members are facilitated. Despite the high potential for vocalizations to influence the affective state of surrounding individuals, vocal contagion of emotions has been largely unexplored in non-human animals. In this paper, I review the evidence for discrimination of vocal expression of emotions, which is a necessary step for emotional contagion to occur. I then describe possible proximate mechanisms underlying vocal contagion of emotions, propose criteria to assess this phenomenon and review the existing evidence. The literature so far shows that non-human animals are able to discriminate and be affected by conspecific and also potentially heterospecific (e.g. human) vocal expression of emotions. Since humans heavily rely on vocalizations to communicate (speech), I suggest that studying vocal contagion of emotions in non-human animals can lead to a better understanding of the evolution of emotional contagion and empathy.

1. Introduction

Emotions are intense, short-term valenced (positive or negative) states triggered in response to specific internal or external stimuli of importance for the organism, and their main function is to guide behavioural decisions (e.g. approach or avoid stimuli [1]). Because the emotions of non-human animals have long been considered as unobservable processes that could not be objectively studied, scientific interest in this topic is relatively recent [2]. Over the past two decades in particular, significant advances in this field of research have been made, mainly for human benefit (e.g. pharmaceutical development), but also to study animal behaviour and assess animal welfare. As a result, new frameworks that offer researchers methods to study animal affective states have emerged [3,4]. For instance, Mendl and co-workers' framework [4] proposes the assessment of the two main dimensions of emotions, valence and arousal (bodily activation/excitation), using the neuro-physiological (e.g. heart rate, skin temperature, neuroendocrine and brain activity [5,6]), behavioural (e.g. ear and tail postures, facial and vocal expressions [7,8]) and cognitive (e.g. judgement biases [9]) changes that accompany emotions and that can be objectively measured in animals.

Some of the changes accompanying emotions (e.g. facial and vocal expressions) can be detected by conspecifics, which might lead to emotional contagion. This phenomenon occurs when an emotion is transmitted through a signal of a certain modality (e.g. olfactory, visual or vocal) from the emitter to the receiver of the signal and automatically (without necessarily requiring conscious and effortful processing) triggers state matching between the two individuals. It is the common denominator of all empathic processes (the ability to be affected by, and share the emotions of others [10,11]). Emotional contagion serves an important function among group-living animals; sharing emotions regulates social interactions and improves the transfer of information between individuals, resulting in higher coordination and cohesion among group members [1]. For example, contagion of negative emotions (e.g. fear) enables rapid defensive behaviours towards predators. Conversely, transmission of positive emotions (e.g. joy) can strengthen social bonds [12]. Moreover, emotional contagion can lead to more cognitive forms of empathy, including sympathetic concern and empathic perspective-taking, which require the receiver to downregulate its own emotional response triggered by affective sharing when necessary, in order to effectively help the conspecific in need [11,13].

As the first stage of empathy, emotional contagion is acknowledged to be widespread in the animal kingdom [11]. However, because of the former lack of methods to study affective states in animals, our knowledge of this phenomenon in non-human species is still relatively poor, particularly concerning contagion of positive emotions [12,14]. Moreover, the situations in which spread of emotions is most likely to occur (e.g. individuals involved, past experience, context) and the prevalent modalities used for transmitting emotions (e.g. olfactory, visual or vocal) have not often been tested in emotional contagion studies [15]. Vocalizations have been shown to reflect the emotion of emitters in numerous species [16,17]. They are also salient, discrete events that can be transmitted over long distances, despite obstacles and can be detected in low visibility conditions (e.g. foggy/dark environments) [18]. Furthermore, because conspecific emotional vocalizations can convey important information about the outcome of social interactions (e.g. affiliation or aggression) or about the environment (e.g. the presence of food or danger [1]), it is likely that they trigger strong emotions in receivers, which should be associated with high motivation to respond [19], and thus clear matched responses between emitters and receivers when necessary. As a result, vocalizations are very likely to play a crucial role in emotional contagion [20].

This review aims to assess the potential of vocalizations to trigger emotional contagion in non-human animals and to establish criteria to determine the existence of this phenomenon. To achieve this goal, I will first review the evidence for discrimination of vocal expression of emotions, because this is a necessary step for emotional contagion to occur. I will then discuss potential proximate mechanisms underlying vocal contagion of emotions, propose criteria for assessing this phenomenon, and review the existing evidence. I will apply the two-dimensional framework (valence and arousal) to vocal contagion of emotions as follows: contagion of valence occurs if a vocalization indicating a positive state triggers a change in valence from negative to positive, or from neutral to positive, in a receiver (and vice versa for negative vocalizations). Moreover, the emotional arousal (low or high) indicated by this vocalization could also be transmitted and modify the receiver's arousal accordingly (contagion of arousal). On the emitter's side, this process might be passive or active (the emitter does not, or does, actively aim to affect the receiver's emotion, respectively) [11]. On the receiver's side, the emotion triggered by this process could be consciously experienced or not (i.e. accompanied by a subjective component or not) [14]. Since the question of how much consciousness is involved in this process is beyond the scope of this review, I will not discuss it here.
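The two-dimensional definition above can be written down as a minimal sketch. The numeric coding of valence (-1, 0, +1), the arbitrary arousal scale, and the names `State`, `valence_contagion` and `arousal_contagion` are illustrative assumptions, not part of any published framework. In particular, reading "modify the receiver's arousal accordingly" as "a high-arousal call raises arousal, a low-arousal call lowers it" is one possible interpretation.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Receiver state on the two dimensions (hypothetical coding)."""
    valence: int    # -1 = negative, 0 = neutral, +1 = positive
    arousal: float  # arbitrary scale; higher means more aroused


def valence_contagion(emitter_valence: int, before: State, after: State) -> bool:
    """True if the receiver's valence changed to match a non-neutral emitter valence.

    Covers negative -> positive and neutral -> positive for positive calls,
    and the mirror cases for negative calls.
    """
    if emitter_valence == 0:
        return False  # a neutral call carries no valence to transmit
    return before.valence != emitter_valence and after.valence == emitter_valence


def arousal_contagion(emitter_high_arousal: bool, before: State, after: State) -> bool:
    """True if the receiver's arousal moved in the direction indicated by the call.

    Assumption: high-arousal calls should raise arousal, low-arousal calls lower it.
    """
    change = after.arousal - before.arousal
    return change > 0 if emitter_high_arousal else change < 0
```

For example, `valence_contagion(1, State(0, 0.2), State(1, 0.2))` evaluates to `True` (neutral to positive after a positive call), whereas a receiver that was already in a positive state before the playback would not count as a case of contagion under this coding.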

2. Evidence for discrimination of vocal expression of emotions

In order to assess the potential of vocalizations to lead to emotional contagion, we first have to ensure that the animals have the ability to discriminate and, therefore, potentially perceive vocal expression of emotions. Indeed, in order for contagion of emotional valence to occur, animals should be able to discriminate between vocalizations produced while the emitter is experiencing positive and negative emotions of similar arousal. Similarly, in order for contagion of emotional arousal to occur, conspecifics should be able to discriminate between vocalizations reflecting various levels of arousal and similar valence. In this section, I will review the evidence for discrimination of vocal expression of valence and arousal. Since the evidence concerning the arousal dimension of emotions is stronger than that concerning valence, I will describe findings related to arousal first, followed by those related to valence.

(a) Discrimination of emotional arousal

The most direct method to investigate discrimination of vocal expression of emotional arousal is to play back vocalizations associated with different arousal levels, and to test if the behavioural responses of animals exposed to these various sound treatments differ. Using this method, Fischer et al. [21] showed that chacma baboons (Papio cynocephalus ursinus) looked at the loudspeaker for longer when typical alarm barks produced in response to dangerous predators (indicating higher arousal) were played back than when intermediate alarm barks and intermediate contact barks (intermediate forms between typical contact and alarm barks), as well as typical contact barks (all indicating lower arousal), were played back. Similarly, Slocombe et al. [22] demonstrated that wild chimpanzees (Pan troglodytes schweinfurthii) looked at the speaker for longer when the screams of victims produced in response to severe aggression (indicating higher arousal) were broadcast, compared to screams produced in response to mild aggression and tantrum screams emitted during social frustration (indicating lower arousal). Thus, such a method constitutes a good tool for testing discrimination of vocalizations associated with various levels of arousal. Moreover, where stronger responses (e.g. faster and longer orienting response towards the loudspeaker or longer movement duration) are observed when higher-arousal rather than lower-arousal calls are broadcast, as in the two above-mentioned studies [21,22], this might suggest that emotional contagion occurred (see the Contagion of emotional arousal section for more details).

In cases where the responses triggered by the various sound treatments used in the above-mentioned method do not differ, two interpretations can be made. Either the animals are not able to discriminate sounds associated with various arousal levels, or they have this ability but do not respond differently to the sound treatments, because they are not motivated to do so in the context of the playback. A useful alternative method that increases the chances of revealing perceptual abilities, even in cases where the responses to the broadcast sounds are qualitatively and quantitatively similar, is the habituation-recovery paradigm (e.g. [23]). This paradigm consists of a habituation phase, during which a set of vocalizations from a given type, or a set of vocalizations from a given variant of the same type, is played back until habituation occurs. Vocalization types are defined as biologically meaningful sound classes, which differ by their acoustic structure (ideally defined by a classification analysis; e.g. unsupervised cluster analysis [24] or fuzzy clustering [25]) and often also by their function or context of production (e.g. cat meows and purrs). Conversely, variants of vocalization types constitute acoustically graded intermediates of a given type (e.g. cat meows associated with different levels of arousal) [26]. Once the animal is habituated to the first vocalization type (e.g. a set of cat meows) or variant (e.g. a set of cat meows associated with low arousal), as revealed by a decrease in physiological and/or behavioural response to the habituation sounds, another type or variant associated with a different arousal level is played back (e.g. a set of cat purrs or of cat meows indicating higher arousal, respectively). If the animals discriminate between the two sound treatments used in the habituation and dishabituation phases, and if the relevant information provided by these two sounds differs, they should resume responding to the playback during the dishabituation phase.
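The logic of the habituation-recovery paradigm can be illustrated with a minimal sketch operating on per-trial response scores (e.g. looking time in seconds). The habituation criterion used here (mean response over the last two trials at or below half the mean of the first two trials), the recovery rule, and the function names `habituation_trial` and `response_recovered` are assumptions chosen for illustration; actual studies define their own criteria.

```python
from typing import Optional, Sequence


def habituation_trial(responses: Sequence[float],
                      criterion: float = 0.5,
                      window: int = 2) -> Optional[int]:
    """Return the index of the trial at which habituation is reached, or None.

    Assumed criterion: the mean response over the most recent `window` trials
    is at or below `criterion` times the mean of the first `window` trials.
    """
    if len(responses) < 2 * window:
        return None  # too few trials to compare the two windows
    baseline = sum(responses[:window]) / window
    for end in range(2 * window, len(responses) + 1):
        recent = sum(responses[end - window:end]) / window
        if recent <= criterion * baseline:
            return end - 1
    return None


def response_recovered(habituation: Sequence[float],
                       dishabituation_response: float,
                       window: int = 2) -> bool:
    """True if the response to the first dishabituation call exceeds the mean
    response over the last `window` habituation trials (assumed recovery rule)."""
    last = sum(habituation[-window:]) / window
    return dishabituation_response > last


# Hypothetical looking times (s) across five habituation playbacks
looking_times = [8.0, 7.0, 5.0, 3.0, 2.0]
print(habituation_trial(looking_times))        # index of the habituated trial
print(response_recovered(looking_times, 6.0))  # recovery to a new variant?
```

As the text notes, a negative result from such a check would not demonstrate an absence of discrimination, since subjects may simply lack the motivation to respond.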

Using the habituation-recovery paradigm, Schehka & Zimmermann [27] revealed that treeshrews (Tupaia belangeri) resumed responding to chatter calls (calls produced in high-intensity disturbance contexts) of higher arousal after habituating to this same call type produced in lower-arousal situations, and tended to do the same for higher- and lower-arousal scream calls (calls produced in the context of immediate physical danger). Fischer et al. [21] found that chacma baboons differentiated between typical contact barks (indicating lower arousal) and typical alarm barks (indicating higher arousal), but not between typical contact barks and intermediate alarm barks. Finally, Kastein et al. [28] demonstrated that, although bats (Megaderma lyra) did not show any evidence for discrimination of aggressive calls produced during lower- versus higher-arousal agonistic interactions, they differentiated between response calls (calls in response to aggression calls) associated with lower versus higher arousal. Moreover, the bats resumed responding during the dishabituation phase only if the response calls used in the habituation phase were of lower arousal than the dishabituation calls, and not vice versa. Overall, these studies suggest that the habituation-recovery paradigm is a good method to test if animals can discriminate vocal expression of emotional arousal. Nevertheless, this paradigm might, similar to the direct-playback method described above, sometimes fail to reveal abilities to discriminate between calls indicating different arousal levels. This might occur when subjects lack the motivation to respond to the dishabituation calls [21]. 
Therefore, when using this paradigm, the order in which higher- and lower-arousal calls are presented in the subsequent phases should be alternated between playbacks (high-arousal calls in the habituation phase followed by low-arousal calls in the dishabituation phase, or vice versa), in order to increase the chances of revealing any existing discrimination abilities.

(b) Discrimination of emotional valence

Evidence for discrimination of vocal expression of emotional valence is sparse. My colleagues and I have tested the ability of horses (Equus caballus) to differentiate between positive and negative whinnies by directly playing back these two whinny variants separately (without habituation) [29]. This allowed us to investigate, simultaneously, discrimination and contagion of valence (see the Contagion of emotional valence section for the results regarding contagion). We recorded physiological (e.g. heart rate and skin temperature) and behavioural (e.g. locomotion and head position) responses to whinnies of both familiar (same farm) and unfamiliar (different farm) horses produced in negative (social separation from group members) and positive (social reunion with group members) contexts, which had been recorded and validated during a previous study [8]. We found that physiological and behavioural responses to playbacks of separation and reunion whinnies differed when these calls were produced by familiar horses. However, this was not the case for whinnies of unfamiliar horses, suggesting that familiarity with the emitter plays a crucial role in discrimination of vocal expression of emotions, as predicted by models of empathy [10].

To my knowledge, the habituation-recovery paradigm has not been used yet to test for discrimination of vocal indicators of valence. This would consist of habituating subjects to positive vocalizations, followed by a dishabituation phase where negative vocalizations of the same arousal level are broadcast, or vice versa. A response recovery during the dishabituation phase would provide evidence that the subjects perceived the difference in valence conveyed by the vocalizations. This method might be a useful tool for studying discrimination of vocal expression of valence.

To summarize, several studies have shown, by directly comparing the responses of animals to various vocalization types or variants associated with different emotions, or by using the habituation-recovery paradigm, that non-human animals have the potential to discriminate vocal expression of emotional arousal and valence. An alternative method that might be promising to test for discrimination of vocal expression of emotions is the use of head-orienting response biases to study lateralized attention to acoustic stimuli (e.g. [30]), which informs about hemispheric asymmetries [31]. It is important to note that ideally, in order to investigate discrimination of emotional valence and arousal separately, only one dimension should vary at a time; animals should be exposed to sounds varying in valence but not arousal, or vice versa. In the next section, I will describe the mechanisms through which vocalizations can have an impact on receivers' affective states.

3. Mechanisms underlying vocal contagion of emotions

Although the mechanisms underlying vocal contagion of emotions in non-human animals have rarely been studied experimentally, several mechanisms through which vocalizations could affect surrounding conspecifics, and hence potentially lead to emotional contagion, have been suggested [20,32]. Such mechanisms have been notably well described by Owren & Rendall [33], to support their affect-conditioning model of non-human primate vocal signalling. This model, which is based on learning theory concepts, suggests that vocalizations can have a direct (unconditioned response) or indirect (learned response) affective impact on receivers. Unconditioned effects can occur without previous exposure to a vocalization, simply as a result of the activation of the autonomic nervous system or hypothalamo–pituitary–adrenocortical axis by some specific acoustic features. The acoustic startle reflex (ASR) is a well-documented and widespread direct influence of vocalizations on receivers. This reflex prepares animals for a ‘fight-or-flight’ stress response when loud (greater than 80 dB) and abrupt (steep amplitude rise time) sounds are heard, without substantive cortical mediation. It is present in young animals immediately after the onset of hearing, and thus does not require previous experience. Its magnitude, however, can later be increased (e.g. through sensitization and fear-potentiation) or decreased (e.g. through habituation or attenuation by positive affect). Overall, the ASR causes physiological changes, interruption of activities, and a shift in attention towards the sound source [34].

Many types of animal sounds display some acoustic features (e.g. fast amplitude rising time, energy pulses, upward frequency sweeps, rapid amplitude modulations and spectral noise) that can attract the attention of conspecifics and directly affect their arousal (‘attention-and-arousal-inducing' sounds) [20]. For example, infant distress vocalizations are produced by offspring when in danger (isolated or captured by a human or predator) in order to attract caregivers, and have a similar structure across species; they are continuous tonal sounds with a rich harmonic structure, which often have a simple pattern of frequency modulation (chevron, flat, or descending pattern) [35]. This structural convergence is robust enough for these calls to trigger responses from taxonomically and ecologically distant species, as long as the fundamental frequency (lowest frequency of the sound; ‘F0') falls within the species-specific range [36]. In the same way as infant distress vocalizations, alarm calls show structural similarities across species [37]. In most birds, they tend to be high pitched (high F0) and pure tone, which makes them difficult for predators to locate. Conversely, in other birds (e.g. Australian birds [38]) and mammals, alarm calls cover a range of frequencies [37].

Besides direct effects, vocalizations can influence receivers' affective states indirectly through learning or conditioning [20]. Such indirect effects are likely to take place when a given vocalization type or variant is tightly coupled with an external event. The affect-conditioning model [33] differentiates between ‘affective learning’ and ‘learned affect'. Affective learning might occur when vocalizations that trigger affective responses directly (e.g. attention-and-arousal-inducing sounds) are produced simultaneously with salient events [20]. As a result, the affective experience and heightened attention caused by vocalizations could facilitate further learning about the event. A good example of this phenomenon is the ontogeny of alarm call responses. Although these vocalizations trigger generalized startle responses from a young age, appropriate differentiated escape responses to each type of predator only develop later [39]. Conversely, learned affect could occur through conditioning, when a given vocalization type or variant is associated with an emotionally inducing stimulus. Following this process, the vocalization would constitute a conditioned stimulus that could elicit an emotion (conditioned response) independently of the presence of the unconditioned stimulus. For example, if agonistic vocalizations are regularly emitted by dominant individuals before aggressive interactions, negative emotions triggered in subordinate individuals by such interactions could then be elicited by agonistic vocalizations alone [20]. The same process could occur with affiliative interactions and positive emotions, and has been proposed to play a role in contagion of positive emotions through laughter in humans [40]. Moreover, learned affect could also occur through ‘autoconditioning', where an individual learns to associate its own vocalizations with an emotional situation, and further generalizes this conditioning to similar conspecific sounds [41].

The neural mechanisms of empathic processes in non-human animals are poorly known [15]. According to Panksepp [42], all mammalian brains possess at least seven emotional systems that contribute to the construction of basic emotions, and which rely on deep subcortical brain structures. Panksepp & Panksepp [13] proposed that, similarly as in humans [43], contagion of emotions might activate these same brain structures involved in the first-hand experience of basic emotions (primary processes). Secondary processes, largely supported by the basal ganglia, might then mediate emotional learning and memory (e.g. conditioning), without requiring any level of consciousness [13]. The neuronal mechanisms underlying the acoustic startle reflex have been extensively studied, and findings revealed that it is induced by a short pathway connecting the auditory nerve to brainstem regions controlling arousal and activation [34]. The evidence from studies on rats suggests that during presentation of positive calls (50 kHz), neuronal activity is increased in regions responsible for behavioural indicators of positive emotions (approach behaviour; behavioural activation (secondary motor cortex) and motivated behaviour (nucleus accumbens)). Conversely, during presentation of negative calls (22 kHz), activity is increased in regions responsible for fear and anxiety [44]. In humans, exposure to vocal emotional expression modulates a complex neural network including the amygdala nuclei and the basal ganglia, which are responsible for emotional responses [45]. Moreover, activation of motor regions associated with the production of facial expressions was revealed during exposure to emotional vocalizations, and particularly to those reflecting positive-valence and high-arousal emotions [46]. 
The neurobiological processes underlying vocal contagion of an emotion are, therefore, likely to consist of the activation (directly or following learning), by the acoustic features of the vocalization, of the brain regions responsible for generating this particular emotion and the associated behavioural responses. Such ‘mirroring’ of emotions could occur through pre-wired audio-visual mirror neurons, which have been shown in non-human primates to discharge both when animals perform an action and when they hear the sound related to this action [47]. Alternatively, an induced emotion could activate some neurons from a population of equivalent neurons controlling that particular emotion [15].

To summarize, vocalizations can potentially affect receivers' emotions through direct and indirect impacts of sound features. Emotional vocalizations might, however, often contain features that can have both direct and learned effects on receivers' emotional states. Moreover, it is likely that acoustic cues to emotions and their effects on receivers have evolved in parallel, as suggested by the similarity between acoustic parameters that increase with the emotional arousal of emitters and those that are known to have a direct effect on receivers (e.g. high amplitude, high F0 and spectral noise [16,20]). This parallel evolution could have occurred if matched emotional responses to acoustic features providing important information about the environment were selected by evolution. Alternatively, the acoustic cues most effective in producing matched emotional responses in contexts where coordination between individuals is important for the survival of the emitter or related individuals might have been selected by evolution as vocal expression of emotions. I suggest that cases where the emotion induced in the receiver by any of the above-mentioned direct and indirect effects, or their combination, matches the emotion of the emitter of the vocalization could constitute evidence for emotional contagion. The contagion strength is likely to depend on the context in which the receiver hears the sound and on its current affective state, as well as, similarly to other forms of empathy, on both social factors (e.g. past experiences, familiarity, phenotypic similarity [10]) and physiological factors (e.g. oxytocin [48], glucocorticoid [49]). In the next section, I will review the existing evidence for vocal contagion of emotions.

4. Evidence for vocal contagion of emotions

(a) Criteria for assessing vocal contagion of emotions

I will propose here several criteria that could help strengthen the evidence for vocal contagion of emotions. First, clear evidence for this phenomenon should include an assessment of the emotional state of both the emitter and receiver (e.g. physiological, behavioural or cognitive indicators [4]). Ideally, this assessment should demonstrate that the vocalizations triggered a change in the receiver's emotional state, towards the state that was experienced by the emitter during sound production. Second, it should be clear that the change in emotion observed in the receiver is due to the vocalizations that it was exposed to, and not to other external events [14]. This can be done using playback experiments instead of natural observations, in an emotionally neutral context (a context that does not elicit an emotional reaction). Third, stronger evidence for emotional contagion could come from studies that test if two or more conspecific sounds trigger matched emotional states in receivers and emitters, in addition or not to non-biological control sounds. Otherwise, if responses to only one conspecific sound are compared to a non-biological or heterospecific sound, it is unknown if the reaction to the conspecific sound is a result of emotional contagion or a typical response to a species-specific vocalization. Fourth, although higher emotional arousal often results in the production of vocalizations at higher amplitudes [16], broadcasting loud sounds could result in stronger responses because such sounds are easier to detect, independently of whether emotional contagion occurred or not. Therefore, it is important to control for this confounding factor by, for example, normalizing the amplitude of the different sound treatments used and equalizing their sound pressure levels after broadcasting [23]. 
Alternatively, if the sound treatments are purposely broadcast at different amplitudes, a non-biological sound treatment could be played, as a control, at the loudest amplitude used. Non-biological sounds, or artificially modified vocalizations, in which relevant acoustic parameters are modified or absent, can also be used to test which parameters are responsible for emotional contagion (e.g. [50]). Finally, in the same way as for discrimination of emotions, in order to investigate contagion of emotional valence and arousal separately, only one dimension should be tested at a time; animals should be played back sounds varying in valence but not arousal, or vice versa. However, this is not an easy task, and such clear evidence for vocal contagion of emotions is rare. In the rest of this section, I will review the existing evidence for vocal contagion of emotions that fulfils most of these criteria.
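The amplitude control described in the fourth criterion can be sketched as a root-mean-square (RMS) equalization of the digital sound files before playback. This is only one possible way to normalize amplitude, given here as an illustration. The function names `rms` and `equalize_rms`, the target level and the sample values are hypothetical. Equalizing the files does not replace measuring the sound pressure level actually reaching the subject (e.g. with a sound level meter), which depends on the loudspeaker and the environment.

```python
import math
from typing import Dict, List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a mono signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def equalize_rms(treatments: Dict[str, Sequence[float]],
                 target_rms: float = 0.1) -> Dict[str, List[float]]:
    """Scale every sound treatment so that all share the same RMS amplitude.

    Note: large gains can push samples beyond the valid range of the audio
    format; this sketch does not check for clipping.
    """
    equalized = {}
    for name, samples in treatments.items():
        level = rms(samples)
        if level == 0:
            raise ValueError(f"treatment {name!r} is silent and cannot be scaled")
        gain = target_rms / level
        equalized[name] = [s * gain for s in samples]
    return equalized


# Hypothetical low- and high-arousal calls recorded at different amplitudes
calls = {
    "low_arousal": [0.1, -0.1, 0.1, -0.1],
    "high_arousal": [0.5, -0.5, 0.5, -0.5],
}
matched = equalize_rms(calls, target_rms=0.2)
```

After this step, any remaining difference in receivers' responses to the two treatments is less likely to be explained by loudness alone, although other acoustic parameters that covary with arousal remain uncontrolled.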

(b) Evidence for vocal contagion of emotions

A detailed literature search (May–June 2017) revealed a few papers that specifically aimed to test emotional contagion through vocalizations in non-human animals. However, other studies that were not aimed at testing emotional contagion provide good evidence for this phenomenon as well (e.g. studies of urgency-based alarm calls). The majority of studies that I will describe in this section include a behavioural and/or physiological assessment of the receiver's emotional state upon hearing emotional vocalizations, and some knowledge of the emitter's emotional state during vocal production, or of the context of production. They also include a comparison between receivers' responses to two or more conspecific sounds, which differed in valence or arousal. In the same way as for discrimination, because the evidence concerning the arousal dimension of emotions is stronger than that concerning valence, I will describe findings related to arousal first, followed by those related to valence.

(i) Contagion of emotional arousal

Strong evidence for contagion of emotional arousal emerges from studies on urgency-based alarm calls (alarm calls that vary as a function of the urgency level, independently of the predator type; e.g. Sciuridae [51]). In mammals and birds with urgency-based alarm call systems, playbacks have been conducted with the aim of testing how conspecifics react to calls produced under different levels of urgency. These studies generally showed that alarm calls produced in higher-urgency situations (e.g. in the presence of more dangerous predators) trigger stronger or faster reactions in receivers compared to lower-urgency situations (e.g. the presence of non-dangerous animals; e.g. [52,53]). Furthermore, other studies revealed that responses to alarm calls that have been artificially modified to mimic higher-urgency levels (the parameters indicating urgency have been increased) are stronger [51,54]. These findings clearly suggest that conspecifics discriminate between alarm calls encoding various levels of negative arousal and that arousal is transmitted to these individuals. The urgency content of alarm calls might even be transmitted to heterospecifics in certain cases (e.g. [55]).

Contagion of emotional arousal through other types of vocalizations than alarm calls has been investigated using various methods. In primates, Fichtel & Hammerschmidt [54] tested if emotional contagion occurs between squirrel monkeys (Saimiri sciureus) using mobbing calls in which parameters had been artificially increased or decreased, mimicking higher or lower arousal in emitters, respectively. Subjects showed a longer or shorter orienting response towards the loudspeaker after calls with increased or decreased frequencies were broadcast, respectively, compared to the corresponding unmanipulated calls. Similar results were obtained when the amplitude of the calls was manipulated [54]. In wild chimpanzees, Dezecache et al. [56] investigated the use of infrared thermography to measure skin temperature changes in individuals exposed to naturally occurring conspecific vocalizations. Their results showed a significant decrease in temperature in the nasal area when aversive vocalizations (particularly barks) occurred compared to the period preceding the vocalization, while neutral vocalizations induced a significant increase in temperature in the ear region. Although the mechanism behind the observed increase in ear temperature is not clear, nasal temperature is known to decrease with arousal (e.g. [57]). Therefore, these results suggest that aversive vocalizations triggered high-arousal levels in receivers, which likely results from emotional contagion.

Concerning non-primate species, Perez et al. [58] provided one of the clearest pieces of evidence for emotional contagion, using a physiological indicator of emotional arousal, corticosterone, in zebra finches (Taeniopygia guttata). In this study, the corticosterone concentrations of zebra finch females increased when hearing distance calls emitted by their pair mate following oral administration of exogenous corticosterone, compared to regular distance calls. Calls from unfamiliar males, however, did not have such an effect. In dogs, Quervel-Chaumette et al. [59] compared the responses of receivers to distress vocalizations (whines) of both familiar and unfamiliar conspecifics, and used artificial sounds that were acoustically similar to dog whines as control (short harmonic sounds with average F0). Their study revealed that dogs showed more stress-related behaviour during playbacks of whines compared to control sounds. Moreover, dogs showed comfort-offering behaviour toward a familiar partner, particularly following the familiar whine treatment.

Further examples come from playback studies that broadcast vocalizations indicative of lower versus higher emotional arousal and observed receiver responses that could have resulted from emotional contagion (stronger responses to higher-arousal vocalizations). These include, among others, African elephant (Loxodonta africana) control rumbles (indicating lower arousal) versus rumbles in response to bees (indicating higher arousal) [60], and male green treefrog (Hyla cinerea) advertisement calls (indicating lower arousal) versus aggressive calls (indicating higher arousal) [61]. Overall, the evidence suggests that vocalizations can lead to contagion of negative arousal (contagion of arousal within negative situations, e.g. urgency, alarm, aversion and aggression). Further studies are needed to investigate if vocal contagion of arousal also occurs within positive contexts.

(ii) Contagion of emotional valence

Vocal contagion of emotional valence has only been tested in a few species, including rats (reviewed in [62,63]), dogs [64], kea parrots (Nestor notabilis [65]) and horses [29]. The most detailed investigation of contagion of valence has been conducted in rats (e.g. [62,63]). Adult rats produce two major types of ultrasound vocalizations (USVs), at 22 and 50 kHz. Ethological, pharmacological and brain stimulation studies have provided strong evidence demonstrating that these two types of USVs reflect the emitter's valence; 50 kHz USVs are mostly produced in appetitive situations, including reward anticipation, social play and tickling, while 22 kHz USVs are emitted in aversive situations, such as anticipation of punishment and social defeat [66]. Playback studies have shown that rats display signs of positive emotions (e.g. approach behaviour) when played 50 kHz USVs, and signs of negative emotions (e.g. freezing and avoidance behaviour) when played 22 kHz USVs. As a result, 50 kHz USVs have been suggested to function as affiliative and social-cooperating vocalizations, and be a primal form of laughter, while 22 kHz USVs could constitute warning or alarm vocalizations [62,63]. Further studies showed that, after hearing 50 kHz USVs, rats judged ambiguous cues as more similar to learned cues predicting a positive outcome (positive judgement bias), while they judged the same ambiguous cues as more similar to negative learned cues after hearing 22 kHz USVs (negative judgement bias) [67]. Playbacks of 22 kHz USVs also enhanced the acoustic startle reflex, confirming that these vocalizations induce anxiety-related negative affective states in receivers [68]. Therefore, rat USVs play an important function in contagion of emotional valence.

In dogs, Huber et al. [64] tested responses to positive and negative vocalizations of unfamiliar conspecifics (play barks and isolation whines), unfamiliar humans (non-speech sounds; laughing and crying), and non-emotional stimuli (abiotic and neutral, ‘non-emotional’ heterospecific sounds). Dogs displayed more behaviours characteristic of negative arousal during playbacks of emotional compared to neutral sounds. They also approached their owner more after playbacks of positive than negative human sounds. Finally, dogs showed more behaviours characteristic of negative arousal when hearing negative compared to positive vocalizations, independently of the species. These results suggested that vocal contagion of emotional valence occurs both between dogs, and from humans to dogs [64].

One issue arising from the above-mentioned studies on rats and dogs is that, in order to test for contagion of valence, different vocalization types associated with positive and negative valence were broadcast (e.g. human laughing versus crying). As a result, the effect of the valence and the vocalization type on receivers' responses cannot be disentangled. Although the extensive evidence on rat vocal contagion of emotions clearly shows that different vocalization types can induce emotions of matched valence in receivers [62], I can see two issues potentially arising with the use of different vocalization types. First, distinct vocalization types are often associated with distinct functions (e.g. attracting conspecifics, signalling an upcoming aggression or danger [69]). Therefore, they can trigger different behaviours in receivers (e.g. attract or repel) without necessarily inducing different underlying emotions, and it might be difficult to differentiate between these behavioural responses and emotional responses arising from emotional contagion. This ambiguity between function and emotion is apparent in Schwing et al. [65], who showed that kea play calls induce more play behaviour than control vocalizations (other kea calls, heterospecific and abiotic sounds), suggesting that contagion of positive emotions occurred. However, because the authors did not report an increase in behavioural indicators of positive emotions other than play itself during play call playbacks, it is difficult to know if their results provide evidence for behavioural contagion (the spread of behaviour from one individual to the next, which might be unrelated to underlying emotions [70]), or for emotional contagion (the spread of emotions from one individual to the next). Such behavioural contagion through vocalization has been previously termed the ‘neighbour effect’, and has also been shown to occur notably in common marmosets (Callithrix jacchus [71]) and chimpanzees [72] following playbacks of affiliative and agonistic vocalizations.

The second issue that can arise from the use of different vocalization types to study vocal contagion of emotions is as follows: emotions of different valence could be induced in receivers by two vocalization types because of the different functions of these vocalizations and the associated meaning that animals extract from them, rather than because of the information on the emitter's emotional state that their acoustic structure encodes. For example, a food call could induce a positive emotion in receivers because of the meaning animals extract from it (the presence of food), independently of the emotional state of the emitter and the vocal expression of its internal state. This would be similar to the distinction between a positive emotion induced in humans by the meaning of a sentence (e.g. ‘there is food here’; speech information) versus the voice parameters (prosody) of a happy person (affective information [73]). Therefore, if the aim of an experiment is to test if emotions can be transmitted between individuals through acoustic cues independently of the function of vocalizations, different variants of the same vocalization type should ideally be used. In humans, both approaches have been used and have shown that contagion can occur through prosodic cues [74], as well as through exposure to specific types of non-verbal expression of emotions (e.g. laughter [40]).

Differentiating between the effect of vocalization-type function and emotional valence is a difficult task, because animals often produce distinct vocalization types in contexts of opposite valence. This problem is less obvious when studying contagion of emotional arousal, because changes in the emitter's arousal more often result in changes in the acoustic structure of a given vocalization type (e.g. increase in duration or F0 [16]) than in changes in vocalization type [16,26]. As a result, most studies on contagion of arousal used arousal-specific graded variants of one given vocalization type in their playback experiment (see the Contagion of emotional arousal section). When studying contagion of valence, ‘multi-context’ or ‘functionally flexible’ vocalizations [75], which are produced in both positive and negative contexts, could be used. Such sounds seem to be produced by a wide range of species; examples include goat (Capra hircus) bleats, produced notably during anticipation of food (positive), food frustration (negative) and social isolation (negative) [6], and African elephant (Loxodonta africana) rumbles, emitted during both affiliative (positive) and dominance (negative) interactions [76]. Other examples comprise vocalizations emitted in both play (positive) and aggression interactions or alarm situations (negative), such as dog growls [77], and dog and pig barks [78,79]. This is what we attempted in horses, by playing back, as described in the Discrimination of emotional valence section, whinnies produced during social reunion (positive) and separation (negative). However, we did not find clear evidence for state matching between emitters and receivers, because horses did not display more behaviours indicating negative emotions (head high [8]) during playbacks of negative whinnies, nor more behaviours suggesting positive emotions (chewing motion [8]) during playbacks of positive whinnies [29]. Similar tests conducted in other species would thus be useful to show if contagion of valence can indeed occur within a given vocalization type and thus result from the information about emotional valence conveyed in these calls, rather than from their function.

To summarize, the evidence for vocal contagion of emotional arousal is stronger than for emotional valence, considering the number of studies published on this topic and the fact that studies investigating contagion of valence are often weakened by a confounding effect of vocalization type. Moreover, most of the studies on vocal contagion of valence did not control for the confounding effect of arousal. It is, therefore, not known if some of the above-mentioned results could in fact be explained by contagion of arousal instead of valence.

5. Conclusion

In this review, I showed that non-human animals have, similarly to humans [80], the ability to discriminate vocal expression of emotions. Moreover, vocalizations have the potential to influence the affective states of receivers through direct (e.g. acoustic startle reflex) or indirect effects (e.g. affective learning and learned affect [33]), which could result in state matching. The evidence described in this paper suggests that in many cases, from zebra finches to dogs, vocalizations do play a role in emotional contagion. Vocalizations could also have an important function in triggering appropriate responses from caretakers (e.g. [81,82]), and there is some evidence suggesting that they might even facilitate higher, cognitive empathic processes (e.g. close-proximity calls of Asian elephants (Elephas maximus) for consolation [83]). Therefore, vocalizations are an important channel to focus on when investigating emotional contagion and its evolution. Further studies using playback experiments in controlled environments, including knowledge of the emotional state of both the emitter during vocal production and the receiver upon hearing emotional vocalizations, would be very valuable to strengthen the evidence on vocal contagion of emotions. In particular, playback experiments using several emotion-specific variants of the same vocalization type, instead of different vocalization types, would reveal the effect of vocal expression of emotions on receivers, independently of the effect of vocalization types and their associated function (or context/meaning). Finally, playback experiments that are aimed at investigating contagion of emotional arousal should ideally compare responses to various sounds indicating different levels of arousal but the same valence (e.g. mild versus strong urgency). Similarly, experiments that are aimed at studying contagion of emotional valence should compare responses to various sounds indicating opposite valence but similar arousal (e.g. food anticipation as positive versus food frustration as negative). Since the acoustic channel is the main channel of communication in humans (speech), the study of vocal contagion of emotions across species should be encouraged in order to decipher the evolution of empathic processes.

Data accessibility

This article has no additional data.

Competing interests

I declare I have no competing interests.

Funding

I thank the Swiss National Science Foundation for funding the project on vocal contagion of emotions mentioned in this review (project no. PZ00P3_148200).

Acknowledgements

I thank Thierry Aubin and Nicolas Mathevon for initiating the writing of this review, and Roi Mandel as well as two anonymous reviewers for providing very useful comments on this manuscript.

Footnotes