The production and discrimination of emotional signals are highly significant components of social living in mammals, allowing for the efficient transmission of social intentions and the sharing of environmental information1,2. Emotion and arousal can be encoded through various acoustic features during vocal production, including the fundamental frequency and its harmonics (which determine pitch), the formant frequencies (which determine timbre), and the amplitude (perceived as loudness), thus providing a complex and multifaceted signal1,3. Vocalisations can also encode the signaller’s age, gender, and identity4, and so provide receivers with a wide range of information. Considering the importance of vocalisations in promoting effective communication, species with frequent human contact may benefit from attending to the social and emotional information within human vocalisations, and from adjusting their social interactions with humans accordingly.
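The acoustic features listed above can be quantified directly from recordings. As a minimal sketch, assuming the librosa library and a hypothetical WAV file of a vocalisation, the following estimates the fundamental frequency (pitch) and frame-wise RMS amplitude (a loudness proxy); formant estimation would require a dedicated tool such as Praat.

```python
import librosa
import numpy as np

# Minimal sketch: estimate fundamental frequency (F0) and RMS amplitude
# from a recording. "vocalisation.wav" is a hypothetical file name.
y, sr = librosa.load("vocalisation.wav", sr=None)

# Probabilistic YIN pitch tracking; the 65-500 Hz search range is an
# assumption roughly covering adult human nonverbal vocalisations.
f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=65, fmax=500, sr=sr)

rms = librosa.feature.rms(y=y)[0]  # frame-wise amplitude (loudness proxy)

print(f"mean F0 of voiced frames: {np.nanmean(f0):.1f} Hz")
print(f"mean RMS amplitude: {rms.mean():.4f}")
```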

The emotional cues contained within vocalisations have the potential to follow similar acoustic rules across human and nonhuman species (the motivational-structural rules hypothesis5; sound symbolism6). Harsh, low-frequency sounds are typically used in threatening contexts, whilst higher, relatively pure-tone frequencies tend to be used in appeasement or affiliative contexts5,7. It is suggested that these variations in acoustic structure may also be used ritualistically to mimic differences in body size and therefore alter the perceived level of threat posed by the signaller1,8. Lower fundamental frequencies can generate the impression of a larger body size5, as can lower vocal tract resonances (formants), which suggest a longer vocal tract6. Moreover, emotional states can directly alter the sound produced in the larynx through changes in the rate of respiration and in the tension of the vocal folds1. The facial expression associated with the affective state can also influence the sound, through its effect on mouth shape and consequent filtering1,3,9,10. Such fundamental similarities in the form of affective vocalisations across species may facilitate interspecific communication of emotion.
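The inverse relationship between formant frequencies and vocal tract length follows from elementary tube acoustics. As a standard simplification (not drawn from the cited sources), the vocal tract can be modelled as a uniform tube, closed at the glottis and open at the lips, whose resonances are:

```latex
% Quarter-wavelength resonator: resonances of a uniform tube closed at
% one end (glottis) and open at the other (lips).
F_n = \frac{(2n - 1)\,c}{4L}, \qquad n = 1, 2, 3, \dots
% With c \approx 350 m/s (speed of sound in warm, humid air) and
% L \approx 0.17 m for an adult male human,
% F_1 \approx 350 / (4 \times 0.17) \approx 515 Hz;
% lengthening the tract lowers all formants proportionally.
```

Under this model, doubling the tract length L halves every formant, which is why lowered formants are a physically grounded cue to larger body size.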

For domestic animals it would be particularly advantageous to discriminate between positive and negative affect in humans. Numerous studies have demonstrated that domestic dogs can discriminate the emotional content of human voices in a range of contexts. In a cross-modal emotion perception paradigm, dogs associated positive and negative human emotional vocalisations with the corresponding facial expressions11 (but see12). In addition, dogs are more likely to avoid contexts involving a scolding human voice than dehumanised vocalisations or control conditions, regardless of the signaller’s gender13, and to obey pointing commands more successfully when these are issued in a high-pitched, friendly voice than in a low-pitched, imperative voice14. Furthermore, fMRI research reveals different patterns of neural activity in dogs hearing high-pitched praise versus neutral voices15. However, very few studies have investigated such abilities in other domestic species, and recent empirical evidence suggests that horses do not differentiate between a harsh and a soothing voice when being trained to cross a novel bridge16. The authors suggest that the horses may not have attended to the voices because the release of pressure on the halter, used as an additional signal in the experimental paradigm, was potentially a more salient training cue. New paradigms are therefore needed to fully explore horses’ abilities to discern emotionally relevant cues in human vocalisations.

Despite the lack of evidence to date, horses are good candidates for the ability to discriminate between vocally expressed emotions in humans. Horses are sensitive to cues of affective state in conspecific vocalisations17 (see also18) and may therefore be predisposed to attend to emotional cues embedded in vocalisations generally. They have also been shown to discriminate socially relevant cues in human voices, such as voice identity characteristics during individual recognition19. Moreover, horses can distinguish human emotional states through other modalities, such as facial expression20, and are sensitive to changes in human anxiety levels21. As humans use their voices extensively during direct interaction with horses in riding, training, and groundwork, it is likely that horses would also benefit from discriminating between different emotions expressed in human voices, as this would allow them to better predict the consequences of their interactions with humans.

In this study we used playback of auditory stimuli to investigate whether horses respond differently to positive and negative emotions displayed in human vocalisations. We presented horses with male or female human nonverbal vocalisations characterised as either happy (laughter) or angry (growling). Each horse was presented with one positive and one negative vocalisation from either a male or a female human, in tests separated by at least one week. We predicted that there would be more negative responses towards negative vocalisations (more vigilance and freeze behaviour, avoidance, displacement behaviours, and left ear/right hemisphere biases) and more positive responses towards positive vocalisations (more approach behaviour and right ear/left hemisphere biases). In addition, we predicted that horses would respond more negatively towards male stimuli than towards female stimuli, due to the relatively lower pitch and formant frequencies characteristic of male voices10.
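To make the between-subjects design concrete, here is a minimal sketch of one way to generate such a counterbalanced schedule. The horse identifiers, the fixed seed, and the function name are all hypothetical; the study does not specify how the assignment was actually performed.

```python
import itertools
import random

# Hypothetical counterbalancing sketch: each horse hears either male or
# female stimuli (between subjects) and one positive plus one negative
# vocalisation in a counterbalanced order, as described above.
def build_schedule(horse_ids, seed=0):
    rng = random.Random(seed)      # fixed seed only for reproducibility
    shuffled = list(horse_ids)
    rng.shuffle(shuffled)
    # 2 voice sexes x 2 trial orders = 4 cells, filled evenly.
    cells = list(itertools.product(
        ("male", "female"),
        (("laughter", "growl"), ("growl", "laughter")),
    ))
    return [
        {"horse": horse,
         "voice_sex": cells[i % 4][0],
         "trial_1": cells[i % 4][1][0],   # emotion played in trial 1
         "trial_2": cells[i % 4][1][1]}   # emotion played in trial 2
        for i, horse in enumerate(shuffled)
    ]

schedule = build_schedule([f"horse_{i:02d}" for i in range(1, 33)])
for row in schedule[:4]:
    print(row)
```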

Thirty-two horses took part in two trials each, one presenting a negative and one a positive human vocalisation. Each horse received either male or female stimuli but not both. Trials were separated by at least one week (M = 18.57 days, SD = 8.26, max = 29 days). Emotions and stimuli were counterbalanced equally between horses and across trials. Stimuli were played through a MIPRO MA707 battery-powered speaker connected to a MacBook Pro; both were placed 7 m outside a fenced riding arena and concealed within wooded vegetation. Horses were held parallel to the speaker, 8 m from the fence (a total of 15 m from the speaker), at a line marked with a familiar jump pole (Fig. 1). During trials the horse was initially held for 2 min in the test position (perpendicular to the jump pole and directly facing the hidden speaker) to become accustomed to the experimental setup. Following this lag period the stimulus was played once and then repeated after 10 s of silence. After the stimulus presentation the horse was held in the test position for a final 2 min. See Method for full details.
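The trial timeline described above is simple enough to express as a small driver script. This is an illustrative sketch only, assuming a play_stimulus callable that blocks until playback finishes; it is not the software used in the study.

```python
import time

# Trial timing from the procedure above (values in seconds).
HABITUATION_S = 120  # initial 2 min hold in the test position
GAP_S = 10           # silence between the two stimulus presentations
POST_S = 120         # final 2 min hold after playback

def run_trial(play_stimulus):
    """Run one playback trial; play_stimulus is a hypothetical callable
    that plays the vocalisation and returns when playback ends."""
    time.sleep(HABITUATION_S)   # horse settles facing the hidden speaker
    play_stimulus()             # first presentation
    time.sleep(GAP_S)
    play_stimulus()             # single repeat after 10 s of silence
    time.sleep(POST_S)          # post-stimulus observation period
```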