Pet-directed speech is strikingly similar to infant-directed speech, a peculiar speaking pattern with higher pitch and slower tempo known to engage infants' attention and promote language learning. Here, we report the first investigation of potential factors modulating the use of dog-directed speech, as well as its immediate impact on dogs' behaviour. We recorded adult participants speaking in front of pictures of puppies, adult dogs and old dogs, and analysed the acoustic quality of their speech. We then performed playback experiments to assess dogs' reactions to dog-directed speech compared with normal speech. We found that human speakers used dog-directed speech with dogs of all ages and that the acoustic structure of dog-directed speech was mostly independent of dog age, except for pitch, which was relatively higher when speakers addressed puppies. Playback experiments showed that, in the absence of non-auditory cues, puppies were highly reactive to dog-directed speech and that pitch was a key factor modulating their behaviour, suggesting that this specific speech register has a functional value in young dogs. Conversely, older dogs did not react differentially to dog-directed speech compared with normal speech. The fact that speakers continue to use dog-directed speech with older dogs therefore suggests that this speech pattern may mainly be a spontaneous attempt to facilitate interactions with non-verbal listeners.

1. Introduction

When talking to their babies, human adults use a special speech register characterized by higher and more variable pitch, slower tempo and clearer articulation of vowels than in speech addressed to adults [1–3]. This ‘infant-directed speech’ helps engage and maintain babies' attention and facilitates their social interactions with caregivers: infants as young as seven weeks old show a preference for infant-directed speech over adult-directed speech [4]. Accordingly, infant-directed speech has been shown to increase cerebral activity more than adult-directed speech [5], suggesting that infants are more engaged in what is being said to them when they listen to this special speech register. Infant-directed speech has also been hypothesized to facilitate language learning [6] by supporting the construction of phonetic and vowel categories [7,8], the clearer production of consonants [3] and the acquisition of new words [9]. This role in language learning is consistent with the decrease in the use and acoustic specificity of infant-directed speech that follows the development of language skills during the child's first year [10–12]. At a proximate level, these dynamic changes could be explained by modifications of the baby's reactions to speech. As the baby grows up, he/she becomes more reactive to caregivers' solicitations and responds more specifically to meaningful sentences [13]. Promoting interaction thus becomes easier, which in turn reduces the use of infant-directed speech. Another proximate explanation for the use of infant-directed speech could be that the morphological features of younger babies (a large head and a small nose and mouth, i.e. the ‘baby schema’ described by Konrad Lorenz [14,15]) elicit infant-directed speech as part of caretaking behaviour. As these juvenile features become less prominent, they are expected to elicit less infant-directed speech.
Thus, infant-directed speech appears to function as a communication signal that has evolved to accompany the cognitive development of babies and that may depend on proximate mechanisms that are both static (the ‘baby schema’) and dynamic (babies' attention response).

Dogs have been in close relationships with humans for thousands of years, and this intimate proximity is reflected in many aspects of mutual understanding and empathy [16–21]. More than 80% of pet owners refer to themselves as ‘pet-parents’ [22], and adult women show similar brain activation patterns when presented with pictures of their dog and of their own children [23]. Many dogs react to human vocal or gestural signals, and even to human emotions [20,24]. Although dogs clearly do not possess language, humans do change their speech patterns when talking to dogs, using what is known as pet-directed speech, which shares structural properties with infant-directed speech (e.g. a high-pitched register and slower tempo [25,26]).

Despite widespread interest in understanding the nature of the human–dog relationship, the proximate and ultimate factors that promote the use of pet-directed speech by human speakers remain unknown. The striking parallel between pet-directed speech and infant-directed speech may have different origins. Pet-directed speech may constitute a spontaneous response of human speakers to juvenile characteristics shared by the young of many vertebrates (the ‘baby schema’ hypothesis), or it may represent speakers' attempt to engage in interaction with a non-verbal being (the ‘learning’ hypothesis). The ‘baby schema’ hypothesis predicts that humans should restrict the use of pet-directed speech to young puppies. By contrast, the ‘learning’ hypothesis predicts that speakers should continue to use dog-directed speech with adult dogs, as dogs never acquire language. Furthermore, the functional value of pet-directed speech remains unknown as, to our knowledge, the assumption that dogs respond more to pet-directed speech than to normal speech has not yet been tested.

The aim of this study was thus to investigate whether the age of the dog receiver modulates the use and the properties of pet-directed speech. We then assessed the functional value of pet-directed speech by testing whether it engages dogs' attention better than speech directed to human adults. To achieve this, we first recorded human speakers speaking in front of pictures of dogs and analysed their vocal features. Second, we performed playback experiments on puppies and adult dogs to test their reaction to pet-directed speech versus speech directed to human adults.

2. Material and methods

(a) Human speech recording and analysis

We selected 90 images of dogs' faces from the Internet: 30 dogs classified as ‘puppies’ (less than 1 year old), 30 classified as ‘adults’ (1–8 years old) and 30 classified as ‘old’ (more than 8 years old), from a variety of breeds (the dogs' ages and breeds were checked independently by two veterinarians; electronic supplementary material, table S1). Each human speaker (n = 30 women, aged 17–55 years) was then recorded (Zoom H4n digital recorder; sampling frequency = 44 100 Hz) speaking in front of three of these pictures: one of a puppy, one of an adult dog and one of an old dog (the pictures were presented on a smartpad). Each participant saw a different set of three pictures. The images were presented successively, in an order counterbalanced across participants (10 women were presented with the puppy first, 10 with the adult dog first and 10 with the old dog first). We also recorded each participant's voice in a control situation without any dog picture, in which the speaker was asked to speak to the researcher performing the recordings. This speech sequence was considered human-directed speech. The control was recorded before the presentation of the dog pictures for 15 participants and after it for the other 15. During each recording, the participant repeated the same sentence, which was displayed on the smartpad screen together with the dog's picture or on its own (control condition): ‘Hi! Hello cutie! Who's a good boy? Come here! Good boy! Yes! Come here sweetie pie! What a good boy!’. For each participant, we thus obtained four speech sequences with identical verbal content: ‘puppy-directed’, ‘adult dog-directed’, ‘old dog-directed’ and ‘adult human-directed’ (control). This procedure ensured that each speaker produced exactly the same speech sequence in each recording condition.
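The counterbalancing described above can be made concrete with a short schedule generator. This is a minimal Python sketch under stated assumptions, not the authors' actual assignment: the text specifies only that each picture type came first for 10 participants and that the control came first for 15 participants; the fixed rotation of the remaining two pictures and the crossing of the two factors (5 control-first participants per first-picture group) are our assumptions, and `presentation_schedule` is a hypothetical helper.

```python
from collections import Counter

PICTURES = ["puppy", "adult dog", "old dog"]


def presentation_schedule(n_participants=30):
    """Hypothetical counterbalanced schedule (not the authors' actual assignment).

    The first picture cycles puppy / adult dog / old dog across participants; within
    each first-picture group, the human-directed control alternates between being
    recorded before and after the pictures (an assumption of this sketch).
    """
    schedule = []
    for i in range(n_participants):
        first = i % 3   # which picture type is shown first
        block = i // 3  # rank of this participant within its first-picture group
        # Assumption: the remaining two pictures follow in a fixed rotation.
        pictures = PICTURES[first:] + PICTURES[:first]
        schedule.append({
            "participant": i + 1,
            "pictures": pictures,
            "control_first": block % 2 == 0,
        })
    return schedule


if __name__ == "__main__":
    plan = presentation_schedule()
    # Each picture type is shown first to 10 of the 30 participants.
    print(Counter(p["pictures"][0] for p in plan))
    # The control is recorded first for 15 participants.
    print(sum(p["control_first"] for p in plan))
```

Alternating the control within each first-picture group keeps the two counterbalancing factors independent, so neither picture order nor control position is confounded with the other.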
Although recording the participants during an interaction with a real dog might have increased the ecological validity of our observations, the dynamic nature of the interaction would have inevitably led to variability in the uttered sentences, rendering the comparison between the acoustic features much more challenging.

Next, we performed acoustic analyses using PRAAT [27] and measured the following parameters (see electronic supplementary material, Methods): %voiced (percentage of the signal that is characterized by a detectable pitch), duration (total duration of the recording), mean F0, max F0 and min F0 (respectively the mean, maximum and minimum fundamental frequency), F0CV (coefficient of variation of F0), inflex25 (minor intonation events), inflex2 (major intonation events), intCV (variability of the speech sequence's intensity), harm (harmonicity), jitter, shimmer and the first five formant frequencies of the speech sequence (F1, F2, F3, F4, F5).
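Several of the pitch-based parameters listed above are simple summaries of a frame-wise F0 track. The sketch below computes %voiced, mean, maximum and minimum F0 and F0CV from such a track, with unvoiced frames marked as `None`. It assumes F0CV is the standard deviation divided by the mean (in percent), which may differ from the exact definition in the supplementary Methods, and it does not reproduce PRAAT's pitch extraction itself. The `hnr_db` helper follows our reading of the PRAAT manual's definition of harmonicity, under which 99% periodic energy and 1% noise gives 10·log10(99/1), about 20 dB. All function names are hypothetical.

```python
import math
import statistics


def pitch_summary(f0_track):
    """Summarise a frame-wise F0 track (Hz) in which unvoiced frames are None.

    Assumptions of this sketch: %voiced = share of frames with a defined F0;
    F0CV = standard deviation / mean of voiced frames, in percent.
    """
    voiced = [f for f in f0_track if f is not None]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    mean_f0 = statistics.fmean(voiced)
    return {
        "%voiced": 100 * len(voiced) / len(f0_track),
        "mean F0": mean_f0,
        "max F0": max(voiced),
        "min F0": min(voiced),
        "F0CV": 100 * statistics.stdev(voiced) / mean_f0,
    }


def hnr_db(periodic_fraction):
    """Harmonics-to-noise ratio (dB) from the fraction of energy in the periodic part."""
    if not 0 < periodic_fraction < 1:
        raise ValueError("periodic_fraction must lie strictly between 0 and 1")
    return 10 * math.log10(periodic_fraction / (1 - periodic_fraction))


if __name__ == "__main__":
    # Hypothetical five-frame track: three voiced frames, two unvoiced.
    print(pitch_summary([200.0, 220.0, None, 240.0, None]))
    print(round(hnr_db(0.99), 1))  # about 20 dB
```

Real PRAAT pitch tracks contain hundreds of frames per second; the short track here only illustrates the arithmetic.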

(b) Playback experiments to dogs

We performed playbacks to domestic dogs (Canis familiaris) to test (i) whether puppy-directed speech is more effective than human-directed speech in engaging a dog's attention, and whether this effectiveness varies with the dog's age, and (ii) whether puppy-directed speech is more effective than adult dog-directed speech. The experiments were performed at the Bideawee animal shelter in Manhattan, NY (USA), between December 2015 and March 2016. The experimenter (T.B.-A.) was volunteering at the shelter at the time of the study and spent several days a week with the participant dogs. All the tested dogs had a positive relationship with her prior to the tests. The experiments were conducted in a dedicated, spacious room (3 × 4 m). All the tested dogs appeared comfortable in the testing situation (e.g. they mainly spent their time exploring the room and did not display behaviours indicative of distress or suggesting that they wanted to leave the room).

In the first experiment, each dog (n = 20: 10 puppies aged two to five months and 10 adult dogs aged 13–48 months, from the Bideawee shelter; see electronic supplementary material, table S2 for details) was tested during two successive playback sessions with (i) an approximately 30 s sequence of puppy-directed speech and (ii) an approximately 30 s sequence of human-directed speech (control). These two sequences came from our recording data bank (see §2a) and consisted of three successive renditions of the sentence ‘Hi! Hello cutie! Who's a good boy? Come here! Good boy! Yes! Come here sweetie pie! What a good boy!’. For a given dog, both playback sequences were recorded from the same human speaker, but each dog was tested with a different speaker. The two playback trials were separated by 1–2 min of silence: the second playback started once the dog had stopped displaying interest towards the loudspeaker for at least 1 min. Five puppies and five adult dogs heard the puppy-directed speech first, while the other individuals heard the human-directed speech (control) first.
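The inter-trial rule above (start the second playback once the dog has stopped displaying interest for at least 1 min) can be expressed as a small function. This is a hypothetical sketch: `second_playback_start` and the idea of observer-coded interest timestamps are illustrative assumptions, not part of the published protocol, and the quiet period is assumed to start no earlier than the end of the first playback.

```python
def second_playback_start(interest_times, quiet_period=60.0):
    """Earliest time (s after the first playback ends) at which the dog has shown no
    interest for `quiet_period` seconds.

    interest_times: hypothetical observer-coded timestamps (s) of interest events,
    measured from the end of the first playback.
    """
    last = 0.0  # the quiet period cannot start before the first playback ends
    for t in sorted(interest_times):
        if t - last > quiet_period:
            break  # a long enough quiet gap occurred before this event
        last = max(last, t)
    return last + quiet_period


if __name__ == "__main__":
    # Interest at 5, 20 and 50 s, then a long pause: start at 50 + 60 = 110 s.
    print(second_playback_start([5, 20, 50, 140]))
```

Because the function returns the end of the first qualifying quiet window, the resulting inter-trial interval is never shorter than `quiet_period`, matching the lower bound of the 1–2 min range reported above.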

Because adult dogs from an animal shelter may have an unknown history of negative interactions with humans, we performed an additional set of trials on a sample of adult dogs kept as family pets and with no history of re-homing (see electronic supplementary material, table S2 for details). These dogs were tested using the same experimental set-up as the shelter dogs (similar room design and size, 3.5 × 4 m; same playback apparatus and protocol), at the ENES Laboratory, Saint-Etienne (France), in September–October 2016. To ensure familiarity with the local language, we used the following French script: ‘Alors le chien! Comment ça va le doudou? C'est qui le bon chien? Viens ici mon chien! Ah il est gentil le chien. Ça c'est un gentil chien!’ (roughly, ‘So, doggy! How are you, sweetie? Who's the good dog? Come here, my dog! Ah, he's a nice dog. That's a nice dog!’), recorded from 10 native French-speaking female participants using exactly the same protocol and material as with the US participants.

In the second experiment, each dog (n = 10 puppies, aged three to eight months, different individuals from those tested in the first experiment, see electronic supplementary material, table S2 for details) was tested during two successive playback sessions with: (i) an approximately 30 s sequence of puppy-directed speech and (ii) an approximately 30 s sequence of adult dog-directed speech. These two sequences were derived from our recording data bank and were different for each tested dog. The two playback sessions were separated by 1–2 min of silence. Five individuals heard the puppy-directed speech first while the other five individuals heard the adult dog-directed speech sequence first.

The experimental signals were played back through a Bose SoundLink Mini Bluetooth speaker II. This high-quality loudspeaker reproduces the human voice faithfully (see electronic supplementary material, figure S1 for a comparison between the original and played-back signals). The loudspeaker was positioned on the ground near a corner, facing the centre of the room. The experimenter remained motionless in the corner opposite the loudspeaker, not facing the dog, to avoid conscious or unconscious cueing. A video camera recorded the tested dog's reaction to the playback. The dog's response was assessed using 11 behavioural measurements (see electronic supplementary material, Methods). Instead of analysing the dependent behavioural measures separately, we performed a principal component analysis and retained a single composite score (PC1), separately for each of the two experiments [28] (electronic supplementary material, Methods).
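The composite score can be illustrated with a minimal principal component analysis. The sketch below standardises each behavioural measure, extracts the leading eigenvector of the correlation matrix by power iteration and projects each trial onto it. It assumes a correlation-based PCA (the text does not say whether the covariance or correlation matrix was used) and orients PC1 so that a chosen anchor variable, for example time spent looking at the loudspeaker, loads positively, since the sign of a principal component is otherwise arbitrary. `pc1_scores` and the example measurements are hypothetical.

```python
import math
import statistics


def pc1_scores(rows, anchor=0, n_iter=500):
    """First-principal-component scores of standardised behavioural measures.

    rows: one list of measurements per playback trial, same variables in each row.
    anchor: index of a variable assumed to increase with reactivity; PC1 is flipped
    so that this variable loads positively (the sign of a PC is otherwise arbitrary).
    Returns (scores, loadings), using power iteration on the correlation matrix.
    """
    n, p = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # a constant column would divide by zero
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in rows]
    corr = [[sum(zi[a] * zi[b] for zi in z) / (n - 1) for b in range(p)] for a in range(p)]

    v = [1.0 + 0.1 * j for j in range(p)]  # asymmetric start vector
    for _ in range(n_iter):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("power iteration collapsed to zero; try another start vector")
        v = [x / norm for x in w]

    if v[anchor] < 0:
        v = [-x for x in v]
    scores = [sum(zi[j] * v[j] for j in range(p)) for zi in z]
    return scores, v


if __name__ == "__main__":
    # Hypothetical trials: [looking time (s), approach time (s), latency (s)].
    trials = [[2.0, 1.0, 9.0], [5.0, 3.0, 6.0], [9.0, 6.0, 2.0], [4.0, 2.5, 7.0]]
    scores, loadings = pc1_scores(trials, anchor=0)
    print([round(s, 2) for s in scores], [round(l, 2) for l in loadings])
```

In this toy example a latency-type measure loads negatively on PC1, so a quicker reaction raises the composite score, consistent with higher PC1 values meaning a stronger reaction.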

3. Results

(a) Human speakers use dog-directed speech with dogs of all ages

The analysis of recordings showed that dog-directed speech differs from control speech in both its spectral and temporal dimensions: 11 of the 17 measured acoustic features were significantly affected by recording condition (electronic supplementary material, table S3). Specifically, dog-directed speech was higher-pitched, with more pitch variation over time. The periodic quality of the signal was also affected: harmonicity (the ratio of harmonics to noise in the signal) was higher in dog-directed speech sequences (figure 1; electronic supplementary material, sound S1). Although human speakers modified their speech in front of dogs of all ages, post hoc comparisons between recording conditions showed that the distinctive pitch of pet-directed speech was enhanced when speaking to puppies (electronic supplementary material, table S3): in this condition, speakers raised their mean pitch by 21% on average relative to normal speech, versus average increases of 11% and 13% when they spoke to adult and old dogs, respectively.

Figure 1. Influence of recording condition on speech quality. X-axis = recording conditions (speech directed to a human adult, a puppy, an adult dog and an old dog, respectively). Y-axis = mean pitch of the recorded speech sequence. Each dot represents a single recording of the same speech sequence; recordings come from different human adult speakers (each speaker was recorded in each of the four recording conditions; see main text for a description of the recorded speech sequence). Dot size is proportional to the degree of acoustic periodicity (ratio of harmonics to noise in the signal) of the recorded speech sequence. Violin plots show the distribution's density, and dots are jittered horizontally for better visualization.
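A percentage increase in mean pitch can be computed in two ways, and the text does not say which was used: averaging each speaker's own percentage change, or comparing the condition means pooled over speakers. The sketch below, with hypothetical helper names and hypothetical values, shows both; the two can differ noticeably when speakers differ in baseline pitch.

```python
import statistics


def per_speaker_increase(dog_f0, human_f0):
    """Mean of each speaker's percentage change in mean F0 (dog- vs human-directed)."""
    if len(dog_f0) != len(human_f0) or not human_f0:
        raise ValueError("need one value per speaker in each condition")
    changes = [100 * (d - h) / h for d, h in zip(dog_f0, human_f0)]
    return statistics.fmean(changes)


def ratio_of_means_increase(dog_f0, human_f0):
    """Percentage change between the condition means, pooled over speakers."""
    return 100 * (statistics.fmean(dog_f0) / statistics.fmean(human_f0) - 1)


if __name__ == "__main__":
    human = [100.0, 300.0]  # hypothetical baseline mean F0 (Hz) of two speakers
    dog = [150.0, 300.0]    # hypothetical dog-directed mean F0 (Hz)
    print(per_speaker_increase(dog, human), ratio_of_means_increase(dog, human))
```

With these hypothetical values, the per-speaker average is 25% while the ratio of means gives 12.5%, because the speaker with the lower baseline carries the whole increase.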

(b) Only puppies are highly responsive to dog-directed speech

Results of the first series of playback experiments showed that speech quality, dog age, playback order and the interaction between speech quality and dog age were significant predictors of dogs' response to speech sequences (table 1 and figure 2). Specifically, nine of the 10 tested puppies responded more to puppy-directed speech than to human-directed speech, reacting more quickly, looking more often at the loudspeaker, and approaching it more closely and for longer (Tukey post hoc test on PC1 behavioural score: Z = 3.34, p = 0.0009, N = 10; see electronic supplementary material, table S4 for loadings of behavioural variables on PC scores). Moreover, results of the second series of playback experiments showed that puppies did not respond significantly more to puppy-directed than to adult dog-directed speech (GLM: χ2 = 0.44, d.f. = 1, p = 0.509), suggesting that both types of dog-directed speech have similar stimulating effects.

Figure 2. Dogs' behavioural reaction to playback of speech sequences. X-axis = dogs' age in months (logarithmic scale); Y-axis = dogs' behavioural reaction (principal component score PC1 calculated from 11 different behaviours; higher values indicate a stronger reaction to the playback signal). Each dot represents the result of one playback test. Each dog was tested with two different speech qualities (red squares: reaction to puppy-directed speech; blue dots: reaction to human-directed speech). Solid lines = loess regression curves (degree of smoothing = 1; degree of polynomial = 1); grey shaded areas = confidence intervals.

Table 1. Effect of speech quality (human-directed versus puppy-directed), dogs' age and order of playback on dogs' behavioural reaction to speech sequences. Significant p-values are marked with an asterisk (*).

term                    estimate   s.e.    χ2     d.f.   p-value
speech quality          −1.198     1.517   15.68  4      0.0035*
dog's age               −1.860     1.938   29.79  4      <0.0001*
playback order          −2.357     0.711   18.96  4      0.0008*
speech × age             4.189     2.606   12.22  2      0.0022*
speech × order           0.700     0.967    0.62  2      0.733
order × age              1.621     1.226    2.50  2      0.287
speech × order × age    −0.976     1.660    0.38  1      0.536
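The χ2, d.f. and p-value columns of table 1 can be cross-checked against each other. The pattern of degrees of freedom (4 for main effects, 2 for two-way interactions, 1 for the three-way interaction) is consistent with each term being tested together with all higher-order interactions that contain it; this is our reading, not something the text states. The sketch below computes the chi-square upper-tail probability using only closed forms available for d.f. = 1 and even d.f.; `chi2_sf` is a hypothetical helper, and it reproduces the reported p-values to within the rounding of the published χ2 statistics.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Closed forms only: df = 1 (via erfc) or any even df (finite Poisson sum).
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(x / 2))
    if df > 0 and df % 2 == 0:
        # P(chi2_2m > x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
        half = x / 2
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("only df = 1 or even df are handled in this sketch")


if __name__ == "__main__":
    # (term, chi2, d.f.) as reported in table 1
    for term, chi2, df in [("speech quality", 15.68, 4), ("speech × age", 12.22, 2),
                           ("speech × order × age", 0.38, 1)]:
        print(term, round(chi2_sf(chi2, df), 4))
```

For odd d.f. above 1 (none appear in table 1), a general incomplete-gamma routine would be needed instead of these closed forms.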

In the first series of playback experiments, adult dogs responded less strongly to dog-directed speech sequences than puppies did (Tukey post hoc test: Z = 6.45, p < 0.001, N = 20 adult dogs and 10 puppies). Moreover, the behavioural response of adult dogs did not differ significantly between the two speech types: 11 of the 20 individuals responded more to the dog-directed speech and the other nine responded more to the human-directed speech (Tukey post hoc test on PC1 behavioural score: Z = −0.37, p = 0.708, N = 20). The origin (shelter or family) of the tested dogs did not significantly influence their behavioural responses (χ2 = 0.45, d.f. = 1, p = 0.500; GLM with dependent variable = adult dog's behavioural reaction, fixed factors = speech quality, playback order and dog origin, random effect = dog identity).

(c) Speech pitch is an important factor driving puppy behavioural response

As shown by the acoustic analyses above, human- and dog-directed speech differed in several acoustic features. Assessing the impact of each of these features on dogs' behavioural reaction to playback revealed a strong interaction between the effect of the mean pitch of the speech sequence and the effect of dog age (analysis restricted to dogs tested with English sentences: LME on PC1 scores of the first series of playback experiments, with playback order and the interaction between pitch and dog's age as fixed effects and dog identity as random effect: χ2 = 10.4, d.f. = 1, p = 0.0012; figure 3; see also electronic supplementary material, table S5 for interaction effects between other acoustic features and dog's age). Puppies' reactions were strongly influenced by the average pitch of the playback speech sequence: there was a highly significant effect of this acoustic feature on the level of behavioural reaction (LME on PC1 score of puppies with mean pitch and playback order as fixed effects and dog identity as a random factor: χ2 = 11.0, d.f. = 1, p < 0.001; figure 3). Conversely, the behavioural reaction of adult dogs to the playback was not significantly influenced by the pitch of the speech sequence (χ2 = 0.64, d.f. = 1, p = 0.422; figure 3).

Figure 3. Influence of speech pitch on dogs' behavioural reaction to playback. X-axis = mean pitch of the played-back sequence; Y-axis = dogs' behavioural reaction represented as a principal component score PC1 (higher values indicate a stronger reaction to the playback signal). Green triangles: reactions of puppies (aged two to five months); brown lozenges: reactions of adult dogs (aged 13–48 months). Solid lines = linear fits; grey shaded areas = confidence intervals.

Two additional acoustic features significantly correlated with puppies' reaction to playback, albeit to a lesser extent than pitch: the percentage of the signal that is characterized by a detectable pitch (%voiced) and the harmonicity (harm) (electronic supplementary material, table S6 and figure S2).

4. Discussion

By showing that human speakers employ dog-directed speech to communicate with dogs of all ages, this study suggests that this particular speech register is used to engage interaction with a non-speaking listener, rather than just a juvenile one. Yet dog-directed speech appeared to be modulated as expected under the ‘baby schema’ hypothesis [14,15], as specific acoustic traits were further exaggerated when speaking to a puppy. At the receiver end, our playback experiments constitute the first demonstration that dog-directed speech functions to engage the attention of puppies, which are particularly sensitive to acoustic parameters such as a higher mean pitch and a higher level of harmonicity. This speech pattern thus constitutes a functional signal promoting human–puppy interaction. Conversely, adult dogs showed no significant preference for dog-directed speech, suggesting that this register loses its functional value in adult dogs.

The analysis of the acoustic structure of recorded sentences reveals differences between dog-directed and normal speech. In line with previous studies [26], we found that dog-directed speech is characterized by a higher pitch and a higher degree of harmonicity than normal speech. The fact that the visual presentation of dogs of all ages led human speakers to modify their speech pattern is consistent with the hypothesis that dog-directed speech functions to facilitate interaction with an animal expected to be more sensitive to the prosodic than to the verbal content of speech. Whereas caregivers progressively stop using infant-directed speech when infants start demonstrating an understanding of syntax and words as they acquire language [29], human speakers continue using dog-directed speech with adult dogs, which do not acquire language. Pet-directed speech is thus in accordance with the ‘hyperspeech’ hypothesis, which states that speakers use speech patterns optimized for intelligibility [30]. In the case of dogs, this strategy may efficiently promote word learning, an ability well demonstrated in dogs [31].

The comparison of the acoustic structure of puppy-directed, adult dog-directed and old dog-directed speech recordings reveals that the age of the dog does weakly modulate the speech pattern: human speakers raised the pitch of their voice more when speaking to puppies than when speaking to adult and old dogs. The morphological cues typical of puppies (the ‘baby schema’) may thus act as a reinforcing releaser. This effect of the ‘baby schema’ could be tested further by assessing whether people also change their speech pattern depending on the neotenic level of adult dogs, which varies among breeds [32].

As shown by the playback experiments, puppies reacted strongly to dog-directed speech, demonstrating the functional value of this speech pattern. Whether this interspecific sensitivity is innate or acquired through learning remains an open question. It is well established that acoustic signals coding for emotional states share similar acoustic features across mammalian species [33]: although interspecific communication may suffer from limitations [34–36], emotion-dependent similarities may derive from shared, ancestral production constraints or reflect convergent evolution in response to common selection pressures [37]. Dogs and wolves emit high-pitched tonal vocalizations in greeting contexts, both between adults and between pups, and as a solicitation for food or care [38], and it is likely that puppies are innately receptive to any high-pitched signal with pronounced harmonicity. It is also likely that this innate preference for pet-directed speech has been promoted by artificial selection: when choosing a pet within a litter, people usually prefer puppies that show higher levels of responsiveness to human solicitation [39]. Yet this innate receptivity may also be reinforced by learning. The puppies we tested had significant experience with humans and were used to interacting positively with people who used dog-directed speech, and dogs are known to readily associate prosodic cues of human speech with specific contexts [40,41].

The absence of preferential reactivity to dog-directed speech in adult dogs was rather unexpected, as our production experiments suggest that adult and old dogs are also exposed to humans using this speech pattern. This observation could be linked to an overall reduced propensity in adult dogs to respond to playful human signals. Specifically, in the absence of other communication cues (e.g. gestural signals), adult dogs could habituate rapidly to speech utterances from unknown persons, and thus quickly ignore their vocal solicitations. Adult dogs are indeed known to react preferentially to their owner rather than to unfamiliar persons, although this depends on the context [42]. While puppies may react to any unknown speaker using pet-directed speech, older dogs may need additional cues to respond in unfamiliar contexts. Alternatively, this observation may suggest that pet-directed speech exploits perceptual biases that are present in puppies but not in adult dogs.

A potential limitation of our study arises from the fact that, in order to standardize the content of the dog-directed speech utterances (see Material and methods), we asked participants to read a script in front of pictures, which may have limited the expression of some features specific to dog-directed speech. Any such effect was, however, likely limited, as we found clear differences between dog- and human-directed speech, both in their acoustic properties and in the behavioural reactions they trigger in dogs. To address this potential limitation, future investigations could use stimuli recorded in a more realistic, interactive set-up, with participants speaking to real dogs instead of pictures.

In conclusion, while pet-directed speech appears to have some functional value in the context of human–puppy interaction, human speakers also use this speech register when speaking to older dogs, despite their lack of specific reactivity to it. This observation is consistent with the hypothesis that pet-directed speech is also a spontaneous attempt to get the attention of non-verbal listeners, rather than just juvenile ones. Dogs share many aspects of their ‘social competence’ with humans [43], which causes dogs to appear ‘infant-like’ or ‘human-like’. This study suggests that dogs may appear to humans as mostly non-verbal companions, and that humans consequently modify their speech features as they do when speaking to young infants. Similar speaking strategies seem to be employed in other contexts where the speaker feels, consciously or unconsciously, that the listener may not fully master language or may have difficulty understanding speech, such as during interactions with elderly people [44] or when speaking to a non-native listener [45].

Ethics

All procedures described in this manuscript were conducted in accordance with appropriate USA and French national guidelines, permits and regulations, and the guidelines for the treatment of animals in behavioural research and teaching of the Association for the Study of Animal Behaviour (ASAB). Ethical approval for the playback experiments on dogs was granted by the Hunter College IACUC, which waived the review requirement for this project. Experiments in France were done under Approval no. C42-218-0901 (ENES lab agreement, Direction Départementale de la Protection des Populations, Préfecture du Rhône).

Data accessibility

Additional methods, two figures, six tables and one sound recording are included as the electronic supplementary material.

Authors' contributions

T.B.-A., D.R. and N.M. designed the study; T.B.-A. and M.G.-A. performed the recordings and the playback experiments. All authors agree to be held accountable for the work performed.

Competing interests

The authors have no competing interests.

Funding

This study was supported by Hunter College, City University of New York, and the University of Lyon/Saint-Etienne.

Acknowledgements

We would like to thank all participants, the Bideawee animal shelter, the owners of the dogs, Nicolas Boyer and Solveig Mouterde. N.M. was an invited professor at Hunter College during part of this research project and is grateful to Chris Braun and Mark Hauber for their kind support.

Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3660728.