Experiment 2 was designed to examine whether content alone or prosody alone was sufficient to drive the preference found in experiment 1. The content from experiment 1 was reproduced with reversed prosody, such that the dog-related content was spoken with adult-directed prosody and vice versa. For simplicity, in all cases, DDS refers to stimuli with dog-directed prosody (with either dog- or adult-related content) and ADS refers to stimuli with adult-directed prosody (with either adult- or dog-related content). In experiment 2, we therefore presented dogs with content-mismatched DDS (dog-directed prosody with adult-related content) and content-mismatched ADS (adult-directed prosody with dog-related content).

Methods

Study site and participants

In experiment 2, 32 dogs from Redhouse Boarding Kennels in York took part (16 females and 16 males; mean age 6 years ± 3.75). Data collection for this experiment was conducted 2 years after the first experiment (2016).

Stimuli

For experiment 2, uncompressed WAV files were recorded from two new female experimenters (aged 20 and 21). The experimenters repeated the transcripts from experiment 1 with the opposing prosody to produce content-mismatched DDS and ADS. All stimuli were still directed to an appropriate live audience (e.g. the adult script was produced with dog prosody to a live dog, an Irish setter) and processed as described in experiment 1.

For the stimuli used in experiment 2, some dog content was repeated in ADS, and some adult content was removed in DDS, to account for differences in word rate between naturalistic DDS and ADS. These alterations are indicated in the Supplementary material. The amplitude of the speech segments was again equalised, and tracks were built as in experiment 1 (see Fig. 1).

Acoustic analysis of stimuli

To ensure the prosody of the content-mismatched DDS and ADS for experiment 2 was convincing, we compared the acoustic properties of these stimuli with those of the stimuli used in experiment 1. Mean, minimum and maximum pitch (F0) were measured (Table 3) in PRAAT (version 6.0.05). Pitch settings were 75–1200 Hz. Continuous segments of speech with a continuous visible pitch line were selected, and the mean, minimum and maximum pitch in each segment were extracted using the ‘get pitch’ function. Pitch modulation was calculated as maxF0 − minF0. Word rate was calculated as the number of words divided by the duration from the start of the first word to the end of the last word in a stimulus.
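The two derived measures defined above can be written as a short calculation. The sketch below is illustrative only: the function names, the example phrase and the example values are hypothetical and are not taken from the stimuli, and counting words by splitting on whitespace is an assumption rather than the procedure reported here.

```python
def pitch_modulation(max_f0_hz: float, min_f0_hz: float) -> float:
    """Pitch modulation of one speech segment, defined as maxF0 - minF0 (Hz)."""
    if max_f0_hz < min_f0_hz:
        raise ValueError("maximum F0 must not be lower than minimum F0")
    return max_f0_hz - min_f0_hz


def word_rate(transcript: str, first_word_onset_s: float, last_word_offset_s: float) -> float:
    """Words per second, from the start of the first word to the end of the last word."""
    duration_s = last_word_offset_s - first_word_onset_s
    if duration_s <= 0:
        raise ValueError("the last word must end after the first word starts")
    # Assumption: words are separated by whitespace.
    n_words = len(transcript.split())
    return n_words / duration_s


# Hypothetical segment: maximum F0 of 420 Hz, minimum F0 of 180 Hz.
example_modulation = pitch_modulation(420.0, 180.0)

# Hypothetical six-word phrase spoken between 0.5 s and 2.5 s.
example_rate = word_rate("shall we go for a walk", 0.5, 2.5)
```

Measuring duration from the onset of the first word to the offset of the last word, as the text specifies, keeps leading and trailing silence in a track from deflating the word rate.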

Table 3 Acoustic measurements of the different types of speech produced by each experimenter

Generalised linear mixed models (GLMMs) were used to assess the effect of prosody (dog-directed/adult-directed), content (dog/adult) and content–prosody matching (matched (experiment 1)/mismatched (experiment 2)) on the acoustic measurements of stimuli in experiments 1 and 2. These factors were entered as fixed factors in models with (1) mean pitch and (2) pitch modulation as dependent variables (DVs). To ensure we were comparing the pitch-related measures of the same words or phrases, only measurements of continuous speech segments with a continuous visible pitch line that were available in both experiments were entered into the analyses of mean pitch and pitch modulation. Each speech segment was numbered and included as a random factor along with speaker identity, in order to control for repeated sampling at these two levels (Warmelink et al. 2013). For word rate, the rate of each 10- or 15-s stimulus produced by each speaker was entered into the analyses, with speaker identity entered as a random factor to control for repeated sampling of each speaker. As we only had a small number of data points for the word rate GLMM (N = 16), we ran three separate models, each with a single fixed factor (prosody, content or content–prosody matching), to avoid overfitting.

GLMMs revealed that the content-matched (experiment 1) and content-mismatched stimuli (experiment 2) did not significantly differ in pitch, pitch modulation or word rate (Tables 3, 4), indicating that the content-mismatched stimuli were produced with prosody representative of natural dog-directed and adult-directed speech. In line with previous descriptions of the prosody of DDS, the pitch was significantly higher, the pitch modulation significantly greater and word rate significantly slower for stimuli produced with dog-directed prosody compared to adult-directed prosody (Burnham et al. 1998; Ben-Aderet et al. 2017; Tables 3, 4). Content did not significantly affect pitch modulation or word rate, but dog content was significantly higher pitched than adult content (Tables 3, 4).

Table 4 Results of GLMMs exploring the effect of prosody, content and content–prosody matching on pitch, pitch modulation and word rate

Design

As in experiment 1, this experiment used a within-subject design with all dogs hearing both DDS and ADS. Between-subject factors such as DDS speaker, DDS location and stimulus order were counterbalanced across trials.

Procedure

The procedure for this experiment was identical to that of experiment 1.

Interobserver reliability

The primary observer (AB) coded 100% of videos. Two trained observers each coded 50% of the videos (N = 32/32 trials total). The primary observer had high agreement with both secondary coders, who also had high agreement with each other across all measurements (Spearman’s R > 0.90, p < 0.001 for all comparisons).

A third observer, who was blind to the hypotheses of the experiment, also coded 22% of the videos (N = 7/32 trials total) with the sound turned off so that they were unaware which speech type was heard by the dog. There was high agreement with the primary coder for looking time (R = 0.93, p < 0.001) and for proximity preference (R = 0.88, p < 0.001).
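The agreement statistic reported above is Spearman's rank correlation, which is the Pearson correlation computed on ranks. A minimal standard-library sketch follows. The function names and the two lists of coded looking times are hypothetical and are not the study data, and the sketch does not compute the accompanying p values.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical looking times (s) coded by two observers for the same seven trials.
primary_coder = [12.0, 3.5, 8.0, 0.0, 15.5, 6.0, 9.5]
secondary_coder = [11.5, 4.0, 8.5, 0.5, 15.0, 5.0, 10.0]
agreement = spearman_r(primary_coder, secondary_coder)
```

Because the statistic depends only on rank order, two coders who differ by small absolute amounts but order the trials identically, as in this hypothetical example, still show perfect agreement.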

Statistical analysis

As above, attentive and affiliative preferences were evaluated using mixed ANOVAs with the within-subject factor speech prosody (DDS/ADS) and the between-subject factors DDS identity (e.g. experimenter 1/experimenter 2) and DDS location (right/left). All assumptions were tested and met.

Experiment 2: results

Looking preference

For content-mismatched DDS, 3 trials were removed due to equipment failure, so the following analysis is based on n = 29. A mixed ANOVA revealed no significant preference for DDS when content was incongruent with prosody (Fig. 5; Table 5). During the control period, there was a main effect of identity, with dogs preferring to look towards experimenter 3 compared to experimenter 4 (Table 5). There was also an interaction of speech type and identity for total looking time. To explore the nature of this interaction, four independent samples t tests with a Bonferroni-corrected alpha (α = 0.0125) were conducted. Firstly, at the level of DDS, there was a significant effect of speaker identity, with dogs preferring the speech of experimenter 3 over experimenter 4 (t(27) = 3.08, p = 0.005). However, at the level of ADS, there was no significant effect of speaker identity (t(27) = 0.82, p = 0.419). At the level of each speaker, there was no preference for the DDS of experimenter 3 compared with her ADS (t(27) = 0.77, p = 0.450), and the same was true for experimenter 4 (t(27) = −1.50, p = 0.146).
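The Bonferroni decision rule for these four follow-up tests can be checked directly from the reported p values. In the sketch below, the family-wise alpha of 0.05 is inferred from the corrected threshold of 0.0125 (0.05/4), and the dictionary labels are descriptive names chosen for illustration.

```python
# Family-wise alpha; inferred from the corrected threshold of 0.0125 across four tests.
FAMILY_ALPHA = 0.05

# p values as reported in the text for the four follow-up t tests.
p_values = {
    "DDS: experimenter 3 vs experimenter 4": 0.005,
    "ADS: experimenter 3 vs experimenter 4": 0.419,
    "Experimenter 3: DDS vs ADS": 0.450,
    "Experimenter 4: DDS vs ADS": 0.146,
}

# Bonferroni correction: divide alpha by the number of comparisons.
corrected_alpha = FAMILY_ALPHA / len(p_values)

# A comparison is significant only if its p value falls below the corrected alpha.
significant = {label: p < corrected_alpha for label, p in p_values.items()}

for label, is_significant in significant.items():
    print(f"{label}: p = {p_values[label]:.3f}, significant = {is_significant}")
```

Only the between-speaker comparison at the level of DDS falls below the corrected threshold, which matches the pattern described in the text.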

Fig. 5 Time spent looking towards content-mismatched DDS and ADS during each phase, where error bars represent 1 standard error of the mean. n.s. denotes non-significant comparisons as revealed by mixed ANOVAs (total: Table 5; other time segments: Table S4)

Table 5 Results of between-subject ANOVA (1,25) for the control silence and a mixed ANOVA with degrees of freedom (1,25) comparing main effects and interactions for looking times towards content-mismatched DDS and ADS

Proximity preference

This analysis is based on N = 30 following equipment failures. For content-mismatched stimuli, dogs spent more time, on average, in proximity to the ADS location as illustrated in Fig. 6. However, a mixed ANOVA revealed that this result was non-significant (see Table 6).

Fig. 6 Mean time (seconds) spent in proximity to each speaker for content-mismatched DDS and ADS. Error bars represent one standard error of the mean

Table 6 Results of a mixed ANOVA with degrees of freedom (1,26) comparing the time spent near DDS and ADS speakers for content-mismatched speech

To explore whether the failure to find a significant preference for either type of speech was likely due to reduced power associated with the slightly smaller sample size in experiment 2 compared to experiment 1, we considered effect sizes and conducted power analyses using G*Power (version 3.1.9.2). The preference for attending to DDS in experiment 1 was associated with a large effect size (η2 = 0.563), yet the same comparison in experiment 2 yielded a very small effect size (η2 < 0.001). An a priori power analysis for looking time in experiment 2 indicated that to find a similar effect size based on partial η2 of 0.56, with power of 0.80 and an alpha level of 0.05 for the within-subject comparison of speech type, 6 participants would have been needed, which we exceeded with our 29 participants in experiment 2. The proximity preference for the DDS speaker in experiment 1 was associated with a medium effect size (η2 = 0.156), yet the same comparison in experiment 2 yielded a small effect size (η2 = 0.038). An a priori power analysis for proximity duration in experiment 2 indicated that to find a similar effect size based on partial η2 of 0.16, with power of 0.80 and an alpha level of 0.05 for the within-subject comparison of speech type, 24 participants would have been needed, which we exceeded with our 30 participants in experiment 2. Together, the effect sizes and power analyses indicate that experiment 2 had sufficient power to find differences similar to those found in experiment 1, had they existed, and therefore we can be relatively confident in this null result.
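Power analyses for ANOVA designs are commonly parameterised by Cohen's f rather than by partial η2, using the standard relation f = sqrt(η2 / (1 − η2)). The sketch below applies that conversion to the two partial η2 values named above. The function name is hypothetical, and the sketch does not attempt to reproduce the required sample sizes of 6 and 24, because these depend on further G*Power settings (such as the assumed correlation among repeated measures) that are not reported here.

```python
import math


def cohens_f(partial_eta_squared: float) -> float:
    """Convert partial eta squared to Cohen's f: sqrt(eta2 / (1 - eta2))."""
    if not 0.0 <= partial_eta_squared < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(partial_eta_squared / (1.0 - partial_eta_squared))


# Partial eta squared values used for the two a priori power analyses in the text.
f_looking = cohens_f(0.56)    # looking time
f_proximity = cohens_f(0.16)  # proximity duration

print(f"Cohen's f for looking time: {f_looking:.3f}")
print(f"Cohen's f for proximity duration: {f_proximity:.3f}")
```

The much larger f for looking time than for proximity duration is consistent with the far smaller sample size required for the looking-time comparison.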

Discussion

The results from experiment 2 suggest that there is no significant difference in dogs’ attention or proximity preference to speakers of DDS or ADS where content and prosody did not match. This suggests that neither content, nor prosody, is solely responsible for the preference for DDS shown in experiment 1. As the same scripts were used in both experiments, this result also highlights that the preference shown in experiment 1 could not be explained by the use of specific words in the content of the original stimuli, such as ‘walk’ or ‘dog’, for example. If this were the case, we would have observed a preference for content-mismatched ADS, which not only contained the specific dog-related words used in experiment 1, but more repetitions of them (see methods).

To explore alternative explanations for these null results, we first considered whether the difficulty of producing these content-mismatched stimuli had resulted in poor examples of DDS and ADS prosody. The acoustic analysis of the stimuli, however, shows that the content-mismatched stimuli followed the same patterns of acoustic properties as the naturalistic stimuli of experiment 1. This supports the use of these stimuli and indicates that the null result found in this experiment is unlikely to be due to failures in producing authentic DDS or ADS when the content is reversed. Second, although a broadly comparable number of subjects were used in experiments 1 and 2, it is possible that the slightly smaller N available in experiment 2 (33 vs 29 for looking duration; 34 vs 30 for proximity duration) left experiment 2 with slightly less power to detect differences compared to experiment 1. However, examination of effect sizes indicates that while the naturalistic speech in experiment 1 elicited a large effect size (η2 = 0.563), effect sizes obtained with the reversed stimuli were extremely small (η2 < 0.001). Power analyses confirmed that we had sufficient sample sizes in experiment 2 to detect differences similar to those found in experiment 1. We are therefore confident that the null result in experiment 2 was not due to lack of power.

In experiment 2, a significant interaction between speech type and experimenter revealed that experimenter 3’s DDS was more effective at eliciting attention than experimenter 4’s DDS. This effect is likely mediated by what seemed to be an a priori preference for experimenter 3, which resulted in dogs looking significantly longer at this experimenter in the control period before any speech was produced. It is not clear whether visual or scent characteristics drove this preference, although scent seems unlikely, as the preference did not persist in the post-stimulus proximity to experimenters, when an attractive scent could have been actively explored. It is interesting that dogs seemed to have an immediate preference for one experimenter, and this may have enhanced the efficacy of that experimenter’s dog-directed prosody. It is, however, important to note that the preferred experimenter’s DDS was still not significantly more effective in attracting dogs’ attention than her ADS. Indeed, post hoc analyses of the interaction term at the level of each speaker confirmed the main finding that the different types of speech did not elicit significantly different behaviour from the dogs.