Materials & Methods
Behavioural observations were conducted by ZC on individuals from the Bompusa community of wild bonobos from October 2013 to March 2014 at Lui Kotale (managed by the Max Planck Institute for Evolutionary Anthropology, Leipzig), located near Salonga National Park, DR Congo. At this time, the fully habituated and fully identified community consisted of twelve adult females, two subadult females, five adult males, two subadult males and eighteen immatures (juveniles and infants). Focal animal sampling (15-min samples) was conducted on all males and all adult females throughout the study period, amounting to an average of 11 focal hours per individual. We recorded vocalisations produced by focal individuals across a variety of behavioural contexts, as they occurred, at distances of 7–20 m using a Sennheiser MKH816T directional microphone and a Marantz PMD661 solid-state recorder (microphone frequency response: 50–20,000 Hz, ±3.5 dB; sampling rate 44.1 kHz; 16-bit resolution). Our acoustic analyses focussed on the bonobo peep, a high-frequency, closed-mouth vocalisation (approx. 2,200 Hz; de Waal, 1988; see Fig. 1) that is short in duration (approx. 0.1 s; de Waal, 1988; Clay & Zuberbühler, 2009) and characterised by a simple, flat acoustic form composed of several harmonics that are generally unmodulated. To analyse peep structure across different contexts, we first identified the most vocally active individuals from focal recordings, i.e., those that produced vocalisations in at least two feeding contexts and two non-feeding contexts. This resulted in a sample of eight individuals (four adult males, one subadult male and three adult females). Behavioural contexts were mutually exclusive, i.e., peeps produced while holding or consuming food during travel or rest were excluded.
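The individual-selection criterion can be sketched in Python as below. This is a minimal illustration, assuming that calling events have already been annotated as (caller, context) pairs; the context labels and the function name `eligible_callers` are hypothetical, and counting calling events (rather than distinct context types) is our reading of "at least two feeding contexts and two non-feeding contexts".

```python
from collections import Counter

# Hypothetical context labels; only the feeding / non-feeding split comes from the text.
FEEDING = {"feed_fruit_tree", "feed_shoots_ground"}
NON_FEEDING = {"rest", "travel"}


def eligible_callers(events, minimum=2):
    """Return callers with at least `minimum` peep events in feeding AND in non-feeding contexts.

    `events` is an iterable of (caller_id, context_label) pairs, one per calling event.
    Counting calling events (rather than distinct context types) is an assumption.
    """
    events = list(events)  # allow generators, which are iterated twice below
    feeding = Counter(caller for caller, ctx in events if ctx in FEEDING)
    non_feeding = Counter(caller for caller, ctx in events if ctx in NON_FEEDING)
    # Counter returns 0 for callers never seen in a category, so they fail the check.
    return sorted(c for c in feeding if feeding[c] >= minimum and non_feeding[c] >= minimum)


events = [
    ("A", "rest"), ("A", "travel"), ("A", "feed_fruit_tree"), ("A", "feed_fruit_tree"),
    ("B", "rest"), ("B", "feed_shoots_ground"), ("B", "feed_fruit_tree"),
]
print(eligible_callers(events))  # ['A']: B has only one non-feeding event
```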
To compare the acoustic structure of peeps across contexts, we used, for each individual, the contexts that generated the most peep vocalisations. We collected peeps produced at the onset of each behavioural context (i.e., at the onset of food discovery or travel). Because the number of peeps produced at the onset of a vocal sequence varied across calling events, we analysed up to the first three consecutive peeps produced by the same individual at the beginning of a vocal event, as this was the typical number of peeps in a consecutive sequence. We then calculated the mean of each parameter across these peeps, so that each calling event contributed a single standardised value. Rather than focussing on discrete emotional states, we were first interested in establishing whether bonobo vocalisations may be used flexibly across contexts of different valence (positive, neutral, negative), as has been demonstrated in prelinguistic infants (Oller et al., 2013). Therefore, for each individual, we randomly selected a balanced sample of eight peep recordings produced during feeding contexts (feeding on shoots/seeds on the ground and on fruits in trees), which we inferred to be approximately positive in overall valence (Briefer, Tettamanti & McElligott, 2015), and eight peep events produced during non-feeding contexts (resting and travel), which we inferred to be relatively neutral in valence compared with feeding. This amounted to a total of 128 peep events. To capture the full spectrum of emotional valence in our acoustic analyses (i.e., positive, neutral and negative), we also analysed a sample of peeps produced during predator alarm responses and by victims of agonistic interactions, both of which were taken to represent negative valence.
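The per-event standardisation, averaging each parameter over up to the first three consecutive peeps of a calling event, can be illustrated with a short Python sketch. The parameter names and the `event_means` helper are hypothetical; the sketch assumes that peeps have already been measured and are listed in temporal order.

```python
from statistics import mean


def event_means(peeps, max_peeps=3):
    """Average each acoustic parameter over the first `max_peeps` peeps of one calling event.

    `peeps` is a list of dicts in temporal order, all with the same keys, e.g.
    {"duration_s": 0.11, "mean_f0_hz": 1650.0}. Parameter names are illustrative.
    """
    if not peeps:
        raise ValueError("a calling event needs at least one peep")
    first = peeps[:max_peeps]  # events with fewer peeps simply use all of them
    return {key: mean(p[key] for p in first) for key in first[0]}


event = [
    {"duration_s": 0.10, "mean_f0_hz": 1600.0},
    {"duration_s": 0.12, "mean_f0_hz": 1700.0},
    {"duration_s": 0.14, "mean_f0_hz": 1800.0},
    {"duration_s": 0.30, "mean_f0_hz": 2500.0},  # fourth peep: ignored
]
summary = event_means(event)
print(summary["mean_f0_hz"])  # 1700.0
```

Averaging within an event, rather than entering every peep, keeps each calling event as one data point, so long sequences do not dominate the sample.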
As peeps in agonistic and alarm contexts were rare, we analysed a balanced, randomised sample of four peeps per individual, each taken from an independent behavioural event, from seven of the original eight individuals (N = 28 in total). The eighth individual was excluded from this sample owing to an inadequate number of calls. We selected two contexts per valence class (positive, neutral, negative) in order to maximise sample size and to capture potential acoustic variation across contexts. To capture variation in the overall feeding experience, recordings from feeding contexts included a randomised, balanced selection of vocal events produced while feeding on fruits in trees and on herbaceous shoots on the ground. For non-feeding contexts, we analysed a randomised sample of recordings for each individual produced during rest and travel on the ground. For negative valence contexts, we analysed a randomised, balanced sample of peeps produced during agonistic conflicts and predator alarm contexts. We carried out all quantitative acoustic analyses in Praat 5.4.01 using the following settings: analysis window length 0.05 s; dynamic range 70 dB; pitch range 500–3,000 Hz, optimised for voice analysis; spectrogram view range 0–10 kHz. We performed pitch analysis using a script ("Analyse Source Editor") written by M Owren (pers. comm., 2007).
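Balanced random selection of this kind can be sketched as follows, assuming that "balanced" means an equal number of independent calling events drawn from each sub-context (e.g., four fruit-feeding and four shoot-feeding events to make up eight feeding events, or two agonism and two alarm events to make up four negative-valence events). The data layout and the `balanced_sample` function are hypothetical; the original selection procedure is not described beyond the text above.

```python
import random
from collections import defaultdict


def balanced_sample(events, subcontexts, n_per_subcontext, seed=None):
    """Randomly draw the same number of calling events from each sub-context, per caller.

    `events`: iterable of (caller_id, subcontext, event_id) tuples, one per independent event.
    Callers lacking enough events in any requested sub-context are left out of the result.
    """
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    pool = defaultdict(lambda: defaultdict(list))
    for caller, sub, event_id in events:
        pool[caller][sub].append(event_id)
    sample = {}
    for caller, by_sub in pool.items():
        if all(len(by_sub[s]) >= n_per_subcontext for s in subcontexts):
            # sampling without replacement, so no event is drawn twice
            sample[caller] = {s: rng.sample(by_sub[s], n_per_subcontext) for s in subcontexts}
    return sample


events = [("A", "agonism", i) for i in range(3)] + [("A", "alarm", i) for i in range(10, 14)]
events += [("B", "agonism", 20), ("B", "alarm", 21), ("B", "alarm", 22)]
negative = balanced_sample(events, ["agonism", "alarm"], n_per_subcontext=2, seed=1)
print(sorted(negative))  # ['A']: B has only one agonism event
```

Dropping callers who cannot fill every sub-context mirrors, in spirit, the exclusion of one individual from the negative-valence sample.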
We then took the following spectral measurements of the fundamental frequency (F0): (1) mean F0 (Hz): average F0 across the entire call; (2) F0 at call onset (Hz); (3) F0 at call middle (Hz); (4) F0 at call offset (Hz); (5) transition onset (Hz): frequency of maximum energy at call onset minus frequency of maximum energy at call middle; (6) transition offset (Hz): frequency of maximum energy at call middle minus frequency of maximum energy at call offset; (7) maximum F0 (Hz); (8) minimum F0 (Hz); (9) number of harmonics: number of visible harmonic bands. In the temporal domain, we measured (10) call duration (s). Next, we screened the data for outliers by computing standardised Z scores and rejecting any call with an absolute Z score greater than 3.29 in one or more parameters (Tabachnick & Fidell, 2001). We regressed all parameters on one another to check for multicollinearity and singularity, removing parameters with a variance inflation factor (VIF) greater than 10. We then conducted a Discriminant Function Analysis (DFA) to assess whether the remaining, uncorrelated acoustic variables could discriminate between behavioural contexts. Each of the eight individuals contributed eight randomly selected calls to each of the food (henceforth ‘positive valence’) and non-food (henceforth ‘neutral valence’) contexts, and each of the seven individuals in the negative valence sample contributed four calls (N = 156 peep samples in total). To cross-validate the discriminant functions, we used the leave-one-out classification procedure, which classifies each call using the functions derived from all other calls. We used binomial tests to analyse whether the proportion of correctly classified calls differed significantly from chance.
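The two screening steps can be sketched in Python using only the standard library. The sketch assumes a table of calls (rows) by acoustic parameters (columns). It computes Z scores with the sample standard deviation and obtains each parameter's VIF as the corresponding diagonal element of the inverse correlation matrix, which equals 1/(1 - R²) from regressing that parameter on all the others. Removing the single worst parameter and recomputing until all VIFs are at most 10 is an assumption about how the threshold was applied, and all function names are hypothetical.

```python
from statistics import mean, stdev


def screen_outliers(rows, threshold=3.29):
    """Keep only calls (rows) whose |Z| is at most `threshold` on every parameter (column)."""
    stats = [(mean(col), stdev(col)) for col in zip(*rows)]
    return [
        row
        for row in rows
        # constant columns (standard deviation 0) cannot contain outliers and are skipped
        if all(abs((x - m) / s) <= threshold for x, (m, s) in zip(row, stats) if s > 0)
    ]


def correlation_matrix(rows):
    """Pearson correlations between all pairs of parameters (columns)."""
    centred = []
    for col in zip(*rows):
        m = mean(col)
        centred.append([x - m for x in col])
    norms = [sum(v * v for v in c) ** 0.5 for c in centred]
    k = len(centred)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfectly collinear parameters)")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vifs(rows):
    """VIF of each parameter: the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(rows))
    return [inv[i][i] for i in range(len(inv))]


def drop_collinear(names, rows, limit=10.0):
    """Repeatedly remove the parameter with the largest VIF until every VIF <= limit."""
    names, rows = list(names), [list(r) for r in rows]
    while len(names) > 1:
        v = vifs(rows)
        worst = max(range(len(v)), key=v.__getitem__)
        if v[worst] <= limit:
            break
        del names[worst]
        for r in rows:
            del r[worst]
    return names


x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.1]  # nearly 2 * x1
x3 = [5, 3, 8, 1, 7, 2, 6, 4]
rows = [list(r) for r in zip(x1, x2, x3)]
print(len(screen_outliers(rows)))                 # 8: no call exceeds |Z| = 3.29
print(drop_collinear(["x1", "x2", "x3"], rows))   # x3 plus one of x1 / x2
```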
To examine whether peeps conveyed information about caller identity, we conducted a DFA on the same data used in the above analysis, but with individual identity as the discriminating factor. We additionally conducted separate DFAs for the positive and neutral valence contexts in order to control for behavioural context. We could not run a separate DFA for the negative valence context, because its small sample size (N = 4 calls per individual) relative to the number of acoustic parameters under scrutiny would have resulted in inadequate statistical power. Because the acoustic data were two-factorial (caller identity; context), it has been argued that conventional DFA does not allow a valid estimate of the overall significance of discriminability (Mundry & Sommer, 2007). Therefore, for any significant DFA discrimination, we conducted a permuted Discriminant Function Analysis (pDFA), using a macro written by R Mundry (Mundry & Sommer, 2007; R Mundry, pers. comm., 2007). The pDFA estimates the significance of the number of correctly classified (cross-validated) calls while taking into account repeated contributions from individual callers. Following significant discrimination in the pDFA and diagnostic tests, we used univariate analyses of variance to explore whether each acoustic parameter varied with context, entering caller identity as a random factor and context as a fixed factor. All statistical tests were carried out using IBM SPSS Statistics version 21.0 (IBM Corp., Armonk, New York, U.S.A.) and R version 3.1.1 (R Foundation for Statistical Computing, Vienna, Austria), run in RStudio. All tests were two-tailed, with alpha set at 0.05 unless a corrected level is stated; we applied standard Bonferroni corrections for multiple comparisons.
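The logic of a cross-validated DFA combined with a permutation test that respects the caller structure can be sketched as follows. This is a simplified stand-in, not the pDFA macro of Mundry & Sommer (2007): it assumes a linear discriminant classifier with a pooled within-class covariance matrix and equal prior probabilities (classification by smallest Mahalanobis distance), leave-one-out cross-validation, and a null distribution built by shuffling context labels only among each caller's own calls. The data are invented and all function names are hypothetical.

```python
import random
from statistics import mean


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_lda(X, y):
    """Class means plus inverse pooled within-class covariance (linear DA, equal priors)."""
    classes = sorted(set(y))
    p = len(X[0])
    means = {c: [mean(x[j] for x, lab in zip(X, y) if lab == c) for j in range(p)]
             for c in classes}
    scatter = [[0.0] * p for _ in range(p)]
    for x, lab in zip(X, y):
        d = [xi - mi for xi, mi in zip(x, means[lab])]
        for a in range(p):
            for b in range(p):
                scatter[a][b] += d[a] * d[b]
    dof = len(X) - len(classes)
    return means, invert([[v / dof for v in row] for row in scatter])


def predict(model, x):
    """Assign x to the class with the smallest Mahalanobis distance."""
    means, s_inv = model

    def distance(c):
        d = [xi - mi for xi, mi in zip(x, means[c])]
        return sum(d[a] * s_inv[a][b] * d[b] for a in range(len(d)) for b in range(len(d)))

    return min(means, key=distance)


def loo_accuracy(X, y):
    """Leave-one-out: refit without call i, then classify call i."""
    hits = 0
    for i in range(len(X)):
        model = fit_lda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)


def within_caller_permutation(X, y, callers, n_perm=199, seed=0):
    """Observed LOO accuracy and a permutation P value (labels shuffled within callers)."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    by_caller = {}
    for i, c in enumerate(callers):
        by_caller.setdefault(c, []).append(i)
    at_least = 1  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        perm = list(y)
        for idx in by_caller.values():
            labels = [y[i] for i in idx]
            rng.shuffle(labels)
            for i, lab in zip(idx, labels):
                perm[i] = lab
        at_least += loo_accuracy(X, perm) >= observed
    return observed, at_least / (n_perm + 1)


X, y, callers = [], [], []
for c in range(3):        # three hypothetical callers
    for k in range(4):    # four calls per caller and context: [F0 (Hz), duration (s)]
        X += [[1600 + 15 * k + 25 * c, 0.15 + 0.01 * ((k + c) % 3)],
              [2100 + 15 * k + 25 * c, 0.12 + 0.01 * ((k + 2 * c) % 3)]]
        y += ["positive", "negative"]
        callers += [c, c]
acc, p = within_caller_permutation(X, y, callers)
print(acc)  # 1.0 on this toy data
print(p)
```

Shuffling labels only within callers keeps every individual's calls in each permuted data set, so the null distribution reflects the repeated contributions of each caller rather than treating all calls as independent.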

Results
During focal animal sampling, we recorded peeps in over a dozen different behavioural contexts. Across all focal individuals, these included feeding on fruits, leaves, seeds and flowers in trees and on shoots, seeds, leaves and fruits on the ground. They also included travelling, resting, grooming, preparing a nest, interacting sexually, responding to vocalisations from other parties, descending from trees after feeding, alarm responses to predators or unexpected events, weather changes, agonistic interactions, submissive or appeasement responses towards more dominant individuals, and vocal greetings on the arrival of another individual joining the party.
Acoustic structure of peeps
We compared the acoustic structure of peeps produced in the contexts that generated the most peep vocalisations across individuals (feeding; travel/rest; agonism/alarm), which were associated with different emotional valences (positive, neutral and negative valence, respectively; Fig. 1). Following multicollinearity screening, we entered six of the nine original acoustic parameters into our acoustic analyses for eight individuals (total N call events = 156): call duration, mean F0, F0 at call onset, number of harmonics, transition onset and transition offset. We applied logarithmic transformations to three of these parameters to improve their homogeneity of variance. Although the cross-validated DFA generated two significant discriminant functions (Wilks' Lambda = 0.550, χ2 (df = 14) = 80.007, P < .001), peeps produced in positive valence contexts could not be reliably discriminated from those produced in all other contexts: the functions correctly classified only 49.3% of calls, which did not differ significantly from chance (Binomial test (0.14) P > 0.05).
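The chance-level comparisons reported in this section can be illustrated with an exact binomial test, together with the Bonferroni-adjusted threshold used for several of the pairwise comparisons. The sketch below is illustrative only: it implements a two-sided exact binomial test in the usual way (summing the probabilities of all outcomes no more likely than the observed one), which may differ in detail from the software used for the reported values, and it assumes three pairwise context comparisons per acoustic parameter when illustrating the Bonferroni threshold. Function names are hypothetical.

```python
from math import comb


def binom_pmf(i, n, p):
    """Probability of exactly i successes in n trials with success rate p."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)


def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial test of k correct classifications in n calls against chance p.

    Sums the probabilities of all outcomes no more likely than the observed one;
    the small relative tolerance guards against floating-point ties.
    """
    observed = binom_pmf(k, n, p)
    total = sum(binom_pmf(i, n, p) for i in range(n + 1)
                if binom_pmf(i, n, p) <= observed * (1 + 1e-7))
    return min(1.0, total)


def bonferroni_alpha(alpha, m):
    """Per-comparison threshold after a Bonferroni correction for m comparisons."""
    return alpha / m


# 20 of 64 calls assigned to the correct caller; chance with eight callers = 1/8
print(binom_test_two_sided(20, 64, 1 / 8) < 0.001)  # True
# Assumption: three pairwise context comparisons per acoustic parameter
print(0.028 < bonferroni_alpha(0.05, 3))  # False: 0.028 exceeds 0.05 / 3
```

For example, 20 of 64 correctly assigned calls (the feeding-context identity result) lies far above the 1/8 chance rate, whereas a pairwise P value of 0.028 does not pass a Bonferroni threshold of 0.05/3 ≈ 0.0167.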
Pairwise DFAs further revealed that peeps produced in positive valence contexts could not be reliably discriminated from those produced in neutral valence contexts (Wilks' Lambda = 0.947, χ2 (df = 6) = 6.638, P = 0.356): in a cross-validated analysis, the functions correctly classified only 52.3% of calls, which did not differ significantly from chance (Binomial test (0.5) P > 0.05). However, peeps associated with negative valence (i.e., alarm and agonism) could be significantly discriminated from those associated with positive valence (feeding) (82.1% of calls correctly classified; Wilks' Lambda = 0.468, χ2 (df = 7) = 59.602, P < .001; Binomial test (0.5) P < .001, Bonferroni-corrected), a result validated by a subsequent pDFA controlling for repeated contributions (P = 0.009). Similarly, peeps produced in negative valence contexts were significantly discriminated from those produced in neutral valence contexts, with 77.4% of calls (cross-validated) correctly classified (Wilks' Lambda = 0.551, χ2 (df = 6) = 47.107, P < .001; Binomial test (0.5) P < .001, Bonferroni-corrected).
Caller identity
We used the same cross-validated DFA procedure to test whether peeps could be acoustically discriminated on the basis of caller identity (N = 8 individuals). The model generated six significant discriminant functions (Wilks' Lambda = 0.371, χ2 (df = 42) = 119.043, P < .001), which discriminated caller identity at a rate significantly higher than chance (cross-validated correct classification: 31.3%; Binomial test (0.125) P < 0.001). We then conducted two separate DFAs to examine individual identity discrimination for peeps in positive and neutral contexts.
Results from the two analyses were similar, with identity significantly discriminated in both contexts (feeding contexts: 31.3% (20/64) of calls correctly classified; Wilks' Lambda = 0.234, χ2 (df = 42) = 81.285, P < 0.001; Binomial test (0.125) P < 0.001; non-feeding contexts: 32.8% (21/64) of calls correctly classified; Wilks' Lambda = 0.210, χ2 (df = 42) = 87.313, P < 0.001; Binomial test (0.125) P < .001).
Comparing acoustic parameters
At the level of individual acoustic parameters, univariate ANOVAs (caller identity as a random factor) revealed that mean call duration, mean F0 and F0 at call onset varied significantly with behavioural context (mean call duration: F(2, 12) = 5.625, P = 0.019; mean F0: F(2, 12) = 19.054, P < .001; F0 at call onset: F(2, 12) = 40.259, P < 0.001). Pairwise comparisons of F0 parameters (standard Bonferroni corrections; Fig. 2) showed that peeps associated with negative valence had a significantly higher mean F0 and onset F0 than peeps associated with positive valence (mean F0 negative = 2,131 ± 267 Hz, mean F0 positive = 1,660 ± 133 Hz; F(1, 6) = 16.862, P = 0.006; F0 at call onset negative = 2,027 ± 194 Hz, F0 at call onset positive = 1,612 ± 125 Hz; F(1, 6) = 35.990, P = 0.001) and neutral valence (mean F0 neutral = 1,584 ± 210 Hz; F(1, 6) = 27.160, P = 0.002; F0 at call onset neutral = 1,508 ± 186 Hz; F(1, 6) = 69.887, P < .001). Although peeps associated with negative valence were shorter than those associated with positive valence (mean call duration negative = 0.12 ± 0.14 s, mean call duration positive = 0.15 ± 0.03 s; F(1, 6) = 8.316, P = 0.028), this difference was not significant after Bonferroni correction. There were no other significant acoustic differences.
Figure 2: Boxplots showing six acoustic parameters of peep vocalisations as a function of behavioural context.
The emotional valence associated with each context is indicated in parentheses. Thick black lines represent medians; open circles and small asterisks represent outliers; box edges represent the upper and lower hinges of the H-spread, which generally match the upper and lower quartiles; whiskers represent the adjacent values, i.e., the most extreme values that are not classified as outliers. For significant differences, ∗∗ indicates P < .05 and ∗∗∗ indicates P < .001.