Abstract

Vocal learning, the substrate of human language acquisition, has rarely been described in other mammals. Often, group-specific vocal dialects in wild populations provide the main evidence for vocal learning. While social learning is often the most plausible explanation for these intergroup differences, it is usually impossible to exclude other driving factors, such as genetic or ecological backgrounds. Here, we show the formation of dialects through social vocal learning in fruit bats under controlled conditions. We raised 3 groups of pups in conditions mimicking their natural roosts. Namely, pups could hear their mothers' vocalizations but were also exposed to a manipulation playback. The vocalizations in the 3 playbacks mainly differed in their fundamental frequency. From the age of approximately 6 months and onwards, the pups demonstrated distinct dialects, where each group was biased towards its playback. We demonstrate the emergence of dialects through social learning in a mammalian model in a tightly controlled environment. Unlike in the extensively studied case of songbirds where specific tutors are imitated, we demonstrate that bats do not only learn their vocalizations directly from their mothers, but that they are actually influenced by the sounds of the entire crowd. This process, which we term “crowd vocal learning,” might be relevant to many other social animals such as cetaceans and pinnipeds.

Author summary

The spontaneous acquisition of speech by human infants is considered a keystone of human language, but the ability to reproduce vocalizations acquired by hearing is not commonly described in other mammals. This skill, termed vocal learning, is challenging to study in nonhuman animals since such investigation requires the detection and exclusion of innate developmental effects. The recognition of vocal dialects among different populations can open a window on the vocal learning abilities of animals, but such findings in the wild may reflect genetic or ecological differences between groups rather than the learning of group-specific vocal behavior. In this study, we used a playback-based lab experiment to induce vocal dialects in fruit bat pups. By exposing groups of pups to different playbacks of conspecific calls, we could establish separate dialects, demonstrating the vocal learning skill of these bats. Furthermore, while songbirds, for instance, learn their songs directly from a specific tutor, our bats showed the ability to pick up vocal variations from the surrounding crowd, without direct interaction with any given tutor.

Citation: Prat Y, Azoulay L, Dor R, Yovel Y (2017) Crowd vocal learning induces vocal dialects in bats: Playback of conspecifics shapes fundamental frequency usage by pups. PLoS Biol 15(10): e2002556. https://doi.org/10.1371/journal.pbio.2002556

Academic Editor: Asif Ghazanfar, Princeton University, United States of America

Received: March 14, 2017; Accepted: September 28, 2017; Published: October 31, 2017

Copyright: © 2017 Prat et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: European Research Council (ERC – GPSBAT) https://erc.europa.eu (grant number ERC-2015-StG - 679186_GPS-Bat) (to YY). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: F0, fundamental frequency; HWE, Hardy–Weinberg equilibrium; LDA, linear discriminant analysis

Introduction

Vocal learning, the ability to learn to produce vocalizations by hearing, is essential in human language acquisition, but only a few other mammals appear to possess this capability [1–8]. Some indications for the existence of vocal learning in nonhuman animals arise from the observation of group-specific vocal dialects in wild populations [9–11]. Such vocal variations can indeed stem from vocal learning of typical vocalizations by members of the group; however, it is usually impossible to completely exclude other explanations for the appearance of vocal differences between populations [12]. For instance, genetic variations may lead to unique vocal patterns, and environmental constraints may induce specific usage of vocalizations.

Studies of several species of bats have indicated their vocal learning ability [4]. Early studies suggested that Phyllostomus discolor pups adapt their isolation calls to their mothers’ directive calls [13], and P. hastatus females were shown to maintain a group-specific foraging call through vocal learning [14]. Geographic variations in vocalizations of these 2 species were also observed [15,16], though genetic and environmental factors were not excluded as possible contributors to these apparent dialects. In another bat species (Saccopteryx bilineata) that is an important model for vocal learning, pups have been shown to learn territorial songs from adult male tutors [17] and to engage in vocal babbling behavior [18]. In a previous study [19], we showed that preventing Egyptian fruit bat (Rousettus aegyptiacus) pups from hearing adults delays their vocal ontogeny. Yet we also found that these isolated pups eventually catch up with their control counterparts. Moreover, we did not show plasticity in the vocal ontogeny of non-isolated pups.

The Egyptian fruit bat is an extremely social and vocal mammal, living in colonies of dozens to thousands of individuals.
In the wild, these bats are exposed to extensive vocal communication throughout their entire lives. A typical vocalization of this species is composed of a sequence of multiharmonic calls (Fig 1; see Materials and methods for details). The fundamental frequency (F0) in newborn pup isolation calls is high (ca. 8–15 kHz) (Fig 1A), and it gradually decreases to ca. 0.2–1.2 kHz in adults (Fig 1B–1D). We have previously shown that this process involves vocal learning [19].

Fig 1. Egyptian fruit bat vocalizations. (A) Isolation call, produced by newborn pups. (B) A modified isolation call: the first non-isolation social calls of pups (appearing around the age of 20–40 days). White dotted lines in (A) and (B) mark the F0; notice the drop in F0. (C) Adult multisyllabic vocalization. One example, out of a diverse repertoire, containing a low F0 call (179 Hz). (D) Another example of an adult vocalization containing a high F0 call (1,431 Hz). Notice how in the first call in the sequence the harmonics are clearly separable due to the high fundamental. (E) The distribution of adult calls’ F0. Calls with F0 lower than 250 Hz were designated by us as “Low-F0,” and calls with F0 higher than 1,315 Hz were designated as “High-F0” (see Materials and methods for details). (F) The distribution of F0 among the 3 playbacks: Blue represents Low-F0 playback, red represents High-F0 playback, and black represents Control playback. Distributions are plotted as smoothed and normalized histograms. Numeric data for (E) and (F) are given in S1 Data. F0, fundamental frequency. https://doi.org/10.1371/journal.pbio.2002556.g001

A fruit bat pup is mostly exposed to adult vocalizations when in the roost. In this situation, the pup continuously hears countless vocalizations coming from the surrounding darkness and has very little, if any, interaction with most of the vocalizing individuals. It is therefore exposed to a cacophony of fruit bat vocalizations, only a small minority of which are emitted by its mother or by nearby roostmates. In this study, we therefore set out to examine whether the vocal communication of pups that grow up in such an environment is shaped by the individuals that they directly interact with or by the background vocalizations they are “passively” exposed to.
We raised pups in conditions that mimic the natural acoustic conditions of a dark fruit bat cave and observed the establishment of vocal dialects through vocal learning of the entire “crowd” in the artificial cave.

Discussion

This study adds substantial evidence for the importance of vocal learning in the ontogeny of bat vocal communication. The highly controlled playback experiments that we performed excluded possible biasing factors such as differences in the ecological, developmental, or genetic backgrounds of the subjects or even differences in the recording conditions, all of which might lead to false reports of vocal learning. It is important to note that, in the wild, as well as in our setup, bats are exposed to an immense number of vocalizations produced by conspecifics in the dark. Thus, young pups hear conspecifics that do not directly interact with them to an extent that quantitatively overshadows the vocalizations produced by their mothers or immediate neighbors. Accordingly, we found that our pups presented a “crowd vocal learning” phenomenon, where their vocal repertoire was shaped by the complete repertoire they heard in their colony (mainly governed by our playbacks) and not only by the vocalizations of a single tutor (e.g., their parents) as is mostly discussed in the songbird literature [21].

Vocal learning is often assumed to include imitation [1] or at least social reinforcement of specific vocalizations [8]. The bats in our study did not interact with their models and hence were not subject to reinforcement, and we cannot assert that they imitated specific calls. This may be in line with recent views, which dispute the dichotomous definition of (presence or absence of) vocal learning abilities and rather find varying levels of this skill among different species [22]. Furthermore, when syllables are not readily categorized into specific types, as in the case of fruit bat vocalizations [20], it might be more difficult to identify imitation than when clear syllable types are recognized (as in the case of many birdsongs). Yet the bat crowd vocal learning demonstrates some degree of imitation, with an apparent tendency to social conformity.
We hypothesize that such crowd vocal learning may be employed by other species that are exposed to many vocalizations of conspecifics without directly interacting with them. Such auditory exposure occurs, for instance, in many cetaceans, whose calls travel very long distances, or in congregating species such as pinnipeds and some sea birds (in which vocal learning has so far not been described).

Several aspects of the behavior of the High-F0 group suggest that innate preferences also play a role in vocal ontogeny: (1) the bats did not adopt calls with F0 above 2 kHz, although these were abundant in the playback (such high-F0 calls characterize subadults and are very rarely emitted by adults), and (2) they reduced their use of high-F0 calls when reaching sexual maturity. At the age of 43 weeks (approximately 300 days), the bats are already mature, and the use of high-fundamental calls at this age is extremely rare in fruit bats (possibly due to physical constraints). Hence, it seems that a bias that is related to the animal’s physiology overrides learning of too-high-fundamental calls after a certain age (High-F0 group, Mann–Whitney U test: p = 0.14 in the fourth recording session; S3D Fig). Note that the High-F0 bats also included more low F0 calls in their repertoire relative to the controls (red outlined arrow, Fig 3). We can only hypothesize that this was due to their lesser exposure to calls around the control peak (approximately 600 Hz). Importantly, even if the High-F0 bats reduced the excess of high-frequency calls in their repertoire towards the end of the year, they still exhibited their unique vocal dialect that was also driven by additional acoustic properties. This is evident from the formation of separable groups in the time period of the last recordings (Fig 2B–2D; note that the probability of getting a separable group by chance is extremely low; see, for example, 4 random permutations in S4 Fig and the exact p-values above).
One acoustic feature that contributed to the unique dialect of the High-F0 group was the energy entropy (S5 Fig; consistent with the LDA in S1 Table).

To conclude, in a tightly controlled acoustic environment, we observed the formation of vocal dialects as a result of crowd vocal learning. When such dialects are found in the wild, it is often difficult to exclude nonsocial factors, but in this study, the pups were raised and recorded in identical settings except for the playback they heard. Notably, shared intragroup behaviors acquired and transmitted through social learning are generally referred to as culture [12,23]. Furthermore, evidence for nonhuman culture is occasionally based on learned vocal behaviors of birds [24–26] and mammals [27,28], with specific emphasis on vocal dialect variations between wild populations [29–31]. In our study, though pups did not directly learn from conspecifics, they were exposed to a conspecific stimulus that is very similar to that available to them in the wild (i.e., a stimulus that includes sound without vision or touch). Hence, our results demonstrate the assimilation of shared behavioral phenotypes, which were acquired by social vocal learning from a conspecific stimulus and thus might be considered as in-lab establishment of (vocal) culture in a mammalian model.

Materials and methods

Animal capture and care

Adult, heavily pregnant female bats (R. aegyptiacus) were captured in 2 wild roosts in central Israel and were randomly mixed. The bats were kept in 3 identical acoustic chambers (length: 190 cm; width: 90 cm; height: 82 cm) large enough to allow flight and were fed with a variety of fruit ad libitum. The light/dark regime was 12 h/12 h. The bats were randomly assigned to 3 groups, each housed in 1 chamber: 5 bats in the High-F0 group, 5 bats in the Low-F0 group, and 5 bats in the control group. All bats gave birth inside the chambers. One pup of the High-F0 group and 1 pup of the control group died a few days after birth. Subsequently, when the pups were ca. 1.5 months old, 1 mother with a pup of approximately the same age (caught in the wild roost) was added to the control group.

Ethics statement

All experiments were reviewed and approved by the Animal Care Committee of Tel Aviv University (Number L-13-016) and were performed in accordance with its regulations and guidelines regarding the care and use of animals for experimental procedures. The use of bats was approved by the Israeli National Park Authority.

Playback

In previous studies in this exact setup, we have recorded hundreds of thousands of bat vocalizations. Examining the distribution of the F0 among the recorded adult and subadult vocalizations (Fig 1E), we defined 2 extreme groups of calls: High-F0 (above 1,315 Hz, 2 SD above the mean) and Low-F0 (below 250 Hz, which is the minimum between the 2 modes in the bimodal distribution, 1.1 SD below the mean). For the playbacks (Fig 1F), we sampled the original dataset with 2 biased samples: one containing a high proportion of Low-F0 calls, which was played to the Low-F0 group, and one containing a high proportion of High-F0 calls (including subadult vocalizations), which was played to the High-F0 group.
For the control group, we used a random sample (see diamond shapes in Fig 2; see also S3 Fig and lines in the middle row of Fig 3 for the F0 content of the playbacks). We used raw recordings (audio files) without any editing to keep the stimulus as natural as possible. All in all, 105, 227, and 191 different recordings were included in the High-F0, Low-F0, and control playbacks, respectively (each group was exposed to the same number of played recordings during the entire experiment period, where each recording included a sequence of calls and represented a full vocal interaction that was recorded between adult bats; see below). The playback vocalizations were played around the clock with a timing distribution mimicking the natural vocal behavior of this species, where many of the vocalizations are emitted at dawn and dusk and more vocalizations are emitted during the night than during the day [20]. In each playback event, 1 vocalization (a raw recording of a sequence of calls) was selected randomly for each group, and these vocalizations were played concurrently in their corresponding chambers, i.e., the playbacks were played in a random, nonrepeating order. The rate of the playbacks was 14,057 call-sequences (i.e., recordings) per day and was the same in all 3 groups. Because not all sequences had the same number of calls, the groups heard 69,931, 48,651, and 129,715 calls per day on average for the Low-F0, High-F0, and control groups, respectively (to clarify the difference between a recording and a call, see Fig 1C, where a recording with 4 calls is shown, and Fig 1D, depicting a recording with 3 calls). These might seem like large differences, but even in the treatment with the fewest calls (i.e., 48,651 calls per day), the pups were exposed to a playback rate that was approximately 16 times higher than the calling rate of 5 adult bats [20]. 
Thus, pups heard (at least) 16–30 times more playback vocalizations per day than the vocalizations produced by their mothers during the first 14 weeks of the experiment (when the mothers were still present).

Recording of pup vocalizations

We recorded the pups’ vocalizations in 4 recording sessions, when the pups were at the ages of 12–18 weeks, 31–35 weeks, 40–43 weeks, and 48–51 weeks. All ages are reported with an accuracy of ±15 days. During a recording session, each group of pups was transferred into a recording chamber, which was similar to the housing chambers. All pups in a group were transferred together (except for part of the first recording session in which the pups were recorded in triplets; see S5 Table), recorded for 1–5 days, and returned to their home chamber. This transfer was repeated for each group in rotation until the end of the recording session, which lasted for 21–45 days, resulting in all groups being recorded for approximately the same time and no more than a few days apart (see S5 Table for the detailed schedule).

The recording chamber was continuously monitored with IR-sensitive cameras and omnidirectional electret ultrasound microphones (Avisoft-Bioacoustics Knowles FG-O; 2 microphones in a cage, 1 on each side of the cage). Audio was sampled using an Avisoft-Bioacoustics UltraSoundGate 1216H A/D converter with a sampling rate of 250 kHz. Raw audio recordings were automatically segmented and filtered to remove noise and echolocation clicks, leaving only bat social communication calls (see [19] for details of this process). The video was synchronized to the audio, resulting in a short movie accompanying each audio recording. Videos were then analyzed by L.A., who identified the emitter of each call. The bats were individually marked using fur bleaching. An emitter bat was recognized by its mouth movements, and 2–3 cameras could be used to verify a distinct assignment.
If there was any doubt regarding the emitter's identity, we excluded the vocalization from the analysis.

Data analysis and statistics

Social vocalizations of R. aegyptiacus are composed of sequences of separated calls (in our analysis, we regarded a call as a vocalized segment of a duration of at least 20 ms that is separated by at least 4 ms of silence from other vocalized segments). The vocal sequences commonly contain between 1 and 20 calls, with an average length of 2.7 calls (±2.6, SD) per sequence (see examples in Fig 1C and 1D) and an average duration of 119.1 ms (±69.3 ms, SD) per call. These calls are typically broadband (with 90% of the energy spread between approximately 3–45 kHz), generally harmonic squawks, with an average F0 of 544 Hz for an adult bat (F0 for a single call was defined as the geometric mean of the F0 content in that call). The calls are not readily clustered into different acoustic syllables (in the past, we have tested many more features than were used in this paper). They rather appear to rest on an acoustic continuum (see S1 Fig for a description of different acoustic features across the repertoire). They can thus all be considered as variations of one large “acoustic cloud” of agonistic calls.

For each call, 7 acoustic features were extracted: log F0, Shannon entropy of the power spectrum, Wiener entropy, spectral centroid, frequency with peak energy, amplitude entropy, and duration. The features were measured with a sliding window of 20 ms (19 ms overlap) and were averaged for each call (except for the duration, which was measured for the entire call). The F0 was calculated using the YIN algorithm [32]. This processing was computed over all recorded calls as well as all playback calls.

We first examined the differences between the groups and their relation to the playbacks using LDA (Fig 2).
To this end, we performed an LDA on the features extracted from the 3 playbacks, obtaining the 2 discriminant functions (a projection of the 7 acoustic features onto a new 2-dimensional space, S1 Table) that best discriminate between the playbacks. We then plotted the average of the calls of each pup in each recording session in these new 2 dimensions. The features were scaled prior to the application of the LDA by subtracting the mean and dividing by the SD, for both the playbacks and the pup vocalizations. The separation between the groups, which is clearly visible from the second recording session onwards, was evaluated for statistical significance (using permutations) as follows: For each recording session (each panel in Fig 2), we tested the linear separation between the groups, i.e., how many pups are correctly assigned to their group if straight lines are drawn to best separate the groups (this was done using a second LDA applied to obtain the separation significance). We then tested all possible permutations of group assignments for the pups, keeping the number of pups in each group constant, and computed an exact p-value (correct assignments in best separation: 10/14, 14/14, 12/14, and 14/14, with p-values: 0.09, 3.2 × 10⁻⁵, 0.0075, and 5.6 × 10⁻⁵, for recording sessions 1–4, respectively). To control for possible sex biases (i.e., differences between males and females), we repeated these permutations while also keeping the male/female compositions of the groups, obtaining similar results (p = 0.1, p = 6.8 × 10⁻⁵, p = 0.0076, and p = 2 × 10⁻⁴, for recording sessions 1–4, respectively).

In order to assess the statistical significance of the use of different F0 (S3 Fig), we performed a mixed linear model analysis, testing the effect of the group on the development of Low-F0 usage or High-F0 usage. We also tested for a possible effect of the sex of the pups (including it in the models) and found no such significant effect (see S3 Table).
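The exact permutation test described above can be illustrated with a minimal sketch. It is not the analysis code used in the study (which was run in Matlab): the function names `labelings`, `n_correct`, and `exact_p_value` are hypothetical, the toy coordinates are invented for illustration, and a simple nearest-centroid rule is swapped in for the second LDA that was used to find the best linear separation. Only the overall logic follows the text: count correctly assigned individuals, enumerate every group assignment with the group sizes held constant, and report the fraction of assignments that separate at least as well as the observed one.

```python
import math
from itertools import combinations


def labelings(indices, sizes, group=0):
    """Yield every assignment of indices to groups with the given sizes."""
    indices = list(indices)
    if not sizes:
        yield {}
        return
    for chosen in combinations(indices, sizes[0]):
        rest = [i for i in indices if i not in chosen]
        for tail in labelings(rest, sizes[1:], group + 1):
            labelled = {i: group for i in chosen}
            labelled.update(tail)
            yield labelled


def n_correct(points, labels):
    """Count points whose nearest group centroid belongs to their own group.

    Assumption: nearest-centroid stands in for the LDA-based separation.
    """
    groups = sorted(set(labels.values()))
    centroids = {}
    for g in groups:
        members = [points[i] for i in labels if labels[i] == g]
        centroids[g] = tuple(sum(c) / len(members) for c in zip(*members))
    correct = 0
    for i, p in enumerate(points):
        nearest = min(groups, key=lambda g: math.dist(p, centroids[g]))
        correct += nearest == labels[i]
    return correct


def exact_p_value(points, observed_labels, sizes):
    """Fraction of all labelings that separate at least as well as observed."""
    observed = n_correct(points, observed_labels)
    total = hits = 0
    for lab in labelings(range(len(points)), sizes):
        total += 1
        hits += n_correct(points, lab) >= observed
    return observed, hits / total


# Toy data (invented): 6 individuals in a 2-dimensional discriminant space.
toy_points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
toy_labels = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
observed, p = exact_p_value(toy_points, toy_labels, (2, 2, 2))
print(observed, p)
```

Full enumeration grows quickly with sample size; for example, 14 items split into groups of 4, 5, and 5 give 252,252 distinct labelings, which is still small enough to enumerate exactly.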
After finding an overall group effect, we used 1-tailed Mann–Whitney U tests to demonstrate the differences between the manipulation groups and the control group at each recording session (S2 Fig). The mixed model analysis was performed in SPSS. All other processing and the analysis of the data were performed using MATLAB 8.

Genetic analysis

Sample collection. A 3-mm-diameter wing punch was sampled from each of 11 individuals (2 pups from the Low-F0 group and 1 pup from the control group died after the recordings were completed but before the samples were taken a few months after the end of the experiment). The punch from each individual was preserved in molecular-grade 100% ethanol and frozen at −80°C. Wing tissues were obtained using sterile, disposable 3-mm skin biopsy punches. One biopsy punch was used per individual, and samples were taken from regions of the wing that were far enough from major blood vessels and the edge of the wing to avoid tearing.

Molecular methods and genetic analyses. Genomic DNA was extracted using a DNeasy tissue extraction kit (Qiagen, Valencia, California). Samples were genotyped at 10 microsatellite marker loci developed for R. madagascariensis or R. leschenaulti using described conditions [33,34]. Amplified products were visualized on an ABI 3100 genetic analyzer. Allele size scoring was performed using GeneMarker v2.6.7 (SoftGenetics, LLC), verified and amended by eye. We examined the deviation from Hardy–Weinberg equilibrium (HWE) and the presence of null alleles using the software Cervus v3.0.7 [35]. Pairwise relatedness was calculated using the package 'related' in R [36]. Microsatellite markers were polymorphic (mean allele number per locus 5.5, range: 2–7), did not deviate from HWE, and had a low level of null alleles (< 15%).

Genetic results. Relatedness estimates were qualitatively similar across the various estimators used.
Using the Wang (2002) estimator [37], the relatedness estimate within groups was r = −0.064 ± 0.064 (mean ± SE) and between groups was r = −0.066 ± 0.032 (mean ± SE), confirming that relatedness within groups did not differ from relatedness between groups.

The numerical data used in all figures are included in S1 Data.
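The call definition and F0 categories given under Playback and under Data analysis and statistics can be summarized in a short sketch. This is an illustration under stated assumptions rather than the pipeline used in the study: the function names are hypothetical, the input is assumed to be a list of already detected voiced intervals in milliseconds, and treating gaps shorter than 4 ms as part of the same call is our reading of the definition. The numeric thresholds (20 ms, 4 ms, 250 Hz, 1,315 Hz) and the use of a geometric mean for per-call F0 are taken from the text.

```python
import math

MIN_CALL_MS = 20.0    # minimum call duration (from the text)
MIN_GAP_MS = 4.0      # minimum silence separating two calls (from the text)
LOW_F0_HZ = 250.0     # calls below this F0 are "Low-F0" (from the text)
HIGH_F0_HZ = 1315.0   # calls above this F0 are "High-F0" (from the text)


def segment_calls(voiced, min_call_ms=MIN_CALL_MS, min_gap_ms=MIN_GAP_MS):
    """Turn voiced (start_ms, end_ms) intervals into calls.

    Assumption: intervals separated by less than min_gap_ms are merged,
    and merged segments shorter than min_call_ms are discarded.
    """
    merged = []
    for start, end in sorted(voiced):
        if merged and start - merged[-1][1] < min_gap_ms:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged if e - s >= min_call_ms]


def call_f0(frame_f0_hz):
    """Per-call F0 as the geometric mean of frame-level F0 estimates."""
    logs = [math.log(f) for f in frame_f0_hz]
    return math.exp(sum(logs) / len(logs))


def f0_label(f0_hz):
    """Assign a call to the Low-F0, High-F0, or intermediate category."""
    if f0_hz < LOW_F0_HZ:
        return "Low-F0"
    if f0_hz > HIGH_F0_HZ:
        return "High-F0"
    return "intermediate"


# Invented example intervals: the first two are merged across a 2 ms gap,
# and the 10 ms segment is dropped as too short.
calls = segment_calls([(0, 15), (17, 30), (40, 50), (100, 125)])
print(calls)
print(f0_label(179), f0_label(544), f0_label(1431))
```

The two example F0 values 179 Hz and 1,431 Hz are those quoted for the calls in Fig 1C and 1D, and 544 Hz is the reported average adult F0.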

Acknowledgments

We thank Mor Taub for assistance with the video analysis and Lee Harten for assistance with DNA sampling.