Vocal learning, a crucial component of human speech, has evolved independently in several distantly related taxa, typically to allow the learning and cultural transmission of complex conspecific calls []. The learned songs of birds [] and whales [] are the best-known examples. Numerous cases of cross-species vocal imitation (sometimes termed “vocal mimicry”) also exist, such as animals imitating human speech. Among birds, parrots and mynahs are talented imitators of the human voice [], but only a few convincing examples of speech imitation in nonhuman mammals are known. One documented case was Hoover, a harbor seal (Phoca vitulina) who could utter simple English phrases after being raised by a Maine fisherman []. Another study documented that an adult male beluga (Delphinapterus leucas) imitated his own name, “Logosi” []. Anecdotal reports further suggest that a male Asian elephant (Elephas maximus) in a zoo in Kazakhstan might have produced speechlike utterances in Russian and Kazakh [], but these have not been documented.

Imitating human speech requires an animal to perceive, decode, and reproduce the speech signal, which demands a close match between vocal perception and production. Despite considerable effort, attempts to train apes to imitate human speech have yielded little evidence of vocal imitation []. The inability of our nearest living relatives to imitate speech apparently stems from poor cortical-motor control of the larynx and the vocal tract []. Conversely, some animals that lack structures humans use to articulate speech sounds (e.g., birds, which have a beak instead of lips) can overcome morphological constraints that might seem to preclude producing human sounds, provided they possess neural circuitry specialized for perceiving and reproducing acoustic signals.

Here, we analyze human speech imitation by a male Asian elephant named Koshik from the Everland Zoo in South Korea, augmenting and extending prior evidence of vocal imitation in elephants [].

Koshik’s speech sound repertoire was said by his trainers to comprise six Korean words. We tested this claim by analyzing transcriptions of 47 recordings of Koshik’s utterances made by 16 native Korean speakers (see Table S1 available online). The transcribers were not told the supposed spelling or meaning of the imitations. This analysis largely confirmed the trainers’ claims, indicating that Koshik’s speech imitations correspond to the following five words: “annyong” (“hello,” Audio S1), “anja” (“sit down,” Audio S2), “aniya” (“no”), “nuo” (“lie down,” Audio S3), and “choah” (“good,” Audio S4). Agreement was high for vowels and relatively poor for consonants: vowel transcription similarity was 67% overall, whereas consonant agreement reached only 21% (Table S1). For example, utterances the trainers identified as “choah” were mainly transcribed as “boah” (“look,” 38%) or “moa” (“collect,” 23%), although neither of these words is used with Koshik. As a result, the transcriptions provided an exact spelling match (in Korean) for only one imitation, “annyong” (“hello”), on which the majority of respondents (56%) agreed, and considerable agreement for three others (“aniya”: 44%; “nuo”: 31%; “anja”: 15%). These results show that Koshik accurately imitates vowels, whose identity is determined by formant frequencies, but that his consonants are reproduced with relatively poor fidelity. Korean is not a tonal language like Chinese, in which changes in fundamental frequency are phonemic and change word meanings; Korean words are therefore distinguished mainly by their vowels and consonants rather than by pitch. Figure 1 shows spectrograms of Koshik’s imitation of the word “nuo,” together with “nuo” produced by one of his trainers and by a native Korean speaker unfamiliar with Koshik.

Figure 1. Spectrograms of the utterance “nuo” produced by a trainer (A and D), by the elephant Koshik (B and E), and by a 40-year-old male native Korean speaker with no prior exposure to Koshik’s speech (C and F; recorded via a headset and thus at higher quality than the other two samples). (A–C) show narrow-band and (D–F) wide-band spectrograms of each utterance. The fundamental frequency (fund. freq.) and the first and second formants (F1 and F2) are indicated.
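The contrast between vowel and consonant agreement can be illustrated with a toy calculation. The Python sketch below is a simplified, hypothetical illustration, not the scoring procedure used in this study: it works on romanized rather than Hangul transcriptions, assumes the letters a, e, i, o, u are the vowels, and uses invented listener responses to an utterance the trainers identify as “choah.” It reports the share of listeners who match the target exactly, on the vowel letters only, and on the consonant letters only, plus the most frequent transcription.

```python
from collections import Counter

# Simplifying assumption: in a romanized transcription, these letters count as vowels.
VOWELS = set("aeiou")


def vowel_skeleton(word: str) -> str:
    """Keep only the vowel letters, e.g. 'choah' -> 'oa'."""
    return "".join(ch for ch in word.lower() if ch in VOWELS)


def consonant_skeleton(word: str) -> str:
    """Keep only the consonant letters, e.g. 'choah' -> 'chh'."""
    return "".join(ch for ch in word.lower() if ch.isalpha() and ch not in VOWELS)


def agreement(transcriptions: list[str], target: str) -> dict[str, float]:
    """Share of listeners matching the target on the whole word, the vowels only, and the consonants only."""
    n = len(transcriptions)
    return {
        "exact": sum(t == target for t in transcriptions) / n,
        "vowels": sum(vowel_skeleton(t) == vowel_skeleton(target) for t in transcriptions) / n,
        "consonants": sum(consonant_skeleton(t) == consonant_skeleton(target) for t in transcriptions) / n,
    }


def modal_transcription(transcriptions: list[str]) -> tuple[str, float]:
    """Most frequent transcription and the share of listeners who chose it."""
    word, count = Counter(transcriptions).most_common(1)[0]
    return word, count / len(transcriptions)


# Invented responses from 16 hypothetical listeners to one utterance labeled "choah" by the trainers.
responses = ["boah"] * 6 + ["moa"] * 4 + ["choah"] * 3 + ["boa"] * 2 + ["joa"]

print(agreement(responses, "choah"))  # {'exact': 0.1875, 'vowels': 1.0, 'consonants': 0.1875}
print(modal_transcription(responses))  # ('boah', 0.375)
```

With these invented responses, every listener agrees on the vowel sequence “oa,” while only the three exact matches agree on the consonants: an exaggerated version of the vowel/consonant asymmetry reported above.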

We applied discriminant function analysis (DFA) to compare the structure of Koshik’s speech imitations with that of natural Asian elephant calls, using four variables: duration, minimum and maximum fundamental frequency, and first formant/spectral peak frequency. Koshik’s imitations differ markedly from 187 calls of 22 Asian elephants of both sexes and various ages, recorded in five different zoos and in Udawalawe National Park, Sri Lanka (Table S2). Instead, they cluster tightly with the human model utterances (Figure 2A), which were recorded from Koshik’s trainers. Fundamental frequency is the most discriminating feature. Post hoc Bonferroni tests revealed no significant difference in minimum or maximum fundamental frequency between Koshik’s imitations and his trainers’ utterances, but significant differences between the imitations and natural Asian elephant calls (all p < 0.001; Figure 2B).

Figure 2. (A) Scatterplot of discriminant functions 1 and 2 of the DFA. None of Koshik’s imitations was classified with the natural Asian elephant (Ae.) calls, whereas 50% were allocated to the human utterances. Conversely, 58% of the human (trainer) utterances were allocated to Koshik’s imitations. The first function, which explained 99.5% of the variance, loaded most strongly on maximum (0.477) and minimum (0.441) fundamental frequency; the second function (0.5% of the variance) loaded on formant/spectral peak frequency (0.997).
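A discriminant analysis separates groups by projecting each call onto linear combinations of the acoustic variables. As a minimal sketch of that idea, the Python below implements Fisher’s two-class linear discriminant on just two variables (minimum and maximum fundamental frequency) using made-up values. This is a deliberate simplification: the analysis described above involved three groups and four variables, and the software and settings used for it are not specified here, so the function names and numbers are hypothetical.

```python
# Sketch of Fisher's two-class linear discriminant in plain Python (no external libraries).
Point = tuple[float, float]
Matrix = list[list[float]]


def mean(points: list[Point]) -> Point:
    """Class centroid."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points: list[Point], m: Point) -> Matrix:
    """Within-class scatter: sum of outer products of deviations from the class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = (p[0] - m[0], p[1] - m[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def inverse_2x2(a: Matrix) -> Matrix:
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def dot(u: Point, v: Point) -> float:
    return u[0] * v[0] + u[1] * v[1]


def fisher_lda(class_a: list[Point], class_b: list[Point]) -> tuple[Point, float]:
    """Return the discriminant direction w = Sw^-1 (mean_a - mean_b) and a decision threshold."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]  # pooled within-class scatter
    inv = inverse_2x2(sw)
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1])
    # Threshold at the projected midpoint of the two class means (equal priors assumed).
    midpoint = ((ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2)
    return w, dot(w, midpoint)


def classify(x: Point, w: Point, threshold: float,
             labels: tuple[str, str] = ("imitation", "elephant")) -> str:
    """Assign x to the first class if its projection onto w exceeds the threshold."""
    return labels[0] if dot(w, x) > threshold else labels[1]


# Made-up (minimum F0, maximum F0) pairs in Hz, for illustration only.
imitations = [(95, 140), (105, 150), (100, 160), (110, 145), (98, 155)]
elephant_calls = [(20, 40), (25, 60), (18, 35), (30, 70), (22, 50)]

w, threshold = fisher_lda(imitations, elephant_calls)
print(classify((102, 150), w, threshold))  # imitation
print(classify((24, 55), w, threshold))    # elephant
```

Placing the threshold at the projected midpoint of the class means corresponds to equal prior probabilities under a shared covariance; a multi-group DFA generalizes this by finding several discriminant functions, as in the two-function scatterplot described in the caption above.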

Koshik’s Speech Production and Formant Matching

18. Peterson G.E., Barney H.L. Control methods used in a study of the vowels.
19. Soltis J. Vocal communication in African elephants (Loxodonta africana).

Figure 3. Koshik’s Imitation of Human Formant Frequencies. (A) Koshik’s posture during speech imitation. (B) Box plots of the mean peak frequencies of the vowels “a,” “o,” and “u” produced by Koshik and his trainers (F1 = formant 1, F2 = formant 2). (C) Time-varying center frequencies of the first two formants of the human model and of the corresponding elephant formants for (i) “anja” and (ii) “nuo.” (D) Scatterplots of the first formant (x axis) against the second formant (y axis) for the same two utterances as in (C), superimposed on the mean values for each vowel (given by the phonetic labels) of American English speakers from Peterson and Barney []. In all panels, gray symbols depict human and black symbols elephant formant values.

Particularly during vowel production, Koshik’s first two formants closely match formants 1 and 2 of his trainers (Figure 3). Comparing the mean first and second formants of the most commonly recorded vowels, “a,” “o,” and “u,” with the corresponding human formants revealed no significant difference between the elephant and the human models (Table S3). Koshik’s precise imitation of his trainers’ acoustic characteristics is remarkable, given that the long vocal tract of an elephant would naturally produce much lower formant frequencies []. Koshik achieves these accurate imitations of human formant frequencies by placing his trunk tip into his mouth (always from the right side; Figure 3A and Movies S1, S2, and S3) just before phonation begins (about 0.3 ± 0.11 s before vocal onset, n = 50). During phonation, he raises the lower jaw while keeping the trunk inside the mouth, thus modulating the shape of his vocal tract. He removes the trunk from the oral cavity about 0.4 ± 0.23 s (n = 50) after phonation. The timing of trunk insertion and removal does not differ markedly between imitations.
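Why a long vocal tract lowers formants follows from the standard first approximation in speech acoustics: a uniform tube closed at the glottis and open at the mouth resonates at F_n = (2n − 1)c/(4L), so every formant scales inversely with tract length L. The Python sketch below evaluates that textbook formula. The speed of sound and both tract lengths are illustrative assumptions, not values from this study, and a real vocal tract (let alone one with a trunk inserted into the mouth) is far from a uniform tube.

```python
# Quarter-wavelength resonator: a uniform tube closed at the glottis and open at the mouth.
# Assumed speed of sound in warm, humid vocal-tract air (a common phonetics approximation).
SPEED_OF_SOUND_M_S = 350.0


def tube_formants(length_m: float, n: int = 3, c: float = SPEED_OF_SOUND_M_S) -> list[float]:
    """First n resonances F_k = (2k - 1) * c / (4 * L) of a uniform closed-open tube, in Hz."""
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * k - 1) * c / (4 * length_m) for k in range(1, n + 1)]


def tube_length_for_f1(f1_hz: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Tube length (m) whose lowest resonance equals f1_hz: L = c / (4 * F1)."""
    return c / (4 * f1_hz)


# Roughly the length of an adult male human vocal tract (about 17.5 cm).
print([round(f) for f in tube_formants(0.175)])  # [500, 1500, 2500]
# Hypothetical 1 m tract, chosen only to show the scaling (not a measured elephant value).
print(tube_formants(1.0))  # [87.5, 262.5, 437.5]
```

Under this model, halving the effective tract length doubles every formant, so reaching human-like F1 and F2 values requires a substantial change to the effective resonating cavity.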

20. Herbst C.T., Stoeger A.S., Frey R., Lohscheller J., Titze I.R., Gumpenberger M., Fitch W.T. How low can you go? Physical production mechanism of elephant infrasonic vocalizations.
21. Nair S., Balakrishnan R., Seelamantula C.S., Sukumar R. Vocalizations of wild Asian elephants (Elephas maximus): structural classification and social context.
22. de Silva S. Acoustic communication in the Asian elephant, Elephas maximus maximus.
23. Wemmer C., Mishra H.R. Observational learning by an Asiatic elephant of an unusual sound production method.
24. Hardus M.E., Lameira A.R., Van Schaik C.P., Wich S.A. Tool use in wild orang-utans modifies sound production: a functionally deceptive innovation?

Little is known about sound production in Asian elephants in general. Presumably, low-frequency rumbles are produced by the same physiological mechanism as human speech (passive vocal-fold vibration), as recently shown for African elephants []. Whether this holds for all call types, and whether particular elephant sounds are emitted nasally or orally, remains unknown. In any case, Koshik’s use of his trunk to produce speech sounds is highly unusual: it has not been reported for wild Asian elephants [], nor for Koshik himself when he produces natural elephant calls. Three other Asian elephants have been described whistling by pressing the trunk against the mouth []. Putting a body part (in Koshik’s case, the trunk) inside the mouth to reshape the vocal tract and thereby manipulate formants is a wholly novel method of vocal production. Without X-ray images, we cannot be certain whether tongue movements are also involved in Koshik’s speech imitations. We do know, however, that elephants lack a full oral sphincter, because the upper lip is fused with the nose to form the trunk. Lip rounding, a feature of vowels such as /u/, is thus, strictly speaking, impossible. Koshik’s success at vowel imitation suggests that elephants can overcome morphological limitations by augmenting the oral vocal tract with the trunk, an evolutionarily novel and highly specialized appendage. The only remotely comparable finding we are aware of outside humans concerns orangutans (Pongo pygmaeus wurmbii), which are reported to modulate sound spectra using their hands or leaves [].

25. Patel A.D., Iversen J.R., Bregman M.R., Schulz I. Experimental evidence for synchronization to a musical beat in a nonhuman animal.
26. Schachner A., Brady T.F., Pepperberg I.M., Hauser M.D. Spontaneous motor entrainment to music in multiple vocal mimicking species.

These results indicate that the elephant brain can transfer detailed information between auditory centers and the corresponding motor planning regions (including those controlling the trunk muscles), and that elephants have the precise control over the larynx needed to gate and modulate fundamental frequency. Our documentation of elephant vocal learning adds support to the “vocal learning and rhythmic synchronization hypothesis,” because it has recently been suggested that Asian elephants may be capable of beat perception and synchronization (BPS) []. This hypothesis holds that entrainment might have evolved as a byproduct of selection for vocal imitation (BPS also requires information transfer between the auditory and motor systems) and, thus, that only vocal learning species should be capable of BPS []. The alternative, that entrainment leads to vocal imitation, is rendered unlikely by the finding that, although all known entraining species are vocal learners, many vocal learners show no entrainment ability [].

27. Poole J.H. Rutting behaviour in African elephants: the phenomenon of musth.
28. Pepperberg I.M., Naughton J.R., Banta P.A. Allospecific vocal learning by grey parrots (Psittacus erithacus): a failure of videotaped instruction under certain conditions.
29. Amsler M. An almost human grey parrot.
30. West M.J., Stroud A.N., King A.P. Mimicry of the human voice by European starlings: the role of social interaction.

Although elephants living under human care may be heavily exposed to speech from birth, they do not generally imitate it. Early, intensive exposure to speech thus does not appear sufficient to trigger speech imitation in elephants that live within an elephant social environment, although it might be a necessary precondition. Koshik was born in captivity in 1990 and moved to Everland in 1993, where two female Asian elephants accompanied him until he was five years old. From 1995 to 2002, Koshik was the only elephant at Everland. He was trained to respond physically to several commands and was exposed intensively to human speech from his trainers, veterinarians, guides, and tourists. In August 2004, his trainers first noticed that Koshik was imitating speech. We cannot be certain whether Koshik began producing speech sounds at 14 years of age (near the onset of his sexual maturity; his first musth period [] occurred in March 2005) or whether earlier imitations went unrecognized by his trainers. In either case, a determining factor for speech imitation in Koshik may have been social isolation from conspecifics during an important period of bonding and development, when humans were his only available social contacts. This hypothesis may also hold for the other known examples of speech imitation in mammals, Hoover the seal and the beluga Logosi, as well as for most talking birds [].

1. Fitch W.T. The evolution of speech: a comparative review.
2. Janik V.M., Slater P.J.B. Vocal learning in mammals.
5. Marler P. A comparative approach to vocal learning: song development in white-crowned sparrows.
7. Brainard M.S., Doupe A.J. What songbirds teach us about learning.
8. Jarvis E.D. Learned birdsong and the neurobiology of human language.
9. Payne K., Payne R. Large-scale changes over 19 years in song of humpback whales in Bermuda.
17. Poole J.H., Tyack P.L., Stoeger-Horwath A.S., Watwood S. Animal behaviour: elephants are capable of vocal learning.
31. Maglio V.J. Origin and evolution of the Elephantidae.

Together with previous examples of vocal production learning in African elephants [], these new data extend vocal learning ability to both surviving genera of the once-numerous Elephantidae []. What functions might vocal learning serve in elephants? In seals, baleen whales, and many passerine birds, which vocalize or sing to attract mates and/or defend territories, vocal learning might facilitate the generation of more complex songs or calls and thus increase reproductive success via sexual selection []. In elephants, little is known about the functional relevance of male calls, which males produce more frequently during musth periods []. Koshik, however, produced speech imitations throughout the year, not only when in musth.