Animals B and C were presented with a new set of training sounds that varied in where energy was distributed across frequencies, corresponding to the formants of five of the cardinal vowels: [ɑ], [e], [i], [ɔ] and [u]. The vowels were produced by a native North American English speaker but were digitally altered to have the same average fundamental frequency as the seals’ normal calls. Thus, these were novel calls that the seals had not produced before and that are not part of a gray seal’s repertoire. To confirm this, we also investigated the use of peak frequencies in 1,859 calls from 34 wild, naive gray seals of similar age. Once animals B and C reached a success criterion of 80% in copying single vowel sounds, we tested them by presenting the same signals in randomized combinations of up to three vowels.

To teach the animals the experimental paradigm, all were initially trained to match the moan call type from their own repertoire following methods in []. In the second phase of training, the animals heard digitally modified model calls so that we could test how far they could change acoustic parameters. For animal A, moan duration was kept constant while the pitch was changed, which left the relative frequency structure of the call intact while linearly shifting its entire frequency spectrum. We used training sequences of up to three calls, consisting of every possible combination of three different peak frequencies (880, 1,046, and 1,174 Hz). The seal was reinforced for matching both the number of calls and the change in frequency. During test trials we used sequences of up to 10 calls and varied the peak frequency over an octave (698–2,093 Hz).
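The size of the training set described above can be sketched in a few lines. This is only an illustration under an assumption: "every possible combination" is read here as every ordered sequence of one to three calls with repetition allowed, which the text does not state explicitly. The function name `training_sequences` is hypothetical.

```python
from itertools import product

# Peak frequencies (Hz) quoted in the text for animal A's training stimuli.
PEAK_FREQUENCIES_HZ = (880, 1046, 1174)


def training_sequences(freqs=PEAK_FREQUENCIES_HZ, max_calls=3):
    """Return all ordered sequences of 1..max_calls calls.

    Assumption: order matters and a frequency may repeat within a sequence.
    """
    sequences = []
    for n_calls in range(1, max_calls + 1):
        sequences.extend(product(freqs, repeat=n_calls))
    return sequences


sequences = training_sequences()
print(len(sequences))  # 3 + 9 + 27 = 39 under the stated assumption
```

Under a stricter reading (for example, no repeated frequency within a sequence) the count would be smaller, so the figure of 39 should be treated as an upper bound for this interpretation only.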

Training Procedure

All behaviors were trained using operant conditioning and positive reinforcement. A large portion of the seals’ daily diet, consisting primarily of herring (Clupea harengus) and sprat (Clupea sprattus), was used as reinforcement. Correct responses were reinforced with fish, while incorrect responses resulted in a three- to five-second least reinforcing stimulus (LRS), a “time-out” during which the trainer made no response, before training continued.

All sessions took place on land and were voluntary. The seals always had access to the water, and if an animal refused to leave the water for a session, training was ended and its diet was fed freely at the end of the day. Initially the seals were reinforced for making any sound; this progressed until they were only reinforced for vocalizing when stationed out of the water. At this point a hand cue was introduced, and the seals were reinforced for making any sound when cued and for staying silent when the hand cue was not present (i.e., the cue was under stimulus control).

45. Shapiro A.D., Slater P.J.B., Janik V.M. Call usage learning in gray seals (Halichoerus grypus).

Zola (animal A) had then been trained to discriminate between seal call types by producing a growl when hearing a growl and producing a moan when hearing a moan. We used both her own calls and those of unknown seals as stimuli in this training. These call types have been previously well documented; growls are noisy calls with a bandwidth of up to 20 kHz, while moans are periodic calls with a harmonic structure and bandwidth rarely exceeding 5 kHz [].

Once Zola consistently matched call type, playbacks shifted to a digitally altered stimulus. We used a moan recorded from the animal as a basis for our stimuli. The final moan stimulus always had the same duration (0.5 s), inter-call interval (0.1 s), and amplitude (70 dB re 20 μPa rms) but was digitally manipulated using Adobe Audition 2.0 to vary in number and frequency. We did not alter the natural waveform of the seal’s call in any other way. The stimuli’s frequency was changed using the “pitch shifter” function, which keeps the duration of the call constant while moving the pitch of the call. This allowed the relative frequency structure of the call to remain intact while linearly shifting the frequency spectrum of the call.

Peak frequency was changed in integer steps corresponding to the musical scale nearest the seal’s mean peak frequency and extending more than one standard deviation based on a sample of 100 calls (mean frequency 1,015 Hz, SD ± 89.27). Thus, Zola was presented with 880, 1,046, and 1,175 Hz calls (corresponding with musical notes A5, C6, and D6). Shifting the peak frequency additionally changed the fundamental frequency of the signals (180, 210, and 245 Hz, respectively). During each playback the seal was played one to three calls, consisting of every possible combination of these three frequencies. To enable us to reinforce performance consistently without allowing the animal to produce the correct number of calls by stopping when reinforced, we assumed that the animal’s response had finished if there was a gap of one second after her last call. This interval was chosen as it was considerably longer than the mean inter-call interval of a sample of 50 calls from her multi-call performance (mean = 0.27 s, SD ± 0.17). The seal was only reinforced if it matched both the number of calls and the change in frequency of calls. Rather than reinforcing an absolute match in frequency, we based reinforcement on the direction of change in peak frequency between successive calls: the seal was reinforced if the change in frequency was at least 60 Hz in the correct direction (either increased or decreased, as in the model). The 60 Hz threshold was based on what was easy to judge visually on the computer screen during training and on the peak frequency difference between test signals. If the signal consisted of only one call, then the seal was reinforced for responding with only one call regardless of frequency.
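The reinforcement rule in this paragraph can be written out as a small decision function. The sketch below is an illustration and not the procedure used during training, where judgments were made visually on screen. The function names and the data layout are hypothetical, and the handling of a model sequence that repeats the same frequency is an assumption, since the text does not describe that case.

```python
MIN_SHIFT_HZ = 60.0      # minimum change in the correct direction (from the text)
END_OF_RESPONSE_S = 1.0  # silent gap after which the response counts as finished


def calls_before_gap(calls, max_gap_s=END_OF_RESPONSE_S):
    """Keep the calls that belong to one response.

    `calls` is a list of (onset_s, offset_s, peak_hz) tuples sorted by onset.
    The response ends at the first silent gap of at least `max_gap_s`.
    """
    kept = []
    for call in calls:
        if kept and call[0] - kept[-1][1] >= max_gap_s:
            break
        kept.append(call)
    return kept


def is_reinforced(model_peaks_hz, response_peaks_hz, min_shift_hz=MIN_SHIFT_HZ):
    """Return True if the response matches call number and direction of change."""
    if len(response_peaks_hz) != len(model_peaks_hz):
        return False
    # A single-call model has no steps, so only the call count matters.
    for i in range(1, len(model_peaks_hz)):
        model_step = model_peaks_hz[i] - model_peaks_hz[i - 1]
        response_step = response_peaks_hz[i] - response_peaks_hz[i - 1]
        if model_step > 0 and response_step < min_shift_hz:
            return False
        if model_step < 0 and response_step > -min_shift_hz:
            return False
        # Assumption: a repeated model frequency requires a change below threshold.
        if model_step == 0 and abs(response_step) >= min_shift_hz:
            return False
    return True


response = calls_before_gap([(0.0, 0.5, 900.0), (0.7, 1.2, 1000.0), (2.5, 3.0, 800.0)])
print(is_reinforced([880, 1046], [peak for _, _, peak in response]))
```

In this usage example the third call starts 1.3 s after the second ends, so it is excluded from the response, and the remaining two calls rise by 100 Hz, which satisfies the rule for a rising model.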

The training procedure for the seals tested in 2013, Gandalf (animal B) and Janice (animal C), was similar to that used in 2012, with some changes. While in 2012 the seal stationed spontaneously at a set location at the side of the pool, in 2013 both seals would position themselves near the trainer, directly next to the enclosure’s fence. To keep the seals approximately one meter from the speaker, a physical station (a ball on which the seal positioned its chest) was introduced. If the seal was not at that station, its responses were not reinforced.

In initial training several different moans between 0.3 and 0.5 s in duration were used, with only one moan being played during each playback. Training stimuli were composed of novel sets of ten to twenty calls used per session and were changed every two to three sessions. At this stage of training, the seals were reinforced for producing any single moan in response to the single stimulus per playback.

Once the animals were successful in this task (in May 2013), both seals were trained with a single moan that always had the same duration (1.0 s) and amplitude (70 dB re 20 μPa rms) but varied in frequency and number of repetitions. Peak frequency was changed in integer steps corresponding to the musical scale nearest the seal’s mean peak frequency and extending more than one standard deviation in both directions (for Janice, 915 Hz, SD ± 54.08; for Gandalf, 577 Hz, SD ± 31.32). Thus, signals of 783, 987, and 1,175 Hz (corresponding with musical notes G5, B5, and D6) were used for Janice, while Gandalf’s stimuli had frequencies of 493, 587, and 698 Hz (B4, D5, and F5). During each playback the seal was played between one and three moans with any combination of the three frequencies. Just as in 2012, the seals were only rewarded when they produced the correct number of calls and changed the frequency of their calls in the same direction as in the sample.
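The correspondence between note names and stimulus frequencies can be checked with the standard equal-temperament formula. The sketch below assumes twelve-tone equal temperament with A4 = 440 Hz, which the text does not state, and the helper `note_frequency_hz` is hypothetical.

```python
# Semitone offset of each natural note from C within an octave.
SEMITONES_FROM_C = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_frequency_hz(note, a4_hz=440.0):
    """Frequency of a note such as 'G5' or 'F#4' in equal temperament.

    Assumption: A4 is tuned to `a4_hz` and octaves are single digits.
    """
    letter, octave = note[0], int(note[-1])
    sharp = 1 if "#" in note else 0
    midi_number = 12 * (octave + 1) + SEMITONES_FROM_C[letter] + sharp
    return a4_hz * 2 ** ((midi_number - 69) / 12)


notes_by_seal = {
    "Zola": ["A5", "C6", "D6"],
    "Janice": ["G5", "B5", "D6"],
    "Gandalf": ["B4", "D5", "F5"],
}
for seal, notes in notes_by_seal.items():
    print(seal, [round(note_frequency_hz(n), 2) for n in notes])
```

Under these assumptions the computed values agree with the frequencies quoted in the text to within 1 Hz; the small differences come from whether a value was rounded or truncated.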

Once both seals had five consecutive sessions with at least 80% correct responses, they were presented with a new set of training stimuli. These calls had constant duration (0.6 s), amplitude (70 dB re 20 μPa rms) and fundamental frequency. The fundamental frequency was chosen based on each seal’s average for a sample of 100 calls: for Janice 380 Hz (mean 378 Hz, SD ± 33.79) and for Gandalf 190 Hz (mean 192 Hz, SD ± 39.02). The calls varied in sound spectrum (i.e., where energy was distributed across frequencies) to correspond with the formants of five of the cardinal vowels: [ɑ], [e], [i], [ɔ] and [u]. These specific vowels were chosen because they are produced using variable mouth, lip and tongue positions and were easily identifiable by human listeners. The vowels were produced by a native North American English speaker (recorded with a sampling rate of 96 kHz, 24-bit) and then digitally altered in Adobe Audition to have a mean fundamental frequency similar to that of the seals. The seals were presented with one sound per playback and reinforced for responding with the correct number of calls (i.e., one call) and the correct formant frequencies. Once the seals had five consecutive sessions with 80% correct trials, they were presented with multiple calls. At this time the seals were presented with up to five vowels (inter-call interval 0.05 s) per trial, always in the same order (i.e., vowels were always played in the following sequence: [ɑ], [e], [i], [ɔ] and [u]), in addition to the vowels being played individually. Thus, nine training signals were used at this point.
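The progression criterion and the count of nine training signals can be illustrated as follows. Both functions are hypothetical. The second assumes that each multi-vowel signal was a sequence of two to five vowels starting from [ɑ] in the fixed order, which is one reading that yields the nine signals mentioned in the text.

```python
VOWELS = ["ɑ", "e", "i", "ɔ", "u"]  # fixed presentation order from the text


def reached_criterion(session_scores, threshold=0.80, run_length=5):
    """True once `run_length` consecutive sessions score at least `threshold`."""
    run = 0
    for score in session_scores:
        run = run + 1 if score >= threshold else 0
        if run >= run_length:
            return True
    return False


def fixed_order_signals(vowels):
    """Single vowels plus fixed-order sequences of 2..len(vowels) vowels.

    Assumption: every multi-vowel signal starts at the first vowel.
    """
    singles = [(v,) for v in vowels]
    sequences = [tuple(vowels[:n]) for n in range(2, len(vowels) + 1)]
    return singles + sequences


signals = fixed_order_signals(VOWELS)
print(len(signals))  # 5 single vowels + 4 sequences = 9
print(reached_criterion([0.70, 0.80, 0.85, 0.90, 0.80, 0.95]))
```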