We are all born with a predisposition for music, a predisposition that develops spontaneously and is refined by listening to music. Nearly everyone possesses the musical skills essential to experiencing and appreciating music. Think of “relative pitch,” recognizing a melody separately from the exact pitch or tempo at which it is sung, and “beat perception,” hearing regularity in a varying rhythm. Even human newborns turn out to be sensitive to intonation or melody, rhythm, and the dynamics of the noise in their surroundings. Everything suggests that human biology is already primed for music at birth with respect to both the perception and enjoyment of listening.

Human musicality is clearly special. Musicality is a set of natural, spontaneously developing traits based on, or constrained by, our cognitive abilities (attention, memory, expectation) and our biological predisposition. But what makes it special? Is it because we appear to be the only animals with such a vast musical repertoire? Is our musical predisposition unique, like our linguistic ability? Or is musicality something with a long evolutionary history that we share with other animals?


Darwin assumed all vertebrate animals perceive and appreciate rhythm and melody simply because they have comparable nervous systems. He was convinced human musicality had a biological basis. He also suggested sensitivity to music must be an extremely old trait, much older than sensitivity to language. In fact, he viewed musicality as the source of both music and language, and attributed its presence in humans and animals to the evolutionary mechanism of sexual selection.

To what extent, though, do we share this ability with other animals? Is musicality something uniquely human? Or do we share musicality with other animals on account of the “common physiological nature of [our] nervous systems,” as Darwin suspected? To understand the evolution of music and musicality, we have to establish what the components of music are and how they demonstrate their presence in animals and humans. Perhaps then we can determine if musicality is special to humans.





By the beginning of the 20th century, Ivan Pavlov had discovered that dogs could remember a single tone and associate it with food. Wolves and rats likewise recognize members of their own species by the absolute pitch of their calls and can differentiate tones. The same applies to starlings and rhesus macaques.

A much more musical skill, though, is relative pitch. Most people listen not to a melody’s individual tones and their frequencies but to the melody as a whole. Whether you hear “Mary Had a Little Lamb” sung at a higher or lower pitch, you still recognize the song. It is even possible you may hear a tune on the loudspeakers in a noisy café and still be able to recognize it instantly.

But who was the singer? You rack your brains, making associations in the hope of remembering the singer’s name or the song title. When that doesn’t work, you turn your smartphone toward the loudspeaker. The software gives you all the information you need within seconds. You now know exactly which song was played, who sang it, and which album it was from.


To make this possible, the software producers have systematically analyzed and efficiently stored most of the commercially available recordings. A unique description of each song, an acoustic fingerprint that says something about the specific acoustic qualities of each piece of music, is stored in a huge archive. The computer program subsequently compares the fingerprint of the piece of music recorded on the smartphone with the one in the archive, then quickly and efficiently identifies the recording. While a piece of cake for computers, this task is virtually impossible for humans.



However, if you hold your smartphone close to someone singing the same song, the software will respond by saying it has no idea what is being sung. Or it will make a wild guess. The version of the piece being sung is not included in the database of analyzed music, so the software cannot find the fingerprint. By contrast, humans placed in the same situation will recognize the song instantly, and the song may even resonate in their minds for days to come.

A computer would be surprised, so to speak, to learn that we need only half a song to identify who is singing it or what is being sung, regardless of whether it is sung at a higher or lower pitch, slower or faster, in tune or out of tune. For humans, part of the pleasure of listening to music derives from hearing connections and relationships (both melodic and harmonic) between the tones.
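The contrast between exact matching and listening for relationships can be made concrete with a small sketch. It is only an illustration, not how any real music-recognition software works: the melody is written as MIDI note numbers (an assumption made here for convenience), and the function name `intervals` is hypothetical.

```python
def intervals(notes):
    """Return the semitone steps between successive notes."""
    return [later - earlier for earlier, later in zip(notes, notes[1:])]


# Opening of "Mary Had a Little Lamb" as MIDI note numbers (E D C D E E E).
original = [64, 62, 60, 62, 64, 64, 64]

# The same tune sung five semitones higher.
transposed = [note + 5 for note in original]

# Comparing the absolute pitches fails, much like a fingerprint lookup
# that expects the stored recording.
absolute_match = original == transposed

# Comparing the steps between the notes succeeds, because the
# relationships between the tones are unchanged by transposition.
relative_match = intervals(original) == intervals(transposed)

print(absolute_match, relative_match)
```

Every note in the transposed version differs from the original, yet the sequence of steps between the notes is identical, which is the sense in which a listener with relative pitch hears the same song.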





For a long time, scientists believed songbirds recognize and remember melodies based on the pitch or fundamental frequency, a skill that is a form of perfect pitch. The American bird researcher Stewart Hulse reached this conclusion some 40 years ago after performing a series of listening experiments with European starlings. He showed that the starlings could discriminate between ascending and descending tone sequences, but not if the sequences were played at a slightly higher or lower pitch. Hulse concluded that the birds focused on the absolute frequencies. European starlings, like many mammalian species, turned out to have perfect pitch rather than relative pitch.

Relative pitch, or the ability to recognize transposed melodies, has been well researched in humans. Neuroscientific studies reveal that relative pitch uses a complex network of different neural mechanisms, including interactions between the auditory and parietal cortices. This network appears to be lacking in songbirds. In researching the biological origins of human musicality, the absence of this neural network in songbirds makes the question of whether humans share relative pitch with other animal species all the more fascinating.

As far as we know, most animal species do not have relative pitch. Humans appear to be the exception. One might wonder, though, whether relative pitch should be limited to pitch alone. Might sound have other aspects in which not the absolute physical characteristics but rather the relationships between them contribute to musicality?

In 2016, researchers at the University of California, San Diego, offered a possible answer to this question. They exposed starlings to different melodies in which both the timbre and pitch had been manipulated. The stimuli consisted of what one might call sound-color melodies, tone sequences in which each tone has a different timbre. A series of experiments studied which acoustic aspects of the melodies the birds used to classify new, previously unheard melodies.


Surprisingly, the researchers discovered that the starlings did not use pitch to distinguish a stimulus, as had previously been thought, but rather timbre and changes in timbre (spectral contour). The birds responded to a specific song even when it had been manipulated and all the pitch information removed using “noise vocoding” techniques. The resulting melody resembles a noisy sequence of sounds, a sound-color melody in which the sounds change from one note to the next but have no perceptible pitch. Only when little information remains, as with the stimuli in Hulse’s European starling experiment (the stimuli consisted of pure tones, tones without any spectral information), do songbirds pay any attention to pitch.



As for melody perception, songbirds rely mostly on the spectral information and how that changes over time, or, more specifically, the changes in the spectral energy from one sound to the next. By contrast, humans listen to the pitch, paying little attention to the timbre.

One could say that songbirds listen to melodies the way humans listen to speech. In speech, humans focus mostly on the spectral information; this is what allows us to differentiate between the words “bath” and “bed.” In music, melody and rhythm demand all the attention. Whereas in speech, pitch is secondary—it can say something about the identity of the speaker or the emotional significance of the utterance—in music, it is primary. This is an intriguing and as yet poorly understood distinction between the experiences of listening to music and listening to speech.

A possible explanation is that musicality is a byproduct of cortical systems that were developed for speech and are supernormally stimulated by music. An opposing explanation, however, is also possible, namely that musicality precedes both language and music. In that case, musicality could be interpreted as a sensitivity that humans share with many nonhuman species, but in humans this predisposition has evolved into two partially overlapping cognitive systems: music and language.





I came across the beginnings of empirical evidence supporting this idea at a 2014 international conference in Austria. In a lecture, Michelle Spierings, a postdoctoral researcher at the University of Vienna, explained how zebra finches learned to identify differences between sequences of sounds, which she called “syllables.” The sounds consist of human utterances such as “mo,” “ca,” and “pu.” The order of these speech sounds (syntax), as well as their pitch, duration, and dynamic range (spectral contour), is changed throughout a series of different behavioral experiments.

The zebra finches first learn the difference between the sequence Xyxy and xxyY, in which x and y stand for different speech sounds, and the capital letter for a musical accent: a bit higher, longer, or louder. For example: “MO-ca-mo-ca” as opposed to “mo-mo-ca-CA.”

The finches then listen to an unfamiliar sequence, with altered accents and structure. The purpose is to test which aspect of the speech sounds the birds use to make the distinction: the musical accent or the order of the elements.

As Michelle showed, humans make these distinctions primarily on the basis of the order of the elements: abab is different from aabb, while cdcd resembles abab. Humans “generalize” the structure of abab to the as yet unheard cdcd sequence. This supports the idea that humans focus mainly on the syntax, or the order of the elements, when listening to such a sequence. Syntax (word order, such as “man bites dog”) constitutes an important characteristic of language.


By contrast, the zebra finches turn out to focus mostly on the musical aspects of the sequences. This does not mean they are insensitive to the order (in fact, they were able to learn it to some degree), but it is mainly the differences in pitch (intonation), duration, and dynamic accents—the musical prosody—that they use to differentiate the sequences.
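The two cues that the experiment pits against each other, the order of the elements and the position of the accent, can be separated in a few lines of code. This is a rough sketch and not the analysis used in the study: accented syllables are written in capitals, as in the examples above, and the function names `structure` and `accent_positions` are hypothetical.

```python
import string


def structure(syllables):
    """Describe the order of elements as an abstract pattern such as 'abab'."""
    labels = {}
    pattern = []
    for syllable in syllables:
        key = syllable.lower()  # ignore the accent when judging order
        if key not in labels:
            labels[key] = string.ascii_lowercase[len(labels)]
        pattern.append(labels[key])
    return "".join(pattern)


def accent_positions(syllables):
    """Return the positions of accented (capitalized) syllables."""
    return [i for i, syllable in enumerate(syllables) if syllable.isupper()]


first = ["MO", "ca", "mo", "ca"]   # Xyxy
second = ["mo", "mo", "ca", "CA"]  # xxyY
unheard = ["pu", "mo", "pu", "mo"]  # new sounds, no accent

for sequence in (first, second, unheard):
    print(sequence, structure(sequence), accent_positions(sequence))
```

A listener attending to order would group the unheard sequence with the first one, since both reduce to the pattern abab. A listener attending to the accent would instead ask where, if anywhere, a syllable is higher, longer, or louder.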



Properly interpreted, the results could suggest that humans may share a form of musical listening with zebra finches, a form of listening in which attention is paid to the musical aspects of sound (musical prosody), not to the syntax and semantics that humans heed so closely in speech.

Once again, Darwin came to mind. Might the musical listening process of humans and zebra finches be closely related?

The research on starlings and zebra finches reveals that songbirds use the entire sound spectrum to gather information. They appear to have a capacity for listening “relatively,” that is, on the basis of the contours of the timbre, intonation, and dynamic range of the sound. This is a form of listening that had been observed earlier by music theoreticians and that led modern composers like Edgard Varèse, György Ligeti, and Kaija Saariaho to give timbre an important place in their compositions.

Relative pitch in humans can mean more than just hearing relationships between pitches. Familiar melodies in which the pitch is rendered unrecognizable can also be identified from the contours of other aspects of sound. But humans are seldom interested in spectral contours.

All of this raises intriguing questions: What is needed for a human to be able to listen like a songbird? Or, conversely, is it possible for a songbird to listen to music the way humans do?

Humans and songbirds have their own strategies and preferences when it comes to listening. In my own zebra finch study, I learned that rhythmic structure is not the first thing zebra finches pay attention to. The evidence appeared to suggest that zebra finches focus primarily on intonation, timbre, and dynamic differences and minimally on the temporal aspects of sound. In fact, musical prosody might well be more informative for zebra finches than the temporal structure of the song elements.

The results of the zebra finch study forced me to realize that what is obvious to humans is not necessarily obvious to animals. While I cannot help but hear regularity in regular rhythms, zebra finches appear to focus mainly on other “local” aspects, such as a single tone or time interval. This illustrates my favorite one-liner from the American psychologist James J. Gibson: “Events are perceivable but time is not.” The perception of time is only possible when something happens. In the case of zebra finches, this “event” seems to be the individual sounds to which they attribute certain characteristics and not so much the temporal structure of a sequence of sounds (the rhythm in which the sounds follow each other).

In this sense, humans listen more globally and abstractly, with greater attention to the whole. We are almost too good at seeing and hearing relationships, relationships which are often not there but have their source in our own experiences and expectations. This is why we find it surprising that other animals solve problems in ways seemingly much more complicated than our own. However, what is the simplest solution for us is not always the simplest solution for another animal species.





Consider this example of an unexpectedly simple solution for a difficult problem in the visual domain: to develop a search algorithm that can find photographs of airplanes on the Internet. This is a difficult task because many of the countless possible photographs include depictions of objects closely resembling airplanes, such as birds or other white or metallic objects against a blue background.

The classic method in artificial intelligence would be to create a knowledge-based system that codifies precise rules (interpretable by a computer) about what does and does not constitute an airplane. The list could be quite long: an elongated and symmetrical object, two wings, a nose and a tail, small windows along both sides, a propeller on the nose or each wing, and so on. It is extremely challenging to compile a list of criteria that all airplanes would meet, but that would also allow airplanes to be distinguishable from, for example, birds and other airplane-like objects.


Recent computer simulations convincingly demonstrate that the most efficient way to determine whether an object in a photograph is or is not an airplane is, surprisingly, not to use a knowledge-based system. All the complicated reasoning turns out to be superfluous. The question—is there or is there not an airplane in the photograph?—can be answered much more simply and efficiently by focusing on one detail alone: Is there or is there not a nose wheel in the photograph?



Zebra finches and other animals that regularly take part in categorization experiments may be able to do just that. They listen, so to speak, to the “nose wheel” of the music: a detail that has little to do with the essence of the music. The bird remembers and recognizes one distinct detail, a detail that has resulted in food often enough to make it worthwhile for the bird to continue to focus on it.
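The difference between a long list of rules and a single telling detail can be sketched with toy data. Everything in this example is invented for illustration: the feature names, the three example "photographs," and both functions are hypothetical, and the code makes no claim about how the simulations mentioned above were carried out.

```python
def rule_based_is_airplane(features):
    """Accept an object only if it meets every criterion on the list."""
    required = ("elongated", "symmetrical", "two_wings", "tail", "side_windows")
    return all(features.get(name, False) for name in required)


def single_cue_is_airplane(features):
    """Accept an object on the basis of one detail: a nose wheel."""
    return features.get("nose_wheel", False)


# Hypothetical feature descriptions of three photographs.
photos = {
    "airliner": {
        "elongated": True, "symmetrical": True, "two_wings": True,
        "tail": True, "side_windows": True, "nose_wheel": True,
    },
    "cargo plane": {
        "elongated": True, "symmetrical": True, "two_wings": True,
        "tail": True, "side_windows": False, "nose_wheel": True,
    },
    "gull": {
        "elongated": True, "symmetrical": True, "two_wings": True,
        "tail": True, "side_windows": False, "nose_wheel": False,
    },
}

for name, features in photos.items():
    print(name, rule_based_is_airplane(features), single_cue_is_airplane(features))
```

In this toy case the rule list rejects the windowless cargo plane, while the single cue accepts both airplanes and rejects the bird. The single cue says nothing about what an airplane is, which is the point of the analogy.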

What we know for sure is that humans, songbirds, pigeons, rats, and some fish (such as goldfish and carp) can easily distinguish between different melodies. It remains highly questionable, though, whether they do so in the same way as humans do, that is, by listening to the structural features of the music.

A North American study using koi carp—a fish species that, like goldfish, hears better than most other fish—offers an unusual example. Carp are often called “hearing specialists” because of their good hearing. The sensitivity of a carp’s hearing can be compared to the way sounds might be heard over a telephone line: Though quality may be lacking in the higher and lower ranges, the carp will hear most of the sounds very clearly.

Three koi—Beauty, Oro, and Pepi—were housed in an aquarium at Harvard University’s Rowland Institute, where they had already participated in a variety of other listening experiments. In the earlier experiments, they had learned they would receive food if they pressed a button at the bottom of the tank, but only if music was heard at the same time. The current experiment concentrated on the carps’ music-distinguishing ability. As well as being taught to differentiate between two pieces of music (discrimination), Beauty, Oro, and Pepi were observed to see if they could recognize whether unfamiliar pieces of music resembled other compositions (categorization).

In the discrimination experiment, the koi were exposed to compositions by Johann Sebastian Bach and the blues singer John Lee Hooker to see whether they could differentiate between the two. In the categorization experiment, the koi were tested to see if they could classify a composition as belonging to either the blues or the classical genre. In the latter experiment, they were alternately exposed to recordings of different blues singers and classical composers ranging from Vivaldi to Schubert.

The surprising outcome was that all three koi were able to distinguish not only between compositions by John Lee Hooker and Bach, but also between the blues and classical genres in general. The fish appeared to be able to generalize, to correctly classify a new, as yet unheard piece of music based on a previously learned distinction.

But what was the basis for the kois’ decisions? How did they make the distinction? And what exactly did they listen to? If nothing else, the study clarified that they did not make the distinction based on the timbre of the music, because even when the classical and blues melodies were played on an instrument with a different timbre, the koi were still able to distinguish between them.

The koi research was inspired by a 1984 study describing the music-distinguishing ability of rock doves. It turned out that rock doves, too, can distinguish between compositions by Bach and Stravinsky. And, like carp, rock doves can also generalize what they have learned from only two pieces of music to other, unfamiliar pieces of music. They can even distinguish between compositions by contemporaries of Bach and Stravinsky.

Rock doves and carp are able to do something that is quite difficult for the average human listener: judge whether a piece of music was composed in Bach’s time (the 18th century) or Stravinsky’s (the 20th century). Moreover, these species can do all of this with no significant listening experience, no extensive music collection, and no regular concert attendance. I suspect that they perform the task on the basis of one distinct detail. This, in itself, is an exceptional trait. Most likely it is a successful tactic to generate food. Yet it still offers no insight into the “perception, if not the enjoyment,” of music. That may be one aspect of musicality that belongs to humans alone.





Henkjan Honing is Professor of Music Cognition at the University of Amsterdam. He is the author of The Evolving Animal Orchestra: In Search of What Makes Us Musical and Musical Cognition: A Science of Listening.

This article was adapted from The Evolving Animal Orchestra: In Search of What Makes Us Musical, published in 2019 by MIT Press.
