Humans are obviously pretty special when it comes to language. One of our cleverest tricks is the ability to process the sounds of spoken language at high speed—even more remarkable when you consider just how variable these sounds are. People have very different voices and very differently shaped throats and mouths, which all affect the sound waves that come out of them. And yet we have very little trouble communicating with speech.

There are many ways to try to figure out how this wizardry evolved, but one particularly useful source of information is birds. Their evolutionary relationship to humans goes pretty far back on the family tree, so anything unusual we have in common with them (like vocal learning) is unlikely to have been inherited from a shared ancestor. Instead, it's more likely to result from similar evolutionary pressures causing both of us to hit on similar solutions.

This is why a paper in this week's PNAS is so fascinating: it found that songbirds process sounds in a way that is very similar to humans. Like us, they're able to process how all the complex frequencies bound up in a single sound relate to one another. It’s very close to how humans process vowels.

Perfect pitch

Cognitive scientists Micah Bregman, Aniruddh Patel, and Timothy Gentner thought there was something missing in our understanding of how songbirds recognize one another’s tunes. Songbirds are very good at learning and recognizing melodies, but they have a curious limitation that humans don’t have: they don’t recognize the same melody in different keys. If someone plays “Mary Had a Little Lamb” starting on a particular note and then plays it again starting on a different note, humans have no problem recognizing that it’s the same song because the notes in the song are still all the same distance apart. Songbirds can’t do this.
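The claim that a melody in a new key keeps the same distances between its notes can be made concrete with a small sketch. The note numbers here are an assumption for illustration (MIDI-style semitone numbers for the opening of “Mary Had a Little Lamb”), not anything taken from the study.

```python
# Illustrative only: MIDI-style note numbers (one unit = one semitone),
# not data from the study.
melody = [64, 62, 60, 62, 64, 64, 64]  # E D C D E E E


def transpose(notes, semitones):
    """Shift every note by the same number of semitones (a key change)."""
    return [n + semitones for n in notes]


def intervals(notes):
    """Distance in semitones from each note to the next (relative pitch)."""
    return [b - a for a, b in zip(notes, notes[1:])]


shifted = transpose(melody, 5)

# Relative pitch is unchanged by the key change...
print(intervals(melody) == intervals(shifted))
# ...but every absolute pitch is different.
print(melody == shifted)
```

A listener relying on the interval list hears the two versions as the same tune; a listener relying on the individual note values does not.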

The team, led by Gentner, writes that this has created a widespread expectation that songbirds use absolute pitch to recognize songs. Most humans remember a song by hearing how far each note is from the next. Absolute pitch instead means knowing what each note sounds like on its own, which amounts to recognizing the frequency of the sound wave. When humans do this, we call it perfect pitch.

Gentner’s team argues that there's a problem with this idea: all the previous work investigating songbird pitch perception had used simple sounds. But sounds in the real world are rarely simple. They rarely consist of a pure sound wave oscillating on its own at a particular frequency. Instead, they’re a whole bundle of waves moving at the same time, creating a complex sound made up of numerous frequencies. When those frequencies are whole-number multiples of the lowest one, as in most musical notes, they are called harmonics. Using complex sounds would be a more accurate test of birds’ natural abilities, the researchers write.
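As a rough illustration of what a “bundle of waves” means, here is a minimal sketch of a complex tone built by adding sine waves together. It assumes an idealized periodic sound in which each component sits at a whole-number multiple of the lowest frequency; the 220 Hz fundamental and the amplitude values are hypothetical, chosen only for the example.

```python
import math


def harmonic_frequencies(f0, count):
    """Frequencies (Hz) of the first `count` harmonics of a tone with fundamental f0."""
    return [k * f0 for k in range(1, count + 1)]


def complex_tone(f0, amplitudes, t):
    """Value at time t (seconds) of a tone whose k-th harmonic has frequency k * f0.

    amplitudes[0] is the strength of the fundamental, amplitudes[1] the
    second harmonic, and so on.
    """
    return sum(
        a * math.sin(2 * math.pi * k * f0 * t)
        for k, a in enumerate(amplitudes, start=1)
    )


f0 = 220.0                 # hypothetical fundamental frequency
amps = [1.0, 0.5, 0.25]    # hypothetical harmonic strengths

print(harmonic_frequencies(f0, 3))
print(complex_tone(f0, amps, 0.001))
```

Changing the amplitude list while keeping `f0` fixed gives a tone with the same pitch but a different sound quality, which is the distinction the rest of the study turns on.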

In humans, the processing of complex sounds is very important because it’s how we understand speech. The sound waves sent out by our voice boxes pick up additional frequencies created by the shape of our throats and mouths, and we use our tongues and lips to shape these additional frequencies and make a wide range of speech sounds. For instance, each vowel has a different set of relationships between all the simultaneous sound waves our throats and mouths produce. That’s why an “aaa” sounds like an “aaa,” even if it’s produced by the teeny throat of a two-year-old child or the monstrous laryngeal cavity of a professional wrestler—the relationships between the frequencies in the sound are the same, even if the pitch is very different.
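The idea that a vowel is defined by the pattern of emphasized frequencies rather than by pitch can be sketched with a toy model. Everything numeric here is an assumption: the two resonance peaks, their strengths, and their width are made up for illustration, and real voices are far messier (different speakers also have somewhat different resonances). The sketch holds the vocal-tract “shape” fixed and changes only the pitch of the voice.

```python
import math

# Hypothetical resonance peaks for one vowel: (center frequency in Hz, strength).
PEAKS = [(700.0, 1.0), (1200.0, 0.6)]
WIDTH_HZ = 150.0  # hypothetical width of each peak


def envelope(freq_hz):
    """Toy spectral envelope: how strongly a given frequency is passed."""
    return sum(
        gain * math.exp(-((freq_hz - center) / WIDTH_HZ) ** 2)
        for center, gain in PEAKS
    )


def strongest_harmonic(f0, max_hz=4000.0):
    """Frequency of the loudest harmonic for a voice with fundamental f0."""
    harmonics = [k * f0 for k in range(1, int(max_hz // f0) + 1)]
    return max(harmonics, key=envelope)


deep_voice = strongest_harmonic(100.0)   # low-pitched voice
high_voice = strongest_harmonic(240.0)   # higher-pitched voice

print(deep_voice, high_voice)
```

Although the two voices have very different fundamentals, the loudest harmonic in each lands close to the same resonance peak, so the overall shape of the sound is similar.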

Starlings and vocoders

The researchers set out to test complex sound recognition in a species of songbird, the European Starling. They trained starlings to recognize short sequences of four notes. Each of the four notes was played by a different musical instrument, meaning that each note had a different timbre, or sound quality. The sequence of notes either ascended from a low note to a high note or descended from high to low.

Once the starlings were recognizing the sequences with very high accuracy levels (around 90 percent), the researchers tested whether the birds could recognize the same sequences starting at a different pitch but played by the same musical instruments (that is, with the same timbre). Recognition plummeted to around chance levels.

The relationship of one note to the next wasn’t enough for the birds to recognize the shifted sequence, and neither was the timbre of the different musical instruments. But the researchers found that birds didn't recognize specific notes, either. They also tested the birds on sequences where they kept the pitch the same but changed the timbre to piano (not one of the original instruments used). Again, the birds were pretty bad at recognition—even though the sequences were the same as what they’d originally learned, just with a different instrument.

That suggests that the birds aren’t using absolute pitch by itself in their sequence recognition. They're using timbre, too. According to the authors, this means the birds are using the “absolute spectral envelope”: all the information that a single sound carries about the relationships among its internal frequencies.

To check this idea, the researchers ran a final test. They passed the note sequences through a vocoder, which preserved the absolute spectral envelope but removed absolute pitch information. Then they compared this setup to the same sequences on piano, which kept pitch intact but discarded other spectral information. The birds had a far easier time with the vocoder, suggesting that the totality of spectral information, not just pitch, is what they needed to identify the note sequences.
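The logic of the three tests can be laid out as a simple elimination exercise. This is only a sketch of the reasoning as described in this article, not the authors' analysis: outcomes are coded as a coarse yes or no rather than as measured accuracy, and any cue the text does not explicitly describe for a condition is left as `None` and makes no prediction.

```python
# For each test: was the cue kept (True), removed (False), or not stated
# in the text (None), relative to the sequences the birds were trained on?
CONDITIONS = {
    "transposed, same instruments": {
        "relative_pitch": True, "absolute_pitch": False, "spectral_envelope": None,
    },
    "same pitch, piano": {
        "relative_pitch": True, "absolute_pitch": True, "spectral_envelope": False,
    },
    "vocoder": {
        "relative_pitch": None, "absolute_pitch": False, "spectral_envelope": True,
    },
}

# Coarse yes/no coding of the outcomes described in the text.
RECOGNIZED = {
    "transposed, same instruments": False,
    "same pitch, piano": False,
    "vocoder": True,
}


def consistent(cue):
    """True if 'birds recognize a sequence exactly when this cue is kept' fits every test."""
    for name, cues in CONDITIONS.items():
        kept = cues[cue]
        if kept is not None and kept != RECOGNIZED[name]:
            return False
    return True


surviving = [
    cue
    for cue in ("relative_pitch", "absolute_pitch", "spectral_envelope")
    if consistent(cue)
]
print(surviving)
```

Under this coding, the transposed test rules out relative pitch, the piano test rules out absolute pitch, and only the spectral envelope explanation fits all three results.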

More like vowels than songs

The strange thing about this phenomenon is how little the starlings seemed to use pitch compared to how much humans use it in musical perception. Human and songbird capacities for music and song are startlingly similar in many ways, but this is a big and important difference.

When humans hear vocoder-processed music, it really changes what we think we’re hearing. To the birds, it didn’t change much at all. That's much closer to the human process of speech perception—we have no problem recognizing speech that has gone through a vocoder. The implication here is that bird sound recognition is closer to how humans process speech than to how we process music, the authors write.

Understanding these differences and similarities between birds and humans can help us understand our cognitive machinery better. Because sound perception can be studied in both humans and birds, the comparison can help focus research into the cognitive and physical tools behind it. So to make progress in understanding our own remarkable sound-processing abilities, we need to keep playing vocoder to birds. Poor birds.

PNAS, 2015. DOI: 10.1073/pnas.1515380113