The length of your vocal tract depends mostly on physiology: Women’s vocal folds tend to be higher up, so their tracts are shorter. The shape is largely based on where you put your tongue, like when you place the tip of your tongue between your teeth to make a th sound. By moving your tongue around in your mouth and opening and closing your lips, you change the sounds you’re making, and the formants you see in the spectrogram.

Chelsea Sanker, a phonetician at Brown University, looked at the spectrogram above to help me figure out what was going on. (For the record, when Sanker listened to the recording, she “[could] not hear it as having ls at all.” Point to yanny.)

First of all, the clip is, according to Sanker, “not prototypical” of either laurel or yanny. It’s somewhere in the middle. Sanker said the l/y discrepancy might come from the fact that the sound there isn’t velarized: the speaker’s tongue isn’t touching the back of their soft palate (the velum), the way many American English speakers’ tongues do when they say an l. The middle consonant is definitely not an n, Sanker said, but you might hear one because the vowel in front of it sounds particularly nasal. People who hear laurel are hearing a syllabic l in the second syllable, which has some similarities to the vowel sound at the end of yanny. Both are sonorants: you could go on singing them until you run out of air, as opposed to an obstruent like p or t.

One of the more interesting things to come out of the yanny/laurel debate was the discovery that, by changing the pitch of the recording, you could adjust what you heard. In general, people heard yanny more consistently when the pitch was lower and laurel when the pitch was higher.

This makes perfect sense. When it’s not being shifted around via computer program, the pitch of your voice depends on how thick and how tense your vocal folds are. It’s entirely independent of the formants, which are based on how long your vocal tract is and where you’re constricting it. In real life, when you raise or lower your voice, the formants remain unaffected. When vocal recordings are pitch-shifted, though, the formants are actually shifted, too. But even in shifted recordings, we’re still biased to think that the formants of low voices sound high, and the formants of high voices sound low.*
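To make the difference concrete, here is a minimal sketch in Python of the two situations described above. It is a toy model rather than real signal processing: a voice is reduced to one pitch and three formant frequencies, the starting values are illustrative assumptions (not measurements of the yanny/laurel clip), and the function names are made up for this example. It also assumes the simplest kind of pitch shifting, where the recording is effectively sped up or slowed down so that every frequency in it scales by the same ratio.

```python
# Toy model: a voice is a pitch (f0) plus a set of formant frequencies, all in Hz.
# The starting values are illustrative assumptions, not measurements of the clip.

def natural_pitch_change(f0, formants, semitones):
    """A speaker raising or lowering their own voice: pitch moves, formants stay put."""
    ratio = 2 ** (semitones / 12)
    return f0 * ratio, list(formants)

def resampled_pitch_shift(f0, formants, semitones):
    """Speeding up or slowing down a recording: every frequency scales by the same ratio."""
    ratio = 2 ** (semitones / 12)
    return f0 * ratio, [f * ratio for f in formants]

f0 = 120.0                          # assumed speaking pitch
formants = [500.0, 1500.0, 2500.0]  # assumed formants of a neutral vowel

# Drop the pitch by one octave (12 semitones) in each scenario.
spoken_f0, spoken_formants = natural_pitch_change(f0, formants, -12)
shifted_f0, shifted_formants = resampled_pitch_shift(f0, formants, -12)

print(spoken_f0, spoken_formants)    # 60.0 [500.0, 1500.0, 2500.0]
print(shifted_f0, shifted_formants)  # 60.0 [250.0, 750.0, 1250.0]
```

Both versions end up at the same pitch, but only the shifted recording has its formants dragged down along with it, which is the mismatch your ear then has to interpret.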



When the speaker’s voice is artificially lowered, we’re inclined to hear the formants as if they’ve been raised; if the speaker’s voice is raised, we think the formants sound lower. The sounds in yanny generally have higher formants and fewer dips than the sounds in laurel—to see for yourself, here are spectrograms of my colleague Robinson Meyer saying each word:

Plenty of things could be influencing your interpretation of yanny/laurel, including your dialect and whether you listened to the recording over a speaker or headphones. People have a tendency to try to match the sounds they hear to real words that they’ve heard before, like laurel. But reading yanny first, since it appears on the left side of the poll, could have primed listeners to hear it over laurel.