My sister has a rare talent for mishearing lyrics. When we were younger, song meanings would often morph into something quite different from their original intent. In one Wallflowers hit, for instance, she somehow turned “me and Cinderella” into “the incinerator.” My favorite, though, remains that classic of the swing age, “Drunk driving, then you wake up”—a garbling of the Louis Prima hit that saw a brief resurgence in the nineties, “Jump, Jive, an’ Wail.”

My sister’s creation of a night of drunk driving from jumping and jiving is actually a common phenomenon, with the curious name mondegreen. “Mondegreen” means a misheard word or phrase that makes sense in your head, but is, in fact, entirely incorrect. The term mondegreen is itself a mondegreen. In November, 1954, Sylvia Wright, an American writer, published a piece in Harper’s where she admitted to a gross childhood mishearing. When she was young, her mother would read to her from the “Reliques of Ancient English Poetry,” a 1765 book of popular poems and ballads. Her favorite verse began with the lines, “Ye Highlands and ye Lowlands / Oh, where hae ye been? / They hae slain the Earl Amurray, / And Lady Mondegreen.” Except they hadn’t. They left the poor Earl and “laid him on the green.” He was, alas, all by himself.

Hearing is a two-step process. First, there is the auditory perception itself: the physics of sound waves making their way through your ear and into the auditory cortex of your brain. And then there is the meaning-making: the part where your brain takes the noise and imbues it with significance. That was a car alarm. That’s a bird. Mondegreens occur when, somewhere between the sound and the meaning, communication breaks down. You hear the same acoustic information as everyone else, but your brain doesn’t interpret it the same way. What’s less immediately clear is why, precisely, that happens.

The simplest cases occur when we just mishear something: it’s noisy, and we lack the visual cues to help us out (this can happen on the phone, on the radio, across cubicles—basically anytime we can’t see the mouth of the speaker). One of the reasons we often mishear song lyrics is that there’s a lot of noise to get through, and we usually can’t see the musicians’ faces. Other times, the misperceptions come from the nature of the speech itself, for example when someone speaks in an unfamiliar accent or when the usual structure of stresses and inflections changes, as it does in a poem or a song. What should be clear becomes ambiguous, and our brain must do its best to resolve the ambiguity.

Human speech occurs without breaks: when one word ends and another begins, we don’t actually pause to signal the transition. When you listen to a recording of a language that you don’t speak, you hear a continuous stream of sounds that is more a warbling than a string of discernible words. Only over time do we learn where one word stops and the next one starts, by virtue of certain verbal cues combined with actual semantic knowledge. Different languages, for instance, have different general principles of inflection (the rise and fall of a voice within a word or a sentence) and syllabification (the stress patterns of syllables). Very young children can make mistakes that shed light on how the process actually develops. In “The Language Instinct,” Steven Pinker points out a few near-misses: “I am heyv!” as a response to “Behave!”; “I don’t want to go to your ami” in reply to a mention of going to Miami. People immersed in an environment with a new language often initially experience the same thing: an inability to tell clearly which words, exactly, should properly emerge from the sounds that are being spoken. Most likely, my sister’s unconventional talent stems partly from the fact that English is not our first language. For us, on a basic level, word processing will always be just a bit different from that of native English speakers.

A common cause of mondegreens, in particular, is the oronym: word strings in which the sounds can be logically divided multiple ways. One version that Pinker describes goes like this: Eugene O’Neill won a Pullet Surprise. The string of phonetic sounds can be plausibly broken up in multiple ways—and if you’re not familiar with the requisite proper noun, you may find yourself making an error. In similar fashion, Bohemian Rhapsody becomes Bohemian Rap City. Children might wonder why Olive, the other reindeer, was so mean to Rudolph. And a foreigner might become confused as to why, in this country, we entrust weather reports to meaty urologists or why so many people are black-toast intolerant. Oronyms result in not so much a mangling as an incorrect parsing of sounds when context or prior knowledge is lacking.
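
The parsing problem behind an oronym can be made concrete with a small sketch. It is only a toy, not a model of how the brain does it: the four-word lexicon and the simplified sound spellings ("rapsidee," "sidee," and so on) are made up for illustration, and the function name `segmentations` is hypothetical. Given one unbroken string of sounds, it lists every way of carving that string into known words.

```python
# Toy lexicon: each word is paired with a made-up, simplified sound spelling.
# These spellings are illustrative assumptions, not real phonetic transcriptions.
LEXICON = {
    "rhapsody": "rapsidee",
    "rap": "rap",
    "city": "sidee",
    "bohemian": "bohemeean",
}


def segmentations(sounds, lexicon=LEXICON):
    """Return every way to split an unbroken sound string into known words."""
    if not sounds:
        return [[]]  # exactly one way to segment nothing: no words at all
    results = []
    for word, spelling in lexicon.items():
        if sounds.startswith(spelling):
            # This word fits the front of the stream; parse whatever is left.
            for rest in segmentations(sounds[len(spelling):], lexicon):
                results.append([word] + rest)
    return results


for parse in segmentations("bohemeeanrapsidee"):
    print(" ".join(parse))
```

With this lexicon, the same stream of sounds comes back as both "bohemian rhapsody" and "bohemian rap city." Nothing in the sounds themselves favors one parse over the other; only context or prior knowledge can.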

Other times, the culprit is the perception of the sound itself: some letters and letter combinations sound remarkably alike, and we need further cues, whether visual or contextual, to help us out. In their absence, one sound can be mistaken for the other. For instance, in a phenomenon known as the McGurk effect, people can be made to hear one consonant when a similar one is being spoken, simply by watching a face that is mouthing a different sound. “There’s a bathroom on the right” standing in for “there’s a bad moon on the rise” is a succession of such similarities adding up to two equally coherent alternatives. (Peter Kay offers an auditory tour of some other misleading gems.)

What usually prevents us from being tripped up by phonetics is the context and our own knowledge. When we hear a word or phrase, our brain’s first cue is the actual sounds, in the order in which they are produced. According to the cohort model—one of the leading theories of auditory word processing—when we hear sounds, a number of related words are activated all at once in our heads, words that either sound the same or have component parts that are the same. Our brain then chooses the one that makes the most sense. For instance, if I’m talking about the role of the syllable in language comprehension, you’re also, on some level, thinking about a silly-looking ball rolling away. You’re also considering the smaller snippets that form each word’s makeup: roe, along with roll; sill, along with silly and syllable. Only after I say the whole phrase do you understand what I’m saying. Songs and poems, in some sense, lie between conversational speech and a foreign language: we hear the sounds but don’t have the normal contextual cues. It’s not as if we were mid-conversation, where the parameters have already been set.
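
The narrowing-down that the cohort model describes can be sketched in a few lines. This is a heavy simplification under stated assumptions: the lexicon and its sound spellings are made up, the names `cohort` and `narrowing` are hypothetical, and candidates are matched only from the start of a word, with no role for context or meaning. As each new sound arrives, words that no longer fit drop out of the running.

```python
# Toy lexicon: word -> made-up sound spelling (an illustrative assumption only).
LEXICON = {
    "sill": "sil",
    "silly": "silee",
    "syllable": "silubul",
    "roe": "ro",
    "roll": "rol",
    "role": "rol",
}


def cohort(heard, lexicon=LEXICON):
    """Words whose sound spelling begins with the sounds heard so far."""
    return sorted(w for w, s in lexicon.items() if s.startswith(heard))


def narrowing(sounds, lexicon=LEXICON):
    """Yield (sounds heard so far, surviving candidates) after each new sound."""
    for i in range(1, len(sounds) + 1):
        yield sounds[:i], cohort(sounds[:i], lexicon)


for heard, candidates in narrowing("silubul"):
    print(f"{heard:<8} {candidates}")
```

For the first three sounds, "sill," "silly," and "syllable" are all still live; only at the fourth does "syllable" stand alone. "Roll" and "role," which share a sound spelling here, never separate on sound at all, which is where the sense of the sentence has to step in.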

Along with knowledge, we’re governed by familiarity: we are more likely to select a word or phrase that we’re familiar with. This is the word-frequency effect, a processing counterpart to Zipf’s law (the observation that a few words in a language are used very often and most are used rarely): the more frequently a word occurs, the more seamlessly it tends to be processed. If you’re a member of the crew team, you’re far more likely to select “row” instead of “roe” from an ambiguous sentence. If you’re a chef, the opposite is likely. Part of the reason that “Excuse me while I kiss this guy,” substituted for Jimi Hendrix’s “Excuse me while I kiss the sky,” remains one of the most widely reported mondegreens of all time is frequency. It’s much more common to hear of people kissing guys than skies. Expectations, too, play a role. You’re much more likely to mishear “Cry Me a River” as “Crimean River” if you’ve recently been discussing the situation in Ukraine.