Editor's note: The following excerpt from a book about swear words contains many, many swear words. Some of them are pretty ugly, but it's all in the name of linguistics.

For every profane holy, fucking, and shit, there’s a technical and anodyne liturgical, copulation, or excretion. For every cock, there’s a childlike wee-wee. Many words describing sexual organs, excretory functions, and so on fail to rise to the heights (or, if you prefer, sink to the depths) of profanity. These words are articulated without fear of offending, whether in the classroom or the courtroom or the examination room. They aren’t profane, despite referring to taboo concepts. This means that something beyond what a word denotes—what it refers to—must cement it as profanity.

What is that thing?

The most obvious possibility is that some aspect of how profane words are written or sound makes them vulgar. For example, many English profane words famously have four letters—fuck, shit, piss, cock, tits, and many others.

Excerpted from What the F: What Swearing Reveals About Our Language, Our Brains, and Ourselves by Benjamin K. Bergen. Basic Books.

Take a list of eighty-four commonly used swear words. Of the eighty-four words, twenty-nine are spelled with four letters. By this count, then, just over a third of profane words are four-letter words. This number may be artificially deflated, since many of the longer words (like asshole, motherfucker, and wanker) have shorter four-letter words embedded inside them. But it’s a good start.

The first thing to notice from this is that having four letters isn’t a prerequisite for profanity. Certainly, we already knew this: words like ass and motherfucker don’t have four letters, and most of the words on the list have some number of letters other than four. Nor is having four letters sufficient, since many four-letter words are not at all profane, like four or word. So we have to reconsider the question we’re asking. The real issue seems to be whether having four letters makes a word more likely to be profane, all other things being equal.

You can see on the following chart that the answer is yes: there’s a sharp spike at four, representing those twenty-nine four-letter profane words. But is twenty-nine a lot? You can tell by comparing the lengths of profane words in blue bars with English words in general, shown in the red bars.

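[Chart: lengths of profane words (blue bars) compared with English words in general (red bars). Credit: Benjamin K. Bergen/WIRED]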

As you can see, English has a lot of words with four, five, six, or seven letters. And in general English looks like a smoother version of the profane distribution. But what really sticks out is how many more profane four-letter words there are than expected from English in general.

The 29 profane four-letter words in our list are significantly more than you’d expect if profane words were like English words in general, in which case we’d expect only 12.6 profane four-letter words out of 84.
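The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration, not the analysis the book reports: the baseline share of four-letter words is back-computed here from the 12.6-out-of-84 figure, the binomial model (each word independently has four letters or not) is an assumption, and the function and key names are invented for the example.

```python
import math

# Figures taken from the passage above.
TOTAL_PROFANE_WORDS = 84
OBSERVED_FOUR_LETTER = 29
EXPECTED_FOUR_LETTER = 12.6


def four_letter_excess(total, observed, expected):
    """Summarize how far the observed count sits above the expected one.

    Assumption (not from the source): word lengths are modeled as a simple
    binomial draw, so the spread around the expected count is
    sqrt(n * p * (1 - p)).
    """
    baseline_share = expected / total  # implied share of four-letter words in English
    spread = math.sqrt(total * baseline_share * (1 - baseline_share))
    return {
        "baseline_share": baseline_share,
        "ratio": observed / expected,
        "z_score": (observed - expected) / spread,
    }


summary = four_letter_excess(
    TOTAL_PROFANE_WORDS, OBSERVED_FOUR_LETTER, EXPECTED_FOUR_LETTER
)
print(summary)
```

Under these assumptions the list holds more than twice as many four-letter words as the baseline predicts, several standard deviations above the expected count.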

Perhaps more surprising is how many profane three- and five-letter words there are. There are relatively few three-letter words in English overall, and profane words are almost twice as likely to have three letters as you’d expect, all things being equal. We’ll come back to this in a moment, because it’s important. Less important but also notable is the little bump in eight-letter profane words, compared with the language in general. This is due to words composed of two four-letter words, like bullshit and shithead. Four-letter words appear to bend how English words look even when they’re merely parts of other words. But for our present purposes, it’s enough to note that profanity in English is strikingly more likely to have four letters than other words. The take-away is that there’s some truth to the popular notion about four-letter words.

What’s So Special About Four

So this raises the obvious question, why? Why are profane words more likely than other words to have four letters? What’s special about how these three- and four-letter words sound?

The three-letter words included in the list are ass, cum, fag, gay, god, Jew, and tit. And the four-letter words are anal, anus, arse, clit, cock, crap, cunt, dick, dumb, dyke, fuck, gook, homo, jerk, jism, jugs, kike, Paki, piss, scum, shag, shit, slag, slut, spic, suck, turd, twat, and wank.

Do you notice any general trend in how these words are pronounced? First, regardless of how many letters they’re spelled with, they tend to be pronounced with just one syllable. When you pronounce bitch and shit normally, they’re only one syllable long. Just a few words on the list have more than one syllable: anal, anus, homo, Paki, and, arguably, jism.

Now, this can’t possibly be the whole story, because there are thousands of one-syllable words in English, and most of them aren’t taboo. The profane words are but a speck in a sea of monosyllables. And if we’re just looking at three- and four-letter words, it’s no surprise that they’ll tend to be pronounced with one syllable or two. But these words don’t just tend to be monosyllabic. They tend to be built in a particular way. English allows many different types of syllable. Every syllable has a vowel at its core. For some syllables, the vowel is both the beginning and the end (the alpha and the omega, as it were), as in words like a, I, and uh. (Don’t be confused by spelling: there’s no h in the pronunciation of uh.)

But most syllables also have consonants in them, before or after the vowel. So with this in mind, we can return to English profanity. If you briefly revisit the words in the lists above, you may notice something remarkable about their syllables.

Almost every word on those lists ends with one or more consonants. That is, nearly all of them have “closed syllables” rather than syllables sporting bare vowels. As you can see, many profane words even double down on their final consonants. Words like cunt and wank actually have two consonant sounds at the end. Interestingly, consonants seem pretty important in general: all but a few (like ass or arse) begin with at least one consonant, and many begin with two, like crap, prick, slut, and twat. But really the strong generalization here appears to be that syllables of profane words tend to be closed.

Could these two tendencies—a trend toward having just one syllable and another toward that one syllable being closed—be part of what makes profane words sound profane?

We can start to answer this by splitting our data in a different way—based not on how many letters a word is spelled with but on how many syllables it has and whether those syllables are closed. When we do that, we find that not just the three- and four-letter words are closed monosyllables; so are seven of the sixteen five-letter words, like balls, bitch, prick, and whore, but not Jesus or pussy. In all, thirty-eight of the eighty-four words on the list are one syllable long, and thirty-six of these (or 95 percent) are closed. Only two profane words on the list, Jew and gay, are “open” monosyllables. How does this ratio compare to the words of English more generally? I took the top 10 percent most frequent monosyllabic words from the MRC Psycholinguistic Database. It turns out that whereas 95 percent of our profane monosyllabic words are closed syllables, that number drops down to 81 percent when you look at nonprofane words, which is significantly lower.
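One way to get a feel for whether 36 closed monosyllables out of 38 is unusual is to ask how often a result that lopsided would turn up if profane words were closed at the general rate of 81 percent. The sketch below does that with an exact binomial tail. It is a rough check under stated assumptions, not the test behind the passage: the source does not say which test was used or how many nonprofane words were sampled, so the 81 percent figure is treated as a fixed baseline, and the function name is invented for the example.

```python
from math import comb

# Figures taken from the passage above.
PROFANE_MONOSYLLABLES = 38
PROFANE_CLOSED = 36
BASELINE_CLOSED_SHARE = 0.81  # assumption: treated as an exact, known rate


def binomial_upper_tail(successes, trials, p):
    """Probability of at least `successes` hits in `trials` draws at rate `p`."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


closed_share = PROFANE_CLOSED / PROFANE_MONOSYLLABLES
tail = binomial_upper_tail(
    PROFANE_CLOSED, PROFANE_MONOSYLLABLES, BASELINE_CLOSED_SHARE
)
print(f"closed share among profane monosyllables: {closed_share:.3f}")
print(f"chance of 36 or more closed out of 38 at the baseline rate: {tail:.4f}")
```

A small tail probability here is consistent with the claim that profane monosyllables are closed more often than English monosyllables in general.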

You can probably find some profane open monosyllables. Like, potentially, ho, lay, poo, and spoo. These are good candidates. Maybe you can come up with one or two more. But consider: boob, bung, butt, chink, cooch, coon, damn, dong, douche, dump, felch, FOB, gook, gyp, hebe, hell, jap, jeez, jizz, knob, mick, MILF, mong, muff, nads, nards, nip, poon, poop, pube, pud, puke, puss, queef, quim, schlong, slant, slope, smeg, snatch, spank, spooge, spunk, taint, tard, THOT, toss, twink, vag, wang, and wop. And I’m only getting started. Run the numbers again with these new open and closed monosyllabic words, and you still have upward of nine out of ten profane monosyllables that are closed.
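Rerunning the numbers takes only a few lines. The sketch below is a hand tally rather than the author's own computation: the counts of new candidates come from counting the words named in the paragraph above, one closed candidate (gook) is dropped because it already appears in the earlier four-letter list, and any overlap with the longer words of the original list, which the excerpt does not fully enumerate, is not checked.

```python
# Counts from the original list of eighty-four words, as reported in the text.
ORIGINAL_CLOSED = 36
ORIGINAL_OPEN = 2  # Jew and gay

# Hand-tallied from the paragraph above (assumption: tally is by spelling only).
NEW_OPEN_CANDIDATES = 4     # ho, lay, poo, spoo
NEW_CLOSED_CANDIDATES = 51  # boob through wop
ALREADY_COUNTED_CLOSED = 1  # gook also appears in the earlier four-letter list


def closed_share(closed, open_):
    """Share of monosyllables that are closed."""
    return closed / (closed + open_)


closed_total = ORIGINAL_CLOSED + NEW_CLOSED_CANDIDATES - ALREADY_COUNTED_CLOSED
open_total = ORIGINAL_OPEN + NEW_OPEN_CANDIDATES
share = closed_share(closed_total, open_total)

print(f"closed: {closed_total}, open: {open_total}, share closed: {share:.1%}")
```

With this tally the closed share stays above nine in ten, in line with the estimate in the text.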

This pattern is statistically real, but we really want to know whether it’s psychologically real too. Do English speakers think that closed monosyllables sound more profane than open monosyllables? There are different ways to figure this out. Here’s one type of circumstantial evidence. When English speakers invent new, fictional swear words, do they tend to be closed? For instance, when English-speaking fantasy and science fiction writers invent new profanity in imaginary languages, what do those words sound like? Battlestar Galactica has frak (“fuck”). Farscape has frell (also “fuck”). Mork & Mindy had shazbot (a generic expletive). Dothraki, the invented language in HBO’s Game of Thrones, has govak (“fucker”) and graddakh (“shit”). Not all are monosyllabic, but they all end with closed syllables. In fact, it’s very hard to find fictional profanity ending with open syllables. The one glaring counterexample I’ve been able to dig up comes from the movie Star Wars: Episode I, in which poodoo means “bantha fodder” and is used as a weak expletive. Just by way of speculation, the open syllable might have been selected because the target audience of the movie appears to have been quite young (it was rated PG), and so a more profane-sounding fictional profanity could have felt too strong.

The Rich Get Richer

So not only does English profanity tend to be pronounced with closed monosyllables, but the circumstantial evidence suggests that English speakers also hear closed monosyllables as more profane than open ones. In terms of how languages work in general, this isn’t entirely unprecedented. Sometimes within a language, you will find clusters of words with similar meanings that happen to have similar forms. Consider words in English that have meanings related to light or vision. Many of them happen to start with gl. I’ll give you a few: glisten, glitter, gleam, glow, glare, glint. And there are many more, from glaucoma to glower. We’ve uncovered a little dense spot in the English lexicon where words with similar meanings have similar forms for no better reason than that they do.

The story of how these sets of similar words come about goes something like this. In general, words arbitrarily pair together forms and meanings. But because the words of any language are governed in part by chance, there will happen to be some places in the lexicon of a language where a couple of words that have similar meanings happen also to have similar forms. People who learn and use this language may notice these little clusters, or they may not (for example, you may or may not have noticed English gl-words before), but over time the clusters will act as a form of attractor for new words.

Old words that are misheard, mislearned, or misremembered will be slightly more likely to gravitate toward the form and meaning of a cluster, which appears to have happened in the history of the gl-words in English. And new words that people invent will also be attracted to the clusters such that they’re slightly more likely than chance to have meanings and forms aligned with the growing pattern. This, too, has happened in the history of English: see examples like glitzy (in 1966) and glost (a glaze used in pottery, in 1875).

It’s also a factor in product naming—imagine which glass-cleaning spray you’d prefer to buy: Brisserex or Glisserex. Over centuries, maybe even millennia, these clusters are reinforced in a kind of rich-get-richer process until you have English, where a healthy 39 percent of words starting with gl relate to light or vision.

And perhaps this is what happened with English profanity. Perhaps through historical accident there came to be a core set of profane English words that happen to be pronounced with a closed monosyllable. They exerted a gravitational tug on words around them—existing words came to be pronounced similarly, and newly coined words were more likely to follow the same pattern. We can see this in our newest profanity: monosyllabic acronyms like MILF, THOT, and FOB tend to be closed.

Excerpted from What the F: What Swearing Reveals About Our Language, Our Brains, and Ourselves by Benjamin K. Bergen. 2016. Available from Basic Books, an imprint of Perseus Books, a division of PBG Publishing, LLC, a subsidiary of Hachette Book Group, Inc.