“When you take into account the complexity of what’s going into these short turns, you start to realize that this is an elite behavior,” says Levinson. “Dolphins can swim amazingly fast, and eagles can fly as high as a jet, but this is our trick.”

Conversation analysts first started noticing the rapid-fire nature of spoken turns in the 1970s, but had neither the interest in quantifying those gaps nor the tools to do so. Levinson had both. A few years ago, his team began recording videos of people casually talking in informal settings. “I went to people who were sitting outside on the patio and asked if it was okay to set up a video camera for a study,” says Tanya Stivers.

While she recorded Americans, her colleagues did the same around the world, for speakers of Italian, Dutch, Danish, Japanese, Korean, Lao, ≠Akhoe Hai//om (from Namibia), Yélî-Dnye (from Papua New Guinea), and Tzeltal (a Mayan language from Mexico). Despite the vastly different grammars of these ten tongues, and the equally vast cultural variations between their speakers, the researchers found more similarities than differences.

The typical gap was 200 milliseconds long, rising to 470 for the Danish speakers and falling to just 7 for the Japanese. So, yes, there’s some variation, but it’s pretty minuscule, especially when compared to cultural stereotypes. There are plenty of anecdotal reports of minute-long pauses in Scandinavian chat, and virtually simultaneous speech among New York Jews and Antiguan villagers. But Stivers and her colleagues saw none of that.

Instead, they uncovered what Levinson describes as a “basic metabolism of human social life”: a universal tendency to minimize the silence between turns, without overlaps. (Overlaps happened in only 17 percent of turns, typically lasted for just 100 milliseconds, and were mostly slight misfires where one speaker unexpectedly drew out their last syllable.)

The brevity of these silences is doubly astonishing when you consider that it takes at least 600 milliseconds for us to retrieve a single word from memory and get ready to actually say it. For a short clause, that processing time rises to 1500 milliseconds. This means that we have to start planning our responses in the middle of a partner’s turn, using everything from grammatical cues to changes in pitch. We continuously predict what the rest of a sentence will contain while simultaneously building our hypothetical rejoinder, all using largely overlapping neural circuits.
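
To make that arithmetic concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above. The function name `planning_lead_ms` is a hypothetical helper, not anything from the study, and it rests on a simplifying assumption: that preparation runs uninterrupted and the reply begins exactly one gap after the partner stops.

```python
# Figures quoted in the article, all in milliseconds.
TYPICAL_GAP_MS = 200     # typical silence between turns
WORD_PREP_MS = 600       # minimum time to retrieve one word and get ready to say it
CLAUSE_PREP_MS = 1500    # processing time for a short clause


def planning_lead_ms(prep_ms: int, gap_ms: int = TYPICAL_GAP_MS) -> int:
    """Return how long before the partner's turn ends planning must begin.

    Hypothetical helper (an illustration, not the researchers' method):
    assumes preparation is uninterrupted and the reply starts exactly
    gap_ms after the partner falls silent.
    """
    # If preparation fits inside the gap, no head start is needed.
    return max(0, prep_ms - gap_ms)


word_lead = planning_lead_ms(WORD_PREP_MS)
clause_lead = planning_lead_ms(CLAUSE_PREP_MS)
print(f"single word:  start at least {word_lead} ms before the turn ends")
print(f"short clause: start at least {clause_lead} ms before the turn ends")
```

Under these assumptions, a one-word reply needs a head start of roughly 400 milliseconds and a short clause roughly 1,300. Passing a different `gap_ms`, such as the 470 reported for Danish speakers or the 7 for Japanese speakers, shows how little the conclusion changes: in every case, planning has to begin while the other person is still talking.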

“It’s amazing, like juggling with one hand,” says Levinson. “It’s been completely ignored by the cognitive sciences because traditionally, people who studied language comprehension were different to the ones who studied language production. They never stopped to think that, in conversations, these things are happening at the same time.”