We know a lot about language, but we know very little about how speech developed. Did we start with gesturing and grunts? Beating our chests and pointing? Most linguists agree that some combination of movement and sound probably got us started. But how did we decide which sounds to use for various words? Now, an experimental game has shown that speakers of English might use qualities like the pitch and volume of sounds to describe concepts like size and distance when they invent new words. If true, some of our modern words may have originated from so-called iconic, rather than arbitrary, expression—a finding that would overturn a key theory of language evolution.

For years, mainstream linguists have held that the sounds making up most words bear no inherent relation to their meanings. A few words—think “splash” and “bow-wow” in English—clearly have their origins in the noises of the natural world, and the universal “mama” might be the result of an infant puckering up for a kiss of milk. These kinds of words have what linguists refer to as iconicity—the ability to evoke an image in the mind’s eye. But the vast majority of words, from “fish” to “sushi,” are arbitrary. Or at least that’s what linguists thought.

To explore the idea, researchers asked pairs of students at the University of California, Santa Cruz, to invent new words for 18 contrasting ideas: up, down, big, small, good, bad, fast, slow, far, near, few, many, long, short, rough, smooth, attractive, and ugly. Their partners then had 10 seconds to guess which of the ideas the “word inventors” were describing. Although the pairs sat face to face, the students weren’t allowed to use body language or facial expressions, and they weren’t allowed to use sounds that resembled existing English words.

Surprisingly enough, the partners scored better than chance on the first round. And during subsequent rounds of the game, students got faster and more accurate at guessing which word was being created. After analyzing the data, study author Marcus Perlman, a cognitive scientist at the University of Wisconsin, Madison, says the guessers were successful because the inventors consistently used certain types of vocalizations with certain words. For example, made-up words for “up” had a rising pitch, whereas made-up words for “down” had a falling pitch. “Slow” had a long duration and a low pitch, whereas “fast” had a short duration and a high pitch. And “smooth” had a high degree of harmonicity, whereas “rough” had a high degree of the opposite quality—noise. Overall, each of the new words varied reliably from its opposite in at least one feature, and 57% of the words had unique prosodic “calling cards.”

Perhaps more surprising, when Perlman ran a second experiment with subjects recruited from a labor crowdsourcing website, they, too, guessed better than chance when listening to sample vocalizations. Their guesses were not nearly as accurate as those of the face-to-face participants (35.6% right versus 82.2%), but they had only one round in which to make their guess.

“It’s interesting to me that people are so consistent in their ideas of how to express these different meanings,” Perlman says. “[Students playing the game] are nervous at first, and they don’t have any idea of how to express these meanings the first time through. But, lo and behold, they are actually very consistent in what they do. They all share similar intuitions.”

This isn’t the first time that sound has been linked to meaning. In a series of cross-linguistic experiments, researchers have demonstrated something they call the “kiki-bouba” effect. Subjects consistently link round objects with rounded, back vowels (like the “ou” in bouba), and they link sharp, angular objects with unrounded, front vowels (like the “ee” sound in kiki). And developmental studies have shown that children often grasp iconic words first, whether they are using spoken language or sign language. Sotaro Kita, a psycholinguist at the University of Warwick in the United Kingdom who was not involved in the study, says the Perlman work is “theoretically very important,” and could “knock out” a common explanation for language evolution: that humans developed gestural language first, and only much later moved on to spoken language. Instead, says Kita, it is much likelier that gestures and spoken language evolved in lockstep.

“This study is informative about how language might have emerged because it asks people to create acoustic labels for different concepts,” Kita says. “So there is a kind of universal code that people are tapping into to express these concepts.”

Of course, the study used only native English speakers, and Kita says it’s crucial to test speakers of other languages to see if the findings hold. Perlman is already taking that advice to heart. Recently, he ran a similar study among deaf children at a boarding school in rural China. He found that the deaf children and their hearing counterparts, all native Mandarin speakers, consistently used longer and louder sounds to make up words for big objects and shorter and softer sounds to make up words for small objects. There was one difference from the English speakers: They used higher pitches for bigger objects and lower pitches for smaller objects. Perlman suspects the tendency may have something to do with Chinese folk performances, which use high pitch to express strength and power.

University College London psycholinguist Gabriella Vigliocco praises Perlman’s work with the English speakers and says that future studies should also expand the classes of words used in the experiment. “From the data they have, there’s a big jump to the conclusions that they are making. But you have to start somewhere. I’m quite sympathetic with the conclusions.”