As tone languages go, Mandarin is by no means the most complicated. The Hmong language, spoken in China, Vietnam, Laos, and Thailand, can have seven or even eight tones. It’s dazzling, really. If you say paw like a statement, it means “female.” Say it like a question and it means “to throw.” Say it up high in an impatient way and you’re saying “ball.” Say it down low as if you ran into someone in a basement and didn’t want anyone upstairs to know you were down there, and it means “thorn.” Say it in a tone between the impatient high and the down-low and it means “pancreas.” If you say paw in a creaky way—kind of like the way one might imitate an elderly person’s voice—then it means “to see,” while if you say it in a breathy, amazed way as if you were seeing a horsey in the clouds, then it means “paternal grandmother.” (For what it’s worth, maternal grandmother is tai, said on the “basement” tone.)

Tone languages are spoken all over the world, but they tend to cluster in three places: East and Southeast Asia, sub-Saharan Africa, and the indigenous communities of Mexico. Why there and not elsewhere? One thing these regions might have in common is heat, though it’s hard to imagine how that would make people speak more melodically. Yet environment may not be entirely unrelated to the phenomenon: according to one hypothesis, tone languages are less likely to develop in dry environments because dry air deprives the vocal cords of the suppleness required to produce subtle differences in tone.

The jury is still out on that one, but even if it turns out to be true, it only gets us so far. The theory implies that where the climate isn’t dry, there’s no predicting whether a language will take on tones or not. So it’s easy to suppose, and fun to imagine, that people decide to “sing” language out of some kind of cultural impulse. The reality is less groovy, but just as interesting.

It’s ultimately a matter of one thing leading to another. Take the words “pay” and “bay.” It looks like the only difference between them is that they start with different letters, but there’s more to it up close. English-speakers tend to say the ay sound on a slightly lower pitch after a b than after a p, because of the different mechanics involved in saying those consonants. That is, one tends to say “pay” a little higher than one says “bay.” In daily life that’s so subtle as to be barely noticeable: What stands out is the good old difference between p and b. But p and b are very similar sounds, and sounds that are similar have a way of melting together—a Cockney English-speaker can say “bref” for breath and “fing” for thing because the f and th sounds are made close together at the front of the mouth. Suppose as time went by English-speakers started pronouncing b as p so that there was no more b sound at all?