Read a few lines of Chaucer or Shakespeare and you'll get a sense of how the English language has changed during the past millennium. Linguists catalogue these changes and work to discern why they happened. Meanwhile, evolutionary biologists have been doing something similar with living things, exploring how and why certain genes have changed over generations.

In a new study published in Nature, researchers from these two academic fields at the University of Pennsylvania have joined forces to tackle a fundamental question about how languages evolve: whether a given language change occurred by random chance or was driven by a selective force.

Examining substantial collections of annotated texts dating from the 12th to the 21st centuries, the researchers found that certain linguistic changes were guided by pressures analogous to natural selection -- social, cognitive and other factors -- while others seem to have occurred purely by happenstance.

"Linguists usually assume that when a change occurs in a language, there must have been a directional force that caused it," said Joshua Plotkin, professor of biology in Penn's School of Arts and Sciences and senior author on the paper. "Whereas we propose that languages can also change through random chance alone. An individual happens to hear one variant of a word as opposed to another and then is more likely to use it herself. Chance events like this can accumulate to produce substantial change over generations. Before we debate what psychological or social forces have caused a language to change, we must first ask whether there was any force at all."

"One of the great early American linguists, Leonard Bloomfield, said that you can never see a language change, that the change is invisible," said Robin Clark, a coauthor and professor of linguistics in Penn Arts and Sciences. "But now, because of the availability of these large corpora of texts, we can actually see it, in microscopic detail, and begin to understand the details of how change happened."

Plotkin and Clark joined with lead authors Christopher A. Ahern, a Ph.D. student in the Department of Linguistics, and Mitchell G. Newberry, a Ph.D. student in the Department of Biology, on the work.


Just as genomic analyses require massive amounts of data to see signs that one gene or another has risen in frequency over time in response to a selective pressure, this linguistic analysis required a large database of texts written over centuries to determine the role of selection in language evolution. These corpora are the result of generations of work, much of it by Penn linguists, to parse written texts and annotate parts of speech.

The researchers chose three well-characterized English language changes to evaluate for signs of selection.

One change is the regularization of past-tense verbs. Using the Corpus of Historical American English, comprising more than 100,000 texts ranging from 1810 to 2009 that have been parsed and digitized -- a database that includes more than 400 million words -- the team searched for verbs where both regular and irregular past-tense forms were present, for example, "dived" and "dove" or "wed" and "wedded."

They identified 36 such verbs. Using an analytical technique that Plotkin and colleagues had developed to detect natural selection in microbial populations, they studied the changing frequency of the different verb forms over time to determine whether one form had risen to dominance because of selective forces or by chance.
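The release does not name the technique. As an assumption, the sketch below follows the Frequency Increment Test (FIT), a published test from Plotkin's group for detecting selection in time series of variant frequencies. Under neutral drift, the change in a variant's frequency from one time point to the next, rescaled by sqrt(2 v (1 - v) Δt), should average out to zero; a one-sample t-statistic far from zero hints at directional selection. This is a minimal, standard-library illustration, not the authors' code: the function name `fit_statistic` and the example numbers are hypothetical, and a real analysis would also bin texts into time windows and turn the statistic into a p-value.

```python
import math
import statistics


def fit_statistic(times, freqs):
    """Hypothetical sketch of a Frequency Increment Test (FIT) statistic.

    times: strictly increasing time points (e.g. decade midpoints)
    freqs: share of one variant (e.g. "dove" vs. "dived") at each time point
    Returns a one-sample t-statistic for "mean rescaled increment = 0",
    which is what neutral drift predicts.
    """
    if len(times) != len(freqs) or len(times) < 3:
        raise ValueError("need at least three matching time/frequency points")
    increments = []
    for t0, v0, t1, v1 in zip(times, freqs, times[1:], freqs[1:]):
        if not 0.0 < v0 < 1.0:
            raise ValueError("each earlier frequency must lie strictly in (0, 1)")
        if t1 <= t0:
            raise ValueError("times must be strictly increasing")
        # Rescale so that, under neutral drift, every increment has roughly the
        # same variance regardless of the starting frequency or the time gap.
        increments.append((v1 - v0) / math.sqrt(2.0 * v0 * (1.0 - v0) * (t1 - t0)))
    sd = statistics.stdev(increments)
    if sd == 0.0:
        raise ValueError("rescaled increments have zero variance")
    # Mean divided by its standard error: large |t| suggests selection.
    return statistics.mean(increments) / (sd / math.sqrt(len(increments)))


# Invented data: a variant whose share rises steadily every 20 years.
years = [1850, 1870, 1890, 1910, 1930]
share = [0.10, 0.20, 0.30, 0.40, 0.50]
print(fit_statistic(years, share))
```

A steady, one-directional rise like this yields a large positive statistic, while a share that merely wobbles up and down yields a value near zero, which is the pattern expected from chance alone.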

For six of these verbs, the team found evidence of selection. In four of these cases, selection favored the irregular past-tense form.


"There is a vast literature and a lot of mythology on verb regularization and irregularization," Clark said, "and a lot of people have claimed that the tendency is toward regularization. But what we found was quite different."

Indeed, the analysis pointed to particular instances where selective forces seem to be driving irregularization. For example, while a swimmer 200 years ago might have "dived," today we would say they "dove." The shift toward this irregular form coincided with the invention of cars and the concomitant increase in use of the rhyming irregular verb "drive"/"drove."

The use of "quit" instead of "quitted" is another example; it coincides with an overall increase in use of the rhyming irregulars "hit" and "split." Meanwhile, "split" has taken on a new meaning since 1900: to depart.

"If you have a phonetic neighborhood with lots of rhyming irregular verbs, it acts like a gravitational force and makes it more likely that the past tense of other rhyming verbs will irregularize," said Clark.

Despite finding selection acting on some verbs, "the vast majority of verbs we analyzed show no evidence of selection whatsoever," Plotkin said.

The team recognized a pattern: random chance affects rare words more than common ones. When rarely used verbs changed, the replacement was more likely to be due to chance. But when more common verbs switched forms, selection was more likely to be a factor driving the replacement.
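The release does not explain why rare words are more exposed to chance. A standard population-genetics intuition is that random sampling shifts a variant's frequency by an amount whose variance is about p(1 - p)/N per generation, so smaller populations drift more. Assuming, for illustration only, that the number of times a verb is used per generation plays the role of N, rarely used verbs should wander further by chance alone. The toy neutral Wright-Fisher simulation below shows this; the function `wright_fisher_drift` and its parameters are hypothetical and not taken from the study.

```python
import random
import statistics


def wright_fisher_drift(n_tokens, p0=0.5, generations=10, replicates=200, seed=0):
    """Hypothetical toy model: mean absolute frequency change under pure chance.

    Each generation, n_tokens new uses are produced by copying uses drawn at
    random from the previous generation (neutral Wright-Fisher sampling), so
    neither variant has any built-in advantage.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    changes = []
    for _ in range(replicates):
        p = p0
        for _ in range(generations):
            # Binomial draw: how many of the n_tokens copies use variant A.
            count = sum(rng.random() < p for _ in range(n_tokens))
            p = count / n_tokens
        changes.append(abs(p - p0))
    return statistics.mean(changes)


# A rarely used verb (20 tokens per generation) vs. a common one (1,000).
for n in (20, 1000):
    print(n, round(wright_fisher_drift(n), 3))
```

The rare "verb" typically ends up much further from its starting frequency than the common one, even though no selective force acts on either.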

The authors also observed a role of random chance in grammatical change. The periphrastic "do," as used in, "Do they say?" or "They do not say," did not exist 800 years ago. Back in the 1400s, these sentiments would have been expressed as, "Say they?" or "They say not."

Using the Penn Parsed Corpora of Historical English, which include 7 million syntactically parsed words from 1,220 British English texts, the researchers found that the use of the periphrastic "do" emerged in two stages: first in questions ("Don't they say?") around the 1500s, and then roughly 200 years later in imperative and declarative statements ("They don't say.").

While most linguists have assumed that such a distinctive grammatical feature must have been driven to dominance by some selective pressure, the Penn team's analysis questions that assumption. They found that the first stage of the rising periphrastic "do" use is consistent with random chance. Only the second stage appears to have been driven by a selective pressure.

"It seems that, once 'do' was introduced in interrogative phrases, it randomly drifted to higher and higher frequency over time," said Plotkin. "Then, once it became dominant in the question context, it was selected for in other contexts, the imperative and declarative, probably for reasons of grammatical consistency or cognitive ease."

The researchers also confirmed longstanding hypotheses about selection operating to change the form of verbal negation, as "Ic ne secge" changed to "I ne seye not" and then to "I say not," from Old to Early Modern English. Previous support for this hypothesis relied on comparison across multiple languages, whereas the Penn team established the same result based on data from English alone.

The research team is continuing its collaboration, with plans to explore the forces at work in linguistic features such as baby naming as well as the evolution of spoken language.

As the authors see it, it's only natural that social-science fields like linguistics increasingly exchange knowledge and techniques with fields like statistics and biology.

"To an evolutionary biologist," said Newberry, "it's important that language is maintained through a process of copying language; people learn language by copying other people. That copying introduces minute variation, and those variants get propagated. Each change is an opportunity for a different copying rate, which is the basis for evolution as we know it."

"To be able to see this kind of microscopic detail in social evolution, that's a big deal, that's something we can sink our teeth into," said Clark. "By looking at the analogies between social science and biology, this work is pushing toward a unification between the two fields. I think both sides stand to gain."

The study was supported by the University of Pennsylvania Research Foundation, the David & Lucile Packard Foundation, the U.S. Defense Advanced Research Projects Agency (Grant D12AP00025) and the U.S. Army Research Office (Grant W911NF-12-1-0552).