In Fiji, a star is a kalokalo. For the Pazeh people of Taiwan, it is mintol, and for the Melanau people of Borneo, bitén. All these words are thought to come from the same root. But what was it?

An algorithm devised by researchers in Canada and California now offers an answer: in this case, bituqen. The program can reconstruct extinct ‘root’ languages from modern ones, a task previously done painstakingly ‘by hand’ using rules about how linguistic sounds tend to change over time.

Statistician Alexandre Bouchard-Côté of the University of British Columbia in Vancouver, Canada, and his co-workers say that by making the reconstruction of ancestral languages much simpler, their method should make it easier to test hypotheses about how languages evolve. They report their technique in the Proceedings of the National Academy of Sciences (ref. 1).

Automated language reconstruction has been attempted before, but the authors say that earlier algorithms tended to be rather intractable and prescriptive. Bouchard-Côté and colleagues' method can factor in a large number of languages to improve the quality of reconstruction, and it uses rules that handle possible sound changes in flexible, probabilistic ways.

The program requires researchers to input a list of words in each language, together with their meanings, and a phylogenetic ‘language tree’ showing how each language is related to the others. Linguists routinely construct such trees using techniques borrowed from evolutionary biology.

Language trees

The algorithm can automatically identify cognate words (ones with the same root) across the languages. It then applies rules known to govern sound changes to deduce the probable root of each set of cognates. For example, two sounds that always occur together will tend to be condensed into one if no distinction in meaning is lost.
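The two steps described above, spotting cognates and then inferring a likely root, can be illustrated with a deliberately simplified Python sketch. It is not the researchers' method: their system is probabilistic and uses the language tree, whereas this toy version swaps in plain Levenshtein edit distance and treats every daughter language as equally related. The function names, the 0.5 threshold and the word forms are hypothetical, chosen only for illustration; the forms are not real Austronesian data.

```python
# Toy illustration only: NOT the published algorithm. Edit distance stands in
# for a probabilistic model of sound change, and the language tree is ignored.


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def likely_cognates(a: str, b: str, threshold: float = 0.5) -> bool:
    """Hypothetical cognate test: length-normalised distance at or below threshold."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return edit_distance(a, b) / longest <= threshold


def reconstruct_root(daughters: list[str], candidates: list[str]) -> str:
    """Pick the candidate ancestor needing the fewest total edits to reach
    every daughter form (ties broken alphabetically)."""
    return min(candidates,
               key=lambda c: (sum(edit_distance(c, d) for d in daughters), c))


if __name__ == "__main__":
    # Hypothetical daughter forms of one meaning in four related languages.
    daughters = ["pata", "bata", "pada", "patu"]
    print([likely_cognates("pata", d) for d in daughters])  # [True, True, True, True]
    print(reconstruct_root(daughters, candidates=daughters))  # pata
```

Real cognates can look far less alike than these toy forms: kalokalo, mintol and bitén share a root but would fail this crude distance test, which is one reason practical systems model how sounds change rather than relying on raw string similarity.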

The researchers tested their approach on 637 Austronesian languages spoken mainly on islands in Southeast Asia and the Pacific, including Malaysia, the Philippines and Indonesia. Manual methods have previously been used to reconstruct the protolanguage of this large family, which is thought to have originated in Taiwan.

Bouchard-Côté and his colleagues found that their predictions matched those of the manual method in about 85% of cases (including bituqen). “Our system uses only a subset of the factors taken into consideration by a linguist, so we feel most of the discrepancies reflect things to be improved in our method,” admits Bouchard-Côté.

“It looks as though this method could be a very useful labour-saving device”, says linguist Don Ringe of the University of Pennsylvania in Philadelphia. But he cautions that methods that are “correct or nearly correct in about 85% of the cases will never be good enough. Our reconstructions might be no better than an approximation, and if we settle for what look like approximations even to us, we might be plain wrong.”

Bouchard-Côté and his colleagues used the method to test a hypothesis about language evolution first proposed in 1955 (ref. 2), which states that sounds that are important for distinguishing words from each other are more resistant to change. Such a pattern is almost impossible to spot in just a few languages, but it emerged clearly from the 637-language data set.
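One simple way to make "important for distinguishing words" concrete is to count minimal pairs: words that differ only by a contrast between two sounds, and that would become homophones if those sounds merged. The sketch below uses that count as a rough proxy for a contrast's functional load. This is an assumption for illustration, not necessarily the measure Bouchard-Côté and colleagues used; it treats each letter as one sound, and the function `minimal_pairs` and the lexicon are hypothetical.

```python
from itertools import combinations


def minimal_pairs(lexicon: list[str], x: str, y: str) -> list[tuple[str, str]]:
    """Return word pairs that differ in exactly one position, where one word
    has sound x and the other has sound y (each character = one sound)."""
    pairs = []
    for a, b in combinations(sorted(set(lexicon)), 2):
        if len(a) != len(b):
            continue
        diffs = [(ca, cb) for ca, cb in zip(a, b) if ca != cb]
        if len(diffs) == 1 and set(diffs[0]) == {x, y}:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy lexicon, not real language data.
    lexicon = ["pat", "bat", "pit", "bit", "tap", "tab", "kit"]
    for x, y in [("p", "b"), ("a", "i"), ("p", "k")]:
        print(x, y, len(minimal_pairs(lexicon, x, y)))
    # p b 3
    # a i 2
    # p k 1
```

Under the hypothesis, a contrast like p/b in this toy lexicon, which keeps three pairs of words apart, would be expected to resist merging more strongly than p/k, which distinguishes only one pair.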

This ‘functional load’ hypothesis had been viewed with some scepticism, and Ringe says that “the demonstration that there might be something to it after all is interesting”.

He adds that “it’s refreshing to find colleagues in other disciplines tackling a problem that historical linguists actually care about”.