Since 2007, Ethnologue’s three-letter codes for languages have had the status of an ISO standard for languages. This has considerably enhanced their status in linguistics, and some linguists now use these codes (which were primarily intended as unique identifiers for technical and industrial purposes) in their prose texts, as additional identifiers of the language(s) they are talking about. But at the recent PARADISEC conference in Melbourne, Stephen Morey, Mark W. Post and Victor A. Friedman launched a direct attack on them (Morey et al. 2013). It seems to me that this is a very useful step, because it may lead to a serious public debate of what it is that we can expect from a standard language catalogue such as Ethnologue. (Some discussion of the issues has been taking place in closed circles or small workshops, e.g. at the Leipzig language catalogue workshop in 2007, but this is the first highly critical voice at a major conference, as far as I know.)

Morey et al. have five main points of criticism, the first two of which could in principle be addressed by changing the procedures (and have actually been addressed by Glottolog):

(1) Three-letter codes are problematic, because quite a few of them are based on obsolete and offensive designations for languages and speech communities (e.g. [jnj] for Yemsa, based on pejorative “Janejero”).

(2) Administration of ISO 639-3 codes by SIL International is problematic because it is a missionary organization, which does not work with sufficient transparency and accountability.

(3) Permanent identification of a language is incompatible with the constantly changeable nature of human language.

(4) Languages and dialects cannot be readily distinguished by rigorous and practical methods, and inevitably, socio-political concerns often enter into the decisions taken by linguists.

(5) ISO 639-3 has the potential to be misunderstood and misused by governments and other decision-making bodies.

They also voice objections to Ethnologue’s attempt to classify all languages into subfamilies and families, but this is quite independent of ISO 639-3, so this issue should be kept separate from the language code issue.

I share their concern with issues (1)-(2) and (4)-(5), but point (3) is not a serious objection. Yes of course, languages change all the time, but it is also the case that we can identify them and write papers and books about them. We can also give different identifiers to different stages of a language. So this is not a reasonable objection.

The main question that I have is whether language identification should be a task for ISO, the International Organization for Standardization. As Morey et al. note, ISO provides standards to ensure that “materials, products, processes and services are fit for their purpose”. ISO is basically an organization for industry, not for science. Of course, scientists need to identify their phenomena and catalogue them, but normally it is scientific organizations that take on the task of standardizing names of entities, such as the International Mineralogical Association, which administers the names of c. 6,500 minerals (roughly on the order of the number of languages), and the International Astronomical Union, which takes care of naming issues with regard to heavenly bodies. The reason why ISO got involved in language name issues in the first place is of course the economic significance of translation and localization, which is far greater than the relevance of distant stars for businesses. But does this mean that someone needs ISO’s industry standard to identify little-known languages of small communities that are never or hardly used in writing and that are often in danger of extinction?

The inappropriateness of ISO as an organization for maintaining nomenclature standards for linguistics was brought home to me when in 2011 I organized a small workshop on language cataloguing in Leipzig. I was aware that ISO had created another standard called 639-6, which also contains codes for language families – obviously of great interest to linguists as well. I found out that 639-6 was entrusted to an organization called Geolang Ltd. I invited a representative of Geolang to our workshop, who turned out to have a lot of technical expertise, but zero knowledge of comparative linguistics. This seemed very odd to me – at least SIL International has demonstrated expertise in the study of languages!

What is the solution? Surely we do not want to do without any unique identifiers of languages. These are simply too useful for tagging books and articles in libraries, for maintaining catalogues of resources such as OLAC, and for linking between encyclopedias like Wikipedia and other resources. So how do we address the issue of problems in distinguishing between dialects and languages? How do we address the problems of accountability and transparency? What about potential misuse?

It seems to me that the best solution is not to have one single place where language identification information is stored and maintained, but multiple places that link to each other. And in fact, there are already four de facto authoritative places: in addition to Ethnologue, there are Wikipedia, MultiTree and Glottolog. These all have different philosophies, but they are partially interlinked: Ethnologue links to OLAC, which links to Glottolog; Wikipedia links to Ethnologue; and Glottolog links to everything else. So from Glottolog, one can easily go to the other places and check out what they have to say about the language that one is interested in at the moment.

Like Ethnologue, Glottolog (Nordhoff et al. 2013) has unique identifiers for languages, but these look less like abbreviations (they consist of four letters and four digits), so it is hopefully clear that their purpose is a purely technical one, to make it easy for machines to link one resource to another one. They should not appear in the body of a scientific text (at most in a footnote), which addresses problem (1) above. Glottolog aims to be fully transparent and accountable, addressing problem (2) above, though its degree of transparency is currently hampered by not fully adequate funding. In addition to languages, Glottolog also lists dialects, thus partly addressing issue (4). In the future, we may switch to a system where languages are recognized at multiple levels, so that everything is recognized as a language that anyone ever called a language. After all, it rarely matters to linguists whether what they are talking about is a language, a dialect or a close-knit family of languages. Thus, linguists may choose to work at the “languoid” level, ignoring the difference between languages and dialects. This addresses problems (4) and (5).
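To make the "purely technical" role of such identifiers concrete, here is a minimal sketch of how a machine-readable record might carry both kinds of code side by side. It assumes only the shapes described in this post (three letters for ISO 639-3, four letters plus four digits for a Glottocode); the actual catalogues may apply different or looser rules. The `Languoid` class, its field names and the sample Glottocode `abcd1234` are hypothetical, invented for illustration; only the ISO code [jnj] for Yemsa comes from the text above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Shapes as described in the post; assumptions, not the catalogues' official rules.
ISO_639_3_SHAPE = re.compile(r"[a-z]{3}")
GLOTTOCODE_SHAPE = re.compile(r"[a-z]{4}[0-9]{4}")


@dataclass(frozen=True)
class Languoid:
    """Hypothetical record linking one languoid to several catalogues."""
    name: str
    glottocode: str
    iso639_3: Optional[str] = None  # dialects and families may have no ISO code

    def __post_init__(self) -> None:
        # Reject identifiers that do not have the expected shape.
        if not GLOTTOCODE_SHAPE.fullmatch(self.glottocode):
            raise ValueError(f"not shaped like a Glottocode: {self.glottocode!r}")
        if self.iso639_3 is not None and not ISO_639_3_SHAPE.fullmatch(self.iso639_3):
            raise ValueError(f"not shaped like an ISO 639-3 code: {self.iso639_3!r}")


# "abcd1234" is a placeholder, not the real Glottocode for Yemsa.
yemsa = Languoid(name="Yemsa", glottocode="abcd1234", iso639_3="jnj")
```

Because the ISO field is optional, the same record type can describe a dialect or a family that has no ISO 639-3 code at all, which is the point of working at the languoid level.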

Some readers may object to Glottolog’s introduction of an additional code (the “Glottocode”), because it would be more practical if there were just a single code, and we could somehow work together to improve the current ISO 639-3. But I don’t see any single authority that could impose its views. We have no International Union of Linguists (CIPL is underfunded and many linguists have never heard of it), and in general, knowledge of and interest in languages is very decentralized, and different people will always have different views. So it seems better to accept the diversity of approaches, but to adopt Glottolog’s model of keeping multiple identifiers and linking to several other language catalogues. This does mean that a certain amount of work is duplicated, but it also means that there is less danger of a single dominant point of view eclipsing dissenting voices.

References

Morey, Stephen & Post, Mark W. & Friedman, Victor A. 2013. The language codes of ISO 639: a premature, ultimately unobtainable, and possibly damaging standardization. Handout of a talk given at the PARADISEC RRR Conference, December 2013.

Nordhoff, Sebastian & Hammarström, Harald & Forkel, Robert & Haspelmath, Martin (eds.) 2013. Glottolog 2.1. Leipzig: Max Planck Institute for Evolutionary Anthropology.