
In a post a few days ago ("Why you shouldn't use spellcheckers", 4/7/2009), Bill Poser argued that "if English had a decent writing system there would be no use for [spellchecking] software". I'm no defender of our current writing system — it makes life much harder than it should be for writers and readers alike, especially in the early stages of learning. But I think that Bill is overselling the potential benefits of reform.

Even for a language with a "decent writing system", a good spellchecker can be useful in catching typographical errors — slips of the finger or the brain. But there's a deeper argument for the defense. An alphabetic writing system can be lexically consistent only if it adopts and enforces a fairly elaborate set of spelling conventions, which people will not know merely by virtue of being able to speak the language being written. And therefore — if you want the same word to be spelled the same way every time it's written — a copy-editor that understands and applies those conventions is performing a valuable service. To the extent that a good spellchecker can do the same thing, it's also useful.

Of course, it's not obvious that a writing system needs to be "lexically consistent" in this sense. During Tudor and Elizabethan times, people wrote English in a catch-as-catch-can way. This was not just a matter of variation from person to person — the same person might spell the same word in different ways at different times, sometimes within the same paragraph. Thus according to Karl Elze's Biography of William Shakespeare, Sir Walter Raleigh signed his own name variously as Rauley and Ralegh as well as Raleigh; and "Edward Alleyn made use of the forms Aleyn, Alleyn, Allen, and Allin". Elze asserts that

The name Marlow is met with in ten different forms, Throckmorton in sixteen, Gascoigne in nineteen, Percey in twenty-three, Cholmondeley in twenty-five, Percival in twenty-nine, and Bruce in thirty-three different forms. And yet the name of Shakespeare is the one which exhibits the greatest variety of spellings, no less than fifty-five different forms having been counted [citation to Halliwell, Life of Shakespeare, pp. 278-283]; […] In the records of the Corporation of Stratford the name of John Shakespeare, the poet's father, occurs 166 times, and in the following fourteen different forms:–

1. Shackesper 4 times
2. Shackespere 3 times
3. Shacksper 4 times
4. Shackspere 2 times
5. Shakespere 13 times
6. Shaksper 1 times
7. Shakspere 6 times
8. Shakspeyr 17 times
9. Shakysper 4 times
10. Shakyspere 9 times
11. Shaxpeare 69 times
12. Shaxper 8 times
13. Shaxpere 18 times
14. Shaxpeare 9 times

And of course, it's not just proper names that are variable. In the LION database, works by writers between 1500 and 1600 spell "clothes" in five different ways that I could find:

Spelling    Count    Share
clothes     429      79.6%
cloths      11       2%
clothys     8        1.5%
cloathes    74       13.7%
cloaths     17       3.2%

And LION's variants for "women" in that same period include [women | vveomen | vvoemen | vvomen | weemen | wemen | weomen | woemen | womenne | wommen | woomen | wymen | wymmen].

This lack of lexical consistency didn't prevent great works from being written and read. (Of course, it didn't get in the way of drivel, either.) Variable spelling obviously creates problems for indexing, record-keeping, looking things up in dictionaries, and so forth. On the other hand, standardizing spelling creates a significant additional task for school-children. I don't know what arguments were used in the 18th and 19th centuries to support the efforts to standardize English spelling — the contemporary discussions that I've been able to find seem simply to assume that it's obviously a Good Thing, without engaging any counter arguments in a serious way. For the purposes of this post, I'm going to assume this same conclusion, similarly without argument.

At this point, savvy readers may be muttering to themselves that the devil-may-care spelling of Elizabethan times, and the opaque complexity of the standard system that eventually replaced it, should not be taken as typical. The original anarchy was due to the unfortunate residue of the Great Vowel Shift and other sound changes in English, as well as a melange of spelling conventions borrowed from Anglo-Saxon, French, Dutch, Latin, and wherever. The subsequent standardization process was not the top-to-bottom re-design that was needed, but rather a quasi-random codification of chaos.

This is true, if exaggerated. But a more rational process doesn't generally yield lexical consistency either. Consider the case of Somali. The current writing system was adopted as standard in 1972 (back in the days when Somalia had a government), and taught to a generation of Somalis, who became one of the most literate nations in the area. The correspondence with the phonology of the language is simple and transparent, and as a medium of literacy, this system has been a big success — it seems to be easy for native speakers to learn to read and write it.

However, the result is certainly not lexical consistency. Thus on the web, I find the Somali word for "friends" spelled in at least six different ways:

Spelling     Count    Share
saaxiibo     6050     83.3%
saaxiibbo    946      13%
saxibo       190      2.6%
saaxibbo     47       0.6%
saxiibbo     28       0.4%
saxibbo      5        0.1%

At least according to the principles given in Zorc and Osman's Somali-English Dictionary, the "correct" spelling (and pronunciation) ought to be the second of those, "saaxiibbo".
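For anyone who wants to redo the arithmetic: each share in the table above is just that spelling's count divided by the total over all the spellings I found. A minimal sketch in Python, where the function and variable names are mine and the counts are simply the web hits quoted above:

```python
# Web hit counts for spellings of the Somali word for "friends",
# copied from the table above.
counts = {
    "saaxiibo": 6050,
    "saaxiibbo": 946,
    "saxibo": 190,
    "saaxibbo": 47,
    "saxiibbo": 28,
    "saxibbo": 5,
}

def shares(counts):
    """Return each spelling's share of the total count, as a percentage."""
    total = sum(counts.values())
    return {form: 100 * n / total for form, n in counts.items()}

# Print one row per spelling: form, raw count, share to one decimal place.
for form, pct in shares(counts).items():
    print(f"{form:10s} {counts[form]:5d} {pct:5.1f}%")
```

The shares are relative to the spellings searched for, so any variant I failed to think of is silently left out of the denominator.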

I find the Somali word for "health" spelled in at least seven different ways on the web — and some of the non-standard spellings are on medical-advice web sites, in government information brochures, and so on:

Spelling      Count      Share
caafimaad     210,000    88.9%
caafimad      22,800     9.7%
cafimad       2080       0.9%
cafimaad      734        0.3%
caafiimaad    116        0.05%
caafiimad     9          0.04%
cafiimad      419        0.2%

(I believe that in this case, the most common spelling/pronunciation is also the standard one.)

Some of these may be mere typos, but others arise (I think) because some Somali dialects have lost or are losing the distinction between long and short vowels. And you shouldn't be surprised to learn that this is just the tip of the dialect-variation iceberg. As Zorc and Osman's front matter explains, "the student will come across many differences in vocabulary and in pronunciation, and the latter will often show up in writing. Allowances must be made for these variations." This same sort of issue exists in English to an even greater extent, even if we limit ourselves to variant pronunciations of standard formal forms of the language. As a result, even the most rationally-designed writing system faces a choice between lexical consistency and faithfulness to local pronunciation.
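To see how much of the variation above is a matter of length alone, one can compare spellings on a key that ignores doubling. This is only a rough sketch: treating every run of repeated letters as a single letter is my own simplification (it lumps vowel length together with consonant doubling), and the helper names are hypothetical, not taken from Zorc and Osman or from any real spellchecker.

```python
from itertools import groupby

def length_blind_key(word):
    """Collapse each run of identical letters to one letter (hypothetical helper)."""
    return "".join(letter for letter, _run in groupby(word.lower()))

def group_variants(words):
    """Group spellings that differ only in letter doubling under a shared key."""
    groups = {}
    for word in words:
        groups.setdefault(length_blind_key(word), []).append(word)
    return groups

friends = ["saaxiibo", "saaxiibbo", "saxibo", "saaxibbo", "saxiibbo", "saxibbo"]
health = ["caafimaad", "caafimad", "cafimad", "cafimaad",
          "caafiimaad", "caafiimad", "cafiimad"]

# Every spelling quoted above for each word ends up under a single key.
print(group_variants(friends + health))
```

A key like this would also merge genuinely different words that contrast only in length, so a checker built on it would still need a list of standard forms to decide which spelling to suggest.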

Another source of spelling variation in Somali is morphophonological change in context. Thus as a fact of pronunciation (pretty much across dialects, I think), a final short -e will become -a to match the vowel of a suffix. Thus bare "teacher" becomes barayaal "teachers"; and according to the standard writing system, 'a' should be written rather than 'e' in these cases. In Somali text on the web, however, "bareyaal" is somewhat commoner (337) than "barayaal" (241).

This sort of thing — where faithfulness to pronunciation points in one direction, and consistent spelling of a morpheme across contexts points in a different direction — is very common, in Somali and in almost every other language of the world, including of course English.
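The final-vowel convention described above is the kind of rule a spellchecker could apply mechanically. Here is a minimal sketch, with several assumptions of my own: that a stem ending in a single "e" (not "ee") counts as ending in a short -e, that only the first vowel of the suffix matters, and that the rule applies only when that vowel is "a". The function name is hypothetical.

```python
VOWELS = set("aeiou")

def attach_suffix(stem, suffix):
    """Join stem and suffix, writing a final short -e as -a before an a-suffix."""
    # First vowel letter of the suffix, or None if it has no vowel.
    first_vowel = next((ch for ch in suffix if ch in VOWELS), None)
    # Assumption: a doubled "ee" is a long vowel and is left alone.
    ends_in_short_e = stem.endswith("e") and not stem.endswith("ee")
    if ends_in_short_e and first_vowel == "a":
        stem = stem[:-1] + "a"
    return stem + suffix

print(attach_suffix("bare", "yaal"))  # barayaal
```

A writer who spells by ear has no particular reason to know this convention, which is why something has to apply it after the fact.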

I'll mention just one more class of problem for the design of alphabetic writing systems. This is the (within-variety) case where X and Y are in general distinct, but not in context C; and the merged (or anyway non-distinct) segment in context C is phonetically somewhere in between X and Y. Should you spell it X, or should you spell it Y? In most contemporary dialects of English, the distinction between /i/ and /ɪ/ is neutralized in front of /ŋ/. Thus the contrast between "keen" and "kin" doesn't have any corresponding pair "keeng" and "king". But how should the vowel in "king" be spelled? In some varieties of English, it's clearly kin-like, while in other varieties, it's clearly keen-like; and sometimes it's about half way in between.

Therefore, if you reform English spelling so that /i/ and /ɪ/ are written in a consistent way, and take the view that people should do what comes naturally, you're going to get variable spellings for "king" (and all other words containing the same rhyme). In order to get a consistent outcome, you'd need to decide on one or the other, purely as a matter of convention, and teach people what to do. (And fix what they write when they do the wrong thing anyhow.)

The problem is much worse when the degree of merger and the pronunciation of the outcomes are more variable. In the case of English vowels before /r/ and /l/, there are people for whom Mary, merry, marry, and Murray are all the same, as well as people for whom they're all different. Likewise for col, call, cowl, coil, Cal, etc. And then there's the whole r-ful/r-less business.

It's possible that lexical consistency isn't worth the trouble — lack of it doesn't seem to have prevented 16th- and 17th-century English writers from expressing themselves effectively. But if you want the same word to be spelled the same way every time it's written, then like it or not, you're in the business of establishing, teaching, and enforcing arbitrary conventions. And a well-designed and well-implemented spell-checker will be a big help.
