Alphabets and Religion

There is some confusion in the Indo-European family due to versions of a language which look different even though they sound the same. Spoken Serbian, Bosnian, Montenegrin, and Croatian are the same Slavic language, but the Orthodox Serbs write it in Cyrillic (“Russian”) letters and the Catholic Croats in the Roman alphabet. Since the Croats insist on the alphabet of the Latin gospels and the Serbs on that of their Orthodox gospels, then logically the Bosniaks and Kosovars, Slavic Muslims who also speak Serbo-Croatian, should write the language in the Arabic letters of the Qur’an. Actually, they did write it in Arabic while under the Ottomans, but now they use Roman.

Bosniak is the correct term for “Bosnian Muslim”. US newspaper reporting about the 1992–95 war often used “Bosnian” as if it only meant the Muslims, but the country of Bosnia (technically, Bosnia-Herzegovina) is half Muslim (Bosniaks) and half Christian (35% Orthodox Serb and 15% Catholic Croat). Cf. those same newspapers’ use of “Semitic” as if it meant “Jewish”. (Nigeria and Lebanon are other countries which are split down the middle between Christians and Muslims, and they also suffer frequent religious violence.)

Exactly the same situation exists on the Indian subcontinent — Hindi, the largest language in India, and Urdu, the national language of Pakistan, are the same when spoken, called Hindustani, but the Indians (mainly Hindu) write it in their own Devanagari script while the Pakistanis (mainly Muslim) use the Arabic alphabet. It’s been said, tongue in cheek, that Urdu is essentially Hindi with Arabic curse words added, or alternatively, Hindi is essentially Urdu with Hindu religious terms. (In formal speech or in print, Hindi will tend to have more esoteric words borrowed from Sanskrit, while Urdu will have more borrowed from Arabic, but both use “convenient” words from the other tradition.)

In print, Devanagari is quite recognizable; the letters all have a horizontal line at the top, so each word is connected by a top bar. Someone facetiously said it looks like a bunch of snakes hanging from a telephone line. For example, संस्कृत is the word “Sanskrit”. In theory the Devanagari top bar is redundant — since it is always there, it can’t be part of letter recognition — and in fact another Indian alphabet (Gujarati) saves millions of gallons of ink per year by using more or less the same forms without the bars. (In other words, the Gujarati snakes have all fallen off that wire.)

The original alphabet was called Brahmi, going back to 400 or 500 BCE, but now there are at least ten Indian variations used by different languages — Devanagari, Gujarati, Bengali, Pallava, Tamil, Kannada, Telugu, and so on. In addition, there are another eight or ten Brahmi-descended alphabets in the surrounding area — Tibetan, Burmese, Javanese, Khmer, and more, none of which are Indo-European. There is even a distant echo in Japan, where the “alphabetical order” of the kana characters is the same as in the Indian languages, presumably under Buddhist influence. Most scholars think Brahmi developed out of the Semitic alphabet, namely the Phoenician characters of Aramaic, which was the language of the Persian empire at the time. Many of the consonants look similar, although some of them are reversed — understandable given that Aramaic was written right-to-left. Brahmi had to add vowels and a few consonants which did not exist in Semitic.

Note that even the non-Indo-European languages of South India (e.g. Tamil) use relatives of the northern Indian alphabets. Other examples of this phenomenon are Tajik, which is Farsi (Persian) written in Cyrillic, and Lao, which is Thai with its own script.

Many nationalists in India and Pakistan insist that Hindi and Urdu are distinct, but linguists disagree. For instance, Bollywood movies cheerfully mix allegedly pure Hindi or Urdu words (i.e., from Sanskrit or Arabic), and everybody watching the movies in either country understands the dialog with no problem. The movie titles and credits are written in Arabic, Devanagari, and Roman letters.

In general, alphabets follow religion, not language. Eastern Europe is split down the middle by the alphabet-vs.-religion issue — Catholic peoples like the Poles, Czechs, Lithuanians, and Croats use Roman letters, even though they don’t match the sounds of those languages very well, while the Orthodox Russians, Ukrainians, Serbs, Macedonians, and Bulgarians use Cyrillic, which was designed specifically for Slavic. The exception to this is Rumania, which uses the Roman alphabet (surprise!) even though the country is mostly Orthodox. In fact, Moldova, one of the former republics of the USSR, was once a province of Rumania and is in the process of switching back to the Roman alphabet from the Cyrillic that was imposed on them by the Soviets.

To show what happens when politics gets involved, consider Azerbaijan. Their language is central Asian, related to Turkish, and the citizens are mostly Shiite Muslims. Until 1929 the language was written in Arabic like their neighbors Turkey and Iran. In 1929 the Soviets imposed so-called “Uniform Turkic” on non-Slavic peoples of the USSR; in reality it was the Latin alphabet with some extra letters. That worked almost too well, with booming non-Slavic literacy at the expense of Russian, so in 1939 Stalin decided it had been a bad idea and enforced Cyrillic everywhere in the USSR. As soon as Azerbaijan became independent in 1991 it switched to Latin, so a 90-year-old Azerbaijani has gone through four official alphabets during his or her lifetime. (Turkmenistan and Uzbekistan have also symbolically thumbed their nose at the Soviets and the Orthodox church by switching to modified Latin alphabets since independence.)

Unlike the Bosniaks and Kosovars, several Islamic countries and peoples who have no linguistic connection to Arabic do use the Arabic alphabet to write their languages — Pakistanis, Iranians, Afghans, and Kurds (all speakers of Indo-European), Turks, and some Kazakhs, Chechens, Kyrgyz, and Azerbaijanis (all Central Asian), not to mention the completely isolated Malaysians. Albanian was written in Arabic once upon a time, but now they use a Latin-based system with some extra letters thrown in. Yiddish (German) and Ladino (Spanish) are written in Hebrew characters, and the Catholic inhabitants of Malta write their Arabic-derived language in Roman letters. (Although most Indonesians are Muslim, and the language is a variety of Malay, it is written in Roman.) I’ve already mentioned that southern Indians are Hindus and use alphabets derived from Sanskrit to write their unrelated Dravidian languages.

Closer to home, you are reading this document in Roman letters instead of Germanic runes because Catholic missionaries propagated them all across western Europe, replacing the Runic alphabet used by the Germanic tribes like the Franks, Goths, and Vandals. (Note that the proto-Germans didn’t have paper or ink, since the runes, with their straight lines, are obviously designed to be scratched on wood or stone.) The Kelts also got forced into using Latin letters when they adopted Catholic Christianity, leading to the notorious difference between spelling and sound in Gaelic and particularly, in Welsh. Gaelic once had a home-grown alphabet called Ogham, also designed to be carved rather than written in the normal sense. Here’s an example from Ireland; note that the edge of the stone served as the reference line for the letters.

Overall, this means that populations are far more likely to switch alphabets than to switch languages. This makes sense, particularly because for the great majority of history, the literate have been a very small minority. There are many cases where political and/or military force has attempted to impose a new language on the masses, and it usually doesn’t work. In England, the peasants stubbornly kept speaking English despite their French overlords. For 400 years, Persia was ruled by the Arabic-speaking Sassanids, but as soon as the Turks conquered the Sassanids, everyone spoke Persian/Farsi again. (It helped that, unlike England under the Normans, there was a flourishing scientific and literary community in Persia that kept using the old language all along.) Hundreds of years of Swedish rule didn’t manage to eradicate Finnish. The Germanic invaders of the western Roman empire (Goths, Vandals, Franks, Burgundians, etc.) didn’t manage to overthrow the Romance (vulgar Latin) languages, and on the other hand, hundreds of years of Roman rule left Greece, the Levant, and Egypt still speaking Greek. Three thousand years after the Aryans arrived, southern India still speaks Dravidian.

Certainly language change can happen: Egyptians speak Arabic, not Coptic, most Irish speak English instead of Gaelic, Mexicans speak Spanish instead of Nahuatl, and so on, but this is usually because of economic factors, not political or military ones. If you and your kids can get better jobs by learning a different language, this is a powerful incentive. Overwhelming force (use my language or I will kill you) sometimes works. Certainly Arabic abruptly disappeared in Andalusia after the reconquista. Another less bloody forcible method is requiring the schools to use a particular language. This is how English penetrated as a second language for the educated in India. Soviet higher education was all in Russian, and Spanish-language schooling was an important part of Spanish policy in Central and South America. The United States government forced all reservation schools to use English exclusively, but the extermination of most Amerindian languages was simply due to the overwhelming number of English speakers.

By the way, one South American country (not counting Brazil) does not speak Spanish because of a very explicit violation of that requirement I just mentioned to educate the natives to be bilingual in Spanish. In Paraguay, over 50% of the population is monolingual in Guarani, and 95% are fluent in it. This is because in that territory the Jesuit missionaries, in order to protect their parishioners, refused to teach them Spanish — since they only spoke the unintelligible (to the Spanish) Guarani they were less suitable as slaves and servants. (This Jesuit defiance was depicted in a well-received — seven Academy Award nominations, one win — 1986 movie, The Mission.)

Note that from the point of view of language X, learning language Y might be “hard” or “easy”. (Most speakers of English would agree that Dutch is easier than German which is easier than French which is easier than Mandarin….) Linguists still argue about whether there are inherently easier languages — after all, every language, by definition, is capable of being learned by infants — but the answer is probably “yes”. If so, English, with its abnormal lack of rules (minimal cases and tenses, the same word can be a noun, verb, or adjective, etc.) would seem to be in the “easy” family.

Interestingly, the closest major language to English in structure is Chinese! Both are characterized by subject-verb-object sentence order, almost no noun or verb inflections, and short words which are combined to build up meanings. (For example “He would have been going” is one word in many languages.) This combination of features is almost unique in the world’s other languages —the only other one I know of is Malay. In addition, both English and Chinese share the feature that what is on the page is only vaguely associated with what is spoken. (McCullough has a rough cough and hiccoughs while ploughing through the borough.) Good readers of English do not pay attention to the sound of the letters — i.e. using Phonics like their first-grade teacher told them to — they use whole-word recognition instead. In other words, they have learned to read English exactly the same way the Chinese read their language. (If I read a science-fiction novel with a gazillion made-up Alien names, I will finish the book with no idea how a single one of those words might be pronounced.) This implies that if English is “easy”, then so is Chinese.

I recently saw a quotation that perfectly makes the above point:

“Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by itslef but the wrod as a wlohe.”

The only word that gave me grief at first glance was “rscheearch”; I knew it had too many letters to be “research” and figured out it had to be a misprint for “researcher”. This actually proves the point, because if it had followed the rule of first and last letter anchors, it would have been something like “rscheeachr” and would have been instantly recognized.
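The rule in that quotation is easy to try for yourself. Here is a minimal Python sketch, not anything the original researchers used: it shuffles the interior letters of every word and leaves the first and last letters anchored. The function names `scramble_word` and `scramble_text` are invented for illustration, and the sketch assumes that a "word" is simply a run of unaccented Latin letters.

```python
import random
import re


def scramble_word(word, rng):
    """Shuffle the interior letters of a word; first and last stay put."""
    if len(word) <= 3:
        # Words of three letters or fewer have nothing to rearrange.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text, seed=None):
    """Scramble every run of ASCII letters in text, leaving the rest alone."""
    rng = random.Random(seed)  # a seed makes the result repeatable
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


if __name__ == "__main__":
    print(scramble_text("According to a researcher at an English university"))
```

Every scrambled word keeps its length, its letters, and its two anchors, which is exactly the property that “rscheearch” violates.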

Speaking of giving grief, you would think that my computer’s spell-checker would have totally choked at that paragraph, but smart checkers look for inadvertant permutations (“teh” for “the”, for example) and my software gave the correct word as one of the suggestions in almost every case, and it usually was the first suggestion! (Another inadvertent proof of my point — how many of you noted that I misspelled “inadvertent” in the first sentence? I didn’t until I ran the checker. Cf. the notorious receive vs recieve.)
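I don't know how my spell-checker actually works internally, but a permutation-aware checker can be sketched in a few lines of Python. This hypothetical version tries two things: swapping each pair of adjacent letters, and looking the word up by a "signature" made of its first letter, its sorted interior letters, and its last letter. All the names here (`adjacent_swaps`, `signature`, `build_index`, `suggest`) and the tiny word list are assumptions made for the example.

```python
from collections import defaultdict


def adjacent_swaps(word):
    """Every string obtained by swapping one pair of neighbouring letters."""
    return {
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
    }


def signature(word):
    """First letter + sorted interior letters + last letter."""
    if len(word) < 3:
        return word
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]


def build_index(dictionary):
    """Map each signature to the dictionary words that share it."""
    index = defaultdict(list)
    for entry in sorted(dictionary):
        index[signature(entry)].append(entry)
    return index


def suggest(word, dictionary, index):
    """Return candidate corrections, simple transpositions first."""
    if word in dictionary:
        return [word]
    swaps = sorted(c for c in adjacent_swaps(word) if c in dictionary)
    anchored = [w for w in index.get(signature(word), []) if w not in swaps]
    return swaps + anchored


if __name__ == "__main__":
    words = {"the", "whole", "research", "researcher", "receive"}
    idx = build_index(words)
    for typo in ["teh", "wlohe", "recieve", "rscheearch"]:
        print(typo, "->", suggest(typo, words, idx))
```

Note that this sketch only undoes rearrangements. A wrong letter, as in “inadvertant”, would need a different technique such as edit distance, and “rscheearch” gets no suggestion at all because no dictionary word has the same letters.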

It’s been pointed out that the Serbo/Croatian and Hindi/Urdu contingents are blind, while the Chinese are deaf. This provocative statement is based on the observation that a Serb and a Croat (or an Indian and a Pakistani) can hold a conversation but cannot write notes to each other — the same situation as if two blind persons (or one blind and one sighted) were trying to communicate. On the other hand, all the many dialects of Chinese use the same characters in the written language, but each dialect may pronounce the words differently. Therefore, for example, speakers of Mandarin and Cantonese cannot talk to each other, but can always write notes — the same situation as if two deaf persons (or one deaf and one hearing) were trying to communicate. Another example of pseudo-deafness is from the Fertile Crescent: Sumerian cuneiform word characters were also used for several completely unrelated languages, the Semitic Akkadian of Babylonia and Assyria, and the Indo-European Hittite and Old Persian, with the cuneiform pronounced as the corresponding Sumerian, Akkadian, Hittite, or Persian word.

These analogies leave out such expedients as Braille and sign language, because those cases would involve one or both parties learning another method of communication they could share.

One good example of the difference in Chinese dialects is the word 茶, the name of a refreshing drink. In the Amoy dialect it is pronounced “tee” but in Cantonese and Mandarin “chah”. The rest of the world is pretty well split down the middle as to which sound it has adopted — English tea, Spanish té, Hebrew tei, etc. but Arabic shai, Russian chai, Greek tsai and so on. English once used both “tea” and “cha” before settling on the former. (10Dec17: Note that “tea” was pronounced “tay” until the middle of the 18th century and was regularly rhymed with day, obey, etc.)

Chinese writing is analogous to mathematical formulae: the sentence 2 + 6 / 3 = 7 is understandable on the page in most of the world, even though there are hundreds of mutually unintelligible ways of saying “two plus six divided by three equals seven”. (This is false, of course, but mathematical sentences can lie just as easily as any others.) Many writing systems have different indigenous symbols for the digits, but just about everyone can understand the ubiquitous “Arabic numerals” as shown above. Interestingly, Arabic doesn’t use them by default — they have a different set of ten symbols which they call “Indian numerals”!

It’s been seriously suggested that if humans ever encounter intelligent (or rather, technologically-advanced) aliens, our first practical communication will be mathematical and scientific. A sketch of an atom with six electrons means “carbon” anywhere in the universe, and something like II + III = IIIII and II + I = IIII - I establishes the meaning of the plus, minus, and equals signs in ten seconds flat, no matter how many tentacles the alien might have or what kind of weird thought patterns occur in whatever passes for its brain. (As mathematicians love to point out, if you are willing to admit that 1 + 1 = 2, then they can inductively prove all of mathematics — arithmetic, geometry, algebra, calculus, ….)

The comment about the Chinese being deaf is double-edged, because the various sign languages used by the deaf are not associated with spoken languages, either. American Sign Language (ASL), for example, is used in many countries where English is not spoken. Therefore a Mexican deaf person who only reads and writes Spanish, and a deaf resident of the US who does not know a word of that language, can communicate in ASL. On the other hand, ASL and British Sign Language are totally different, so deaf persons from England and America cannot sign to each other and would be reduced to writing notes again! (ASL is derived from the earliest systematic signing system, which was developed in France, but it doesn’t have anything to do with French vocabulary, either. Conceptually, you can think of sign languages as Chinese characters formed with the fingers.)

Other examples of language-independent communication are the younger generation sending messages using Emoji, pictorial signs that decorate places like airports, and “international-style” road signs. All users of the airport, no matter what languages they can or cannot read, can still (hopefully) find the rest room and the baggage carousel, and drivers of all nationalities recognize the “No Left Turn” sign with its arrow and diagonal red bar and know that yellow triangles mean “Yield” while red octagons mean “Stop” and a big blue /H/ points to a hospital. Yet another example is the now-common use of the symbols | and ○ to represent the concepts “on” and “off” on electrical equipment. If there’s a single on/off button, it carries one symbol that combines the two. This allows the manufacturer to sell the same product in different countries (or different areas of the same country, Canada for instance) without having to re-do the controls. Indeed, many cars now have dashboards with almost no words on them at all — only symbols for headlights, horn, volume control, fuel, etc.

My laptop has keys labeled as Caps Lock, Scroll Lock, Num Lock, Rewind, Stop, Pause/Run, Fast Forward, Volume Up, Volume Down, Mute, Battery Status, Brightness Up, Brightness Down, Menu, Up, Down, Left, Right, and Sleep without using a single English word. (It also has keys with weird pictures that I’m told have something to do with Microsoft Windows. I wouldn’t know.) Perhaps even more common are the icons used in computer Graphical Interfaces, which have been converging to a standard set of functional symbols for some time. For example, the icons that mean “get information”, start a search, return to the home page, undo a previous operation, enter the file system, or save the current file are understood by almost all computer users. (Note that the “save” icon is a picture of a floppy disk, which nobody in the current generation has ever seen. For that matter, the homepage icon presumably doesn’t look much like home if you live in a high-rise apartment or an igloo.) One of the less-recognizable-but-ubiquitous icons is ☰, the menu symbol, aka the hamburger.

Just about the only really rational script is Hangul, the Korean alphabet. In 1446, King Sejong the Great of Korea got tired of the confusion in trying to use Chinese characters for the unrelated Korean language, and created a new alphabet from scratch. (The king was one of the best phoneticians in the country, and many scholars think he did much of the work himself.) Whoever did the work, the net result is a phonetic system where the shape of the letters indicates where the tongue should be and how open the mouth should be for each consonant, whether the vowels are “bright” or “dark”, etc. Another unique feature of Hangul is that the individual letters are not written in a row like most alphabets; they are grouped into syllable clusters, so that at first glance Korean looks more complex to the Western eye than it really is. For example, the letters ㅎ (H), ㅏ (A), ㄴ (N), ㄱ (G or K), ㅜ (U) and ㄱ (G or K again) become the two-syllable word 한국 Hanguk, the Korean word for Korea. (There are rules for cluster formation — in general they are assembled clockwise starting at the top left — so it’s possible for a Korean computer user to key in the letters individually and let the software worry about correctly displaying the clusters.) A friend told me that, while on a plane to Korea, she studied the principles and by the time it landed found she could pronounce almost every written Korean word she encountered.
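Letting “the software worry about” the clusters is simpler than it sounds, because Unicode lays out the precomposed Hangul syllables arithmetically. A syllable's code point is 0xAC00 + (lead × 21 + vowel) × 28 + tail, where the three numbers are positions in fixed lists of 19 initial consonants, 21 vowels, and 28 final consonants (counting “no final”). The Python sketch below applies that formula to the Hanguk example; the function name `compose` is made up, and this is not a description of how any particular Korean input method works.

```python
# Letters in the order Unicode uses for precomposed Hangul syllables.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals

SYLLABLE_BASE = 0xAC00  # code point of the first syllable block


def compose(lead, vowel, tail=""):
    """Combine individual letters into one precomposed syllable block."""
    lead_index = LEADS.index(lead)
    vowel_index = VOWELS.index(vowel)
    tail_index = TAILS.index(tail)
    code_point = (
        SYLLABLE_BASE
        + (lead_index * len(VOWELS) + vowel_index) * len(TAILS)
        + tail_index
    )
    return chr(code_point)


if __name__ == "__main__":
    # H + A + N, then G + U + G
    print(compose("ㅎ", "ㅏ", "ㄴ") + compose("ㄱ", "ㅜ", "ㄱ"))
```

A real input method also has to decide where one syllable ends and the next begins as the letters arrive, which this sketch sidesteps by taking each syllable's letters as separate arguments.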

Even if an alphabetic system is “linear”, that doesn’t mean it has to be done with the sounds in strict left-to-right or right-to-left order. There are scripts where the letters of each word are in alphabetical order, or where the initial sound is in the center and the succeeding sounds go on both sides, sort of as if English spelled “baker” as “abekr” or “eabkr”. There are a couple of alphabets (e.g. Tibetan) where consecutive consonants without an intervening vowel are written one beneath the other, as if English “Mister” was written with the /s/ stacked on top of the /t/. Even a “western” language like classical Greek sometimes was written in so-called boustrophedon style where alternate lines ran left-to-right and right-to-left. The word is Greek for “ox-turning”, like furrows in plowing. An example is

Mary had a little lamb, ‮Its fleece was white as snow, ‪And everywhere that Mary went ‮The Lamb was sure to go.

Every now and then some efficiency expert points out that boustrophedon would increase reading speed, since the eye would not have to snap back to the beginning of each new line and occasionally lose its place while doing so. Unfortunately, the chance of everyone being retrained is zero. Re-doing the books would be easy; everything is on a computer these days, and a very trivial program can change an entire book into boustrophedon. The same is true for web pages; modern browsers have a flag to format and display text either left-to-right or right-to-left. For example, I could stick in a sentence written in Arabic, Farsi, or Hebrew and the browser would handle it just fine. In fact, I just did it! What I actually typed to show the boustrophedon poem above was:

Mary had a little lamb, &#x202E;Its fleece was white as snow, &#x202A;And everywhere that Mary went &#x202E;The lamb was sure to go.

...where the magic codes are the Unicode instructions for RTL and LTR display. Right-click on the page and use the “Show Source” option if you don’t believe me. Now you understand the joke in this cartoon.
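To back up the claim that the conversion program is trivial, here is a minimal Python sketch. It offers two approaches, both of which are my own choices for illustration: either physically reverse the characters of every second line, or leave the text alone and wrap every second line in Unicode's RIGHT-TO-LEFT OVERRIDE (U+202E) and POP DIRECTIONAL FORMATTING (U+202C) characters so the display software does the flipping. The function name `boustrophedon` and its parameter are invented for the example.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE: display what follows right-to-left
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING: end the override


def boustrophedon(lines, use_bidi_controls=False):
    """Return the lines with every second one running right-to-left."""
    result = []
    for number, line in enumerate(lines):
        if number % 2 == 0:
            result.append(line)               # 1st, 3rd, ... lines unchanged
        elif use_bidi_controls:
            result.append(RLO + line + PDF)   # let the renderer reverse it
        else:
            result.append(line[::-1])         # reverse the characters ourselves
    return result


if __name__ == "__main__":
    poem = [
        "Mary had a little lamb,",
        "Its fleece was white as snow,",
        "And everywhere that Mary went",
        "The lamb was sure to go.",
    ]
    print("\n".join(boustrophedon(poem)))
```

The plain character reversal is fine for unaccented English, but it would scramble text that uses combining accents, so the override characters are the safer route when the display software supports them.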

Similarly, we’ll probably still be using the very inefficient QWERTY keyboard arrangement hundreds of years from now, even though it doesn’t take more than a week or so for a good typist to convert to a much more logical arrangement such as Dvorak and increase their typing speed. (Just like boustrophedon, computers can switch keyboard arrangements back and forth with a single click. A trivial example is changing from the English QWERTY to the French AZERTY or German QWERTZ layout.) On the other hand, far more people now know how to text on their portable phones than know how to touch-type, and they normally aren’t using QWERTY but rather the phone’s ten-digit arrangement, using nothing but their thumbs.

The Japanese should have paid attention to the Koreans. As mentioned previously, Japanese also is not related to Chinese, and so now the poor Japanese use four writing systems — Chinese-based word symbols (Kanji), two home-grown phonetic syllable sets (Katakana and Hiragana), and Romaji (aka Roman letters) — sometimes all mixed up in the same sentence. Hiragana and Katakana contain the same 45 syllables, but Hiragana is used for adaptations of Old Japanese words and modifiers (tenses, for example) that didn’t have Chinese symbol equivalents, while Katakana is used for phonetically rendering new or foreign terms as described above. Since the syllable systems are easy to memorize (vs. about 2,100 Kanji symbols in common use and many other rare ones used in things like personal names), Japanese children’s books use Hiragana to phonetically spell all words, including those where adults would use a Kanji character instead. (Japanese schoolchildren aren’t expected to know all those 2,100 Kanji words until 8th grade.) In theory, Hiragana or Katakana could be used to represent all Japanese words; the Kanji forms are kept mainly to avoid homonyms. Because the Japanese seem to think Roman letters are prestigious, everything from store and product names to license plates use Roman. Numerals might be Kanji, Hiragana, or “Arabic” 1…2…3.

In Japanese “alphabetical order” —dictionaries, encyclopedias, directories of all kinds — Kanji words are now placed as if they were spelled in Katakana, as compared to the Chinese system, which relies on the number of strokes used to write the symbol. One might expect that soon China will switch to a pronunciation-based system as well, with all words in Pinyin order (i.e., English alphabetical order) instead.

Since Hiragana and Katakana are the same syllables, one can think of them as simply variants, much as capital letters differ from lower-case in English typography and Roman letters differ from Italic, while both are wildly different from the hand-written versions. Since Katakana is used for foreign terms, the best analogy is perhaps to the way italics are used in English.

German did the same sort of thing until recently; German-language text was set 𝔦𝔫 𝔉𝔯𝔞𝔨𝔱𝔲𝔯 𝔱𝔶𝔭𝔢 (generically called Blackletter), but embedded foreign words and phrases were set in Roman. For example, look at this excerpt from a German dictionary, where the Latin, French, English, and Italian forms of the word “antiquary” are printed in Roman. The change to exclusive use of Roman is one of the few useful things for which Hitler was responsible, along with autobahns and the Volkswagen — in 1941 Fraktur and other Blackletter forms were declared to be Judenlettern and officially abolished. (They weren’t “Jewish letters” — they were a modification of the handwritten Latin letters used at the time of Charlemagne — but presumably the Nazis realized Fraktur was a handicap in their occupied territories, since the French, Dutch, Danes, etc. couldn’t read it easily and furthermore didn’t have the right typesetting equipment. At the conclusion of the war, the victorious Allies occupying Germany didn’t like it either, preventing any chance of a comeback.) Well, mostly. On a vacation in Germany a few years ago, I found myself in a town which was determined to be old-timey cute and had all the street signs in Fraktur, so that several times I had to stop the car and carefully figure out where I was. (I still got lost.)

In this typical Japanese street scene, the WonderGOO storefront has Kanji words, Katakana syllables, Hiragana modifiers, Roman letters, Arabic numerals, and some apparent Western punctuation (exclamation points, dashes) thrown in for good measure, not to mention an international “handicapped parking” symbol! Just to make the visual confusion even worse, some of the writing is horizontal and some vertical. Note that “WonderGOO” isn’t even pronounceable in Japanese — it would have to be Wunderagu or Wundagu or something like that.

Also note that the characters on the sign tend to line up underneath each other. East Asian languages traditionally write each character, whether it is complex or simple, within a square box of the same size. Paradoxically, although ideographic languages have tens of thousands of characters, they are easy to typeset and print because, unlike western alphabets, proportional spacing and kerning don’t have to be taken into account. The only exception is Katakana, a modern invention which has some half-width characters. (One-glyph-per-box is probably the reason for Hangul’s syllable clusters, making them the same size and shape as any intermixed Chinese so things look better on the page. Cf. the WonderGOO sign, where the Kanji is jarringly different from the kana.)

OK, now to explain kerning. The Roman alphabet has letter combinations that look awkward together, and so it is common to change the inter-letter spacing, either by hand or automatically, particularly in large type sizes like headlines. In this example note how the letter spacings are changed, including “tucking in” the /y/ under the /T/, to make the word look better. Here are some examples of hilariously bad kerning — the /FL/, /FI/, /LI/, etc. sequences are notorious for being misread. In fact, graphic designers have a word for this sort of thing — keming — defined as the act of bad kerning. (Lower-case /rn/ is far too easily read as /m/, particularly in srnall type sizes where that space becomes almost microscopic. Heh, heh. You thought that word was “small”, didn’t you?) E-books which have been generated by an OCR scan of a previously printed version (and then not properly proof-read or, even worse, simply run through a spell-checker) are notorious for these errors.