Nathan Cunningham

The mnemonic ‘i before e except after c’ is something we’ve probably all encountered at one point or another and can be a useful trick for figuring out awkward spellings. However, an episode of QI I watched recently claimed the rule has more exceptions than adherents, that words containing ‘cie’ actually outnumber those containing ‘cei’, rendering the latter half of the rule useless. This got me interested in two things: 1) just how useless are we talking? and 2) is it possible to come up with any modifications to the rule which aren’t useless?

To do this, first I gathered a list of English words and loaded it up in R. The source is a txt file of over 350,000 words. With this in hand, it’s simple to use grep to extract all words containing an ‘ei’/‘ie’ pair:

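# Download the word list (one word per line) and keep a local copy
words <- RCurl::getURL('https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt')
cat(words, file = "dict.txt")
# Read the words back in as a single column
dict <- read.table("dict.txt")
# Indices of the words containing each letter pair
ie_words <- grep("ie", t(dict))
ei_words <- grep("ei", t(dict))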

Note that each word can only feature once in each list. As such, while ‘weightiest’ can appear in both the ie_words and ei_words lists, the word ‘zeitgeist’ only appears once in ei_words, despite containing two ‘ei’ pairs. This will lead to some undercounting, but I expect it to be minimal.
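To make the undercounting concrete, here is a minimal sketch comparing per-word counts with per-occurrence counts. The three-word sample and the `count_matches` helper are illustrative assumptions, not part of the original analysis.

```r
# Tiny illustrative sample, not the real dictionary
sample_words <- c("weightiest", "zeitgeist", "field")

# grepl() flags each word at most once per pattern
words_with_ei <- sum(grepl("ei", sample_words))

# gregexpr() returns every match position (or -1 for no match),
# so repeated pairs within one word are all counted
count_matches <- function(pattern, x) {
  sum(vapply(gregexpr(pattern, x, fixed = TRUE),
             function(m) sum(m > 0), integer(1)))
}
occurrences_of_ei <- count_matches("ei", sample_words)
```

On this sample, two words contain ‘ei’ but the pair occurs three times, because ‘zeitgeist’ contributes twice.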

i before e…

So far, the rule is serving its purpose; if you’re struggling to order an ‘ei’/‘ie’ pair in a word, the odds are roughly three to one that the ‘i’ goes first.

…except after c

So far, not so interesting. The QI episode I mentioned only raised an issue with the ‘except after c’ part of the rule. I checked this in the same manner as before, comparing the number of words containing ‘cei’ with those containing ‘cie’.

Oh. Well, that doesn’t really look any different at all. So much so that I had to check that R didn’t just spit out the same plot both times. It didn’t.

It turns out if an ‘ei’/‘ie’ pair follows a ‘c’, it’s slightly less likely that the ‘i’ goes first than in the general case, but the difference is so marginal that it makes this addendum to the rule completely useless. You still have roughly three to one odds that the ‘i’ goes first.
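The comparison above could be sketched roughly as follows. The `i_first_share` helper and the seven-word sample are hypothetical, chosen only to show the mechanics; the shares they produce say nothing about the real dictionary.

```r
# Hypothetical helper: among words containing the pair (optionally after
# a given prefix), the share in which the 'i' comes first.
# Returns NaN if no word matches either ordering.
i_first_share <- function(words, prefix = "") {
  n_ie <- sum(grepl(paste0(prefix, "ie"), words))
  n_ei <- sum(grepl(paste0(prefix, "ei"), words))
  n_ie / (n_ie + n_ei)
}

# Toy sample, not the real word list
sample_words <- c("science", "species", "ancient", "receive",
                  "ceiling", "field", "weird")

overall_share <- i_first_share(sample_words)       # any 'ie'/'ei' pair
after_c_share <- i_first_share(sample_words, "c")  # only pairs following a 'c'
```

Passing the full word list in place of `sample_words` would give the two shares being compared in the text.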

except after…?

With that aspect of the rule rubbished, is there any letter where the rule tends not to hold? Exactly as before, I found the number of ‘ei’/‘ie’ words following each letter of the alphabet (‘^’ denotes words beginning with either ‘ei’ or ‘ie’).

In almost all cases, if you’re faced with uncertainty the odds will be in favour of putting the ‘i’ before the ‘e’. There are, however, a few letters which seem to favour the ‘ei’ ordering. In some cases (‘i’ and ‘a’) these exceptions don’t represent very many words; however, there are over 100 words with ‘^ei’ or ‘eei’ (the latter mostly a double ‘e’ followed by ‘-ing’ or ‘-ism’), and just shy of 200 words with ‘wei’. So, perhaps the rule might be better phrased as “i before e, except after w or e, or at the beginning of the word”. Somewhat less catchy though.

The long form of the original rule also states that you favour the ‘ei’ order when it’s pronounced like ‘A’. As far as I’m aware there is no regular expression for pronunciations (yet), so I’ll have to settle for interrogating the short form of the rule. It should be noted, however, that the ‘wei’ words feature a lot of variations on the word ‘weight’, meaning they still adhere to the long form of the original rule.
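One way the per-letter breakdown might be computed is sketched below. The `preceding` helper and the sample words are assumptions for illustration; like the word lists earlier, it only looks at the first matching pair in each word.

```r
# Toy sample, not the real word list
sample_words <- c("weight", "weird", "field", "science",
                  "either", "being", "seeing", "piece")

# Hypothetical helper: the character immediately before the first
# occurrence of `pair` in each matching word, or '^' when the pair
# starts the word. Words without the pair are dropped by regmatches().
preceding <- function(words, pair) {
  pattern <- paste0("(^|[a-z])", pair)
  hits <- regmatches(words, regexpr(pattern, words, perl = TRUE))
  # A two-character hit means the pair matched at the start of the word
  ifelse(nchar(hits) == 2, "^", substr(hits, 1, 1))
}

# Tabulate how often each preceding character occurs for each ordering
ei_by_letter <- table(preceding(sample_words, "ei"))
ie_by_letter <- table(preceding(sample_words, "ie"))
```

Comparing `ei_by_letter` with `ie_by_letter` letter by letter shows which preceding characters favour which ordering.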