A Proposal for Proquints: Identifiers that are Readable, Spellable, and Pronounceable

Daniel Shawcross Wilkerson

version 1.0, 26 January 2009

Identifiers (IDs) are pervasive throughout modern life. We suggest that these IDs would be easier to manage and remember if they were easily readable, spellable, and pronounceable. As a solution to this problem we propose using PRO-nounceable QUINT-uplets of alternating unambiguous consonants and vowels: proquints.

In this essay we first derive an initial naive solution from the relevant concerns. Next we review other aspects of human interaction with the solution and thereby derive the details. We then compute the information density of the solution. Finally we summarize the resulting protocol.

In general, what would help humans is IDs that are not only unique but also readable, spellable, pronounceable, and generally convenient for a human to use, while still being a reasonably efficient information encoding. We therefore propose PRO-nounceable QUINT-uplets of alternating unambiguous consonants and vowels, or "proquints", as the solution.

Most codes are designed to maximize information density. Humans, however, often must manipulate identifiers personally, and there are therefore other dimensions to the problem that should also be considered. Our goal here is not a narrow, "vertical" solution that is optimized for only one dimension, such as information density, but a broad, "horizontal" solution that is a good solution for all of the relevant concerns, especially those of humans.

Eight-hex-digit memory addresses, e.g. 0xF074B0CD, will soon, with 64-bit machines, be twice as long.

Dotted-quad IP addresses, e.g. 127.0.0.1, will soon (someday?), with IPv6, be four times as long in bits (128 rather than 32).

Our life is full of identifying numbers. However the number-ness of these numbers is irrelevant. A credit-card number is not a number exactly — when was the last time you did arithmetic on one? Mostly these numbers are identifiers: their only function is to be unique. The problem is, they are getting longer as the world of things that must be uniquely identified gets larger: the number of things goes up, previously disjoint contexts merge, and history gets longer.

If we just had two more vowels we would have 8 sounds or 3 bits. Then we could use consonant-vowel order (reversing the pair) or stress (indicated by, say, a capital letter) to get one more bit. This would yield a total of 8 bits for a consonant-vowel pair.

There are 6 unambiguous 1-letter vowels (again disagreeing with the categorization of the letter "r"):

Having 16 sounds gives 4 bits right there and all in a single-character spelling, so we think sticking to the above one-letter consonants is good.

Some consonants are too hard to encode or to hear the difference between. There are 16 unambiguous 1-letter consonants (omitting what we consider to be the wrongly-categorized letter "r"):

Our initial idea is to encode numbers as strings of alternating phonetically-unambiguous consonants and vowels. There seem to be [pronounce] 24 consonants and 20 vowels in English. It gets more complex [pronounce2] if you look more carefully.

There is only so much Shannon entropy [Shannon-1948, Shannon-1949, Shannon-1993, Shannon-1951] you can get by pushing air through flapping organs originally purposed for eating and breathing. We assume it best to re-use the method to which natural language has already converged.

Recall that the proquint goals are multiple and include readability, spellability, and general convenience and fitness to the problem: this is a system to be used by humans, who do not like complexity.

Therefore we think it is important to make the textual encoding of the sounds simple, by

writing all proquints using the same character width,

not introducing case differences to encode information, and

not using non-alpha chars to encode other vowels (@ for the "a" in "at").

Further, a simple (C=consonant, V=vowel) CVCVC pronounceable-five-group or "proquint" feels right as a word. In fact, we can get rid of two of the vowels, say "e" and "r", and still encode 2 bits in one vowel. This method makes the information contained in one proquint a convenient number:

(log2 (* 16.0 4.0 16.0 4.0 16.0)) => 16.0

So four proquints suffice to encode 64 bits and two suffice for 32 bits. We wonder why IP addresses aren't encoded this way; they sure would be easier to remember.
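The density arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the names `bits_per_proquint` and `proquints_needed` are hypothetical helpers introduced here, not part of the proposal.

```python
import math

# Counts taken from the text: 16 consonants (4 bits) and 4 vowels (2 bits).
CONSONANT_CHOICES = 16
VOWEL_CHOICES = 4


def bits_per_proquint() -> float:
    """Information carried by one CVCVC group, in bits."""
    combinations = (CONSONANT_CHOICES * VOWEL_CHOICES *
                    CONSONANT_CHOICES * VOWEL_CHOICES *
                    CONSONANT_CHOICES)
    return math.log2(combinations)


def proquints_needed(bit_width: int) -> int:
    """Number of proquints required to cover an ID of bit_width bits."""
    return math.ceil(bit_width / bits_per_proquint())


print(bits_per_proquint())    # 16.0
print(proquints_needed(32))   # 2
print(proquints_needed(64))   # 4
```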

One may also want to fall back on saying each letter separately, so each letter's name should be a single syllable in English. We therefore eliminate "w", which, as previously mentioned, was already a source of ambiguity. In its place we promote "r" to a consonant.

Getting rid of ambiguities between vowels is a good idea for making this system more international, though there are persistent difficulties: Japanese speakers have difficulty distinguishing "l" and "r", Hungarian speakers have difficulty pronouncing "w", and "l" can easily sound like a "w" (a mutation occurring in the history of my middle name). My proof-readers have sent me many more examples from other languages. Attempting to eliminate all such ambiguities is impossible, but we have prevented some of the worst ones above.

Some proquint sequences ending in "h" take some extra effort to pronounce and hear, but since there is always exactly one consonant where you expect one, there is less ambiguity than in natural spoken English: if you don't hear a consonant when you expected one, it was an "h", the empty consonant. The only other option is to replace "h" with "x" and we really can't say that doing so is an improvement: "x" is usually pronounced as "ks" which is really a consonant cluster and confuses the rhythm.

There is another subtle reason not to use "x": the standard notation for a hex string starts with a prefix that includes an "x". Since no proquint contains an "x", it is possible to unambiguously distinguish between a proquint and a hex string in standard notation.
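As a rough illustration of that distinction, the following Python sketch classifies a token as a hex string or a proquint sequence. It rests on assumptions not spelled out in this section: the "0x" prefix is taken as the standard hex notation, the consonant set is assumed to be "bdfghjklmnprstvz" (16 letters, with "r" in place of "w" and no "x"), the vowel set is assumed to be "aiou", and dashes are assumed as separators. The function name `classify` is hypothetical.

```python
# Assumed alphabets (see lead-in): 16 consonants without "x", 4 vowels.
CONSONANTS = set("bdfghjklmnprstvz")
VOWELS = set("aiou")
HEX_DIGITS = set("0123456789abcdefABCDEF")


def classify(token: str) -> str:
    """Return "hex", "proquint", or "unknown" for a textual ID."""
    # A hex string in standard notation starts with the "0x" prefix.
    if token[:2].lower() == "0x":
        digits = token[2:]
        if digits and all(ch in HEX_DIGITS for ch in digits):
            return "hex"
        return "unknown"
    # Otherwise expect dash-separated CVCVC groups.
    for word in token.split("-"):
        if len(word) != 5:
            return "unknown"
        for position, ch in enumerate(word):
            allowed = CONSONANTS if position % 2 == 0 else VOWELS
            if ch not in allowed:
                return "unknown"
    return "proquint"


print(classify("0xF074B0CD"))   # hex
print(classify("lusab-babad"))  # proquint
```

Because "x" is in neither assumed alphabet, no token can satisfy both branches.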

A sequence of proquints should be separated by something standard, such as dashes. While the dash is intended to induce a pause, for easy flow of pronunciation we suggest an unused vowel sound be used. Since we don't use the vowel "e", we suggest the dash be pronounced as "eh".
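To make the scheme concrete, here is a minimal Python sketch of an encoder and decoder. Several details are assumptions rather than anything fixed by the text so far: the letters are assigned values in alphabetical order ("bdfghjklmnprstvz" for 0 to 15, "aiou" for 0 to 3), the most significant bits are written first, and the function names `encode_word`, `encode`, and `decode` are hypothetical.

```python
CONSONANTS = "bdfghjklmnprstvz"  # assumed order: index = 4-bit value
VOWELS = "aiou"                  # assumed order: index = 2-bit value


def encode_word(value: int) -> str:
    """Encode one 16-bit value as a single CVCVC proquint."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    return (CONSONANTS[(value >> 12) & 0xF] +
            VOWELS[(value >> 10) & 0x3] +
            CONSONANTS[(value >> 6) & 0xF] +
            VOWELS[(value >> 4) & 0x3] +
            CONSONANTS[value & 0xF])


def encode(value: int, bit_width: int = 32, sep: str = "-") -> str:
    """Encode an integer as dash-separated proquints, high bits first."""
    if bit_width % 16 != 0:
        raise ValueError("bit_width must be a multiple of 16")
    words = [encode_word((value >> shift) & 0xFFFF)
             for shift in range(bit_width - 16, -1, -16)]
    return sep.join(words)


def decode(text: str, sep: str = "-") -> int:
    """Decode dash-separated proquints back into an integer."""
    value = 0
    for word in text.split(sep):
        if len(word) != 5:
            raise ValueError("each proquint must have five letters")
        for position, ch in enumerate(word):
            if position % 2 == 0:
                value = (value << 4) | CONSONANTS.index(ch)
            else:
                value = (value << 2) | VOWELS.index(ch)
    return value


# 127.0.0.1 as a 32-bit integer is 0x7F000001.
print(encode(0x7F000001))          # lusab-babad
print(hex(decode("lusab-babad")))  # 0x7f000001
```

Under these assumptions the address 127.0.0.1 becomes "lusab-babad", which with the suggested pronunciation of the dash would be spoken as "lusab-eh-babad".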