Someone picked my brain the other day looking for a technique to compress language files.

After walking away to think about it… my method was to replace the ASCII codes with new ones, numbering the letters by how frequently they are used and the most common words the same way, so the most common items get the smallest numbers.

Where lowercase e is normally stored as an ASCII value using 1 byte (a 7-bit code padded to 8 bits)

ASCII e = 0x65 = 0b1100101 = 7 bits

vs

APK e = 0x1 = 1 bit

… this method stores an e in 1 bit. This is similar to Huffman coding, with the addition of whole words being included in the code.
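To make the Huffman comparison concrete, here is a minimal sketch in Python. It is not the method described above, just standard Huffman coding applied to a table that mixes letters and whole words. The counts are made up for illustration, and `huffman_codes` is a hypothetical helper name.

```python
import heapq
from itertools import count


def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Build a prefix-free Huffman code for symbols with the given counts."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single symbol still needs one bit to be written out.
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # unique counter so heap entries never compare dicts
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Illustrative counts only (assumed, not measured from real text).
sample_counts = {"e": 120, "t": 90, "a": 80, "the": 60, "because": 5}
for symbol, code in sorted(huffman_codes(sample_counts).items(), key=lambda kv: len(kv[1])):
    print(symbol, code)
```

Frequent symbols end up with short codes and rare ones with long codes, and no code is the start of another, so a decoder can read the bit stream without separators.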

For example:

“because” is the 94th most used word in the English language, and in this method it is stored in 7 bits (94 = 0b1011110).
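The arithmetic behind that 7-bit figure can be checked with a short Python sketch. It assumes a code is simply the binary form of the 1-based frequency rank with no leading zeros. `rank_code` and `has_prefix_clash` are hypothetical names used only for this illustration.

```python
def rank_code(rank: int) -> str:
    """Return the binary digits of a 1-based frequency rank, without leading zeros."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return format(rank, "b")


def has_prefix_clash(codes: list[str]) -> bool:
    """Return True if any code is the start of a different code."""
    return any(a != b and b.startswith(a) for a in codes for b in codes)


print(rank_code(1))        # 1
print(rank_code(94))       # 1011110
print(len(rank_code(94)))  # 7

# The code for rank 1 is also the first bit of the code for rank 94.
print(has_prefix_clash([rank_code(1), rank_code(94)]))  # True
```

The last line shows something a real encoder would have to deal with: the raw rank codes overlap at the start, so the stream would need some way to mark where one code ends and the next begins. How to do that is left open here.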

I don’t know if this has been done before… but I would imagine it could compress language files substantially.

I have also thought about a third addition: codes for the most commonly used two- or three-letter combinations.
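As a rough sketch of how those letter combinations might be found, the Python below counts runs of n letters inside each word and returns the most common ones. The sample sentence is made up, and `top_ngrams` is a hypothetical helper. A real table would need to be built from a large body of text.

```python
from collections import Counter


def top_ngrams(text: str, n: int, k: int) -> list[tuple[str, int]]:
    """Return the k most common n-letter runs found inside the words of text."""
    counts = Counter()
    for word in text.lower().split():
        # Keep letters only, so punctuation does not end up inside a combination.
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts.most_common(k)


sample = "the other day they gathered there"
print(top_ngrams(sample, 2, 3))  # most common two-letter combinations
print(top_ngrams(sample, 3, 3))  # most common three-letter combinations
```

The combinations that come out on top could then be given short codes alongside the letters and words.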