The Caesar Cipher

Authors: Chris Savarese and Brian Hart '99

One of the simplest examples of a substitution cipher is the Caesar cipher, which is said to have been used by Julius Caesar to communicate with his army. Caesar is considered to be one of the first people to have employed encryption to secure messages. He chose shifting each letter in the message as his standard algorithm, informed all of his generals of his decision, and was then able to send them secured messages. Using the Caesar Shift (3 to the right), the message

"RETURN TO ROME"

would be encrypted as

"UHWXUQ WR URPH"

In this example, 'R' is shifted to 'U', 'E' is shifted to 'H', and so on. Now, even if the enemy did intercept the message, it would be useless, since only Caesar's generals could read it.

Thus, the Caesar cipher is a shift cipher since the ciphertext alphabet is derived from the plaintext alphabet by shifting each letter a certain number of spaces. For example, if we use a shift of 19, then we get the following pair of ciphertext and plaintext alphabets:

Plaintext:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Ciphertext: T U V W X Y Z A B C D E F G H I J K L M N O P Q R S

To encipher a message, we perform a simple substitution by looking up each of the message's letters in the top row and writing down the corresponding letter from the bottom row. For example, the message

THE FAULT, DEAR BRUTUS, LIES NOT IN OUR STARS BUT IN OURSELVES.

would be enciphered as

MAX YTNEM, WXTK UKNMNL, EBXL GHM BG HNK LMTKL UNM BG HNKLXEOXL.

Essentially, each letter of the alphabet has been shifted nineteen places ahead in the alphabet, wrapping around the end if necessary. Notice that punctuation and blanks are not enciphered but are copied over as themselves.
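The enciphering rule just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name caesar_encipher and the choice to convert the message to upper case first are assumptions made for the example, not part of the original description.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26 letters 'A' through 'Z'

def caesar_encipher(message, shift):
    """Shift each letter `shift` places ahead, wrapping around after Z."""
    result = []
    for ch in message.upper():  # assumption: work in upper case only
        if ch in ALPHABET:
            index = (ALPHABET.index(ch) + shift) % 26
            result.append(ALPHABET[index])
        else:
            # Punctuation and blanks are copied over as themselves.
            result.append(ch)
    return "".join(result)

plaintext = "THE FAULT, DEAR BRUTUS, LIES NOT IN OUR STARS BUT IN OURSELVES."
print(caesar_encipher(plaintext, 19))
# prints: MAX YTNEM, WXTK UKNMNL, EBXL GHM BG HNK LMTKL UNM BG HNKLXEOXL.
```

Because the arithmetic is done modulo 26, calling the same function with a shift of -19 (or, equivalently, 7) turns the ciphertext back into the plaintext.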

Breaking a Caesar Cipher (Cryptanalysis)

Can a computer guess what shift was used in creating a Caesar cipher? The answer, of course, is yes. But how does it work?

The unknown shift is one of 26 possible shifts. One technique might be to try each of the 26 possible shifts and check which of these resulted in readable English text. But this approach has limitations. The main problem is that the computer would need a comprehensive dictionary in order to be able to recognize the words of any given cryptogram.
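Generating the 26 candidates is the easy part, as the following Python sketch suggests; the hard part, deciding which candidate is English, is left to a human reader here. The names shift_back and all_candidates are hypothetical and chosen only for this illustration.

```python
import string

ALPHABET = string.ascii_uppercase

def shift_back(ciphertext, shift):
    """Undo a Caesar shift by moving each letter `shift` places back."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - shift) % 26] if ch in ALPHABET else ch
        for ch in ciphertext.upper()
    )

def all_candidates(ciphertext):
    """Return the 26 candidate plaintexts; entry s undoes a shift of s."""
    return [shift_back(ciphertext, s) for s in range(26)]

# Print every candidate next to the shift it undoes.
for s, candidate in enumerate(all_candidates("MAX YTNEM, WXTK UKNMNL")):
    print(f"{s:2d}  {candidate}")
```

Only the line for shift 19 reads as English (THE FAULT, DEAR BRUTUS), but picking it out automatically is exactly the step that would need a dictionary.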

A better approach makes use of statistical data about English letter frequencies. It is known that in a text of 1000 letters, the letters of the English alphabet occur with about the following frequencies:

   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z
  73   9  30  44 130  28  16  35  74   2   3  35  25  78  74  27   3  77  63  93  27  13  16   5  19   1

This information can be useful in deciding the most likely shift used on a given enciphered message. Suppose the enciphered message is:

K DKVO DYVN LI KX SNSYD, PEVV YP CYEXN KXN PEBI, CSQXSPISXQ XYDRSXQ.

Counting how often each letter occurs in the message gives the following tally:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 4 3 0 0 0 3 0 4 1 0 4 1 4 3 1 6 0 0 4 0 7 5 0
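A tally of this kind is tedious to do by hand but easy to automate. The sketch below uses Python's standard collections.Counter; the function name letter_tally is an assumption made for this example.

```python
from collections import Counter
import string

def letter_tally(text):
    """Count how often each letter A..Z occurs, ignoring everything else."""
    counts = Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)
    # Include letters that never occur, so the result always has 26 entries.
    return {letter: counts[letter] for letter in string.ascii_uppercase}

message = "K DKVO DYVN LI KX SNSYD, PEVV YP CYEXN KXN PEBI, CSQXSPISXQ XYDRSXQ."
tally = letter_tally(message)

# Print the letters and their counts as two rows.
print(" ".join(string.ascii_uppercase))
print(" ".join(str(tally[letter]) for letter in string.ascii_uppercase))
```

For this message the two most frequent cipher letters are X (7 occurrences) and S (6 occurrences).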

We can now shift the two tallies so that the large and small frequencies from each frequency distribution match up roughly. For example, if we try a shift of ten on the previous example, we get the following correspondence between English language frequencies and the letter frequencies in the message.

English Language Frequencies

   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z
  73   9  30  44 130  28  16  35  74   2   3  35  25  78  74  27   3  77  63  93  27  13  16   5  19   1

Enciphered Message Frequencies

   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z   A   B   C   D   E   F   G   H   I   J
   4   1   0   4   1   4   3   1   6   0   0   4   0   7   5   0   0   1   2   4   3   0   0   0   3   0

Note that in this case the large frequencies for cipher X and Y correspond to large frequencies for English N and O, and the bare spots for cipher T and U correspond to bare spots for English J and K. Also, an isolated large frequency for cipher S corresponds to a similar one for English I. In view of this evidence we needn't worry too much about the drastic mismatch for English E, which is usually the most frequent letter in a random sample of English text.
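Lining up the two tallies for a trial shift can also be done by program, which makes it easy to eyeball several shifts in turn. The following Python sketch pairs each English letter with the cipher letter a given number of places ahead of it. The names ENGLISH and aligned_rows are hypothetical; the frequency values are the per-1000 figures quoted earlier.

```python
import string

ALPHABET = string.ascii_uppercase

# English frequencies per 1000 letters for A..Z, as quoted in the text.
ENGLISH = [73, 9, 30, 44, 130, 28, 16, 35, 74, 2, 3, 35, 25,
           78, 74, 27, 3, 77, 63, 93, 27, 13, 16, 5, 19, 1]

def aligned_rows(ciphertext, shift):
    """Pair each English letter with the cipher letter `shift` places ahead."""
    text = ciphertext.upper()
    counts = [text.count(letter) for letter in ALPHABET]
    rows = []
    for i, plain in enumerate(ALPHABET):
        j = (i + shift) % 26  # position of the matching cipher letter
        rows.append((plain, ENGLISH[i], ALPHABET[j], counts[j]))
    return rows

message = "K DKVO DYVN LI KX SNSYD, PEVV YP CYEXN KXN PEBI, CSQXSPISXQ XYDRSXQ."
for plain, ef, cipher, mf in aligned_rows(message, 10):
    print(f"{plain} {ef:4d}   {cipher} {mf:2d}")
```

With a shift of ten, the row for English N (78) is paired with cipher X (7), and the row for English J (2) is paired with cipher T (0), which is the kind of agreement described above.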

If we now apply this substitution to the message we get:

A TALE TOLD BY AN IDIOT, FULL OF SOUND AND FURY, SIGNIFYING NOTHING.

Using the Chi-square Statistic

The comparison described above can be automated with the chi-square statistic, numbering the letters A = 0 through Z = 25:

Let ef(c) stand for the English frequency of some letter c of the alphabet.

Let mf(c) stand for the frequency of some letter c of the message.

For each possible shift s between 0 and 25:

    For each letter c of the alphabet:

        Add the square of mf((c + s) mod 26), divided by ef(c), to the sum for shift s.

This is the algorithm that is used in CryptoToolJ's Caesar Analyzer.
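The steps listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not CryptoToolJ's actual code: the names ENGLISH, shift_scores and guess_shift are hypothetical, and so is the final rule of choosing the shift with the smallest sum, since the steps do not say how the 26 sums are compared. That rule is consistent with minimizing the usual chi-square statistic, because the two quantities differ only by terms that are the same for every shift.

```python
import string

ALPHABET = string.ascii_uppercase

# English frequencies per 1000 letters for A..Z, as quoted in the text.
ENGLISH = [73, 9, 30, 44, 130, 28, 16, 35, 74, 2, 3, 35, 25,
           78, 74, 27, 3, 77, 63, 93, 27, 13, 16, 5, 19, 1]

def shift_scores(ciphertext):
    """For each shift s, sum mf((c + s) mod 26) squared over ef(c)."""
    text = ciphertext.upper()
    mf = [text.count(letter) for letter in ALPHABET]  # message frequencies
    scores = []
    for s in range(26):
        total = sum(mf[(c + s) % 26] ** 2 / ENGLISH[c] for c in range(26))
        scores.append(total)
    return scores

def guess_shift(ciphertext):
    """Assumption: the shift with the smallest sum is the best match."""
    scores = shift_scores(ciphertext)
    return min(range(26), key=lambda s: scores[s])

message = "K DKVO DYVN LI KX SNSYD, PEVV YP CYEXN KXN PEBI, CSQXSPISXQ XYDRSXQ."
print(guess_shift(message))  # prints: 10
```

For very short messages the letter counts are too sparse for the statistic to be reliable, so the guess is best checked by deciphering with the suggested shift and reading the result.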

For further study and enjoyment

CryptoToolJ. Try breaking the above cryptogram using CryptoToolJ's Caesar Analyzer. This requires a Java-enabled browser. You'll have to paste the message into CryptoToolJ's input window.