In my previous post I explained how the Vigenère cipher works and how to implement it in Rust. I also mentioned that nowadays this cipher doesn’t offer any security, since it can be easily broken with the help of a computer. Well, that is exactly what we are going to do now.

The algorithm

There are several methods to break Vigenère, but the usual outline is:

1. Guess the length of the key. There are several probabilistic methods; the main ones, the Kasiski examination and the Friedman test, are described on Wikipedia.
2. Once we have a likely key length, group together all the characters of the cipher text that were encrypted with the same character of the key. For example, if the key has size three, we make three groups: one with the characters at positions 1, 4, 7, 10…, another with the ones at 2, 5, 8… and so on.
3. Each of those groups was encrypted with a single character, which makes it a Caesar cipher. To solve it we can just try all 256 possible values (every value a byte can take) and pick the one that “looks best” according to some scoring function.

This particular problem is also one of the cryptopals challenges. Their instructions on how to solve it are quite good, and you can find them on the cryptopals site.

The implementation

Now that we know all the parts of the project, let’s start from the top and write what we need.

Guessing the key size

This is the most difficult bit. There are several alternatives and all of them are probabilistic, so we will have to get a set of the best candidates and try them all. We could even try brute force and test every possible key size until we find one that works.

The alternative described in cryptopals looks fairly easy to implement, so we could start there and see how well it works.

The idea is to try different key sizes. For each key size K, take the first and second groups of K bytes from the cipher text and calculate how “different” they are using the Hamming distance, normalizing the result by dividing it by K. The key size with the smallest normalized result is likely to be the real key size. The description from cryptopals mentions that you could take more than two blocks and average the results to improve the accuracy of the guess.
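The hamming_dist helper is not shown in this post. A minimal sketch of it, assuming it counts the differing bits between two equally long byte slices and returns a u32 (the real signature may differ), together with a hypothetical normalized_dist helper for the two-block comparison described above:

```rust
/// Number of differing bits between two byte slices.
/// If the lengths differ, the extra bytes of the longer slice are ignored.
fn hamming_dist(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}

/// Hamming distance between the first two blocks of `k` bytes, divided by `k`.
/// Returns `None` when the cipher text is too short for two full blocks.
fn normalized_dist(cipher: &[u8], k: usize) -> Option<f32> {
    if k == 0 || cipher.len() < 2 * k {
        return None;
    }
    let (first, rest) = cipher.split_at(k);
    Some(hamming_dist(first, &rest[..k]) as f32 / k as f32)
}
```

The cryptopals instructions give a handy check value: the distance between "this is a test" and "wokka wokka!!!" is 37.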

The truth is that I don’t fully understand why this method works (my guess is that, since the same key is repeated, blocks are more likely to be similar when their length matches the key size). I implemented it and, while it worked with the cryptopals challenge file, it didn’t guess the key size very well for some of my own encrypted files.

This is my implementation:

```rust
use std::collections::BinaryHeap;

// `KeyScore` is not shown in this post. `BinaryHeap` is a max-heap, so its
// `Ord` implementation has to make the lowest score pop first.
pub fn guess_key_size(cipher: &cipher::CipherText) -> Vec<u32> {
    let mut heap = BinaryHeap::new();
    let mut best = Vec::new();
    for i in 1..40 {
        let score = calc_size_score(&cipher.as_bytes(), i);
        heap.push(KeyScore {
            size: i as u32,
            score: score,
        });
    }
    let mut count = 0;
    while let Some(v) = heap.pop() {
        // Stop after three candidates (`count > 3` would return four).
        if count >= 3 {
            break;
        } else {
            best.push(v.size);
            count += 1;
        }
    }
    best
}

fn calc_size_score(cipher: &[u8], size: i32) -> f32 {
    // First four blocks of `size` bytes. This panics if the cipher text is
    // shorter than 4 * size bytes.
    let b1 = &cipher[0..size as usize];
    let b2 = &cipher[size as usize..2 * size as usize];
    let b3 = &cipher[2 * size as usize..3 * size as usize];
    let b4 = &cipher[3 * size as usize..4 * size as usize];
    // Distance between every pair of blocks.
    let d1 = hamming_dist(b1, b2) as f32;
    let d2 = hamming_dist(b1, b3) as f32;
    let d3 = hamming_dist(b1, b4) as f32;
    let d4 = hamming_dist(b2, b3) as f32;
    let d5 = hamming_dist(b2, b4) as f32;
    let d6 = hamming_dist(b3, b4) as f32;
    // Average the six distances and normalize by the key size.
    ((d1 + d2 + d3 + d4 + d5 + d6) / 6.0) / size as f32
}
```

The code returns the three most likely sizes so that we can test them all.
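The KeyScore type used by the heap is not shown in this post. Since BinaryHeap pops the largest element first and f32 does not implement Ord, the type needs a hand-written, reversed ordering. One way it could look, as a sketch under those assumptions (the field names come from the code above, the ordering and the best_sizes helper are my guess, not the original implementation):

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

#[derive(Debug)]
struct KeyScore {
    size: u32,
    score: f32,
}

impl Ord for KeyScore {
    fn cmp(&self, other: &Self) -> Ordering {
        // Reversed comparison: the lowest score is the "greatest" element,
        // so a max-heap pops it first. `total_cmp` gives f32 a total order.
        other.score.total_cmp(&self.score)
    }
}

impl PartialOrd for KeyScore {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for KeyScore {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for KeyScore {}

/// Hypothetical helper: the sizes of the `n` lowest-scoring candidates.
fn best_sizes(scores: Vec<KeyScore>, n: usize) -> Vec<u32> {
    let mut heap: BinaryHeap<KeyScore> = scores.into_iter().collect();
    let mut best = Vec::new();
    while best.len() < n {
        match heap.pop() {
            Some(k) => best.push(k.size),
            None => break,
        }
    }
    best
}
```

With this ordering, popping three times from the heap yields the three key sizes with the smallest normalized distance.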

Grouping the characters

The logic behind this is not too difficult, but it will be easier to isolate this bit so that it can be tested independently. So we start with the cipher text, a vector of bytes. If we look at them as ASCII characters for a moment, we will have something meaningless like:

V: wjmzbfapk

Now, if our key size is three, we want to break it down into three vectors:

V1: wza
V2: jbp
V3: mfk

We know that V1 was encrypted with the first byte of the key, V2 with the second and so on. Now we would run our brute force decryption function (described in the next section) on each of them and reassemble the output into a single vector again:

V': whizzbang
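The split and the reassembly can be tried in isolation before wrapping them in a type. A minimal sketch with two free functions (split_columns and interleave are hypothetical names used only for this illustration):

```rust
/// Split `bytes` into `key_size` groups: group n holds the bytes at
/// positions n, n + key_size, n + 2 * key_size, ...
fn split_columns(bytes: &[u8], key_size: usize) -> Vec<Vec<u8>> {
    (0..key_size)
        .map(|offset| {
            bytes
                .iter()
                .skip(offset)
                .step_by(key_size)
                .copied()
                .collect()
        })
        .collect()
}

/// Inverse of `split_columns`: take one byte from each group in turn.
fn interleave(columns: &[Vec<u8>]) -> Vec<u8> {
    let longest = columns.iter().map(|c| c.len()).max().unwrap_or(0);
    let mut out = Vec::new();
    for i in 0..longest {
        for column in columns {
            // Later groups can be one byte shorter than the first ones.
            if let Some(&byte) = column.get(i) {
                out.push(byte);
            }
        }
    }
    out
}
```

Splitting the example text with a key size of three gives the three groups listed above, and interleaving them again returns the original bytes.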

To represent this we can use a struct, keeping the byte vector and the size of the key we are trying as fields.

```rust
pub struct ByteMatrix {
    matrix: Vec<Vec<u8>>,
    row_size: usize,
}
```

We will need some operations to transpose the contents and to reassemble the result into a single vector again.

```rust
impl ByteMatrix {
    pub fn to_matrix(vector: &[u8], size: usize) -> ByteMatrix {
        // One empty row per key byte.
        let mut vectors: Vec<Vec<u8>> = Vec::new();
        for _ in 0..size {
            vectors.push(Vec::new());
        }
        // Byte i belongs to the row of key byte i % size.
        for (i, byte) in vector.iter().enumerate() {
            vectors[i % size].push(*byte);
        }
        ByteMatrix {
            matrix: vectors,
            row_size: size,
        }
    }

    pub fn reassemble(&self) -> Vec<u8> {
        let mut bytes: Vec<u8> = Vec::new();
        // The first row is never shorter than the others.
        let max_size = self.matrix[0].len();
        for i in 0..max_size {
            for j in 0..self.row_size {
                let this_size = self.matrix[j].len();
                if this_size > i {
                    bytes.push(self.matrix[j][i]);
                }
            }
        }
        bytes
    }
}
```

Now we can deal with each row independently but we still need a way to transform the data:

```rust
impl ByteMatrix {
    // Apply `fun` to every row and build a new matrix from the results.
    pub fn transform<F>(&self, fun: F) -> ByteMatrix
    where
        F: FnMut(&Vec<u8>) -> Vec<u8>,
    {
        let vecs = self.matrix.iter().map(fun).collect();
        ByteMatrix {
            matrix: vecs,
            row_size: self.row_size,
        }
    }
}
```

Breaking the Caesar cipher

And now the last step. We have several byte vectors, each encoded with a single byte (a Caesar cipher), so we are going to try every possible key value and see which of the outputs makes sense! Obviously we are not going to print them all and pick one ourselves; we need a scoring function to pick one for us.

There are many ways to score the deciphered text. One common way is to check the frequency of each character in your particular language and see how well your text follows that distribution. For our purposes we don’t need to do anything that complicated. Since we know that most of the text is going to be made of lowercase Latin characters, we can add one to the score every time we find one. We also know that a few other characters are very unlikely, so we reduce the score by one when we find them.

This is one possible scoring function we could use:

```rust
fn score(input: &[u8]) -> u32 {
    input.iter().fold(0, |acc, b| {
        if *b >= 97 && *b <= 122 {
            // Lowercase ASCII letter (a to z).
            acc + 1
        } else if *b >= 33 && *b <= 64 && acc > 0 {
            // ASCII punctuation or digit; the score never drops below zero.
            acc - 1
        } else {
            acc
        }
    })
}
```
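For comparison, the frequency-based approach mentioned earlier could be sketched as a chi-squared statistic, where a lower value means the text looks more like English. This is only an illustration: the chi_squared name is hypothetical, the frequency table holds commonly quoted approximate values rather than anything from this post, and the sketch ignores non-letter bytes, which a real scorer would also want to penalize.

```rust
/// Approximate relative frequencies, in percent, of the letters a to z in
/// English text. Commonly quoted reference values (an assumption of this
/// sketch, not data from the post).
const ENGLISH_FREQ: [f64; 26] = [
    8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97, 0.15, 0.77, 4.03, 2.41,
    6.75, 7.51, 1.93, 0.10, 5.99, 6.33, 9.06, 2.76, 0.98, 2.36, 0.15, 1.97, 0.07,
];

/// Chi-squared distance between the letter distribution of `input` and the
/// table above. Lower is better; text without letters scores infinity.
fn chi_squared(input: &[u8]) -> f64 {
    let mut counts = [0u32; 26];
    let mut letters = 0u32;
    for b in input {
        if b.is_ascii_alphabetic() {
            counts[(b.to_ascii_lowercase() - b'a') as usize] += 1;
            letters += 1;
        }
    }
    if letters == 0 {
        return f64::INFINITY;
    }
    counts
        .iter()
        .zip(ENGLISH_FREQ.iter())
        .map(|(&observed, &freq)| {
            let expected = f64::from(letters) * freq / 100.0;
            (f64::from(observed) - expected).powi(2) / expected
        })
        .sum()
}
```

The rest of this post sticks with the simpler score function.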

Now we use this function with each one of our candidates:

```rust
pub fn decode_single_key(cipher: &cipher::CipherText) -> cipher::PlainText {
    let mut plain = None;
    let mut best_score = 0;
    // Try every possible key byte and keep the best-scoring candidate.
    for key in 0..=255_u8 {
        let candidate = cipher::decrypt_single_key(cipher, key).unwrap();
        let score = score(&candidate.as_bytes());
        if score > best_score {
            best_score = score;
            plain = Some(candidate);
        }
    }
    // Panics if no candidate scored above zero.
    plain.unwrap()
}
```

Note the inclusive range in the for loop. When this post was first written it was an unstable feature in Rust that used the ... syntax and needed #![feature(inclusive_range_syntax)] at the top of the crate. Since Rust 1.26 it is stable and written ..=, so no feature attribute is required.
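The decrypt_single_key function lives in the cipher module from the previous post and is not shown here. As a self-contained sketch of the same brute force, assuming each byte was combined with the key byte using XOR as in the cryptopals challenge (the actual module may work differently; xor_single_key and break_single_key are hypothetical names):

```rust
/// Same scoring idea as in the post: reward lowercase letters, penalize
/// digits and punctuation, never go below zero.
fn score(input: &[u8]) -> u32 {
    input.iter().fold(0, |acc: u32, b| match *b {
        b'a'..=b'z' => acc + 1,
        b'!'..=b'@' => acc.saturating_sub(1),
        _ => acc,
    })
}

/// Assumption: encryption and decryption are both a byte-wise XOR with the key.
fn xor_single_key(input: &[u8], key: u8) -> Vec<u8> {
    input.iter().map(|b| b ^ key).collect()
}

/// Try all 256 keys and return the best-scoring key with its plain text.
fn break_single_key(cipher: &[u8]) -> (u8, Vec<u8>) {
    // Start with key 0 so there is always a result, even if every score is 0.
    let mut best = (0u8, xor_single_key(cipher, 0));
    let mut best_score = score(&best.1);
    for key in 1..=255u8 {
        let candidate = xor_single_key(cipher, key);
        let candidate_score = score(&candidate);
        if candidate_score > best_score {
            best_score = candidate_score;
            best = (key, candidate);
        }
    }
    best
}
```

Starting from key 0 instead of None avoids the panic when no candidate scores above zero, at the cost of returning a meaningless result in that case.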

Putting it all together

Now that we have all the pieces, let’s use them to break the cipher!

```rust
pub fn break_cipher(cipher: &cipher::CipherText, key_size: u32) -> cipher::PlainText {
    let matrix = byte_matrix::ByteMatrix::to_matrix(&cipher.as_bytes(), key_size as usize);
    // Break each row as an independent single-byte cipher.
    let matrix = matrix.transform(|vec: &Vec<u8>| {
        let cipher = cipher::CipherText::new(vec);
        let plain = decode_single_key(&cipher);
        plain.as_bytes()
    });
    let decoded = matrix.reassemble();
    let plain = cipher::PlainText::new(&decoded);
    plain
}

pub fn decode_text(cipher: &cipher::CipherText) -> Result<cipher::PlainText, cipher::Error> {
    let mut best_score = 0;
    let mut candidate = None;
    let key_size_guesses = guess_key_size(cipher);
    // Try every guessed key size and keep the best-scoring plain text.
    for key_size in key_size_guesses.iter() {
        let plain = break_cipher(cipher, *key_size);
        let score = score(&plain.as_bytes());
        if score > best_score {
            best_score = score;
            candidate = Some(plain);
        }
    }
    match candidate {
        None => Err(cipher::Error::Failure("Couldn't decode text".to_string())),
        Some(plain) => Ok(plain),
    }
}
```

Just like last time, you can find the full source for this post in my GitHub account.