The second task of the 15th Weekly Challenge was to implement

an encoder and a decoder for the Vigenère Cipher. But that’s a

little more complicated than it seems, because the cipher that's named

after Blaise de Vigenère wasn’t actually invented by him, and the cipher

that Vigenère actually invented isn’t named after him.

So should we implement the Vigenère Cipher...or Vigenère’s cipher?

Why not both!

The Vigenère Cipher was devised by Giovan Battista Bellaso in 1553,

and then misattributed to Vigenère about three hundred years later.

It uses a tabula rēcta to translate from a message text to a ciphertext,

and back again.

Given an user-provided key (e.g. "BELLASO" ), we encipher a message

(e.g. "VIGENEREDIDNOTINVENTTHIS" ) by matching up respective letters of the

key and message, then using them as two indices to look up the corresponding

cipher character in the appropriate column and row of the tabula rēcta.

And if the key is shorter than the message, we just recycle the key

as many times as necessary.

For example:

Key: B E L L A S O B E L L A S O B E L L A S O B E L L A...

Text: V I G E N E R E D I D N O T I N V E N T T H I S

Table: ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓

Cipher: W M R P N W F F H T O N G H J R G P N L H I M D

In other words, each column of the tabula rēcta is a separate Caesar Cipher

(or a ROT-N transcription) with each successive letter of the key selecting

which substitution to perform on the corresponding letter in the message.

This approach was widely considered “indéchiffrable” back in the 1500s,

though Charles Babbage broke an example of it within two weeks of its

“rediscovery” in 1854, and Friedrich Kasiski published a reliable general

attack for it less than a decade later. Nevertheless, the Vigenère

Cipher provides reasonable security, provided the chosen key is

long enough and unusual enough to prevent simple dictionary attacks.

Ironically, as we’ll see later, Vigenère’s actual cipher elegantly

solves both these problems with the Vigenère Cipher.

In order to implement the Vigenère Cipher, we first need to build a

tabula rēcta. In Raku, that’s just:

constant @alphabet = ['A'...'Z'];



constant @𝑡𝑎𝑏𝑢𝑙𝑎-𝑟𝑒̄𝑐𝑡𝑎

= @alphabet, *.rotate ... *.head eq @alphabet.tail;

The table is just a sequence of arrays, the first of which is the original

alphabet, with the rest being constructed by taking the previous array

and cyclically rotating it by one character ( *.rotate ). So ['A'...'Z']

is followed by ['B'...'Z', 'A'] , which is followed in turn by

['C'...'Z', 'A'...'B'] . Rinse and repeat, until we eventually

generate a rotation in which the first letter ( *.head ) is the last letter

of the original alphabet ( @alphabet.tail ). In other words,

we generate every rotation until we reach ['Z', 'A'...'Y'] .

Note that, we can change the alphabet we’re using simply by replacing the

contents of @alphabet . For example, if we had wanted to be able to encode a

message like "Vigenère did NOT invent this!" with a key such as

"I❤️Bellaso" , then we’d reconfigure our alphabet like so:

# Valid message characters are all printable Latin-1 + lurve...

constant @alphabet = [' '...'ÿ', '❤️'];

Raku is a modern language with built-in support for the full Unicode

character set in both data and source code, so it’s perfectly okay to use

a string with emoji, like "I❤️Bellaso" , or to declare

a variable like @𝑡𝑎𝑏𝑢𝑙𝑎-𝑟𝑒̄𝑐𝑡𝑎 , whose name contains those weird

Unicode mathematical italic characters and ēvēn ā cōmbīnīng mārk.

Using italicized Unicode variable names might seem a crazy thing to do

but it also means that we're no longer constrained by Anglophone Linguistic

Imperialism. Instead of being forced to name a critical variable $cipher-text ,

we could name it naturally, in the language we typically speak:

$texte-chiffré or $النص-المشفر or $加密文字

or $Κρυπτογραφημένο-κείμενο or even $ĉifrita-teksto .

Raku aims to empower 100% of the world’s programmers.

We could use our @𝑡𝑎𝑏𝑢𝑙𝑎-𝑟𝑒̄𝑐𝑡𝑎 array directly to look up the cipher character

for a given text/key character pair, but it’s a little ungainly to do that.

We’d need to convert each character back to its integer ASCII code

(via the built-in .ord method) and then normalize it down into the

0..25 range of table indices, like so:

my $key-index = $key-char .ord - 'A'.ord;

my $text-index = $text-char .ord - 'A'.ord;



my $cipher-char = @𝑡𝑎𝑏𝑢𝑙𝑎-𝑟𝑒̄𝑐𝑡𝑎 [ $key-index ][ $text-index ];

And that code would become significantly more complicated if we later

switched to a non-continuous alphabet like the one including '❤️' .

It would be much simpler if the 2D look-up table was a 2D dictionary

(in Raku jargon: a hash), and could be directly indexed by the actual

characters from our alphabet, instead of by integers starting at zero.

And that’s easy to arrange:

constant %encoder

= @𝑡𝑎𝑏𝑢𝑙𝑎-𝑟𝑒̄𝑐𝑡𝑎 .map:

{ @^cipher .head => hash @alphabet Z=> @^cipher }



Here, for each cipher row ( @^cipher ) of the tabula rēcta, we build a hash entry

whose key is the first letter of the rotated alphabet ( @^cipher .head ),

and whose value is a nested hash that maps each letter of the alphabet to the

corresponding letter in the cipher row ( @alphabet Z=> @^cipher ).

The Z=> operator zips together corresponding elements of its two list

arguments, passing each two to the => pair constructor, so the map

operation produces hash entries like:

# Key Text->Cipher, etc.



'M' => %('A'=>'M', 'B'=>'N', 'C'=>'O' ... 'Y'=>'K', 'Z'=>'L')

So now when we want to look up the appropriate cipher character,

the top-level hash index selects the key column, and the second-level

hash index selects the plaintext character’s row. Like so:

my $ciphertext-char = %encoder{ $key-char }{ $plaintext-char };

Better still, it’s just as easy to use this hash-based approach to

build a 2D decoding dictionary, simply by reversing the

nested mapping from alphabet characters to cipher characters:

constant %decoder

= @𝑡𝑎𝑏𝑢𝑙𝑎-𝑟𝑒̄𝑐𝑡𝑎 .map:

{ @^cipher .head => hash @^cipher Z=> @alphabet }

After which we can decode an enciphered character like so:

my $plaintext-char = %encoder{ $key-char }{ $ciphertext-char };

Once we have those two 2D dictionaries, the Vigenère Cipher is easy

to implement:

sub Vigenère (Str :$text!, Str :$key!, Bool :$decode --> Str) {



# Extract lists of table indices from the two strings...

my @textchars = $text.comb;

my @keychars = |$key.comb xx ∞;



# Chose our trancoder...

my %recoder = $decode ?? %decoder !! %encoder;



# Pair up indices, transcode, and concatenate the results...

(@keychars Z=> @textchars)

.map({ %recoder{.key}{.value} // 0.chr })

.join;

}

The subroutine takes two named string arguments: a text (either

plaintext or ciphered) and the encryption/decryption key. The

exclamation mark after the two parameter names tells the Raku compiler

that those two arguments are required. There is also an optional named

boolean argument (with no trailing exclamation mark) that indicates

whether we want to decode. The long arrow at the end of the signature

( --> Str ) specifies that the subroutine returns a string.

Within the subroutine, we first split both the text string and the key into

lists of individual characters ( $text.comb and $key.comb ). We also

need to repeat the key-character sequence so that the key is at least as

long as the text string. We could work out precisely how many repetitions

were needed, but it’s much easier just to make the key repeat infinitely

often ( |$key.comb xx ∞ ). The vertical bar at the beginning

is needed to flatten the list of key characters. Otherwise, instead of

a list of repeated key-characters, we’d end up with a list of repeated

lists-of-key-characters.

We then select which of the two coding dictionaries we need,

depending on whether or not we’re decoding. In Raku, the ternary

selection operator is test ?? true-value !! false-value ,

instead of test ? true-value : false-value .

Once we have everything set up, the actual encoding or decoding

is trivial:

(@keychars Z=> @textchars)

.map({ %recoder{.key}{.value} // 0.chr })

.join;

We first match up each character from the text with the next

successive character from the repeated key, once again using

the zip-into-a-list-of-pairs operator ( Z=> ) between

the two lists: (@keychars Z=> @textchars)

We then map that list of pairs to a list of encrypted/decrypted

characters, by looking up each key character ( .key ) and each

corresponding text character ( .value ) in the lookup table. If there

is no available conversion for the combination of characters, we just

use a null character ( 0.chr ) as a placeholder. This ensures that any

characters outside the specified alphabet are cleanly mapped out

of the encrypted string, without compromising its future decryption.

Once all the characters have been converted, the list is simply

concatenated ( .join ) to produce the final string.

And that’s the Vigenère Cipher.

So what about Vigenère’s cipher?

The encryption technique discussed above has some serious weaknesses

as a cipher. In 1863, Friedrich Kasiski observed that repeating the

key string in order to generate a long enough key means that a single

encrypted message may sometimes reuse the same columns of the

tabula rēcta to encode the same sequence of plaintext

characters reappearing later in the original message.

So, if we intercept an encrypted string like:

LLGMMZWRGIIVATJVVBKRVMMZWRGIILLEDCIVATJVV

...then it’s relatively easy to spot the two places where repeated

words in the plaintext message were apparently encrypted using

two separate copies of the same key characters:

LLGMMZWRGIIVATJVVBKRVMMZWRGIILLEDCIVATJVV



LLGMMZWRGIIVATJVVBKRVMMZWRGIILLEDCIVATJVV

The first apparent word repetition happens 18 characters apart, so if

the two words were encoded using the same key characters, the key must

repeat after 18 characters. Which means that the key length must either

be 18 or some integral factor of 18 (i.e. 9, 6, 3, 2, or 1). The second

repetition occurs 24 characters apart, so the key length must

simultaneously be some factor of 24 (i.e. 12, 8, 6, 4, 3, 2, or 1).

Since it’s the same key in both cases, the key length must therefore

be a common factor of both 18 and 24: either 6, 3, 2, or 1.

Given those possibilities, the key is most likely 6 characters long,

since the other lengths are too short, psychologically speaking.

In which case, we can probably decode the message simply by trying

a dictionary attack using all 17706 English six-letter words.

To do that we first need to know what the most likely key lengths are:

sub large-factors ($n) { set (4..$n).grep($n %% *) }



my $gap

= any keys [∩] map {large-factors(.<gap>.chars)},

$encoded ~~ m:ex/$<gap>=[ $<seq>=[. ** 3..*] .+] $<seq>/;

Here we exhaustively match ( m:ex ) the encoded text against a regex that

locates any sequence of three or more consecutive characters

( $<seq>=[. ** 3..* ] ), then a gap ( .+ ), then the previous character

sequence again ( $<seq> ). We then determine the length of each gap

( .<gap>.chars ), build a set the significant factors of each length

( map {large-factors(...)} ), and take the intersection ( [∩] ) of those

sets to find the common factors. Finally, we convert the resulting set

of common factors to a single value, using an any junction.

Subsequently we'll be able to match word lengths against that junction

to select potential key words of the appropriate length(s).

Once we know how long the decryption key is, we can load up an

appropriate dictionary and filter it for suitable candidate keys:

my @candidate-keys

= '/usr/share/dict/words'.IO.lines.grep(*.chars==$gap)».uc;



my @common-words

= '/usr/share/dict/common'.IO.lines.grep(*.chars > 2)».uc;

We keep only those words of the appropriate length(s): .grep(*.chars==$gap)

We also load a second, smaller lexicon of common words, which we’ll use

to detect plausible decryptions.

Then we decode the ciphertext with every possible key of suitable length,

and print out any decryptions that seem to include a plausible number

of English words:

# Request text to be deciphered...

my $encoded = prompt "Ciphertext: ";



# Try each potential key...

for @candidate-keys -> $key {



# Decrypt with that key...

my $candidate = Vigenère ( :text($encoded), :$key, :decode);



# Count any real words found in the decryption...

my $found = 0;

for @common-words -> $word {



# Report any plausible decryptions found...

if $candidate.contains($word) && ++$found > 3 {

say "$key --> $candidate";

last;

}

}

}

In just under 20 seconds that produces:

Ciphertext: LLGMMZWRGIIVATJVVBKRVMMZWRGIILLEDCIVATJVV



HYDROA --> ENDVYZPTDRUVTVGEHBDTSVYZPTDRULEGALUVTVGEH

HYDROA --> ENDVYZPTDRUVTVGEHBDTSVYZPTDRULEGALUVTVGEH

HOGGIE --> EXAGEVPDACARTFDPNXDDPGEVPDACAHEQXWARTFDPN

SECRET --> THEVIGENERECIPHERISNTVIGENERESTABLECIPHER

STRICH --> TSPEKSEYPAGOIASNTUSYEEKSEYPAGETLMUGOIASNT

WAGGIE --> PLAGEVARACARETDPNXORPGEVARACAHPEXWARETDPN

WEDGIE --> PHDGEVANDCAREPGPNXONSGEVANDCAHPAAWAREPGPN



The fundamental problem here (as with so many cipher techniques)

is that the key was simply too short. We could insist that the user

provide a key that’s at least as long as the actual message:

sub Vigenère (

Str :$text!,

Str :$key! where { $key.chars >= $text.chars },

Bool :$decode

--> Str)

{...}

However, in 1586 Blaise de Vigenère found a much easier solution:

take whatever key they offer and, instead than extending it with

extra copies of itself, extend the key with the “random” characters

of the message.

That is, instead of:

Key: SECRETSECRETSECRETSECRETSECRETSECRETSECRET...

Text: THEVIGENERECIPHERISNTVIGENERESTABLECIPHER

Table: ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓

Cipher: LLGMMZWRGIIVATJVVBKRVMMZWRGIILLEDCIVATJVV

...we use:

Key: SECRETTHEVIGENERECIPHERISNTVIGENERESTABLECIPHER

Text: THEVIGENERECIPHERISNTVIGENERESTABLECIPHER

Table: ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓

Cipher: LLGMMZXUIMMIMCLVVKACAZZOWAXMMYXNFCIUBPIPV

This produces no significant repetitions in the ciphertext because

the repeated words in the plaintext are no longer being enciphered

with repetitions of a single short key. This approach is known as

an autokey cipher, because the full key is generated for us

automatically from the message itself.

Suppose we wanted to extend our previous Vigenère subroutine to offer

autokey encryption as well. We might just add another parameter to

the signature and then select the appropriate behaviour depending on

which arguments are actually provided:

sub Vigenère (

Str :$text!,

Str :$key,

Str :$autokey,

Bool :$decode

--> Str

) {

if $autokey {...}

elsif $key {...}

else { fail 'Must specify :key or :autokey' }

}

But Raku supports multiple dispatch, so there’s a much cleaner way

to add another mutually exclusive behaviour to the Vigenère subroutine.

We simple change the existing code from a sub to a multi :

multi Vigenère (Str :$text!, Str :$key!, Bool :$decode --> Str)

{

# Body exactly as before

}

Then we add a second multiply dispatched variant, with a

different set of parameters and a different body:

multi Vigenère (Str :$text!, Str :$autokey! --> Str) {

Vigenère ( :$text, :key($autokey ~ $text) );

}

This version requires an :autokey parameter, instead of a :key parameter.

It then immediately redispatches to the original (Bellaso) variant, passing the text

straight through unchanged ( :$text ), but constructing a new (non-auto) key

by concatenating the supplied autokey with the text: :key($autokey ~ $text)

The interpreter always knows which variant of Vigenère to call at any point,

because it examines the actual arguments passed into each call, and eliminates

from consideration any variant whose parameter list would not accept those

specific arguments. Which means, if we pass a :key argument, the interpreter

will always call the original (Bellaso) variant, whereas if we pass an :autokey

argument the interpreter will always call our new (Vigenère) variant.

Note that this new autokey variant of the Vigenère subroutine doesn’t allow

a :decode argument. That’s because, whereas encoding a text using

an autokey is just a minor variation on the Bellaso’s algorithm,

decoding it again is a little more complicated.

To decode text that has been autokey encrypted we obviously need

the original encryption key. But in an autokey system, most of that key

is provided by the original unencrypted text. So order to get back the

unencrypted text we must provide the decoder with...the unencrypted text.

Hmmmmm.

Except, of course, that the initial few characters of the autokey aren’t

part of the original plaintext; they’re just the characters of the

regular key (e.g. "BELLASO" , "SECRET" , etc.) that was specified.

Given this partial key, we can at least decipher that many initial

characters of the encrypted text. That gives us the first few characters

of the original message, so we can immediately use those characters

as the next part of the key, enabling us to decrypt a little more of the

encrypted text, thereby giving us a few more message characters,

which gives us some additional key characters, with which we can

extract several more plaintext characters, which...et cetera, et cetera,

until the entire text is decoded.

In Raku, that looks like this:

multi Vigenère (Str :$text!, Str :$autokey!, :$decode! --> Str)

{

# Break out text that was enciphered by the original key...

my ($cipherkey, $ciphertext)

= $text.split: /<?at: $autokey.chars>/;



# Decipher that key-enciphered initial text...

my @autokey

= Vigenère ( :text($cipherkey), :key($autokey), :decode )

.comb;



# Progressively decipher the rest...

[~] flat gather for $ciphertext.comb {

FIRST take @autokey;

@autokey.push:

take %decoder{@autokey.shift}{$^cipherchar} // 0.chr;

}

}

This third variant of the Vigenère multisub takes three required arguments:

the text, the original autokey, and the :decode flag. So this variant

will only be called when decoding an autokey encryption.

We first need to break the text into two strings: the initial N characters

that were encrypted using the original key ( $cipherkey ), and the

remainder of the text which was encrypted using the unencrypted

message itself ( $ciphertext ). To do that, we call the text string’s

.split method, passing it a regex. The text will be split wherever

the regex matches, so we use an <?at: position> assertion

within the regex, to force it to match only at the length of the original key:

/<?at: $autokey.chars>/

Once we have the first N characters in $cipherkey , we can simply

decrypt them with the original key (in $autokey ) by calling the original

(Bellaso) decoding variant, which we invoke by passing the original key

as a :key argument, instead of as :autokey . Like so:

Vigenère ( :text($cipherkey), :key($autokey), :decode )

We then use the .comb method to convert that decrypted plaintext

into a list of characters, which we store in @autokey .

Now that we have decrypted those first N characters of plaintext,

we can start iterating through the list of characters from the remaining

encrypted text ( for $ciphertext.comb {...} ), using the next

known autokey character ( @autokey.shift ) to decrypt the next

message character ( %decoder{@autokey.shift}{$^cipherchar} ).

That newly decoded character is then saved for use as a subsequent

autokey character ( @autokey.push: ).

As we’re decrypting, we also need to capture every decoded character,

so we can eventually concatenate and return them as the decrypted text.

The easiest way to do that is with a gather block, which automagically

remembers anything we take while inside it.

In this case, we first have to gather all of the initial characters that were

decrypted with the original key and then stored in @autokey . So we tell

the for loop to take them, but only during its very first iteration:

FIRST take @autokey;

Then every time we decrypt another character within the loop, we have to

take it too, just before we push it onto the @autokey array:

@autokey.push:

take %decoder{@autokey.shift}{$^cipherchar} // 0.chr;

Once the decrypting for loop terminates, the gather will return a list

of everything we have passed to take . We concatenate

those characters ( [~] flat ) into a single string...and we’re done.

So now, when we want the Vigenère Cipher, we write:

$ciphertext

= Vigenère ( :text($plaintext), :key( $secret ) );



# and later...



$plaintext

= Vigenère ( :text($ciphertext), :key( $secret ), :decode );

And when we want Vigenère’s actual cipher, we write:

$ciphertext

= Vigenère ( :text($plaintext), :autokey( $secret ) );



# and later...



$plaintext

= Vigenère ( :text($ciphertext), :autokey( $secret ), :decode );



If you accept modern theories of linguistic relativity, then the language

in which we think can shape the way in which we think. And perhaps

even limit the ways in which we can think.

Programming languages are the tools with which programmers think

about solutions. So your choice of programming language may shape

how you think of solutions, and may even constrain how it’s possible

for you think about them.

When you’re thinking about solutions in a world as linguistically

ambiguous and as computationally complex as Vigenère’s (or ours!),

you need a language that’s sufficiently linguistically sophisticated, and

computationally expressive, to enable you think clearly and effectively.

And that’s why I program in Raku.

Damian

