Introduction

Have you ever wondered what the "randomart" or "visual fingerprint" is all about when creating OpenSSH keys or connecting to OpenSSH servers? Surely, you've seen them. When generating a key on OpenSSH version 5.1 or later, you will see something like this:

    $ ssh-keygen -f test-rsa
    Generating public/private rsa key pair.
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in test-rsa.
    Your public key has been saved in test-rsa.pub.
    The key fingerprint is:
    18:ff:18:d7:f4:a6:d8:ce:dd:d4:07:0e:e2:c5:f8:45 aaron@kratos
    The key's randomart image is:
    +--[ RSA 2048]----+
    |                 |
    |                 |
    |      .     . E  |
    |       +   = o   |
    |      . S + = =  |
    |         * * * ..|
    |        . + + . +|
    |           o . o.|
    |            o . .|
    +-----------------+

I'm sure you've noticed this, and probably thought, "What's the point?" or "What's the algorithm in generating the visual art?" Well, I'm going to answer those questions for you in this post.

This post walks through the algorithm as described by Dirk Loss, Tobias Limmer, and Alexander von Gernler in their PDF "The drunken bishop: An analysis of the OpenSSH fingerprint visualization algorithm". You can find their PDF at http://www.dirk-loss.de/sshvis/drunken_bishop.pdf. In the event that link is no longer available, I've archived the PDF at http://aarontoponce.org/drunken_bishop.pdf.

Motivations

Bishop Peter finds himself in the middle of an ambient atrium. There are walls on all four sides and apparently there is no exit. The floor is paved with square tiles, strictly alternating between black and white. His head heavily aching—probably from too much wine he had before—he starts wandering around randomly. Well, to be exact, he only makes diagonal steps—just like a bishop on a chess board. When he hits a wall, he moves to the side, which takes him from the black tiles to the white tiles (or vice versa). And after each move, he places a coin on the floor, to remember that he has been there before. After 64 steps, just when no coins are left, Peter suddenly wakes up. What a strange dream!

When creating OpenSSH key pairs, or when connecting to an OpenSSH server, you are presented with the fingerprint of the keypair. It may look something like this:

    $ ssh example.com
    The authenticity of host 'example.com (10.0.0.1)' can't be established.
    RSA key fingerprint is d4:d3:fd:ca:c4:d3:e9:94:97:cc:52:21:3b:e4:ba:e9.
    Are you sure you want to continue connecting (yes/no)?

At this point, as a responsible citizen of the community, you call up the system administrator of the host "example.com", and verify that the fingerprint you are being presented with is the same fingerprint he has on the server for the RSA key. If the fingerprints match, you type "yes", and continue the connection. If the fingerprints do not match, you suspect a man-in-the-middle attack, and type "no". If the server is a server under your control, then rather than calling up the system administrator for that domain, you physically go to the box, pull up a console, and print the server's RSA fingerprint.

In either case, verifying a 32-character hexadecimal string is cumbersome. If we could have a better visual on the fingerprint, it might be easier to verify that we've connected to the right server. This is where the "randomart" comes from. Now, when connecting to the server, I can be presented with something like this:

    The authenticity of host 'example.com (10.0.0.1)' can't be established.
    RSA key fingerprint is d4:d3:fd:ca:c4:d3:e9:94:97:cc:52:21:3b:e4:ba:e9.
    +--[ RSA 2048]----+
    |             o . |
    |         . .o.o .|
    |        . o .+.. |
    |       .   ...=o+|
    |        S  . .+B+|
    |            oo+o.|
    |           o  o. |
    |          .      |
    |           E     |
    +-----------------+
    Are you sure you want to continue connecting (yes/no)?

Because I have a visual representation of the server's fingerprint, it will be easier for me to verify that I am connecting to the correct server. Further, after connecting to the server many times, the visual fingerprint will become familiar. So, upon connection, when the visual fingerprint is displayed, I can think "yes, that is the same picture I always see, this must be my server". If a man-in-the-middle attack is in progress, a different visual fingerprint will probably be displayed, at which point I can avoid connecting, because I have noticed that the picture changed.

The picture is created by applying an algorithm to the fingerprint, such that different fingerprints should display different pictures. It turns out there can be some visual collisions, which I'll lightly address at the end of this post. However, this visual display should work in "most cases", and should encourage you to start verifying the fingerprints of OpenSSH keys.

The Board

Because the bishop finds himself in a room with no exits and only walls, we need to create a room that looks roughly square on the terminal. This is done by creating a room with 9 rows and 17 columns, for a total of 153 squares the bishop can travel. The bishop must start in the exact center of the room, which is why the number of rows and the number of columns are both odd.

Our board setup then looks like this, where "S" is the starting location of the drunk bishop:

               1111111
     01234567890123456
    +-----------------+x (column)
   0|                 |
   1|                 |
   2|                 |
   3|                 |
   4|        S        |
   5|                 |
   6|                 |
   7|                 |
   8|                 |
    +-----------------+
   y (row)

Each square on the board can be thought of as a numerical position derived from its Cartesian coordinates. As mentioned, there are 153 squares on the board, so each square gets a numerical value through the equation "p = x + 17y". So, p=0 for (0,0); p=76 for (8,4), the starting location of our bishop; and p=152 for (16,8), the lower right-hand corner of the board. Having a unique numerical value for each position on the board will allow us to do some simple math when the bishop begins his random walk.
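
Here is a minimal Python sketch of that mapping. The names WIDTH, HEIGHT, to_position and to_coordinates are my own, chosen for illustration; they do not come from the OpenSSH source.

```python
WIDTH, HEIGHT = 17, 9  # columns and rows of the room


def to_position(x, y):
    """Map board coordinates (x, y) to the single number p = x + 17y."""
    return x + WIDTH * y


def to_coordinates(p):
    """Invert the mapping: recover (x, y) from a position number."""
    y, x = divmod(p, WIDTH)
    return x, y


print(to_position(0, 0))    # upper left-hand corner
print(to_position(8, 4))    # the bishop's starting square
print(to_position(16, 8))   # lower right-hand corner
print(to_coordinates(76))   # back to (x, y)
```

Because every row holds exactly 17 squares, divmod by the width is enough to get back from a position number to its row and column.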

The Movement

In order to define movement, we need to understand the fingerprint that is produced from an OpenSSH key. In the OpenSSH releases covered here, the fingerprint is an MD5 checksum of the public key. As such, it is 16 bytes long. An example fingerprint could be "d4:d3:fd:ca:c4:d3:e9:94:97:cc:52:21:3b:e4:ba:e9".

Because the bishop can only move one of four valid ways, we can represent this in binary.

"00" means our bishop takes one move diagonally to the north-west.

"01" means our bishop takes one move diagonally to the north-east.

"10" means our bishop takes one move diagonally to the south-west.

"11" means our bishop takes one move diagonally to the south-east.

With the bishop in the center of the room, his first move will take him off square 76. After his first move, his new position will be as follows:

"00" will place him on square 58, a difference of -18.

"01" will place him on square 60, a difference of -16.

"10" will place him on square 92, a difference of +16.

"11" will place him on square 94, a difference of +18.

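The two lists above can be sketched in a few lines of Python, assuming the bishop is standing on a middle square where no wall gets in the way. The low bit of the pair picks west or east, and the high bit picks north or south. The function name middle_offset is hypothetical.

```python
WIDTH = 17  # squares per row, so one row up or down changes p by 17


def middle_offset(pair):
    """Return the change in position for a 2-bit move on a middle square."""
    dx = 1 if pair & 0b01 else -1   # low bit: 0 = west, 1 = east
    dy = 1 if pair & 0b10 else -1   # high bit: 0 = north, 1 = south
    return dx + WIDTH * dy


start = 76  # the center square, (8, 4)
for pair in range(4):
    offset = middle_offset(pair)
    print(f"{pair:02b}: offset {offset:+d}, lands on square {start + offset}")
```
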
We must now convert our hexadecimal string to binary, so we can begin making movements based on our key. Our key:

d4:d3:fd:ca:c4:d3:e9:94:97:cc:52:21:3b:e4:ba:e9

would be converted to

11010100:11010011:11111101:11001010:...snip...:00111011:11100100:10111010:11101001

When reading the binary, we read the bytes (8-bit words) from left to right, but we read the bit pairs within each byte from right to left (little endian). Thus our bishop's first 16 moves would be:

00 01 01 11 11 00 01 11 01 11 11 11 10 10 00 11

Or, you could think of it in terms of steps, if looking at the binary directly:

4,3,2,1:8,7,6,5:12,11,10,9:16,15,14,13:...snip...:52,51,50,49:56,55,54,53:60,59,58,57:64,63,62,61
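
As a rough sketch of that reading order in Python: shift each byte right by 0, 2, 4 and 6 bits, and keep the lowest two bits each time. The function name fingerprint_to_moves is my own invention, and it assumes the colon-separated hexadecimal format shown above.

```python
def fingerprint_to_moves(fingerprint):
    """Split a colon-separated hex fingerprint into 2-bit moves.

    Bytes are taken left to right; within each byte the bit pairs are
    taken right to left, least significant pair first.
    """
    moves = []
    for byte in bytes.fromhex(fingerprint.replace(":", "")):
        for shift in (0, 2, 4, 6):
            moves.append((byte >> shift) & 0b11)
    return moves


moves = fingerprint_to_moves("d4:d3:fd:ca:c4:d3:e9:94:97:cc:52:21:3b:e4:ba:e9")
print(len(moves))                                # 16 bytes * 4 pairs each
print(" ".join(f"{m:02b}" for m in moves[:4]))   # the four moves from 0xd4
```

The first byte, 0xd4, is 11010100 in binary, so reading its pairs from the right gives 00, 01, 01, 11.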

Board Coverage

All is well and good if our drunk bishop remains in the center of the room, but what happens when he slams into the wall, or walks himself into a corner? We need to take into account these situations, and how to handle them in our algorithm. First, let us define every square on the board:

    +-----------------+
    |aTTTTTTTTTTTTTTTb|   a = NW corner
    |LMMMMMMMMMMMMMMMR|   b = NE corner
    |LMMMMMMMMMMMMMMMR|   c = SW corner
    |LMMMMMMMMMMMMMMMR|   d = SE corner
    |LMMMMMMMMMMMMMMMR|   T = Top edge
    |LMMMMMMMMMMMMMMMR|   B = Bottom edge
    |LMMMMMMMMMMMMMMMR|   R = Right edge
    |LMMMMMMMMMMMMMMMR|   L = Left edge
    |cBBBBBBBBBBBBBBBd|   M = Middle pos.
    +-----------------+

Now, let us define every move for every square on the board:

    Pos  Bits  Heading  Adjusted  Offset
    a    00    NW       No Move      0
         01    NE       E           +1
         10    SW       S          +17
         11    SE       SE         +18
    b    00    NW       W           -1
         01    NE       No Move      0
         10    SW       SW         +16
         11    SE       S          +17
    c    00    NW       N          -17
         01    NE       NE         -16
         10    SW       No Move      0
         11    SE       E           +1
    d    00    NW       NW         -18
         01    NE       N          -17
         10    SW       W           -1
         11    SE       No Move      0
    T    00    NW       W           -1
         01    NE       E           +1
         10    SW       SW         +16
         11    SE       SE         +18
    B    00    NW       NW         -18
         01    NE       NE         -16
         10    SW       W           -1
         11    SE       E           +1
    R    00    NW       NW         -18
         01    NE       N          -17
         10    SW       SW         +16
         11    SE       S          +17
    L    00    NW       N          -17
         01    NE       NE         -16
         10    SW       S          +17
         11    SE       SE         +18
    M    00    NW       NW         -18
         01    NE       NE         -16
         10    SW       SW         +16
         11    SE       SE         +18
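
One way to reproduce every row of that table without listing nine cases is to move along each axis independently and clamp the result to the board. This is only a sketch under that assumption; the helper names step and offset are hypothetical, not taken from OpenSSH.

```python
WIDTH, HEIGHT = 17, 9


def step(x, y, pair):
    """Apply one 2-bit move to (x, y), sliding along any wall that is hit."""
    dx = 1 if pair & 0b01 else -1   # low bit: 0 = west, 1 = east
    dy = 1 if pair & 0b10 else -1   # high bit: 0 = north, 1 = south
    new_x = min(max(x + dx, 0), WIDTH - 1)
    new_y = min(max(y + dy, 0), HEIGHT - 1)
    return new_x, new_y


def offset(x, y, pair):
    """Change in position number (p = x + 17y) caused by one move."""
    new_x, new_y = step(x, y, pair)
    return (new_x + WIDTH * new_y) - (x + WIDTH * y)


# Offsets for the NW corner "a", a top-edge square "T", and a middle square "M".
for name, (x, y) in {"a": (0, 0), "T": (5, 0), "M": (8, 4)}.items():
    print(name, [offset(x, y, pair) for pair in range(4)])
```

Clamping one axis while the other still moves is what turns a blocked diagonal step into a sideways or vertical one, and clamping both axes in a corner gives the "No Move" rows.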

How much of the board will our bishop walk? With a 16-byte fingerprint and four bit pairs per byte, the bishop makes 64 moves in total. The most board he could cover is if every move landed on a square he had not visited before: 64 new squares plus his starting square, or 65 squares. Thus 65/153 ~= 42.48%, which is less than half of the board.
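
The arithmetic is quick to check in Python. The variable names here are just for illustration.

```python
fingerprint_bytes = 16
moves = fingerprint_bytes * 4       # four 2-bit moves per byte
squares = 17 * 9                    # columns * rows
max_visited = moves + 1             # every move hits a new square, plus the start

coverage = max_visited / squares
print(moves, squares, max_visited)
print(f"{coverage:.2%}")
```
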

Position Values

Remember that our bishop is making a random walk around the room, dropping coins on every square he's visited. If he's visited a square in the room more than once, we need a way to represent that in the art. As such, we will use a different ASCII character as the count increases.

Unfortunately, in my opinion, the OpenSSH developers picked the wrong characters. The PDF mentions that the characters were chosen so that their density increases as the visit count for a square increases. Personally, I don't think these developers have spent much time working on ASCII art. Had I been on the development team, I would have picked a different set. However, here is the set they picked for each count:

    Freq  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
    Char     .  o  +  =  *  B  O  X  @  %  &  #  /  ^  S  E

A count of 0, a square the bishop never visited, is drawn as a blank space.

The special characters "S" and "E" are to identify the starting and ending location of the bishop respectively.
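
Putting the pieces together, here is a short Python sketch of the whole walk for a colon-separated hexadecimal fingerprint. It is my own illustration of the steps described above, not the OpenSSH code: the function name randomart and the key_type and bits parameters are hypothetical, the header layout is simply copied from the output shown earlier, and I'm assuming the visit count stops increasing once it reaches the last character in the table.

```python
WIDTH, HEIGHT = 17, 9
SYMBOLS = " .o+=*BOX@%&#/^"  # index is the visit count, capped at 14


def randomart(fingerprint, key_type="RSA", bits=2048):
    """Render the drunken bishop walk for a colon-separated hex fingerprint."""
    counts = [[0] * WIDTH for _ in range(HEIGHT)]
    x, y = WIDTH // 2, HEIGHT // 2          # start in the exact center
    start = (x, y)

    for byte in bytes.fromhex(fingerprint.replace(":", "")):
        for shift in (0, 2, 4, 6):          # least significant bit pair first
            pair = (byte >> shift) & 0b11
            dx = 1 if pair & 0b01 else -1   # 0 = west, 1 = east
            dy = 1 if pair & 0b10 else -1   # 0 = north, 1 = south
            x = min(max(x + dx, 0), WIDTH - 1)
            y = min(max(y + dy, 0), HEIGHT - 1)
            # Drop a coin on the square the bishop ends up on.
            counts[y][x] = min(counts[y][x] + 1, len(SYMBOLS) - 1)
    end = (x, y)

    rows = []
    for row_y in range(HEIGHT):
        cells = [SYMBOLS[count] for count in counts[row_y]]
        if start[1] == row_y:
            cells[start[0]] = "S"
        if end[1] == row_y:
            cells[end[0]] = "E"             # "E" wins if start and end coincide
        rows.append("|" + "".join(cells) + "|")

    header = ("+--[%4s %4d]" % (key_type, bits)).ljust(WIDTH + 1, "-") + "+"
    footer = "+" + "-" * WIDTH + "+"
    return "\n".join([header] + rows + [footer])


print(randomart("d4:d3:fd:ca:c4:d3:e9:94:97:cc:52:21:3b:e4:ba:e9"))
```

Running it on the example fingerprint from earlier in the post should draw the same picture the server presented, with the "S" in the center and the "E" on the bottom row.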

Additional Thoughts

Now that you know how the random art is generated for a given key, you can begin to ask yourself some questions:

When addressing picture collisions, how many fingerprints produce the same picture (same values for all positions)?

How many fingerprints produce the same shape (same visited squares with different values)?

How many different visualizations can the algorithm produce?

Can a fingerprint be derived by looking only at the random art?

How many different visualizations can a person easily distinguish?

What happens to the visualizations when changing the board size, either smaller or larger?

What visualizations can be produced with different chess pieces? How would the rules change?

What visualizations can be produced if the board were a torus (like Pac-man)?

Could this be extended to other fingerprints, such as OpenPGP keys? If so, could it simplify verifying keys at keysigning parties (and as a result, speed them up)?

Conclusion

Even though this post discussed the algorithm for generating the random art, it did not address the security model of those visualizations. There are many questions that need answers. Obviously, there are collisions. So, how many collisions are there? Can the number of visual collisions be predicted from the cryptographic hash and the size of the board? Does this weaken security when verifying OpenSSH keys, and if so, how?

These questions, and many others, should be addressed. But, for the time being, it seems to be working "well enough", and most people using OpenSSH probably only ever see the visualization when creating their own private and public key pair. You can enable it for every OpenSSH server you connect to, by setting "VisualHostKey yes" in your ssh_config(5).

In the meantime, I'll be working on a Python implementation for this on OpenPGP keys. It will use a larger board (11x19), and a different set of characters for the output, but the algorithm will remain the same. I'm interested to see if this can improve verifying people have the right OpenPGP key, by just checking the ASCII art, rather than reading out a 20-byte random string using the NATO alphabet. Keep an eye on https://github.com/atoponce/scripts/blob/master/art.py.