Date Sat 27 May 2017 Tags haskell / cryptography / spake2

There’s a joke about programmers that’s been doing the rounds for the last couple of years:

We do these things not because they are easy, but because we thought they would be easy.

This is about how I became the butt of a tired, old joke.

My friend Jean-Paul decided to start learning Haskell by writing a magic-wormhole client.

magic-wormhole works in part by negotiating a session key using SPAKE2: a password-authenticated key exchange protocol, so one of the first things Jean-Paul needed was a Haskell implementation.

Eager to help Jean-Paul on his Haskell journey, I volunteered to write a SPAKE2 implementation in Haskell. After all, there’s a pure Python implementation, so all I’d need to do is translate it from Python to Haskell. I know both languages pretty well. How hard could it be?

Turns out there are a few things I hadn’t really counted on.

I know next to nothing about cryptography Until now, I could summarise what I knew about cryptography into two points: It works because factoring prime numbers is hard I don’t know enough about it to implement it reliably, I should use proven, off-the-shelf components instead This isn’t really a solid foundation for implementing crypto code. In fact, it’s a compelling argument to walk away while I still had the chance. My ignorance was a particular liability here, since python-spake2 assumes a lot of cryptographic knowledge. What’s HKDF? What’s Ed25519? Is that the same thing as Curve25519? What’s NIST? What’s q in this context? python-spake2 also assumes a lot of knowledge about abstract algebra. This is less of a problem for me, since I studied a lot of that at university. However, it’s still a problem. Most of that knowledge has sat unused for fifteen or so years. Dusting off those cobwebs took time. My rusty know-how was especially obvious when reading the PDFs that describe SPAKE2. Mathematical notation isn’t easy to read, and every subdiscipline has its own special variants (“Oh, obviously q means the size of the subgroup. That’s just convention.”) For example, I know that what’s in spake2/groups.py is the multiplicative group of integers modulo n, and I know what “the multiplicative group of integers modulo n” means, but I understand about 2% of the Wikipedia page on the subject, and I have even less understanding about how the group is relevant to cryptography. The protocol diagrams that appear in the papers I read were a confusing mess of symbols at first. It took several passes through the text, and a couple of botched explanations to patient listeners on IRC before I really understood them. These diagrams now seem clear & succinct to me, although I’m sure they could be written better in code.

python-spake2 is idiosyncratic The python-spake2 source code is made almost entirely out of object inheritance and mutation, which makes it hard for me to follow, and hard to transliterate into Haskell, where object inheritance and mutation are hard to model. This is a very minor criticism. With magic-wormhole and python-spake2, Warner has made creative, useful software that solves a difficult problem and meets a worthwhile need.

crypto libraries rarely have beginner-friendly documentation python-spake2 isn’t alone in assuming cryptographic knowledge. The Haskell library cryptonite is much the same. Most documentation I could find about various topics on the web pointed to cr.yp.to pages, which either link to papers or C code. I think this is partly driven by a concern for user safety, “if you don’t understand it, you shouldn’t be using it”. Maybe this is a good idea. The problem is that it can be hard to know where to start in order to gain that understanding. To illustrate, I now sort of get how an elliptic curve might form a group, but have forgotten enough algebra to not know about what subgroups there are, how that’s relevant to the implementation of ed25519, how subgroups and groups relate to fields, to say nothing of how elliptic curve cryptography actually works. I don’t really know where to go to remedy this ignorance, although I’m pretty sure doing so is within my capabilities, I just need to find the right book or course to actually teach me these things.

Protocols ain’t protocols The mathematics of SPAKE2 are fairly clearly defined, but there is a large gap between “use this group element” and sending some bits over the wire. python-spake2 doesn’t clearly distinguish between the mathematics of SPAKE2 and the necessary implementation decisions it makes in order to be a useful networked protocol. This meant that when translating, it was hard to tell what was an essential detail and what was accidental detail. As Siderea eloquently points out, software is made of decisions. When writing the Haskell version, which decisions do I get to make, and which are forced upon me? Must this salt be the empty string? Can I generate the “blind” any way I want? Eventually, I found a PR implementing SPAKE2 (and SPAKE2+, SPAKE2-EE, etc.) in Javascript. From the discussion there, I was able to synthesize a rough standard for implementing. Jean-Paul helped by writing an interoperability test harness, which gave me an easy way to experiment with design choices.