I just checked the date on my last post and it seems that I haven’t blogged in nearly a month. Believe me, this isn’t for lack of trying. The world has just been a very, very busy place.

But this is the Thanksgiving holiday and every (US) American knows the best part of Thanksgiving is sitting around for hours waiting for a huge bird to cook thoroughly enough that it won’t kill your family. And that means I finally have time to write about my favorite wonky topic: cryptographic protocols. And how utterly confusing they can be.

ZRTP

The subject of today’s post is the ZRTP key agreement protocol. ZRTP has recently gotten some press for being the primary key agreement used by Silent Circle, a new encrypted voice/video/text service launched by PGP inventor Phil Zimmermann and some other bright folks. But it’s also used in other secure VoIP solutions, like Zfone and Moxie’s RedPhone.

Silent Circle’s an interesting case, since it’s gotten some gentle criticism lately from a variety of folks — well, mostly Nadim Kobeissi — for being partially closed-source and for having received no real security audits. Nadim’s typical, understated critique goes like this:

Silent Circle is morally bankrupt for promoting its unreviewed, closed-source, centralized crypto software and marketing it as “secure.”

— Nadim Kobeissi (@kaepora) November 6, 2012

And is usually followed by the whole world telling Nadim he’s being a jerk. Eventually a tech reporter notices the fracas and chimes in to tell us the whole Infosec community is a bunch of jerks:

This twitter.com/marshray/statu… is exactly the kind of thing that makes a community look juvenile, petty and not ready for primetime.

— Quinn Norton (@quinnnorton) November 7, 2012

And the cycle repeats, without anyone having actually learned much at all. (Really, it’s enough to make you think Twitter isn’t the right place to get your Infosec news.)

Now, as unhelpful as all this is, maybe we can make lemonade and let all of this serve as a teaching moment. For one thing, it gives us a (wonky) chance to learn a little bit about this ZRTP protocol that so many people seem to be using.

Overview of ZRTP

The ZRTP protocol is fully laid out in RFC 6189. This is one of the more confusing specs I’ve read — partly because the critical information is spread out across so many different sections, and partly because ZRTP seems determined to address every possible attack simultaneously.

Fortunately the Intro does a good job of telling us what the protocol’s up to:

ZRTP is a key agreement protocol that performs a Diffie-Hellman key exchange during call setup in the media path and is transported over the same port as the Real-time Transport Protocol (RTP) media stream which has been established using a signaling protocol such as Session Initiation Protocol (SIP) . This generates a shared secret, which is then used to generate keys and salt for a Secure RTP (SRTP) session.

So: simple enough. ZRTP lets us establish keys over an insecure channel using Diffie-Hellman.

However we all know that Diffie-Hellman isn’t secure against active (Man-in-the-Middle) attacks. Normally we prevent these by signing Diffie-Hellman messages using keys distributed via a PKI. ZRTP is having none of it:

Although it uses a public key algorithm, [ZRTP] does not rely on a public key infrastructure (PKI) … [ZRTP] allows the detection of man-in-the-middle (MiTM) attacks by displaying a short authentication string (SAS) for the users to read and verbally compare over the phone. … But even if the users are too lazy to bother with short authentication strings, we still get reasonable authentication against a MiTM attack, based on a form of key continuity. It does this by caching some key material to use in the next call, to be mixed in with the next call’s DH shared secret, giving it key continuity properties analogous to Secure SHell (SSH).

So our protection is twofold: (1) every time we establish a connection with some remote party, we can verbally compare a “short authentication string” (SAS) to ensure that we’ve both agreed on the same key. Assuming that our attacker isn’t a voice actor, this should let us easily detect a typical MiTM. And just in case we’re too lazy to bother with this, even completing one ‘un-attacked’ connection leaves us with (2) a long-term shared secret that we can use to validate future connection attempts.

So is this a reasonable model? Well, you can draw your own conclusions — but it works fine for me. Moreover, I’m willing to accept just about any assumption that allows us to not think about the ‘Bill Clinton’ or ‘Rich Little attacks‘. So let’s just assume that this voice thing works… and move on to the interesting bits.

The Guts of the Protocol

ZRTP is an interaction between two parties, defined as an Initiator and a Responder. This figure from the spec illustrates the flow of a typical transaction:

Note that the identity of the party acting as the Initiator is determined during the protocol run — it’s the one that sends the Commit message (F5). The protocol breaks down into roughly four phases:

Discovery and protocol negotiation (F1-F4), in which the parties start up a protocol transaction and agree on a supported ZRTP version and ciphersuites. Diffie-Hellman key establishment (F5-F7). This is almost ‘missionary position’ Diffie-Hellman, with one exception (the F5 message), which we’ll talk more about in a second. Key confirmation (F8-F10), in which the parties verify that they’ve agreed on the same key. Secure communication. That’s the last section, labeled “SRTP begins”.

Discovery and Protocol Negotiation Messages F1-F4 are responsible for setting up a ZRTP connection. This stuff is (almost) entirely non-cryptographic, which means this is the part of the protocol where stuff is most likely to go wrong.

Let me explain: prior to completing the Diffie-Hellman key exchange in messages F5-F7, we can’t assume that Alice or Bob share a cryptographic key yet. Without a key, they can’t authenticate their messages — at least not until after the Diffie-Hellman transaction is complete. That means an attacker can potentially re-write any portion of those messages and get away with it. At least for a while.

A 4-character string containing the version of the ZRTP protocol. A Client Identifier string (cid), which is 4 words long and identifies the vendor and release of the ZRTP software. The 96-bit-long unique identifier for the ZRTP endpoint (ZID). A Signature-capable flag (S) indicates this Hello message is sent from a ZRTP endpoint which is able to parse and verify digital signatures. The MiTM flag (M), which has something to do with PBXes. The Passive flag (P), which is set to true if and only if this Hello message is sent from a device that is configured to always act as a Responder (not Initiator). The supported Hash algorithms, Cipher algorithms (including Diffie-Hellman handshake type), SRTP Auth Tag Types, Key Agreement Types, and SAS Types.

Which is to say: quite a lot! And some of these values are pretty critical. Changing the version number might allow an attacker to roll us back to an earlier (potentially insecure) version of the protocol. Changing the ciphers might (theoretically) allow us to switch to a weaker set of Diffie-Hellman parameters, hash function or Short Authentication String algorithm.

At this point a careful protocol reviewer will be asking ‘what does ZRTP does to prevent this?’ ZRTP gives us several answers, both good and not-so-good:

On the bad side, ZRTP allows us to send a hash of the Initiator’s Hello message through a separate integrity-protected channel, e.g., secure SIP. (This protection gets folded on into later messages using a crazy hash-chain and MAC construction.) In general I’m not a fan of this protection — it’s like saying a life-preserver will keep you alive… as long as you have a separate lifeboat to wear it in. If you have a secure channel in the first place, why are you using ZRTP ? (Even the ZRTP authors seem a little sheepish about this one.)

of the Initiator’s Hello message through a separate integrity-protected channel, secure SIP. (This protection gets folded on into later messages using a crazy hash-chain and MAC construction.) In general I’m not a fan of this protection — it’s like saying a life-preserver will keep you alive… as long as you have a separate lifeboat to wear it in. If you have a secure channel in the first place, ? (Even the ZRTP authors seem a little sheepish about this one.) On the confusing side, Hello messages are MACed using a MAC key that isn’t revealed until the subsequent message ( e.g., Commit). In theory this means you can’t forge a MAC until the Commit message has been delivered. In practice, an MiTM attacker can just capture both the Initiator Hello (F3) and the Commit message (F5), learn the MAC key, and then forge the Hello message to its heart’s content… before sending both messages on to the recipient. This entire mechanism baffles me. The less said about it the better.

Commit). In theory this means you can’t forge a MAC until the Commit message has been delivered. In practice, an MiTM attacker can just capture both the Initiator Hello (F3) the Commit message (F5), learn the MAC key, and then forge the Hello message to its heart’s content… before sending both messages on to the recipient. This entire mechanism baffles me. The less said about it the better. A more reasonable protection comes from the key derivation process. In a later phase of the protocol, both ZRTP parties compute a hash over the handshake messages. This value is folded into the Key Derivation Function (KDF) that’s ultimately to compute session keys. The hashing process looks like: total_hash = hash(Hello of responder || Commit || DHPart1 || DHPart2) But… zounds! One important message is missing: the Initiator’s Hello (F3). I can’t for the life of me figure out why this message would be left out. And unless there’s something really clever that I’m missing, this means an attacker can tamper with the Initiator’s Hello message without affecting the key at all.*So is this a problem? Well, in theory it means you could roll back any field in the Initiator Hello message without detection. It’s not exactly clear what practical benefit this would have. You could certainly modify the Diffie-Hellman parameters or ciphers to turn off advanced options like ECDH or 256-bit AES. Fortunately ZRTP does not support ‘export weakened‘ ciphers, so even the ‘weak’ options are pretty strong. Still: this seems like a pretty big oversight, and possibly an unforced error. For the moment, let’s file it under ‘this protocol is really complicated‘ or ‘don’t analyze protocols at Thanksgiving‘. At very least I think this could use a good explanation. (Update 11/25: After some back and forth, Moxie points out that the Hash, SAS and Cipher information is repeated in the Commit message [F5], so that should provide an extra layer of protection against rollback on those fields — but not the other fields like version or signature capability. And of course, an implementation might have the Responder prune its list of supported algorithms by basing it off the Initiator’s list, which would be a big problem.)

Diffie-Hellman Key Establishment Assuming that the two parties have successfully completed the negotiation, the next phase of the protocol requires the two parties to establish a shared key. This is done using (nearly) straight-up Diffie-Hellman. The parameters are negotiated in the previous phase, and are drawn from a list defined by RFC3526 and a NIST publication.

Normally a Diffie-Hellman agreement would work something like this (all exponentiation is in a group, i.e., mod p): Alice picks a, sends g^a (message F6). Bob picks b, sends g^b (message F7). Both parties compute g^ab and HMAC this value together with lots of other things to derive a session key s0. The parties further compute a 32-bit Short Authentication value as a function of s0, convert this into a human-readable Short Authentication String (SAS), and compare notes verbally. The problem with the traditional Diffie-Hellman protocol is that it doesn’t play nice with the Short Authentication String mechanism. Say an MiTM attacker Eve has established a connection with Bob, during which they agreed on SAS value X. Eve now tries to run the Diffie-Hellman protocol with Alice. Once Alice sends her choice of g^a, Eve can now run an offline attack, trying millions of candidate b values until she gets one such that the derived SAS between her and Alice is also X.This is a problem, since Alice and Bob will now see the same SAS, even though Eve is duping them.ZRTP deals with this by forcing one party (Bob) to commit to g^b before the protocol even begins. Thus, in ZRTP Bob picks b and sends H(g^b) before Alice sends her first message. This ‘commitment’ is transmitted in the “Commit” message (F5).

This prevents Bob from ‘changing his mind’ after seeing Alice’s input, and thus the remote party has at most one chance to get a colliding SAS per protocol run. Which means Eve is (probably) out of luck.**

Key Confirmation & Secure Communication

The rest of the protocol does all of the stuff you expect from a key agreement protocol: the two parties, having successfully completed a Diffie-Hellman exchange, now derive a session key by HMACing together the value g^ab with total_hash and any pre-shared secrets they hold from older sessions. If all goes well, the result should be a strong secret that only the two parties know — or an SAS mismatch that reveals Eve’s tampering.

From this point it’s all downhill, or it should be at least. Both parties now have a shared secret that they can use to derive secure encryption and MAC keys. They now construct “Confirm” messages (F8, F9), which they encrypt and (partially) MAC. In principle this exchange gives both parties a chance to detect a mismatch in keys and call the whole thing off.There’s not a ton to say about this section except for one detail: the MAC on these messages is computed only over the encrypted portion (between the == signs below), and leaves out critical details like the Initialization Vector that’s used to encrypt them: ***

Structure of a ZRTP “Confirm” message (source).

Once again it’s not clear if there’s any practical impact here, but at least technically speaking, someone could tamper with this IV and thus change the decryption of the message (specifically, the first 64 bits of H0). I have no idea if this matters — or even if it’s a real attack — but in general it seems like the kind of thing you want to avoid. It’s another example of a place where I just plain don’t understand ZRTP.

Conclusion

I think it should be obvious at this point that I have no real point with this post — I just love protocols. If you’re still reading at this point, you must also love protocols… or else you did something really wrong, and reading this post is your punishment.

ZRTP is a fun protocol to analyze, since it’s a particularly complicated spec that deals with many use-cases at once. Plus it’s an old protocol and it’s been subjected to lots of analysis (including analysis by robots). That’s why I’m so surprised to see loose ends at this date, even if they’re not particularly useful loose ends.

Regardless of the results, anyone who’s interested in cryptography should try their hands at this from time to time. There are so many protocols that we rely on in our day-to-day existence, and far too few of these have really been poked at.

Notes:

* An older paper by Gupta and Shmatikov also notes the lack of authentication for ZID messages, and describes a clever attack that causes the peers to not detect mutually shared secrets. This seems to be fixed in later versions of the protocol, since the ZID values are now included in the Key Derivation Function. But not the rest of the fields in the Initiator Hello.

*** This is another place where the specification is a bit vague. It says that the MAC is computed over the “encrypted part of the message”, but isn’t entirely clear if the MAC is computed pre- or post- encryption. If it’s pre- encryption this is a bit non-standard, but not a major concern.