Craig Gentry on board the mothership. (credit)

A couple of weeks ago I polled readers for the subjects that they were interested in. You gave me some excellent responses, and I promise they’re all in the hopper.

By far the most popular request was for some background on the recent results in computing on encrypted data, or ‘Fully-Homomorphic Encryption’. Even though the current techniques are still in the research phase — way outside the domain of the ‘practical’ crypto I usually talk about — this topic is so neat that it deserves a few words.

Before I get started, I want to make a few important stipulations. First, I’m hardly the world’s leading expert on the subject. Moreover, plenty of real experts have already published highly accessible introductory pieces. If you’re interested, you should check out Craig Gentry’s fantastic intro paper, or even his (surprisingly readable) PhD thesis. Alternatively, you can go directly to some of the recent papers on FHE.

My last warning is that this subject is kind of involved. I’m going to do my best to keep this explanation relatively non-technical (see the papers above if you want the gory details), but it could still get fairly long.

In this first post I’m going to cover some of the background behind FHE, and explain why it’s such a neat problem.

Why encryption is not like a safe

People love to use analogies to talk about encryption. Sometimes these are helpful, sometimes they’re just limiting. Consider this one:

Encrypting a document is like placing it inside of a locked safe.

The locked safe is a great teaching example because cryptography and physical safes (usually) serve the same purpose: they ensure the confidentiality of sensitive data. In practice, they also share many of the same drawbacks.

If you’ve ever worked in an environment where safe-storage is required (e.g., a bank or intelligence agency) you probably know what I’m talking about. Once you lock a document into a safe, your document is locked inside of a damn safe.

Consequently, people tend to remove useful documents from safe storage at the first chance they get. This exposes them to all the usual threats, and explains why so few cases of document theft involve safecracking. Typically the same principle holds for encryption. People decrypt their data so they can use it.

But analogies are never perfect. Encrypting a document isn’t the same as putting it into a physical lockbox. And this is a good thing! Because in fact, there is a kind of encryption that allows us to bypass some of these limitations. We refer to this as homomorphic encryption, and its defining characteristic is this: you can perform useful operations on encrypted values without decrypting them first.

This may seem like an exotic property. Trust me, it’s not. In fact, cryptographers have put a lot of effort into removing the homomorphic properties from common public-key schemes like Elgamal and RSA. Without those protections, both schemes are homomorphic with respect to (modular) multiplication. This means you can multiply together any two Elgamal ciphertexts, and upon decryption you’ll find that the (single) resulting ciphertext now embeds the product of the two original plaintexts. Neat!
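
If you'd like to see the multiplicative property in action, here's a minimal Python sketch using textbook (unpadded) RSA. The tiny primes, the exponent and the helper names `encrypt` and `decrypt` are my own illustrative assumptions, and nothing here is remotely secure. It only shows that multiplying two ciphertexts yields a ciphertext of the product.

```python
# Textbook RSA with toy parameters (assumed for illustration; NOT secure).
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, coprime to phi
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

def encrypt(m):
    # Unpadded RSA encryption: m^e mod n
    return pow(m, e, n)

def decrypt(c):
    # Unpadded RSA decryption: c^d mod n
    return pow(c, d, n)

m1, m2 = 7, 12
c1, c2 = encrypt(m1), encrypt(m2)

# Multiply the ciphertexts without ever decrypting them.
c_prod = (c1 * c2) % n

# The single resulting ciphertext decrypts to the product of the plaintexts.
assert decrypt(c_prod) == (m1 * m2) % n
```

Real-world RSA applies randomized padding before encryption, which is exactly the kind of protection that destroys this property.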

Homomorphic encryption has some immediate practical applications. Consider the Paillier scheme that’s used in several electronic voting protocols. Paillier is homomorphic with respect to addition. Now imagine: each voter encrypts their ballot as a number (0 or 1) and publishes it to the world. Anyone can now tally up the results into a final ciphertext, which makes it hard for a corrupt election judge to throw away legitimate votes. Decrypting the final ciphertext reveals only the total.*
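
Here's a rough sketch of that tallying idea, using a toy version of Paillier in Python. The parameters are absurdly small, the function names (`encrypt`, `decrypt`, `tally`) are my own, and a real voting protocol needs much more than this (for instance, proofs that each ballot really is 0 or 1). Treat it as an illustration of the additive property only.

```python
import math
import random

# Toy Paillier parameters (assumed for illustration; NOT secure).
p, q = 61, 53
n = p * q
n2 = n * n
g = n + 1                                            # common simple choice of generator
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)    # lcm(p-1, q-1)

def L(x):
    # The "L function" from the Paillier scheme: L(x) = (x - 1) / n
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)                  # modular inverse (Python 3.8+)

def encrypt(m):
    # Pick fresh randomness r that is invertible mod n.
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

def tally(ciphertexts):
    # Multiplying ciphertexts mod n^2 adds the underlying plaintexts.
    total = 1                                        # 1 is a trivial encryption of 0
    for c in ciphertexts:
        total = (total * c) % n2
    return total

votes = [1, 0, 1, 1, 0]
ballots = [encrypt(v) for v in votes]                # published by the voters
final = tally(ballots)                               # computable by anyone
assert decrypt(final) == sum(votes)
```

Note that the tally step never touches the private values `lam` and `mu`, so anyone holding the public ballots can compute it.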

A few bits of history

Homomorphic encryption is hardly a new discovery, and cryptographers have long been aware of its promise. Way back in 1978 (about five seconds after the publication of RSA), Rivest, Adleman and Dertouzos proposed homomorphic encryption schemes that supported interesting functions on encrypted data. Regrettably, those first attempts kind of sucked.** Thus, the agenda for researchers was twofold: (1) come up with secure encryption schemes that could handle useful homomorphisms, and (2) figure out how to do interesting things with them.

To be interesting, a homomorphic encryption scheme should at the very least permit the evaluation of useful mathematical functions, e.g., polynomials. But no computer scientist in history has ever been satisfied with mere polynomials. The holy grail was something much neater: a scheme that could handle arbitrary computations (embodied as real computer programs!) on securely encrypted inputs.

This idea, sometimes called ‘cryptocomputing’ or ‘computing on encrypted data’, has a way of capturing the imagination. There’s something fascinating about a computer that works on data it can’t see. More practically, a technology like this would eliminate a very real weakness in many security systems: the need to decrypt before processing data. It could even spawn a whole business based on outsourcing your computations to outside parties. (Something you obviously wouldn’t do without strong cryptographic protections.)

Anyway, it was a beautiful dream. There was just one problem: it didn’t work.

To explain why, let’s go back to some of the encryption schemes I mentioned above. Throughout the ’80s and ’90s researchers came up with these, and many more interesting schemes. Quite a few supported some kind of homomorphism, usually multiplication or addition. However, none seemed capable of handling even both operations simultaneously — at least not without serious limitations.

For researchers this was frustrating. Coming up with such a ‘doubly homomorphic’ scheme was an obvious first step towards the higher purpose. Even better, they quickly realized, this ‘first step’ was also the last step they’d need to achieve arbitrary computation.

How’s that? Well, imagine that you have a doubly homomorphic encryption scheme that encrypts bits, meaning that every plaintext is either 0 or 1. Given ciphertexts encrypting bits A and B, we could use this scheme to compute the simple function 1+A*B. Keeping in mind that all arithmetic is binary (i.e., modulo 2), such a function would produce the following truth table:

A  B  :  1+A*B
0  0  :  1
0  1  :  1
1  0  :  1
1  1  :  0

Why the excitement? Well, this table describes a NAND gate. And any computer engineer can tell you that NAND is a big deal: once you’ve got it, you can derive all of the other useful boolean logic gates: AND, OR, NOT, XOR and XNOR.*** And that means you can implement circuits.
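
If you don't want to take the NAND claim on faith, here's a small Python sketch that builds the other gates out of 1+A*B (mod 2) and checks them against every input. It works on plain bits, not ciphertexts, and the particular constructions are just the standard textbook ones, not anything specific to an encryption scheme.

```python
def NAND(a, b):
    # The only primitive: one multiplication and one addition, mod 2.
    return (1 + a * b) % 2

def NOT(a):
    return NAND(a, a)

def AND(a, b):
    return NOT(NAND(a, b))

def OR(a, b):
    return NAND(NOT(a), NOT(b))

def XOR(a, b):
    t = NAND(a, b)
    return NAND(NAND(a, t), NAND(b, t))

def XNOR(a, b):
    return NOT(XOR(a, b))

# Exhaustively compare against Python's own bit operators.
for a in (0, 1):
    assert NOT(a) == 1 - a
    for b in (0, 1):
        assert AND(a, b) == (a & b)
        assert OR(a, b) == (a | b)
        assert XOR(a, b) == (a ^ b)
        assert XNOR(a, b) == 1 - (a ^ b)
```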

To a theoretical computer scientist this is a Big Deal. Given an encryption scheme like this, we could encrypt our input one bit at a time, then send the encrypted values to a third party for processing. This party would run an arbitrary program just by rendering it into a huge circuit — a series of connected boolean logic gates — and evaluating the result one gate at a time. At the end of the process we’d get back a bunch of ciphertexts containing the (bit) results.
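
To make the 'one gate at a time' idea a bit more tangible, here's a toy Python sketch of a third party evaluating a half-adder described as a list of NAND gates. Big caveat: `MockCiphertext` is a made-up stand-in that hides nothing at all. It only marks where a real doubly homomorphic scheme would plug in. The wire names and the netlist format are likewise my own assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MockCiphertext:
    """Hypothetical placeholder for a ciphertext; offers NO secrecy."""
    _bit: int

def encrypt(bit):
    return MockCiphertext(bit % 2)

def decrypt(ct):
    return ct._bit

def eval_nand(x, y):
    # The per-gate work of the third party: 1 + x*y (mod 2).
    # A real scheme would do this with homomorphic add and multiply.
    return MockCiphertext((1 + x._bit * y._bit) % 2)

# A half-adder as a netlist of NAND gates: (output wire, input 1, input 2).
HALF_ADDER = [
    ("t1", "a", "b"),
    ("t2", "a", "t1"),
    ("t3", "b", "t1"),
    ("sum", "t2", "t3"),     # a XOR b
    ("carry", "t1", "t1"),   # a AND b
]

def evaluate(circuit, inputs):
    # Walk the gates in order, storing each result on its output wire.
    wires = dict(inputs)
    for out, in1, in2 in circuit:
        wires[out] = eval_nand(wires[in1], wires[in2])
    return wires

for a in (0, 1):
    for b in (0, 1):
        wires = evaluate(HALF_ADDER, {"a": encrypt(a), "b": encrypt(b)})
        assert decrypt(wires["sum"]) == a ^ b
        assert decrypt(wires["carry"]) == a & b
```

The evaluator only ever calls `eval_nand`, never `decrypt`, which is the whole point: the data owner decrypts the output wires at the end.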

In theory, the existence of an appropriate encryption scheme would give us everything we need to, for example, play Halo on encrypted inputs. This would obviously be a poor gaming experience. But it would be possible. If only we had such an encryption scheme.

A brief note

At this point I’d like to take a quick break to address the more practical kind of reader, who (I suspect) is recoiling in horror. I know what you’re thinking: I came here for computing, and this is what you’re giving me? Break the input into single bits and process them one gate at a time?

Well, yes. That’s exactly how it’s going to work, at least if we want general computation. And I stipulate that in many ways it’s going to suck. Consider, for example, a loop like this one:

    while (encrypted_value < 100) {
        perform_some_operation_on(&encrypted_value);
    }

Just try converting that into a circuit. I mean, it’s not impossible to unroll loops (if you know the maximum number of iterations), but the resulting circuit is not likely to be practical. Moreover, this isn’t purely an issue with the use of circuits, but rather with the use of encrypted data. No matter what computational model you employ, you’re always going to have difficulty with things like control flow changes that depend on input data that the executing party can’t see.

This makes it tough to implement the efficient programs that we’re accustomed to running on typical random access machines. Simply writing a bit to encrypted ‘RAM’ might require you to recalculate every bit in memory, at least if the write location is dependent on the input data.

And no, I’m not going to reassure you that it gets better from here. Actually it’s going to get a lot worse once cryptography comes into the picture. That’s because each of these ‘bits’ is actually going to become a ciphertext, potentially hundreds or thousands of bits in length. Not to mention that evaluating those logic gates is going to require some pretty serious computing.

I’m pointing this out not to dismiss the research (which we’ll get to, and is pretty amazing) but rather to point out that it is research. We aren’t going to be outsourcing general programs with this anytime soon, and in fact, we may never do so.

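
Going back to that while loop for a moment, here's a rough Python sketch of what 'unrolling' looks like when the evaluator can't see the loop condition. Everything here is an assumption for illustration: the bound `MAX_ITERS`, the body (adding 7), and the use of plain integers where a real system would have ciphertexts and a comparison circuit. The thing to notice is that every iteration runs no matter what, and the condition only selects which value to keep.

```python
MAX_ITERS = 20   # assumed public upper bound on the number of iterations

def perform_some_operation_on(x):
    # Hypothetical loop body, chosen only for illustration.
    return x + 7

def less_than_100(x):
    # Stand-in for a comparison circuit that outputs an (encrypted) bit.
    return 1 if x < 100 else 0

def unrolled_loop(x):
    for _ in range(MAX_ITERS):
        cond = less_than_100(x)
        updated = perform_some_operation_on(x)   # always computed
        # Arithmetic select: keep `updated` if cond == 1, else keep x.
        x = cond * updated + (1 - cond) * x
    return x

def reference_loop(x):
    # The ordinary data-dependent loop, for comparison.
    while x < 100:
        x = perform_some_operation_on(x)
    return x

for start in range(0, 150):
    assert unrolled_loop(start) == reference_loop(start)
```

The unrolled version always pays for all `MAX_ITERS` copies of the body, even when the input is already at or above 100.
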
What we might do is find ways to implement specialized subroutines with very high sensitivity requirements: e.g., stock trading models, proprietary bioinformatics processes, etc. By combining these with other less-general techniques, we could accomplish something pretty useful.

In Summary

I’ve written just about all I can fit in a reasonable blog post, and I realize that I’ve barely covered any of the actual research. What I did accomplish was to lay out some of the background behind the recent developments in fully-homomorphic encryption. In the next post we’ll talk about the search for an appropriate encryption scheme, some of the failures, and Gentry’s eventual success.

Notes:

* Obviously there’s more to this. See, for example, this paper for some of the complexity.

** This might sound insulting, but it’s not. As I’ve said before, ‘suck’ is a purely technical term for schemes that aren’t semantically secure, i.e., indistinguishable under chosen plaintext attack.

*** Two notes here: First, you can obviously derive these gates more directly. For example, AND is (A*B). Second, while I’ve used the example of a scheme that encrypts only bits (meaning that addition and multiplication are always mod 2), the encryption scheme doesn’t have to be limited this way. For example, consider a scheme that encrypts arbitrary integers (say, a finite ring). As long as you know that the inputs (A, B) are both in {0, 1}, you can implement the NAND gate as 1-(A*B). This is a more common description and you’ll see it in most papers on the subject.