Recently at Ars we've had a couple of discussions about the use of HTTPS (that is, HTTP secured using SSL or TLS) for every website, as a way of keeping sensitive information out of reach of eavesdroppers and ensuring privacy. That's definitely a good thing, but it has a flaw: it depends on trusting that HTTPS is actually effective at protecting privacy. Recent goings-on at Certificate Authority (CA) Comodo provide compelling evidence that such trust is misplaced.

There are two interrelated aspects to SSL. The first is encryption—ensuring that nobody can understand the communication between a client and a server—and the second is authentication—proving to the client that it is actually communicating with the server it thinks it's communicating with. When a client first connects to an HTTPS server, both parties have a bit of a problem. They would like to encrypt the information they send each other, but to do this, they both need to be using the same encryption key. Obviously, they cannot just send the key to each other, because anyone listening in on the connection will be able to watch them do so, and use the key to decrypt the communication themselves. Fortunately, clever mathematics allows both parties to share an encryption key without it being disclosed to any eavesdroppers.
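The article doesn't say which "clever mathematics" it has in mind; one well-known example is Diffie-Hellman key agreement. The sketch below is a toy version under that assumption. The modulus `P`, base `G`, and the function names are illustrative choices, and the numbers are far too small to be secure.

```python
import secrets

# Public parameters, known to everyone including eavesdroppers.
P = 2**32 - 5   # a small prime modulus, for illustration only
G = 5           # public base

def make_keypair():
    """Return (private exponent, public value) for one party."""
    private = secrets.randbelow(P - 3) + 2   # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public

def shared_secret(own_private, peer_public):
    """Combine our private exponent with the other side's public value."""
    return pow(peer_public, own_private, P)

client_private, client_public = make_keypair()
server_private, server_public = make_keypair()

# Only client_public and server_public ever cross the wire.
client_key = shared_secret(client_private, server_public)
server_key = shared_secret(server_private, client_public)
```

Both sides end up with the same value, while an eavesdropper who saw only the two public values would have to solve a discrete logarithm to recover it (trivial at this size, impractical at real-world sizes).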

Defeating the man-in-the-middle

But what if, instead of merely eavesdropping, the malicious party actually interferes with the connection, placing itself between the client and the server and intercepting everything sent between the two? This is known as a man-in-the-middle (MITM) attack, and it would be a big problem. The MITM could act as the server (as far as the client was concerned) and the client (as far as the server was concerned), sharing one key with the client and another with the server. He could then decrypt anything the client said, examine it, and then re-encrypt it and send it to the server, and neither side would be any the wiser.
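To make the attack concrete, here is a toy sketch that again assumes a Diffie-Hellman style exchange (an assumption, not something the article specifies). The MITM swaps in his own public value in both directions, so he ends up holding one key shared with the client and another shared with the server. `toy_cipher` is a made-up XOR scheme used only to show the relay; it is not real encryption.

```python
import hashlib
import secrets

P, G = 2**32 - 5, 5   # toy public parameters, far too small for real use

def keypair():
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)

def toy_cipher(data: bytes, key: int) -> bytes:
    """XOR with a pad derived from the key (handles up to 32 bytes).
    Applying it twice with the same key returns the original data."""
    pad = hashlib.sha256(str(key).encode()).digest()
    return bytes(b ^ p for b, p in zip(data, pad))

client_private, client_public = keypair()
server_private, server_public = keypair()
mitm_private, mitm_public = keypair()

# The MITM intercepts each public value and forwards his own instead.
client_key = pow(mitm_public, client_private, P)   # client believes this is the server's
server_key = pow(mitm_public, server_private, P)   # server believes this is the client's
mitm_key_for_client = pow(client_public, mitm_private, P)
mitm_key_for_server = pow(server_public, mitm_private, P)

message = b"password=hunter2"
sent_by_client = toy_cipher(message, client_key)
seen_by_mitm = toy_cipher(sent_by_client, mitm_key_for_client)    # decrypt and read
forwarded = toy_cipher(seen_by_mitm, mitm_key_for_server)         # re-encrypt
received_by_server = toy_cipher(forwarded, server_key)
```

The server receives exactly what the client sent, so nothing looks wrong to either party, yet the MITM has read the plaintext along the way.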

This is where authentication, in the form of certificates, comes to the rescue. Certificates are an application of public key cryptography. With normal (symmetric) encryption, the key used to encrypt data is the same key as is used to decrypt data; if you know the key, you can both encrypt and decrypt as you see fit. Public key cryptography, however, uses two keys: a private key, which is kept secret, and a public key, which is shared with the world. Each key only works "one way"; anything encrypted with the public key can only be decrypted with the private key, and anything encrypted with the private key can only be decrypted with the public key.
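The "one way" behavior of the two keys can be seen in a textbook RSA example. RSA is one public key scheme among several, and the tiny primes and the helper name `apply_key` below are illustrative assumptions; real keys are thousands of bits long and use padding.

```python
# Textbook RSA with tiny primes, for illustration only (requires Python 3.8+).
p, q = 61, 53
n = p * q                     # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                        # public exponent
d = pow(e, -1, phi)           # private exponent: modular inverse of e

public_key = (e, n)
private_key = (d, n)

def apply_key(value: int, key: tuple) -> int:
    """Encrypting and decrypting are the same operation with different keys."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

message = 65

# Encrypted with the public key, readable only with the private key.
ciphertext = apply_key(message, public_key)
recovered = apply_key(ciphertext, private_key)

# Encrypted with the private key, readable by anyone with the public key.
signature = apply_key(message, private_key)
verified = apply_key(signature, public_key)
```

The second direction is what makes signing possible: if the public key turns `signature` back into `message`, the holder of the private key must have produced it.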

Public key cryptography is very powerful, because it enables the establishment of trust. If a public key can be used to decrypt a piece of information then it's all but certain that the information was originally encrypted with the corresponding private key. And so, this mechanism is built into SSL. The server publishes a certificate—a little chunk of data that includes a company name, a public key, and some other bits and pieces—and when the client connects to the server, it sends the server some information encrypted using the public key from the certificate. The server then decrypts this using its private key. This information is used to encrypt subsequent communication.
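The exchange just described can be sketched as follows. This is a simplified model, not the actual SSL handshake: the certificate is a plain dictionary, the key pair is toy-sized textbook RSA, and `session_key` is a hypothetical helper standing in for the real key derivation.

```python
import hashlib
import secrets

# Server's toy RSA key pair (requires Python 3.8+ for the modular inverse).
p, q = 2**31 - 1, 2**61 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent, never leaves the server

# The certificate the server publishes: a name, a public key, and little else here.
certificate = {"company": "Example Corp", "public_key": (e, n)}

# Client: pick a random secret and encrypt it with the certificate's public key.
cert_e, cert_n = certificate["public_key"]
client_secret = secrets.randbelow(cert_n - 2) + 2
encrypted_secret = pow(client_secret, cert_e, cert_n)

# Server: decrypt with the private key.
server_secret = pow(encrypted_secret, d, n)

def session_key(secret: int) -> bytes:
    """Derive a symmetric key for the rest of the conversation (illustrative)."""
    return hashlib.sha256(str(secret).encode()).digest()
```

Only `encrypted_secret` travels over the network, and only the holder of `d` can turn it back into the secret that both sides feed into `session_key`.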

Since only the server knows the private key, and hence only the server can decrypt the information encrypted with the public key, this lets the client confirm that it's communicating with the rightful owner of the certificate. That's still not quite enough to safeguard against MITM attacks, however, because nothing yet tells the client whose certificate it has been handed. To defeat this setup, the MITM just has to do a little bit more work: he would have to create his own certificate with a private/public key pair. With this, he could still sit between client and server, acting as server to the client and client to the server, listening in on everything sent between the two.

The solution: trust

So there's one more piece to the puzzle: a chain of trust. To verify the authenticity and identity of the certificates themselves, they are linked back to a trustworthy source of certificates. Instead of simply generating a certificate oneself (called a "self-signed certificate"), one instead pays some money to a Certificate Authority (CA) and has it generate the certificate. Every certificate the CA generates is marked as originating from it (again using the properties of public key cryptography), and most Web browsers and operating systems will only trust certificates that directly or indirectly link back to one of a handful of CAs, the "root CAs." Any certificate that doesn't link back to a root CA, such as a self-signed certificate, will generate a big scary warning in the browser. Operating systems and browsers have preinstalled copies of the root CA certificates so that they can validate these links.
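A rough model of that check is sketched below. It uses textbook RSA signatures with toy-sized keys and a single link from root to server; real certificates are X.509 structures and chains can include intermediates. The `Certificate` class, the CA name, and the domain names are all hypothetical.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # public exponent shared by all toy keys below

def make_keypair(p: int, q: int):
    """Return (public_key, private_key) from two primes. Requires Python 3.8+."""
    n = p * q
    return (E, n), (pow(E, -1, (p - 1) * (q - 1)), n)

def digest(subject: str, public_key: tuple, modulus: int) -> int:
    data = f"{subject}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus

@dataclass
class Certificate:
    subject: str
    public_key: tuple
    issuer: str
    signature: int

def issue(subject, subject_public, issuer, issuer_private) -> Certificate:
    """The issuer 'marks' the certificate by signing a digest of its contents."""
    d, n = issuer_private
    return Certificate(subject, subject_public, issuer,
                       pow(digest(subject, subject_public, n), d, n))

def is_trusted(cert: Certificate, trusted_roots: dict) -> bool:
    issuer_public = trusted_roots.get(cert.issuer)
    if issuer_public is None:          # does not link back to a known root
        return False
    e, n = issuer_public
    return pow(cert.signature, e, n) == digest(cert.subject, cert.public_key, n)

root_public, root_private = make_keypair(2**61 - 1, 2**31 - 1)
server_public, server_private = make_keypair(2**13 - 1, 2**17 - 1)
attacker_public, attacker_private = make_keypair(2**19 - 1, 2**7 - 1)

# Preinstalled in the browser or operating system.
trusted_roots = {"Example Root CA": root_public}

good_cert = issue("shop.example", server_public, "Example Root CA", root_private)
self_signed = issue("shop.example", attacker_public, "shop.example", attacker_private)
forged = Certificate("shop.example", attacker_public, "Example Root CA",
                     good_cert.signature)   # reuses a signature made for another key
```

Here `is_trusted(good_cert, trusted_roots)` succeeds, while the self-signed certificate fails because its issuer is not a known root, and the forged one fails because the copied signature does not match its contents.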

In principle, each CA will only issue a certificate if the organization buying the certificate proves its identity to the CA by sending notarized paperwork or some similar mechanism. This means that a certificate purporting to represent, say, Amazon must genuinely have been issued to Amazon. Some certificates, called Extended Validation (EV) certificates, have an even higher identification threshold (and price) before they can be issued. The CAs shouldn't issue certificates claiming to represent Amazon to any company that isn't Amazon.

This is what allows the man-in-the-middle to finally be defeated. Although he can create his own certificate pretending to belong to the server that the client is trying to connect to, what he can't do is to create a certificate that is linked back to a root CA—the root CA will only issue certificates to their rightful owners. And since the Web browser won't trust any certificate that doesn't link back to one of the root CAs it knows about, the MITM can no longer secretly place himself between the client and the server—any attempt to do so will result in a big warning or error message in the client's Web browser.
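The same validation is available to any program, not just browsers. As one example, Python's standard `ssl` module builds a client context that requires a certificate chaining to a trusted root and matching the hostname. The helper `fetch_peer_subject` is a hypothetical name, and it is defined but not called here because it needs network access.

```python
import socket
import ssl

# Default client settings: load the system's root CAs and require validation.
context = ssl.create_default_context()

def fetch_peer_subject(hostname: str, port: int = 443):
    """Connect over TLS and return the subject of the validated certificate.

    If the certificate does not chain to a trusted root, or does not match
    the hostname, wrap_socket raises ssl.SSLCertVerificationError instead.
    """
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["subject"]
```

With this context, a MITM presenting a self-signed certificate causes the connection to fail during the handshake, which is the programmatic equivalent of the browser's warning.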

So, that's how it should all work. And each part is necessary: without the chain of trust, the certificate authentication can't be trusted; without the certificate authentication, the encryption can't be trusted; and without the encryption, there's no protection against eavesdroppers.

The mathematics behind the authentication and encryption are pretty robust (at least given current knowledge), so those parts are reasonably safe. But an awful lot of trust is placed in those root CAs. If a root CA starts issuing certificates to people that it shouldn't (giving a hacker a certificate purporting to be Amazon, say) then the whole system collapses. The hacker can act as a man-in-the-middle and the client's Web browser will actually trust his certificate. No warning about self-signed certificates; everything will just work as if nothing were wrong.
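The reason the collapse is silent is that the client's check is purely mathematical: it can confirm that a trusted CA signed the certificate, not that the CA was right to do so. A minimal sketch, again using toy textbook RSA signatures with hypothetical names and domains:

```python
import hashlib

E = 65537

def keypair(p: int, q: int):
    """Return (public_key, private_key) from two primes. Requires Python 3.8+."""
    n = p * q
    return (E, n), (pow(E, -1, (p - 1) * (q - 1)), n)

def fingerprint(subject: str, public_key: tuple, modulus: int) -> int:
    data = f"{subject}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus

def sign(subject, public_key, ca_private) -> int:
    d, n = ca_private
    return pow(fingerprint(subject, public_key, n), d, n)

def browser_accepts(subject, public_key, signature, ca_public) -> bool:
    e, n = ca_public
    return pow(signature, e, n) == fingerprint(subject, public_key, n)

root_public, root_private = keypair(2**61 - 1, 2**31 - 1)
attacker_public, attacker_private = keypair(2**19 - 1, 2**17 - 1)

# The CA is tricked into signing the attacker's key under a name he doesn't own.
rogue_signature = sign("amazon.example", attacker_public, root_private)

accepted = browser_accepts("amazon.example", attacker_public,
                           rogue_signature, root_public)
```

`accepted` comes out true: from the client's side the rogue certificate is indistinguishable from a legitimate one.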