This past weekend we ran a piece from Wired that looked at the issues surrounding unencrypted HTTP traffic and wondered why all websites don't use HTTPS by default. The article puts forth an interesting premise—the wholesale encryption of all HTTP traffic—and lists a number of reasons why this hasn’t happened yet.

The only problem is that many of these issues, mostly technical in nature, are red herrings that can be handled with a bit of cleverness by an engineering team focused on transmitting its entire application over an encrypted channel. The real issues arise when your application must include assets served from third-party servers that do not support SSL. We’re going to discuss some of the issues raised by the article, correct some of the more specious arguments, explain how an organization can work with the real constraints of HTTPS, and give some insight into what we consider the real barriers to wholesale HTTPS encryption of the Web.

Caching

According to the Wired article, one of the things keeping SSL down is that content served over SSL cannot be cached. There's a lot of confusion about this issue, but the takeaway is that it is indeed very possible to make sure that your content transmitted over SSL is cacheable by end users.

There are a few levels of resource caching for most HTTP requests. Client-side browser caching is important and helps browsers avoid re-requesting content that is still fresh. Server-side caching (in intermediate proxies, for instance) matters for some users some of the time.

Modern browsers cache secure content like they’re supposed to, even when using SSL. They respect the various cache-control headers servers send and let Web developers minimize HTTP requests for commonly requested content. On Ars Technica, for instance, you pull down a number of stylesheets, scripts, and image files that will sit around in your local cache for a very long time. When you visit a new article, you will only need to load the text and images specific to that piece. Assuming we transmitted content over SSL, our client-side caching would still work gloriously well.

Public proxy caching, though, does not work for SSL traffic. Public proxies can’t even “read” responses as they pass through, which is kind of the point of SSL (think of a proxy as a man in the middle). ISPs in countries like Australia would historically run caching proxies to help minimize latency for commonly requested files. This practice is becoming less common, partially because global content delivery networks (CDNs) mean static files are geographically closer to their users, and partially because users spend time using sites like Facebook where pages are tailored specifically to them.

Web developers dealing with SSL need to understand the various cache-control headers, how they instruct browsers to retain content, and how to use them properly.
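As a sketch of what using them properly might look like, here is a hypothetical helper that picks a caching policy by asset type (the extensions and lifetimes are illustrative, not a prescription):

```python
def cache_headers(path: str) -> dict:
    """Pick Cache-Control headers for a response based on asset type."""
    static_extensions = (".css", ".js", ".png", ".jpg", ".gif")
    if path.endswith(static_extensions):
        # Long-lived static assets: browsers may keep these for a year.
        # This works the same whether the page arrived over HTTP or HTTPS.
        return {"Cache-Control": "public, max-age=31536000"}
    # Per-user pages: only the browser's private cache may store them,
    # and it must revalidate with the server before reuse.
    return {"Cache-Control": "private, no-cache"}
```

The same headers that keep an unencrypted site fast keep an encrypted one fast; SSL changes nothing about how the browser interprets them.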

Performance impact

SSL has a performance impact on both ends of a connection. The initial handshake is computationally intensive, and Web providers need to be aware of the extra horsepower required to encrypt traffic. It should be noted that Google has spent considerable effort improving the situation, on both the server and the client side, and is trying to gently push the whole community into joining it.

Performance is only rarely the reason sites don’t use SSL, though. It’s a consideration, but only a minor one for the vast majority of site owners. In the absence of other issues, it really wouldn’t matter for most people.

Firesheep: how a great UI can make the Internet more secure

Sending and receiving unencrypted data is not generally a big deal, until you need to identify yourself to a website and see customized pages—Facebook, Twitter, and even Google deliver content that is unique to a signed-in user. Most sites of this nature generate a unique session token for your account and transmit it to your browser, where it’s stored as a cookie. On all subsequent requests to that domain, your browser will send the content of all your cookies—including your session cookie, which will uniquely identify you. There are a number of techniques to make this practice more secure.
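A minimal sketch of that pattern (the cookie name is made up):

```python
import secrets

def new_session_token() -> str:
    # 32 bytes of cryptographic randomness, URL-safe encoded;
    # guessing another user's token is computationally infeasible.
    return secrets.token_urlsafe(32)

def session_header(token: str) -> str:
    # The server sends this header once, at login; the browser then
    # echoes the cookie back on every subsequent request to the domain.
    return f"Set-Cookie: session={token}; Path=/"
```

Note that the token itself is unguessable; the weakness discussed below is that it travels in the clear on every request.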

This works remarkably well until you’re sitting in your local PeetsArgobucks coffee shop on its open WiFi and your HTTP data is whizzing past other people’s heads—and past their wireless connections. It’s all easy enough to snatch out of the air—passwords, session tokens, credit cards, you name it. You are on a hubbed connection where everyone’s traffic is exposed, unencrypted, to all other users of the same network.

Until recently, most people—and most website operators—gave little thought to the prospect of someone snarfing these unique strings from plain-text connections. Most people really only considered SSL a necessity for online banking or similar “sensitive” applications.

The combination of ubiquitous WiFi and the number of services linked directly and deeply to our personal lives ensured it was only a matter of time before someone came up with a dead-simple way of grabbing your data out of thin air. Don’t be fooled though; this has always been possible. A kid with a Linux laptop running Wireshark (née Ethereal) could have done the exact same thing in 2001—the only thing holding it back was this tool’s arcane UI and übernerd operating parameters.

Firesheep is a user-friendly way of doing exactly this with a simple Firefox plug-in that anyone can operate. Each time you request a Facebook page, your browser sends its token along and your friendly coffee shop skeezebag grabs it with Firesheep. He or she then sends that token back to Facebook, successfully pretending to be you.

SSL prevents this, mostly. That is, when you make a request over SSL, the would-be interloper can’t see what’s in that request. So having an SSL option is great, right? The answer is a definitive “maybe.”

The HTTP spec defines a “Secure” flag for cookies, which instructs the browser to only send that cookie value over SSL. If sites set that flag like they’re supposed to, then yes, SSL is helping you out. Most sites don’t, however, and browsers will happily send the sensitive cookies over unencrypted HTTP. Our hypothetical skeezebag really just needs some way to trick you into opening a normal HTTP URL, maybe by e-mailing you a link to http://yourbank.com/a-picture-of-ponies-and-rainbows.gif so he can sniff the plain-text cookie off your unencrypted HTTP request, or by surreptitiously embedding a JavaScript file via some site’s XSS vulnerability.
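Setting the flag is trivial. In Python’s standard library, for example (the cookie name and value here are made up):

```python
from http.cookies import SimpleCookie

# Build a session cookie that the browser will only transmit over SSL.
cookie = SimpleCookie()
cookie["session"] = "d41d8cd98f00b204"  # hypothetical session token
cookie["session"]["path"] = "/"
cookie["session"]["secure"] = True      # never sent over plain HTTP

# Prints the Set-Cookie header with the Secure attribute appended.
print(cookie["session"].output())
```

With that attribute in place, even a tricked request to an http:// URL on the same domain goes out without the session cookie attached.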

Mixed modes

We’ve all experienced “mixed mode” warnings, with some browsers being much more annoying about them than others. "Mixed mode" means you requested a page over SSL, but some of the resources needed to fully render that page are only available over unencrypted HTTP. The page you’re looking at includes a mix of components, including images, scripts, stylesheets, and third-party assets—all of which would need to be delivered via SSL to avoid mixed-mode warnings.

Browsers will rightfully complain about mixed mode, usually by styling the SSL/HTTPS browser icons. Chrome puts a nice red X over the lock icon and strikes the “https” text. Some browsers are even more annoying and strict. IE pops up a modal warning each time a user requests a secure page with insecure assets. Some organizations have it locked down even further, keeping IE from even requesting insecure content—resulting in badly formatted pages being displayed.

Annoyances aside, mixed mode is a problem worth avoiding, because it potentially subverts the entire purpose of encrypting traffic in the first place. A nefarious hotspot operator can not only read unencrypted traffic, he can also alter it as it crosses his network. A “secure” page that includes insecure JavaScript makes it relatively easy to hijack session tokens (again). In many cases, JavaScript in a page has access to the same cookie data the server does. The HTTP spec does define an “HttpOnly” flag for cookies that instructs browsers to keep the value out of the DOM, but it’s extremely rare to see it set.

Doing SSL right

We’ve looked pretty extensively at serving Ars Technica over HTTPS in the past. Here’s what we’d need to do to make this a reality:

First, we would need to ensure that all third-party assets are served over SSL. All third-party ad providers, their back-end services, analytics tools, and useful widgets we include in the page would need to come over HTTPS. Assuming they even offer it, we would also need to be confident that they’re not letting unencrypted content sneak in. Facebook and Twitter are probably safe (but only as of the past few weeks), and Google Analytics has been fine for quite a while. Our ad network, DoubleClick, is a mixed bag. Most everything served up from the DoubleClick domain will work fine, but DoubleClick occasionally serves up vetted third-party assets (images, analytics code) which may or may not work properly over HTTPS. And even if it “works,” many of the domains this content is served from are delivered by CDNs like Akamai over a branded domain (e.g. the server’s SSL cert is for *.akamai.com, not for s0.mdn.net, which will cause most browsers to balk).

Next, we would need to make sure our sensitive cookies have both the Secure and HttpOnly flags set. Then we would need to find a CDN with SSL support. Our CDN works really well over HTTP, just like most other CDNs. We even have a lovely “static.arstechnica.net” branded host. CDNs that do expose HTTPS are rare (Akamai and Amazon’s CloudFront currently support it), and leave you with URLs like “static.arstechnica.net.cdndomain.com”. It would work, but we’d be sad to lose our spiffy host name and our great arrangement with CacheFly.

We would also have to stick another Web server in front of Varnish. We use Varnish as a cache, which would still work fine, since it would speak plain HTTP over our private network. Varnish can’t encrypt traffic, though, so we’d need a proxy in front of it to decrypt requests from readers and encrypt Varnish’s responses.
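A minimal sketch of that setup, assuming nginx as the TLS-terminating proxy (the hostnames, port, and certificate paths are hypothetical):

```nginx
server {
    listen 443 ssl;
    server_name arstechnica.com;

    ssl_certificate     /etc/ssl/certs/site.pem;     # hypothetical paths
    ssl_certificate_key /etc/ssl/private/site.key;

    location / {
        # Decrypt here, then speak plain HTTP to Varnish on the
        # private network; Varnish keeps caching exactly as before.
        proxy_pass http://127.0.0.1:6081;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```

The X-Forwarded-Proto header lets the application behind Varnish know the original request was encrypted, even though it only ever sees plain HTTP.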

Lastly, we would have to find a way to handle the user-embedded-content scenario. Images in comments or forums can come from any domain, and these hosts almost universally support SSL poorly or not at all. One solution is to prohibit user-embedded content (which we don’t want to do); another is to proxy it through a separate HTTPS server that we control. GitHub has implemented the latter using a product called camo and Amazon’s CloudFront CDN. When every page is rendered, its front-end application rewrites all image links specified with the ‘http://’ protocol to be requested from its camo server, which fetches and caches the image and serves it over SSL.
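A rough sketch of that rewriting step (the proxy host and shared key are made up, and camo’s real URL scheme differs slightly):

```python
import hashlib
import hmac
import re

CAMO_HOST = "https://img-proxy.example.net"  # hypothetical proxy host
CAMO_KEY = b"shared-secret"                  # shared with the proxy

def camo_url(url: str) -> str:
    # The HMAC digest lets the proxy verify that we generated this URL,
    # so the proxy can't be abused as an open relay for arbitrary images.
    digest = hmac.new(CAMO_KEY, url.encode(), hashlib.sha1).hexdigest()
    return f"{CAMO_HOST}/{digest}?url={url}"

def rewrite_images(html: str) -> str:
    # Route any plain-HTTP image through the SSL proxy; https images
    # are already safe and pass through untouched.
    return re.sub(
        r'src="(http://[^"]+)"',
        lambda m: f'src="{camo_url(m.group(1))}"',
        html,
    )
```

The browser then only ever sees https:// image URLs, so pages with user-embedded content render without mixed-mode warnings.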

All of this is very technically doable; the difficulty comes in getting our partners on board. Our hands are unfortunately tied by the limits of their capabilities.