Hi, I’m Mark Nottingham. I currently co-chair the IETF HTTP and QUIC Working Groups, and am a member of the Internet Architecture Board. I usually write here about the Web, protocol design, HTTP, and caching.

What to Look For in an HTTP Proxy/Cache

Part of my job is maintaining Yahoo!’s build of Squid and supporting its users, who use it to serve everything from the internal Web services that make sites go to Flickr’s images.

In that process, I’m often asked “what about X?”, where X is some other caching or load balancing product (yes, Squid can be used as a load balancer). For example, Varnish or lighttpd.

Generally, these comparisons come down to three factors: performance, features, and manageability. Almost invariably, Squid doesn’t do as well as newcomers on performance (although it’s generally faster than Apache), but it wins on features and manageability — and that’s why it’s so widely used.

I’m not going to argue that Squid is best for every deployment, but I do think that it’s important to evaluate the whole picture, rather than just one metric. So, here are a few initial thoughts about what’s important when you’re evaluating a proxy/cache:

Performance

Performance can mean a lot of things. The least interesting but most widely cited benchmark for this kind of server is “how many 1k responses can it serve from memory per second?” but that doesn’t tell you how it will do serving 200K (or 200M) responses from disk, which is a much more difficult thing to manage.

Try looking at:

Persistent connections — support is necessary for good performance, and a good proxy server should be able to handle tens of thousands of idle connections with nearly no overhead. This means epoll, kqueue, libevent or similar under the covers. Make sure it supports HTTP/1.1 chunked encoding as well.

Hit rates — it’s necessary to test the code paths both to the cache and forward to the origin server.

Response sizes — seeing how a server handles large vs. small responses can be revealing about its internal pipeline; if the data is copied a lot, large responses will have a disproportionate effect.

Response latency — intermediation means adding an extra layer, so it’s critical that it doesn’t add latency as well. A proxy should be able to serve a hit in less than a millisecond, easy.

Overload behaviour — when the proxy gets overloaded, it should degrade gracefully.

Disk traffic — disk caching is much more difficult to get right. Testing this is tricky, because you have to have a working set bigger than the memory cache. See polygraph for a start.

Cache efficiency — some implementations take shortcuts which improve benchmark speeds by making the cache less than perfectly efficient for a given workload. This may or may not be good for your use case, but you need to be aware of it.

Extra features — often, caches are benchmarked with everything “extra” — e.g., logging, ACLs — turned off. If they’re in the critical path, these can affect performance greatly.

Per-object overhead — caches need to be able to find things quickly, and this means some level of per-cached-response overhead in memory. The amount of overhead can limit the overall scalability of your cache (because it consumes too much memory), but sometimes having more metadata in memory helps the cache be more efficient.
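When you run these benchmarks yourself, averages hide tail behaviour; a claim like “a hit in less than a millisecond” is only meaningful at a percentile. As a small illustration (the sample values you feed it would come from your own load test), here is a nearest-rank percentile summary:

```python
# Sketch: summarising hit-latency samples from a benchmark run.
# Percentiles show whether "under a millisecond" holds for most
# requests or only for the median one.

def percentile(samples, p):
    """Nearest-rank percentile (p in 0-100) of a list of samples."""
    ordered = sorted(samples)
    rank = max(1, min(len(ordered), round(p / 100 * len(ordered))))
    return ordered[rank - 1]

def summarise(samples_ms):
    """Report the median, tail, and worst case of a latency sample set."""
    return {
        "p50": percentile(samples_ms, 50),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }
```

Comparing p50 against p99 and max is usually more revealing than the throughput number a vendor quotes.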

Features

Concurrency

How does the proxy handle multiple requests for the same URL? This is often critical in “reverse proxy” deployments, where a flood of requests can come in for the same thing if it gets suddenly popular, or when you first bring a cache online. If the response isn’t cached and fresh, that flood of requests can quickly overcome your back-end servers.

There are a few techniques for dealing with this. Collapsed forwarding will only allow one request for a URL to go forward at a time, if there isn’t anything in cache; if the response is cacheable, it will be sent to all waiting clients, saving those requests from going forward and swamping the origin server.
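The core of collapsed forwarding can be sketched in a few lines; this is an illustrative model, not how any particular proxy implements it, and `fetch_origin` is a hypothetical stand-in for the real forward path:

```python
# Sketch of collapsed forwarding: at most one forward request per URL
# at a time; clients that arrive while it's in flight wait and share
# the leader's response instead of hitting the origin themselves.
import threading

class CollapsedForwarder:
    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin   # hypothetical forward path
        self.lock = threading.Lock()
        self.in_flight = {}                # url -> (event, result holder)

    def get(self, url):
        with self.lock:
            pending = self.in_flight.get(url)
            if pending is None:
                event, holder = threading.Event(), {}
                self.in_flight[url] = (event, holder)
            else:
                event, holder = pending
        if pending is None:                # this caller is the leader
            try:
                holder["response"] = self.fetch_origin(url)
            finally:
                with self.lock:
                    del self.in_flight[url]
                event.set()
        else:
            event.wait()                   # follower: wait for the leader
        return holder.get("response")
```

With five concurrent requests for the same URL, the origin sees one fetch and all five clients get the same body.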

If something is cached but stale, stale-while-revalidate lets the cache serve the stale response while it refreshes what it has in the background. Not only does this save you from a flood of validation requests, but it also effectively hides the latency of refreshing your content from your clients, offering better quality of service.
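The decision logic behind stale-while-revalidate is simple to state: fresh hits are served directly, stale-but-within-window hits are served immediately while a background refresh runs, and anything older blocks. A minimal single-entry sketch (with a hypothetical `fetch_origin` callable; a real cache keys by URL and takes the window from Cache-Control, per RFC 5861):

```python
# Sketch of stale-while-revalidate for one cache entry.
import threading
import time

class SwrEntry:
    def __init__(self, fetch_origin, ttl, swr):
        self.fetch_origin = fetch_origin  # hypothetical forward fetch
        self.ttl = ttl                    # freshness lifetime, seconds
        self.swr = swr                    # stale-while-revalidate window
        self.body = None
        self.stored_at = 0.0
        self.refreshing = threading.Lock()

    def get(self):
        age = time.monotonic() - self.stored_at
        if self.body is not None and age <= self.ttl:
            return self.body                      # fresh hit
        if self.body is not None and age <= self.ttl + self.swr:
            stale = self.body                     # serve stale now...
            if self.refreshing.acquire(blocking=False):
                threading.Thread(target=self._refresh_locked,
                                 daemon=True).start()
            return stale                          # ...refresh in background
        with self.refreshing:                     # cold or too stale: block
            self.body = self.fetch_origin()
            self.stored_at = time.monotonic()
        return self.body

    def _refresh_locked(self):
        try:
            body = self.fetch_origin()
            self.body, self.stored_at = body, time.monotonic()
        finally:
            self.refreshing.release()
```

Note the lock around refreshing: it also collapses the revalidations themselves, so a burst of stale hits produces one background fetch, not many.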

ACLs

In my experience, one of the biggest things that gets a workout in a proxy/cache is the ACL system. Make sure you have maximum flexibility here; e.g., can you apply access control to something based on whether it’s a cache miss? Can an ACL select things by the request method, URL, headers, client address? Can you combine ACL directives? Can you extend the ACL system?
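To make this concrete, here is a sketch of the kind of flexibility I mean, in Squid’s configuration syntax (the ACL names and addresses are made-up examples):

```
# Hypothetical Squid ACL sketch; names and networks are examples only.
acl internal_nets src 10.0.0.0/8
acl safe_methods method GET HEAD
acl api_paths urlpath_regex ^/api/

# Combining ACLs: safe methods from anywhere, anything else internal-only
http_access allow safe_methods
http_access allow internal_nets
http_access deny all

# Access control keyed on cache misses: external clients can be served
# from cache, but only internal clients may drive requests to the origin
miss_access allow internal_nets
miss_access deny all
```

The interesting part is the last stanza: `miss_access` applies policy only when the request would go forward, which is exactly the “can you apply access control to a cache miss?” question above.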

Streaming and Buffering

A good proxy will offer fine-grained control over how it buffers requests and responses. For example, if you’re deploying as a reverse proxy, you want to be able to buffer up the entire response, so that you can free up resources on the origin server as quickly as possible if the client is slow. Likewise, buffering the request before sending it to the origin server can help conserve resources in some deployments, increasing capacity.

Conversely, however, it’s not good if your proxy requires responses to be buffered before they’re sent; this consumes too many resources on the proxy if you’re sending large responses, and doesn’t work at all for streaming applications (e.g., video).

Cache Behaviour Tuning

Although HTTP has excellent controls to allow both the origin server and the client to say how caches should behave, inevitably there will be cases where you’ll need to… ahem… fine-tune them. This includes tuning the heuristic freshness algorithm — what the cache does when a response carries no explicit freshness information.

It also includes overriding the specified behaviour. For example, a reverse proxy probably wants to ignore Cache-Control: no-cache, since the cache is under control of the origin server.

All of these tuning knobs need to be applicable in a fine-grained way; Squid does it with regular expressions against the URL (in refresh_patterns).
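For illustration, a couple of hedged `refresh_pattern` lines — the numbers are examples, not recommendations, and option names vary between Squid versions:

```
# Syntax: refresh_pattern [-i] regex min-minutes percent max-minutes [options]

# Images: heuristically fresh for at least a day, at most a week,
# ignoring client-driven reloads
refresh_pattern -i \.(jpg|jpeg|png|gif)$ 1440 50% 10080 ignore-reload

# Everything else: heuristic freshness of 20% of the object's age,
# capped at three days
refresh_pattern . 0 20% 4320
```

Patterns are matched in order, so the catch-all line goes last.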

Cache Configuration

The cache as a whole needs to be configurable as well.

For example, when the set of cached objects gets larger than the allocated memory or disk space, the cache needs to evict some. As a mountain of research will attest, some replacement policies are more efficient than others, especially under different workloads.
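The simplest of these policies, LRU, fits in a few lines; production caches typically use tuned variants (Squid, for instance, also offers heap-based LFUDA and GDSF policies):

```python
# Minimal LRU replacement sketch using an ordered dict.
from collections import OrderedDict

class LruCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)         # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```

The point of the research mentioned above is that “least recently used” is not always the right victim: for some workloads, evicting large or rarely-hit objects first keeps the hit rate higher.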

Resilience to Errors

Networked systems inevitably fail. Besides the obvious aspects of this (e.g., configurable timeouts ), in a cache it’s also important to handle failures as gracefully as possible, to preserve both quality of service and cache efficiency.

Stale-If-Error helps to hide temporary back-end problems by allowing a cache to use a stale cached response (if available) when it can’t get a fresh one, or if the server returns an error code like 500 Internal Server Error. In situations where something stale is better than nothing at all, this preserves quality of service.

Quick Abort works from the other side; when the client aborts (because of a network or software problem, or a simple timeout), the cache should be configurable to continue downloading the response from the server, so that the next client will have the benefit of finding it in cache.

Peering

Caches are often deployed in sets, both to increase capacity and also to assure availability. In these deployments, support for inter-cache protocols like ICP and HTCP means a better hit rate and, perhaps more importantly, the ability to bring a “cold” cache up-to-speed without overloading origin servers.

When evaluating support for peering, keep in mind that HTCP is more capable than ICP, because it takes into account the request headers, not just the URL. Also, HTCP CLR support means that something becoming invalid in one cache can trigger purges from neighbouring caches too (a pattern I’ll talk more about soon). Good implementations should also have a means of assuring that forwarding loops don’t happen.

Finally, Cache Digests are an interesting way to use a Bloom filter; by keeping a lossy digest of peers’ contents, it’s possible to predict whether a given request will be a hit. This is useful when the latency between peers makes “normal” inter-cache protocols too expensive (e.g., deployments between coasts or continents).
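The Bloom filter idea is worth a concrete sketch: a peer’s contents are summarised into a fixed-size bit array, and lookups answer either “definitely not cached” or “probably cached” (false positives are possible, false negatives are not). This is a generic illustration, not the actual Cache Digests wire format:

```python
# Minimal Bloom filter sketch of the idea behind Cache Digests.
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive num_hashes independent bit positions from one hash family
        for i in range(self.num_hashes):
            digest = hashlib.sha256(("%d:%s" % (i, key)).encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

A false positive just means one wasted peer request; a digest small enough to exchange periodically is a good trade against per-request ICP/HTCP round trips.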

Routing

Proxies often get used as layer 7 routers; usually, to shift traffic around to the right server, for some value of “right.” A good proxy will have a number of tools to help you do this, including active and passive monitoring of peers and origin servers (to determine health and availability), flexible request rewriting (including both the request URI and response Location headers), and controls over how many connections can go to a particular server, as well as how many idle connections to keep open to each server.

Another form of routing is CARP, which routes based upon a consistent hashing algorithm — like DHTs. This allows you to build a massive array of caches to serve a very large working set (e.g., photos, a CDN).
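The hashing scheme CARP uses is of the “highest random weight” (rendezvous) family: every URL is scored against every cache, and the highest score wins, so removing one cache only remaps the URLs that belonged to it. A generic sketch of the idea (not CARP’s exact hash function):

```python
# Rendezvous ("highest random weight") hashing sketch, the idea
# underlying CARP-style request routing.
import hashlib

def score(cache_name, url):
    """Deterministic pseudo-random score for a (cache, url) pair."""
    h = hashlib.sha256(("%s|%s" % (cache_name, url)).encode()).digest()
    return int.from_bytes(h[:8], "big")

def route(url, caches):
    """Pick the cache with the highest score for this URL."""
    return max(caches, key=lambda c: score(c, url))
```

The minimal-disruption property falls straight out of the definition: if a cache that wasn’t the winner for a URL disappears, the winner (and therefore the routing) for that URL is unchanged.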

One thing that often goes hand in hand with routing is retries — i.e., being able to try a different origin server (or IP address, or peer) if you can’t get a successful answer on the first try (if allowed by the protocol; this makes sense for GET, not POST, obviously).
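The safety rule here can be sketched directly — retry across servers only for idempotent methods. `fetch` below is a hypothetical callable returning a status code; a real proxy would also distinguish connection failures from HTTP-level errors:

```python
# Sketch: failover across origin servers, retrying idempotent methods only.
IDEMPOTENT = {"GET", "HEAD", "OPTIONS"}

def fetch_with_retries(method, url, servers, fetch):
    last_status = None
    for server in servers:
        status = fetch(server, url)     # hypothetical forward attempt
        if status < 500:
            return server, status       # success (or a client error)
        last_status = status
        if method not in IDEMPOTENT:
            break                       # never replay a non-idempotent request
    return None, last_status
```

A POST that fails still gets its one attempt; it just isn’t replayed, because the first attempt may have had effects on the origin.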

Getting the Standards Right

Really, this isn’t a feature, it’s a floor to entry. If you’re going to use a proxy/cache, you have to be sure that it’s going to behave in a predictable, interoperable way, and that means conforming to HTTP/1.1, SSL/TLS and all of the other applicable standards.

In the case of HTTP, this means not taking shortcuts; for example, variant caching is hard, but it’s necessary to have it for a cache to be useful. A great tool to help evaluate this is Co-Advisor.

Manageability

Stability

A proxy is worthless if it goes down all of the time, or if you’re worried that it will. Part of this is how mature it is, and part is how well it’s been tested. One of the reasons I like Squid is that it’s used in thousands (if not tens of thousands) of applications around the world; it’s been around for more than a decade, so it’s been hammered on hard.

Because of this breadth of deployment, I can confidently use it in a new (to me) situation, knowing that it’s probably been used in that way before. Contrast this with software that’s been designed for a particular purpose and hasn’t been used outside that narrow profile very much.

Metrics

Managing a cache means knowing what it’s doing, and what went wrong if you have a problem. A good implementation should have extremely extensive metrics available, ideally in many forms (e.g., over HTTP, SNMP, in logs), as well as easy-to-use debugging mechanisms, because at the end of the day all of these platforms are really complex beasts.

Ease of Use

Finally, caches have to be intuitive to use. Typically, they’re designed for a sysadmin or a netadmin, not a developer, and I think this is a shame, because these days developers should be a primary audience.