This piece is provoked by Ryan Daigle’s What’s New in Edge Rails: Simpler Conditional Get Support (ETags). I think it’s an important subject. I realize that many readers here understand what ETags are and why they matter, and will see right away how the API Ryan describes fits into the picture. If you don’t, and you build Web apps, it’s probably worth reading this and following some links.

Caching in General · If, every time a browser processed a URI beginning http://..., it actually fired off a GET at a server, and if, every time a GET hit a server, the server actually recomputed and sent the data, the Web would melt down PDQ. There is a lot of machinery available, on both the client and server side, to detect when the work of computing results and sending them over the network can be avoided. Normally, we use the term “caching” to refer to all this stuff.

There are browser caching, expiration dates, and cache-control headers, all of which are worth implementing but which I’m not going to cover here. And on the server side, there are a variety of caching tools, of which memcached is the best-known example.

If You Must Ask · Even with all the caching, there are lots of occasions when a Web client has to fire off that GET, and it gets through to your server-side application code. But that doesn’t necessarily mean you have to compute and transmit. If the server discovers that whatever the URI identifies hasn’t changed since that client last fetched it, the server can send an (essentially) one-line response labeled with the HTTP Status Code “304 Not Modified”. This saves network bandwidth, because you don’t retransmit the whole resource representation, and if it’s done cleverly, it may save a whole bunch of computation and I/O for the server.

Time-stamping · The most obvious way to accomplish this is for the client to send an HTTP If-Modified-Since header containing a date, normally the Last-Modified value the server sent with the previous response. This works just fine, particularly for a resource which is a static file (in fact, popular Web servers have this built right in). But sometimes a single time-stamp isn’t enough information for a server to figure out whether the client needs a fresh copy of whatever it’s asking for.
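To make the time-stamp check concrete, here is a minimal sketch in Python, using only the standard library. The function name not_modified_since and its arguments are made up for illustration; a real server or framework will have its own version of this.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def not_modified_since(if_modified_since, last_modified):
    """Return True when a 304 is appropriate for a timestamp-only check.

    if_modified_since: raw If-Modified-Since header value, or None.
    last_modified: timezone-aware datetime of the resource's last change.
    (Both names are hypothetical, chosen for this sketch.)
    """
    if not if_modified_since:
        return False
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable header: just send the full response
    if client_time.tzinfo is None:
        return False  # no usable zone information: treat as invalid
    # HTTP dates have one-second resolution, so drop the microseconds.
    return last_modified.replace(microsecond=0) <= client_time


# Usage: the server sends Last-Modified, the client echoes it back later.
changed = datetime(2008, 8, 14, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(changed, usegmt=True)
print(not_modified_since(header, changed))
```

Note that the comparison knows about exactly one date, which is where it runs out of road when a page depends on several things that change independently.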

ETags are for this situation. The way it works is, when a client sends you a GET, along with the result you send back an HTTP header like so:

ETag: "1cc044-172-3d9aee80"

Whatever goes between the quotation marks is a signature for the current state of the resource that’s been requested. Here’s an example: Suppose you’ve got some sort of social-networking Web app, and a user asks to see her profile page. The way the page looks depends on a few things:

1. Who the user is.
2. Whether the app has been updated (i.e. new templates, stylesheets) since the last fetch.
3. Whether the profile has been updated since the last fetch.

The first you know, and it shouldn’t be tough to make the second available to application code; you could have an app-version global, or store it in your database, or just have a file somewhere that gets updated so you can check its timestamp. As for the third, this requires that you have a version number or update-timestamp field associated with the user profile, which you probably already do.

So what you do is turn those three things into a signature (probably by concatenating them and hashing the string) and send the ETag header along with the profile page.
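As a rough sketch of the concatenate-and-hash step, in Python with only the standard library: the function name make_etag, the ":" separator, and the choice of SHA-1 are all assumptions made for illustration, not anything a particular framework prescribes.

```python
import hashlib


def make_etag(user_id, app_version, profile_version):
    """Build a quoted ETag value from the three inputs that shape the page.

    All three parameter names are hypothetical. The separator keeps
    ("1", "23") from colliding with ("12", "3"); this assumes the values
    themselves never contain ":".
    """
    raw = f"{user_id}:{app_version}:{profile_version}"
    digest = hashlib.sha1(raw.encode("utf-8")).hexdigest()
    # The ETag header value includes the surrounding double quotes.
    return f'"{digest}"'


# Usage: same inputs give the same tag; any change gives a new one.
print(make_etag(42, "2008-08-14", 7) == make_etag(42, "2008-08-14", 7))
print(make_etag(42, "2008-08-14", 7) == make_etag(42, "2008-08-14", 8))
```

The hash is only there to produce a short opaque string; nothing here needs it to be cryptographically strong.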

Then, when the client wants to look at the profile page again, it sends an HTTP header along with the request like so:

If-None-Match: "1cc044-172-3d9aee80"

When you see this, you have a quick glance at the user id, app version, and profile version, recompute the signature, and if it matches, you just send back a 304 Not Modified and your job is done. (The header is called If-None-Match because the client can send a bunch of different ETags along, but I’ve never seen anyone do that.)

In many cases, this is going to be a lot less computing than fetching the profile information out of the database tables and re-running the template to create the HTML you were going to send along.
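Here is a minimal sketch of that server-side decision, again in standard-library Python. The names etag_matches and respond, and the (status, headers, body) return shape, are invented for this example; the point is only that the expensive render step is skipped when the tag matches.

```python
def _opaque(tag):
    """Strip a weak-validator prefix so W/"x" compares equal to "x"."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match, current_etag):
    """Return True if the If-None-Match header lists the current ETag.

    Simplification: splits on commas, so it would mishandle an ETag
    that itself contains a comma.
    """
    if not if_none_match:
        return False
    if if_none_match.strip() == "*":
        return True  # "*" matches any current representation
    current = _opaque(current_etag)
    return any(_opaque(c) == current for c in if_none_match.split(","))


def respond(if_none_match, current_etag, render):
    """Return (status, headers, body); call render() only when needed.

    render is a hypothetical zero-argument callable that does the
    database fetch and templating.
    """
    headers = {"ETag": current_etag}
    if etag_matches(if_none_match, current_etag):
        return 304, headers, b""
    return 200, headers, render()


# Usage: a matching tag short-circuits before any page-building happens.
tag = '"1cc044-172-3d9aee80"'
status, _, body = respond(tag, tag, lambda: b"<html>profile</html>")
print(status, len(body))
```

The saving depends on computing the current ETag being cheap compared with render(); if working out the tag needs the same queries as building the page, there is little left to gain beyond bandwidth.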

When This Matters · This matters if your Web app is maxed on some combination of CPU and database, and a noticeable proportion of requests don’t really need a page-rebuild, and your existing caching and last-modified setup isn’t getting the job done. This isn’t going to be true of all Web apps, nor even of all Web apps that are suffering from overload. But my feeling, on surveying the landscape, is that there are a lot of apps out there where smart ETagging could cut the CPU load and database traffic down by a few percentage points, and those percentage points are damn precious in a server that’s breathing hard in public.

This is particularly likely to be true if your Web app is built on a language that isn’t the world’s fastest (like Rails), and has an elaborate, complex object-relational mapper (like Rails), and was built in a big hurry to meet a perceived need, without much pre-optimization (like a lot of Rails apps).

I’m impressed by the response.etag and request.fresh? API Ryan Daigle describes; it’s typically elegant in the Rails style: “Tell us what matters and we’ll do the housekeeping”.

I’m sure that other Web frameworks offer similar tools; perhaps readers might contribute pointers below?

Further Reading · The trade-offs around this, like the trade-offs around everything having to do with Web-app performance, are complicated. I wrote about this subject before (essentially requesting exactly what Edge Rails now has) in On Being for the Web. The comments to that piece are erudite and instructive, and link to lots of valuable primary materials: