One of the problems we used to see frequently on Green Felt happened when we’d update a Javascript API: We’d add some parameters to some library function and then update some other files so that they called the function with the new parameters. But when we’d push the changes to the site we’d end up with a few users that somehow had the old version of one of the files stuck in their cache. Now their browser is causing old code to call new code or new code to call old code and the site doesn’t work for them. We’d then have to explain how to reset their cache (and of course every browser has different instructions) and hope that, if they didn’t write back, everything went OK.

This fragility annoyed us and so we came up with a solution:

We replaced all of our <script> tags with calls to a custom “script” function in our HTML template system (we use Template::Toolkit; [% script("sht.js") %] is what the new calls look like).

The “script” function is a native perl function that does a number of things:

1. Reads the Javascript file into memory. While reading it understands C style “#include” directives so we can structure the code nicely (though we actually don’t take advantage of that yet).
2. Uses JavaScript::Minifier::XS to minify the resulting code.
3. Calculates the SHA hash of the minified code.
4. Saves the minified code to a cache directory where it is named based on its hash value, which makes the name globally unique (it also keeps its original name as a prefix so debugging is sane).
5. Keeps track of the original script name, the minified script’s globally unique name, and the dependencies used to build the minified file. This is stored in a hash table and also saved to the disk for future runs.
6. Returns a script tag that refers to the globally unique Javascript file back to the template, which ends up going out in the html file. For example, <script src="js/sht-bfe39ec2e457bd091cb6b680873c4a90.js" type="text/javascript"></script>
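The steps above can be sketched roughly as follows. This is an illustration in Python, not the Perl we actually run. The names script_tag, expand_includes and minify, the exact #include "file" syntax, the manifest.json file name, and the choice of SHA-1 are all assumptions made for the sketch, and minify is only a trivial stand-in for JavaScript::Minifier::XS.

```python
import hashlib
import json
import re
from pathlib import Path

# Assumed directive syntax: a line of the form  #include "other.js"
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def expand_includes(path, deps):
    """Step 1: read a file, inlining any #include lines, and record every file read."""
    path = Path(path)
    deps.append(str(path))
    out = []
    for line in path.read_text().splitlines():
        m = INCLUDE_RE.match(line)
        if m:
            # Included files are resolved relative to the including file.
            out.append(expand_includes(path.parent / m.group(1), deps))
        else:
            out.append(line)
    return "\n".join(out)


def minify(source):
    """Step 2 stand-in: only drops blank lines; a real minifier does far more."""
    return "\n".join(line for line in source.splitlines() if line.strip())


def script_tag(name, src_dir, cache_dir, manifest):
    """Build the hashed file for `name` and return the tag that refers to it."""
    deps = []
    code = minify(expand_includes(Path(src_dir) / name, deps))
    digest = hashlib.sha1(code.encode("utf-8")).hexdigest()  # step 3
    hashed = f"{Path(name).stem}-{digest}.js"  # original name kept as a prefix
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    (cache / hashed).write_text(code)  # step 4
    manifest[name] = {"hashed": hashed, "deps": deps}  # step 5
    (cache / "manifest.json").write_text(json.dumps(manifest))
    # Step 6: the template emits this tag in place of the original one.
    return f'<script src="js/{hashed}" type="text/javascript"></script>'
```

Calling script_tag("sht.js", "src", "js", {}) would write js/sht-<digest>.js and return a tag pointing at it. The sketch rebuilds on every call; skipping the work when nothing has changed is a separate check.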

There’s actually a step 0 in there too. If the original Javascript file name is found in the hash table then the function quickly stats its saved dependencies to see if any of them are newer than the saved minified file. If the minified file is up to date then steps 1 through 5 are skipped.
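Here is a minimal sketch of that step 0 check, again in Python rather than our Perl. The manifest shape (a dict mapping the original name to an entry with "hashed" and "deps" keys) and the function names are assumptions for illustration. A missing file is treated as having timestamp 0, so a deleted minified file always counts as stale.

```python
import os


def max_timestamp(paths):
    """Newest mtime among paths; a file that cannot be stat'ed counts as 0."""
    def mtime(p):
        try:
            return os.stat(p).st_mtime
        except OSError:
            return 0
    return max((mtime(p) for p in paths), default=0)


def is_up_to_date(name, manifest, cache_dir):
    """Step 0: True if the minified file exists and no dependency is newer than it."""
    entry = manifest.get(name)
    if entry is None:
        return False  # never built, so there is nothing to reuse
    built = max_timestamp([os.path.join(cache_dir, entry["hashed"])])
    return built > 0 and built >= max_timestamp(entry["deps"])
```

When is_up_to_date returns True the only remaining work is to emit the script tag for the already known hashed name.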

The advantages of this approach

It solves the original problem.

When the user refreshes the page they will either get the page from their browser cache or they will get it from our site. No matter where it came from, the Javascript files it references are now uniquely named, so it is impossible for the files to be out of sync with each other.

That is, if you get the old html file you will reference all the old named Javascript files and everything will be mutually consistent (even though it is out of date). If you get the new html file it guarantees you will have to fetch the latest Javascript files because the new html only references the new hashed names that aren’t going to be in your browser cache.
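The property doing the work here is that the name is a pure function of the contents. A tiny Python illustration (hashed_name is a hypothetical helper, SHA-1 is an assumed choice of hash, and the two code strings are made up):

```python
import hashlib


def hashed_name(stem, code):
    """Name a script after the hash of its contents."""
    digest = hashlib.sha1(code.encode("utf-8")).hexdigest()
    return f"{stem}-{digest}.js"


old = hashed_name("sht", "function deal(n) {}")
new = hashed_name("sht", "function deal(n, seed) {}")

# Changed code gets a new URL, so a cached copy of the old file can never be served for it.
assert old != new
# Unchanged code keeps its URL, so existing browser caches stay valid.
assert old == hashed_name("sht", "function deal(n) {}")
```

An old html file keeps pointing at the old name and a new html file points at the new one, which is why the two sets of scripts never mix.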

It’s fast.

Everything is cached so it only does the minification and hash calculations once per file. We’re running FastCGI, so the in-memory cache persists across HTTP requests. More importantly, the js/ dir is statically served by the web server so it’s exactly as fast as it was before we did this (since we served the .js files without any preprocessing). All this technique adds is a couple of filesystem stats per page load, which isn’t much.

It’s automatic.

There’s no script to remember to run when we update the site. We just push our changes up to the site using our version control and the script lazily takes care of rebuilding any files that may have gone out of date.

So you might be thinking, isn’t all that dependency stuff hard and error prone? Well, it’s really only one line of perl code:

use List::Util qw(max);
sub max_timestamp(@) { max map { (stat $_)[9] || 0 } @_ } # (stat)[9] is mtime; a missing file counts as 0

It’s stateless.

It doesn’t rely on incrementing numbers (“js/v10/script.js” or even “js/script-v10.js”). We considered this approach but decided it was actually harder to implement and had no advantages over the way we chose to do it. This may have been colored by our chosen version control system (darcs) where monotonically increasing version numbers have no meaning.

It allows aggressive caching.

Since the files are named by their contents’ hash, a given name can never refer to different contents, so you can set the cache lifetime to be practically infinite.
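As a rough sketch of the caching rule, written in Python only to show the logic (in practice this would live in the web server’s configuration for the statically served js/ dir). The function name, the one-year max-age, and the pattern used to recognize hashed names are assumptions, not what our server is actually configured with.

```python
import re

# Assumed naming pattern: "<name>-<32 to 64 hex digits>.js"
HASHED_JS = re.compile(r"-[0-9a-f]{32,64}\.js$")


def cache_headers(url_path):
    """Pick a Cache-Control header based on whether the file name embeds a content hash."""
    if HASHED_JS.search(url_path):
        # The contents can never change under this name, so caches may keep it for a year.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    # HTML and other unhashed files must be revalidated so new script names get picked up.
    return {"Cache-Control": "no-cache"}
```

The html pages are the one thing that must not be cached this aggressively, since they are what carry the new hashed names to the browser.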

It’s very simple to understand.

It took less than a page of perl code to implement the whole thing and it worked the first time with no bugs. I believe it’s taken me longer to write this blog post than it took to write the code (granted I’d been thinking about it for a long time before I started coding).

No files are deleted.

The old js files are not automatically deleted (why bother, they are tiny) so people with extremely old html files will not have inconsistent pages when they reload. However:

The js/ dir is volatile.

It’s written so we can rm js/* at any point and it will just recreate what it needs to on the next request. This means there’s nothing to do when you unpack the source into a new directory while developing.

You get a bit of history.

Do a quick ls -lrt of the directory and you can see which scripts have been updated recently and in what order they got built.

What it doesn’t solve

While it does solve the problem of Javascript to Javascript API interaction, it does not help with Javascript to server API interaction–it doesn’t even attempt to solve that issue. The only way I know to solve that is to carefully craft the new APIs in parallel with the old ones so that there is a period of time where both old and new can work while the browser caches slowly catch up with your new world.
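To make the “in parallel” idea concrete, here is a small hypothetical sketch in Python. The endpoint names, parameters, and handlers are invented for illustration and are not Green Felt’s actual server API. The point is only that the old entry stays registered next to the new one until stale clients have gone away.

```python
def deal_v1(params):
    """Old request shape: only a game number, so the seed falls back to a default."""
    return {"game": params["game"], "seed": 0}


def deal_v2(params):
    """New request shape: a game number plus an explicit seed."""
    return {"game": params["game"], "seed": params["seed"]}


# Both versions stay available; "deal" is removed only once old cached pages have aged out.
HANDLERS = {"deal": deal_v1, "deal2": deal_v2}


def dispatch(endpoint, params):
    """Route a request to whichever version of the API the calling script was written for."""
    return HANDLERS[endpoint](params)
```

Old cached Javascript keeps calling "deal" and new Javascript calls "deal2", so neither breaks during the transition.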

And… It seems to work

I’ve seen similar schemes discussed but I’ve not seen exactly what we ended up with. It’s been working well for us–I don’t think I’ve seen a single bug from a user in a couple months that is caused by inconsistent caching of Javascript files by the browser.
