It's About The Hashbangs

28 MAY

Before I get started, here’s the disclaimer: the opinions expressed in this rant are my own personal opinions on web development and do not represent the views of my employer, its engineering organisation or any other employees.

A few months back there was a flurry of blog posts and conversations over Twitter both for and against the now fairly common practice of using hashbang URLs (example) and JavaScript routing in place of traditional URLs and full page loads. There is also growing interest in several JavaScript MVC frameworks that make heavy use of this technique. Since people started doing this kind of thing I’ve been pretty squeamish about the idea. At the time this discussion erupted across the web I really wanted to comment on it, but until recently, although I was almost certain that hashbang URLs were destructive, I found myself unable to say in definite terms why.

As you probably know if you’ve been reading this blog for a while, I have long been an avid proponent of progressive enhancement. As many people correctly pointed out, many of the arguments against hashbang URLs folded this philosophy in, which clouded the issue quite a lot. In a well-reasoned post, my colleague Ben Cherry pointed this out and argued that hashbangs weren’t really the problem and that they were merely a temporary workaround until we get pushState support. As he put it, “It’s Not About The Hashbang”.

After quite a lot of thought and some attention to the issues that surround web apps that use hashbang URLs, I’ve come to the conclusion that it most definitely is about the hashbangs. This technique, on its own, is destructive to the web. The implementation is inappropriate, even as a temporary measure or as a downgrade experience.

Let me explain.

URLs are important. The reason the web is so powerful is that it is a web of information. Any piece of content can reference any other piece of content directly. Our information is no longer siloed into various disconnected libraries; now all our data is linked together. The web is much better at doing this than at being a platform for delivering applications but yeah, that’s a whole other blog post. The means by which one piece of data is linked with another piece of data is a URL. That makes the URL possibly the most important part of the web. If you are working on a web app I assume you value its content. If you value the content that a web app holds then you need to value its URLs even more. Directly addressable content is what makes web apps better than desktop apps. It’s certainly not the UIs.

URLs are forever. The web has a pretty long memory. Techniques and technology may change but content published to the web gets indexed, archived and otherwise preserved, as do the URLs that it links to. There’s no such thing as a temporary fix when it comes to URLs. If you introduce a change to your URL scheme you are stuck with it for the foreseeable future. You may internally change your links to fit your new URL scheme but you have no control over the rest of the web that links to your content.

Cool URLs don’t change. For this and other reasons, Tim Berners-Lee wrote a classic article, Cool URLs don’t change, in which he explains how to make future-proof URLs and why that is important. If you change your URLs you sever links from the rest of the web. You’ve just turned your web app into a data silo. Your content has just become a lot less useful. However, as much as we try, it’s pretty much impossible not to introduce change from time to time: sometimes data does need to be deleted, sometimes you need to move to a new domain name, sometimes you just need to reorganise.

Luckily, HTTP gives us the tools to handle this gracefully. If content is deleted we can tell the web it’s no longer there with a 410 (thanks Nick!). If it’s moved to a different place on the web we can tell the world its new location with a 301 for a permanent move or a 302 for a temporary one. HTTP gives us the ability to manage change. On top of that, it’s years old, fairly well specified and, most importantly, understood not just by browsers but by all devices that can access the web, including search engines and other spiders.
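To make that concrete, here is a rough sketch of the decision a server gets to make when it can see the whole URL. The route table, the paths in it and the `answer` function are all made up for illustration; a real app would hook this into whatever server framework it uses. The 404 for unknown paths is the usual "not found" code, which sits alongside the codes mentioned above.

```typescript
// Hypothetical route table. Every path here is invented for the example.
type Route =
  | { kind: "live" }
  | { kind: "moved"; to: string; permanent: boolean }
  | { kind: "gone" };

const routes: Record<string, Route> = {
  "/": { kind: "live" },
  "/2011/new-post": { kind: "live" },
  "/old-post": { kind: "moved", to: "/2011/new-post", permanent: true },
  "/sale": { kind: "moved", to: "/sale/summer", permanent: false },
  "/deleted-post": { kind: "gone" },
};

interface HttpAnswer {
  status: number;
  location?: string; // only set for redirects
}

// Decide which status code (and Location header) to send for a request path.
function answer(path: string): HttpAnswer {
  const known = Object.prototype.hasOwnProperty.call(routes, path);
  if (!known) return { status: 404 }; // never heard of it
  const route = routes[path];
  switch (route.kind) {
    case "live":
      return { status: 200 };
    case "moved":
      // 301 for a permanent move, 302 for a temporary one
      return { status: route.permanent ? 301 : 302, location: route.to };
    case "gone":
      return { status: 410 }; // deliberately deleted
  }
}
```

The point is that every branch is visible to anything that speaks HTTP, with no JavaScript runtime required.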

Going under the radar. So, you’ve implemented hashbang URLs. This means that the part of the URL after the #, the part that identifies the specific content, is not even sent in the HTTP request. It’s completely invisible to your server. As far as your server is concerned it’s receiving requests for the root document and sending it with a 200 success code no matter what. It no longer has the ability to determine if the URL has moved to a different location or even if the content being requested exists at all. This entire job is left up to some JavaScript that happens to be running on that page. Sure, your JavaScript can examine the hash portion of the URL and show the relevant content or, if it’s missing, show a ‘Content not found’ message. It can even redirect to different locations internal and external to the web app.
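If that sounds abstract, this little sketch shows the split. The helper name `splitAtFragment` and the example.com addresses are mine, purely for illustration. It only models the rule that everything from the # onwards stays in the browser, it is not a full URL parser.

```typescript
// Split an address into the part that goes over the wire and the part
// that never leaves the browser. Hypothetical helper, for illustration only.
function splitAtFragment(address: string): { visibleToServer: string; clientOnly: string } {
  const hashIndex = address.indexOf("#");
  if (hashIndex === -1) {
    return { visibleToServer: address, clientOnly: "" };
  }
  return {
    visibleToServer: address.slice(0, hashIndex),
    clientOnly: address.slice(hashIndex),
  };
}

// A hashbang URL: the server only ever sees the root document.
const hashbang = splitAtFragment("https://example.com/#!/user/ben");
// hashbang.visibleToServer is "https://example.com/"
// hashbang.clientOnly is "#!/user/ben"

// A traditional URL: the server sees exactly which content was asked for.
const traditional = splitAtFragment("https://example.com/user/ben");
// traditional.visibleToServer is "https://example.com/user/ben"
// traditional.clientOnly is ""
```

Every hashbang URL on the site collapses to the same request, which is why the server can only ever answer 200 for the root document.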

The important difference is that this is entirely opaque to anything that hasn’t got a JavaScript runtime and a document object model. Spiders and search indexers can and do sometimes implement JavaScript runtimes. However, even in this case there’s no well-recognised way to say ‘this is a redirect’ or ‘this content is not found’ in a way that non-humans will understand. You’ve just rendered your content invisible to everything apart from people running certain browsers. The hashbang itself is an attempt by Google to address this, but it’s quite a painful thing to implement, and why get yourself into a situation where you are creating a fix for something you just broke? Just don’t break it in the first place.

Once you hashbang, you can’t go back. This is probably the stickiest issue. Ben’s post put forward the point that when pushState is more widely adopted we can leave hashbangs behind and return to traditional URLs. Well, fact is, you can’t. Earlier I stated that URLs are forever: they get indexed and archived and generally kept around. To add to that, cool URLs don’t change. We don’t want to disconnect ourselves from all the valuable links to our content. If you’ve implemented hashbang URLs at any point and then want to change them without breaking links, the only way you can do it is by running some JavaScript on the root document of your domain. Forever. It’s in no way temporary; you are stuck with it.
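Here is roughly what that permanent bit of JavaScript looks like. It is a sketch under assumptions: I’m assuming old links look like `/#!/user/ben` and should land on `/user/ben`, and `legacyHashbangPath` is a name I’ve invented. Your own mapping from old hashbang URLs to new paths may well be messier.

```typescript
// Translate the hash of an old hashbang link into a traditional path.
// Returns null when the hash is not a hashbang and no redirect is needed.
// The "#!/path" to "/path" mapping is an assumption made for this example.
function legacyHashbangPath(hash: string): string | null {
  if (hash.slice(0, 2) !== "#!") return null; // ordinary fragment or no hash
  const rest = hash.slice(2);
  if (rest === "") return null; // bare "#!" points at nothing
  return rest.charAt(0) === "/" ? rest : "/" + rest;
}

// In the browser, the root document would have to run something like this
// on every load, for as long as old links exist anywhere on the web:
//
//   const target = legacyHashbangPath(window.location.hash);
//   if (target !== null) window.location.replace(target);
```

Note that this redirect only happens for clients that execute the script. Anything else still gets a 200 and the root document.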

It’s not all doom and gloom. For the web apps that have made the jump already it’s too late, but I urge you to think really hard about making the jump to hashbang URLs when creating new content or considering a switch from traditional URLs. There is a path forward in the not too distant future. pushState is coming to browsers at quite a rate and, as Kyle Neath said to me in a bar last week, is probably the most important innovation in web development since Firebug. You can implement, as GitHub have done, pushState for browsers that support it, but by all means fall back to traditional URLs rather than hashbang URLs. Even if only some users are getting hashbang URLs, they will be publishing content linking to them, tweeting them and bookmarking them, and you’ll be stuck with supporting them all the same.
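The fallback I’m describing can be sketched like this. To be clear, this is my own minimal sketch and not how GitHub or anyone else actually does it. The `Env` shape, `navigate`, `render` and `loadPage` are all hypothetical names, and the browser bits are passed in so the logic is easy to follow on its own.

```typescript
// The slice of the browser this sketch depends on. All names here are
// hypothetical; in a real page `history` would be window.history.
interface HistoryLike {
  pushState?: (state: unknown, title: string, url: string) => void;
}

interface Env {
  history: HistoryLike;
  render: (url: string) => void; // swap in new content with JavaScript
  loadPage: (url: string) => void; // plain old full page load
}

// Use pushState where it exists; otherwise do a normal page load.
// Either way the address is a traditional URL, never a hashbang.
function navigate(env: Env, url: string): "pushState" | "full-page-load" {
  const history = env.history;
  if (typeof history.pushState === "function") {
    history.pushState(null, "", url);
    env.render(url);
    return "pushState";
  }
  env.loadPage(url);
  return "full-page-load";
}

// Browser wiring might look like this (renderContent is assumed to exist):
//
//   navigate({
//     history: window.history,
//     render: renderContent,
//     loadPage: (url) => window.location.assign(url),
//   }, "/user/ben");
```

Both branches end up at the same `/user/ben` address, so whatever gets bookmarked or tweeted is a URL the server can answer for properly.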

Can we all agree to let it go the way of Flash intros, please?