joshua schachter's blog

on url shorteners

URL shortening services have been around for a number of years. Their original purpose was to prevent cumbersome URLs from getting fragmented by broken email clients that felt the need to wrap everything to an 80 column screen. Addendum: They're useful in print, too. But it's 2009 now, and this problem no longer exists. Instead it's been replaced by the SMS-oriented 140 character constraints of sites like Twitter. (Let's leave aside the fact that any phone that can run a web browser and thus follow links can also run a proper client, and doesn't have to hew to the SMS character limit.) Since TinyURL, there has been a rapid proliferation of shortening services.

Aside from the raw utility of allowing URLs to fit within a Twitter message, newer services add several interesting bits of functionality. The most important of these is that they let the linker turn any link into THEIR link, and view metrics on how far it has spread and how many clicks it has gotten. Showing a user how popular his actions are is inevitably addictive. Shorteners are relatively easy and lightweight to set up. Adding a simple interstitial before the redirect provides an obvious way to monetize. And maybe someday all the link data will be worth something.

So there are clear benefits for both the service (low cost of entry, potentially easy profit) and the linker (the quick rush of popularity). But URL shorteners are bad for the rest of us.

The worst problem is that shortening services add another layer of indirection to an already creaky system. A regular hyperlink implicates a browser, its DNS resolver, the publisher's DNS server, and the publisher's website. With a shortening service, you're adding something that acts like a third DNS resolver, except one that is assembled out of unvetted PHP and MySQL, without the benevolent oversight of luminaries like Dan Kaminsky and St. Postel.

There are three other parties in the ecosystem of a link: the publisher (the site the link points to), the transit (places where that shortened link is used, such as Twitter or Typepad), and the clicker (the person who ultimately follows the shortened links). Each is harmed to some extent by URL shortening.

The transit's main problem with these systems is that a link that used to be transparent is now opaque and requires a lookup operation. From my past experience with Delicious, I know that a huge proportion of shortened links are just a disguise for spam, so examining the expanded URL is a necessary step. The transit has to hit every shortened link to get at the underlying link and hope that it doesn't get throttled. It also has to log and store every redirect it ever sees.
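As a rough sketch of what that lookup involves (Python, not any particular site's implementation; `expand`, `fetch_location`, and `head_location` are hypothetical names): follow each redirect one hop at a time, keep the whole chain so it can be logged, and give up on loops or overly long chains.

```python
import http.client
from typing import Callable, List, Optional
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)

def head_location(url: str) -> Optional[str]:
    """Issue one HEAD request; return the Location header if it redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=5)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        if resp.status in REDIRECT_CODES:
            return resp.getheader("Location")
        return None
    finally:
        conn.close()

def expand(url: str,
           fetch_location: Callable[[str], Optional[str]] = head_location,
           max_hops: int = 5) -> List[str]:
    """Return every URL visited, starting with the shortened one."""
    chain = [url]
    while len(chain) <= max_hops:
        location = fetch_location(chain[-1])
        if location is None:
            break  # not a redirect: this is the final destination
        next_url = urljoin(chain[-1], location)  # Location may be relative
        if next_url in chain:
            break  # redirect loop
        chain.append(next_url)
    return chain
```

The fetcher is passed in as a parameter so the chain-walking logic can be exercised without touching the network. A real transit would also need to cope with throttling, which this sketch ignores.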

The publisher's problems are milder. It's possible that the redirection step steals search juice; I don't know how search engines handle these kinds of redirects. It certainly makes it harder to track down links to the published site if the publisher ever needs to reach their authors. And the publisher may lose information about the source of its traffic.

But the biggest burden falls on the clicker, the person who follows the links. The extra layer of indirection slows down browsing with additional DNS lookups and server hits. A new and potentially unreliable middleman now sits between the link and its destination. And the long-term archivability of the hyperlink now depends on the health of a third party. The shortener may decide a link is a Terms Of Service violation and delete it. If the shortener accidentally erases a database, forgets to renew its domain, or just disappears, the link will break. If a top-level domain changes its policy on commercial use, the link will break. If the shortener gets hacked, every link becomes a potential phishing attack.

There are usability issues as well. The clicker can't even tell by hovering where a link will take them, which is bad form. Some sites offer link previews, but there's no way to make a preview preference stick globally across the many shortening services. And just like ad networks, link shortening services could track a user's behavior across many domains. That makes the paranoid among us uncomfortable. We hope the shortener never decides to add interstitials or otherwise "monetize" the link with ads, but we have no guarantee.

For these reasons, I feel that shorteners are bad for the ecosystem as a whole. But what can be done to improve the situation?

One important conclusion is that services that provide transit (or at least those that require a shortening service) should log all redirects, in case the shortening services disappear. If the data is as important as everyone seems to think, they should own it. And websites that generate very long URLs, such as map sites, could provide their own shortening services. Or, better yet, take steps to keep the URLs from growing monstrous in the first place.
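A redirect log does not need to be elaborate. Here is a minimal sketch in Python using the standard `sqlite3` module; the table layout and the function names (`open_log`, `record`, `lookup`) are my assumptions for illustration, not a description of how any existing site stores this.

```python
import sqlite3
import time
from typing import Optional

def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the redirect log."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS redirects ("
        " short_url TEXT PRIMARY KEY,"
        " long_url TEXT NOT NULL,"
        " first_seen REAL NOT NULL)"
    )
    return db

def record(db: sqlite3.Connection, short_url: str, long_url: str) -> None:
    """Store a mapping; the first destination seen for a short URL wins."""
    db.execute(
        "INSERT OR IGNORE INTO redirects VALUES (?, ?, ?)",
        (short_url, long_url, time.time()),
    )
    db.commit()

def lookup(db: sqlite3.Connection, short_url: str) -> Optional[str]:
    """Return the archived destination, or None if it was never seen."""
    row = db.execute(
        "SELECT long_url FROM redirects WHERE short_url = ?", (short_url,)
    ).fetchone()
    return row[0] if row else None
```

Keeping the first mapping rather than overwriting it is a deliberate choice in this sketch: if a shortener is later hacked or repurposed, the log still holds the destination the link had when it was posted.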

You could guarantee that a shortened link still points to the URL that was originally shortened by deriving the short code from a cryptographic hash of that URL. But this produces URLs that aren't as short as possible.
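To make the hash idea concrete, here is a sketch that assumes the short code is simply a truncated SHA-256 of the destination URL. That scheme is my assumption, and `hash_code` and `verify` are hypothetical names. Anyone holding the code and the expanded URL can recompute the hash and notice a swapped destination.

```python
import base64
import hashlib

def hash_code(url: str, length: int = 12) -> str:
    """Derive a short code from the SHA-256 of the URL (URL-safe base64)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return encoded[:length]

def verify(code: str, url: str) -> bool:
    """Check that `url` is the destination the code was derived from."""
    return hash_code(url, len(code)) == code
```

The length trade-off shows up directly: the untruncated code is 43 characters, and every character cut away removes 6 bits of the hash, making a colliding URL easier to find. Twelve characters keep 72 bits, which is already longer than the codes most shorteners hand out.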

A variety of greasemonkey scripts resolve shortened URLs and replace them inline.
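Those scripts are JavaScript running in the browser, but the core idea fits in a few lines of any language. The following Python sketch is not taken from any of them: the names `expand_inline`, `shorteners`, and `resolve` are hypothetical, and `resolve` stands in for whatever lookup the script performs.

```python
import re
from typing import Callable, Optional, Set
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s<>]+")
TRAILING = ".,;:!?)"

def expand_inline(text: str,
                  shorteners: Set[str],
                  resolve: Callable[[str], Optional[str]]) -> str:
    """Replace URLs on known shortener hosts with their destinations."""
    def replace(match):
        url = match.group(0)
        core = url.rstrip(TRAILING)   # drop sentence punctuation
        tail = url[len(core):]
        host = urlsplit(core).hostname
        if host in shorteners:
            expanded = resolve(core)
            if expanded:
                return expanded + tail
        return url                    # leave everything else untouched
    return URL_RE.sub(replace, text)
```

Only hosts on an explicit list are rewritten, so ordinary links pass through unchanged, and a link whose lookup fails is left as it was.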

Finally, shortening services could provide archives of their entire database - but this raises all sorts of privacy concerns that I hesitate to even dig into.

The most likely outcome, of course, is that we don't do anything and that the great linkrot apocalypse causes all of modern culture to disappear in a puff of smoke. Hopefully.

With thanks to Maciej Ceglowski

Updates