Opinions on shortened URLs are a dime a dozen these days, but the basic facts are:

- They’re awfully convenient for passing around (and this was true even before Twitter came about)
- They are, by nature, short-lived (either the services or the URLs)
- You should never rely on their being around later on

So basically you have absolutely no excuse to not be able to handle them. I decided to mess around with the concept a few weeks back to see how simple I could make it all work, and came up with a couple of useful Python classes that I can share with the world:

Creating short URLs

The trouble with creating short URLs is that there are entirely too many shortening services, and far too many variations on APIs – in fact, nearly all of them suffer from “not invented here” syndrome and try to “enhance” their APIs to give you a lot of stuff that you basically don’t (ever) need, and wrap their results in JSON or XML.

Me, I refuse to put up with that kind of crap.

So I poked around a bit, found the simplest services to work against and created the following class, which will try all its known services in turn until it gives you a working URL:

    import urllib, urllib2, urlparse, httplib

    BITLY_AUTH = 'login=foo&apiKey=bar'

    class URLShortener:
        services = {
            'api.bit.ly': "http://api.bit.ly/shorten?version=2.0.1&%s&format=text&longUrl=" % BITLY_AUTH,
            'api.tr.im': '/api/trim_simple?url=',
            'tinyurl.com': '/api-create.php?url=',
            'is.gd': '/api.php?longurl='
        }

        def query(self, url):
            for shortener in self.services.keys():
                c = httplib.HTTPConnection(shortener)
                c.request("GET", self.services[shortener] + urllib.quote(url))
                r = c.getresponse()
                shorturl = r.read().strip()
                if ("Error" not in shorturl) and ("http://" + urlparse.urlparse(shortener)[1] in shorturl):
                    return shorturl
                else:
                    continue
            raise IOError

Yes, the error handling is naïve – any network exceptions and stuff ought to be caught upstream from this – but it works fine so far.
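
For what it's worth, here is a rough Python 3 sketch of the same "try each service in turn" idea with the network errors caught per service instead of upstream. It is not the class above: the `shorten` and `fetch` names are made up for illustration, the check is slightly stricter (the reply has to start with `http://`), and the two endpoints are simply copied from the listing, so they may well be gone by the time you read this.

```python
import http.client
import urllib.parse

# Hosts and paths copied from the class above (assumed, possibly defunct).
# bit.ly is left out because it needs credentials.
SERVICES = {
    'tinyurl.com': '/api-create.php?url=',
    'is.gd': '/api.php?longurl=',
}


def fetch(host, path, timeout=10):
    """Issue a plain HTTP GET and return the response body as stripped text."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", path)
        body = conn.getresponse().read()
        return body.decode("utf-8", "replace").strip()
    finally:
        conn.close()


def shorten(url, services=SERVICES, fetch=fetch):
    """Try each service in turn and return the first plausible short URL."""
    for host, path in services.items():
        try:
            # safe="" percent-encodes the whole URL, including ':' and '/'
            result = fetch(host, path + urllib.parse.quote(url, safe=""))
        except (OSError, http.client.HTTPException):
            continue  # network trouble: move on to the next service
        if "Error" not in result and result.startswith("http://"):
            return result
    raise IOError("no shortening service returned a usable URL")
```

Passing `fetch` in as a parameter is only there so the loop can be exercised without touching the network; calling `shorten("http://example.com/")` with the defaults will hit the real services.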

Expanding short URLs

This is the really fun bit, because it is not immediately obvious whether or not a short URL will actually be immediately useful – there are plenty of times when you’ll actually be redirected to something else, and while fooling around with the Google Reader API (something I’ll eventually write about later), I found that also applied (in spades) to Feedburner links and whatnot.

So I decided to build some smarts into the process: it not only pings some known hosts twice, but also acts as a link checker of sorts, learning which hosts actually redirect to other places:

    import urllib, urllib2, urlparse, httplib

    class URLExpander:
        # known shortening services
        shorteners = ['tr.im', 'is.gd', 'tinyurl.com', 'bit.ly', 'snipurl.com',
                      'cli.gs', 'feedproxy.google.com', 'feeds.arstechnica.com']
        twofers = [u'\u272Adf.ws']
        # learned hosts
        learned = []

        def resolve(self, url, components):
            """ Try to resolve a single URL """
            c = httplib.HTTPConnection(components.netloc)
            c.request("GET", components.path)
            r = c.getresponse()
            l = r.getheader('Location')
            if l == None:
                return url  # it might be impossible to resolve, so best leave it as is
            else:
                return l

        def query(self, url, recurse=True):
            """ Resolve a URL """
            components = urlparse.urlparse(url)
            # Check weird shortening services first
            if (components.netloc in self.twofers) and recurse:
                return self.query(self.resolve(url, components), False)
            # Check known shortening services first
            if components.netloc in self.shorteners:
                return self.resolve(url, components)
            # If we haven't seen this host before, ping it, just in case
            if components.netloc not in self.learned:
                ping = self.resolve(url, components)
                if ping != url:
                    self.shorteners.append(components.netloc)
                self.learned.append(components.netloc)
                return ping
            # The original URL was OK
            return url
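
To make the learning part easier to follow, here is a stripped-down Python 3 sketch of just that logic. It is an illustration under a few assumptions rather than a port: `LearningExpander` is a made-up name, the "twofers" double lookup is left out, the host lists are sets kept on the instance, and the resolver is passed in so it can be swapped for a fake. Like the listing above, the default resolver only speaks plain HTTP, though it does keep the query string when asking for the path.

```python
import http.client
import urllib.parse


def resolve(url, timeout=10):
    """Return the Location header for url, or url itself if there is none."""
    parts = urllib.parse.urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query  # keep the query string
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("GET", path)
        location = conn.getresponse().getheader("Location")
    finally:
        conn.close()
    return url if location is None else location


class LearningExpander:
    """Hypothetical, simplified take on the expander's learning logic."""

    def __init__(self, shorteners=(), resolve=resolve):
        self.shorteners = set(shorteners)  # hosts known to redirect
        self.learned = set()               # hosts already probed once
        self.resolve = resolve

    def query(self, url):
        host = urllib.parse.urlsplit(url).netloc
        # Known redirectors are always resolved
        if host in self.shorteners:
            return self.resolve(url)
        # Unknown hosts get probed exactly once
        if host not in self.learned:
            self.learned.add(host)
            target = self.resolve(url)
            if target != url:
                self.shorteners.add(host)  # it redirected, so remember it
            return target
        # Probed before and it did not redirect: leave the URL alone
        return url
```

The trade-off is the same as in the original: a host that did not redirect the first time is never probed again, so a site that only redirects some of its URLs can slip through.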

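This class is a bit more convoluted but has turned out to be very useful indeed, and you can pickle the object to preserve its learned hosts (as long as the host lists live on the instance rather than on the class, since pickle only stores instance state).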
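
A minimal sketch of the persistence side, assuming any object with `shorteners` and `learned` attributes. The `dump_hosts` and `load_hosts` helpers are hypothetical names, and they pickle only the two host lists rather than the whole object, which works the same whether the lists are class-level or instance-level attributes.

```python
import pickle


def dump_hosts(expander):
    """Serialise the host lists of an expander-like object to bytes."""
    state = {
        "shorteners": list(expander.shorteners),
        "learned": list(expander.learned),
    }
    return pickle.dumps(state)


def load_hosts(expander, blob):
    """Restore host lists previously produced by dump_hosts."""
    state = pickle.loads(blob)
    # Assigning here creates instance attributes on the target object
    expander.shorteners = state["shorteners"]
    expander.learned = state["learned"]
    return expander
```

Writing the bytes to a file and reading them back at startup is all it takes to carry the learned hosts across runs. As with any pickle, only load data you wrote yourself.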