1. Overview

In this article, we'll show how to unshorten URLs using HttpClient.

A simple example is when the original URL has been shortened once – by a service such as bit.ly.

A more complex example is when the URL has been shortened multiple times, by different such services, and it takes multiple passes to get to the original full URL.

If you want to dig deeper and learn other cool things you can do with the HttpClient – head on over to the main HttpClient tutorial.

2. Unshorten the URL Once

Let's start simple and unshorten a URL that has been passed through a URL-shortening service only once.

The first thing we need is an HTTP client that doesn't automatically follow redirects:

CloseableHttpClient client = HttpClientBuilder.create().disableRedirectHandling().build();

This is necessary because we'll need to manually intercept the redirect response and extract information out of it.

We start by sending a request to the shortened URL. The response we get back will typically be a 301 Moved Permanently, although some services answer with a 302 instead.

Then, we need to extract the Location header pointing to the next, and in this case – final URL:

public String expandSingleLevel(String url) throws IOException {
    HttpHead request = null;
    try {
        request = new HttpHead(url);
        HttpResponse httpResponse = client.execute(request);

        int statusCode = httpResponse.getStatusLine().getStatusCode();
        if (statusCode != 301 && statusCode != 302) {
            return url;
        }

        Header[] headers = httpResponse.getHeaders(HttpHeaders.LOCATION);
        Preconditions.checkState(headers.length == 1);
        String newUrl = headers[0].getValue();

        return newUrl;
    } catch (IllegalArgumentException uriEx) {
        return url;
    } finally {
        if (request != null) {
            request.releaseConnection();
        }
    }
}
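The method above treats only 301 and 302 as redirects and uses the Location value exactly as received. HTTP also defines 303, 307 and 308 as redirect status codes, and a Location header may hold a relative reference rather than an absolute URL. The following is a minimal JDK-only sketch of how both cases might be handled; the class `RedirectSupport` and its two methods are hypothetical helpers, not part of the original code or of HttpClient.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of the article's project.
final class RedirectSupport {

    private RedirectSupport() {
    }

    // True for the status codes that normally carry a Location header.
    static boolean isRedirect(int statusCode) {
        return statusCode == 301 || statusCode == 302 || statusCode == 303
            || statusCode == 307 || statusCode == 308;
    }

    // Resolves a Location value against the URL that was requested.
    // An absolute Location comes back unchanged; a relative one is
    // resolved against requestUrl. Throws IllegalArgumentException
    // if either string is not a valid URI.
    static String resolveLocation(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }
}
```

Because `URI.create` throws `IllegalArgumentException` on malformed input, such a helper would fit the existing `catch` block in `expandSingleLevel` without further changes.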

Finally, a simple live test expanding a URL:

@Test
public void givenShortenedOnce_whenUrlIsUnshortened_thenCorrectResult() throws IOException {
    String expectedResult = "/rest-versioning";
    String actualResult = expandSingleLevel("http://bit.ly/13jEoS1");
    assertThat(actualResult, equalTo(expectedResult));
}

3. Process Multiple URL Levels

The problem with short URLs is that they may be shortened multiple times, by altogether different services. Expanding such a URL takes multiple passes to get to the original URL.

We're going to apply the expandSingleLevel primitive operation defined previously to iterate through all the intermediary URLs and get to the final target:

public String expand(String urlArg) throws IOException {
    String originalUrl = urlArg;
    String newUrl = expandSingleLevel(originalUrl);
    while (!originalUrl.equals(newUrl)) {
        originalUrl = newUrl;
        newUrl = expandSingleLevel(originalUrl);
    }
    return newUrl;
}

Now, with the new mechanism of expanding multiple levels of URLs, let's define a test and put this to work:

@Test
public void givenShortenedMultiple_whenUrlIsUnshortened_thenCorrectResult() throws IOException {
    String expectedResult = "/rest-versioning";
    String actualResult = expand("http://t.co/e4rDDbnzmk");
    assertThat(actualResult, equalTo(expectedResult));
}

This time, the short URL – http://t.co/e4rDDbnzmk – which is actually shortened twice – once via bit.ly and a second time via the t.co service – is correctly expanded to the original URL.
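Live tests like this depend on the shortening services being reachable. As a rough sketch, the same loop can be exercised offline by passing the single-level step in as a function and backing it with an in-memory map. The class `ExpandSketch`, the method `fakeRedirects` and the example.com style URLs in the usage note are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical offline variant of expand(...); names are illustrative only.
final class ExpandSketch {

    private ExpandSketch() {
    }

    // Same loop as expand(...), with the single-level step supplied by the caller.
    static String expand(String url, UnaryOperator<String> expandSingleLevel) {
        String originalUrl = url;
        String newUrl = expandSingleLevel.apply(originalUrl);
        // Stop once a step returns the URL it was given, i.e. no redirect happened.
        while (!originalUrl.equals(newUrl)) {
            originalUrl = newUrl;
            newUrl = expandSingleLevel.apply(originalUrl);
        }
        return newUrl;
    }

    // In-memory stand-in for the network: each key redirects to its value,
    // and any URL that is not a key is returned unchanged.
    static UnaryOperator<String> fakeRedirects(Map<String, String> redirects) {
        return url -> redirects.getOrDefault(url, url);
    }
}
```

With a map such as `Map.of("http://a.example/1", "http://b.example/2", "http://b.example/2", "http://c.example/page")`, calling `ExpandSketch.expand("http://a.example/1", ExpandSketch.fakeRedirects(map))` walks both hops and returns the last URL. Note that, like the original loop, this version does not terminate if the map contains a cycle.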

4. Detect Redirect Loops

Finally, some URLs cannot be expanded because they form a redirect loop. HttpClient would normally detect this kind of problem, but since we turned off automatic redirect handling, it no longer does.

The final step in the URL expansion mechanism is going to be detecting the redirect loops and failing fast in case such a loop occurs.

For this to be effective, we need some additional information out of the expandSingleLevel method we defined earlier: namely, we need it to return the status code of the response along with the URL.

Since Java doesn't support multiple return values, we're going to wrap the information in an org.apache.commons.lang3.tuple.Pair object, so the new signature of the method will be:

public Pair<Integer, String> expandSingleLevelSafe(String url) throws IOException {

And finally, let's include the redirect cycle detection in the main expand mechanism:

public String expandSafe(String urlArg) throws IOException {
    String originalUrl = urlArg;
    String newUrl = expandSingleLevelSafe(originalUrl).getRight();
    List<String> alreadyVisited = Lists.newArrayList(originalUrl, newUrl);
    while (!originalUrl.equals(newUrl)) {
        originalUrl = newUrl;
        Pair<Integer, String> statusAndUrl = expandSingleLevelSafe(originalUrl);
        newUrl = statusAndUrl.getRight();
        boolean isRedirect = statusAndUrl.getLeft() == 301 || statusAndUrl.getLeft() == 302;
        if (isRedirect && alreadyVisited.contains(newUrl)) {
            throw new IllegalStateException("Likely a redirect loop");
        }
        alreadyVisited.add(newUrl);
    }
    return newUrl;
}

And that's it: the expandSafe mechanism is able to unshorten a URL that went through an arbitrary number of URL shortening services, while correctly failing fast on redirect loops.
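One possible variation on this idea, sketched below with JDK types only, tracks visited URLs in a Set (constant-time lookups) and adds an upper bound on the number of requests, so that a very long chain also fails fast. Everything here is an assumption for illustration: the class `SafeExpandSketch`, the `maxRequests` parameter, the `fakeStep` helper, and the use of `Map.Entry<Integer, String>` as a stand-in for the commons-lang3 Pair.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical variant of expandSafe(...); names are illustrative only.
final class SafeExpandSketch {

    private SafeExpandSketch() {
    }

    // step returns (status code, URL): the redirect target for a 301/302,
    // otherwise the URL it was given. maxRequests caps the calls to step.
    static String expandSafe(String url,
                             Function<String, Map.Entry<Integer, String>> step,
                             int maxRequests) {
        Set<String> visited = new HashSet<>();
        visited.add(url);
        String current = url;
        for (int i = 0; i < maxRequests; i++) {
            Map.Entry<Integer, String> statusAndUrl = step.apply(current);
            int status = statusAndUrl.getKey();
            boolean isRedirect = status == 301 || status == 302;
            if (!isRedirect) {
                return current; // reached a URL that no longer redirects
            }
            String next = statusAndUrl.getValue();
            // Set.add returns false when the URL was already seen.
            if (!visited.add(next)) {
                throw new IllegalStateException("Likely a redirect loop at " + next);
            }
            current = next;
        }
        throw new IllegalStateException("Gave up after " + maxRequests + " requests");
    }

    // In-memory stand-in for the network: keys redirect (301) to their values,
    // any other URL answers 200 and is returned as is.
    static Function<String, Map.Entry<Integer, String>> fakeStep(Map<String, String> redirects) {
        return url -> {
            String target = redirects.get(url);
            if (target == null) {
                return Map.entry(200, url);
            }
            return Map.entry(301, target);
        };
    }
}
```

Unlike the list-based version, this sketch also reports a URL that redirects to itself as a loop, since the starting URL is placed in the visited set before the first request.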

5. Conclusion

This tutorial discussed how to expand short URLs in Java using the Apache HttpClient.

We started with a simple use case, a URL that is shortened only once, and then implemented a more generic mechanism capable of handling multiple levels of redirects and detecting redirect loops in the process.

The implementation of these examples can be found in the GitHub project. This is an Eclipse-based project, so it should be easy to import and run as it is.