CAPTCHAs, typically in the form of an image composed of a distorted string of random characters, are a fact of life on the Internet. They're generally used to prevent spammers from setting up e-mail accounts or posting links in forums and blog comments and, as such, have set off a bit of an arms race between service providers and spammers. There have been several notable instances of CAPTCHAs being cracked by automated systems, and academic researchers have been exploring further vulnerabilities and developing alternative systems. The latest entry in this arms race comes from researchers at Google, who have taken a new approach to foiling automated cracking systems: forcing them to attempt to orient specific images.

Google has some obvious reasons for being interested in improved CAPTCHAs, as the company provides both e-mail and blogging services that rely on them for security. And, given that text CAPTCHAs appear to be reaching the point where human users struggle to solve them accurately, a new approach seems to be due. CAPTCHAs are based on the fact that humans tend to be better at image recognition problems than automated systems, so the Google researchers were apparently able to come up with the new technique simply by looking into areas that computer scientists had identified as being problematic for computer-based solutions.

They settled on image orientation. Humans can properly orient a wide variety of images so that the vertical axis matches the real-world orientation of the photograph's subject; computers can only handle a subset of these. The Google researchers describe how they used this to implement a system appropriate for end-users in a paper they will present at the WWW 2009 meeting this week.

The basic idea behind their scheme is that a workable system will first have to eliminate the images that an automated system is likely to handle properly, as well as those that are difficult for humans to orient. So, for example, computers are good at recognizing things like faces in group shots, as well as horizons in landscape scenes, both of which provide sufficient information to orient the image. In other cases, the image doesn't have enough information for either humans or computers to properly sort things out. The paper uses the example of a guitar on a featureless background, which could be oriented horizontally, vertically, or in the angled position from which it's typically played.

The authors started with a database of images obtained from the Web based on a collection of common user queries. Orientation was inferred from the original picture, but this information was removed by applying a circular mask and then rotating the image a random number of degrees. They then filtered out the images that could be properly oriented by computers, using a set of 180 automated image classifiers developed using AdaBoost. They also eliminated images where these 180 classifiers produced widely divergent guesses, since this was an indication that the orientation was difficult to ascertain.
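To make the two filtering rules concrete, here is a minimal sketch of how such a screen might look. It is not the authors' code: the function names, the eight-degree "solved" tolerance, and the 0.5 agreement threshold are all assumptions chosen for illustration, and the classifier guesses are simply passed in as a list of angles rather than produced by real AdaBoost classifiers.

```python
import math

def angular_error_deg(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def circular_mean_deg(angles):
    """Mean direction of a set of angles, handling wrap-around at 360."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0

def agreement(angles):
    """Mean resultant length: near 1 if the guesses cluster, near 0 if scattered."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.hypot(s, c) / len(angles)

def keep_for_captcha(guesses, true_angle,
                     solved_tol_deg=8.0,   # hypothetical threshold
                     min_agreement=0.5):   # hypothetical threshold
    """Return True if an image survives both (assumed) filtering rules."""
    # Rule 1: widely divergent guesses suggest the orientation is ambiguous.
    if agreement(guesses) < min_agreement:
        return False
    # Rule 2: if the classifiers' consensus is close to the true
    # orientation, a machine can solve this image, so discard it.
    if angular_error_deg(circular_mean_deg(guesses), true_angle) <= solved_tol_deg:
        return False
    return True
```

In this sketch, an image whose classifier guesses cluster tightly around the true angle is rejected as machine-solvable, while one whose guesses are scattered around the circle is rejected as ambiguous. Where the real thresholds sit is a detail for the paper, not something this example tries to reproduce.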

With that done, they used the remaining images to test humans, with a simple interface built with JavaScript and DHTML. For humans, they found that using a combination of three images and allowing an error of up to eight degrees in either direction produced an 84 percent success rate; random guessing would succeed only about 0.009 percent of the time. The tests also identified a number of outliers, such as images that were difficult for humans to orient (the guesses varied widely) or where the subject had been photographed at an unusual angle (the guesses were consistent, but different from the photo's original orientation).
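The random-guessing figure is easy to check. The short sketch below assumes each guess is an independent, uniformly random angle and that a response counts as correct if it falls within the tolerance on either side of the true orientation; the function names are made up for this example.

```python
def is_accepted(response_deg, true_deg, tolerance_deg=8.0):
    """True if the response is within the tolerance of the true angle."""
    error = abs((response_deg - true_deg + 180.0) % 360.0 - 180.0)
    return error <= tolerance_deg

def random_guess_success(tolerance_deg, num_images):
    """Chance that uniform random guesses land inside the window on every image."""
    window = 2 * tolerance_deg      # eight degrees either way gives a 16-degree window
    per_image = window / 360.0      # chance of hitting the window once
    return per_image ** num_images  # all images must be right

p = random_guess_success(8, 3)
print(f"{p * 100:.4f} percent")  # prints "0.0088 percent"
```

A 16-degree window covers 16/360 of the circle, and getting three images right by chance means cubing that fraction, which gives roughly 0.0088 percent, in line with the rounded 0.009 percent figure.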

In general, users also preferred orienting images to solving text-based CAPTCHAs, with the exception of one person who apparently had a form of OCD that compelled her to tweak each image's orientation to perfection (my speculation, not the authors').

The authors note that the approach leaves plenty of room for further improvement, as the images could be obtained specifically from sources that computers are known to handle poorly, like cartoon content. They also suggest that, should computerized cracking programs ever catch up here, it should be possible to extend the technique into the third dimension using 3D models.