CAPTCHAs, or Completely Automated Public Turing tests to tell Computers and Humans Apart, are everywhere on the web. They're meant to protect sites from unwelcome automated processes, such as the actions of a malicious botnet, for example. CAPTCHAs do this by making the user do something that humans are supposed to be good at but computers find more difficult.

Although CAPTCHAs were originally added to many sites in the name of security, this technology may actually make your web application less secure and bring unseen complications to your site’s privacy policy. They also have the potential to drive customers away.

Developers should not assume that CAPTCHAs must be part of every registration system. Even the most popular CAPTCHA utilities have hidden flaws, and they hurt profits when usability problems cause users to abandon transactions. Given those security and economic risks, here is some advice on better ways to block unwelcome automated processes.

Understand how CAPTCHAs started

CAPTCHA technology has its roots in a 2000 meeting between Yahoo and Carnegie Mellon University researcher Luis von Ahn and his advisor, Manuel Blum. Yahoo Mail had a problem: it was too easy for spammers to automate the registration of new accounts. The solution, according to von Ahn’s 2005 PhD thesis on the subject, was “automated tests that humans can pass but computer programs cannot.”

Initially, the CAPTCHA tests slowed some people down, and they still do. A 2010 study of the most popular CAPTCHA schemes, led by Stanford researcher Elie Bursztein, found that solving these tests is harder than many assume, especially for non-English speakers. Humans in the study agreed only 71 percent of the time on the meaning of 5,000 CAPTCHAs.

The technology continued to evolve throughout the early 2000s. Google's reCAPTCHA improved the experience with less frustrating tests that used more recognizable images. The digitized archives of The New York Times and Google Books, along with Google Street View addresses, were even cleaned up by presenting images from those sources to users as reCAPTCHA challenges.

reCAPTCHA's well-known flaws and the Mechanical Turk attack

CAPTCHAs were introduced to improve the overall security of web applications such as Yahoo Mail, but through the years their implementations have introduced additional security concerns. As of May 2016, Mitre’s Common Vulnerabilities and Exposures database was tracking 38 vulnerabilities discovered between 2005 and 2015, mostly with reCAPTCHA-era forms. If these flaws go unpatched, attackers can exploit cross-site scripting and SQL injection, arbitrarily read files, or completely bypass the protection CAPTCHA is supposed to provide.

One measure of the effectiveness of any cipher is the cost to crack it, which can initially be expressed as the hardware cost of mounting a brute-force attack. Services such as Amazon's Mechanical Turk change that cost equation by making it very inexpensive to automate tasks that humans are good at. The process has even been commoditized, with CAPTCHA decoding available as a service via APIs in many popular web application languages.

Google's attempts to secure reCAPTCHA and the NoCAPTCHA hack

To address many of reCAPTCHA's security and usability issues, Google introduced the NoCAPTCHA version of reCAPTCHA, which replaces the frustrating tests with a simple checkbox proclaiming “I am not a robot.” Behind the scenes, NoCAPTCHA uses a highly obfuscated system of checks that includes downloading a JavaScript VM to perform further local browser checks.

But even this highly intelligent, user-friendly service has its drawbacks.

Embedding any CAPTCHA service essentially forces your users to accept the CAPTCHA provider's privacy policy before they can use your application. The CAPTCHA provider doesn't need to disclose how information is gathered by the service or how it is used. Google’s NoCAPTCHA does link to a privacy statement and terms of service in the widget with good descriptions of what is being gathered, but you will find only high-level descriptions of how it is used.

Even that obfuscated system was no match for hackers. Not long after its release, a paper was presented at Black Hat Asia detailing methods for automating the bypass of protections with 99.1 percent accuracy. Google has since mitigated reCAPTCHA's vulnerability to the large-scale token-harvesting attack that was used in the paper, but the cat-and-mouse game continues.

As recently as July 2016, a team of researchers at Columbia University published a paper detailing how to break semantic image CAPTCHAs like Google's with an accuracy of 70 percent using a machine learning algorithm.

CAPTCHAs' effect on profits

Web developer Casey Henry discovered that CAPTCHAs were driving down the conversion rate of visitors to registered users by 3.2 percent in the 50 sites he manages. After he disabled the CAPTCHAs, there was a 4.2 percent increase in automated registrations, but he got the 3.2 percent of real users back. He decided that it made more sense just to build tools to sort out the spam rather than keep the CAPTCHAs. Video slideshow creator Animoto made a similar decision after it saw its conversion rate go up by a third when it eliminated CAPTCHAs.

The value in simplifying forms is hard to overstate. CAPTCHAs' potential to hurt accessibility (and lose those customers) shouldn't be forgotten either. Online travel site Expedia saw immediate results after removing an optional field that was confusing customers. That simple change added $12 million in profit by retaining customers who would otherwise have abandoned their transactions in frustration.

If you choose to use CAPTCHAs

Implement them in a way that allows them to be turned on and off. Leave them off until they become necessary.
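A feature flag is one lightweight way to make that switch cheap. The sketch below is illustrative Python; the flag name, field names, and validation logic are assumptions for the example, not from any particular framework:

```python
# Minimal sketch of a CAPTCHA feature flag (names are illustrative).
CAPTCHA_ENABLED = False  # flip to True only when abuse actually appears

def validate_registration(form, captcha_ok=None):
    """Reject a registration only when the flag is on and the CAPTCHA failed."""
    errors = []
    if not form.get("email"):
        errors.append("email required")
    if CAPTCHA_ENABLED and not captcha_ok:
        errors.append("captcha failed")
    return errors
```

With the flag off, the CAPTCHA check is skipped entirely, so users never see it until you decide they must.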

Use tests that are more interesting or fun for users to solve. For example, PlayCaptcha, a service from Future Ad Labs, gives users a simple, interactive task that confirms the user is human, and it does a bit of advertising in the process.

4 alternatives to CAPTCHAs

CAPTCHAs are not the only way to stop bots and other automated processes. Here are a few other methods that you should explore:

Allow your users the option to identify themselves with an account they already have. Services such as Google, Yahoo and Facebook offer recognizable identity as a service. OpenID offers even more flexibility but may not be as well known to your users. If you use this method, you should still maintain a locally hosted registration form for users with privacy concerns.

Use unobtrusive JavaScript to replace web forms via AJAX; real users won't even notice. After the DOM loads, make an AJAX call to fetch the real form. The hard-coded form that the call replaces also serves as a honeypot for bots, and submissions to it can be used to blacklist abusive IP addresses.

Move the burden of sniffing out abusers from your users to the back end. Track source IP addresses, either in real time or in logs, to automate the throttling or blocking of abusers. Checking the X-Forwarded-For HTTP header helps you avoid misattributing traffic that arrives through a proxy.
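Both ideas can be sketched in a few lines of Python. The window size, request limit, and header handling below are illustrative assumptions, not a production policy:

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60    # illustrative sliding window
MAX_REQUESTS = 20      # illustrative threshold; tune for your traffic

_hits = defaultdict(deque)  # source IP -> timestamps of recent requests

def client_ip(headers, remote_addr):
    """Prefer the first address in X-Forwarded-For so a shared
    proxy is not blamed for all the traffic behind it."""
    forwarded = headers.get("X-Forwarded-For", "")
    if forwarded:
        return forwarded.split(",")[0].strip()
    return remote_addr

def allow(ip, now=None):
    """Sliding-window throttle: True if this request is under the limit."""
    now = time.monotonic() if now is None else now
    window = _hits[ip]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()          # forget requests outside the window
    if len(window) >= MAX_REQUESTS:
        return False              # throttle this source
    window.append(now)
    return True
```

Note that X-Forwarded-For is client-supplied and easily spoofed, so treat it as a hint rather than proof of origin.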

The simplest solution is a honeypot: an extra field asking for information you don’t need, which automated processes will dutifully fill in anyway. Hide it from human users with the CSS property visibility:hidden and reject any submission that fills in the field; it is almost certainly from a bot.
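A minimal sketch of the honeypot check in Python; the decoy field name `website` and the form markup are illustrative assumptions:

```python
# Hypothetical honeypot: "website" is a decoy field rendered in the
# form but hidden from humans with CSS, so only bots fill it in.
HONEYPOT_FIELD = "website"

FORM_HTML = """
<form method="post" action="/register">
  <input name="email">
  <!-- decoy field; humans never see or fill this -->
  <input name="website" style="visibility:hidden" tabindex="-1" autocomplete="off">
  <button>Sign up</button>
</form>
"""

def is_bot(form_data):
    """A submission that fills the decoy field is almost certainly automated."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())
```

Adding tabindex="-1" and autocomplete="off" to the decoy helps keep keyboard users and browser autofill from tripping it by accident.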

Whatever decision you make for your site, remember that CAPTCHAs are not a firewall to be enabled by default for all applications. The technology has drawbacks and should not be enabled without good reason.
