This Q&A is part of a weekly series of posts highlighting common questions encountered by technophiles and answered by users at Stack Exchange, a free, community-powered network of 100+ Q&A sites.

totymedli asked:

I have a website e-mail form. I use a custom CAPTCHA to prevent spam from robots. Despite this, I still get spam.

Why? How do robots beat the CAPTCHA? Do they use some kind of advanced OCR or just get the solution from where it is stored?

How can I prevent this? Should I change to another type of CAPTCHA?

See the original question here.

What’s in a CAPTCHA?

MainMa answers (52 votes):

Two of the easiest ways to get through CAPTCHA:

Use human farms, i.e. pay people to solve CAPTCHAs, just like ProTypers does.

Use an OCR.

There may also be a bug either in the CAPTCHA mechanism itself or the surrounding application, allowing someone to bypass the CAPTCHA.
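To make the "bug in the mechanism" case concrete, here is a minimal sketch of server-side verification that avoids two common mistakes: exposing the expected answer to the client, and letting a solved challenge be replayed. It does not describe any particular CAPTCHA library. The in-memory store, the function names, and the five-minute lifetime are all assumptions made for illustration.

```python
import hmac
import secrets
import time

# Hypothetical in-memory store: challenge id -> (expected answer, expiry time).
# A real site would keep this in its session store or a database.
_challenges = {}

CHALLENGE_TTL_SECONDS = 300  # assumed lifetime; tune for your site


def issue_challenge(answer, now=None):
    """Store the expected answer server-side and return an opaque id.

    Only the id is sent to the browser; the answer never is.
    """
    now = time.time() if now is None else now
    challenge_id = secrets.token_urlsafe(16)
    _challenges[challenge_id] = (answer, now + CHALLENGE_TTL_SECONDS)
    return challenge_id


def verify_challenge(challenge_id, submitted, now=None):
    """Check a submitted answer; each challenge can be tried only once."""
    now = time.time() if now is None else now
    # pop() removes the entry, so neither a solved nor a failed
    # challenge can be submitted a second time.
    entry = _challenges.pop(challenge_id, None)
    if entry is None:
        return False
    answer, expires_at = entry
    if now > expires_at:
        return False
    expected = answer.lower().encode()
    received = submitted.strip().lower().encode()
    return hmac.compare_digest(expected, received)
```

A form handler would call issue_challenge() when rendering the page and verify_challenge() when the form is posted. None of this stops OCR or human solvers; it only closes the bypass route where the bot never has to solve anything.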

By the way, the W3C article Inaccessibility of CAPTCHA: Alternatives to Visual Turing Tests on the Web also explains how CAPTCHAs could be compromised:

One of the first documented attacks on the system was by a Carnegie Mellon student, who associated CAPTCHA images with access to an adult Web site, thus gaining free human labor to crack the authentication.

External projects [...] have shown methodologies and results indicating that many of the systems can be defeated by computers with between 88% and 100% accuracy, using optical character recognition.

So how can you prevent those attacks?

If you use a custom-built CAPTCHA, you may try to move to a popular one, like reCAPTCHA. This will help if either your own CAPTCHA was too easy to OCR, or if there was a bug which was successfully exploited.

If you use a popular CAPTCHA mechanism, moving to a custom-made one or to another popular one might prevent OCR.

Technically, nothing would prevent human farms: you may create animated GIFs where several frames display different text very quickly and only one frame is actually visible to the user, you may distort or bend text in all directions, or you may find alternative ways to prevent OCRs from recognizing text, but humans paid to solve CAPTCHAs will still solve them.

You may want to move from visual CAPTCHA to sound (if you're not using both already, and you should), but this means that users with hearing impairment would be unable to use your application.

FrustratedWithFormsDesigner and GalacticCowboy mentioned domain-specific CAPTCHAs in the comments. I tried to find some material about how effective those are, but without success, so here is just my personal opinion:

Domain-specific CAPTCHAs can be hugely annoying when actual users have no idea about the answer.

Example: I'm visiting a page on a movies-oriented website. I notice a mistake in an article and want to comment on it to notify the author about the mistake. The comments form asks me, as a CAPTCHA mechanism, to provide the name of the actress displayed in a photo. I have no idea who this actress is, so the only thing I can do is leave the website (or spend the next two minutes using Google Images).

Another example: a website asks for a synonym of "mysterious." Easy as it sounds for a non-impaired person who speaks English fluently, it would be impossible to solve without external help for people who don't speak English well or people with some developmental disabilities, not counting the fact that finding synonyms or antonyms is always tricky.

Most of those domain-specific problems can be solved programmatically. Both examples I gave are easily solved using external resources (Google Images and a synonym dictionary). The one about transistors given as an example by FrustratedWithFormsDesigner is better, but could probably still be solved with a custom-made bot.

None resist human farms.

Either they generate data, just like ordinary text CAPTCHAs draw distorted characters, in which case the generation algorithm itself can be exploited to tune the bots, or they find data somewhere, just like reCAPTCHA takes text from scanned books, in which case the bot can use this data against it (for example, if you take words from a dictionary and ask the user to provide synonyms, the bot can use the very same dictionary to achieve a 100 percent success rate).
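The dictionary argument can be illustrated with a toy sketch. Everything in it is hypothetical: the three-word THESAURUS stands in for whatever public word list a synonym CAPTCHA might draw from, and the function names are invented for the example. It only shows why a bot with access to the same data source never fails.

```python
import random

# Hypothetical toy thesaurus standing in for a public word list
# that a synonym-based CAPTCHA might draw its questions from.
THESAURUS = {
    "mysterious": {"enigmatic", "cryptic", "puzzling"},
    "quick": {"fast", "rapid", "speedy"},
    "happy": {"glad", "cheerful", "joyful"},
}


def make_challenge(rng):
    """Site side: pick a word and ask the visitor for a synonym."""
    return rng.choice(sorted(THESAURUS))


def check_answer(word, answer):
    """Site side: accept any synonym listed in the thesaurus."""
    return answer.strip().lower() in THESAURUS.get(word, set())


def bot_answer(word):
    """Bot side: look the word up in the very same thesaurus."""
    synonyms = THESAURUS.get(word)
    return min(synonyms) if synonyms else ""


# Run 100 challenges and let the bot answer each one.
rng = random.Random(0)
challenges = [make_challenge(rng) for _ in range(100)]
results = [check_answer(word, bot_answer(word)) for word in challenges]
success_rate = sum(results) / len(results)
print(f"bot success rate: {success_rate:.0%}")
```

A real bot would also have to parse the question out of the page, but that is ordinary scraping rather than anything the CAPTCHA makes hard.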

Related: "How to implement a system to hide spammy user-generated content?"

The plight of the spam fighter

Morons answers (27 votes):

Adding to MainMa's answer...

Spammers trick others into doing the CAPTCHA for them

Basically, spammers set up a warez site or a porn site that appears to have a CAPTCHA on it, but it's not a real CAPTCHA. A bot pulls the CAPTCHA from the site they want to spam (or otherwise exploit), and then displays it on the warez or porn site, where a visitor completes it for them. Then the CAPTCHA value is passed back to their bot...

A bit more on Spammers

I use reCAPTCHA, and I've found that it's basically worthless. I also use a custom spam filter that catches the spam that got past reCAPTCHA, and I need to review it every few days for false positives.

My forum is also all custom-written and it gets very little traffic. I don't believe anyone coded a specific attack to my site. Still, my spam filter catches 2,000 spam messages a day! None are ever displayed on the site. Spammers get no benefit from spamming me, yet they still do.

I can see patterns in the spamming attempts because I log it all. I can tell you this: putting aside how they get past the CAPTCHA, spammers are clearly using a brute-force technique, varying the fields that are filled out and the kind of data and word mixes that populate those fields. Apparently they do this so cheaply (including bypassing the CAPTCHA) that it doesn't even pay to analyze individual sites to see whether what they are doing is working.

Year after year, they continue targeting my site with thousands of spam messages a day only to get one through every month, and that one gets manually deleted a day later. It's that cheap to spam!

This is going to be a battle for years to come, particularly for small sites with a single moderator, like mine.

Find more answers or leave your own at the original post. See more Q&A like this at Programmers, a site for conceptual programming questions at Stack Exchange. And if you've got your own programming problem that requires a solution, log in to Programmers and ask a question (it's free).

