How much is Google’s bot detection really worth? An analysis of how well reCAPTCHA detects bots.

Last update: May 20, 2020



Intro

For more than a decade, Google's reCAPTCHA service has offered website owners a trade: let Google track your users and ask them questions, and in exchange Google promises to stop bots and spam.



More than 4.5 million websites use reCAPTCHA [1] and the system collects hundreds of millions of daily solves. [2] This equates to more than 100 person-years of labor every day. [3]
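As a rough check on the person-years figure, the conversion can be sketched in a few lines of Python. The solve count used here (300 million per day) is our own assumed reading of "hundreds of millions", not a number from the cited sources, and a "person-year" is taken to be a full calendar year of time.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # calendar year, not a working year


def person_years_per_day(daily_solves: int, seconds_per_solve: float) -> float:
    """Total human time spent on solves in one day, in person-years."""
    return daily_solves * seconds_per_solve / SECONDS_PER_YEAR


def implied_seconds_per_solve(daily_solves: int, person_years: float) -> float:
    """Average solve time that a given daily person-year total would imply."""
    return person_years * SECONDS_PER_YEAR / daily_solves


# Assumption for illustration only: 300 million solves per day.
ASSUMED_DAILY_SOLVES = 300_000_000

if __name__ == "__main__":
    secs = implied_seconds_per_solve(ASSUMED_DAILY_SOLVES, 100)
    print(f"100 person-years per day implies about {secs:.1f} s per solve")
```

Under that assumption, 100 person-years per day corresponds to roughly 10.5 seconds per solve. A different solve count or a working-year definition would shift the result.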



Labeled data is the fuel that powers machine learning. reCAPTCHA answers have been used to improve Google's ML/AI services since 2009. Many people who use reCAPTCHA don't realize their labor is being put to use.



Google is now charging high rates for reCAPTCHA Enterprise, causing many companies to reevaluate this deal. Major companies like Cloudflare have switched to hCaptcha's enterprise solutions in response.



Is this a fair trade?

The hCaptcha team has been collecting pricing data from "dark web" captcha-breaking services since 2016. We noticed an interesting trend:



The average cost of breaking a reCAPTCHA is incredibly low (less than $1 per 1000 solves) and has not materially increased since our monitoring began in 2016. This applies to both reCAPTCHA v2 and v3. [4]



This pricing data lets us put a concrete dollar value on the security offered by a reCAPTCHA: at most $0.001 per answer.




What is the value of the 100+ person-years of labor collected by Google daily?

Substantially higher.



The cost to label a single image ranges from $0.03 to $1.00 or more. By any conservative estimate, Google has extracted billions of dollars of free labor to date. [5]
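The gap between the two prices can be made concrete with a short calculation. This is a sketch under stated assumptions: the daily solve count (100 million, the floor of "hundreds of millions") is our own illustrative input, and each solve is counted as one labeled image, which is a simplification because labeling pipelines often need several answers per image.

```python
# All dollar figures are per 1,000 answers so the arithmetic stays in integers.
BREAK_COST_PER_1000 = 1          # from the text: under $1 per 1,000 solves
LABEL_COST_PER_1000_LOW = 30     # $0.03 per image, low end quoted in the text
LABEL_COST_PER_1000_HIGH = 1000  # $1.00 per image

# Assumption for illustration only: 100 million solves per day,
# with each solve treated as one labeled image.
ASSUMED_DAILY_SOLVES = 100_000_000


def value_ratio(label_cost_per_1000: int, break_cost_per_1000: int) -> float:
    """How many times more a label is worth than a bypassed challenge."""
    return label_cost_per_1000 / break_cost_per_1000


def daily_label_value(daily_solves: int, label_cost_per_1000: int) -> int:
    """Dollar value of one day of solves at a given labeling price."""
    return daily_solves * label_cost_per_1000 // 1000


if __name__ == "__main__":
    low = value_ratio(LABEL_COST_PER_1000_LOW, BREAK_COST_PER_1000)
    high = value_ratio(LABEL_COST_PER_1000_HIGH, BREAK_COST_PER_1000)
    per_day = daily_label_value(ASSUMED_DAILY_SOLVES, LABEL_COST_PER_1000_LOW)
    print(f"Label value is {low:.0f}x to {high:.0f}x the cost of a bypass")
    print(f"Low-end label value: ${per_day:,} per day, ${per_day * 365:,} per year")
```

Under these assumptions the low end alone works out to $3 million per day, or roughly $1.1 billion per year, which is in line with the "billions of dollars" estimate over a decade of operation.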




Is there an alternative to this unfair deal?

hCaptcha is the first credible alternative. As millions of monthly users prove their humanity via hCaptcha, their labor enters an anonymous open market for bidding.



Companies all over the world bid for that labor to complete simple tasks that are easy for people but hard for machines.



Websites earn revenue from that work instead of donating it to Google.



Users support the site they are visiting, and experience less web spam.



hCaptcha Enterprise customers benefit from the expertise hCaptcha has gained operating the largest independent challenge provider on the internet, and get a better deal than reCAPTCHA: up to 50-80% better pricing, depending on volume.



The wide, frequently changing variety of tasks on hCaptcha is harder to automate. reCAPTCHA's single interface has not changed in many years, and it draws on a small set of infrequently changing questions. This is one of the ways hCaptcha provides more robust anti-bot protection than reCAPTCHA.



hCaptcha uses advanced machine learning to accomplish what reCAPTCHA promises, while sharing the value captured with the websites that use it.




Why hasn't reCAPTCHA improved over time?

There has been a revolution in deep learning since 2012. The industry norm for task accuracy has radically improved over the past decade, primarily due to deep convolutional networks. Other ML-powered services Google offers have followed this curve.


(See footnotes 6 and 7.)



ML has improved, but the difficulty of bypassing reCAPTCHA has not materially changed during this time.




Improving reCAPTCHA creates an intrinsic conflict for Google as an ad vendor.

Every bot identified by reCAPTCHA directly reduces Google's ad revenue.


If Google itself determines that a user seeing an ad or clicking a link was in fact a bot, it cannot charge for ads shown to that user. This conflict of interest has severely limited the scope of Google's anti-bot ambitions.



For example, Google has not offered obviously valuable services like retroactive bot detection. It is much more reliable to declare user traffic bot-generated after analyzing several days of data.



Knowing after the captcha that some users were bots is also valuable. It lets website owners clean out old spam posts, delete "sleeper accounts" registered to spam forums in the future, reduce fraud, and more.
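To illustrate what a site owner could do with late verdicts, here is a minimal sketch of retroactive cleanup. Everything in it is hypothetical: the Post record, the split_by_late_verdict helper, and the idea that verdicts arrive as a set of account IDs are assumptions made for the example, not a description of any hCaptcha or reCAPTCHA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical stored forum post."""
    post_id: int
    account_id: str
    body: str


def split_by_late_verdict(posts, flagged_accounts):
    """Partition posts into (kept, removed) using verdicts that arrived
    after the posts were accepted. `flagged_accounts` is assumed to be a
    set of account IDs later judged to be bots."""
    kept, removed = [], []
    for post in posts:
        if post.account_id in flagged_accounts:
            removed.append(post)
        else:
            kept.append(post)
    return kept, removed


if __name__ == "__main__":
    posts = [
        Post(1, "alice", "Great article."),
        Post(2, "sleeper42", "Buy cheap watches"),
        Post(3, "bob", "Thanks for sharing."),
    ]
    kept, removed = split_by_late_verdict(posts, {"sleeper42"})
    print([p.post_id for p in kept], [p.post_id for p in removed])
```

The same flagged set could also drive account deletion, so sleeper accounts are removed before they are ever used to post.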



Offering retroactive bot identification would open Google up to thorny questions of how to retroactively refund advertisers who spent money on that fraudulent traffic.



The reCAPTCHA product has thus stagnated for a decade.



Users have suffered, and unnecessary web spam is now rampant. But there is an alternative.





How is hCaptcha different?

hCaptcha is the only major captcha service not owned by an ad network.

hCaptcha's incentive is to maximize accuracy, not ad revenue: bot answers are useless for our customers, who need human training data.



hCaptcha cares about privacy, transparency, and fairness.

Knowing who an individual is has minimal value for hCaptcha. We only care whether they are human or a malicious actor.



hCaptcha shares the value of work done while users prove their humanity.

Instead of offering a service of questionable value while extracting labor with real value, hCaptcha provides both compensation and strong anti-bot protection.



hCaptcha offers a fair deal to both free (hCaptcha Publisher) and enterprise (hCaptcha Enterprise) customers.

Instead of offering an easily defeated service to free users while charging millions of dollars for an enterprise version that is not much better, hCaptcha provides excellent bot protection to both, while offering enterprise users additional features and customization, along with strong SLAs and rapid support.




Who built hCaptcha?

hCaptcha is a service of Intuition Machines (imachines.com), a machine learning company based in San Francisco that offers products and services for companies implementing ML in their business.



The IM team is composed of scientists and engineers who joined us from Apple, Amazon, Google, Cloudera, and other leaders in machine learning, distributed systems, and security.



IM scientists regularly publish in top conferences like ICML, ICLR, ECCV, and NeurIPS, and the company has publicly released state-of-the-art results in ML disciplines like deep hashing. [8][9]





How is Google using your free labor?

Ever been asked to "click on the car/crosswalk/stoplight?"

You were likely seeing frames of video from cameras on Google's Waymo self-driving cars, and helping to improve their object recognition.



Ever been asked to "type in the building/street sign?"

You were likely seeing pictures of buildings or street signs from Google's Street View cars, and helping to label locations for their Maps and Earth products.



Ever been asked to "type in the two words?"

You were likely seeing pictures of text taken from book pages on Google's Books service, and helping to improve their OCR (text recognition) software.




What kinds of jobs run on hCaptcha?

hCaptcha supports a wide variety of tasks.



hCaptcha customers use the service for tasks including object recognition, attribute detection (collecting labels to identify types of clothing), relevance ranking (picking the most relevant product from several options when compared to a primary product), bounding boxes (finding areas of interest on an image), "Human OCR" (identifying the text in an image) and many other types of tasks.



Below are two examples of image classification and attribute detection job types. [10]




Who are hCaptcha's paying customers?

hCaptcha customers span a variety of industries, from technology companies to those focused on fashion, retail, and other areas where they can create value from applying machine learning.



hCaptcha is committed to democratizing access to high-volume, real-time annotation, letting companies big and small improve the quality of their products and services by applying machine learning to their business.



