Google revamped its reCAPTCHA system, used to block automated scripts from abusing its online services, just hours before a trio of hackers unveiled a free system that defeats the widely used challenge-response tests with more than 99 percent accuracy.

Stiltwalker, as the trio dubbed its proof-of-concept attack, exploits weaknesses in the audio version of reCAPTCHA, which is used by Google, Facebook, Craigslist and some 200,000 other websites to confirm that humans and not scam-bots are creating online accounts. While previous hacks have also used computers to crack the Google-owned CAPTCHA (short for Completely Automated Public Turing test to tell Computers and Humans Apart) system, none have achieved Stiltwalker's impressive success rate.

"The primary thing which makes Stiltwalker stand apart is the accuracy," wrote Adam, one of the three hackers who devised the attack, in an e-mail. "According to the lead researcher from the Carnegie Mellon study, the system we attacked was believed to be 'secure against automatic attack,'" he added, referring to the resume of a Carnegie Mellon University computer scientist credited with designing the audio CAPTCHA.

Stiltwalker's success stems from oversights made by the designers of reCAPTCHA's audio version, combined with clever engineering by the hackers who set out to capitalize on those mistakes. The audio test, which is aimed at visually impaired people who have trouble recognizing obfuscated text, broadcasts six words over a user's computer speaker. To thwart word-recognition systems, reCAPTCHA masks the words with recordings of static-laden radio broadcasts, played backwards, so that the background noise distracts computers but not humans.

What the hackers—identified only as C-P, Adam, and Jeffball—learned from analyzing the sound prints of each test was that the background noise, in sharp contrast to the six words, didn't include sounds that registered at higher frequencies. By plotting the frequencies of each audio test on a spectrogram, the hackers could easily isolate each word by locating the regions where high pitches were mapped. reCAPTCHA was also undermined by its use of just 58 unique words. Although the inflections, pronunciations, and sequences of spoken words varied significantly from test to test, the small corpus of words greatly reduced the work it took a computer to recognize each utterance.
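The idea of isolating words by their high-frequency content can be sketched roughly as follows. This is a minimal illustration in Python, not the hackers' actual code: the function names, the frame length, the cutoff bin, the threshold, and the synthetic test signal are all assumptions made for the example.

```python
import cmath
import math


def high_band_energy(frame, cutoff_bin):
    """Sum of squared DFT magnitudes for bins at or above cutoff_bin."""
    n = len(frame)
    total = 0.0
    for k in range(cutoff_bin, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        total += abs(coeff) ** 2
    return total


def find_word_regions(samples, frame_len=64, cutoff_bin=16, threshold=1.0):
    """Return (start, end) sample ranges whose frames carry high-pitched energy."""
    regions = []
    start = None
    last_end = 0
    for off in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[off:off + frame_len]
        active = high_band_energy(frame, cutoff_bin) > threshold
        if active and start is None:
            start = off                    # a word-like region begins
        elif not active and start is not None:
            regions.append((start, off))   # the region ended at this frame
            start = None
        last_end = off + frame_len
    if start is not None:
        regions.append((start, last_end))
    return regions


def demo_signal(rate=8000, total=1920, word=(640, 1280)):
    """Hypothetical test clip: a low 250 Hz hum, plus a 3 kHz 'word' in the middle."""
    out = []
    for i in range(total):
        s = math.sin(2 * math.pi * 250 * i / rate)
        if word[0] <= i < word[1]:
            s += math.sin(2 * math.pi * 3000 * i / rate)
        out.append(s)
    return out


print(find_word_regions(demo_signal()))
```

On this synthetic clip the low hum never crosses the threshold, so only the stretch containing the 3 kHz tone is reported. Real recordings would need a tuned cutoff and threshold.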

Enter the neural network

With the sounds isolated, the hackers then funneled each word into a battery of mathematical solvers to translate the characteristics of each isolated word into text that would solve the CAPTCHA puzzle. An early version of the attack worked by using the open-source pHash software library to generate a "perceptual hash" of each sound. Unlike cryptographic hashes, which typically produce vastly different digests when even tiny changes are made to the input, perceptual hashes vary only minimally across similar-sounding words. By comparing the perceptual hashes of the collected sounds to a table of hashes, the team could make educated guesses about which words were included in the audio tests. But they ultimately scrapped the technique because its accuracy never exceeded 30 percent.
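The contrast between the two kinds of hash can be illustrated with a toy example. The sketch below does not use the pHash library or its algorithm. `toy_perceptual_hash` is a deliberately crude stand-in invented for this illustration, which records only whether each chunk of a clip is louder than average. MD5 from Python's standard `hashlib` module plays the role of the cryptographic hash.

```python
import hashlib
import struct


def toy_perceptual_hash(samples, bits=16):
    """One bit per chunk: 1 if the chunk's energy exceeds the mean chunk energy."""
    chunk = len(samples) // bits
    energies = [sum(x * x for x in samples[i * chunk:(i + 1) * chunk])
                for i in range(bits)]
    mean = sum(energies) / bits
    return [1 if e > mean else 0 for e in energies]


def hamming(a, b):
    """Number of positions at which two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def md5_hex(samples):
    """Cryptographic digest of the samples packed as 16-bit integers."""
    packed = struct.pack(f"<{len(samples)}h", *samples)
    return hashlib.md5(packed).hexdigest()


# Two hypothetical clips that differ in a single sample.
clip_a = [0] * 40 + [100] * 40 + [0] * 40 + [100] * 40
clip_b = list(clip_a)
clip_b[45] = 101

print(hamming(toy_perceptual_hash(clip_a), toy_perceptual_hash(clip_b)))
print(md5_hex(clip_a) == md5_hex(clip_b))
```

The toy fingerprints of the two clips are identical, while their MD5 digests do not match. A fingerprint this coarse also blurs together clips that merely share a loudness pattern, which hints at why a perceptual hash alone can be a weak classifier.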

The hackers eventually devised a machine-learning algorithm that produced significantly better results. Their neural network was seeded with data from 50,000 reCAPTCHA utterances along with human-generated input for each corresponding word. They then combined the tool with a separate attack that exploited another weakness they discovered in the audio version—namely its habit of repeating the same challenges verbatim in pseudo-random fashion. By using cryptographic hashes to fingerprint 15 million of the estimated 25 million challenges in reCAPTCHA's repertoire, their attack was able to crack most of the tests.
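Because challenges were repeated verbatim, a cryptographic fingerprint of the raw audio is enough to recognize one that has been seen before. The following Python sketch shows the general shape of such a lookup. The `ChallengeCache` class, its method names, and the fallback `slow_solver` callback are hypothetical names chosen for this example, not part of Stiltwalker.

```python
import hashlib


def fingerprint(audio_bytes):
    """MD5 digest of the raw challenge audio, used purely as a lookup key."""
    return hashlib.md5(audio_bytes).hexdigest()


class ChallengeCache:
    """Maps fingerprints of previously seen challenges to their known answers."""

    def __init__(self):
        self._answers = {}

    def remember(self, audio_bytes, answer):
        self._answers[fingerprint(audio_bytes)] = answer

    def solve(self, audio_bytes, slow_solver):
        """Return (answer, was_cached); fall back to slow_solver on a miss."""
        key = fingerprint(audio_bytes)
        if key in self._answers:
            return self._answers[key], True
        return slow_solver(audio_bytes), False


cache = ChallengeCache()
cache.remember(b"fake-audio-1", "boat plate four")  # made-up challenge data

print(cache.solve(b"fake-audio-1", lambda audio: "unknown"))
print(cache.solve(b"fake-audio-2", lambda audio: "unknown"))
```

A hit costs one hash computation and one dictionary lookup, so the slower recognizer only has to run on challenges that have never been seen.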

"The majority of the time, we can look at the challenge and not do any computation at all," Adam said. "It takes less than a second to get an answer with the MD5 solver."

Their attack became all the more effective after they discovered that Google's audio CAPTCHA accepted multiple spellings for many of the challenges based on the approximate phonetic sounds of each word. As a result, an audio test that included the word "boat" could be solved by entering "boat," but it could also be solved by entering "poate." Similarly, a test that included the word "plate" could be solved by entering "plate," but it could also be solved by entering "poate." Tests that included the words "Friday," "fairy," or "four" could also be solved by entering "Friay." By fashioning the same alternate spelling for a variety of different-sounding words, the hackers could pare back the number of guesses required to solve a specific puzzle, a technique crackers call "reducing the keyspace."
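The keyspace reduction can be illustrated with the examples given above. This Python sketch uses only the spellings quoted in this article. The table and function names are assumptions for illustration, and the full set of spellings the hackers used is not given here.

```python
# Alternate spellings reported as accepted for several different words.
ALTERNATE_SPELLING = {
    "boat": "poate",
    "plate": "poate",
    "friday": "Friay",
    "fairy": "Friay",
    "four": "Friay",
}


def canonical_answer(word):
    """Spelling to submit for a word; falls back to the word itself."""
    return ALTERNATE_SPELLING.get(word.lower(), word)


def distinct_answers(words):
    """Set of distinct strings a solver needs to choose among."""
    return {canonical_answer(w) for w in words}


words = ["boat", "plate", "Friday", "fairy", "four"]
print(len(set(words)), "words collapse to", len(distinct_answers(words)), "answers")
```

Five different words collapse into two candidate answers, so a recognizer only has to tell the "poate" group from the "Friay" group rather than distinguish all five.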

In the end, the hackers said their computer-generated attack solved 17,338 out of 17,495 challenges they attempted, a success rate of 99.1 percent. At one point, the attack was able to deduce answers to 847 tests in a row before being tripped up. More details of the hack are here.

The Googleplex strikes back

About two hours before the hackers were scheduled to present the attack on Saturday at the Layer One security conference, Google engineers revamped reCAPTCHA. Suddenly, Stiltwalker, which the hackers had carefully kept under wraps, no longer worked. Adam told me that he has no proof anyone tipped off Google employees, but he doubts the timing was a coincidence.

The updated reCAPTCHA system uses a human voice uttering unintelligible sounds as background noise, making it impossible for Stiltwalker to isolate the distinct words included in each audio challenge. The puzzles have also been expanded from six words to ten words and each challenge lasts 30 seconds, compared with only eight seconds under the previous reCAPTCHA.

A Google spokesman declined to offer specifics of the reCAPTCHA upgrade beyond issuing a statement.

"We took swift action to fix a vulnerability that affected reCAPTCHA," it said, "and we aren’t aware of any abuse that used the techniques discovered. We're continuing to study the vulnerability to prevent similar issues in the future. We've found reCAPTCHA to be far more resilient than other options while also striking a good balance with human usability. Even so, it's good to bear in mind that while CAPTCHAs remain a powerful and effective tool for fighting abuse, they are best used in combination with other security technologies."

While the changes stymied the Stiltwalker attack, Adam said his own experience using the new audio tests leaves him unconvinced that they are a true improvement over the old system.

"I could only get about one of three right," he said. "Their Turing test isn't all that effective if it thinks I'm a robot."

Story updated to correct use of the word histogram.