Machines masquerading as humans have never had it so easy. New artificial intelligence can crack a range of CAPTCHAs — tests commonly used by websites to check if a user is a person or robot — with very little practice.

Inspired by the human brain, a team from an artificial intelligence company in California has developed an algorithm that can be trained to break complex text-based CAPTCHAs, including those used by PayPal and Google.

And it can do so with 50,000 times fewer training examples than current state-of-the-art programs.

"We just have to accept that, as computer vision systems improve, traditional text-based CAPTCHA systems no longer offer the protection they used to," said study co-author Miguel Lazaro Gredilla of Vicarious AI.

Online arms race

The internet is awash with automated programs called bots. Last year, online bot traffic surpassed that of humans.

Some bots are helpful. Search engines employ bots to trawl websites and update links.

Others, not so much. These include spambots, which create email accounts and send spam emails to addresses they've found online, and social bots, which harvest personal information from social media and share malicious links.

To be (human) or not to be: CAPTCHAs such as this are easily solved by machines. (Getty Images: Claudio Divizia)

To keep the bots at bay, some websites incorporate a step to prove you are, indeed, a real person. And in 2000, the 'Completely Automated Public Turing test to tell Computers and Humans Apart', or CAPTCHA, was born.

Humans are naturally good at teasing out digits, letters and symbols, even when they're warped and jumbled together.

But with the advent of machine learning and increased computational power, machines were soon solving those distorted strings of text, rendering the early CAPTCHAs useless.

In response, CAPTCHAs became more complex and trickier for the bots — but also for real people. For example, humans solve Google's complicated reCAPTCHAs, where letters are mashed together or intersected by lines, only 87 per cent of the time.

And now the machines are catching up, commented Adnene Guabtni, a cybersecurity researcher at CSIRO's Data61.

"If something is hard for humans, it's almost impossible for machines. At least, that's what we thought."

Layers of learning

Many current CAPTCHA-cracking algorithms are neural networks — a type of artificial intelligence modelled loosely on the brain.

Neural networks contain tens or hundreds of connected layers of artificial "neurons". When an image is fed into the network it gets converted into data, which cascades through the network.
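To make the idea of data cascading through layers concrete, here is a minimal sketch in Python. It is a generic toy network, not any of the CAPTCHA-cracking systems mentioned here: the four-pixel image, the layer sizes and the weights are all made-up values, and nothing has been trained.

```python
import math

def run_layer(inputs, weights, biases):
    """Each artificial neuron sums its weighted inputs, adds a bias
    and squashes the result into the range 0..1 (a sigmoid)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs

def forward(pixels, layers):
    """Pass the image data through every layer in turn."""
    activations = pixels
    for weights, biases in layers:
        activations = run_layer(activations, weights, biases)
    return activations

# Hypothetical four-pixel image, flattened to a list of brightness values.
PIXELS = [0.0, 1.0, 1.0, 0.0]

# Hypothetical, untrained weights: 4 inputs -> 3 neurons -> 2 outputs.
LAYERS = [
    ([[0.5, -0.2, 0.1, 0.4],
      [-0.3, 0.8, 0.2, -0.1],
      [0.1, 0.1, -0.6, 0.3]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2],
      [-0.4, 0.9, 0.3]], [0.05, -0.05]),
]

SCORES = forward(PIXELS, LAYERS)
print(SCORES)  # one score per output neuron
```

In a real network, training would adjust those weights until the output scores matched the labelled answers, which is where the appetite for training images comes from.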

But neural networks typically need millions of training images, labelled with the correct answer, to be consistently accurate.

The researchers at Vicarious AI, led by Dileep George, decided to steer away from neural networks. They still looked to neuroscience for inspiration, but created a computer model that didn't need so much training.

And although they announced in 2013 that they could defeat CAPTCHAs, it was only today that the team published their results — in the journal Science.

A visualisation of the new algorithm analysing the letter 'A'. (Supplied: Vicarious AI)

At the heart of their method is a focus on "contour continuity": the way the human brain can distinguish the edges of an object, even if that object is partially blocked by another.

The "recursive cortical network" they developed is simpler than a neural network, containing only a few processing layers. The first step analyses pixels to determine if they form part of an object's edge.

These data are processed by the rest of the network and, as they move through the layers, the image's contours coalesce. The final layer produces the object in question, such as the letter A.
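The first step, deciding whether a pixel sits on an object's edge, can be illustrated with a rough Python sketch. This is a textbook-style brightness comparison, not the recursive cortical network's actual method, and the tiny image and the threshold are assumptions chosen for illustration.

```python
# Illustrative only: a generic brightness-difference edge test,
# not the first layer of the recursive cortical network.

def edge_map(image, threshold=0.5):
    """Mark pixels whose brightness differs sharply from the pixel
    to their right or the pixel below them."""
    rows, cols = len(image), len(image[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # At the image border, compare the pixel with itself (no difference).
            right = image[r][c + 1] if c + 1 < cols else image[r][c]
            below = image[r + 1][c] if r + 1 < rows else image[r][c]
            dx = abs(image[r][c] - right)
            dy = abs(image[r][c] - below)
            edges[r][c] = max(dx, dy) >= threshold
    return edges

# Hypothetical 5x5 greyscale image: a bright 3x3 square on a dark background.
IMAGE = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

EDGES = edge_map(IMAGE)
for row in EDGES:
    print("".join("#" if is_edge else "." for is_edge in row))
```

The printout traces the outline of the square and leaves its interior blank. Later layers would then have to link such edge pixels into continuous contours, which is the harder part of the problem.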

And crucially, the new algorithm works even when letters in the CAPTCHA overlap.

Practice makes perfect

A CAPTCHA is considered useless if it can be cracked by a machine more than 1 per cent of the time.

George and colleagues found their new AI was able to beat Google's reCAPTCHA 67 per cent of the time, after having just five training examples per character.

By comparison, some state-of-the-art neural networks needed 50,000 times more training examples to do the same job.

The new algorithm also solved Yahoo and PayPal CAPTCHAs 57.4 per cent and 57.1 per cent of the time, respectively.
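To put those figures in perspective, the short Python sketch below compares each reported solve rate with the 1 per cent benchmark. The numbers are the ones quoted above; the variable and function names are made up for illustration.

```python
# Solve rates quoted in this article, in per cent.
SOLVE_RATES = {"reCAPTCHA": 67.0, "Yahoo": 57.4, "PayPal": 57.1}

# A CAPTCHA counts as broken if machines solve it more than 1 per cent of the time.
BROKEN_THRESHOLD = 1.0

def is_broken(rate, threshold=BROKEN_THRESHOLD):
    """True if the machine solve rate exceeds the benchmark."""
    return rate > threshold

for name, rate in SOLVE_RATES.items():
    multiple = rate / BROKEN_THRESHOLD
    print(f"{name}: {rate}% solved, about {multiple:.0f} times the benchmark, "
          f"broken={is_broken(rate)}")
```

Each of the three sits more than 50 times above the benchmark.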

Next line of defence

This achievement is obviously a concern for cybersecurity, but all is not lost. More advanced CAPTCHAs are already being used, commented Dr Guabtni.

For example, Google now has an "invisible CAPTCHA", which requires a user to click a button.

What seems like a simple task is underpinned by a whole range of clues that tell a website if the clicker is a bot or person.

"From the moment the webpage loads to the click, you're doing things," Dr Guabtni said.

"If the button is to the left of your mouse, you move the cursor. But when you do that, you will wobble a little. That is detected.

"It knows you're human because you have imperfections. A machine would try to go in a straight line."

The website might also examine your browsing history and past interactions to support your case (or not).
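The cursor "wobble" Dr Guabtni describes can be turned into a number. The Python sketch below is a hypothetical heuristic, not Google's actual method, which has not been described here: it compares the straight-line distance between the start and end of a mouse path with the distance the cursor really travelled. The sample paths and the 0.98 cutoff are invented for illustration.

```python
import math

def straightness(points):
    """Ratio of straight-line distance to distance actually travelled.
    1.0 means a perfectly straight path; wobbles push it lower."""
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0  # the cursor never moved
    return math.dist(points[0], points[-1]) / travelled

def looks_human(points, cutoff=0.98):
    """Hypothetical rule: a path that is too straight is treated as machine-like."""
    return straightness(points) < cutoff

# Made-up cursor positions (x, y) heading left towards a button.
BOT_PATH = [(100, 50), (75, 50), (50, 50), (25, 50), (0, 50)]
HUMAN_PATH = [(100, 50), (76, 58), (49, 44), (24, 55), (0, 50)]

print(straightness(BOT_PATH), looks_human(BOT_PATH))
print(straightness(HUMAN_PATH), looks_human(HUMAN_PATH))
```

A single score like this would be easy for a bot to fake by adding random jitter, which is one reason a real system would be expected to weigh many signals together.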

But, Dr Guabtni said, it's only a matter of time before machines crack the invisible CAPTCHA as well.

"I think something hardware-based, like a fingerprint reader, is the best way [to prove you're a person]."

Beyond CAPTCHA

Because the recursive cortical network learns so well with relatively little practice, it will be useful in situations where training images are limited or expensive to collect, according to Sarah Erfani, who specialises in machine learning and data privacy at the University of Melbourne.

"It could be used in medical imaging, where you only have a few samples [of a disease or disorder]," she said.

"From a computer science perspective, it's wonderful."

For researchers at Vicarious AI, the focus is on applying their now-patented algorithms to robotic vision.

"In robotics, we need to manipulate objects that are subject to continuous change, in environments that might not be completely controlled," said co-author of the new Science paper, Dr Lazaro Gredilla.

"We hope that [the recursive cortical network's] ability to learn from very few examples will ... enable robots that learn and adapt faster to changing tasks and environments."