Headlines recently exploded with news that a computer program called Eugene Goostman had become the first to pass the Turing test, a method devised by computing pioneer Alan Turing to objectively prove a computer can think.

The program fooled 33% of 30 judges into thinking it was a 13-year-old Ukrainian boy in a five-minute conversation. How impressive is the result? In a very brief encounter, judges interacted with a program that could be forgiven for not knowing much or speaking very eloquently—in the grand scheme, it’s a fairly low bar.

Chat programs like Eugene Goostman have existed since the 1970s. Though they have advanced over the years, none yet represents the revolutionary step in AI implied by the Turing test. So, if the Eugene Goostman program isn’t exemplary of a radical leap forward, what would constitute such a leap, and how will we know when it happens?

To explore that question, it’s worth looking at what the Turing test actually is and what it’s meant to measure.

In a 1950 paper, “Computing Machinery and Intelligence,” Alan Turing set out to discover how we might answer the question, “Can machines think?” Turing believed the answer would devolve into a semantic debate over the definitions of the words “machine” and “think.” He suggested what he hoped was a more objective test to replace the question.

Turing called it the imitation game. It involved three participants: an interrogator (of either sex), a man, and a woman. The interrogator would try to discover which was the man and which the woman by asking questions. The man would try to fool the interrogator, and the woman would try to help the interrogator. To avoid revealing themselves by physical traits, the subjects and interrogator would ideally communicate by teletype from separate rooms.

Now, Turing said, replace the participant trying to fool the interrogator with a computer. And instead of trying to discover which is a man and which a woman, have the interrogator decide which is human and which a computer.

Turing suggested this test would replace the subjective question, “Can a machine think?” and, later in the paper, suggested how well a computer might play the imitation game at the turn of the 21st century.

“I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning. The original question, ‘Can machines think?’ I believe to be too meaningless to deserve discussion.”

Fooling 30% of the judges after five minutes, then, was Turing’s forecast of computing’s progress by 2000, not concrete criteria for passing his test.

Further, as Ray Kurzweil and Mitchell Kapor note in their Long Now Turing test wager, Turing’s imitation test was “specifically non-specific.”

His paper suggests the general framework for an effective test to objectively measure machine intelligence but leaves the details to evolve as appropriate over later decades.

Sadly, Turing died of cyanide poisoning in 1954 at age 41 (an apparent suicide), two years after being convicted of gross indecency for homosexual acts, then illegal in the UK, and forced to decide between prison and chemical treatment for his “condition.”

Turing’s contributions were monumental beyond his musings on machine intelligence. However, his imitation test has endured, evolved, and, over the years, become widely associated with the objective measurement of advanced AI.

There are a number of variations on the Turing test—variables include the total number of judges in the test, the length of interviews, and the desired bar for a pass (or percent of judges fooled). The tests involve a judge who conducts text interviews (usually by instant message or something similar) with a number of human subjects and a computer.

The goal is still to unmask the computer, and the broad aim of the tests is to show machines have attained mental capability indistinguishable from human beings.
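As a rough illustration of how those variables interact, here is a minimal Python sketch that scores one test session against different pass bars. The class and method names are hypothetical, and reading “33% of 30 judges” as 10 judges out of 30 is an assumption on my part rather than a figure stated above.

```python
from dataclasses import dataclass


@dataclass
class TuringTestResult:
    """Outcome of one hypothetical Turing test session (illustrative only)."""
    judges_total: int   # number of judges who interviewed the program
    judges_fooled: int  # judges who mistook the program for a human

    def fooled_fraction(self) -> float:
        """Share of judges who were fooled, between 0.0 and 1.0."""
        return self.judges_fooled / self.judges_total

    def clears_bar(self, bar: float) -> bool:
        """True if the share of fooled judges meets or exceeds the chosen bar."""
        return self.fooled_fraction() >= bar


# Assumption: "33% of 30 judges" is read as 10 judges out of 30.
goostman = TuringTestResult(judges_total=30, judges_fooled=10)

# Compare the same result against a 30% bar and a 50% bar.
for bar in (0.30, 0.50):
    print(f"bar {bar:.0%}: cleared = {goostman.clears_bar(bar)}")
```

The same session clears a 30% bar and misses a 50% one, which is why the choice of bar, judge count, and interview length matters as much as the program being tested.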

Now, in some areas, computers have already met and surpassed human ability: Deep Blue in chess or Watson at Jeopardy. Computation on silicon is orders of magnitude faster than computation in the brain. Computers excel at brute force number crunching, simulation, and remembering and accessing huge amounts of data.

However, computers don’t have the brain’s aptitude for pattern recognition, adaptation, and the traits associated with them like language, learning, and creativity. These are some of the abilities the Turing test sets out to measure.

But to appear human, a program must also slow its responses, fabricate factual and typographical errors, inject emotional cues (positive and negative) and non-sequiturs. And this is curious. To prove intelligence, why do we require a machine mimic humans in all our strengths and failings—intelligence and ineptitude?

In his paper, Turing mounts a spirited defense against would-be opponents of machine intelligence. But I think the answer to why the Turing test requires that a machine become indistinguishable from a human lies in his reply to the argument from consciousness, in which he quotes British neurologist and neurosurgeon Geoffrey Jefferson:

“Not until a machine can write a sonnet or compose a concerto because of thoughts and emotions felt, and not by the chance fall of symbols, could we agree that machine equals brain—that is, not only write it but know that it had written it.”

In Jefferson’s view, a machine may, through clever artificial means, contrive to create something and report its creation, but it can’t know that it has created anything because it’s no more than a collection of mechanical parts and instructions written by its programmers.

Turing takes Jefferson’s point and applies it to humans too: “According to the most extreme form of this view the only way by which one could be sure that machine thinks is to be the machine and to feel oneself thinking…likewise according to this view the only way to know that a man thinks is to be that particular man.”

And this is, I think, at the heart of what the Turing test can show.

We can’t prove “a machine thinks” any more than we can prove the person next to us thinks. But when one is indistinguishable from the other, then we are allowed to question whether a machine can think only as much as we are allowed to question whether a human can think—and beyond that point, the question can be resolved no further.

Each bar, say fooling 30% or 50% of the judges, should be viewed less as a definitive proof of anything and more as an indicator of progress.

The ultimate Turing test, in my view, won’t take place in a controlled environment but out in the real world: a future scenario in which thousands or millions of ordinary people freely interact with a sufficiently advanced program, like Samantha from the movie Her, and spontaneously begin to treat it like a human companion in nearly every sense.