Each year in Las Vegas, Defcon holds a contest called Capture the Flag. It's a hacking competition: you try to break into their system while they try to break into yours. Just like regular Capture the Flag, there's offense and defense. If your offense is good, you might break through the enemy system before they know what you're up to. If you're a defensive player, you can stop the enemy intrusions early and learn from them, giving your offense a leg up. Of course, hackers are competitive — never more so than at Defcon — so the most important part is that by the end, you know who won and who lost.

Michael Walker has helped design those games, and has even played in them a few times. In 2013, he came to DARPA with a challenge: teach computers how to play.

The result is DARPA's Cyber Grand Challenge, which begins its qualifying challenge today, testing entries from more than 100 teams. Each entry represents a different take on what such a program might look like. The basic task for each program is simple: find ways to break software. Yet there are almost endless ways a given piece of software might break. No one has ever built an automatic bug hunter for a practical purpose like this, and it's not clear how best to do it, or whether it can be done at all. But by the time Defcon convenes next summer, Walker is aiming to have a full crop of those bug-hunting programs, pitted against each other in gladiatorial capture-the-flag combat. The winner gets $2 million, with $1 million and $750,000 for the runners-up.

DARPA has tried this model before, holding Grand Challenges for self-driving cars, humanoid robots, and drivetrain systems. In each case, the goal has been to take an idea with lots of theoretical research and tip it into the realm of the practical. To do that, DARPA dangles a big cash prize and counts on lots of entries. The contest is open to anyone, with small shops and academic teams competing alongside major corporations. Some get funding assistance from DARPA, while others come at it from the outside, and together they generate as many different approaches to the problem as possible.

In the case of the self-driving car, it worked. After some early false starts, Sebastian Thrun's Stanford team emerged as the hero of the later challenges and went on to form the core of Google's self-driving program. Walker's hope is that automated security research could be near a similar tipping point. "Program analysis, software that studies other software, is right now where vehicle automation was then," Walker says. "They can solve lots of interesting problems in the lab, but it's never been done under practical applications and it's never been done as an autonomous system."

Walker's goal is arguably even harder than the self-driving car. Instead of building software that can walk or steer, his entries will have to answer the fundamental question of security research: how might this code fail? It's a hard question, one that occupies an entire industry full of very smart people and still only manages to put a small dent in the many ways a program can fail. It can take years for a vulnerability to surface, even in widely used software like Windows. Once a researcher finds a vulnerability, building and applying the patch can take months. In high-profile cases like the Heartbleed vulnerability, the result is often a real scramble, as criminals rush to exploit the bug before admins have time to patch it.

Bug-hunting software would act faster, potentially patching vulnerabilities as soon as it sees them being exploited. The Cyber Challenge is testing out the most elemental form of the idea, but if the test models become practical, it would be a pivotal change for the security profession, which currently assumes that any widely used software has vulnerabilities we don't know about. Computability theory dictates that the programs won't be able to find every vulnerability, but just outrunning human researchers and speeding up the patch cycle would be enough to fundamentally change the way software works. "It is utterly disruptive to the way we think about computer security," Walker says. "Right now we're worried about you clicking the wrong link, or knowing about that command and control server as a threat indicator, but we've given up on the software safety part of it. It's considered an unsolvable problem."

For now, Walker is most concerned with showing the idea can work at all. The entries submitted today will be run against a suite of test software, with the best entries receiving funding from DARPA. The funded teams will compete against an open field in a series of challenges leading up to Defcon 2016, where the finalists will go head to head, using high-powered computers to show off their programs in front of a live audience. To test out the programs, Walker's team is providing a brand new binary executable format and 100 new pieces of software. Each one comes with a clear task and a clear success state; the attacker’s job is to make it fail. That means any vulnerabilities will be completely new and useless for attacks on existing software. As Walker put it, "We needed a desert to play in."

The nature of Capture the Flag means that the programs will only compete against each other, so it will be easy for later generations of software to adapt and improve. Most importantly, if DARPA wants to put a program head to head against a human some day for a Deep Blue versus Kasparov moment, we'll know exactly how to do it. But for now, the big question isn't who will win, but whether anyone can prove out the idea itself.

Even that may be harder than it looks. In the case of self-driving cars, the first challenge was disastrous: none of the cars made it even a 10th of the way through the desert route and the $1 million prize went unclaimed. Just over a decade later, the self-driving car is one of the most sought-after projects in tech. In the end, the first round of projects may not be the best way to judge the idea. "What happens after an initial prototyping contest where we go from impossible to possible? Universally what happens is, everybody tears everything down and starts over," Walker says. "This is how technology development works."