At the Black Hat cybersecurity conference in 2014, industry luminary Dan Geer, fed up with the prevalence of vulnerabilities in digital code, made a modest proposal: Software companies should either make their products open source so buyers can see what they’re getting and tweak what they don’t like, or suffer the consequences if their software failed. He likened it to the ancient Code of Hammurabi, which says that if a builder poorly constructs a house and the house collapses and kills its owner, the builder should be put to death. No one is suggesting putting sloppy programmers to death, but holding software companies liable for defective programs, and nullifying licensing clauses that have effectively disclaimed such liability, may make sense, given the increasing prevalence of online breaches. The only problem with Geer’s scheme is that no formal metrics existed in 2014 for assessing the security of software or distinguishing between code that is merely bad and code that is negligently bad. Now, that may change, thanks to a new venture from another cybersecurity legend, Peiter Zatko, known more commonly by his hacker handle “Mudge.” Mudge and his wife, Sarah, a former NSA mathematician, have developed a first-of-its-kind method for testing and scoring the security of software — a method inspired partly by Underwriters Laboratories, that century-old entity responsible for the familiar circled UL seal that tells you your toaster and hair dryer have been tested for safety and won’t burst into flames. Called the Cyber Independent Testing Lab, the Zatkos’ operation won’t tell you if your software is literally incendiary, but it will give you a way to comparison-shop browsers, applications, and antivirus products according to how hardened they are against attack. It may also push software makers to improve their code to avoid a low score and remain competitive. 
“There are applications out there that really do demonstrate good [security] hygiene … and the vast majority are somewhere else on the continuum from moderate to atrocious,” Peiter Zatko says. “But the nice thing is that now you can actually see where the software package lives on that continuum.” Joshua Corman, founder of I Am the Cavalry, a group aimed at improving the security of software in critical devices like cars and medical devices, and head of the Cyber Statecraft Initiative for the Atlantic Council, says the public is in sore need of data that can help people assess the security of software products. “Markets do well when an informed buyer can make an informed risk decision, and right now there is incredibly scant transparency in the buyer’s realm,” he says. Corman cautions, however, that the Zatkos’ system is not comprehensive, and although it will provide one indicator of security risk, it’s not a conclusive indicator. He also says vendors are going to hate it. “I have scars to show how much the software industry resists scrutiny,” he says.

Photo: Cole C Wilson

Software Seal of Approval

When Mudge announced on Twitter last year that the White House had asked him to create a cyber version of Underwriters Laboratories, praise poured in from around the security community. No one knew the details, but people were confident that if he was involved, it would be great. “Excellent! Something everyone has talked about for decades!” the Def Con hacker conference tweeted after his announcement. “That’s a concept that really could make a difference if executed well,” wrote Bruce Potter, founder of the Shmoo Group crypto-security collective, which runs the annual Shmoocon security conference.

Mudge has been tight-lipped about the nature of the cyber UL ever since, but he agreed to discuss the details in advance of a talk he’s presenting next week at the Black Hat conference in Las Vegas.

He says the method their lab uses to evaluate software is based on one he taught NSA hackers in the 1990s for finding the softest targets on an adversary’s network. (During his run back then with the famed hacker think tank L0pht Heavy Industries, Mudge and his L0pht colleagues regularly provided advice to various parts of the government.) The technique involves, in part, analyzing binary software files using algorithms created by Sarah to measure the security hygiene of code. During this sort of examination, known as “static analysis” because it involves looking at code without executing it, the lab is not looking for specific vulnerabilities, but rather for signs that developers employed defensive coding methods to build armor into their code. “To use the car analogy, does it have seatbelts, does it have air bags, does it have anti-lock brakes? All the things that are going to make [a hacker’s] life more difficult,” Mudge says. The Zatkos say a program’s security hygiene, measured by the programming methods developers use as well as by the tools and settings used to compile the resulting software, is a good predictor of whether an application will have serious security vulnerabilities and reliability issues.

Their algorithms run through a checklist of more than 300 items, such as whether the compiler used to convert the source code into binary inserted common protective features, like preventing portions of memory reserved for program data (the “stack” and “heap”) from being used to hold executable code. “Things like ASLR [address space layout randomization] and having a nonexecutable stack and heap and stuff like that, those are all determined by how you compiled [the source code],” says Sarah. “Those are the technologies that are really the equivalent of airbags or anti-lock brakes [in cars]. They’re the things that make software better than it used to be.”

Modern compilers for Linux and OS X not only add protective features, they also automatically swap out bad functions in code for safer equivalents when available. Yet some companies still use old compilers that lack security features. The lab’s initial research has found that Microsoft’s Office suite for OS X, for example, is missing fundamental security settings because the company is using a decade-old development environment to build it, despite using a modern and secure one to build its own operating system, Mudge says. Industrial control system software, used in critical infrastructure environments like power plants and water treatment facilities, is also primarily compiled on “ancient compilers” that either don’t have modern protective measures or don’t have them turned on by default.

Asked about the findings, a Microsoft spokesperson would only say, “We are focused on security as a core component in the software development process. We developed and are committed to the Security Development Lifecycle, and continue to lead the industry in creating the most secure products across all platforms.”

The Zatkos’ algorithms also assess the number of branches in a program; more branches mean more complexity and more potential for error.
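As a rough illustration of what a compile-time check of this kind might look like (this is not the lab's own tooling), the sketch below reads the header of a 64-bit little-endian Linux ELF binary and reports two things: whether it was built as a position-independent executable, which full ASLR depends on, and whether its stack is marked nonexecutable. The function name `check_elf_hardening` is hypothetical, and the two checks stand in for a much longer checklist.

```python
import struct

ET_DYN = 3                  # ELF type used by position-independent executables (and shared objects)
PT_GNU_STACK = 0x6474E551   # program header that carries the stack permissions
PF_X = 0x1                  # "executable" permission bit


def check_elf_hardening(data: bytes) -> dict:
    """Report two hardening properties of a 64-bit little-endian ELF image.

    Illustrative only: a real assessment would inspect many more properties.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")

    # ELF64 file header: 16-byte ident followed by fixed-width fields (64 bytes total).
    (_ident, e_type, _machine, _version, _entry, e_phoff, _shoff, _flags,
     _ehsize, e_phentsize, e_phnum, _shentsize, _shnum, _shstrndx) = \
        struct.unpack_from("<16sHHIQQQIHHHHHH", data, 0)

    # Walk the program headers; each entry starts with p_type and p_flags.
    nx_stack = False  # assumption: treat a missing PT_GNU_STACK entry as "not hardened"
    for i in range(e_phnum):
        p_type, p_flags = struct.unpack_from("<II", data, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            nx_stack = not (p_flags & PF_X)

    return {"pie": e_type == ET_DYN, "nx_stack": nx_stack}
```

To try it on a real binary, pass the file's bytes, for example `check_elf_hardening(open("/bin/ls", "rb").read())`. Treating a missing stack header as a failure is a conservative assumption made for this sketch; actual behavior depends on the platform.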
And they look at the presence of complex algorithms that could be susceptible to algorithmic complexity attacks. The lab is also looking at the number of external software libraries a program calls on and the processes it uses to call them. Such libraries make life more convenient for programmers, because they allow them to repurpose useful functions written by other coders, but they also increase the amount of potentially vulnerable code, increasing what security experts refer to as the “attack surface.” There are about 200 specific external library calls, Mudge says, that are particularly difficult to implement in a manner that ensures a given program executes safely.
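The library-call check described above can be pictured as a simple lookup. The sketch below is an assumption-laden illustration, not the lab's method: the short `RISKY_CALLS` set holds a handful of C functions widely regarded as easy to misuse, whereas the roughly 200 calls Mudge refers to are not enumerated here, and `flag_risky_imports` is a hypothetical name.

```python
# Illustrative subset only: a few C library functions commonly cited as
# hard to use safely. This is NOT the lab's list.
RISKY_CALLS = {"gets", "strcpy", "strcat", "sprintf", "vsprintf"}


def flag_risky_imports(imported_symbols):
    """Return the sorted, de-duplicated risky calls among a program's imports."""
    return sorted(set(imported_symbols) & RISKY_CALLS)


# Example: symbol names as they might be read from a binary's import table.
found = flag_risky_imports(["printf", "strcpy", "malloc", "gets", "strcpy"])
print(found)
```

The presence of one of these names does not prove a vulnerability; it only marks a spot where safe use takes extra care, which is why such a count works as one input to a score and not as a verdict.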

The process they use to evaluate software allows them to easily compare and contrast similar programs. When the lab looked at three browsers, for example (Chrome, Safari, and Firefox), Chrome came out on top, with Firefox on the bottom. Google’s Chrome developers not only used a modern build environment and enabled all the default security settings they could, Mudge says, they went “above and beyond in making things even more robust.” Firefox, by contrast, “had turned off [ASLR], one of the fundamental safety features in their compilation.” Mudge worked for Google previously, so some might accuse him of bias, but he says the lab’s algorithms, which have been vetted by an outside technical board, ensure that the automated assessments aren’t biased.

Software vendors will no doubt object to the methods the lab uses to score their code, arguing that the use of risky libraries and old compilers doesn’t mean the vendors’ programs have actual vulnerabilities. But Sarah disagrees. “If they get a really good score, we’re not saying there are no vulnerabilities,” says Sarah. But if they get a really low score, “we can guarantee that … they’re doing so many things wrong that there are vulnerabilities [in their code].”

The lab aims to prove such vulnerabilities with the second part of its testing regimen, which uses fuzzing, a method that involves throwing a lot of data at a program to see if it crashes or does something else it shouldn’t do. “In actually executing it and crashing it, we’re confirming that, yes, this thing has bugs, this thing crashed,” Mudge says. “We were able to give it input and it behaved abhorrently.” Not all crashes indicate the presence of a bug that hackers can exploit, but they do, at a minimum, indicate that a program may be unreliable for users. In the lab reports the Zatkos plan to make available to the public, they will note which crashes they found were potentially exploitable.
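For readers unfamiliar with fuzzing, the idea fits in a few lines. The sketch below is a toy, not the lab's fuzzer: `parse_record` is a made-up target with a deliberate bug (it trusts a length byte supplied in the input), and `fuzz` simply feeds it random byte strings and records every input that makes it fail.

```python
import random


def parse_record(data: bytes):
    """Hypothetical toy target: first byte is a length, then payload, then a checksum byte."""
    length = data[0]                 # fails on empty input
    payload = data[1:1 + length]
    checksum = data[1 + length]      # BUG: trusts the length field and may read past the end
    return payload, checksum


def fuzz(target, iterations=1000, max_len=8, seed=0):
    """Throw random inputs at `target` and collect those that raise an exception."""
    rng = random.Random(seed)        # fixed seed so a run can be repeated
    crashes = []
    for _ in range(iterations):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except Exception as exc:     # in this toy, any exception counts as a "crash"
            crashes.append((data, type(exc).__name__))
    return crashes


crashes = fuzz(parse_record)
print(f"{len(crashes)} crashing inputs found")
```

Production fuzzers are far more sophisticated, typically mutating valid inputs and using feedback from the running program, and deciding whether a given crash is exploitable takes separate analysis, as the Zatkos note.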
The Zatkos don’t plan to fuzz every program, only enough to show a direct correlation between programs that score low in their algorithmic code analysis and ones shown by fuzzing to have actual flaws. They want to be able to say with 90 percent accuracy that one is indicative of the other.

Mudge’s Storied Hacking History

Mudge has a long history in the hacker and security communities. While he was a member of L0pht, he and his colleagues testified to federal lawmakers in 1998 that the group could bring down the internet in 30 minutes using a serious flaw that still exists.
