Teaching an Old Dog New Tricks

The Problem is Complexity

My first experience with software quality was in 1976, when I sat in front of an ASR-33 and laboriously typed 2 pages of BASIC code from David Ahl's "Creative Computing" into my high school's Hewlett-Packard 21MX. Supposedly it would let me simulate a lunar landing, but I discovered when I told it to "RUN" that the colons scattered all over the listing were actually important to the correct functioning of the program. Oops! I wish I could say that in that moment I achieved some kind of enlightenment about code quality - but instead I think I learned what most programmers learn - fiddle with it until it stops complaining. Then, not much later, I learned that a program that's not complaining still may not be working right. If I recall correctly, my lunar landing simulation went off into an infinite loop, and I've been hooked on computing ever since.

(This book got me hooked)

A lot has changed since my first year as a programmer, but two things have changed hardly at all:

The more complicated the program is, the harder it is to get it right.

It's really hard to tell the difference between a program that works and one that just appears to work.

Like a lot of programmers, I jumped into coding without even knowing what a "debugger" was. When I first encountered a program designed to help you write programs, it was like a big light-bulb going on in my head. By then, like most programmers, I had considerable practice at debugging using "PRINT" statements, although I had graduated to "printf( )" by that point. Even the early version of "adb(1)" was a huge step in the right direction.

Old Tricks: Saber-C

Fast-forward a few years and I encountered my next big paradigm-shifting program to help me write programs: Saber-C.(1) At that time I was a presales support consultant for DEC and was constantly writing small bits of code for customers and internal users. DEC offered Saber-C on ULTRIX, and I decided to play with it because it was described to me as a "kind of super duper debugger." It turned out that Saber-C was a C language interpreter, which was fantastic because the run-time environment was fully simulated rather than simply being allowed to run, then monitored as you'd get in a debugger. So, if you allocated an int, and tried to use it as a char *, you got an error. Since it tracked the allocated size of a memory object as well as its type, you'd get a warning if you accessed off the end of an array, or walked a structure member in a structure you had just freed. I don't know how many times I've seen code where someone is freeing a linked list like this:

    struct listelem {
        int stuff;
        struct listelem *next;
    };

    /* simplified to make it obvious what I am doing wrong */
    freelist(struct listelem *lp)
    {
        while(lp != (struct listelem *)0) {
            free(lp);
            lp = lp->next;
        }
    }

That piece of code is especially pernicious since it'll work almost all of the time - until you run it on a weird architecture or with a memory allocator that compacts the freed space in a way that changes the contents of lp->next after the call to free(). I could show you the countless scars - literally, the death of a thousand cuts - that I've suffered from this kind of minor sloppiness.
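For contrast, the usual way to avoid touching freed memory in the loop above is to read the next pointer before releasing the node. This is only a minimal sketch, not code from the FWTK; the push_front() helper and the returned count are my own additions so that the example stands on its own.

```c
#include <stdlib.h>

struct listelem {
    int stuff;
    struct listelem *next;
};

/* Hypothetical helper: prepend a node and return the new head.
   On allocation failure the list is returned unchanged. */
struct listelem *push_front(struct listelem *head, int stuff)
{
    struct listelem *lp = malloc(sizeof(*lp));
    if (lp == NULL)
        return head;
    lp->stuff = stuff;
    lp->next = head;
    return lp;
}

/* Free every node, saving the successor BEFORE the node is released.
   Returns the number of nodes freed. */
size_t freelist(struct listelem *lp)
{
    size_t freed = 0;
    while (lp != NULL) {
        struct listelem *next = lp->next; /* read while lp is still valid */
        free(lp);
        lp = next;
        freed++;
    }
    return freed;
}
```

The only real change is the temporary variable: once free(lp) has been called, nothing dereferences lp again, so the behavior no longer depends on what the allocator does with the released block.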

For me, using Saber-C was an eye-opener. It gave me a whole new approach to development, since I could use the interpreter to directly call functions from a command line, without having to write a test-harness with a main( ) routine and controlled inputs and outputs. My test-harnesses were just a file of direct calls to the function I was writing, which I could feed directly into the interpreter with a mouse-click. Because I could do that without going through a compile/link/debug cycle, my code-creation sped up dramatically and I was catching bugs in "real-time" as I wrote each block of code. After a little while, I think I can safely say that the quality of my code skyrocketed.

When I left Digital and went to Trusted Information Systems, I got involved in developing an internet firewall for The White House, as a research project under DARPA. As technical lead of that project, I had a small budget and used it to buy a copy of Saber-C, to serve as my main development environment. I wrote, debugged, and tuned the TIS Firewall Toolkit (FWTK)(2) entirely under Saber-C. The resulting code was remarkably robust and stable, though it eventually succumbed to feature-creep and the resulting code-rot as an increasing number of programmers had their hands in the code-base.

What can I say I learned from my Saber-C experience? First off, that there is no excuse for writing unreliable software. Humility - and an acceptance that you can and do make mistakes - is the key to learning how to program defensively. Programming defensively means bringing your testing process as close as possible to your coding so that you don't have time to make one mistake and move on to another one. I also learned that code that I thought was rock-solid was actually chock-full of unnoticed runtime errors: it worked right 99% of the time, or was reliable on one architecture but not another. I used to use Saber-C as my secret weapon to convince my friends I had sold my soul to The Devil: whenever they were dealing with a weird memory leak or a wild pointer that was making their programs crash with a corrupted stack or mangled free list, Saber-C could usually pinpoint the problem in a single pass. I don't write as much software as I used to (not by a long shot) but I keep an old Sparc Ultra-5 with Saber-C in my office rack for when I need it.

New Tricks: Fortify

As we've all discovered in the last decade, it's not enough for code to simply work correctly, anymore. Today's software has to work correctly in the face of a high level of deliberate attack from "security researchers" (3) or hackers eagerly attempting to count coup by finding a way to penetrate the software. Reliability tools like Saber-C help produce code that is relatively free of run-time errors, but the run-time testing performed by the typical programmer does not take into account the kind of tricks hackers are likely to attempt against the software once it is in the field. I know that I, personally, am guilty of this: when I wrote input routines for processing (for example) a user login, I worried about what would happen if the line was too long, or if a field was missing, or quotes didn't match - and that was about it. The recent history of internet security shows that most programmers take the same approach: worry about getting things as right as you can based on the threats you know about, then cross your fingers and ship the software. Most programmers who are aware that security is a consideration will learn maxims like "don't use strcat( )" and might use snprintf( ) instead of sprintf( ), but the environment is constantly changing, and so are the rules - it is impossible to keep up.



(How not to ship code that blows up elegantly in your customer's face)

As you can see above, people are working on dealing with detecting some of these flaws at run-time. Obviously, I'm a big fan of run-time error detection, but I think it should be done while the code is being developed, not while it's being run by the user. This error happened, as I was writing this, when QuickTime Player didn't like something in the music I was listening to. This lucky accident is a case in point of "better than nothing, but still half-assed."

Three years ago I joined a technical advisory board (TAB) for a company called Fortify, that produces a suite of software security tools. One of the big advantages of being a TAB member for a company is that you can usually mooch a license for their software if you want a chance to play with it. As part of another project I'm involved in, I am leading development of a website that is being coded in JSP - since security is always a concern, I wanted to be able to convince our engineers to use a code security tool. So I asked Fortify for an evaluation copy of their latest version, planning on running some code through it to see how well it worked.

Fortify's tools are built around a source code analyzer that renders a variety of programming languages (Java, C, ...) into an intermediate form which is then processed with a set of algorithms that attempt to identify and flag dangerous coding constructs, possible input problems, and so forth. For example, one of the approaches Fortify uses is similar to the "tainting" system that can be found in the Perl programming language: as an input enters the system it is tracked through the code-flow. Places where that input is used to compose other pieces of data are examined, and a flag is raised if the "tainted" data might find its way into a composed command such as an SQL call or shell escape ("injection attack"). In C programs, tainted data is tracked by size to verify that it is not copied into a memory area that is too small for it ("buffer overflow attack"). Later on, I'll show you what that looks like, when Fortify correctly discovered a potential buffer overflow in some of my code. This is incredibly useful because of the prevalence of buffer overflows and injection attacks. But the problem, really, is that there are too many attack paradigms in play today for any programmer to keep track of. Security specialists, even, have a hard time keeping track of all the ways in which code can be abused - it's just too much to expect a "typical programmer on a deadline" to be able to effectively manage.
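To make the "tainting" idea concrete, here is a small sketch of the kind of source-to-sink flow that such an analysis looks for, together with a check that cuts the flow. It says nothing about how Fortify works internally; the function names, the ping command, and the whitelist rules are all hypothetical and chosen only for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the untrusted string looks like a plain host name:
   non-empty, not starting with '-', and made only of letters, digits,
   '.' and '-'. Returns 0 otherwise. Deliberately strict, not complete. */
int is_plain_hostname(const char *s)
{
    size_t i;
    if (s == NULL || s[0] == '\0' || s[0] == '-')
        return 0;
    for (i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isalnum(c) && c != '.' && c != '-')
            return 0;
    }
    return 1;
}

/* Composes "ping -c 1 <host>" into cmd. "host" is the tainted value: it
   stands for data that arrived from the network. The composed command is
   the sink. Returns 0 on success, -1 if the input is rejected or the
   result would not fit in cmd. */
int build_ping_command(char *cmd, size_t cmdsz, const char *host)
{
    int n;
    if (!is_plain_hostname(host))
        return -1;                 /* tainted data never reaches the sink */
    n = snprintf(cmd, cmdsz, "ping -c 1 %s", host);
    if (n < 0 || (size_t)n >= cmdsz)
        return -1;                 /* refuse to hand back a truncated command */
    return 0;
}
```

Without the is_plain_hostname() check, a host value such as "example.com; rm -rf /" would flow straight into a string destined for a shell, which is the injection pattern described above. The size check on the snprintf() result covers the second concern, data that is too large for its destination.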

After a bit of thinking, I cooked up the idea of taking Fortify's toolset and running it against my old Firewall Toolkit (FWTK) code from 1994, to see if my code was as good as I thought it was. As it happens, that's a pretty good test, because there are a couple of problems in the first version of the code that I already knew about - it would be interesting to see if Fortify was able to find them.

Building Code With Fortify Source Code Analyzer (SCA)

The source code analyzer acts as a wrapper around the system's compiler. In this case, since I was working with C code, I was using it as a front-end ahead of gcc.



(Running code through sourceanalyzer)

Before I could get my old code to build on the version of Linux I was using, I had to fix a bunch of old-school UNIX function calls that had been obsoleted. Mostly that meant changing old dbm calls to use gdbm instead, and replacing crypt( ) with a stub function. I also removed the X-Windows gateway proxy from the build process because I don't install X on my systems and didn't have all the necessary header files and libraries. It turns out that, if you are reasonably careful about how you pass the $(CC) compiler variable into your makefiles, you can run the code through sourceanalyzer without having to alter your build process at all.

This is a crucial point, for me. I think that one major reason developers initially resist the idea of using a code checker is because they're horrified by the possibility that it will make their build process more difficult. That's a serious consideration, when you consider how the endlessly varied flavors of UNIX have become subtly incompatible, and how smart/complex that has made the typical build process. I admit that my original decision to use FWTK had something to do with my comfort in knowing it has a very minimalist build process and relatively few moving parts. As it turned out, adding sourceanalyzer to the FWTK build (see above) required absolutely no changes at all; I simply passed a new version of CC on the command line, e.g.:

make CC='sourceanalyzer -b fwtk gcc'

When you run the code through the source analyzer, you provide a build identifier (in this case -b fwtk) that Fortify uses later when it's time to assemble the analyzed source code into a complete model for security checking. Running the source analyzer ahead of the compilation process slowed things down a bit, but not enough to bother measuring. My guess is that for a large piece of software it might add a significant delay - but, remember, you're not going to do a security analysis in every one of your compile/link/debug cycles.



(Performing an analysis run with sourceanalyzer)

Once you've run the code through sourceanalyzer, it's time to perform the full analysis. Invoking sourceanalyzer again with the "-scan" option pulls the collected scan results together and does a vulnerability analysis pass against them. In the example above, I ran it with flags to create an output file in "FVDL" - Fortify's Vulnerability Description Language, an XML dialect. Fortify supports several different data formats for the analysis results; I used the XML in this example because I wanted to be able to look at it. Like most XML, it's extremely verbose; the preferred analysis format is something called "FPR" - in this example performance was not my objective.

Running the scan process is a lot more intensive than the first pass. I was not running on beefy hardware (1.2 GHz Celeron with 1 GB of RAM running Linux) and the analysis took several minutes to complete. Earlier, before I switched to the machine with 1 GB of RAM, I was running on an older server with a 500 MHz processor and 256 MB of RAM - Fortify warned me that "performance would suffer" and it wasn't kidding! If you're going to run Fortify as a part of your production process, you should make sure that your analyst's workstation has plenty of memory.

One thing that impressed me about how Fortify's system works is that it's very environment-agnostic. The fact that I didn't have X-Windows on my Linux box didn't matter; I was able to generate the analysis using command line incantations that took me all of five minutes to figure out. I have to confess at this point that I barely read the documentation; I was able to rely almost completely on a "quick walkthrough" example. After the FPR/FVDL analysis file was produced, I fired up the Audit Workbench (Fortify's graphical user interface) on my Windows workstation, and accessed the results across my LAN from a samba-shared directory. I was pleasantly surprised that, not only did it work flawlessly, Audit Workbench asked me if I wanted to specify the root directory for my source tree because the files appeared to be remote-mounted. I ran all my analysis across the LAN and everything worked smoothly.



(Summary of the FWTK analysis)

When you open the FPR/FVDL with Audit Workbench you are presented with a summary of findings, broken down in terms of severity and by categories of potential problems such as Buffer Overflow, Resource Injection, Log Forging, etc. You are then invited to suppress categories of problems.



(Turning off different analysis issues)

I thought that being able to suppress categories was very cool. For example, on the FWTK code I was assuming that the software was running on a secured platform. One of the options for category suppression is "file system inputs" - which disables the warnings for places where the software's input comes from a configuration file instead of over a network. Turning this off greatly reduced the number of "Hot" warnings in the FWTK - it turns out there were a few (ahem!) small problems with quote-termination in my configuration file parsing routines. If someone was on the firewall and able to modify the configuration file, the entire system's security is already compromised - so I turned off the file system inputs and environment variable inputs.



(Audit Workbench in action)

The screenshot above is the main Audit Workbench interface. Along the upper left side is a severity selector, which allows you to focus only on Hot, Warning, etc., issues. Within the upper left pane is a tree diagram of the currently chosen issue and where it occurs in the code, with a call flow below. I found this view to be extremely useful because it did a great deal to jog my memory of inter-function calling dependencies in code I'd written over fifteen years ago. If you look closely at the screen-shot above you'll see that one of the candidate buffer overflows in the FTP proxy had to do with where the user's login name is parsed out using a routine called enargv(). I distinctly recall that routine as being fairly complicated but, more importantly, I know I didn't write it with the idea that it might be subjected to nasty quote-balancing games from an attacker. When you click on the lines in the small display, it pops you to the source code in question, so you can see exactly what is going on. In this example, I'm taking a string (the user's login name that they provided to the proxy) and tokenizing it from their input using enargv( ), sprintf()ing it into a buffer then handing it to syslog(). Ouch! As it turns out with closer inspection, there were some controls on the length that the value of authuser could reach, at that point, but it's definitely dodgy code that deserved review.

Let's follow the code-path and you'll see what I mean. It's a good example of how unexpected interactions between your code and your own library routines can get you into trouble. The problem starts in ftp-gw.c:

ftp-gw.c: usercmd() (code removed for this example)

    char buf[BSIZ]; /* BSIZ is 2048 */
    char tokbuf[BSIZ];
    char mbuf[512];
    char *tokav[56];

    /* getline called to read remote user's command input */
    if((x = getline(0,(unsigned char *)buf,sizeof(buf) - 1)) < 0)
        return(1);
    if(buf[0] == '\0')
        return(sayn(0,badcmd,sizeof(badcmd)-1));

    tokac = enargv(buf,tokav,56,tokbuf,sizeof(tokbuf));
    if(tokac <= 0)
        return(sayn(0,badcmd,sizeof(badcmd)-1));

So far, so good. Fortify has identified that some input is coming in on a socket, through its analysis of getline(), and is tracking the use of that input as it flows through the code. It's also performing analysis to track the sizes of the data objects as they are used. In this example, that's my problem. I won't walk you through all the code of enargv() but it's a string tokenizer that "understands" quotes and whitespace and builds an argc/argv-style array of tokens. It's pretty good about checking for overflows, but if invoked with the correct input, in this case, enargv() could be coerced into returning a string just one character shorter than BSIZ, which is 2048 bytes. And that's where the code analyzer flags that this potentially large blob of data is getting used in some risky ways:

ftp-gw: line 679 (where the audit workbench identified "multiple issues") cmd_user()

    char buf[1024];
    char mbuf[512];

    /* some processing done in which "dest" is plucked out of the
       tokav/tokac set that was parsed earlier with enargv() ... then: */
    {
        sprintf(mbuf,"Permission denied for user %.100s to connect to %.512s",authuser,dest);
        syslog(LLEV,"deny host=%.512s/%.20s connect to %.512s user=%.100s",rladdr,riaddr,dest,authuser);
        say(0,mbuf);
        return(1);
    }

Ouch!! I am trying to stuff a string that is potentially up to 2048 characters into mbuf which is 512. What was my mistake? Simple: I relied on a function that I had written years before, and had forgotten that it would potentially return strings that were as large as the original input. This kind of mistake happens all the time. I'm not making excuses for my own bad code - the point is that it's absolutely vital to go through your code looking for this kind of mistake. I know that, if you had asked me at the time when I wrote it, I would have sworn that "It was written with insane care; there are no unchecked inputs!" And I'd have been wrong. This was a fairly subtle mistake that I overlooked for 3 years while I was actively working on the code, and Fortify found it immediately.
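It is worth doing the arithmetic on the sprintf() call in the excerpt. The precision limits cap authuser at 100 characters and dest at 512, and the fixed text adds another 42, so the finished message can need 654 characters plus the terminating NUL, all aimed at a 512-byte mbuf. One conventional repair is to let snprintf() enforce the size of the destination. The sketch below is not a patch from the FWTK; format_denial() and denial_length() are hypothetical helpers that reuse the original format string.

```c
#include <stdio.h>
#include <string.h>

#define MBUF_SIZE 512 /* same size as mbuf in the excerpt */

/* Hypothetical helper: formats the denial message into out and never writes
   more than outsz bytes. Returns 0 if the whole message fit, 1 if it was
   truncated, -1 on a formatting error. */
int format_denial(char *out, size_t outsz, const char *authuser, const char *dest)
{
    int n = snprintf(out, outsz,
                     "Permission denied for user %.100s to connect to %.512s",
                     authuser, dest);
    if (n < 0)
        return -1;
    return ((size_t)n >= outsz) ? 1 : 0;
}

/* Length the unbounded sprintf() call would have produced, not counting
   the terminating NUL. snprintf(NULL, 0, ...) only measures. */
int denial_length(const char *authuser, const char *dest)
{
    return snprintf(NULL, 0,
                    "Permission denied for user %.100s to connect to %.512s",
                    authuser, dest);
}
```

With format_denial(mbuf, sizeof(mbuf), authuser, dest) the worst case is a truncated message rather than a smashed stack, and the return value tells the caller when truncation happened. The size check also sits right next to the buffer it protects, instead of depending on what an earlier routine did or did not limit.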

My over-reliance on syslog() also turned out to be a problem a number of years after the FWTK was released. One version of UNIX (that shall remain nameless) had a syslog() function with an undersized fixed-length buffer that it used to construct the log message's date/time stamp and the message. Since the FWTK code tended to push lengthy messages containing (among other things) host DNS names, an attacker could craft a buffer overrun and push it through the FWTK's overeager logging. This is another place where an automated tool is useful: if the tool's knowledge-base contained the information that sending syslog() messages longer than 1024 bytes was risky on some operating systems, the code analyzer would have checked all my syslog() function calls for inputs longer than 1024 bytes. In fact, when the code analyzer's knowledge-base gets updated with a new rule, it will retroactively prompt you to audit code that you knew was OK but that may no longer be OK in the light of your new knowledge.
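A rule like "no syslog() message longer than 1024 bytes" can also be enforced in the code itself, by funnelling every log call through one wrapper that clamps the message first. The following is a sketch under that assumption only: the 1024-byte ceiling is the illustrative figure from the paragraph above, not a documented limit of any particular system, and vformat_clamped(), format_clamped() and bounded_syslog() are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

#define LOG_LINE_MAX 1024 /* assumed ceiling, taken from the figure in the text */

/* Formats into out, truncating at outsz - 1 characters.
   Returns the number of characters actually stored. */
size_t vformat_clamped(char *out, size_t outsz, const char *fmt, va_list ap)
{
    int n;
    if (outsz == 0)
        return 0;
    n = vsnprintf(out, outsz, fmt, ap);
    if (n < 0) {
        out[0] = '\0';
        return 0;
    }
    return ((size_t)n >= outsz) ? outsz - 1 : (size_t)n;
}

/* Variadic convenience form of vformat_clamped(). */
size_t format_clamped(char *out, size_t outsz, const char *fmt, ...)
{
    size_t stored;
    va_list ap;
    va_start(ap, fmt);
    stored = vformat_clamped(out, outsz, fmt, ap);
    va_end(ap);
    return stored;
}

/* The message is clamped before syslog() sees it, and is passed as an
   argument to "%s" rather than as the format string itself. */
void bounded_syslog(int priority, const char *fmt, ...)
{
    char line[LOG_LINE_MAX];
    va_list ap;
    va_start(ap, fmt);
    vformat_clamped(line, sizeof(line), fmt, ap);
    va_end(ap);
    syslog(priority, "%s", line);
}
```

Putting the limit in one wrapper means a new rule about log-line length becomes a one-line change to LOG_LINE_MAX, rather than a hunt through every syslog() call in the code-base.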

One feature of Audit Workbench that I didn't use for this experiment was the analyst's workflow. At the bottom of the screen is a panel with a "Suppress Issue" button and some text entry fields and drop-menus. These implement a problem tracking and reviewing system that looks extremely well thought-out. If I were resuming maintenance of this software and was establishing a code review/audit cycle, I could now review each of the hot points Fortify identified and either fix bugs, redesign routines, or mark the hot point as reviewed and accepted. For a larger body of code this would allow multiple analysts to coordinate working on a regular review process. If I were a product manager producing a piece of security-critical software, I would definitely use Fortify to establish a regular audit workflow as part of my release cycle.

The number of issues Fortify identified in the FWTK code is pretty daunting. After a day spent digging into them, I found that a lot of the items that were flagged were false positives. Many of those, however, were places where my initial reaction to the code was "uh-oh!" until I was able to determine that someplace earlier in the input there was a hard barrier on an input size, or some other control against inappropriate input. It made me reassess my coding style, too, because I realized that it's important to keep the controls close to the danger-points, rather than letting them be scattered all over the code. When I collect network input, I typically scrub it by making sure it's not too long, etc., then pass it down to other routines which work upon it in a state of blind trust. In retrospect I realize that doing it that way could lead to a situation in which the lower-level routines get inputs that they trust, which came via a path that I forgot to adequately scrub. The example of my use of enargv() in the FTP proxy is a case of exactly that kind of mistake. When I first started using Saber-C, I felt like it was a tool that taught me things that made me a better programmer. I feel the same way about Fortify.

Experience with Other Open Source Packages

Once I had gotten some hands-on time with Fortify, I decided to assess the level of difficulty in applying it against a few other security-critical, popular, Open Source programs. My main concern was whether the build process would admit substituting sourceanalyzer for gcc without having to spend hours editing makefiles. The results were good - postfix, courier IMAPd, syslog-ng, BIND, and dhcpd all turned out to work if I specified sourceanalyzer as $(CC). I.e.: all I had to do was tell it:

make CC='sourceanalyzer -b postfix gcc'

and it ran through from the command line with no changes in the build process at all. Sendmail proved to be a little bit more difficult but only because I was too stubborn to read the build instructions (two minutes) and figured it out by reading through the build scripts and makefiles instead (two hours).

I did not attempt to do an in-depth, or even cursory, analysis of the Open Source packages. With each one, however, I spent some time reviewing the hot listed items. Some of the packages had many items flagged; one of the larger code-bases had several thousand. Interestingly, I found I could immediately tell which sections were older and better thought-out, and which sections were developed by less experienced programmers. I imagine you could use the hot list, sorted by code module, to make a pretty good guess as to which of your programmers was more experienced and more careful. As I was reviewing one hot listed item that looked particularly promising, I discovered a comment above the questionable line of code that appeared to have originated from some kind of manual code audit workflow tool that I haven't identified. The comment claimed that the operation was safe and had been reviewed by so-and-so in version such-and-such. It's nice to see that these important pieces of our software "critical infrastructure" are, in fact, being audited for security holes!

One of the packages I examined had a number of very questionable-looking areas in a cryptographic interface that is normally not enabled by default. My suspicion is that the implementation of that interface was thrown into the software as an option and isn't widely used, so there has been relatively little motivation to clean it up. The software that you'd expect would come under concerted attack and review by the hacker and "security researcher" community proved to be fairly clean. I didn't see anything that immediately jumped out as a clear hole, although I found several dozen places in a module of one program where I felt it was worth sending an Email message to the maintainer suggesting a closer review.

Notification

One topic in security about which I have been exceptionally vocal is the question of how to handle vulnerabilities when they are discovered. I personally believe that the hordes of "security researchers" that are constantly searching for new bugs are largely a wasteful drain on the security community. The economy of "vulnerability disclosure," in which credit is claimed in return for discovering and announcing bugs, has had a tremendous negative impact on many vendors' development cycles and product release cycles. Many of these larger vendors have begun using automated code-checking tools like Fortify in-house, to improve their software's resistance to attack. Indeed, if the "security researchers" actually wanted to be useful, they'd be working as part of the code audit team for Oracle, or Microsoft. But then they couldn't claim their fifteen minutes of fame on CNN or onstage at DEFCON.

My decision to use the FWTK code-base was partly influenced by the fact that it's very old code and has largely fallen out of use. I felt that, if I had discovered a lot of vulnerabilities in a widely-used piece of software, I'd feel morally obligated to invest a lot of my time in making sure they were fixed. As a matter of philosophy, I don't approve of releasing information about bugs that will place people at risk - and, in the case of the FWTK I knew I wasn't going to annoy the author of the code. Not too much, anyhow! I think that this approach worked pretty well, especially considering the number of potential problems I found in my code.

While I was running Fortify against the other open source packages that I mentioned earlier, I identified six exploitable vulnerabilities in two of the packages. To say that that was "scary" would be an understatement, since I invested under an hour in poking randomly about in the results from each package. I followed procedures that have worked for me since the mid-1980's: I researched the owner of the module(s) in question, contacted them personally, and told them what I'd found. Contrary to the ideology of the "full disclosure" crowd, everyone I contacted was extremely responsive and assured me that the bugs would be fixed in the next release. No hoopla, no press briefing, no rushing out a patch. I won't get my fifteen minutes of fame on CNN but that's all right. I'd rather be part of the solution than part of the problem.

Lessons Learned

This experience was very interesting and valuable for me. First off, it gave me a much-needed booster-shot of humility about my code. Having a piece of software instantly point out a dozen glaring holes in your code is never fun - but it's an important sensation to savour.

More importantly, it showed me that tools like Fortify really do work, and that they find vulnerabilities faster and better than a human. That's a significant result if you're involved in software development for products that are going to find themselves exposed to the Internet. Since the FWTK code was developed using extensive run-time checking with Saber-C, it proved to be extremely solid and reliable, but Saber-C never checked specifically for security flaws. As it turns out, there is a major difference between the kind of analysis your tools should do for run-time reliability as opposed to security. Clearly, both are necessary. As a developer (or "former developer" anyway) I am deeply concerned about how difficult it is to do a reliable code-build on the various flavors of UNIX/Linux that are popular - adding source code checking to the build cycle was initially scary but it turned out that the fear wasn't merited. I admit I was pleasantly surprised.

The "many eyes" theory of software quality doesn't appear to hold true, either. FWTK was widely used for almost ten years, and only one of the problems I found with Fortify was a problem I already knew about. So FWTK was a piece of software (in theory) examined by "many eyes" that did not see those bugs, either. When you consider that code-bases like Microsoft's and Oracle's can number in the tens of millions of lines, it's simply not realistic to expect manual code-audits to be effective. Engineers I've talked to at Microsoft are using some form of automated code checking, but I don't know what it is. Oracle is using a suite of tools including Fortify. I suppose the inevitable end-game with automatic vulnerability checkers is that they will become available to both sides. That'll still be a good thing because it will go a long ways toward reducing the current trade in the obvious-bug-of-the-month. If we can push code quality for Internet applications to the point where a "security researcher" or a hacker has to invest weeks or months to find an exploit, instead of hours, the world will be a much better place.

Acknowledgements

I would like to thank Brian Chess at Fortify for supporting this research through the loan of a sourceanalyzer license. Jacob West at Fortify was kind enough to answer some of my questions, and to gently RTFM me when I needed it.

(1) Saber-C is now known as "CodeCenter" and is still available for a limited number of platforms. It never appeared to do very well in the marketplace, which is some kind of tragedy. It was, and is, a terrific piece of software. The fact that a product as useful as Saber-C could fail to do well in the market is an indicator of the impoverishment of software "engineering" practices industry-wide.

(2) The Firewall Toolkit later became the core of the TIS Gauntlet firewall. For a few years after its release, the FWTK code-base was at the center of more than half of the firewalls on the Internet.

(3) I prefer to call them "vulnerability pimps." I am a security researcher. These guys are smart and hard-working but they are not my peers.