I have run across a few references to Paul Dietz's Common Lisp test suite over the past few months, the latest being from Juho Snellman's blog where he describes how the suite caught some errors in his new SBCL register allocator code.

Paul describes his work on the suite in his own blog, the 13-September entry of which had this to say:

To go beyond simple 'does function FOO do X when called on arguments Y' kinds of tests, I've also started experimenting with a random tester. I've taken the approach of William McKeeman (favorite quote: "Excellent testing can make you unpopular with almost everyone.") I generate random lisp forms that are guaranteed to be well-defined when a set of free variables have integer values. The forms are wrapped in two lambda expressions, one with type declarations for the free variables and one in which optimizations are inhibited. The functions are compiled and their outputs compared on a set of random inputs. This approach is very simpleminded (and a lot easier to do in Lisp than in C, as McKeeman did), but it immediately found new bugs in every lisp implementation on which I tried it. The beauty of this approach is that it exploits increasingly cheap CPU cycles to save the increasingly relatively expensive programmer/tester cycles.
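To make Paul's description a little more concrete, here is a minimal sketch of the same idea, written in Python rather than Lisp. It is not his tester. The names (`random_form`, `interpret`, `compile_form`, `find_mismatches`) are made up for illustration, and the two evaluation paths (a tree-walking interpreter and Python's own `eval`) merely stand in for his optimized and unoptimized compilations. The forms use only `+`, `-` and `*`, so they are well-defined for any integer inputs.

```python
import random

# Operators are limited to ones that are defined for all integers.
OPS = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y}
VARS = ("a", "b", "c")

def random_form(rng, depth=3):
    """Build a random expression tree over integer variables and constants."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(VARS) if rng.random() < 0.5 else rng.randint(-9, 9)
    op = rng.choice(sorted(OPS))
    return (op, random_form(rng, depth - 1), random_form(rng, depth - 1))

def interpret(form, env):
    """Evaluation path 1: walk the tree directly."""
    if isinstance(form, str):
        return env[form]
    if isinstance(form, int):
        return form
    op, left, right = form
    return OPS[op](interpret(left, env), interpret(right, env))

def to_source(form):
    """Render a tree as Python source, parenthesizing every leaf and node."""
    if isinstance(form, tuple):
        op, left, right = form
        return f"({to_source(left)} {op} {to_source(right)})"
    return f"({form})"

def compile_form(form):
    """Evaluation path 2: let Python compile the rendered source."""
    return eval(f"lambda a, b, c: {to_source(form)}")

def find_mismatches(seed=0, forms=200, inputs_per_form=20):
    """Compare both paths on random inputs and collect any disagreements."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(forms):
        form = random_form(rng)
        fn = compile_form(form)
        for _ in range(inputs_per_form):
            env = {v: rng.randint(-1000, 1000) for v in VARS}
            if fn(**env) != interpret(form, env):
                mismatches.append((form, env))
    return mismatches
```

Calling `find_mismatches()` should return an empty list as long as both paths agree. Any entry in the list is a form plus a set of inputs that reproduces the disagreement, which is exactly the kind of small test case you want to hand to whoever owns the bug.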

In my experience, random testing of just about anything pays off. I have always found it effective at finding the elusive corner cases that people have trouble imagining on their own. A computer will happily generate test cases more perverse than any a human tester would think to write, which makes me wonder why people don't do this more often.

For instance, when I was at HP in the late 1980s working on the PA-RISC series 700 workstation products, I knew one of the engineers in the CPU group who was designing the next round of IEEE floating-point coprocessors for PA-RISC. In case you aren't familiar with it, the IEEE floating-point spec (IEEE 754) is very precise but has a number of very hairy corner cases. In particular, things like the rules for rounding the last bit of precision are spelled out very clearly. This engineer, whom I always thought was pretty smart, built himself a software simulation of his chip design and proceeded to generate random test vectors for the new design. Simultaneously, he ran those same vectors through his trusty Motorola 68K floating-point coprocessor and had the system stop and dump any vectors that produced a mismatch.

When he first fired up the simulation, it ran for only a few seconds before it found a problem. He fixed the problem and restarted. It ran for a minute. He fixed the problem and restarted. It ran for a few minutes. He fixed the problem and restarted. It ran for an hour or two. He fixed the problem and restarted. It ran for 24 hours. He fixed the problem and restarted. It then ran non-stop.

Note that this comparison alone could not have shown that his processor was bug-free, only that it had exactly the same bugs as his 68K FPU. To guard against that, he also compared the results against earlier PA-RISC FPUs.
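The shape of that test loop is easy to sketch. What follows is a toy version in Python, not a reconstruction of what the engineer actually ran. The function names are hypothetical, the "design under test" is just a function passed in as `candidate`, and the two references are an exact rational sum rounded once (`exact_add`) and `math.fsum` (`fsum_add`). It assumes a platform with ordinary IEEE 754 double arithmetic.

```python
import math
import random
from fractions import Fraction

def random_double(rng):
    """Random finite double with a widely varying exponent, far from overflow."""
    return rng.uniform(-1.0, 1.0) * 2.0 ** rng.randint(-60, 60)

def exact_add(a, b):
    """Reference 1: exact rational sum, rounded once to the nearest double."""
    return float(Fraction(a) + Fraction(b))

def fsum_add(a, b):
    """Reference 2: math.fsum, an independent accurately rounded sum."""
    return math.fsum((a, b))

REFERENCES = (exact_add, fsum_add)

def first_mismatch(candidate, references=REFERENCES, trials=20_000, seed=0):
    """Stop at the first vector on which candidate disagrees with any reference."""
    rng = random.Random(seed)
    for n in range(trials):
        a, b = random_double(rng), random_double(rng)
        got = candidate(a, b)
        for ref in references:
            if got != ref(a, b):
                # Dump the operands in hex so the failure replays bit for bit.
                return n, a.hex(), b.hex(), ref.__name__
    return None

def sloppy_add(a, b):
    """Deliberately buggy adder: drops the smaller operand too early."""
    if abs(a) > abs(b) * 2.0 ** 40:
        return a
    if abs(b) > abs(a) * 2.0 ** 40:
        return b
    return a + b
```

With these definitions, `first_mismatch(lambda a, b: a + b)` should come back as `None`, while `first_mismatch(sloppy_add)` should stop on a failing pair of operands almost immediately. Checking against two independent references is the software analogue of comparing against both the 68K FPU and the earlier PA-RISC FPUs: a bug has to be shared by every reference to slip through.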

What is most interesting about this example is that if Intel had done the same basic testing on the original Pentium, they might well have caught the FDIV bug before it shipped and avoided the massive FPU debacle of the 1990s.

I have used the same technique in testing network protocol implementations. Protocol decoding is notoriously error-prone, and bugs here can result in huge security holes. In this case, I try to generate real data streams and then write a program to corrupt the data slightly. Because protocols often involve various packet integrity checks, the corrupted data has to be valid enough to make it all the way into the different parts of the stack. If it just gets thrown out at the lowest levels, you aren't testing much beyond the lowest layers of your system. For instance, if you capture IP packets and corrupt them, recompute the IP and TCP checksums afterwards so that the packets don't get bounced right at those checks but actually reach the deeper parts of the stack.
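As a rough illustration of the checksum step, here is a small Python sketch that flips one bit in an IPv4 header and then repairs the header checksum. The helper names (`internet_checksum`, `with_ip_checksum`, `corrupt_header`) are my own, and the sketch assumes a plain 20-byte header with no options. It handles only the IP header; a real tool would also need to redo the TCP checksum, which additionally covers a pseudo-header.

```python
import random
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        # Fold any carries back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def with_ip_checksum(header: bytes) -> bytes:
    """Return the header with its checksum field (bytes 10-11) recomputed."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    checksum = struct.pack("!H", internet_checksum(zeroed))
    return zeroed[:10] + checksum + zeroed[12:]

def corrupt_header(header: bytes, rng: random.Random) -> bytes:
    """Flip one random bit outside the checksum field, then repair the checksum."""
    positions = [i for i in range(len(header)) if i not in (10, 11)]
    pos = rng.choice(positions)
    mutated = bytearray(header)
    mutated[pos] ^= 1 << rng.randrange(8)
    return with_ip_checksum(bytes(mutated))
```

A header produced by `corrupt_header` differs from the original in a real field, yet running `internet_checksum` over it still yields 0, so a receiver's checksum validation will accept it and pass the damaged packet on to the code you actually want to exercise.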

Others have successfully used the technique to test UNIX and Windows utilities.

In short, I don't think I have ever seen a case where a random input tester didn't reveal a bug of some sort, unless the system had already been subjected to such a tester previously.

Update: Here is a good description of William McKeeman's work using random testing on the DEC C compiler. I think that Paul's blog referenced this but it dropped out when I did the cut/paste and I think Paul's link was also stale.