Are You Sure That Your Program Works?

One of the more effective ways of finding out that something doesn't work is to try to explain why you think it does work.



Everyone who has ever written a nontrivial program knows that humans make mistakes, and no program can ever simply be assumed to work as specified. Where people's opinions differ is in what to do about this state of affairs.

The simplest strategy is "Try it out; if it appears to work, assume it works until proven otherwise." This strategy might be reasonable for small programs used only by their authors, but for software on any kind of commercial scale, a more thorough approach is needed.

One such approach is "desk checking," which might be described as "Stare at it really hard and convince yourself that each part of it works." Described this way, desk checking sounds silly, but I've found lots of software bugs that way. The technique works even better if one programmer tries to explain the program line by line to another programmer. One of the more effective ways of finding out that something doesn't work is to try to explain why you think it does work.

Dijkstra, among others, strongly favored the idea of proving formally that programs work. However, this idea, although attractive on the surface, runs into practical problems that so far have prevented it from being widely used. One of these problems is that there is no guarantee of being able to construct a proof for every working program — especially if that construction is done by another program. Additionally, parts of just about every real-world programming language, such as floating-point arithmetic, have properties that make formal proofs difficult.

Here is an example that I hope will illustrate just how hard it can be to prove even a simple program correct:

double add3(double x, double y, double z) { return x + y + z; }

This function computes the sum of three floating-point numbers. Unfortunately, it doesn't work correctly. If it did, then add3(1E100, 1.0, -1E100) would yield 1.0, but on most implementations it will yield 0.0. The reason, of course, is that 1E100 is such a large number that adding 1.0 to it doesn't change it. Adding -1E100 then gives the same zero result that would appear if the 1.0 were totally missing.

We can deal with this problem in one of several ways:

1. Define "the sum of three floating-point numbers" in such a way that this program fits the definition.

2. Find and implement an algorithm that does a better job of adding three floating-point numbers.

3. Shrug our shoulders and say that programs that involve floating-point arithmetic are inherently imprecise, and that this implementation is good enough.

Option (1) is certainly the easiest of these to implement. However, it has a drawback: Defining the sum this way enshrines an inferior solution. That is, if someone were to rewrite this function so that add3(1E100, 1.0, -1E100) actually did yield 1.0, it would no longer meet its specification. In effect, (1) rules out the possibility of implementing (2) later as an improvement. And (3) isn't much help either, because it leaves unanswered the question of how much imprecision is allowable before the program is declared to be broken.

Because of these and other difficulties in formal correctness proofs, most program testing these days seems to be informal. One testing technique that I think has been particularly useful is the idea of writing test programs before one even writes the program to be tested. In other words, you define your problem by writing test cases that show how the program is supposed to behave. Because you don't even have your program yet, it fails those tests. Your programming task has now been turned into a debugging task: Your (empty) program doesn't do what it is supposed to do, and your job is to fix it in the most straightforward way possible. On more than one occasion, I have seen this technique advocated as the primary — or even the only — test strategy. For example, some programmers take the view that "As soon as all the test cases pass, it's time to ship."

I think that this particular viewpoint may be as impractical as the viewpoint that every program should be proved correct — and for a similar reason. Correctness proofs are problematic whenever a desirable property of a program is one that is hard to specify rigorously; test cases fail as the sole judge of a program's correctness whenever a program happens to rely on any kind of undefined behavior. These two reasons are similar because they are both consequences of the differences between theory and practice.

In theory, a+b+c computes the sum of a, b, and c. In practice, that isn't always the case. In theory, a program that produces correct results for every input you can think of is correct. In practice, that program might be working only by coincidence, and the undefined behavior on which it relies might change at some time in the future.

In general, the possibility of undefined behavior makes any kind of testing much more difficult than it would be otherwise. One might think, therefore, that the interest of software reliability demands that programming languages should rule out any possibility of undefined behavior. Unfortunately, software has other aspects that make testing difficult — to the extent that merely prohibiting undefined behavior wouldn't help much. I'll talk about some of those aspects next week.