Some Optimizations Are No-Brainers

Kernighan's rule for optimizations (Don't do it) is good advice. But as with most rules, there are exceptions.



For the past four weeks (1, 2, 3, 4), I've been giving examples of optimizations that have nasty side effects, or don't do what one might expect, or cause other problems. Reading these articles, you would probably figure that I agree with Brian Kernighan's two rules for optimization:

1. Don't do it.
2. (For experts only) Don't do it yet.

You would be right. However, as with most rules, there are exceptions. In the case of optimization, three such exceptions are optimizations that

1. are ways of avoiding needless pessimizations;
2. make such a difference, with so little effort, that it would be crazy not to use them; and
3. are ways of simplifying the program without making it slower.

As an example that covers both (1) and (2), I used to work with someone who was asked to try to figure out how to make a daily mainframe accounting job take less than the 32 hours it was currently taking. It is obviously bad news when a daily job takes more than a day to run; and in this case it was even worse news — because this was in the early 1970s, when a computer was in the same price bracket as the part of the office building that contained it.

My colleague began by looking at the overall data flow in this big daily job. He found that it merged transaction data from two enormous sequential input files, and then sorted the result of the merge so that it would be in order for the next phase. This sort took four hours.

Do you see the problem? As a general rule, in order to merge sequential files, each of those files must be in order. If the input files were not already sorted, the merge would fail — so it was safe to assume that the input files were already sorted. But this fact meant that the output file was also already sorted, so there was no need to sort it again. In other words, this four-hour sort was completely unnecessary. It was a needless pessimization.

Continuing his investigation, my colleague found that one of the programs in this big job read an input file into an array. This program predated the existence of C++, so I'll take the liberty of representing that part of the program this way:

class Data { /* … */ };
Data data_array[/* A big enough number */];

Here, all of the elements of data_array were implicitly initialized to empty values. Later, the program read values into data_array:

while (/* there are still data to be read */) {
    int n = 0;
    while (!data_array[n].is_empty())
        ++n;
    input_file >> data_array[n];
}

In other words, the program started by initializing the entire array to empty values. Then, each time it read a data item, it searched sequentially through the entire array until it found a value that was not empty. When it found such a value, it read a single data item into it.

This code was also a needless pessimization: looking at every element each time the program wants to read a new element means that the time for element number n is O(n). This fact, in turn, means that the time needed to fill an n-element array is O(n²). Using an extra variable to keep track of how many elements of the array had been filled would reduce that O(n²) to O(n), which is an enormous difference for a trivial addition to the code.

The third kind of no-brainer optimization is the kind that makes the program easier to understand without making it obviously slower. You may quibble with using the word optimizations to refer to such tactics, but I think it's fair to include programming techniques that make a program clearer, hence better, without harming its performance.

We'll look at some examples of these clarity optimizations next week.