A Summary of Jon Louis Bentley's

``Writing Efficient Programs''

This summary of Jon Louis Bentley's book ``Writing Efficient Programs'' consists of selected text from that book with editorial changes, primarily removing references to specific examples.

© 1982 by Prentice-Hall, Inc., Englewood Cliffs, N.J. 07632 (ISBN 0-13-970251-2)

© 1990 by Lawrence A. Crowl, University of Rochester, Computer Science Department, Rochester, N.Y. 14627

Methodology

The most important properties of a large system are a clean design and implementation, useful documentation, and a maintainable modularity. The first steps in the programming process should therefore be the design of a solid system and the clean implementation of that design. If the overall system performance is not satisfactory, then the programmer should monitor the program to identify where the scarce resources are being consumed. This usually reveals that most of the time is used by a few percent of the code. Proper data structure selection and algorithm design are often the key to large reductions in the running time of the expensive parts of the program. The programmer should therefore try to revise the data structures and algorithms in the critical modules of the system. If the performance of the critical parts is still unsatisfactory, then use the techniques below to recode them. The original code should usually be left in the program as documentation. If additional speed is still needed, then the programmer should work at lower design levels, including hand-written assembly code, operating system modifications, microcode, and special-purpose hardware design.

Applying the Rules

Identify the code to be changed. We should identify the code to be changed by monitoring the program, and then concentrate on the parts that are taking the majority of the time.

Choose a rule and apply it. Once we know what code to change, we see whether any of our rules can be used to change it. The rules are presented in groups as they relate to different parts of programs; we should therefore identify whether the expense of our program is going to data structures, loops, logic, procedures, or expressions, and then search the appropriate list for a candidate rule. When we apply a rule, we should make sure that the application preserves the correctness of the program; this is usually done by applying the spirit, if not the actual formulas, of program verification.

Measure the effect of the modification. Removing a common subexpression in a traveling salesman tour program is typical of many changes we make: it appears that it would increase the program's speed by a factor of two, but in fact it gives less than a three percent improvement. Even if we believe that we understand the effect of a transformation by reasoning alone, it is absolutely necessary to support that analysis with observation; we often find that we are quite mistaken.

Document the resulting program. The final modified program should include a description of the clean code and of the modification that was incorporated for efficiency. That description can range from a brief comment to a copy of the original code enclosed within comment characters together with a thorough discussion of the rule used to modify it (with an appropriate reference to this book, of course).

Fundamental Rules

Code Simplification: Most fast programs are simple. Therefore, keep code simple to make it faster.

Problem Simplification: To increase the efficiency of a program, simplify the problem it solves.

Relentless Suspicion: Question the necessity of each instruction in a time-critical piece of code and each field in a space-critical data structure.

Early Binding: Move work forward in time. Specifically, do work now just once in the hope of avoiding doing it many times later.
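Early binding can be as simple as doing one-time setup when a routine is created rather than on every call. The following sketch is my own illustration (the tax-table scenario and all names are invented, not Bentley's):

```python
# Early binding: do the setup work once, up front, instead of
# repeating it on every call.  Illustrative example only.

def make_tax_fn(rate_table):
    # Bind now: sort the bracket table once, when the function is made...
    brackets = sorted(rate_table.items())

    def tax(amount):
        # ...so each later call only scans the prebuilt list.
        owed = 0.0
        for threshold, rate in brackets:
            if amount > threshold:
                owed = (amount - threshold) * rate
        return owed

    return tax

tax = make_tax_fn({0: 0.10, 10000: 0.20})
```

Here the sorting cost is paid once at binding time; a naive version that sorted the table inside `tax` would repeat that work on every call.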

Space for Time Rules

Data Structure Augmentation: The time required for common operations on data can often be reduced by augmenting the structure with extra information or by changing the information within the structure so that it can be accessed more easily. Examples are reference counters and hints.

Store Precomputed Results: The cost of recomputing an expensive function can be reduced by computing the function only once and storing the results. Subsequent requests for the function are then handled by table lookup rather than by computing the function.

Caching: Data that is accessed most often should be the cheapest to access. (Caching can ``backfire'' and increase the run time of a program if locality is not present in the underlying data.)

Lazy Evaluation: The strategy of never evaluating an item until it is needed avoids evaluations of unnecessary items.
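Store Precomputed Results can be sketched with a hand-rolled lookup table; the Fibonacci example is mine, not the book's:

```python
# Store Precomputed Results: compute an expensive function once per
# argument, store the answer, and serve later requests by table lookup.

_fib_cache = {0: 0, 1: 1}   # seed the table with the base cases

def fib(n):
    if n not in _fib_cache:
        # Compute once; every later request for this n is a dict lookup.
        _fib_cache[n] = fib(n - 1) + fib(n - 2)
    return _fib_cache[n]
```

Without the table this recursion takes exponential time; with it, each value is computed exactly once.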

Time for Space Rules

Packing: Dense storage representations can decrease storage costs at the cost of increasing the time required to store and retrieve data. (Decreasing data space may increase code space.)

Overlaying: Storing data items that are never simultaneously active in the same memory space reduces data space. Code overlaying reduces code space by using the same storage for routines that are never simultaneously needed. Many operating systems provide this service automatically in their virtual memory systems.

Interpreters: The space required to represent a program can often be decreased by the use of interpreters in which common sequences of operations are represented compactly. (Finite state machines are simple, compact interpreters.)
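A minimal sketch of Packing (my example, not the book's): three 8-bit fields stored in one integer, traded against the shift-and-mask work needed on every access.

```python
# Packing: store an (r, g, b) triple of 8-bit fields in one word.
# Denser storage, at the cost of shift/mask work to store and retrieve.

def pack_rgb(r, g, b):
    return (r << 16) | (g << 8) | b

def unpack_rgb(word):
    return (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF
```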

Loop Rules

Code Motion Out of Loops: Instead of performing a certain computation in each iteration of a loop, it is better to perform it only once, outside the loop. (Code cannot be moved out of loops if it has side effects that are desired on every iteration.)

Combining Tests: An efficient inner loop should contain as few tests as possible, and preferably only one. The programmer should therefore try to simulate some of the exit conditions of the loop by other exit conditions. Sentinels are a common application of this rule: we place a sentinel at the boundary of a data structure to reduce the cost of testing whether our search has exhausted the structure.

Loop Unrolling: A large cost of some short loops is in modifying the loop indices. That cost can often be reduced by unrolling the loop.

Transfer-Driven Loop Unrolling: If a large cost of an inner loop is devoted to trivial assignments, then those assignments can often be removed by repeating the code and changing the use of variables. Specifically, to remove the assignment I := J, the subsequent code must treat J as though it were I.

Unconditional Branch Removal: A fast loop should contain no unconditional branches. An unconditional branch at the end of a loop can be removed by ``rotating'' the loop to have a conditional branch at the bottom.

Loop Fusion: If two nearby loops operate on the same set of elements, then combine their operational parts and use only one set of loop control operations.
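The sentinel application of Combining Tests can be sketched as follows (the book's version is in Pascal; this Python rendering is mine):

```python
# Combining Tests via a sentinel: append the search key to the end of
# the list so the inner loop needs only one test (element match)
# instead of two (element match AND end-of-list check).

def find_with_sentinel(items, key):
    items.append(key)            # sentinel guarantees the loop stops
    i = 0
    while items[i] != key:       # single test in the inner loop
        i += 1
    items.pop()                  # restore the list
    return i if i < len(items) else -1   # hit the sentinel => not found
```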

Logic Rules

Exploit Algebraic Identities: If the evaluation of a logical expression is costly, replace it by an algebraically equivalent expression that is cheaper to evaluate.

Short-Circuiting Monotone Functions: If we wish to test whether some monotone nondecreasing function of several variables is over a certain threshold, then we need not evaluate any of the variables once the threshold has been reached. Short-circuit evaluation of boolean expressions is an example of this rule. (A more complex application of this rule exits from a loop as soon as the purpose of the loop has been accomplished.)

Reordering Tests: Logical tests should be arranged such that inexpensive and often successful tests precede expensive and rarely successful tests.

Precompute Logical Functions: A logical function over a small finite domain can be replaced by a lookup in a table that represents the domain.

Boolean Variable Elimination: We can remove boolean variables from a program by replacing the assignment to a boolean variable V by an if-then-else statement in which one branch represents the case that V is true and the other represents the case that V is false. (This generalizes to case statements and other logical control structures.) This rule usually decreases time slightly (say, less than 25 percent), but greatly increases code space. More complex applications of this rule remove boolean variables from data structures by keeping separate structures for the true and false records.
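Precompute Logical Functions can be sketched with a parity table (my example, not the book's): the domain of a byte has only 256 values, so the function fits comfortably in a table built once.

```python
# Precompute Logical Functions: replace a per-call computation over a
# small finite domain with a lookup in a table built once.

# Built once, up front: parity (1 if an odd number of one-bits) of
# every possible byte value.
PARITY = [bin(b).count("1") & 1 for b in range(256)]

def parity(byte):
    return PARITY[byte]      # table lookup instead of counting bits
```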

Procedure Rules

Collapsing Procedure Hierarchies: The run times of the elements of a set of procedures that (nonrecursively) call one another can often be reduced by rewriting the procedures in line and binding the passed variables.

Exploit Common Cases: Procedures should be organized to handle all cases correctly and common cases efficiently.

Coroutines: A multiple-pass algorithm can often be turned into a single-pass algorithm by use of coroutines.

Transformations on Recursive Procedures: The run time of recursive procedures can often be reduced by applying the following transformations: Code the recursion explicitly by use of a program stack.

If the final action of a procedure P is to call itself recursively, replace that call by a goto to its first statement; this is usually known as removing tail recursion. That goto can often be transformed into a loop.

If a procedure contains only one recursive call on itself, then it is not necessary to store the return address on the stack.

It is often more efficient to solve small subproblems by use of an auxiliary procedure, rather than by recurring down to problems of size zero or one.

Parallelism: A program should be structured to exploit as much of the parallelism as possible in the underlying hardware.
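The tail-recursion transformation above can be sketched on a small example (gcd is my choice of function, not the book's):

```python
# Removing tail recursion: when the final action of a procedure is a
# recursive call on itself, replace that call with a jump back to the
# top, which is naturally expressed as a loop.

def gcd_recursive(a, b):
    if b == 0:
        return a
    return gcd_recursive(b, a % b)   # tail call: last action is the call

def gcd_loop(a, b):
    while b != 0:                    # the tail call, rewritten as a loop
        a, b = b, a % b
    return a
```

The loop version does the same arithmetic but needs no call frames, which matters in languages without tail-call elimination.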

Expression Rules

Compile-Time Initialization: As many variables as possible should be initialized before program execution.

Exploit Algebraic Identities: If the evaluation of an expression is costly, replace it by an algebraically equivalent expression that is cheaper to evaluate. An algebraic identity yields a fast range test that compiler writers can use on two's complement architectures.

We can often multiply or divide by powers of two by shifting left or right.

Strength reduction on a loop that iterates through the elements of an array replaces a multiplication by an addition. This technique generalizes to a large class of incremental algorithms.

Common Subexpression Elimination: If the same expression is evaluated twice with none of its variables altered between evaluations, then the second evaluation can be avoided by storing the result of the first and using that in place of the second. (We cannot eliminate the common evaluation of an expression with important side-effects.)

Pairing Computation: If two similar expressions are frequently evaluated together, then we should make a new procedure that evaluates them as a pair. Examples include sine/cosine and minimum/maximum.

Exploit Word Parallelism: Use the full word width of the underlying computer architecture to evaluate expensive expressions. The bitwise representation of sets exploits this rule.
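The bitwise representation of sets mentioned under Exploit Word Parallelism can be sketched as follows (a minimal illustration of my own; real implementations pack into fixed-width machine words):

```python
# Exploit Word Parallelism: represent a set of small integers as the
# bits of one word; union and intersection then become a single OR or
# AND over all elements at once.

def to_bitset(elems):
    word = 0
    for e in elems:
        word |= 1 << e           # bit i set  <=>  i is in the set
    return word

def members(word):
    return {i for i in range(word.bit_length()) if (word >> i) & 1}

a = to_bitset({1, 3, 5})
b = to_bitset({3, 4, 5})
union        = a | b             # one OR instead of an element walk
intersection = a & b             # one AND likewise
```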

System Dependent Optimization