Kernel Hacker's Bookshelf: The Practice of Programming

This article brought to you by LWN subscribers Subscribers to LWN.net made this article — and everything that surrounds it — possible. If you appreciate our content, please buy a subscription and make the next set of articles possible.

In The Mythical Man-Month, Fred Brooks observes that the productivity of experienced programmers frequently varies by a factor of 10 or more. What makes the 10x programmers so much better? Undoubtedly some of the difference is due to native facility with language or logic. But even with these advantages, no one is born writing beautiful, elegant, maintainable code; everyone goes through a learning process.

How do we learn to be good programmers? In many ways, the art of computer programming is still stuck in the era of the master-apprentice system. Some of us are lucky enough to learn to program in something like "the UNIX room" at Bell Labs, where you could shoulder-surf the likes of Ken Thompson and Dennis Ritchie. Occasionally someone practices pair-programming instead of just arguing passionately about it, and once in a very long while, a 10x programmer will actually teach another person how to program. Unfortunately, formal university education rarely teaches students about the practical aspects of programming, as any holder of a computer science degree will readily attest, and few programmers have the time, interest, or ability to write accessible books about programming. As a result, most programmers are doomed to a decade of re-inventing wheels by trial and error.

Brian Kernighan and Rob Pike are two 10x programmers who do have the time, interest, and ability to write a book about software engineering best practices. The Practice of Programming aims to fill the gaps in the training of most computer programmers. From the book:

Topics like testing, debugging, portability, performance, design alternatives, and style - the practice of programming - are not usually the focus of computer science or programming courses. Most programmers learn them haphazardly as their experience grows, and a few never learn them at all.

This book probably won't make you ten times more productive, but it can easily make you twice as productive (and half as frustrated). If I could send one book to a programmer trapped on a desert island, this would be the book - and I'd send the same book to the new programmer who just joined my development team.

Overview

The Practice of Programming differs from most programming books in several enjoyable ways. Rather than promoting a particular new programming philosophy, Kernighan and Pike focus on three principles: simplicity, clarity, and generality. As you might guess from the title, the book is short on theory and long on practice. About one third of the ~250 page book is taken up by actual real-world example code, starting with the original dodgy code and showing the step-by-step evolution to better code. Most examples are in C, but the principles illustrated readily translate to other languages.

The writing style of this book is refreshingly practical and down-to-earth, without losing generality. The authors avoid stark black-and-white pronouncements, preferring to discuss why different techniques are useful under different conditions. Clarity is another hallmark of their style; they use as few words as possible to clearly state each point, and dismiss trivialities and side issues quickly and cleanly. A typical example of this approach is their advice on brace and indentation style: "The specific style is less important than its consistent application. Pick one style, preferably ours, use it consistently, and don't waste time arguing."

The book is organized into nine chapters, each covering a topic such as testing or debugging that usually requires an entire book on its own. The table of contents includes headings like "Test as You Write the Code," "Consistency and Idioms," "Strategies for Speed," "Other People's Bugs," and "Programs that Write Programs." I can't cover the whole book in this review, but I'll go into detail on two of my favorite chapters, "Performance" and "Notation."

Performance

The introduction of this chapter gives some very direct advice: "The first principle of optimization is don't." Computers are fast - go run lmbench on your desktop to update your sense of just how fast. For example, some system calls are now in the sub-microsecond range under Linux on modern hardware. Armchair optimization - the practice of making small theoretical optimizations as you code, at the expense of readability, portability, or correctness - is especially foolish in light of Donald Knuth's observation that 4% of the code typically accounts for more than half of the run-time of the program. Kernighan and Pike's first piece of advice is to write simple, clear, concise code, and optimize only when you have some tangible reason to do so.

The chapter begins with a real-world optimization problem: a spam-filter that worked well enough in testing but bogged down in production. The tangible reason for optimizing this program is that the mail queues were filling up with undelivered mail - a clear justification for optimization if there ever was one. The authors show the process they went through to optimize the spam-filter, step-by-step: profiling, analysis, a first attempt at optimization, re-factoring the problem, addition of pre-computation, and measurement of the results. This overview is welcome not only as a good programming war story but also because the overall flow of code optimization is non-obvious (otherwise, "How would you go about optimizing a program?" would not be such a common interview question).

The rest of the chapter talks about best practices for each step of optimization. The first topic is timing and profiling, as it should be. All too often, even good programmers measure performance by "feel" - if you don't believe me, search LKML. Sometimes no easy tool exists to measure what is being optimized, but it's still better to write some kind of measurement tool, no matter how clunky or approximate. Human perception and judgment are heavily influenced by preconceptions and the vast majority of theoretical optimizations have negligible effects on performance. A more subtle piece of advice is to turn performance results into pictures or graphs. Chris Mason's seekwatcher is an excellent example; it turns block traces into graphs - and even movies!

The authors cram a surprisingly complete demonstration of profiling into less than two pages, using prof on their spam-filter as the example. They show how to identify hot spots and do basic sanity checking on the results - e.g., match up the number of times a function call shows up in the profile with the number of iterations of the main loop. While they include some caveats on trusting profiling results, I wish they had spent some time on the design of profiling tools to show the kinds of biases and errors that so often make profiling results misleading. Perhaps it's because I work on systems software, but I've found that I really have to know the details of whether the profiler is using a periodic timer, hardware counters, includes time spent sleeping for IO in the kernel, how many events are dropped or missed, etc. A useful technique to demonstrate, and one in keeping with their minimalist, do-it-yourself philosophy, would be manually bisecting the code with timers to find hot spots when normal profiling tools fail.

The discussion on rewriting code goes beyond "find the top function and optimize it" - it also addresses eliminating calls to hot functions entirely and doing modest amounts of pre-computation. A fair portion of the section on code tuning has been superseded by improved compilers which can do, e.g., loop-unrolling automatically, but it still teaches valuable lessons about how to read code and understand its true cost and complexity.

Notation

The chapter on notation unfolds elegant, beautiful solutions one by one, turning normally painful problems into fun coding exercises. Each technique - little languages, special-purpose notation, programs that write programs, virtual machines - is accompanied by a concrete demonstration of how to implement the bare minimum of the technique to get the job done. The suggestion to "write a new language" seems absurd in the face of most day-to-day programming problems, but writing a very small, very specialized language can save the programmer much time and many bugs, even when replacing only a few hundred lines of conventional code. Their first example, after printf() format specifiers, is a notation for packing and unpacking network packets. I recently implemented this technique and can report that it worked beautifully, repaying the time I invested in it within days of completion.

Another exercise in minimalism is their demonstration of how to write a basic grep in around 100 lines of C, without relying on external libraries. Most of us will never need to re-implement regular expressions from scratch, but we may encounter a problem best solved by writing a small general purpose pattern matcher.

Another example demonstrates the power (and danger) of keeping a variety of scripting languages and data processing tools at your fingertips. The authors implement a crude text-only web browser with about 50 lines of Awk, Tcl, and Perl, again using only built-in language support and no external libraries or modules. Here as elsewhere, Kernighan and Pike refuse to make hard and fast assertions about the One True Scripting Language; they'd rather you used the right language for the right job. From the book:

These languages together are more powerful than any one of them in isolation. It's worth breaking the job into pieces if it enables you to profit from the right notation.

It can be argued that this approach is less justified now, given the modern plethora of scripting languages written specifically to address the limitations of earlier scripting languages. However, their argument still rings true for me, as someone who has never settled down into one scripting language. I have a decade of experience using a hodge-podge of random scripting languages, and when I do write in one scripting language, I end up spending a lot of time contorting language features to fit situations they were not designed for.

The section on virtual machines shows how to implement a minimal special purpose virtual machine (the Z-machine for Zork comes to mind immediately). The remaining sections cover programs that write programs, using macros to generate code (a common technique in Linux header files), and just a little taste of run-time code generation.

Summary

The Practice of Programming embodies its own principles: simplicity, clarity, generality. First published in 1999, it has aged well due to its focus on general principles of good programming rather than language-specific tricks and tips. The book has something to offer to programmers at all levels of experience; beginners will benefit most but experienced developers will appreciate the more advanced and subtle techniques in the later chapters. Of all the books on the Kernel Hacker's Bookshelf, this one should never be missing.