I don’t look at comments, they’re always out-of-date. – Yogi Berra Co-worker

Good coding discipline is important, and we have many rules of thumb to guide us as we design and write or programs. We should also remember that blindly following your heuristics is unwise; there is no silver bullet.

The ends usurped by the means

I’d like to draw your focus towards the notion of self-documenting code. I often hear programmers telling others how hard they try to make their code self-documenting. I think we all know at least one occasion where we or someone we know has, with much pride, got rid of several comments by careful variable naming. But rarely have I heard someone mention making an effort to produce easy to understand code.

Isn’t it the same thing, you may ask? They may be correlated, but you can have one without the other. Poorly-designed software can be full of code which is self-documenting when taken in isolation, but the way modules are combined and the choice of algorithms to accomplish each task can be utterly confusing in a macroscopic view of the program. Sometimes you need to know why the code does what it does, but the self-documenting approach only tells you how. Conversely, literate programs are mostly documentation and can be very easy to understand.

Reach for the stars, fall in the gutter

Most commonly, code is none of these things, and often ripe for improvement. During some code review, we looked at the following block of string processing code which is responsible for grouping together strings recursively by the bytes within each character until it can no longer be subdivided, then processing each group found. (I expanded the example from the original posting date to give some more context here.)

void groupStrings ( pair < int , int > currentGroup, bool upperBitsCleared, int currentBits ) { vector < pair < int , int >> stringGroups ; // code to populate stringGroups elided firstElement = 0 ; if ( upperBitsCleared && currentBits == ( MAX_BITS - 1 ) ) { // this implies we have reached the end of the strings in the first list, // which is our base case finishProcessing ( stringGroups [ 0 ] ) ; firstElement = 1 ; } for ( int i = firstElement ; i < strings. size ( ) ; ++ i ) { groupStrings ( stringGroups [ i ] , upperBitsCleared && i == 0 , i ) ; } void groupStrings(pair<int, int> currentGroup, bool upperBitsCleared, int currentBits) { vector<pair<int, int>> stringGroups; // code to populate stringGroups elided firstElement = 0; if (upperBitsCleared && currentBits == (MAX_BITS - 1)) { // this implies we have reached the end of the strings in the first list, // which is our base case finishProcessing(stringGroups[0]); firstElement = 1; } for (int i = firstElement; i < strings.size (); ++i) { groupStrings(stringGroups[i], upperBitsCleared && i == 0, i); }

The proponents of self-documenting code among me seized on the condition and associated comment. “Can we make this more readable?” Their suggested outcome was to extract the conditional expression into a method:

void groupStrings ( pair < int , int > currentGroup, bool upperBitsCleared, int currentBits ) { vector < pair < int , int >> stringGroups ; // code to populate stringGroups elided firstElement = 0 ; if ( isEndOfString ( upperBitsCleared, currentBits, MAX_BITS ) ) { finishProcessing ( stringGroups [ 0 ] ) ; firstElement = 1 ; } for ( int i = firstElement ; i < strings. size ( ) ; ++ i ) { groupStrings ( stringGroups [ i ] , upperBitsCleared && i == 0 , i ) ; } void groupStrings(pair<int, int> currentGroup, bool upperBitsCleared, int currentBits) { vector<pair<int, int>> stringGroups; // code to populate stringGroups elided firstElement = 0; if (isEndOfString(upperBitsCleared, currentBits, MAX_BITS)) { finishProcessing(stringGroups[0]); firstElement = 1; } for (int i = firstElement; i < strings.size (); ++i) { groupStrings(stringGroups[i], upperBitsCleared && i == 0, i); }

However, the new name is unspecific. Only the first element of the array ‘stringGroups’ contains strings that have no more characters to process; it is not true for the entire array, or even any of the input arguments! What did we really gain from this? According to the teachings of ‘Uncle Bob’, we avoided failure. In his book Clean Code, he says,

The proper use of comments is to compensate for our failure to express yourself in code. Note that I used the word failure. I meant it. Comments are always failures.

The implication in the rest of the chapter is that anything you can do to remove that comment is automatically an improvement, with exceptions (and this isn’t one of his exceptions). However, what happens if you don’t have the ability to accurately judge whether you really expressed that comment in your code?

It’s easy to tell yourself something is obvious and doesn’t need to be explained once you’ve already gained that understanding. Of course we know ‘isTerminal’ relates to the first element, and that this code handles the base case of our recursive algorithm. But how about six months from now? After a dozen bug fixes and feature requests? When code have been added or changed, is it still going to be that obvious?

Another of Uncle Bob’s teachings is the Boy Scout rule: “Always check a module in cleaner than when you checked it out.” The point of the rule is to encourage gradual refactoring of the code into an easier-to-understand form. However, this doesn’t mean any effort is positive: if you vacuum a carpet, but wear muddy boots doing it, only the gnarliest of dust bunnies can justify the damage caused. Even if you believe you’re improving the code, you cannot claim to follow the rule if you lack the perspective to evaluate that improvement.

How odd I can have all this inside me and to you it’s just words

The paraphrase at the top of this essay was said at this code review, and is the reason why I wrote this at all. I concede that nobody forces you to update comments when the code surrounding it changes. Yet comments and identifiers have more in common than you think.

The biggest criticism of comments – that their content does not match reality – is equally true of identifiers. The compiler cannot check the syntax of language used in your identifier name. It cannot warn you when you are storing the end of the range in a variable named ‘start’, nor the width of an object in a field named ‘height’. It cannot tell you when you changed the behaviour of a function, and its name is no longer accurate.

Worse, you are further constricted by the length you can reasonably give an identifier. There’s a reason why nobody writes essays on Twitter: not every concept can be expressed in 140 characters or less.

If one of your axioms is that comments are worse than any code you could write, you are making a mistake. The only alternative to comments is using appropriate variable and function names. It is said that there are two hard problems in computer science, cache invalidation and naming things; it takes a lot of hubris to believe you can name things as well as you can describe them.

Regardless, you must try to name things as well as you can, but be careful. The capacity of a bad name to confuse and mislead the reader is boundless. A bad name unexplained is worse than no name at all.

Excuse me, I believe you have my stapler

This isn’t even considering the benefits of inlining code at its only call site rather than creating an elaborate tree of indirections throughout the source. Unfortunately this is a tough sell, as self-documenting code’s only means of describing blocks of code is through named functions. Take that away and what is left? But even to try and use functions this way is a mistake in my opinion.

Organising everything into small chunks of code works well when your code is perfectly designed and the problem it is trying to solve can be reduced to a series of independent processes. Often, you can ensure the code is reusable, or maps a set of inputs to a set of outputs in an easily-understandable way. It works because the code is simple.

In most real projects, we have to endure incomplete design or unmanaged inherent complexity in the problem domain. For these, it’s a suboptimal strategy unless you are simultaneously removing that complexity. Not all functions extracted from a block of code are fully independent from the code you extracted it from, usually because it expects its inputs to be in a particular state. Extracting that function does not remove complexity from that code. Instead it spreads it around, making it harder to understand why the function does what it does, and allowing other developers to use a function in the wrong context, potentially introducing bugs.

If your goal is to organise the code, commenting does this without removing the spatial context of the code it is complected with. Save functions for eminently reusable code. Functions imply reusability and the movement from one level of abstraction to another. Hijacking them for the purpose of code organisation is a misuse of structure. Structure is important: that’s why the GOTO wars were fought.

Here is an egregrious example from Uncle Bob, where a 10-line function does something more than its name suggests. Rather than documenting its behaviour, it is exploded into an English-language DSL that buries the unique feature of the class not described in its name – replacing a symbol at most once – three levels deep in the call stack. Local variables become class members, and since any method can alter a class member, we have increased the overall complexity of the original code. If trying to document your code makes it more complex, you are probably doing something wrong.

I hope most programmers are not as extreme as him, or at least not influenced too much. It is at least illustrative of the problem that self-documenting code is no replacement for good documentation. In the instance where you are trying to replace a comment, you are merely going from one form of documentation to another – and in a medium where it is harder to express yourself. This is not to say we shouldn’t try to make our code explain itself, but that it is only one of many means to the true goal: to write code that is easily understood and remains so for years to come.