Like every other working programmer, the past couple years haven’t had as much “hacking on fun stuff” as I would have liked. So I’ve taken some downtime, and as a part of that, wanted to improve my development tooling. That led to spending a Saturday whipping up Workspace, which is entirely written in Bash.

Now, the last time I did any serious shell programming was when dinosaurs still ruled the Earth, and the interesting thing about touching a language that you haven’t worked with in years – and looking at your old code – is that you get a very real sense of how much you’ve learned.

Fourteen years ago, I had bought heavily into the idea that small code is good code. Less code means fewer bugs. Don’t Repeat Yourself means way fewer bugs. So I kept things very tight, and very dry. And also nearly impossible to read. Because, at the time, I didn’t understand one of the most fundamental rules of programming:

> Programs should be written for people to read, and only incidentally for machines to execute.
>
> – Structure and Interpretation of Computer Programs

I’m not the only one who has learned this the hard way, so I’m going to share how I learned to write code that my peers could make sense of, and in some cases, outright read.

Thinking in baby steps.

A program is, in essence, a set of instructions for solving a problem. The computer can deal with the whole thing at once, but since I have a human brain, I have to break those instructions down into manageable steps that are mutually understood by both the computer and myself.

Bugs happen when that understanding stops being mutual – when the model we have inside our heads of what will happen turns out to be different to what the program actually does.

Most of the time, this was because I had tried to bite off too much of the problem at once, and my brain couldn’t manage the complexity. Too many facts to keep on my brain-stack at any one point in time. So I learned to start thinking of programs in baby steps.

That’s what small code is really about.

Limiting a method or class to an exact number of lines isn’t the goal. Those are just forcing functions, which push us into breaking down the problem we are trying to solve into bite-size chunks that our brains can handle.

This started me down the road of writing a lot of small methods like this:

```shell
function in_file {
  local file="$1"
  local content="$2"

  grep -q "$content" "$file" 2>/dev/null
}
```

Which got used in bigger methods:

```shell
function save_setting {
  local file="$1"
  local key="_setting_$2"
  local value="$3"
  local new_setting="export $key=\"$value\""

  if in_file "$file" "$key\="; then
    replace_in_file "$file" "$key\=" "$new_setting"
  else
    append_to_file "$file" "$new_setting"
  fi
}
```

in_file could easily be replaced with a single line of code, but that’s not the point. grep -q "$content" "$file" 2>/dev/null certainly works, but in_file "$file" "$content" shows intent.

Similarly, I haven’t provided the source for replace_in_file or append_to_file, but you can probably figure out what they do all the same.
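For the curious, here’s one way those two helpers might look. This is a sketch of my own, not the actual Workspace source – the real versions may well differ:

```shell
# Hypothetical sketches of the two helpers -- not the actual
# Workspace source.

# Replace any line matching a pattern with new content.
function replace_in_file {
  local file="$1"
  local pattern="$2"
  local replacement="$3"
  local tmp

  # sed -i isn't portable between GNU and BSD sed, so go through
  # a temp file instead. (This naive version assumes the pattern
  # and replacement contain no characters special to sed.)
  tmp="$(mktemp)"
  sed "s|.*${pattern}.*|${replacement}|" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Append a line to a file, creating the file if needed.
function append_to_file {
  local file="$1"
  local content="$2"

  echo "$content" >> "$file"
}
```

Again, each one does exactly one thing, and the name tells you what that thing is.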

This pattern continues up the call chain:

```shell
function run_set {
  local key="$1"
  local value="$2"

  is_valid_setting "$key" || die "unknown setting: $key"

  save_setting "$SETTINGS_FILE" "$key" "$value"
}
```

Lots of small methods, each with one job, calling other small methods that each have one job. This technique is known as the Single Responsibility Principle, and it’s one of the most important concepts in software engineering.

Sometimes it’s hard to know if you’ve given too many responsibilities to a method or a class. As a rule of thumb, if you can’t describe what it does without needing a conjunction (“and” or “or”), then you’re trying to do too much at once.
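As a hypothetical illustration (these names are mine, not from Workspace): a function you’d have to describe as “adds an alias and reloads the config” is asking to be split in two:

```shell
# Where the aliases live; a variable so tests can override it.
ALIASES_FILE="${ALIASES_FILE:-$HOME/.aliases}"

# Needs a conjunction to describe: adds an alias AND reloads.
function add_alias_and_reload {
  echo "alias $1='$2'" >> "$ALIASES_FILE"
  source "$ALIASES_FILE"
}

# Split by responsibility: each function is now describable
# without an "and".
function add_alias {
  echo "alias $1='$2'" >> "$ALIASES_FILE"
}

function reload_aliases {
  source "$ALIASES_FILE"
}
```

The caller that genuinely needs both can still call both – but now every other caller gets to pick just the piece it needs.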

You can’t prove murder without intent.

The beauty of this approach is that at every step of the way, the code is small, readable, and exceedingly easy to understand. It’s also inherently DRY. Functions based on intent, rather than on the characters between the squiggly brackets, are reusable whenever you have the same intent.

In this particular program, is_valid_setting gets used in several places.

But calling a function from only one place is fine, too. Both save_setting and in_file get called in just one place each, yet breaking them out into separate methods – each with one job – made the code both more readable, and more reasonable.
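The source of is_valid_setting isn’t shown here, but a sketch in the same spirit might look like this (the list of valid settings is my invention):

```shell
# Hypothetical sketch -- the real Workspace list will differ.
VALID_SETTINGS="editor project_dir color"

# Succeed if (and only if) the key is a known setting.
function is_valid_setting {
  local key="$1"
  local setting

  for setting in $VALID_SETTINGS; do
    if [ "$setting" = "$key" ]; then
      return 0
    fi
  done

  return 1
}
```

Used as a guard, as in run_set above, it reads almost like a sentence: is_valid_setting "$key" || die.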

This style of coding has enabled me to write large chunks of nearly bug-free code in one shot. It’s a pretty amazing feeling to see entire sections of your test suite go from red to green. Each step is so small that it’s easy to keep my brain in sync with what the computer is actually going to do when the program runs, which means I don’t spend much time stuck in the test-fix-test loop.

(Incidentally, for Ruby or Python code, it’s generally recommended to keep methods to somewhere in the range of 5-8 lines, which is roughly the maximum number of facts that your brain can hold in working memory.)

This isn’t about me.

> If the computer doesn’t run it, it’s broken. If people can’t read it, it will be broken. Soon.
>
> – Charlie Martin

I know that at some point of time, some poor bastard is going to have to work on my code. I find that it’s best to assume that this individual is a psychopath with severe anger-management issues. And an ax. Probably also a Sherman tank. And my home address.

Hell, let’s just assume it’s the Hulk.

I want to work in such a way that the Hulk has a greater chance of understanding and successfully modifying my code without having to track me down. This gives me a greater chance of surviving to see another birthday.

It’s very win-win.

This means that I need to be as clear as possible with the names of functions and variables.

Now, of course, naming functions and variables is subjective. There’s no objective standard, and never will be. I’m writing code for human beings to read, and there is no one perfectly objective way for any two given people to communicate.

So there’s no one correct way to name things. In the code above, in_file could just as easily have been called file_contains or file_has_string, or probably half a dozen other names. The point is not to find the correct name, but merely to try and express, in as clear a fashion as possible, what my mental model of the solution was.

Even if I only have an 80% chance of getting the Hulk to understand what I was thinking, then that means I’ve got an 80% chance of seeing another birthday cake.

Unfortunately, I can’t always choose the best name. It’s just not possible. So even if I can’t be clear, I can at least try to be consistent. Variables that hold the same data have the same name throughout the codebase. That way, even though the Hulk has to burn some time figuring out that a frobnitz holds user-specified options, he only has to figure that out once for the entire codebase. Which means another all-expenses-paid trip ’round the sun for me!

And when I’m feeling adventurous, I’ll even call the Hulk and ask him to come over and tell me whether or not what I’ve written makes sense. Better yet, we’ll pair-program so he can stay not-angry with my code in realtime, although it’s a nightmare to squeeze him into an office chair.

Comments are where you put the crazy.

Some years ago, I decided to finally try writing my code with no comments, and it was a massive leap forward in learning to write readable code. When you can’t rely on the comment-crutch anymore, you have no choice but to get very, very good at coding for readability.

I used to use comments to explain what the code was doing, or how it was doing it. But it’s too easy for those comments to go out-of-date every time the code changes, and when the code itself is clear, the comments just get in the way.

That said, the “never write comments, ever!” people have missed something of colossal importance, because there is still a time and place for comments: answering the question, “what the hell is going on here?”

Comments are the place to document insanity.

Let’s say that, hypothetically, I need to parse some XML data, and that I need the entities (things like &amp;) left alone, so that I can process them at a later stage. And that, unfortunately, the XML library has other ideas:

```ruby
# WORKAROUND: replace_entities is ignored by Apache Xerces, and
# still produces errors with libxml, so we're just going to
# butcher things with regex substitution on the way in, and
# de-butcher them on the way out.
def self.escape_entities(string)
  string.gsub(/\&([\w\-]+)\;/, '$$\1$$')
end

def self.unescape_entities(string)
  string.gsub(/\$\$([\w\-]+)\$\$/) { "&#{$1};" }
end
```

Sure, it might be possible to cook up a set of method names to explain this, but I’m willing to bet it would be massively less clear than the above comment.

Comments are also useful for those times that I’ve implemented a clever algorithm, or deployed a new technique that even I don’t fully understand. I’ve found that a comment with a link to the algorithm, or to a blog post explaining the clever hack, is a great way to prevent myself from suffering from an acute overdose of Hulk Smash.

Okay, so maybe it’s a little bit about me.

There is one other programmer, besides the Hulk, that is damn happy I started writing code for readability: Future Me.

In working on one of my pet projects recently, I needed to dust off some CoffeeScript that I had lying around from about four years ago, where I had figured out how to handle some cross-browser nastiness related to text selections.

Honestly, the code sucked. But it was very easy to understand, and readable to the point where I could clean it up and integrate it with a new project with very little pain.

Past Me saved Current Me a lot of headache by putting readability first.

Seriously, that’s all there is.

Thinking in baby steps, doing one thing at a time, naming things based on intent, and using comments as a prison for the terminally insane.

Simple concepts, hardly worth writing about, and yet, they’ve completely changed the way that I think about programming – and helped me find joy in writing code for something as annoying as the Bourne shell.