If you're a programmer this has happened to you. You've built or known a project that starts out with a well-defined purpose and a straightforward mapping from purpose to structure in a few sub-systems.

But it can't stand still; changes continue to roll in.

Some of the changes touch multiple sub-systems.

Each change is coherent in the mind of its maker. But once it's made, it diffuses into the system as a whole.

Soon the once-elegant design has turned into a patchwork-quilt of changes. Requirements shift, parts get repurposed. Connections proliferate.

Veterans who watched this evolution can keep it mostly straight in their heads. But newcomers see only the patchwork quilt. They can't tell where one feature begins and another ends.

What if features could stay separate as they're added, so that newcomers could see a cleaned-up history of the process?



Solution

Start with a simple program.

// Includes // End Includes // Types // End Types // Globals // End Globals int main(int argc, char* argv[]) { if (argc > 1) { // Commandline Options } return 0; // End Main }

It doesn't do much, it's just a skeleton, but it's a legal program and it builds and runs. Now features can insert code at well-defined points in this program:

:(after "Includes") #include <list> #include <string>

The first line above is a directive to find an existing line matching the pattern "Includes", and insert the following lines after it. You can insert arbitrary fragments at arbitrary points, even inside functions:

:(after "Commandline Options") if (arg == "test")) { run_tests(); return 0; }

A simple tool will splice them into the right places:

int main(int argc, char* argv[]) { if (argc > 1) { // Commandline Options if (arg == "test")) { run_tests(); return 0; } } return 0; // End Main }

With the help of directives, we can now organize a program as a series of self-contained layers, each containing all the code for a specific feature, regardless of where it needs to go. A feature might be anything from a test harness to a garbage collector. Directives are almost too powerful, but used tastefully they can help decompose programs into self-contained layers.

We aren't just reordering bits here. There's a new constraint that has no counterpart in current programming practice — remove a feature, and everything before it should build and pass its tests:

$ build_and_test_until 000organization $ build_and_test_until 030 $ build_and_test_until 035 ...

Being able to build simpler versions of a program is incredibly useful to newcomers to your project. Playing with simpler versions can help gain fluency with the global architecture of your project. Newcomers can learn and test themselves in stages, starting with the simplest versions and gradually introducing the complexities of peripheral features. Try it for yourself, and tell me what you think!

$ git clone http://github.com/akkartik/wart $ cd wart/literate $ make # builds and runs tests; needs gcc and unix

This example is in C, but the idea applies to any language. Free yourself of the constraints imposed by your compiler, and write code for humans to read.

Rationale and history

You might have noticed by now that directives are just a more readable way to express patches. This whole idea is inspired by version control. For some time now I've been telling people about a little hack I use for understanding strange codebases. When I am interested in learning about a new codebase, I skim its git log until I find a tantalizing tag or commit message, like say "version 1". Invariably version 1 is far simpler than the current version. The core use cases haven't yet been swamped by an avalanche of features and exceptions. The skeleton of the architecture is more apparent. So I focus on this snapshot, and learn all I can. Once I understand the skeleton I can play changes forward and watch the codebase fill out in a controlled manner.

But relying on the log for this information is not ideal, because the log is immutable. Often a codebase might have had a major reorg around version 3, so that reading version 1 ends up being misleading, doing more harm than good. My new organization lays out this time axis explicitly in a coherent way, and makes it available to change. The sequence of features is available to be cleaned up as the architecture changes and the codebase evolves.

Codebases have three kinds of changes: bugfixes, new features and reorganizations. In my new organization bugfixes modify a single layer, new features are added in new layers, and reorganizations might make more radical changes, including changing the core skeleton. The hope is that we can now attempt more ambitious reorganizations than traditional refactoring, especially in combination with tracing tests. I'll elaborate on that interaction in the future.