The goal of writing reusable software is as old as programming, and is so well ingrained into programmers that we take it as an axiom. It's accepted and non-controversial. Of course, we all strive to produce reusable software, but as I look back on 35 years of programming, I note with chagrin that practically none of the code I've written has been usable in another project. I will "reuse" code by copy-paste-modify, but that's cheating, and even that doesn't happen too often.

While it might seem that I have missed the boat somewhere, I've asked other long-time programmers, and many of them seem to have the same frustrating experience. This starts out, then, as a journey into figuring out what went wrong. Why does that compressor I wrote before not work in another project? Why is my macro expander not reusable? Why did I chuck all my carefully written UTF code? Why do I write symbol tables over and over again? Why is that disassembler I wrote completely unusable in another application? Why do I keep reinventing the wheel?

My first thought was that I am now a better programmer than I was, and so rewrites are justified. That's just too easy a rationalization, though. A more careful examination turns up two troubling characteristics of my components:

The abstractions are leaky. I have failed to encapsulate the problem, and the component's dependencies have leaked out into the surrounding application, and the surrounding application's dependencies have leaked into the component.

The components are too specific. They are not about a generic type T, but about a specific type X. The implementation contains and takes advantage of specific information about X.

Things need some rethinking. Let's go back to first principles, and think about what a component is.

What is a Component?

One aspect of a component is that it is reusable software. But being reusable isn't enough. In C, for example, there are lots of available reusable libraries for all kinds of things. But they aren't necessarily components. A component has a predefined interface that it conforms to. Most C libraries roll their own unique interfaces, and the developer wishing to use them has to build custom scaffolding for each one. What sort of defined interface could possibly be general enough to work for a wide range of software components?

Paring a program down to its fundamentals, we find three operations:

Read input
Process that input
Write output

Even if a program itself doesn't fit that model, it is usually composed of subsystems that conform to it.

Let's rewrite this in pseudo-code:

source => algorithm => sink

Chaining things together makes sense:

source => algorithm1 => algorithm2 => sink

and so on. This looks awfully familiar. It looks like the UNIX command line: the files and filters model. Files form the sources and sinks, and the algorithms are called "filters." The UNIX command line is a brilliant innovation where the components are files and filter programs. Each filter program does one smallish job. To get things done, the command line user strings them together, the output of each feeding into the input of the next, and the final sink is a file or the console. (There are endless examples of how successful this component style is.)

Because UNIX is largely written in C, this model has found its way into common C usage, and is the ubiquitous file interface. Component programming in C relies on being able to express sources and sinks as files, and use algorithms that operate on the file interfaces. This is so successful that there are often pseudo-file systems.

Of course there are limitations. The approach shows the usual characteristics of something that evolved while retaining backward compatibility, rather than being designed up front, as anyone trying to figure out ioctl can attest. Worse, it views data as a stream of bytes, and that's awkward for a lot of uses, as certain algorithms imply specific supporting data structures. I'll get into this shortly.

Looking Back At My Code

With these thoughts in mind, I look back at all my failures at reusable code and notice something else: It looks nothing at all like: source → algorithm → sink.

In fact, it looks like a bunch of nested loops. The source data enters at the top, and gets swirled around and around in ever smaller and tighter loops, and leaves via the sink in the center of that maelstrom. For example, here's a simple program in pseudo-code that reads a text file from stdin line by line, tests to see if each line matches a pattern, and writes those lines out to stdout :

void main(string[] args)
{
    string pattern = args[1];
    while (!feof(stdin))
    {
        string line = getLine(stdin);
        if (match(pattern, line))
            writeLine(stdout, line);
    }
}

No wonder it is not reusable. There's no way to separate out the looping part from the source, and that darned sink in the middle of it all can hardly be composed with other loops and sinks. Also bad is that it isn't immediately obvious what it does. Compare it with this pseudo-code version:

void main(string[] args)
{
    string pattern = args[1];
    stdin => byLines => match(pattern) => stdout;
}

The loops have vanished, along with the intermediate variable line that was used to glue things together. The code is easier to reason about (and hence, more likely to be correct), shorter, and is made up of composable items.

The Next Design

In the 1980s, along came C++. It brought OOP to the C world, but somehow C++ OOP did not produce better component programming. (There are many successful C++ OOP libraries that have stood the test of time, and can be viewed as component libraries, but as far as components as discussed here, I don't think there was much improvement.) C++ did update the C file design and generalized it a bit with iostreams. Things started looking like our desired source → algorithm → sink with iostream's >> operator. But that just didn't seem to catch on, either, and didn't see much use outside of reading and writing files. I think the iostream style could have been generalized and expanded beyond mere serialization, but that never happened.

In the 1990s, Alexander Stepanov introduced the C++ Standard Template Library (STL). At last, components could be more than files; and algorithms could be generic and automatically adapt themselves to the data types being processed. Components followed a common interface. They could be composed, and could be compiled to highly efficient code. But even that didn't quite get us where we want to be:

The syntax doesn't resemble source → algorithm → sink. It keeps winding up looking like loops:

for (auto i = L.begin(); i != L.end(); ++i)
    ... do something with *i ...

just like my old C code.

There is some improvement in getting rid of loops with std::for_each(), std::transform(), std::copy(), std::find(), etc., but there are composition problems with them. For example, std::find() returns an iterator, but where is the other end of the range? The components don't fit together easily. Also, the STL's design was marred by the absence of lambda functions in C++ at the time. This rendered many STL-based idioms awkward (to the extent that STL containers tend to enjoy much more usage than the STL's higher-order algorithms).

Iterators are an abstraction of pointers. But pointers don't know where the beginning or end of their data is, so when using iterators one finds oneself passing around pairs of iterators. This is exactly analogous to C's passing arrays around as two separate items: a pointer to the start of the array, and the dimension of the array. I have characterized this as C's greatest mistake, and it has unfortunately been propagated into C++ and the STL design. Needing to pass around two disjoint pieces of data to embody an abstraction (as opposed to raising the abstraction to first-class status) hurts composition and forces designs into pedestrian manipulations of fragments. A better component design would combine the two.