One of the most interesting maxims of OO (to me) is the Open/Closed Principle. It states that modules should be open for extension but closed to modification. Bertrand Meyer coined the term and, although he was thinking of inheritance as a reuse mechanism, he hit upon something central in software development. It turns out that regardless of whether we are doing OO or not, we know we are programming well when we find ourselves not modifying our existing code as much. Code should grow by addition rather than mutation. Need a new feature? Introduce some new classes. If you find yourself adding all of the code inline in existing methods and classes, know that down that path madness lies. If you keep your classes and methods small and focused and you avoid tangling responsibilities, closure is often an emergent effect.

The Open/Closed Principle has been around for a while. Many developers attempt to use it consciously, and others do things which tend to promote closure without thinking about it. What is the net effect on code?

One of the nice things about having a version control history is that we can query it.

Here is a graph of the per-file commit frequencies of source files on a random open source project. Every point on the x axis is a file, and its height on y is the number of times it's been modified. The frequencies have been sorted in ascending order to make the picture sane:

One thing that's obvious is that there are a lot of files which have only been modified a very small number of times. There are also some files which get an extremely large number of commits. It's worth taking a look at the latter files and wondering why. In some cases, it's because the files are configuration - they consist of things like data-based settings in code that change frequently. At other times, they are havens of runaway complexity - snarls of logic that seem to accrete more and more code. Clearly the right side of the curve is where you want to expend some refactoring effort.

Another thing that is nice about this view of code is that you can get a sense of what your payback will be for your refactoring. The area under the curve is the total number of file modifications committed over the life of the project. If we inch backward from the right, we can calculate something like the 5% mark - the files which you are likely to touch in one out of every twenty modifications. This may not seem like much, but it adds up over the duration of a project, especially when you factor in the effort of continually trying to understand that code. I have no doubt that there is a decent formula that we can use to calculate the likely savings from refactoring based on the ratio of adds to modifications, and the commits in a code base.
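To make "inching backward from the right" concrete, here is a minimal sketch in Python. It assumes you already have the per-file modification counts as a list of integers; the function name `files_covering` and the `fraction` parameter are mine, purely for illustration. It is one reading of the 5% mark: the busiest files which together account for that share of all modifications.

```python
def files_covering(frequencies, fraction=0.05):
    """Return the counts of the busiest files that together account for
    at least `fraction` of all modifications (hypothetical helper)."""
    ordered = sorted(frequencies)      # ascending, like the graph
    target = fraction * sum(ordered)   # share of the area under the curve
    covered = 0
    picked = []
    # Walk backward from the right-hand side of the curve.
    for count in reversed(ordered):
        if covered >= target:
            break
        picked.append(count)
        covered += count
    return picked


if __name__ == "__main__":
    # Made-up data: ninety files touched once, one file touched ten times.
    sample = [1] * 90 + [10]
    print(len(files_covering(sample)), "file(s) reach the 5% mark")
```

On a curve with a steep right side, a handful of files (sometimes just one) will reach the mark, which is the point: that is where the refactoring payback is concentrated.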

Side note:

It's relatively easy to make these diagrams yourself. Grep the log of your VCS for lines that record adds and modifications. Strip everything except the file names, sort them, run them through 'uniq -c', sort the result again, and then plot the first column.
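If you'd rather not string the shell commands together, the same steps can be sketched in Python. This assumes Git as the VCS and the output of `git log --name-status --pretty=format:`, where each changed file appears as a status letter, a tab, and a path. The function names are my own, and it only prints the column to plot rather than drawing anything.

```python
import subprocess
from collections import Counter


def count_file_changes(log_text):
    """Count adds and modifications per file from name-status log output."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split("\t")
        # Keep only added (A) and modified (M) entries; blank lines,
        # deletions, and renames are skipped in this sketch.
        if len(parts) == 2 and parts[0] in ("A", "M"):
            counts[parts[1]] += 1
    return counts


def sorted_frequencies(counts):
    """Return the per-file counts in ascending order, ready to plot."""
    return sorted(counts.values())


def read_git_log(repo_path="."):
    """Run git in repo_path and return the raw log text (assumes git is installed)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-status", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for n in sorted_frequencies(count_file_changes(read_git_log())):
        print(n)
```

Pipe the printed numbers into whatever plotting tool you like; the y values are already in the order the graph uses.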