Hexagonal Architecture. Source of the image unknown.

Skim the dirt off your codebase!

How to clean up a project that’s 3+ years old and live happily ever after.

We’ve all been there: a software project was lucky enough to live on for years after its initial development, but somewhere along the line it became crippled by patches and workarounds. They were only meant to be temporary, and nobody ever got around to cleaning them up.

As a contractor, every few months I move to a different company. I’ve worked with start-ups and well-established companies alike. I once joined a project early in its development, left the team after shipping the app, and found myself working on that same project almost three years later. That was quite an experience! The only code of mine still left in it was a bunch of blank lines scattered across a few files. I’ve worked on projects at every stage of their lifetime, and I keep finding the same problems. Luckily, the solution also stays the same.

The problem I find most often is that, as projects grow in complexity and feature set, they also become harder to change. The time required to develop new features keeps increasing, and breaking things becomes easier and easier. Knowledge of distinct areas of the codebase tends to stay in the head of one person or a small group. Other developers on the team usually don’t bother learning how something works until they have to, and when those people leave, that knowledge is lost forever.

The number one enemy of good code is code itself. It’s the growing web of dependencies, the lack of isolation, the lack of documentation. It’s also the tempting voice inside your head saying it’s OK to add just one more method to an already fat class. After all, if you do, you’ll be done: your feature or bug fix will be complete, your manager will be happy, and everybody will cheer at your ability to move fast. You’ll also break things, and they’ll come back to bite you a day or so later, just so you can patch some more.

Perhaps you promise yourself you’ll go back and do things properly. However, before you know it, the whole project looks like an indecent mess and doing the simplest things turns into the most difficult job ever.

Disclaimer: what I’m saying is nothing new. People have written hundreds of pages on the subject in books such as Working Effectively with Legacy Code, Clean Code, and Refactoring. However, it never seems to be enough. The same problems keep resurfacing. So I’m just doing my part to burn some principles into people’s minds, as well as my own.

Software programming is more like gardening than engineering

Would you do the same if you were working on a building, a bridge, or trying to launch a rocket into space with people on it? I hope not! I know some do, though. Just look at what happened in Italy a few months ago! Or look at the poor quality of homes in the UK!

For a long time, software development has been compared to constructing a building. Perhaps it’s a good comparison, or maybe not. Some elements make programming look like engineering: you have to analyze a problem, come up with a good solution, make sure you take the right measures, build it, test it and hope it won’t collapse. However, software is so cheap to change, and sometimes so unpredictable, that you know change is going to happen. As Andy Hunt, author of The Pragmatic Programmer, puts it:

There is a persistent notion in a lot of literature that software development should be like engineering. […] It doesn’t work that way with software. […] Software is much more like gardening. You do plan. You plan you’re going to make a plot this big. You’re going to prepare the soil. You bring in a landscape person who says to put the big plants in the back and short ones in the front. You’ve got a great plan, a whole design. But when you plant the bulbs and the seeds, what happens? The garden doesn’t quite come up the way you drew the picture. This plant gets a lot bigger than you thought it would. You’ve got to prune it. You’ve got to split it. You’ve got to move it around the garden. This big plant in the back died. You’ve got to dig it up and throw it into the compost pile. These colors ended up not looking like they did on the package. They don’t look good next to each other. You’ve got to transplant this one over to the other side of the garden. - Andy Hunt, author of The Pragmatic Programmer.

Fixing rotten codebases

What can we do when we find ourselves working on a nasty codebase? Perhaps we’ve just joined a team in need of help. Or maybe we’ve lost control of the ship and are struggling to get it back on course.

The only solution to that problem I’ve found is to push the crap to the edges first, then remove it altogether. That means making an effort to isolate parts of the codebase in little black boxes that each serve one, and only one, purpose. Small, contained blocks of code are easier to understand and to change. We can test them independently from the rest of the application and refactor them as needed. In short, we have to:

Isolate. Test. Refactor.

Divide and conquer

The first step is to identify small, contained pieces of functionality and extract them into their own containers. I’ll call them modules.

The goal is to create boundaries of interaction between the main app and the self-contained features. By doing so, we’ll have code that’s easier to understand and maintain. At the very least, we can take a piece of code nobody understands anymore and shrink the surface through which the main app interacts with it, redefining that surface along the way. Isolating those infected areas of the codebase also keeps the disease from spreading further.

When separating a module, it’s best to create a new package or framework to contain it, so that the old code can no longer reach into its internals. Once the module is confined to its own universe, we can define an interface through which the rest of the world interacts with it. That interface is part of the package, and any necessary conversions happen within the module. This way, when we change the module’s implementation for any reason (e.g. to clean it up), no other code is affected.
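Here is a minimal sketch of such a boundary. It assumes Swift, since the article mentions protocols and frameworks but never names a language. Everything in it is hypothetical: a `ProfileKit` package whose only public surface is the `Profile` type, the `ProfileService` protocol and a factory function. The legacy record type and the class wrapping it stay `internal`, and the conversion between them happens inside the module. It is shown in one file for brevity. In a real package, the `internal` types would be invisible to the app.

```swift
// Hypothetical "ProfileKit" package, shown in one file for brevity.

// MARK: Public interface (the only part the main app can see)

public struct Profile: Equatable {
    public let id: String
    public let displayName: String

    public init(id: String, displayName: String) {
        self.id = id
        self.displayName = displayName
    }
}

public protocol ProfileService {
    func profile(withID id: String) -> Profile?
}

// Factory: callers receive the protocol, never the concrete type.
public func makeProfileService() -> any ProfileService {
    LegacyProfileService(records: [
        LegacyProfileRecord(identifier: "42", firstName: "Ada", lastName: "Lovelace")
    ])
}

// MARK: Internal implementation (invisible outside the package)

// The old model nobody wants to touch yet.
struct LegacyProfileRecord {
    var identifier: String
    var firstName: String
    var lastName: String
}

final class LegacyProfileService: ProfileService {
    private let records: [String: LegacyProfileRecord]

    init(records: [LegacyProfileRecord]) {
        // Assumes identifiers are unique; duplicates would trap here.
        self.records = Dictionary(uniqueKeysWithValues: records.map { ($0.identifier, $0) })
    }

    func profile(withID id: String) -> Profile? {
        guard let record = records[id] else { return nil }
        // Conversion from the legacy model to the public type happens inside the module.
        return Profile(id: record.identifier,
                       displayName: "\(record.firstName) \(record.lastName)")
    }
}
```

If the legacy class is later rewritten, only the code below the second `MARK` changes. Callers keep talking to `ProfileService`.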

Initially, you’ll have to face a wall of dependencies, but that is precisely the problem you’re solving! Those dependencies must either be injected into the module, or the module must depend on some other module to obtain them.
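One way to cut such a dependency is sketched below in Swift, with hypothetical names. Instead of reaching for a global singleton, the module declares the capability it needs as a protocol it owns (`TokenStore`) and receives an implementation through its initializer. The main app, or a test, decides which implementation to hand in.

```swift
// Hypothetical "SessionKit" module.

// The module declares the capability it needs as a protocol it owns.
public protocol TokenStore {
    func token() -> String?
    func save(token: String)
}

public final class SessionManager {
    private let store: any TokenStore

    // The dependency is handed in from outside instead of being looked up
    // through a global singleton.
    public init(store: any TokenStore) {
        self.store = store
    }

    public var isLoggedIn: Bool { store.token() != nil }

    public func logIn(withToken token: String) {
        store.save(token: token)
    }
}

// Main-app side (or a test): any conforming type can be injected.
// A persistent store could be passed instead; it is not shown here.
final class InMemoryTokenStore: TokenStore {
    private var stored: String?
    func token() -> String? { stored }
    func save(token: String) { stored = token }
}
```

Usage: `SessionManager(store: InMemoryTokenStore())`. The module never learns where tokens actually live.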

A great way to achieve that is to avoid passing the same data types around everywhere. Use protocols and DTOs (data transfer objects) as much as possible instead. Doing so will make your code even easier to understand and maintain. A small drawback is that you may need to write a bit of code to conform to those protocols or to convert your data models into (and from) these intermediate objects. It’s OK to repeat yourself a little here. The advantage is just too big to ignore.
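Below is a sketch of the DTO idea in Swift, using hypothetical types. `AppUser` stands in for a heavyweight main-app model, and `ContactDTO` is a plain value type carrying only what a module needs. The conversion code in both directions is the small amount of repetition mentioned above.

```swift
// Main-app model (hypothetical): a heavyweight class carrying app-wide state.
final class AppUser {
    let id: Int
    var firstName: String
    var lastName: String
    var email: String
    var cachedAvatar: [UInt8] = []  // state the module has no business seeing

    init(id: Int, firstName: String, lastName: String, email: String) {
        self.id = id
        self.firstName = firstName
        self.lastName = lastName
        self.email = email
    }
}

// DTO owned by the module: a plain value type with only what the module needs.
struct ContactDTO: Equatable {
    let id: Int
    let fullName: String
    let email: String
}

// Conversions live at the boundary, on the main-app side.
extension ContactDTO {
    // App model -> DTO
    init(_ user: AppUser) {
        self.init(id: user.id,
                  fullName: "\(user.firstName) \(user.lastName)",
                  email: user.email)
    }
}

extension AppUser {
    // DTO -> app model: copy back only the fields the module may change.
    func apply(_ dto: ContactDTO) {
        precondition(dto.id == id, "DTO belongs to a different user")
        email = dto.email
    }
}
```

The module only ever sees `ContactDTO`, so changes to `AppUser` (new fields, caches, persistence details) don’t ripple into it.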

For example, modules won’t have to depend on one another just to know about the types they need. They can simply define what they need. In your main app, you can extend existing types to conform to a protocol or convert them into other types. Your module’s code will also be much easier to test, because defining stub and mock objects won’t drag in a burden of dependencies and state to maintain.
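Here is how that can look in Swift; all names are hypothetical. The module owns a tiny `Greetable` protocol. The main app makes its existing `Customer` type conform through an extension, and the module’s tests get by with a one-line stub.

```swift
// Inside a hypothetical "GreetingKit" module: it declares only what it needs.
protocol Greetable {
    var displayName: String { get }
}

struct Greeter {
    func greeting<Person: Greetable>(for person: Person) -> String {
        "Hello, \(person.displayName)!"
    }
}

// In the main app: an existing model is extended to conform, so the module
// never has to import (or even know about) the app's types.
struct Customer {
    let givenName: String
    let familyName: String
    let loyaltyPoints: Int
}

extension Customer: Greetable {
    var displayName: String { "\(givenName) \(familyName)" }
}

// In the module's tests: a one-line stub, with no app state or dependencies.
struct StubGreetable: Greetable {
    let displayName: String
}
```

`Greeter().greeting(for: StubGreetable(displayName: "Tester"))` needs nothing from the main app, which is what keeps the module’s tests cheap.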

Push the crap to the edges first, then remove it!

As a final note, what you extract doesn’t necessarily have to be something poorly written or incomprehensible, although the greatest benefit comes from isolating those parts of the code. Pulling out well-written pieces will help over the long run too.

Test

With an interface for the module in place, we can test it in isolation. That lets us verify that its behavior is consistent and correct, and automate those checks with a Continuous Integration system.

Tests serve both to clarify the scope of the module and to prevent regressions. They also make good documentation of the expected behavior when someone new has to use that code. With a renewed understanding of what the extracted code has to do, we can identify more areas to move out of the module or prune altogether.
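Below is a sketch of tests written against a module’s interface. It assumes Swift and a hypothetical `CartCalculator` module with a `PriceCatalog` dependency. Each test names one expected behavior, so the suite reads as documentation. In an Xcode project these would typically be `XCTestCase` methods using `XCTAssertEqual`; plain `assert` calls keep the sketch self-contained.

```swift
// Hypothetical module interface under test.
protocol PriceCatalog {
    func price(ofItem id: String) -> Int?  // price in cents
}

struct CartCalculator {
    let catalog: any PriceCatalog

    /// Sums known items, ignores unknown ones, and never lets a discount
    /// push the total below zero.
    func total(for itemIDs: [String], discountCents: Int = 0) -> Int {
        let subtotal = itemIDs.compactMap { catalog.price(ofItem: $0) }.reduce(0, +)
        return max(0, subtotal - discountCents)
    }
}

// Test double: a fixed, in-memory catalog. No database, no network.
struct FakeCatalog: PriceCatalog {
    let prices: [String: Int]
    func price(ofItem id: String) -> Int? { prices[id] }
}

// Each test documents one expected behavior.
func test_total_sumsKnownItems() {
    let calculator = CartCalculator(catalog: FakeCatalog(prices: ["apple": 50, "pear": 75]))
    assert(calculator.total(for: ["apple", "pear", "apple"]) == 175)
}

func test_total_ignoresUnknownItems() {
    let calculator = CartCalculator(catalog: FakeCatalog(prices: ["apple": 50]))
    assert(calculator.total(for: ["apple", "mystery"]) == 50)
}

func test_total_neverGoesNegative() {
    let calculator = CartCalculator(catalog: FakeCatalog(prices: ["apple": 50]))
    assert(calculator.total(for: ["apple"], discountCents: 80) == 0)
}

test_total_sumsKnownItems()
test_total_ignoresUnknownItems()
test_total_neverGoesNegative()
```

A newcomer reading only the test names learns the module’s rules without opening its implementation.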

Testing is an essential part of the process and shouldn’t be skipped. It’s the ground we stand on when refactoring the module. With tests, we can be confident that we’ll catch regressions. Dependencies and weak spots will surface as you find places that aren’t easy to test. Life will be good.

Confidence is key. As programmers, we’re afraid of change when we lack confidence in our ability to catch regressions. We’re scared of breaking things, and we should be. Having a set of automated tests running across the entire codebase at each commit can free us from the burden of fear.

Refactor

The final step is refactoring. Once everything is isolated and tested, you’ll find the codebase already in much better shape than before. To isolate the code, something had to change: blocks had to be rearranged, and communication had to be cleaned up behind a new interface. To test the module, other parts had to become more testable.

At this point, you’ll have to make a significant decision: to refactor or not to refactor. Some of the modules will be in perfect shape. Some will still be black boxes you didn’t dare touch. I know. It’s OK. You didn’t test all of it because you were afraid of changing and breaking things. It’s not ideal, but we can live with it. Isolating that component was already a great achievement. Other requirements were piling up in your backlog, and you were getting nervous. You didn’t want to disappoint your manager.

Here’s the thing, though. You’ve come a long way, and the light is just around the corner. You can see it already. If you want to make your codebase great again, you’ll have to persevere. It will take time. You’ll have to balance the refactoring work with the time your company requires you to devote to new features and enhancements. You’re now in a position that allows for that. However, the devil is in the details.

To keep the codebase from getting crippled again, you’ll have to keep nurturing it. There’s no way around this. Trust me. I’ve been there. Time and time again I’ve skipped testing or refactoring because of lack of time, pressure, overconfidence, or boredom. It never paid off. My lack of endurance has always come back to bite me. It’s not fun. Never again. Wouldn’t you like to keep working on a healthy codebase, where everything works as expected and very few bugs are ever introduced? If your answer is ‘yes,’ then you know what you have to do: test and refactor the crappy parts of the codebase that are still left.

Maintaining a healthy garden

Even once you’ve isolated the best and worst parts of your codebase, wrapped them in a reasonable suite of automated tests, and refactored everything, you’ll still be in danger. There’s no stopping. The job is never finished. Sure, your app will be in its best shape ever. However, it won’t stay that way for long if you don’t keep caring for it. As I mentioned in an old article, Michael Feathers, author of Working Effectively with Legacy Code, says that “legacy code is simply code without tests.”

Legacy code is simply code without tests

Test-Driven Development comes in very handy for keeping our garden healthy. It’s the practice of writing automated tests and code in short cycles (we’re talking seconds short), with the tests written before the “actual code.”
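Here is a single red-green-refactor cycle, sketched in Swift around a hypothetical `initials(of:)` helper. The test is written first and fails. Then just enough code is added to make it pass. Then the code is cleaned up under the protection of the test.

```swift
// Step 1 (red): write the test first. It describes behavior that doesn't
// exist yet, so it fails (in Swift it won't even compile until
// `initials(of:)` is declared).
func test_initials_usesFirstLetterOfEachWord() {
    assert(initials(of: "Ada Lovelace") == "AL")
    assert(initials(of: "grace brewster hopper") == "GBH")
    assert(initials(of: "  ") == "")
}

// Step 2 (green): write just enough code to make the test pass.
func initials(of name: String) -> String {
    name.split(separator: " ")       // drops empty pieces between spaces
        .compactMap { $0.first }     // first character of each word
        .map { $0.uppercased() }
        .joined()
}

// Step 3 (refactor): tidy up with the test as a safety net, then start the
// next cycle with the next small test.
test_initials_usesFirstLetterOfEachWord()
```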

One important note: avoid, as much as possible, having modules depend on one another. Again, use protocols and DTOs to avoid spreading dependencies! Those constructs are invaluable friends.

Where to go from here

Congratulations on making it this far! I know this can’t have been the easiest or most fun piece to digest. As I mentioned near the start of this article, what I’m saying is nothing new. It’s all pretty old stuff. I stand on the shoulders of giants. We all do. So I highly encourage you to take some time to read the original article describing the concept known as The Hexagonal Architecture. It emphasizes isolating the core domain of your app from external dependencies and interacting with them through ports and adapters. When I said you should define clear interface layers through which the main app and the tests communicate with a module, I was referring to exactly that type of architecture. You may also want to look at a similar concept known as Self-Contained Systems Architecture. Good luck with your refactoring!
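To make the terminology concrete, here is a minimal ports-and-adapters sketch in Swift with hypothetical names. The core `OrderService` implements a driving port (`PlaceOrderUseCase`) and depends only on a driven port (`OrderRepository`). `InMemoryOrderRepository` is one adapter. A database-backed adapter could replace it without touching the core. This illustrates the idea and does not reproduce the original article’s examples.

```swift
// Core domain type: knows nothing about databases, networks, or UI.
struct Order: Equatable {
    let id: Int
    let itemIDs: [String]
}

// Driven (secondary) port: what the core needs from the outside world.
protocol OrderRepository {
    func save(_ order: Order)
    func order(withID id: Int) -> Order?
}

// Driving (primary) port: what the outside world can ask the core to do.
protocol PlaceOrderUseCase {
    func placeOrder(itemIDs: [String]) -> Order?
}

// Application core: implements the driving port, depends only on the driven port.
final class OrderService: PlaceOrderUseCase {
    private let repository: any OrderRepository
    private var nextID = 1

    init(repository: any OrderRepository) {
        self.repository = repository
    }

    func placeOrder(itemIDs: [String]) -> Order? {
        guard !itemIDs.isEmpty else { return nil }  // a domain rule, testable in isolation
        let order = Order(id: nextID, itemIDs: itemIDs)
        nextID += 1
        repository.save(order)
        return order
    }
}

// Adapter: one concrete implementation of the driven port.
final class InMemoryOrderRepository: OrderRepository {
    private var storage: [Int: Order] = [:]
    func save(_ order: Order) { storage[order.id] = order }
    func order(withID id: Int) -> Order? { storage[id] }
}
```

Tests plug in the in-memory adapter, and the production app plugs in a real one. The core is the same in both cases.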