This is part 2 of my small series about cleaning up large code bases. Once we have our team together and everyone is determined to clean up that mess, where do we start? Time for some planned refactoring!

Usually there are two ways to clean up our application code: rewriting and refactoring. If the code base is large enough, rewriting is not an option. We may be wiser for having written our application once, but writing it a second time will still take a lot of time.

Remember that we’re talking about millions of lines of code here. In addition to taking time, writing from scratch will also introduce bugs. Users tend not to tolerate bugs popping up in parts of an application that have been stable for years.

So there is basically only one option left: Refactor the code.

Refactoring large applications: Where and how to start

Let me quote Wikipedia about refactoring:

Code refactoring is the process of restructuring existing computer code […] without changing its external behavior.

“Great,” some might say, “so you want to work without getting anything visible done? Who’s going to pay for that?” They have a point. While refactoring, we are not adding functionality. To justify our work (and to get paid), we need to add value in some other way while refactoring.

Planned refactoring

Usually the argument for refactoring is that making the code more readable and maintainable pays off in the future. We still have to make sure that the time we invest in our code actually does pay off. If we don’t do the right refactoring in the right place, we can burn days, weeks, even months without a measurable impact on the overall maintainability of our application. Therefore it is crucial that we don’t just open our IDE and hack away at the first file we find. We have to plan where to start and what to focus on.

Obviously, if code is good enough, we should not start to refactor it. “Good enough” can be a rather low bar. We need to find the areas that are written really badly. But even if code is an unmaintainable, disgusting mess, we don’t necessarily need to improve it. If it just works and never needs to be changed, leave it alone. Maintainability is only worth investing time in if someone actually has to maintain that specific piece of code.

Since we’re at the start of our mission to clean up the code base, we might still have to convince management that they get something for the time we invest. Go for the low-hanging fruit. If there is bad code and you can’t figure out how it works, put it aside. If it’s impossible to test properly, leave it for now. In a large code base with years of legacy code rot, you will likely find easier targets to improve.

Find the hot spots

To get the most value out of your refactoring, invest the time in the areas that benefit most from more maintainability. Code tends to have hot spots of activity, where most of the work happens during your usual development.

Finding those hot spots can be done simply by asking your coworkers. Everyone knows a few places in the code they frequently have to touch and that are a pure mess. Often the files and classes that have to be altered frequently are also the ones that really need improvement. A file that has endured hundreds or thousands of edits has had many more opportunities to get ugly than a file with only five commits.

If you get too many hints about where you could start, or want a more scientific approach to finding the hot spots, there’s good news. Adam Tornhill has written a book on exactly that topic, “Your Code as a Crime Scene”. If you search for that title, you’ll also find his conference talks on YouTube. The idea is to use the data from your version control system to figure out where the most and the largest commits have happened in the past.
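A very simple version of this analysis just counts how often each file was changed. The sketch below assumes its input is the output of `git log --pretty=format: --name-only`, which prints the paths touched by each commit, one per line, with blank lines between commits. The function names are made up for illustration. Change frequency alone is only a rough first indicator; Tornhill’s book goes considerably further than this.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Counts how often each file path appears in the output of
//   git log --pretty=format: --name-only
// Every non-empty line is a file touched by one commit.
std::map<std::string, int> countChangesPerFile(std::istream& gitLog) {
    std::map<std::string, int> changes;
    std::string line;
    while (std::getline(gitLog, line)) {
        if (!line.empty()) {
            ++changes[line];  // one more commit touched this file
        }
    }
    return changes;
}

// Convenience overload for log text that is already in a string.
std::map<std::string, int> countChangesPerFile(const std::string& gitLogText) {
    std::istringstream in(gitLogText);
    return countChangesPerFile(in);
}

// Returns the n most frequently changed files, most changes first.
// Files with equal counts are ordered by path to keep the result deterministic.
std::vector<std::pair<std::string, int>> topHotSpots(
        const std::map<std::string, int>& changes, std::size_t n) {
    std::vector<std::pair<std::string, int>> sorted(changes.begin(), changes.end());
    std::sort(sorted.begin(), sorted.end(),
              [](const std::pair<std::string, int>& a,
                 const std::pair<std::string, int>& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });
    if (sorted.size() > n) {
        sorted.resize(n);
    }
    return sorted;
}
```

To use it on a real repository, a small `main` could read the git output from `std::cin` and print `topHotSpots(countChangesPerFile(std::cin), 10)`.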

Focus on a goal

Even if you concentrate on one or two hot spots, there is probably still more than enough to do. Determine the main pain points in the code you are about to refactor. Then pick a single goal you want to achieve, and stick to it.

Don’t get sidetracked. Let’s pick an example: we are refactoring a god class to break it apart into several smaller classes. While we are at it, it’s tempting to fix all those const-correctness errors and naming issues, and to switch all those old-school pointer parameters to references. Don’t go down that road. It’s too easy to get lost.

There are lots of possible goals. Improving class design is only one of them. Bringing your code under test is another, and an important one: we can’t reliably refactor code that is not covered by tests. Another important goal might be shorter compile times. Often a lot of code depends on the hot spot classes. In that case even small refactorings can trigger the recompilation of large parts of the application, and a single refactor-compile-test-commit cycle can take a very long time.
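A minimal sketch of such a safety net in C++, assuming a hypothetical legacy function `legacyDiscountPercent` that we want to clean up. The expected values in the table record what the current code returns, not what it arguably should return; tests of this kind are sometimes called characterization tests. The hand-rolled `countBehaviorChanges` helper is only a stand-in for whatever test framework your project uses.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical legacy code: convoluted, but callers may depend on its quirks.
int legacyDiscountPercent(int itemCount, bool isMember) {
    int discount = 0;
    if (itemCount > 10) discount = 5;
    if (itemCount > 100) discount = 15;
    if (isMember) discount += 2;
    if (discount > 15) discount = 15;  // odd cap: large member orders get no bonus
    return discount;
}

struct DiscountCase {
    int itemCount;
    bool isMember;
    int expected;  // recorded from the current code, quirks included
};

// Inputs chosen around the branch boundaries of the legacy code.
const std::vector<DiscountCase> kRecordedCases = {
    {0, false, 0},  {0, true, 2},   {10, false, 0},  {11, false, 5},
    {11, true, 7},  {100, true, 7}, {101, false, 15}, {101, true, 15},
};

// Returns how many recorded cases an implementation no longer reproduces.
int countBehaviorChanges(const std::function<int(int, bool)>& discount) {
    int mismatches = 0;
    for (const auto& c : kRecordedCases) {
        if (discount(c.itemCount, c.isMember) != c.expected) ++mismatches;
    }
    return mismatches;
}

// A refactored version: clearer structure, identical external behavior.
int refactoredDiscountPercent(int itemCount, bool isMember) {
    const int volumeDiscount = itemCount > 100 ? 15 : (itemCount > 10 ? 5 : 0);
    const int memberBonus = isMember ? 2 : 0;
    return std::min(volumeDiscount + memberBonus, 15);
}
```

Both versions should report zero behavior changes. If someone “fixes” the odd cap along the way, the table flags it, which is exactly what we want while the goal is restructuring and not changing behavior.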

Refactoring time management

The goals we pick for our refactorings should be manageable. If a goal seems too big, try to split it into smaller goals. For example, instead of splitting that large class into all its responsibilities at once, factor out one responsibility at a time. That way a single refactoring task can be done in less time, fits into a sprint and leaves time for others who have to work on the class.
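To make “one responsibility at a time” concrete, here is a minimal C++ sketch. `OrderManager`, `InvoiceFormatter` and `LineItem` are hypothetical names, not taken from any real code base. It shows the state after the first step: only invoice formatting has been extracted into its own class, while pricing and everything else stays where it was. The public `invoiceText()` member keeps its signature, so callers are not affected.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

struct LineItem {
    std::string name;
    int quantity;
    int unitPriceCents;
};

// Extracted in step 1: the only responsibility moved out of the god class so far.
class InvoiceFormatter {
public:
    // Assumes a non-negative total.
    std::string format(const std::vector<LineItem>& items, int totalCents) const {
        std::ostringstream out;
        for (const auto& item : items) {
            out << item.quantity << " x " << item.name << '\n';
        }
        out << "Total: " << totalCents / 100 << '.'
            << std::setw(2) << std::setfill('0') << totalCents % 100 << '\n';
        return out.str();
    }
};

class OrderManager {
public:
    void addItem(const std::string& name, int quantity, int unitPriceCents) {
        items_.push_back({name, quantity, unitPriceCents});
    }

    // Pricing still lives here; moving it out would be a separate, later task.
    int totalCents() const {
        int total = 0;
        for (const auto& item : items_) total += item.quantity * item.unitPriceCents;
        return total;
    }

    // Unchanged interface: now simply delegates to the extracted class.
    std::string invoiceText() const { return formatter_.format(items_, totalCents()); }

    // ... persistence, notifications, etc. are left untouched in this step.

private:
    std::vector<LineItem> items_;
    InvoiceFormatter formatter_;
};
```

Because the change is small, it can be reviewed, tested and merged quickly, and colleagues working on other parts of `OrderManager` are barely disturbed. Moving the pricing logic out would be the next, separate goal.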

It is very likely that others will be affected by the refactorings. After all, it’s a hot spot you’re working on. The daily business of fixing bugs and adding new functionality will very likely touch the very same spots that you want to refactor. It is one thing to work with a class that looks very different because someone refactored it last week. It is quite another to implement new features in a class that keeps changing while you are working on it, or to merge your changes into a class that has changed underneath you.

For that reason it is good practice to separate maintenance and refactoring as much as possible. If you can, plan dedicated refactoring sprints. If not, make clear to the team that it is not a good idea to add functionality to class X from Monday through Wednesday because you are taking it apart.

Conclusion

Refactoring a large application is a huge task. Like any huge task, it should be done with a plan: what to do, where to do it and when to do it. And remember that this is a team game. The planned refactoring should be done as a team, or the team should at least be aware of the details of your refactoring plans.