Knowing Where to Start Cleaning Your Code

Doc Norton, OnBelay, https://leanpub.com/escapevelocity

Nine months ago, this team was kicking out code and knocking down features. But as the project rolled on, things started to slow down a bit. First there was that request to add one more condition to the logic that determines which tier a customer belongs to. It seemed simple enough, but it took the team a couple of weeks to code it up and yet another week to merge it and resolve all of the issues. More generally, it seems like small stories require changes to a lot of areas of the code. A recent change to the shopping cart, for example, required us to deploy a new version of the tax calculator module.

You ask the enterprise architect to get involved and see if there is anything she might recommend. Maybe the team needs some training. Or maybe the solution doesn’t match the architecture standards and this is causing a problem.

After some discussion with the team, the architect recommends you run static analysis against the code.

"It sounds like the high-level architecture is not an issue, but there may be some issues with the implementation of the code," she explains.

The team agrees to run the analysis and works with the architecture team to get things set up.

Sure enough, the analysis shows a number of issues in the code: low test coverage, duplicate code, high complexity, and a lot of so-called "code smells".

You and the team agree that the code needs to be tidied up. There are plenty of opportunities for improvement, but you also need to continue to deliver new value. So where to start? Do you go after the low-hanging fruit and find easy things to fix? Or should you tackle that huge, convoluted class with too many responsibilities and high complexity? Maybe you pick a specific code smell and knock out every instance of it? What code actually needs to be fixed most? How do you decide?

Code Churn

One strategy is to look at the code churn and start with areas of high churn. Code churn indicates the amount of change that takes place in a particular area of code. It is not actually a measure of code quality, but it is an important partner to other code quality metrics.

Code churn tells us where we are spending the most time in the code, making updates, adding features, fixing defects. The higher the churn, the more time the team spends in that area of the code. The more time spent in an area of code, the more risk there is if the code is low quality or doesn't have good coverage.
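As a rough illustration of how churn might be measured, here is a minimal sketch that assumes the code lives in a git repository and that churn is counted as lines added plus lines deleted per file. That definition, the function names, and the three-month default window are assumptions made for this example, not something prescribed by the article. File renames are not resolved, so a renamed file shows up under more than one path.

```python
import subprocess
from collections import Counter


def churn_from_numstat(numstat_text):
    """Sum added and deleted lines per file from `git log --numstat` output."""
    churn = Counter()
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator lines between commits
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        churn[path] += int(added) + int(deleted)
    return churn


def read_git_numstat(since="3 months ago"):
    """Return raw numstat output for the current repository (requires git)."""
    result = subprocess.run(
        ["git", "log", "--numstat", "--pretty=format:", f"--since={since}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Run inside a repository, `churn_from_numstat(read_git_numstat()).most_common(10)` would list the ten files with the most changed lines in the chosen window.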

Churn can happen for a number of reasons, for example changing requirements, delivery in small slices, or poor code composition.

If the requirements change frequently, then the code necessarily needs to change. If the changes are around a specific piece of functionality, say a calculation or a user interaction, then we would see higher churn in those areas of the code than others. We’d revisit the code and make adjustments in order to bring the code in line with the new requirements.

As it turns out, delivery in small slices can sometimes look the same as frequently changing requirements. Depending on how the work has been sliced, the team may be returning to the same area of the code with each new increment. Maybe we started with a basic search and are incrementally adding fields the search can be run against and changing the results from a single list to a paginated list. This is going to cause a certain amount of churn in those areas of the application.

As mentioned, poor code composition can also result in churn. If there is a particular class, say customer, that many other classes are coupled to, it is possible that as we change entirely unrelated areas of the code, we also have to change the customer class. This would cause high churn in customer, but only because it is inappropriately coupled to other classes.

In all of these cases, churn is still a good indicator of where to start cleaning up the code.

Let’s say we have a product with two particularly snarly areas of code, both with high complexity and low test coverage. Which one should we fix first?

The one with the higher churn.

Lower churn code is lower risk to the organization.

In fact, churn is so significant that a piece of code with no coverage, sky-high complexity, and any number of other internal issues, but which happens to work and has zero churn, is a piece of code we can safely leave alone until absolutely necessary. "Absolutely necessary" means that business requirements indicate we need to make changes to it.
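The ordering described here can be sketched in a few lines. The `CodeArea` type, its fields, and the tie-breaking rule are hypothetical and exist only for this example; the one idea taken from the text is that churn decides the order and that working code with zero churn can wait.

```python
from dataclasses import dataclass


@dataclass
class CodeArea:
    name: str
    churn: int        # lines changed in the recent window
    complexity: int   # e.g. a complexity score from static analysis
    coverage: float   # fraction of lines covered by tests, 0.0 to 1.0


def cleanup_order(areas):
    """Order candidate areas for cleanup, highest churn first.

    Areas with zero churn are dropped: working code nobody touches can wait.
    Ties on churn are broken by higher complexity, then lower coverage
    (the tie-breakers are an assumption of this sketch).
    """
    active = [area for area in areas if area.churn > 0]
    return sorted(active, key=lambda a: (-a.churn, -a.complexity, a.coverage))
```

Given a frequently edited `customer` class and an ugly but untouched legacy report, this would put `customer` at the top of the list and leave the report off it entirely, however bad its complexity score.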

While I encourage teams to look at churn and use it to help them decide which areas of the code to clean, there is a much simpler heuristic to use.

Clean the code you are in

As mentioned, churn indicates where you have spent the most time in the code. When we look at churn, we are most interested in recent churn: where have we been making changes in the past few days, weeks, or even months?

So churn is a trailing indicator of where we spent time in the code. And we’re interested in the most recent churn. What if we shortened the feedback loop? What if instead of looking at what code we most recently changed, we look at the code we are currently changing?

If you spend a lot of time in an area of the code, it is wise to clean it up. So, as it works out, if you are in an area of the code, then it makes sense to clean it up a bit. If, every time you are in the code, you clean it a little, the areas of the code where you spend the most time will get cleaned the most. The end result is the same as if you'd used churn as a guide, except you wouldn't have to measure churn and the code would already be cleaned (at least a little).

This technique of cleaning the code you are in is often referred to as the "Boy Scout Rule". One of the "rules" of scouts is to leave the campsite better than you found it. In this metaphor, the campsite is our code. Leave the code better than you found it. Implement the feature, fix the bug, do whatever it is the story indicates you're supposed to do. And while you're at it, rename a variable, extract a method, or make any other small, simple change that improves the readability, maintainability, and quality of the code. Maybe, on an ambitious day, you sprout a new class or implement a pattern that fits your code's needs now that you better understand them. If everyone on the team does this, the code that gets touched the most will be in the best shape.
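To make "rename a variable, extract a method" concrete, here is a small before-and-after sketch. The customer tier logic, the thresholds, and all names are invented for illustration; the point is only the size of the change you might make while you are already in the code for another reason.

```python
# Before: the tier rule works, but the names say nothing about intent.
def tier_before(c):
    t = 0
    for o in c["orders"]:
        t += o["amount"]
    if t > 10000:
        return "gold"
    if t > 1000:
        return "silver"
    return "basic"


# After: variables renamed and one method extracted; behavior is unchanged.
def total_spend(customer):
    """Add up the amounts of all of a customer's orders."""
    return sum(order["amount"] for order in customer["orders"])


def tier_after(customer):
    spend = total_spend(customer)
    if spend > 10000:
        return "gold"
    if spend > 1000:
        return "silver"
    return "basic"
```

Neither version adds a feature. The second is simply easier to read the next time someone has to add a condition to the tier logic.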

This article was published in May 2019