It’s an inescapable fact that writing maintainable code is hard. Most projects end up with code that nobody dares touch, caused by a lack of understanding and a fear of error. Before we think about how to avoid such a mess, we have to consider what we mean by maintainable code.

Before we can change any code, we have to understand what the code currently does, and whether it’s supposed to do that. While there are many ways to achieve this, I think one of the best solutions is a comprehensive suite of automated tests. A good suite of tests can serve as documentation, telling you how the code is supposed to behave, as well as making sure that the code actually has that behaviour. Even better, they can give you the confidence that your code still works after you’ve made your changes. Unfortunately, some code is impossible to test, so the first question I’ll try to answer is: how do I write testable code?

Even if we have a suite of tests, changing code is difficult if we have no idea how it works. Therefore, the second question is: how do I write clean, understandable code?

How do I write testable code?

If you ask people, “How do you write untestable code?”, they’ll often respond with answers like “Long, complicated methods” or “No clean separation of concerns”. While these are things to avoid (and we’ll come back to them), they make code hard to test, but not impossible to test. Code tends to be untestable for two main reasons:

Global, mutable state. In other words, a variable that any code can access, and any code can modify.

state. In other words, a variable that any code can access, and any code can modify. Constructing, or otherwise acquiring, services, such as database connections, through static methods (a constructor is effectively a static method).

Global variables are fine if it’s immutable data. If it’s mutable, then it becomes impossible to consistently set up an identical environment before each test. For instance, say a method uses an incrementing integer to assign unique IDs. Each time we run the method, the assigned ID will be different, making it impossible to test the method with particular IDs.

The solution to the second point is dependency injection. A class should ask for its dependencies through its constructor rather than acquiring them itself. This allows a test to pass in mocked versions of those dependencies. For instance, let’s say we have an online booking system for a tattoo parlour. If a user under 18 requests an appointment, we refuse to give them a tattoo. If they’re 18 or over, we give them the first available appointment. The code might look something like this:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 public class Appointments { public Result RequestTattooAppointment ( User user ) { var appointmentBooker = new AppointmentBooker ( ) ; if ( user . Age >= AgeOfConsent ) { return appointmentBooker . BookFirstAvailableAppointment ( user ) ; } else { return Result . UnderAge ( ) ; } } }

There’s no way to test this class in a unit test – no matter what we do, it will construct an AppointmentBooker , which will connect to the database. We don’t want to have to set up an entire database just to test the logic in this method. Instead, we should ask for an AppointmentBooker in the constructor.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 public class Appointments { private readonly IAppointmentBooker m_AppointmentBooker ; public Appointments ( IAppointmentBooker appointmentBooker ) { m_AppointmentBooker = appointmentBooker ; } public Result RequestTattooAppointment ( User user ) { if ( user . Age >= AgeOfConsent ) { return m_AppointmentBooker . BookFirstAvailableAppointment ( user ) ; } else { return Result . UnderAge ( ) ; } } }

Now we can pass in a mocked instance of IAppointmentBooker in our test. For instance, just taking the case of the user being underage:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 public class AppointmentsTest { private readonly IAppointmentBooker m_AppointmentBooker = A . Fake < IAppointmentBooker > ( ) ; private readonly Appointments m_Appointments ; public AppointmentsTest ( ) { m_Appointments = new Appointments ( m_AppointmentBooker ) ; } public void UnderAgeUsersDoNotGetAnAppointment ( ) { // Given var user = new User { Age = Appointments . AgeOfConsent - 1 } ; // When var result = m_Appointments . RequestTattooAppointment ( user ) ; // Then Assert . Equal ( Result . UnderAge ( ) , result ) ; A . CallTo ( ( ) = > m_AppointmentBooker . BookFirstAvailableAppointment ( user ) ) . MustNotHaveHappened ( ) ; } }

There are some frameworks, such as Google Guice for Java or Ninject for C#, that can automate some parts of dependency injection, saving you from death by a thousand new s.

How do I write clean, understandable code?

Although there are dozens of useful rules and principles in writing clean code, I think most can be reduced to one of these three:

Optimise for the reader of the code, not the writer . Code is read more than it’s written. If you optimise for the writer, you’ll save a few key presses, but the cost to the reader is confusion, frustration, and subtle (and not-so-subtle) bugs.

. Code is read more than it’s written. If you optimise for the writer, you’ll save a few key presses, but the cost to the reader is confusion, frustration, and subtle (and not-so-subtle) bugs. Don’t repeat yourself (often abbreviated to DRY). It’s easier to maintain code if it only exists in one place, and this also ensures consistency. If code is duplicated, there’s a good chance that you’ll forget to update one of the copies, meaning bugs you fix in one copy will still be there in the other copy.

(often abbreviated to DRY). It’s easier to maintain code if it only exists in one place, and this also ensures consistency. If code is duplicated, there’s a good chance that you’ll forget to update one of the copies, meaning bugs you fix in one copy will still be there in the other copy. Do the simplest, smallest thing you can do to add some value. If you try and guess how you should try and modify the system for all future requirements, you’re going to get it wrong. Either you’ll end up rewriting the code you never used, or you’ll be stuck with code that sort-of does what you actually want it to do.

Optimising for the reader

The computer doesn’t care how you write your code, so long as it’s unambiguous. In other words, write code for humans, not machines. There are plenty of principles you could use, this is just a smattering:

Spend some time carefully thinking about the names for classes and methods. If you find all of your class names end in Helper , then you might need to spend a bit more time thinking. Thinking up descriptive names is hard, but being unable to think up a good name is sometimes a sign that your class or method does too much, and actually needs splitting up.

, then you might need to spend a bit more time thinking. Thinking up descriptive names is hard, but being unable to think up a good name is sometimes a sign that your class or method does too much, and actually needs splitting up. Don’t use abbreviations. For instance, instead of calling a variable dbc , call it databaseConnection . The exception is when the abbreviation is well-known, for instance, using html as an abbreviation is fine.

, call it . The exception is when the abbreviation is well-known, for instance, using as an abbreviation is fine. Good code should be unsurprising. If you find some code that is surprising or “clever”, try to see if you can simplify it, or somehow make it clearer.

Each method should operate at one level of abstraction. A method that deals with high-level concepts in your domain shouldn’t be doing complicated string manipulation as well. One style of writing code is to make it read like a newspaper article. From reading that method, you get just enough detail to see how it works. If you want more detail on how it works, you can dive into one of the methods being called. Just like a newspaper article, you should be able to stop reading at any point and have an understanding of the overall picture.

Don’t repeat yourself

If you ever find yourself copy-and-pasting code, then that’s a hint that you should pull out the common functionality of the code.

In some languages, it’s difficult to pull out code that has structural duplication, but operates on different data. However, if you have a language that lets you pass around functions easily, you can pull out that duplication. For instance, say you want to convert a list of strings to a list of integers. You could build a new list of integers, iterate through the list of strings, and add each parsed string to the new list:

1 2 3 4 5 var listOfIntegers = new List < Integer > ( ) ; foreach ( var str in listOfStrings ) { listOfIntegers . Add ( int . Parse ( str ) ) ; }

Every time you want to convert one list of values to another, the code is identical except for the value conversion. You can pull that duplication out by using map, called Select in C#:

1 var listOfIntegers = listOfStrings . Select ( int . Parse ) ;

Do the simplest, smallest thing to add value

It’s impossible to guess what code you’re going to need in the future, even if you think you have a good grasp of the requirements. By taking small steps, you learn about the problem while solving it.

You might notice that always taking small, simple steps doesn’t always lead to readable code without duplication. It’s crucial to remember to refactor your code once it’s working. Once you have code that does what you want in the simplest way, you must then make sure it’s clean code that you’d be happy maintaining.

Example

Here’s some code that I reckon could do with a bit of refactoring:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 public class ArchiveController : Controller { private readonly ISession m_Session ; public ArchiveController ( ISession session ) { m_Session = session ; } public ActionResult Index ( FormCollection form ) { var sds = form [ "start-date" ] ; var sd = DateTime . MinValue ; if ( sds != null ) { if ( DateTime . TryParse ( sds , out sd ) ) { } else if ( sds == "now" ) { sd = DateTime . Now ; } } var eds = form [ "end-date" ] ; var ed = DateTime . MaxValue ; if ( eds != null ) { if ( DateTime . TryParse ( eds , out ed ) ) { } else if ( eds == "now" ) { ed = DateTime . Now ; } } var hss = m_Session . QueryOver < HatSale > ( ) . Where ( h = > h . DateTime >= sd ) . Where ( h = > h . DateTime <= ed ) . List ( ) ; var ahwpm = hss . GroupBy ( h = > h . DateTime . Month ) . Select ( g = > new { W = g . Key , Avg = g . Average ( s = > s . Hat . Width ) } ) ; return View ( new { Hss = hss , Ahwpm = ahwpm } ) ; } }

There are clearly some problems with this code:

The class is called ArchiveController – what’s in this archive?

– what’s in this archive? The variables aren’t named well, for instance, what does ahwpm mean?

mean? The logic that reads start-date and end-date from the form is duplicated.

and from the form is duplicated. What does DateTime on a hat sale represent?

on a hat sale represent? There are bits of NHibernate querying and LINQ all on one unreadable line.

The method is long, and you feel you have to read the whole thing to begin to understand what’s going on.

The method operates at many layers of abstraction – it’s parsing strings, accessing the database, and doing some calculations.

So, after some refactoring:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 public class SalesArchiveController : Controller { private readonly ISession m_Session ; public SalesArchiveController ( ISession session ) { m_Session = session ; } public ActionResult Index ( FormCollection form ) { var startDate = ParseStartDate ( form ) ; var endDate = ParseEndDate ( form ) ; var hatSales = FetchHatSalesInDateRange ( startDate , endDate ) ; var averageHatWidthsPerMonth = CalculateAverageHatWidthsPerMonth ( hatSales ) ; return View ( new { HatSales = hatSales , AverageHatWidthsPerMonth = averageHatWidthsPerMonth } ) ; } private DateTime ParseStartDate ( FormCollection form ) { return ParseDateParameterOrNull ( form , "start-date" ) ? ? DateTime . MinValue ; } private DateTime ParseEndDate ( FormCollection form ) { return ParseDateParameterOrNull ( form , "end-date" ) ? ? DateTime . MaxValue ; } private DateTime ? ParseDateParameterOrNull ( FormCollection form , string key ) { var dateString = form [ key ] ; if ( string . IsNullOrEmpty ( dateString ) ) { return null ; } DateTime dateTime ; if ( DateTime . TryParse ( dateString , out dateTime ) ) { return dateTime ; } if ( dateString == "now" ) { return DateTime . Now ; } return null ; } private IList < HatSale > FetchHatSalesInDateRange ( DateTime startDate , DateTime endDate ) { return m _ Session . QueryOver < HatSale > ( ) . Where ( h = > h . DateOfSale >= startDate ) . Where ( h = > h . DateOfSale <= endDate ) . List ( ) ; } private IList < MonthlyHatSales > CalculateAverageHatWidthsPerMonth ( IList < HatSale > hatSales ) { return hatSales . GroupBy ( sale = > sale . DateOfSale . Month ) . Select ( group = > new MonthlyHatSales { AverageHatWidth = group . Average ( sale = > sale . Hat . Width ) , Month = group . Key } ) . ToList ( ) ; } public class MonthlyHatSales { public int Month { get ; set ; } public double AverageHatWidth { get ; set ; } } }

Although this code reads much better, it’s not perfect by any means. For instance, it doesn’t obey the Single Responsibility Principle: it deals with everything from parsing HTTP parameters to querying the database. We’ve already pulled this logic into separate methods, so a next step might be to try and pull these methods into separate classes. Regardless of what change you’re making, you should be extremely cautious about changing any code without any tests around it.

Conclusion

I’ve talked about how to write maintainable code, but is it the case that we always want maintainable code? Are there not occasions where we need to write a quick piece of code that we’re going to throw away immediately? This logic has one major flaw: our inability to predict the future. What begins as throw-away code can quickly become a vital piece of infrastructure, needing maintenance and extension for years. Even if the code is to be thrown away, the benefits of writing maintainable code can pay off much quicker than you might think. Often, a quick piece of code isn’t so quick after all, and you can easily find yourself surrounded by a baffling labyrinth of code in under a day.

Finally, it’s important to have a sense of craftmanship. Through continuous practice we improve the quality of what we write, and form the habit of always writing clean, readable code.