by Jasper Sprengers

You can’t do without automated (unit) tests if you want to stay on top of the ever-increasing complexity of software projects. A mutation testing framework ‘watches the watchmen’ by inserting small changes into your compiled byte code and then validating your test suites against these intentional bugs. As a quality safeguard it’s much more effective than traditional source code validation. It is even a challenging way to improve your coding skills and makes writing test suites fun again.

As a largely self-taught programmer I am still thankful for the millennial internet boom, when a linguistics graduate could get hired with nothing more than a thirst for coding and no professional experience or relevant diplomas to show for it. I like to think my coding skills have improved in the fifteen years since. Like most glorified amateurs, I started out with the anti-bureaucratic “just get it done even if it requires duct-tape” attitude of coding, where writing unit tests only gets in the way of shipping code. Read Joel Spolsky about Netscape pioneer Jamie Zawinski. Nowadays I believe in the school of “You can have good, cheap or fast. Pick two.” I don’t do duct tape anymore, except for small home repairs.

If a job is worth doing, it is worth doing well. Good code written by fallible humans needs automated tests. Too bad writing them is not really the most fun part of programming. For many who write code on a daily basis, test-driven development still doesn’t come naturally. Plenty of lip service is paid, but we’re just not that motivated to do it properly unless we’re harassed with minimum thresholds for test coverage. Tests feel like unexciting pieces of code making sure that other code which puts a blue ball in a red box has indeed put a blue ball in a red box, and not a green ball in an orange box. A pedantic bureaucratic requirement. “I can write working code, you know. Let me just get on with making customers happy. They don’t care about stupid unit tests”.

That’s a dangerous attitude. Unit tests matter. Integration tests matter even more. You should get motivated. Test cases are a yardstick for quality while you’re designing your code, not something to add when you have time to spare. Code that cannot be properly unit-tested is a likely candidate for some serious overhaul. However, full rewrites are a waste of time and money. Agile development means software is continuously thought out, written down, re-thought and re-written until the user is happy or the money runs out, whichever comes first. Adopting such a just-in-time approach means that a code base is not only being expanded with new code, but the existing code adapts to the more complex software architecture without breaking the functionality it already provides. This change is happening all the time, ideally from day one, but no later than day three. Good tests make this incremental refactoring possible. Bad tests make it impossible.

Greater minds than mine have written great books about the need for refactoring and why meaningful tests should cover your entire code base. I shouldn’t have to convince you. Let me just add my own argument here:

If you can’t test some piece of code, I don’t trust you to understand it.

Understanding how each line of code relates to the whole is a major challenge. Here’s something to put Moore’s Law into practical perspective. I am writing this post on a fancy MacBook with 16 GB of memory. In the early eighties I saved up my precious pocket money to buy a 16 KB memory module for my VIC-20 home computer. That’s a million times more memory in thirty years. Human short-term memory is stuck at a measly seven items and probably has been since Socrates. Yes, we have better tools and faster compilers, but we have fixed-size brains that have to deal with ever more complex projects. The only way to comprehend these is to break them down into manageable, testable chunks.

How do we make sure our tests are any good? “Code coverage”, I hear you say. True, that’s a useful criterion, usually expressed as the percentage of source code that is exercised by the test suites, but coverage in itself is not enough. Test suites need to pummel your code with a wide range of sensible and wacky input parameters and assert the results. Ineffective, bogus unit tests that cover the source code and please the robot don’t really validate anything. Quantifying quality is not bad in itself, but insisting on a minimum percentage of test coverage without some assurance that your tests are actually any good only lulls you into a false sense of security, particularly if you have too many developers in your team with the duct-tape attitude I just described. If management is okay with the practice, I advise you to find another employer. If you’re okay with it, I urge you to change careers.
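To make the difference between covering code and validating it concrete, here is a minimal sketch in plain Java. The free-shipping rule, the class name and the tiny `check` helper are all made up for illustration; in a real project these would be JUnit tests with proper assertions.

```java
class CoverageDemo {

    // Hypothetical production code: orders of 50 or more ship for free.
    static boolean qualifiesForFreeShipping(int orderTotal) {
        return orderTotal >= 50;
    }

    // A bogus test: it executes the method, so a coverage tool reports the
    // line as covered, but it never looks at the result.
    static void bogusTest() {
        qualifiesForFreeShipping(10);
        qualifiesForFreeShipping(100);
    }

    // A meaningful test: same coverage, but it asserts the outcome,
    // including the values on either side of the threshold.
    static void meaningfulTest() {
        check(!qualifiesForFreeShipping(49), "49 should not qualify");
        check(qualifiesForFreeShipping(50), "50 should qualify");
        check(qualifiesForFreeShipping(100), "100 should qualify");
    }

    // Stand-in for an assertion library: throws when the condition is false.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        bogusTest();
        meaningfulTest();
        System.out.println("Both tests are green; only one of them checks anything.");
    }
}
```

Both tests give the same coverage figure. If someone broke the rule so that it always returned true, the bogus test would stay green and only the meaningful one would fail.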

You can (and probably should) have dedicated testers that put the finished product through rigorous manual testing. If they are worth their money they will find something that your test suites didn’t catch. But how much nicer it would be if there were an automated way to check if your tests are any good. To test your tests, so to speak. Mutation testing does just that. Mutation testing is based on a simple assumption: if your test suites fully validate the behaviour of your program, then changing the behaviour of the program by inserting significant changes should cause at least some of your tests to fail.

For those who need a car analogy: suppose the car is your source code, the test case involves doing a three-point turn and the JUnit runner is behind the wheel. The test succeeds if the car points the other way unscathed. Mutation testing will do several evil things to your car and expects that your tests will sniff them out. It will remove the battery: did you check that the engine has started? It will skew the rear-view mirror: did you adjust it for a clear view?

The framework inserts small but significant changes (‘mutants’) in your compiled code. Examples are swapping out arithmetic and equality operators, making methods return null or just removing method calls: basically anything that leaves the outward interface intact. Your unit tests should, however, detect the mutation and fail, thus killing the mutant. If they don’t, the mutant has survived, indicating that your tests may be insufficient. Rather than looking at code coverage alone, the percentage of mutants killed becomes the true indicator of quality.
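A hand-made illustration may help here. A real framework performs the mutation on the byte code and reruns your actual test suite. The sketch below only simulates that idea in plain Java, with an invented free-shipping rule, a hand-written mutant in which `>=` has become `>`, and two hypothetical tests of different strength.

```java
import java.util.function.IntPredicate;

class MutantDemo {

    // Hypothetical original rule: orders of 50 or more ship for free.
    static final IntPredicate ORIGINAL = total -> total >= 50;

    // Hand-written stand-in for a mutant: the boundary operator is changed
    // from >= to >, while the outward interface stays exactly the same.
    static final IntPredicate MUTANT = total -> total > 50;

    // Weak test: only probes values far away from the threshold.
    static boolean weakTestPasses(IntPredicate rule) {
        return !rule.test(10) && rule.test(100);
    }

    // Strong test: also probes the values right at the threshold.
    static boolean strongTestPasses(IntPredicate rule) {
        return !rule.test(49) && rule.test(50) && rule.test(100);
    }

    public static void main(String[] args) {
        // A passing test against the mutant means the mutant survived.
        System.out.println("weak test, original:   " + weakTestPasses(ORIGINAL));   // true
        System.out.println("weak test, mutant:     " + weakTestPasses(MUTANT));     // true, mutant survives
        System.out.println("strong test, original: " + strongTestPasses(ORIGINAL)); // true
        System.out.println("strong test, mutant:   " + strongTestPasses(MUTANT));   // false, mutant killed
    }
}
```

Both tests are green against the original code and both execute the same line, so coverage cannot tell them apart. Only the strong test goes red when the operator is changed, and that is exactly the difference the percentage of killed mutants is meant to expose.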

A very useful mutation testing framework for Java is pitest.org. It integrates well with standard build tools and has simple but effective Eclipse and IntelliJ plugins. You can get up and running in a few minutes. Pitest will output a neat HTML report that puts code coverage next to mutation coverage and has a detailed view specifying which mutants have survived, i.e. were not caught by the test suite.

Is that a useful indicator? Can’t you just fool the machine like you can with code coverage? Not really. Mutation testing simply won’t let you get away with sloppy testing. Introducing a small but significant change in a class must break at least one test, i.e. kill the mutant. Mutants are more likely to survive when coverage is poor, since a mutant in code that no test executes cannot be killed, but I have found plenty of survived mutants in code I thought was well-tested. The report from the PIT suite neatly puts coverage next to the percentage of mutants killed, where the latter is typically the lower figure.

Mutation testing forces you to take testing seriously. There’s no doubt about it. Integrating it into your daily routine adds a competitive element to the mix, which pushes you to code more defensively and write better tests. Test-driven development can feel like playing a game of chess with yourself. You write some code, and then you write a test for it. It’s hardly a victory to see a green test suite. Now when mutants rear their ugly heads, writing these tests becomes a lot less predictable, but also more challenging. Any lazy bum can do 100% code coverage. It’s easy. Killing all the mutants in an intricate piece of source code, now that’s a sweet-tasting victory.

In the next post, I will go into more detail on how to incorporate mutation testing sensibly and productively.