There are many reasons why you might want to write tests for your code: to prove it works as expected, to prevent bugs from coming back after you fix them, or simply to shorten the feedback loop between the moment you type code in your editor and the moment you run it.

The nice thing about automated tests is that they are repeatable, and scale really well. Whether it’s Monday morning or Friday afternoon, whether you are distracted or deeply focused, whether they’re executed by the most senior or the most junior member of your team, they’ll always produce the same results.

But they’re only as good and valuable as you make them.

The code coverage fallacy

To get a measure of the quality of your test suite, you might track metrics like code coverage. Code coverage measures which parts of your code are exercised by tests, so you can visualize which parts need more attention and which are well covered.

A limitation of code coverage, though, is that it has no way of knowing how well your code was exercised: it knows that a particular method was executed, but not why. Take this example:
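As an illustrative sketch, assume a small calculator module with an add function (both names are made up for the example) and a test that merely calls it:

```python
# calculator.py -- hypothetical module under test
def add(a, b):
    return a + b
```

```python
# test_calculator.py -- the test runs the code but checks nothing
from calculator import add


def test_add():
    # add() is executed, so its lines count as covered,
    # but no assertion is made about the result.
    add(2, 3)
```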

If you execute this code with pytest and generate coverage with pytest-cov, it will report 100% code coverage, even though the test doesn’t assert anything useful about the code.
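Assuming the layout sketched above, the command would be something like:

```
$ pytest --cov=calculator
```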

Admittedly, this example is a bit silly and is not likely something that would pass through even the lightest code review process. Take this other example:
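Sticking with the hypothetical add function from above, the test might now look like this:

```python
# test_calculator.py -- this time there is a real assertion
from calculator import add


def test_add():
    # A genuine property of the result is checked, but the inputs are
    # symmetric: the test would still pass if add() multiplied its
    # arguments instead of adding them, since 2 * 2 == 2 + 2 == 4.
    assert add(2, 2) == 4
```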

While it’s more useful than the last test and really does assert an important property of our code, it could still be improved. In this contrived example, it’s easy to spot how to make it better (see the sketch below); in a more complicated test, it can be much harder to spot the flaws.
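For instance, with the hypothetical add function, simply picking non-symmetric inputs makes the test stronger:

```python
def test_add():
    # 2 * 3 != 5, so a bug that multiplied instead of adding
    # would now be caught.
    assert add(2, 3) == 5
```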

The point I’m trying to make here is that code coverage can be tricked, intentionally or not, and may not be as good a measure of the quality of your tests as you think it is. When you mix complex requirements, complex code and complex tests together, you are bound to have the same kind of shortcomings somewhere in your tests.

Worse, a high code coverage number may even give you a false sense of security. Then one Friday afternoon, a mission-critical piece of code breaks in a spectacular way just after a deploy, even though it was well tested and you had complete confidence in it. Based on the metrics, it was pretty much unbreakable!

I think you should instead view code coverage as a good way to know which parts of the codebase are not tested enough and need more attention, not as a way of knowing which parts are really dependable.