Code coverage doesn’t show the whole picture. This post shows how the myth of 100% code coverage can irreversibly damage the design of your code.

Let's begin.

Tests for Data Transfer Objects increase code coverage. However, they don't exercise any meaningful business behavior of the application; they're only technical details. For example, renaming an accessor method is likely to require changes to multiple tests, incurring unnecessary maintenance while providing no business value.

Consider working code that tests the userId() accessor method of a class named PurchaseBooksCommand.
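A minimal TypeScript sketch of such a test, with the class and method names taken from the description above and the implementation details assumed:

```typescript
// Hypothetical DTO matching the description above -- it only carries data.
class PurchaseBooksCommand {
  constructor(
    private readonly id: string,
    private readonly bookIds: string[],
  ) {}

  userId(): string {
    return this.id;
  }

  books(): string[] {
    return this.bookIds;
  }
}

// A test like this raises line coverage, but it only pins a technical
// detail: the getter returns whatever the constructor received.
const command = new PurchaseBooksCommand("user-1", ["book-42"]);
console.assert(command.userId() === "user-1");
console.assert(command.books().length === 1);
```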

In the example above, a change to the userId() accessor method of PurchaseBooksCommand breaks both the test for PurchaseBooksCommand and the test for BookStoreCommandHandler.

Consider working code in which more than one test fails when you rename userId() to clientId().
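A sketch of how two tests end up coupled to the same accessor name (names from the description above; the handler's behavior is assumed for illustration):

```typescript
class PurchaseBooksCommand {
  constructor(private readonly id: string) {}

  userId(): string {
    return this.id;
  }
}

class BookStoreCommandHandler {
  handle(command: PurchaseBooksCommand): string {
    return `purchase started for ${command.userId()}`;
  }
}

// Test 1 pins the accessor name directly.
const command = new PurchaseBooksCommand("user-1");
console.assert(command.userId() === "user-1");

// Test 2, for the handler, calls the same accessor in its setup.
const result = new BookStoreCommandHandler().handle(command);
console.assert(result === "purchase started for user-1");

// Renaming userId() to clientId() now forces edits in both tests,
// even though no business behavior changed.
```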

Likewise, tests for the internal functions of a React component increase code coverage. However, they also couple the test code to the internal code of the component. Changes to the internal structure of the component, without any change in behavior, are likely to require changes to the tests as well, which is unnecessary work.

Consider a test of a React component that grabs the instance of a Counter component and calls its increment() method twice. The assertion verifies that the internal state of the component has a count property equal to 2.
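A sketch of that test, modeled in plain TypeScript as a stand-in for a React class component so the example runs without React itself; the coupling problem is the same:

```typescript
// Hypothetical stand-in for a React class component with internal state.
class Counter {
  state = { count: 0 };

  increment(): void {
    this.state = { count: this.state.count + 1 };
  }
}

// The test reaches into the instance and asserts against internal state.
const counter = new Counter();
counter.increment();
counter.increment();
console.assert(counter.state.count === 2); // breaks if the state shape changes
```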

The number of lines of code covered by tests is not a useful measure of code coverage.

Tests for a class that uses a Data Transfer Object as an argument for an essential feature of your application, say to purchase books, may not increase the code coverage for the DTO. However, they ensure the tests exercise only the code that belongs to the Business Domain.

"Exercise" means the test executes the code and asserts a behavior. It's an essential term in the context of this post. If the test only executes the code without an assertion, it increases code coverage but doesn't give any assurance that you'll see a test failure when the behavior of the code changes.

Consider a good test for a BookStoreCommandHandler: it verifies that the user can initiate the process to purchase books, not whether the properties of the command work.
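A sketch of such a behavior-focused test. The PurchaseProcess port and its start() method are assumptions introduced for illustration; the point is that the test observes the business behavior, not the DTO:

```typescript
class PurchaseBooksCommand {
  constructor(
    readonly userId: string,
    readonly bookIds: string[],
  ) {}
}

// Hypothetical port the handler depends on, injected so the test can
// observe the behavior instead of asserting against accessors.
interface PurchaseProcess {
  start(userId: string, bookIds: string[]): void;
}

class BookStoreCommandHandler {
  constructor(private readonly process: PurchaseProcess) {}

  handle(command: PurchaseBooksCommand): void {
    this.process.start(command.userId, command.bookIds);
  }
}

// The test verifies that handling the command initiates the purchase
// process -- business behavior, not properties of the command.
const started: string[][] = [];
const handler = new BookStoreCommandHandler({
  start: (userId, bookIds) => started.push([userId, ...bookIds]),
});
handler.handle(new PurchaseBooksCommand("user-1", ["book-42"]));
console.assert(started.length === 1);
console.assert(started[0].join() === "user-1,book-42");
```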

Likewise, tests that use the rendered state of a React component may not increase the code coverage for internal function calls. However, asserting against the rendered state ensures that when you refactor the internal code of the component, the tests are less likely to break.

Consider a test that renders a React component, which returns an API to query the rendered DOM in memory. The test queries the element with the class count-button and asserts that the element with the class current-count shows the right number.
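A sketch of the same Counter tested through its public surface, again modeled without React or a DOM library: rendering returns an HTML string instead of a queryable DOM, which is enough to show the idea.

```typescript
class Counter {
  private count = 0;

  click(): void {
    this.count += 1;
  }

  render(): string {
    return (
      `<button class="count-button">+</button>` +
      `<span class="current-count">${this.count}</span>`
    );
  }
}

// Simulate two clicks on the count button, then assert against the
// rendered output rather than the internal state.
const counter = new Counter();
counter.click();
counter.click();
const rendered = counter.render();
console.assert(rendered.includes(`<span class="current-count">2</span>`));
```

Because the assertion targets what the user sees, the internal count can be restructured freely without breaking the test.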

The number of meaningful behaviors covered by the tests is a better measure of code coverage.

According to Goodhart’s law, when an objective measurement becomes a target in a system, people tend to game that measurement to pretend they are achieving better outcomes:

When a measure becomes a target, it ceases to be a good measure — Goodhart’s Law

If you use lines of code as an objective measurement for test coverage, people tend to game it and write whatever code satisfies the metric. The code ends up worse than if the system didn't have any tests at all.

So how do you quickly verify whether the tests are covering the right things? I wish there were a black-and-white answer. I wish there were a magical tool you could throw at the code that would highlight all the test problems.

Unfortunately, there isn't.

The closest thing you have to validate code coverage of the application's behavior is Mutation Testing. However, even that can't highlight 100% of the testing problems.

The test coverage is heavily dependent on the design of the application, and the design of the application is heavily dependent on the type of problem you're trying to solve.

You design for humans to understand the code, and you test for humans not to screw up. Machines don't have these problems: they always execute the code exactly as written, and they never screw up. Therefore, no Static Analysis tool can understand your design; only a human brain with software design skills can look at the code, understand the design, and tell whether the tests are covering the right things.

If you write tests only to optimize coverage for machines, you'll end up with useless, unreadable code that only machines can understand.

There’s a big difference between 100% test coverage for business behavior and 100% test coverage for lines of code.

A good example of meaningful code coverage is when you design your models to isolate the side effects of HTTP requests. In that case, you design the code so that there's less coupling between the test code and the application code:

Consider an example from a previous post that injects a stubbed "get request" into an "HTTP Server Data Source." The test asserts that the "find posts title" method of the "HTTP Server Data Source" returns the correct result.
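A sketch of that test in TypeScript. The names (HttpServerDataSource, findPostsTitle, the shape of the response) follow the description above; the implementation details are assumed:

```typescript
// A stubbed transport injected in place of a real HTTP client, so the
// test isolates the side effect of the request.
type GetRequest = (path: string) => { posts: { title: string }[] };

class HttpServerDataSource {
  constructor(private readonly getRequest: GetRequest) {}

  findPostsTitle(): string[] {
    return this.getRequest("/posts").posts.map((post) => post.title);
  }
}

// The stub returns a canned response -- no network, no side effects.
const stubbedGetRequest: GetRequest = () => ({
  posts: [{ title: "Meaningful Code Coverage" }],
});

const dataSource = new HttpServerDataSource(stubbedGetRequest);
console.assert(dataSource.findPostsTitle()[0] === "Meaningful Code Coverage");
```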

Another example of meaningful code coverage is when you design your models to reduce the number of nested imports. In that case, you design the code so that there's less direct coupling between the components of the application. It's better to pass dependencies as function or constructor arguments so that you can test the model without mocking imports.

If the tests simulate how the application uses the code under test, and you mock imports in the test code, then you should do the same in the application code. If that sounds ridiculous because nobody mocks imports in application code, that's because it is! Mocking imports is a Bad Code Smell indicating that the API is not pluggable enough for the application to consume, and it's a hint that you should redesign:

Consider an example from a previous post that shows a function "Prefill" (capitalized) receiving two arguments: a stubbed provider and a "Mapping Logic" instance configured "for simple matching." The code stores the result of the "Prefill" call in a lowercase "prefill" variable, calls that variable as a function with the form fields as its only argument, and stores the result in a "prefilled form fields" variable. The assertion tests that the string representation of the prefilled form fields is correct.
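A sketch of that test in TypeScript. The shapes of Provider, MappingLogic, and the form fields are assumptions; what matters is that Prefill receives its collaborators as arguments, so nothing needs to mock an import:

```typescript
type FormFields = Record<string, string>;

// Hypothetical provider that would normally fetch data; stubbed here.
interface Provider {
  fetch(): FormFields;
}

class MappingLogic {
  constructor(private readonly mode: string) {}

  map(source: FormFields, fields: FormFields): FormFields {
    const result = { ...fields };
    if (this.mode === "for simple matching") {
      // Simple matching: copy provider values into fields of the same name.
      for (const key of Object.keys(fields)) {
        if (key in source) result[key] = source[key];
      }
    }
    return result;
  }
}

// Prefill receives its collaborators as arguments -- a pluggable API.
const Prefill =
  (provider: Provider, mapping: MappingLogic) =>
  (formFields: FormFields): FormFields =>
    mapping.map(provider.fetch(), formFields);

const stubbedProvider: Provider = { fetch: () => ({ name: "Ada" }) };
const prefill = Prefill(stubbedProvider, new MappingLogic("for simple matching"));
const prefilledFormFields = prefill({ name: "", email: "" });
console.assert(
  JSON.stringify(prefilledFormFields) === '{"name":"Ada","email":""}',
);
```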

In the context of tests, a client is any piece of code that consumes your model. The tests are the “first client” of your model that simulates how the application uses it. The application is the “second client.”

The code is a piece of text that only becomes useful when a machine executes it. Without a client machine that can prove its usefulness, the code is just a bunch of text without a purpose; for all practical purposes, it doesn't exist. The machine that executes the code should run on your computer, not in production. That allows quick and early feedback.

The problem is not 100% code coverage. Crap 100% is.

Proper 100% coverage means covering 100% of the business use cases. It means covering the external API of your models, not technical details like the methods of Data Transfer Objects or the internals of React components.

Although 100% test coverage of business requirements is the metric that matters, there's no way to retrieve that information through static analysis tools. Code coverage exists for humans to stay in control of the code, not for machines.

Excellent design skills and the mindset of solving problems instead of technical details are the first steps toward meaningful test coverage.

After all, only with good design can you achieve a meaningful 100%.