Automated tests are immensely useful. Once you’ve started writing tests and seen their value, the idea of writing software without them becomes unimaginable.

But as with any technique, you need to understand its limitations. When it comes to automated testing—unit tests, BDD, end-to-end tests—it’s tempting to think that if your tests pass, your software is correct.

But tests don’t, and can’t, tell you that your software is correct. Let’s see why.

How to write correct software

To implement a feature or bugfix, you go through multiple stages; they might be compressed or elided, but they are always necessary:

1. Identification: Figure out what problem you’re trying to solve.
2. Solution: Come up with a solution.
3. Specification: Define a specification, the details of how the solution will be implemented.
4. Implementation: Implement the specification in code.

Your software might end up incorrect at any of these points:

1. You might identify the wrong problem.
2. You might choose the wrong solution.
3. You might create a specification that doesn’t match the solution.
4. You might write code that doesn’t match the specification.

Only human judgment can decide correctness

Automated tests are also a form of software, and are just as prone to error. The fact that your automated tests pass doesn’t tell you that your software is correct: you may still have identified the wrong problem, or chosen the wrong solution, and so on.

Even when it comes to ensuring your implementation matches your specification, tests can’t validate correctness on their own. Consider the following test:

def test_addition():
    assert add(2, 2) == 5

From the code’s perspective, the perspective of an automaton with no understanding, the correct answer of 4 is the one that will make this test fail. But just by reading the test you can tell it’s wrong: you, the human, are the key.
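A minimal sketch of that point, assuming two hypothetical implementations (`correct_add` and `buggy_add`, neither from the original) and a hypothetical helper `check_addition` that applies the same wrong expectation as `test_addition`. The check has no notion of arithmetic; it only compares values, so it rejects the correct implementation and accepts the broken one.

```python
def correct_add(a: int, b: int) -> int:
    """Hypothetical correct implementation."""
    return a + b


def buggy_add(a: int, b: int) -> int:
    """Hypothetical broken implementation: off by one."""
    return a + b + 1


def check_addition(add) -> bool:
    """Apply test_addition's (wrong) expectation to a given implementation.

    Returns True when the check passes, False when it fails.
    """
    return add(2, 2) == 5


if __name__ == "__main__":
    print("correct_add passes:", check_addition(correct_add))  # False
    print("buggy_add passes:", check_addition(buggy_add))      # True
```

Nothing in the check itself can tell you which of those two results is the embarrassing one; that takes a person reading it.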

Correctness is something only a person can decide.

The value of testing: the process

While passing tests can’t prove correctness, the process of writing tests and making them pass can help make your software correct. That’s because writing the tests involves applying human judgment: What should this test assert? Does it match the specification? Does this actually solve our problem?

When you go through the loop of writing tests, writing code, and checking whether the tests pass, you continuously apply your judgment: Is the code wrong? Is the test wrong? Did I forget a requirement?

You write the test above, reread it, and say, “Wait, that’s wrong: 2 + 2 = 4.” You fix it, and then maybe you supplement your one-off hardcoded tests with additional tests based on core arithmetic principles. Correctness comes from applying the process, not from the artifacts the process creates.
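One way to read “tests based on core arithmetic principles” is as property checks: instead of hardcoding 2 + 2, assert laws that any correct integer addition must obey. This is a minimal sketch using only the standard library’s `random` module; the `add` function, the helper `check_arithmetic_properties`, and the particular properties chosen are assumptions for illustration. (Property-based testing libraries such as Hypothesis automate this style, but aren’t needed for the idea.)

```python
import random


def add(a: int, b: int) -> int:
    """Hypothetical function under test."""
    return a + b


def check_arithmetic_properties(add, trials: int = 1000, seed: int = 0):
    """Check laws that any correct integer addition must satisfy.

    Returns a list of (property_name, a, b, c) tuples, one per violation.
    An empty list means no violation was found among the sampled inputs,
    which is evidence, not proof, that `add` is correct.
    """
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    failures = []
    for _ in range(trials):
        a, b, c = (rng.randint(-10**6, 10**6) for _ in range(3))
        if add(a, 0) != a:  # additive identity
            failures.append(("identity", a, 0, None))
        if add(a, b) != add(b, a):  # commutativity
            failures.append(("commutativity", a, b, None))
        if add(add(a, b), c) != add(a, add(b, c)):  # associativity
            failures.append(("associativity", a, b, c))
    return failures


if __name__ == "__main__":
    print(check_arithmetic_properties(add))
```

Choosing the properties is itself a judgment call: an off-by-one `add` that returns `a + b + 1` is still commutative and associative, so only the identity check catches it.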

This may seem like pedantry: what does it matter whether the source of correctness is the tests themselves or the process of writing the tests? But it does matter. Understanding that human judgment is the key to correctness can keep you from thinking that passing tests are enough: you also need other forms of applied human judgment, like code review and manual testing.

(Formal methods augment human judgment with automated means… but that’s another discussion.)

The value of tests: stability

So if correctness comes from writing the tests, not the tests themselves, why do we keep the tests around?

Because tests ensure stability. Once we judge that the software is correct, the tests can keep its behavior from changing unnoticed, and thus reduce the chances of its becoming incorrect. The tests are never enough, because the world can change even if the software doesn’t, but stability has its value.
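A minimal sketch of a test used for stability, assuming a hypothetical `format_price` function whose current outputs someone has already reviewed and accepted. The test pins those approved outputs in place, so a later “harmless” refactor (here, a hypothetical version that adds thousands separators) is flagged instead of slipping through. The names `format_price_refactored`, `APPROVED`, and `regressions` are all illustrative.

```python
def format_price(cents: int) -> str:
    """Hypothetical function: format a non-negative number of cents as dollars."""
    return f"${cents // 100}.{cents % 100:02d}"


def format_price_refactored(cents: int) -> str:
    """Hypothetical 'cleanup' that quietly adds thousands separators."""
    return f"${cents / 100:,.2f}"


# Outputs a person reviewed and judged correct; the test pins them in place.
APPROVED = {
    0: "$0.00",
    5: "$0.05",
    199: "$1.99",
    100000: "$1000.00",
}


def regressions(fn) -> dict:
    """Return {input: (expected, actual)} for each approved output fn no longer produces."""
    return {
        cents: (expected, fn(cents))
        for cents, expected in APPROVED.items()
        if fn(cents) != expected
    }


def test_format_price_is_stable():
    # pytest-style test: fails if behavior drifts from the approved outputs.
    assert regressions(format_price) == {}
```

Calling `regressions(format_price_refactored)` returns `{100000: ('$1000.00', '$1,000.00')}`. Whether that change is a bug or an improvement is still for a person to decide; the test only guarantees that the change doesn’t go unnoticed.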

(Stability also has costs if you make the wrong abstraction layer stable…)

Tests are useful, but they’re not sufficient

To recap: