How do I stop dozens of tests failing every time I change something?

One of the most common problems when adopting test-driven development is that changing behaviour can result in dozens of failing tests that then need to be fixed one by one. Let’s take a look at some of the causes and how to avoid them.

Duplication in setup code

If we were writing a blogging platform, we might have a BlogPost class that has a title and a body. When we need one in a test, we could just call its constructor:

    [Test]
    public void DefaultSlugIsLowercaseTitleAndSpacesConvertedToHyphens()
    {
        var post = new BlogPost("Gentlemen of Few", "<p>They're a band</p>");
        Assert.Equal("gentlemen-of-few", DefaultSlug(post));
    }

The problem is that we might use the same constructor in many tests within the same file. If we change the constructor, such as by adding a date, we then need to change every usage of the constructor.

Solution

A better solution is to avoid calling the constructor directly:

    [Test]
    public void DefaultSlugIsLowercaseTitleAndSpacesConvertedToHyphens()
    {
        var post = CreateBlogPostWithTitle("Gentlemen of Few");
        Assert.Equal("gentlemen-of-few", DefaultSlug(post));
    }

    private BlogPost CreateBlogPostWithTitle(string title)
    {
        return new BlogPost(title, "<p>Filler blog post body</p>");
    }

By using CreateBlogPostWithTitle, we only need to update one call to the constructor within the test class, rather than updating every test case. If the same code is used across different test classes, we should consider moving CreateBlogPostWithTitle into a separate class.

As an added bonus, the test now contains only the information that’s relevant to it. We were originally including the body of the blog post, which didn’t affect the behaviour under test. The setup code now contains only the relevant detail, i.e. the title of the blog post.
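If the helper is needed by several test classes, one way to share it is a small static class. The following is only a sketch: TestBlogPosts and WithTitle are hypothetical names, and the BlogPost class shown is a minimal stand-in so that the example compiles on its own.

```csharp
// Hypothetical stand-in for the BlogPost class, included only so that
// this sketch is self-contained.
public class BlogPost
{
    public BlogPost(string title, string body)
    {
        Title = title;
        Body = body;
    }

    public string Title { get; }
    public string Body { get; }
}

// Hypothetical shared helper: the only place in the test suite that
// calls the BlogPost constructor directly.
public static class TestBlogPosts
{
    private const string FillerBody = "<p>Filler blog post body</p>";

    public static BlogPost WithTitle(string title)
    {
        return new BlogPost(title, FillerBody);
    }
}
```

A test in any class can then call TestBlogPosts.WithTitle("Gentlemen of Few"), and a change to the constructor still means updating only one call site.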

Asserting more than one behaviour in a single test

That you can test too little might be obvious, but the reverse is also true: it’s possible to test too much in a single test. When writing a test case, try to focus on a specific behaviour. For instance, suppose that we were testing the function that takes any string and converts it to a slug (a string suitable for use as part of a URL). As a first step, we want to convert any uppercase characters to lowercase, so we write the following test:

    [Test]
    public void UppercaseCharactersAreConvertedToLowercase()
    {
        Assert.Equal("gist", ToSlug("GiST"));
    }

Once we’ve made it pass, the next step is to convert whitespace characters into hyphens. We might be tempted to write the following test:

    [Test]
    public void WhitespaceCharactersAreConvertedToHyphens()
    {
        Assert.Equal("gentlemen-of-few", ToSlug("Gentlemen Of Few"));
    }

However, we’re actually testing two behaviours here: as well as testing the conversion of whitespace to hyphens, we’re also testing the conversion of uppercase to lowercase.

Solution

We can test the behaviour in isolation by not including any uppercase characters in our original string:

    [Test]
    public void WhitespaceCharactersAreConvertedToHyphens()
    {
        Assert.Equal("gentlemen-of-few", ToSlug("gentlemen of few"));
    }

The advantage of our new test is that it’s less likely to fail when unrelated behaviour changes. For instance, suppose we decided that the uppercase characters should be preserved rather than converted to lowercase characters. The original implementation of WhitespaceCharactersAreConvertedToHyphens would have failed once the changes had been made, whereas the second version would continue to pass.

Redundant tests

As well as testing too much in a single test case, it’s possible to have too many test cases. Each test case has costs associated with it: the time taken to run the test, and the time to maintain it. In return, the test case should give you some new information about how well the code is working. If the test is describing the same behaviour as another test, consider removing it.

As a rough rule of thumb, if you have two tests, ask yourself the question: would I ever expect one of these tests to fail, but not the other? If not, then you can probably get rid of one of them. For instance, consider the following tests:

    [Test]
    public void SlugConvertsSingleSpaceToHyphen()
    {
        Assert.Equal("one-two", ToSlug("one two"));
    }

    [Test]
    public void SlugConvertsMultipleSpacesToHyphens()
    {
        Assert.Equal("one-two-three-four", ToSlug("one two three four"));
    }

Is it possible that one of these tests might fail while the other passes? The answer is yes: if we’re only replacing the first whitespace character we find with a hyphen, then it’s possible for the first test to succeed while the second fails.
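To make that failure mode concrete, here is a minimal sketch of an implementation that replaces only the first space. The class name Slugs is an assumption made for illustration, and this ToSlug is deliberately incomplete rather than a real implementation.

```csharp
public static class Slugs
{
    // Deliberately incomplete: only the first space is replaced with a hyphen.
    public static string ToSlug(string text)
    {
        int firstSpace = text.IndexOf(' ');
        if (firstSpace < 0)
        {
            // No spaces at all, so there is nothing to replace.
            return text;
        }

        return text.Substring(0, firstSpace) + "-" + text.Substring(firstSpace + 1);
    }
}
```

With this version, ToSlug("one two") returns "one-two", so the single-space test passes, while ToSlug("one two three four") returns "one-two three four", so the multiple-space test fails. A plausible mistake separates the two tests, which is why both are worth keeping.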

We could add a third test for the case of four whitespace characters:

    [Test]
    public void SlugConvertsAllFourSpacesToHyphens()
    {
        Assert.Equal("one-two-three-four-five", ToSlug("one two three four five"));
    }

However, the value of such a test is dubious: we probably wouldn’t expect this test to fail while the previous test passes (or vice versa). Although it’s possible that the code could be written to pass in one case but not the other, you’d probably have to be intentionally malicious to cause that behaviour. As a guide, assume (potential) stupidity on the part of the implementer, not malice.

Why does my test code look exactly like my production code?

There are some occasions when your test code ends up looking very similar to the code under test. Such tests duplicate the same, potentially flawed, logic in the production code, rather than actually describing the expected behaviour and checking it works correctly.

One area where this commonly comes up is in data access layers that allow access to a SQL database. For instance, here’s a class for counting the number of blog posts in a database, along with its test:

    // Class under test
    public class BlogPostRepository
    {
        private readonly IDatabaseConnection m_Connection;

        public BlogPostRepository(IDatabaseConnection connection)
        {
            m_Connection = connection;
        }

        public void AddBlogPost(string title, string body)
        {
            // Implementation skipped for brevity
        }

        public int NumberOfBlogPosts()
        {
            return (int)m_Connection.ExecuteScalar("SELECT COUNT(1) FROM posts");
        }
    }

    // Tests
    [Test]
    public void AddingBlogPostsIncrementsNumberOfBlogPosts()
    {
        var connection = new Mock<IDatabaseConnection>();
        // The mock is set up with exactly the query the production code runs
        connection
            .Setup(c => c.ExecuteScalar("SELECT COUNT(1) FROM posts"))
            .Returns(2);

        var repository = new BlogPostRepository(connection.Object);

        Assert.Equal(2, repository.NumberOfBlogPosts());
    }

Solution

We can see that the calls to the database are duplicated in the original class and in the test. The problem here is that the most significant piece of logic is the SQL query itself, which isn’t tested at all. A better solution would be to turn our unit test into an integration test by using a real database.

    [Test]
    public void AddingBlogPostsIncrementsNumberOfBlogPosts()
    {
        // Assumes IDatabaseConnection is disposable
        using (var connection = CreateTemporaryDatabase())
        {
            var repository = new BlogPostRepository(connection);

            AddBlogPost(repository);
            AddBlogPost(repository);

            Assert.Equal(2, repository.NumberOfBlogPosts());
        }
    }

    private void AddBlogPost(BlogPostRepository repository)
    {
        repository.AddBlogPost("Standard blog post title", "<p>Filler blog post body</p>");
    }

We’ve added a method AddBlogPost that will add a blog post to the database using the repository. In this case, the title and body of the post are irrelevant, so we don’t need to pass them as arguments. As before, this also insulates us against changes in the way blog posts are added to the repository.

There are a few things to watch out for. We’ve turned our unit test into an integration test, which means it’s likely to be trickier to set up. Specifically, CreateTemporaryDatabase needs to create a temporary database, open a connection to the new database, and then drop that database at the end of the test. The difficulty of this will vary depending on what database you’re using – for instance, SQLite can create databases in memory:

    public IDatabaseConnection CreateTemporaryDatabase()
    {
        var connection = new SQLiteConnection("Data Source=:memory:");
        return new DatabaseConnection(connection);
    }

Integration tests also tend to be slower and less reliable than unit tests. To keep your test suite fast and reliable, try to keep layers that need integration tests, such as data access layers, as thin as possible so that most of the code can be tested using unit tests.

How should I change the tests when I extract a method from a method already under test?

Sometimes we want to extract an existing piece of code into its own function that can be reused. When we extract the code, we should also extract appropriate test cases from the original function. However, if we just copy and adjust the existing tests, we’ve introduced unnecessary redundancy into our test suite. On the other hand, we want to make sure the original function continues to behave as expected, so we can’t just delete the original tests for the extracted functionality.

For instance, suppose we’ve written a function that imports blog posts from Word documents, and we’ve written some tests for that functionality. Part of the import process might involve generating slugs from the title of the document, so we’d have some tests for that specific functionality:

    [Test]
    public void TitleOfDocumentIsConvertedToSlugByConvertingWhitespaceToHyphens()
    {
        var document = DocumentWithTitle("gentlemen of few");
        var blogPost = ImportDocumentToBlogPost(document);
        Assert.Equal("gentlemen-of-few", blogPost.Slug);
    }

    [Test]
    public void TitleOfDocumentIsConvertedToSlugByConvertingRunsOfWhitespaceToSingleHyphen()
    {
        // The title contains runs of several spaces between words
        var document = DocumentWithTitle("gentlemen   of   few");
        var blogPost = ImportDocumentToBlogPost(document);
        Assert.Equal("gentlemen-of-few", blogPost.Slug);
    }

Now, say we want to extract the slug generation into a separate function. Any of our original tests that tested slug generation should be converted into tests for our new slug generation function.

    [Test]
    public void WhitespaceIsConvertedToHyphens()
    {
        Assert.Equal("gentlemen-of-few", ToSlug("gentlemen of few"));
    }

    [Test]
    public void RunsOfWhitespaceAreConvertedToSingleHyphen()
    {
        // The input contains runs of several spaces between words
        Assert.Equal("gentlemen-of-few", ToSlug("gentlemen   of   few"));
    }

The question is: what do we do with our original tests? The document importer hasn’t changed its behaviour, but if we keep the original tests, then we’re testing the same behaviour twice. This would lead to brittle tests, as described above.

Solution

One option is to use mocks for the slug generation. We can then change the implementation of slug generation without affecting the tests for the import function. While this is often an appropriate response, it means introducing more boilerplate code in our tests. It also means that the tests are exposed to the interface of the slug generation function. If the slug generation interface changes, we’d need to update both the import function and its tests, rather than just the import function.
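As a rough illustration of this first option, the sketch below injects slug generation into the importer and replaces it with a hand-written stub in tests, rather than using a mocking library. ISlugGenerator, Document, ImportedBlogPost, DocumentImporter and StubSlugGenerator are all hypothetical names introduced for this example, and the real importer would of course do far more than copy the title.

```csharp
// Hypothetical interface wrapping the slug generation function.
public interface ISlugGenerator
{
    string ToSlug(string title);
}

// Hypothetical, minimal stand-ins for the document and blog post types.
public class Document
{
    public Document(string title) { Title = title; }
    public string Title { get; }
}

public class ImportedBlogPost
{
    public ImportedBlogPost(string title, string slug)
    {
        Title = title;
        Slug = slug;
    }

    public string Title { get; }
    public string Slug { get; }
}

// The importer delegates slug generation to whichever generator it is given.
public class DocumentImporter
{
    private readonly ISlugGenerator m_SlugGenerator;

    public DocumentImporter(ISlugGenerator slugGenerator)
    {
        m_SlugGenerator = slugGenerator;
    }

    public ImportedBlogPost Import(Document document)
    {
        var slug = m_SlugGenerator.ToSlug(document.Title);
        return new ImportedBlogPost(document.Title, slug);
    }
}

// Test-only stub: returns a canned slug and records the title it was asked about.
public class StubSlugGenerator : ISlugGenerator
{
    private readonly string m_Slug;

    public StubSlugGenerator(string slug) { m_Slug = slug; }

    public string LastTitle { get; private set; }

    public string ToSlug(string title)
    {
        LastTitle = title;
        return m_Slug;
    }
}
```

An importer test can then construct DocumentImporter with a StubSlugGenerator, import a document, and check that the resulting slug is the canned value and that the stub received the document's title. The test no longer depends on the slug rules, but it does depend on the shape of ISlugGenerator, which is the coupling described above.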

In cases such as this, an alternative is to leave a single test that ensures that we’re calling the child function, but to leave the thorough testing to the direct tests of the child function. Ideally, we’d choose a single test that relies on the behaviour least likely to change. In our example, we might keep the first test (verifying single whitespace characters are converted to a hyphen) since it’s unlikely we’d start converting whitespace to a different character. If we did, then fixing this one case is relatively quick. If we were to change slug generation so drastically, then a test failure might even be useful to make us consider whether such a change is suitable for each use of the slug generation function.

On the other hand, we should discard the second test (verifying runs of whitespace are converted to a single hyphen). This edge case is already covered by the direct tests of slug generation, and is more likely to change than the behaviour covered by the first test. In reality, we’d probably be discarding a much larger number of tests, having already converted them to tests for the child function.

Summary

We’ve looked at solutions for a few problems you might encounter when using test-driven development on a real project, although this is far from comprehensive. The general ideas are just as important as the specific details: