Do you write unit tests?

Yes, of course!

Do you write them first?

Yes, I follow the three laws of TDD.

What is the difference in module structure between your tests and your code?

I create one test class per production class.

So if you have a production class named User you will have a test class named UserTest ?

Yes, almost always.

So the structure of your tests, and the structure of your code are covariant.

Um. I suppose so, yes.

And so you are coupling the structure of your tests to the structure of your production code.

I hadn’t thought about it being a coupling before; but, yes, I suppose I am.

So when you refactor the class structure of your production code, without changing any behavior; do your tests all break?

Well, yes. Of course.

And that means you can’t run your tests while you are refactoring, doesn’t it?

Yes, yes, it does.

So then you can’t really call it refactoring, can you?

Why not?

Because refactoring is defined as a sequence of small changes that keep the tests passing at all times.

Well, OK. I guess by that definition, the changes aren’t refactoring.

Instead, you have to commit yourself to a big change and hope you can put everything back together again – including the tests.

Yes, yes. What of it?

That is an example of the Fragile Test Problem.

The Fragile Test Problem?

Yes. A common complaint amongst people who try TDD for the first time. They note that small changes to the production code can cause large changes to the tests.

Yeah. That’s really frustrating. I almost gave up on TDD when I first encountered it.

Unfortunately, that is a common reaction.

OK, but what can be done about it?

Design the structure of your tests to be contra-variant with the structure of your production code.

Contra-variant?

Yes. The structure of your tests should not be a mirror of the structure of your code. The fact that you have a class named X should not automatically imply that you have a test class named XTest .

But wait. That breaks the rules!

What rules?

The rules that say that there should be one test class per class.

There is no such rule.

There isn’t? I’m sure I’ve read it and seen it.

Not everything your read and see is a rule.

Fair enough. But if the structure of the code and the tests must be, um. contra-variant, then how should the tests be structured?

First, let’s agree on a basic fact. If a small change in one module of a system causes large changes in many other modules of the system, the system has a design problem.

Yes, I think that’s obvious – software design 101 so to speak.

Then, clearly, if a small change to the production code causes large changes to the tests, there is a design problem.

I see that point, yes.

Therefore the tests must have their own design. They cannot simply follow along with the structure of the production code.

Hmmm. I see. If the two designs are the same, then they are coupled; and that coupling causes fragility.

Right. The coupling between the tests and the production code must be minimized.

But wait! The tests and the code must be coupled because they both describe the same behavior.

Correct. Their behavior is coupled; but their structure need not be coupled. And even the behavioral coupling need not be as tight as you think.

Can you give me an example?

Suppose I begin writing a new class. Call it X . I first write a new test class named XTest .

But you said we shouldn’t do that.

Bear with me. We’ve just begun. As I add more and more unit tests to XTest I add more and more code to X .

And you refactor that code!

Indeed I do. I refactor it by extracting private methods from the original functions that are called by XTest .

And you refactor the tests too, right?

Absolutely! I look at the coupling between XTest and X and I work to minimize it. I might do this by adding constructor arguments to X or raising the abstraction level of the arguments I pass into X . I may even impose a polymorphic interface between XTest and X .

You’d do all that just for a test?

Think of it this way. The XTest is just the first client of X . I always want to decrease the coupling between clients and servers. So I used the same techniques I would use in normal production code to reduce the coupling between the XTest and X .

OK, but the structure of the tests is still the same as the structure of the code. You still have X and XTest .

Yes, at the class level they are the same; and that’s about to change. Before we explore that change, however, I want you to note that there are already profound structural differences at the method level.

Um. Sure. XTest is just using the public methods of X ; but most of the code is now in the private methods that you extracted.

Right! The structural symmetry is already broken. But now it’s going to break even more.

How so?

As I look at all those private method in X I will inevitably see that there are ways to group those methods into different classes. One group of methods will use a particular subset of the fields of X . That group can be extracted as a class.

But you don’t write a new test for that class, do you?

No! Because every bit of the code within that new class is being covered by the tests that are still just using the public API of X .

And this process continues, doesn’t it?

Yes! More and more functions are extracted. More and more classes are discovered. After awhile we have a whole family of classes in the production code that sit behind that simple API of X .

And they are all tested by XTest .

Right! The structure has been almost perfectly decoupled. What’s more the API of X has been successively refined to be so narrow and abstract that it is minimally coupled to the clients that use it; including XTest .

OK, I see that. I see that the structure of the tests can vary independently from the structure of the production code. And I agree that that’s a good thing. But what about the behavior. They are still strongly coupled by behavior.

Think about what’s going while X is being developed. What’s happening to XTest ?

Well, more and more tests are being added to it; and the interface with X is being progressively narrowed and abstracted.

Right. Now say that first part again.

More and more tests are being added?

Right. And each one of those tests is entirely concrete. Each one of those tests is a small specification of a very particular behavior. Taken together the sum of all the tests is…

The specification of the behavior of the X API.

Right! So as development proceeds the test suite becomes more and more of a specification – it becomes more and more specific.

Sure. I see that.

But what is happening to the classes behind the X API? What would any good software designer do when confronted with an ever growing list of specifications?

Well, of course, the way we deal with complex specifications is to generalize.

Correct! Instead of writing code for each and every case of every paragraph of every specification, we find ways to make the code generic.

Why does this matter to the coupling of the behavior?

As development proceeds, the behavior of the tests becomes ever more specific. The behavior of the production code becomes ever more generic. The two behaviors move in opposite directions along the generalization axis.

And this reduces coupling?

Yes. Because while the behavior of the production code satisfies the specifications within the tests, it also has the ability to satisfy a whole spectrum of unspecified behaviors.

And that last bit is absolutely essential, because no test suite can specify every required behavior. The production code must generalize the subset of behaviors specified by the tests to all the behaviors required of the system.

So you are saying that the test suite in incomplete?

Of course! It is entirely impractical to specify absolutely everything. So what happens instead is that we gradually increase the generality of the production code until every test that we could possibly write will pass.

Woah! We keep writing failing tests in order to drive the generality of the production code to a point where it becomes impossible to write another failing test. Woah!

Woah indeed. But here’s the thing. The act of generalizing is the act of decoupling. We decouple by generalizing!

Oh, wow! And so we decouple structure AND behavior. Wow.

Right! Now, give me a recap.

OK. Um. The structure of the tests must not reflect the structure of the production code, because that much coupling makes the system fragile and obstructs refactoring. Rather, the structure of the tests must be independently designed so as to minimize the coupling to the production code.

Good! And what about behavior?

As the tests get more specific, the production code gets more generic. The two streams of code move in opposite directions along the generality axis until no new failing test can be written.

Good! I think you’ve got it.