Generally, I prefer the GOOS school of TDD, which involves isolating my classes as much as possible and putting mocks and stubs everywhere. Even though one of its known disadvantages is that you risk testing your classes against a fake environment that won’t match the real production code using them, I’ve rarely been badly bitten by it.

Today I set out with my pair to add some functionality to a certain class. That class had about 30-40 lines of code and about 10 test cases, which seemed quite decent. We added our changes TDD style and just couldn’t get the thing working. After digging into it for a few more minutes we suddenly realized the class shouldn’t have been working at all, and checking the DB showed that indeed the last time that specific feature had any effect was 3 months ago!

Fortunately for us, all the problems that caused this bug are solved problems; we just need to get better at implementing the solutions:

Isolated tests work best hand in hand with a few integration tests (some might say the right term is acceptance tests) that exercise the whole system and make sure the features actually work. Had we had those, we would have caught the bug much sooner.
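This failure mode is easy to reproduce in miniature. Here is a minimal sketch (all names hypothetical, not from our actual codebase): a collaborator's method gets renamed, the class under test still calls the old name, yet the isolated test keeps passing because the mock happily answers any method you call on it. Only a test that wires up the real collaborator notices.

```python
import unittest
from unittest import mock


class FeatureFlags:
    """Real collaborator: the method was renamed at some point
    from is_enabled() to enabled_for(), breaking stale callers."""

    def enabled_for(self, user):
        return True


class Greeter:
    def __init__(self, flags):
        self.flags = flags

    def greet(self, user):
        # Bug: still calls the old method name.
        if self.flags.is_enabled(user):
            return f"Hello, {user}!"
        return "Hello!"


class IsolatedTest(unittest.TestCase):
    def test_greets_when_enabled(self):
        # A plain Mock auto-creates is_enabled(), so this test
        # passes even though the real collaborator lost that method.
        flags = mock.Mock()
        flags.is_enabled.return_value = True
        self.assertEqual(Greeter(flags).greet("Ada"), "Hello, Ada!")


class IntegrationTest(unittest.TestCase):
    def test_greets_with_real_collaborator(self):
        # Wiring in the real object exposes the stale call at once.
        with self.assertRaises(AttributeError):
            Greeter(FeatureFlags()).greet("Ada")
```

Note that `mock.Mock(spec=FeatureFlags)` would also have caught this at the unit level, since a spec'd mock rejects methods the real class doesn't have, but even that only helps if the spec'd class is the one production actually uses.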

The bug was introduced in a huge commit that changed 35 files and 1500 lines of code. We usually try to go over every commit made, even if it was paired, because we believe in collective code ownership, but it’s impossible to go over such a huge diff and find these intricacies. Working in small baby steps makes it far less likely you’ll break something and more likely that someone else will spot your mistakes. Huge refactorings give me the creeps.

After the change was committed, we didn’t follow through on it: this is the kind of feature whose effects only show up over a few days, and we never verified it kept working. We moved on to other tasks and forgot all about it, assuming it was working all this time. Had we taken the time to confirm we were actually seeing its effects, the bug would have been squashed by the next deployment.

Any of these would have helped us spot sooner that the isolated tests were actually testing the code against a scenario that never happens in production. These tiny changes to our workflow would have made several of our users happier over this timeframe.

Hopefully all is well now and the feature is back at 100%, but only time will tell whether we were able to learn from this mishap.

You should subscribe to my feed and follow me on Twitter!