I’m a unit test aficionado, and, as such, have attempted to unit test what really shouldn’t be. It’s common to get excited by a new hammer and then seeing nails everywhere, and unit testing can get out of hand (cough! mocks! cough!).

I still believe that the best tests are free from side-effects, deterministic and fast. What’s important to me isn’t whether or not this fits someone’s definition of what a unit test is, but that these attributes enable the absence of slow and/or flaky tests. There is however another class of tests that are the bane of my existence: brittle tests. These are the ones that break when you change the production code despite your app/library still working as intended. Sometimes, insisting on unit tests means they break for no reason.

Let’s say we’re writing a new build system. Let’s also say that said build system works like CMake does and spits out build files for other build systems such as ninja or make. Our unit test fan comes along and writes a test like this:

assert make_output == "all: foo

foo: foo.c

\tgcc -o foo foo.c"

I believe this to be a bad test, and the reason why is that it’s checking the implementation instead of the behaviour of the production code. Consider what happens when the implementation is changed without affecting behaviour:

all: foo

foo: foo.c

\tgcc -o $@ $<

The behaviour is the same as before: any time `foo.c` is changed, `foo` will get recompiled. The implementation not only isn’t the same, it’s arguably better now, and yet the assertion in the test would fail. I think we can all agree that the ROI for this test is negative if this is all it takes to break it.

The orthodox unit test approach to situations like these is to mock the service in question, except most people don’t get the memo that you should only mock code you own. We don’t control GNU make, so we shouldn’t be doing that. It’s impossible to copy make exactly in a mock/stub/etc. and it’s foolish to even try. We (mostly) don’t care about the string that our code outputs, we care that make interprets that string with the correct semantics.

My conclusion is that I shouldn’t even try to write unit tests for code like this. Integration tests exist for a reason.