Emissions Test-Driven Development is a software development methodology adapted specifically to the programming of internal combustion engine control units (ECUs).

ETDD is a subcategory of a more general form of software development known as "Adversarial TDD", which is a kind of Test-Driven Development.

In Test-Driven Development, the software development cycle begins with the creation of a test case, which naturally fails at first. Once this is done, a minimal amount of code is produced which causes the test case to pass. The process continues, tests first, code second, with the tests forming an implicit specification for the behaviour of the software. If the tests pass, then the software is considered correct. Assuming the tests fully encompass all the desired behaviour of the software, then the software is complete.

(Opinions differ on whether this is a universally sound approach for creating good software. But opinions also differ on whether good software is theoretically achievable by humans, whether there is such a thing as good software at all, and even on whether an empirical definition of "good software" can ever be reached. Be that as it may, unquestionably TDD is a real thing which real software developers practice.)

It's not uncommon in TDD (and elsewhere) for the tests and the software to be written by different people. In Adversarial TDD, not only are the people developing the software (henceforth "the developer") and the people developing the tests for that software (henceforth "the tester") different people, not only are they physically separated and not in communication, they work for entirely different organisations and have opposing goals.

In the particular case of Emissions Test-Driven Development, the goal of the tester is to ensure that the internal combustion engine under test is clean, meeting certain emissions standards in regular use. Meanwhile, the goal of the developer of the ECU is to make a fast, powerful, and perhaps fuel-efficient engine which happens to also pass the tests. To be clear: the developer cares not one jot about emissions. Just emissions tests.

*

It's only fair to note at this point that neither the developer nor the tester necessarily cares about anything by default, and their priorities are largely received from higher powers. Still, these are the motivations at work here.

Experimenting with Adversarial TDD

We tried this at work once, as an exercise. We were divided into pairs and tasked with implementing Conway's Game Of Life.

One of us was the tester, the other was the developer. The tester was to write exactly one unit test case. Then, with no communication and no cooperation, deliberately trying to blot out all knowledge of Conway's Game Of Life, the developer was to write the bare minimum amount of code required to make the unit test case pass. After this, the keyboard was passed back to the tester and the cycle continued.

We immediately observed a pattern. The first test case could only ever expect a single result. It would look something like:

assert is_alive([
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
], 1, 1) == False

(Java, obviously.) Given this, the first implementation of is_alive would invariably read as follows:

def is_alive(a, b, c): return False

Which is to say, in general, returning the same result every time.

After adding another test case:

assert is_alive([
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
], 1, 1) == False

assert is_alive([
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 0],
], 1, 1) == True

The developer would reluctantly produce a minimum-effort implementation such as:

def is_alive(a, b, c): return a[0][0] > 0.8

As time went on, the amount of effort the developer needed to expend in order to continue the bluff increased. Taking it as a challenge, some developers were able to continue the bluff for quite some time. Still, eventually the sheer weight of tests made it so that continuing the bluff was impractical, and it was impossible to pretend not to understand what the tests were really testing for. At this point, the developers gave up and implemented Conway's Game Of Life, as the path of least resistance.
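For reference, here is roughly where everyone ended up. This is a minimal sketch rather than anybody's actual code from the day. It assumes that the three arguments are a grid of 0s and 1s, a row and a column, that is_alive reports the cell's state in the next generation (which is how I read the two tests above), and that anything off the edge of the grid counts as dead.

```python
def is_alive(grid, row, col):
    """Return True if the cell at (row, col) is alive in the next generation."""
    height, width = len(grid), len(grid[0])

    # Count the live cells among the (up to) eight neighbours.
    # Assumption: cells beyond the edge of the grid are dead.
    neighbours = 0
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue  # a cell is not its own neighbour
            r, c = row + d_row, col + d_col
            if 0 <= r < height and 0 <= c < width:
                neighbours += grid[r][c]

    if grid[row][col]:
        return neighbours in (2, 3)  # a live cell survives with 2 or 3
    return neighbours == 3           # a dead cell is born with exactly 3
```

Fifteen lines or so. Sooner or later the bluffs end up longer than this, and that is the point at which the bluff stops being the path of least resistance.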

Of course, by this time the lesson of the exercise had been learned: development and testing are cooperative roles. Even if the two roles are filled by separate people, they need to have a common goal. There can't be barriers of communication between them; they must work together.

Heck, even if there are real, good reasons for the communications barrier — say, the developer is building a clean-room reimplementation of a piece of software which, for legal reasons, they cannot directly inspect, only its test suite — there still needs to be a good faith effort, on the part of the developer, to build the thing which the tests clearly "want".

If there isn't good faith, it can get very difficult.

Bifurcation threshold

We ended that exercise there, at the unit test case level, while it was still funny. But there's no reason why we couldn't have continued. If the developer is motivated by factors other than mere difficulty of implementation, then the charade can continue indefinitely.

Quite quickly the developer would arrive at a situation where the simplest course of action is to produce two implementations. One would be the genuine software desired by the tester and their tests. The other would be the software which the developer really wanted to build. Crucially, the switch in behaviour would be controlled by the environment. If there's a test running, the software would behave like this. If not, it would behave like that.
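To make that concrete, here is a minimal sketch of the switch, reusing the Game of Life example. Everything in it is made up for illustration: the function names are hypothetical, and "is a test runner module loaded in this process?" is just one crude signal a developer might choose, not a claim about how any real product does it.

```python
import sys


def test_in_progress(loaded_modules=None):
    """Guess whether a test is running by looking for a loaded test runner.

    Assumption: the tester uses pytest or unittest. `loaded_modules` can be
    passed in explicitly so that the switch itself can be exercised.
    """
    if loaded_modules is None:
        loaded_modules = sys.modules
    return any(name in loaded_modules for name in ("pytest", "unittest"))


def honest_is_alive(grid, row, col):
    """The software the tester wants: the real Game of Life rule."""
    neighbours = sum(
        grid[r][c]
        for r in range(max(row - 1, 0), min(row + 2, len(grid)))
        for c in range(max(col - 1, 0), min(col + 2, len(grid[0])))
        if (r, c) != (row, col)
    )
    return neighbours == 3 or (bool(grid[row][col]) and neighbours == 2)


def preferred_is_alive(grid, row, col):
    """The software the developer wanted to ship all along."""
    return False


def is_alive(grid, row, col, loaded_modules=None):
    # The bifurcation: behaviour depends on the environment, not the input.
    if test_in_progress(loaded_modules):
        return honest_is_alive(grid, row, col)
    return preferred_is_alive(grid, row, col)
```

Note that the tests can be as thorough as you like here. They only ever meet honest_is_alive.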

Broken promises

For a concrete example, consider the extremely stringent, technical and precisely-stated Promises/A+ specification, and its associated Compliance Test Suite. The Compliance Test Suite is rigorous and tests every single part of the Promises/A+ specification; it is entirely safe to assume that a piece of software which passes this test suite is a bulletproof, totally correct implementation of the spec, and deserves the right to use the logo.

And then consider broken-promises-aplus, a totally compliant implementation which only behaves like a compliant Promises/A+ library when it detects that the Compliance Test Suite is running. When used in practice, it never does any work (there is no documented public API for this), and whenever you call then() it throws what at first glance appears to be an exception.

Naturally, it could do something more puzzling (such as only working ninety-nine times out of a hundred) or something more nefarious (such as locking itself into a busy loop, consuming CPU power, using up electricity and increasing global CO2 emissions).

*

How does it work? The Compliance Test Suite gives the implementer carte blanche when implementing an "adapter" between the suite and the actual library. So in this case, the adapter sets a special "THIS IS A TEST" flag and passes this forward to the library.

If the adapter's structure were properly locked down, more advanced approaches would still be available. It would be possible for broken-promises-aplus to inspect the command line for strings like "npm test", or to test whether the promises-aplus-tests library is currently loaded.

Assuming no other information is available, the library still has access to the arguments of each API call, the sequence in which each call is made and the time interval between each call... in other words, the usage profile. It could compare this profile with the known behaviour of the Compliance Test Suite to determine whether it was likely that a test was in progress.
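A usage-profile detector could be sketched along these lines. This is not how broken-promises-aplus works, and the "known opening" below is invented: I have not checked what order the real Compliance Test Suite makes its calls in. The class name, the signature and the timing threshold are all hypothetical.

```python
# Hypothetical: the first few calls some known test suite always makes,
# in order. A real detector would have to learn this from the real suite.
KNOWN_SUITE_OPENING = ("resolved", "then", "then", "rejected", "then")


class UsageProfile:
    """Records API calls and compares them with a known test-suite pattern."""

    def __init__(self, signature=KNOWN_SUITE_OPENING, max_gap_seconds=0.05):
        self.signature = tuple(signature)
        # Assumption: a test suite fires its calls in quick succession.
        self.max_gap_seconds = max_gap_seconds
        self.calls = []  # (call name, timestamp in seconds), in arrival order

    def record(self, name, timestamp):
        self.calls.append((name, timestamp))

    def looks_like_test(self):
        n = len(self.signature)
        if len(self.calls) < n:
            return False  # not enough evidence yet
        opening = self.calls[:n]
        names_match = tuple(name for name, _ in opening) == self.signature
        gaps = [
            later - earlier
            for (_, earlier), (_, later) in zip(opening, opening[1:])
        ]
        return names_match and all(g <= self.max_gap_seconds for g in gaps)
```

The library would call record() at the top of every public function and consult looks_like_test() before deciding how to behave. A real one would want to be fuzzier than an exact match, but the principle is the same.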

*

Turning back to ETDD, take a look at this diagram. The horizontal axis is the amount of time since the Volkswagen engine was turned on. The vertical axis is the distance driven. The coloured lines mark pre-programmed settings inside the engine control unit; if the usage profile crosses any of these coloured lines, it triggers a change in behaviour.

You can see that the lines clearly mark out three relatively narrow, straight channels. The profile of an emissions test always passes down one of these three channels. When this happens, the ECU conforms to low-emissions standards. When the profile steps outside one of the channels, "regular" behaviour appears. It's that easy to detect an emissions test in progress!
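In code, the idea is about as simple as it sounds. The sketch below is not Volkswagen's logic and the numbers in it are invented. It assumes each channel can be described by a pair of straight boundary lines on the time/distance plot, which is my simplification of the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """A straight corridor on the time/distance plot (hypothetical values)."""
    min_speed: float  # metres per second: slope of the lower boundary
    max_speed: float  # metres per second: slope of the upper boundary
    slack: float      # metres of tolerance on either side

    def contains(self, seconds, metres):
        lower = self.min_speed * seconds - self.slack
        upper = self.max_speed * seconds + self.slack
        return lower <= metres <= upper


# Invented numbers, standing in for the three channels in the diagram.
CHANNELS = (
    Channel(min_speed=2.0, max_speed=3.0, slack=50.0),
    Channel(min_speed=8.0, max_speed=10.0, slack=50.0),
    Channel(min_speed=20.0, max_speed=24.0, slack=50.0),
)


def looks_like_emissions_test(profile, channels=CHANNELS):
    """True if every (seconds, metres) sample so far stays in one channel."""
    return bool(profile) and any(
        all(channel.contains(t, d) for t, d in profile)
        for channel in channels
    )
```

The moment a single sample lands outside every channel, looks_like_emissions_test goes False and stays False, and "regular" behaviour takes over.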

Fighting back

The crucial point is this: it needs to be impossible to programmatically distinguish a testing scenario from regular use.

Specifically, it needs to be impossible to programmatically distinguish an emissions test from regular driving.

This may be a much taller order than it first appears for two major reasons.

1. Time and distance cannot be the only two pieces of information available to the ECU. The ECU receives data from all over the engine and there's no particular reason (is there?) why it can't receive data from all over the car. This is your attack surface. I'll admit I'm not a car person, so I wouldn't even like to speculate where this starts or stops. There are numerous mechanical techniques to improve fuel economy, such as taping over joints in the bodywork for improved aerodynamics, and overinflating tyres. Can the ECU detect tyre pressure and temperature? Does it know the position of the pedals? Can it guess how many real humans are in the car, based on whether the seatbelts are buckled and whether pressure sensors in the seats are fluctuating as people move? Does it know whether the stereo is on? What about the steering wheel? Can the ECU use steering and speed data to work out the shape of the "road"? Can it compare that shape with real maps to determine whether it's driving on a real road or not?

2. Having a test which is undetectable means that it must also be unpredictable, i.e. randomised. This is somewhat antithetical to having a test which is standard across all engines, whose specification can be made public, and which gives fairly comparable results. I don't think this part is unsolvable, but it's certainly a problem.
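One possible way to square randomisation with a public specification is to publish the generator and keep only the seed secret until the test is over. The sketch below is purely illustrative: the function name, the parameters and their default values are all invented, and a real drive cycle would need far more than a random walk over speed.

```python
import random


def random_drive_cycle(seed, duration_s=600, step_s=10,
                       max_speed=35.0, max_delta=5.0):
    """Generate a reproducible pseudo-random list of (seconds, target speed).

    Assumption: speeds are in metres per second and change by at most
    `max_delta` per step. The same seed always yields the same cycle, so a
    result can be reproduced after the fact by publishing the seed.
    """
    rng = random.Random(seed)  # private generator, no shared global state
    speed = 0.0
    cycle = []
    for t in range(0, duration_s, step_s):
        speed += rng.uniform(-max_delta, max_delta)
        speed = min(max_speed, max(0.0, speed))  # clamp to a legal range
        cycle.append((t, speed))
    return cycle
```

The algorithm is standard and public, and every engine tested with the same seed faces the same cycle, but nobody can pre-program a channel for a cycle they haven't seen. It does nothing about the first problem, of course.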

Conclusions

Honestly? I blame the testing regime here, for trusting the engine manufacturers too much. It was foolish to ever think that the manufacturers were on anybody's side but their own.

It sucks to be writing tests for people who aren't on your side, but in this case there's nothing which can change that.

Lesson learned. Now it's time to harden those tests up.