Or ‘On the breakdown of Bayesian techniques in the presence of knowledge singularities’

One of the abiding problems of safety critical ‘first of’ systems is that you face, as David Collingridge observed, a double bind dilemma:

Initially you face an information problem, because ‘real’ safety issues (hazards) and their risks cannot be easily identified or quantified until the system is deployed. But by the time the system is deployed you face a power (inertia) problem, that is, control or change is difficult once the system is delivered. Eliminating a hazard is usually very difficult, and we can only mitigate it in some fashion.

This is, as Collingridge noted, a paradox, and it’s a paradox that lies at the heart of our problems in dealing with a lot of new technologies, from drone strikes to genetic screening. By the time we figure out what we need to regulate, there’s always an incumbency, happy, nay eager, to argue why we can live with the downside. Collingridge was pretty gloomy about our ability to work on the ‘prediction’ side of the dilemma using a Bayesian approach, because the high level of ignorance (read ontological risk) in his view invalidated Bayesian risk assessments.

When change is easy, the need for it cannot be foreseen; when the need for change is apparent, change has become expensive, difficult and time consuming. David Collingridge, The Control of Technology (NY, St. Martin’s Press, 1980)

On the other hand, if you wait till late in the day to identify ‘real’ issues then you are dealing with an entrenched technology, with all its stakeholders and the inevitable tyranny of incumbency. All of which sounds very much like the problem we face when developing and fielding safety critical systems. So what can we glean from all this? Well, Collingridge basically gave up on the prediction side of the problem and instead turned his attention to what he called the property of ‘corrigibility’, that is, how easily a decision can be reversed.

A decision is easy to correct, or highly corrigible, when, if it is mistaken, the mistake can be discovered quickly and cheaply and when the mistake imposes only small costs which can be eliminated quickly and at little expense. David Collingridge, The Control of Technology (NY, St. Martin’s Press, 1980)

Corrigibility is intended to deal with the problem we face when making decisions in ignorance, and in essence boils down to, “take small steps and have an exit strategy”. For example, committing entirely to the nuclear fuel cycle, in the case of France (or to coal in Australia), is not a decision that can be undone easily. But adopting a balanced energy portfolio, as the Scandinavian countries have, provides an exit strategy from any one component.

At the level of system design and technology adoption, an incremental trial-and-adopt approach, in which a new technology is used in a non-critical but analogous application first, would also satisfy Collingridge’s corrigibility criterion. In other words, the greater the ontological uncertainty, the more resources you should apply to incremental exploration of that space, and the more prepared you should be to ‘reverse out’ of a specific design path.

I’d add that when doing something for the first time it’s often valuable to explore two or three concepts at the mission and architecture level (even if only on paper), because these give one an insight into the strengths and weaknesses of the high level, and usually irreversible, architectural decisions (and their attendant hazards). See for example the US Apollo program’s migration from the “direct ascent” mission architecture, which required the ambitious Nova booster, to the final “lunar orbit rendezvous” architecture, which required the much less challenging Saturn V.

This approach also applies recursively to the behaviour of systems operating in environments of ontological risk. In such environments a critical system should itself exhibit corrigibility, that is, it should not commit irreversibly to a state carrying a potentially unacceptable consequence if that commitment decision could prove to be wrong.
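As a minimal sketch of that idea (the class and method names here are hypothetical illustrations, not anything from Collingridge): a controller takes only small, reversible steps, recording an undo for each, and discards its undo stack only once the commitment has been independently confirmed as safe. Until then it can always ‘reverse out’.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CorrigibleController:
    """Sketch of a corrigible control pattern: every action is staged
    together with an undo, so nothing is irreversible until confirm()
    is called. Before confirmation, reverse_out() unwinds all steps."""
    undo_stack: List[Callable[[], None]] = field(default_factory=list)

    def stage(self, do: Callable[[], None], undo: Callable[[], None]) -> None:
        do()                       # take the small, reversible step
        self.undo_stack.append(undo)

    def confirm(self) -> None:
        self.undo_stack.clear()    # commit: reversal is no longer offered

    def reverse_out(self) -> None:
        while self.undo_stack:     # unwind staged steps in LIFO order
            self.undo_stack.pop()()
```

The point of the pattern is simply that the decision to make a state change permanent is separated from the act of making it, mirroring Collingridge’s “small steps with an exit strategy”.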

As another example, this time at the level of hardware or software design, when introducing novel components, use a ‘new/old’ redundant architecture, that is, one in which a redundant element is based on simple, mature and well understood design principles, giving a ‘fail simple’ architecture.
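One way to sketch such a ‘fail simple’ arrangement in software (the function names and the plausibility check are illustrative assumptions, not a prescription): route each demand through the novel channel, but revert to the simple, well understood channel whenever the novel one faults or produces an implausible result.

```python
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fail_simple(novel_fn: Callable[[T], R],
                simple_fn: Callable[[T], R],
                plausible: Callable[[R], bool]) -> Callable[[T], R]:
    """'New/old' redundancy sketch: prefer the novel channel, but fall
    back to the simple, mature channel when the novel one raises an
    exception or its output fails a plausibility check."""
    def channel(x: T) -> R:
        try:
            y = novel_fn(x)
        except Exception:
            return simple_fn(x)    # fail simple: novel element faulted
        if not plausible(y):
            return simple_fn(x)    # implausible output: distrust the novel element
        return y
    return channel
```

The design choice here is that the old element need not match the new one’s performance, only its safety envelope, which is what makes it cheap to keep around as the exit strategy.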