Ideasroom

Tread carefully with big data ethics

It is important to develop ethical control of big data, but we should do so carefully, warns the University of Auckland’s Tim Dare.

Sophisticated computer algorithms can be used to support important decisions about individuals – decisions such as whether they receive a loan, are subject to extra searches at a border, or are given extra social or health support. But the complex patterns such algorithms identify in very large data sets may be beyond human comprehension. Humans may know that a system has allocated a certain score to an individual without being able to understand or explain why.

New European data protection guidelines have gone some way towards banning such systems. Those affected by automated decision-making systems are entitled to “meaningful information about the logic involved”. Our own Privacy Commissioner has, with Stats NZ, just issued a set of principles for the use of data and analytics, which specify that “explanations of decisions – and the analytical activities behind them – should be in clear, simple, easy-to-understand language”.

That is to say, even though data ethics is still an emerging field, ‘transparency’ is already widely accepted as crucial to the ethical use of data.

It is not hard to see why. If we don’t know what a system is doing, we can’t be sure it isn’t relying on spurious and legally or ethically improper correlations. In the US, for instance, police ‘stop and search’ records show a correlation between drug possession and ethnicity simply because young black men are more likely to be stopped and searched than their white counterparts. The records are evidence of a bias that may not be apparent in opaque automated decision-making systems. And if and when it becomes clear that something has gone wrong – perhaps the outputs of the automated system are clearly failing to track illness or need or risk – we may have no idea what to do if we cannot understand the inner workings of the system.

There is another, less practical, but important justification for transparency. I recognise the moral agency and value of those I deal with by explaining what I am doing, by giving them reasons, and by receiving explanations and reasons from them. One pretty sure sign I don’t acknowledge your moral agency or value is that I treat you as someone who doesn’t warrant an explanation. Algorithmic black-boxes appear to prevent us giving reasons and explanations, and so to prevent us recognising the moral agency and value of those they affect.

However, there’s something puzzling here. The consensus around transparency may have emerged too quickly, since we often rely upon non-transparent tools to support important decisions.

Suppose my GP takes my temperature with an old-fashioned mercury thermometer. I have a rough idea how it works: the molecules in the mercury move about when heated and it expands up a tube in a predictable way. Perhaps my GP understands it better than I do; perhaps not. Suppose instead she uses one of the fancy digital thermometers. Now I’m completely out of my depth, and probably my GP is too.

These simple, everyday tools, which might be used to support very important decisions (is it safe to give me a flu shot?), are, at least in my GP’s surgery, neither transparent nor explainable. But I shouldn’t care. What matters is not transparency, or ‘explainability’, but whether I have evidence of reliability: it doesn’t matter how the thermometer identifies my temperature as 36.7°, provided that I know that it does so reliably.

And it may be evidence of reliability – rather than transparency – that we should insist on in the case of automated decision-making systems too.

It might seem that there is an important difference between thermometers and algorithmic black boxes. Someone knows how the thermometers work, even if I don’t, but an algorithm may find patterns among too many variables, ranked and weighted in too many ways, for any human to understand.

Does it matter that someone understands the thermometers even if I do not? Well, we rely on technologies which are opaque in this more dramatic sense too. MRIs rely on quantum mechanical explanations of the spin and orbital angular momentum of subatomic particles, and “I think I can safely say that nobody understands quantum mechanics” (Richard Feynman). Should we stop relying on MRIs? No, not if we know they reliably produce accurate images.

What of the important reasons to require transparency? Emphasis on reliability does not diminish the importance of identifying the effect of spurious and improper correlations or developing the capacity to work out what has gone wrong when systems do prove unreliable. Those are important goals, though the level and nature of transparency they require is thus far unclear.

Finally, explanations about reliability do seem as if they could be adequately respectful. When I ask about the MRI, my GP will probably give me evidence that the scans are accurate and useful, and that – rather than a course in quantum mechanics – seems just the sort of thing I am likely to want.

It is important to develop adequate ethical control of big data, but we should do so carefully, lest our concern for the unfamiliar leads us to impose constraints that sit uneasily alongside everyday practices with which we are justifiably comfortable.

Professor Dare will talk about issues around the ethical use of data at his inaugural lecture, Big data, transparency and explainability, at Old Government House on July 19 at 5.30pm.