Of all the new technologies thought to hold promise for the public sector, few have captured attention as much as Artificial Intelligence (AI). It’s not hard to see why. AI can do things that public sector organisations really care about and need. Analysing case notes to determine an individual’s future likelihood of needing certain interventions. Spotting tumours in X-ray scans. Identifying risky behaviour from CCTV footage. Predicting where certain crimes are likely to occur. Detecting fraudulent benefits or tax claims. Optimising traffic intersections. Enabling smart chatbots to answer citizens’ questions. As the world in general - and the public sector specifically - becomes ever more saturated with data, AI offers to make it intelligible.

Ethical concerns

Yet there are also concerns that AI could be used by public sector organisations in ways that invade privacy, or cause harm, unfairness and moral wrongs. These concerns tend to focus on one or more of the following three stages of deploying an AI.

Three stages of deploying an AI

First, how the AI is created (Creation). Does the AI use training data that invades or poses significant risks to individuals’ right to privacy? Is it accurate and truly representative? Does it contain historic biases that could be perpetuated?

Second, how the AI works (Function). Are the assumptions used by the AI correct? Are the processes used by the AI to make a decision reasonable and fair? Can anyone see and understand how the AI works, and audit how a given output was created? Can we be sure the AI is protected against hacking and manipulation?

Third, what the AI is used to do (Outcome). Is the AI being used to do something unethical? Will people know if a decision affecting them was informed by an algorithm? Will those using an AI blindly follow its ‘decision’ rather than applying the correct measure of professional judgement? What recourse will people have if an AI discriminates against them or causes them harm?

Other concerns cut across all three stages: for example, whether an AI is being deployed in an appropriate context in the first place, and used in the way it was intended.

Codes for the responsible use of AI

Given these (and many other) concerns, a variety of academics, companies, cities and governments have attempted to outline codes for the responsible use of AI. In June 2018, Google published seven principles for the use of AI. In December 2018, the European Commission’s High-Level Expert Group on Artificial Intelligence shared its Draft Ethics guidelines for trustworthy AI, founded on fundamental principles of individuals’ rights. Many more codes, standards and principles can be found on Nesta’s Map of the global AI governance landscape. (I’ll pause here briefly to note the critique that their reference to ‘ethics’ can seem a tad highfalutin for many of the more mundane principles advocated.)

Across nearly all these documents, we notice striking similarity in their principles and recommendations. At the Creation stage, common calls are to publish and minimise bias in the training data; respect privacy; avoid using data on sensitive factors such as race and religion; and ensure data is handled in compliance with data protection rules such as GDPR. At the Function stage, recommendations typically suggest making the code of an AI transparent and open for inspection; identifying and minimising bias and limitations in the AI’s assumptions; ensuring the function of the AI can be explained; and protecting it from manipulation and hacking. And at the Outcome stage, recurring suggestions involve ensuring that an AI’s intended and actual outcomes are fair, transparent, legal and aligned with human values, and that its outcomes can be explained; putting in place a process of oversight and evaluation; and holding a person accountable for decisions made.

All such recommendations are well intentioned. Many provide helpful guidance on the appropriate direction of travel for AI. However, nearly all are flawed or of minimal utility for one of two reasons. Those - like Google’s - that offer broad principles are so high-level as to offer almost no guidance on how to act in any specific scenario. Meanwhile, the recommendations in more detailed codes tend to cease to be practical or meaningful as the complexity of an AI increases.

AI complexity and its impact on ethical codes

To understand the latter point, let’s outline three broad levels of complexity of AI.

Different levels of complexity of AI: 1-3

The simplest form of AI (let’s call it Level 1) involves using a few structured datasets to correctly weight factors that are deemed to be important by humans. For example, firefighters could be asked to list the factors relevant to a building’s fire risk. Datasets can then be sought that relate to those factors in order to train an AI. Machine learning is used in a one-time process to weight the factors according to the extent to which they are predictive of a high-risk building.

At the next level of complexity (Level 2), instead of merely weighting factors deemed significant by humans, an AI can decide for itself what factors are relevant, how they should be weighted, and how they lead to a given outcome. For example, a local authority could use machine learning to analyse thousands of free-text case notes about vulnerable children to spot patterns and correlations that predict which of them are most likely to be taken into care in the future.

At the more advanced end of AI (Level 3), neither the training data nor the models created are static. The model is continuously updated based on new data, which will often be unstructured and unlimited. Imagine a police surveillance system that constantly analyses CCTV footage and sound data from dozens of train stations in order to spot suspicious behaviour.

Now, I am not implying these are formal or exclusive categories; I use them merely to explain my case. For when we consider these different levels of complexity, it’s clear that Level 2 and 3 applications make certain recommendations highly problematic, if not impossible.

Start with the Creation stage. Being transparent about the training data used to create an AI is, in principle, possible at Levels 1 and 2, which use a finite number of datasets. However, it becomes significantly harder at Level 3, where the training data is essentially unlimited and constantly changing.
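To make Level 1 concrete, here is a minimal, hypothetical sketch in Python: humans (the firefighters) choose the factors, and a one-time logistic-regression pass learns a weight per factor. The factor names, data and numbers are all illustrative assumptions, not a real fire-risk model.

```python
import math

# Hypothetical factors per building, as chosen by humans:
#   [age in decades, 1 if no sprinklers, occupancy in hundreds]
buildings = [
    ([1.0, 0.0, 0.5], 0),  # new, sprinklered, quiet   -> low risk
    ([6.0, 1.0, 2.0], 1),  # old, no sprinklers, busy  -> high risk
    ([2.0, 0.0, 1.0], 0),
    ([5.0, 1.0, 1.5], 1),
    ([4.0, 1.0, 0.5], 1),
    ([1.5, 0.0, 2.5], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=2000, lr=0.1):
    """One-time logistic-regression fit: one learned weight per human-chosen factor."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            pred = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = pred - label  # gradient of the log-loss
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

weights, bias = train(buildings)
# Unlike Levels 2 and 3, the learned weights can be read and audited directly.
print("factor weights:", [round(w, 2) for w in weights])
```

Note what makes this Level 1: the factor list is fixed by people, the training happens once, and the resulting weights are a short, inspectable list.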
At both Levels 2 and 3, analysing the training data for biases is extremely hard if the quantity of data is vast and unstructured. For example, an algorithm could generate a model which has learned the race or religion of an individual from other features, which are then used to predict an outcome. In the case of AIs that analyse thousands of hours of unstructured data like video footage, meaningful inspection is not impossible (summary statistics and samples of screenshots from significant events could be analysed), but it’s at best challenging and partial.

Even greater complications exist at the Function stage. Calls to make the code of an AI open often won’t achieve meaningful transparency or ‘explainability’ at Levels 2 and 3. Even if viewable, the code may be essentially uncheckable if it’s highly complex; where the model continuously changes based on live data; or where the use of neural networks (a common form of machine learning) means there is no single ‘point of decision making’ to view. To understand this, consider the famous case of DeepMind’s AlphaGo AI. It beat the world’s best Go player not by being trained on the best strategies (rule-based AI), but by playing millions of games against itself (learning-based AI). In this scenario, no human coder can sensibly explain how the AI is reasoning, only what outcome - winning the game of Go - it’s optimised to achieve.

Finally, at the Outcome stage, the recommendations for explanation and accountability for the outputs of an AI are highly problematic if no one is able to understand the inner workings of the AI at the Function stage. How can a decision be explained, and how can a public sector worker know when to override an AI’s assessment with their own professional judgement, if they cannot even in principle know how the AI reasoned?
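The proxy effect mentioned above - a model recovering a sensitive attribute it was never given - can be illustrated with a toy sketch. All of the data here is synthetic and the scenario hypothetical: the sensitive "group" column is withheld from the model, but a correlated feature (postcode) lets its predictions track the sensitive group anyway.

```python
import random

random.seed(0)

def make_person():
    group = random.choice(["A", "B"])
    # 90% of group A live in postcode 1; 90% of group B in postcode 2.
    postcode = 1 if (group == "A") == (random.random() < 0.9) else 2
    # Historic outcomes are biased: group B was refused more often.
    refused = random.random() < (0.2 if group == "A" else 0.6)
    return group, postcode, refused

people = [make_person() for _ in range(5000)]

# "Train" the simplest possible model: historic refusal rate per postcode,
# using only the postcode feature (the sensitive attribute is withheld).
def refusal_rate(pc):
    rows = [refused for g, p, refused in people if p == pc]
    return sum(rows) / len(rows)

model = {pc: refusal_rate(pc) for pc in (1, 2)}

# Evaluate: the model's predictions still differ sharply by sensitive group,
# because postcode acts as a proxy for it.
def avg_prediction(group):
    preds = [model[p] for g, p, refused in people if g == group]
    return sum(preds) / len(preds)

print("avg predicted refusal, group A:", round(avg_prediction("A"), 2))
print("avg predicted refusal, group B:", round(avg_prediction("B"), 2))
```

The point of the sketch is that removing the sensitive column from the training data, as many codes recommend, does not by itself remove the bias.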
In the tables that follow, for each of the most common recommendations at the three stages:

✔ indicates it’s straightforward / possible
✘ indicates it’s extremely hard / impossible
~ indicates it’s only possible in some circumstances

Viability of ethical recommendations for the 3 levels of AI - Creation

Viability of ethical recommendations for the 3 levels of AI - Function

Viability of ethical recommendations for the 3 levels of AI - Outcome

Different readers will, no doubt, challenge my assessments in the tables above. In part, it may be that we all have different examples in mind when we try to envisage each level. However, the key point I’m trying to make is that it’s extremely hard to conceive of a single code of AI ethics that could genuinely cover all instances of complexity of AI, unless it's so high-level as to offer little practical guidance.

Where next?

Where does this leave public sector organisations that want to ensure their approach to AI is ethical? Let’s start by pointing out three reasons not to worry unduly.

First, many public sector applications of AI are much closer to Level 1 than Level 3. The more severe challenges noted above are therefore likely to be relatively rare, at least in the short term.

Second, we should not assume that every error in the outcome of an AI will cause serious injustice. AIs can be deployed for harmless activities, such as playing Go (the only harm inflicted presumably being that to the self-esteem of its human opponents). In the public sector, equivalent low-risk activities might be AIs that personalise a chatbot for routine enquiries, or optimise waste collection routes. Some commentators, like Kai-Fu Lee, might also add that human decision making is prone to flaws and biases of its own, and that most uses of AI are likely to represent an improvement.

Third, we should remind ourselves that AI does not exist in a legal or ethical vacuum. There are established principles for the fair and responsible use of data within UK and EU legislation. Though not yet tested in practice, GDPR includes a right to explanation for any process in which a human is not involved. Countries also guarantee certain rights to citizens, and have regulatory requirements and restrictions on the actions of the public sector. We should assume that public sector organisations will at least set out with the firm intention that their use of AI will be compatible with these existing rules.

However, providing nothing beyond these existing provisions feels dangerous - and not just to the critics. Those of us who see AI’s positive potential to help the public sector better serve citizens need to be proactive in ensuring the public can have real trust in how it’s used. It would only take a few scare stories to risk setting back beneficial applications by years.
So here’s my suggestion: let’s ditch the codes and choose the harder but potentially more effective path of educating, guiding and then trusting in the professionalism of public sector staff.

Part of the rationale for this approach is the three points made above. As we’ve established, the contexts in which an AI might be used, and the risks associated with doing so, vary considerably from case to case. As a result, those who are intimately involved in the particulars of each scenario are best placed to use their professional judgement as to the level of care and attention required. In a climate where trust has to be continually earned, the reputational penalty for getting it wrong in an area of any significant consequence will be severe. One would hope that would be incentive enough.

Beyond that, it’s often argued that the future of employment will entail humans learning to work alongside technology to augment their capabilities. We will not achieve that in the public sector by infantilising staff and delegating the thinking about the difficult nuances of using inherently complex technology to someone else. Indeed, there is a serious downside if we don’t ask our public sector staff to take real ownership of how they use AI. As Batya Friedman and Peter H. Kahn Jr argued as long ago as 1992, when “human users are placed largely in mechanical roles, either mentally or physically,” and “have little understanding of the larger purpose or meaning of their actions... human dignity is eroded and individuals may consider themselves to be largely unaccountable for the consequences of their computer use”.

So how do we equip staff with the ability to use AI with good judgement? Education and training concerning the nuances above will be important. I also propose that staff should be supported to ask the right questions about what the use of AI means in their given context.
It’s for these reasons that in 2018 I proposed the 10 questions outlined below. The point is not that there is a set of ‘right’ answers; we have already established that context, risk level and complexity make that all but impossible. Rather, I think it would be unacceptable for a public sector organisation to deploy an AI in a live environment without first having an answer to these questions (and being able to provide those answers when challenged):

For those questions that we have seen are difficult to handle in more complex instances of AI (such as those about the assumptions and training data), public sector staff need to make a judgement call on whether the lack of a detailed answer is reason enough not to proceed. They might, for example, decide that a particular domain of use is sufficiently sensitive that they are not willing to use an AI unless they can fully account for and explain its operation, from training data to outcome. My point is that setting this as a hard rule for all contexts (as many codes imply) would - at least for now - prohibit the public sector from benefiting from some of the more advanced, Level 3, forms of AI. Better, surely, to rely on professional judgement, which might follow a heuristic along the lines of the graph below.

For simple, low-risk contexts, like waste collection, it might not matter if we don’t understand how decisions are being made. But for high-risk areas, like child protection, complex and dynamic algorithms aren’t appropriate, because we need a level of explainability that they cannot provide. I would rather that call were made by a public sector professional than by a distant policymaker.
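The heuristic above can be sketched as a few lines of Python. The risk labels, the mapping to the three complexity levels, and the thresholds are all illustrative assumptions of mine, not a proposed standard; a real judgement would weigh far more than two variables.

```python
# Hypothetical first-pass heuristic: the riskier the context, the lower the
# tolerable AI complexity level (using the Level 1-3 scale from this piece).
MAX_LEVEL_FOR_RISK = {"low": 3, "medium": 2, "high": 1}

def deployment_ok(risk: str, ai_level: int) -> bool:
    """A professional's first-pass check - never a substitute for judgement."""
    return ai_level <= MAX_LEVEL_FOR_RISK[risk]

print(deployment_ok("low", 3))   # True: e.g. dynamic waste-route optimisation
print(deployment_ok("high", 3))  # False: e.g. child protection with an opaque model
print(deployment_ok("high", 1))  # True: a simple, auditable weighted model
```

Even a toy rule like this makes the trade-off explicit, which is exactly what I would want a professional to be able to articulate when challenged.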

New tech, new responsibilities