Recognition of the powerful pattern-matching ability of humans is growing. As a result, humans are increasingly being deployed to make decisions that affect the well-being of other humans. We are starting to see the use of human decision makers in courts, in university admissions offices, in loan application departments, and in recruitment. Soon humans will be the primary gateway to many core services.

The use of humans undoubtedly comes with benefits relative to the data-derived algorithms that we have used in the past. The human ability to spot anomalies that are missed by our rigid algorithms is unparalleled. A human decision maker also allows us to hold someone directly accountable for the decisions.

However, the replacement of algorithms with a powerful technology in the form of the human brain is not without risks. Before humans become the standard way in which we make decisions, we need to consider the risks and ensure that the implementation of human decision-making systems does not cause widespread harm. To this end, we need to develop principles for the application of human intelligence to decision making.
Below I suggest four draft principles that we should apply to the use of human decision makers and describe how humans often fall short of meeting them. These principles are not complete, but they are designed to open a conversation about how human decision making can be deployed to greatest benefit. We need to do this before humans become the default method of making decisions.

Avoid creating bias

Humans are biased decision makers, in more senses than one.

First, humans predictably and routinely deviate from many of the established rules of probability and logic. Humans have been shown to assign higher probabilities to specific events than to the larger set of events of which those events are a part. Humans often neglect the base rate of an event, focusing instead on the idiosyncratic features of the particular case in front of them when determining its probability.
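The first of these deviations is easy to make concrete. A minimal sketch of the probability rule being violated, with invented numbers (the "bank teller" framing is borrowed from the classic conjunction-fallacy demonstrations):

```python
# A toy check of the rule humans violate: for any events A and B,
# P(A and B) can never exceed P(A). All probabilities are invented
# for illustration only.

p_teller = 0.05                  # P(A): the subject is a bank teller
p_feminist_given_teller = 0.30   # P(B | A): an assumed conditional

# The conjunction ("teller AND feminist") is a subset of "teller",
# so its probability is bounded above by P(teller):
p_teller_and_feminist = p_teller * p_feminist_given_teller

assert p_teller_and_feminist <= p_teller
print(p_teller, round(p_teller_and_feminist, 3))  # 0.05 0.015
```

People asked to rank such statements routinely rate the conjunction as more probable than the single event, which no assignment of probabilities can support.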

Undoubtedly, some of what we call biases come from a misunderstanding of the objectives of the decision maker. “Biased” decision rules may also be more likely to deliver good outcomes in an uncertain environment. But we now have ample evidence that people in some of our most important decision-making environments are systematically erring. For instance, judges and loan application officers exhibit the gambler’s fallacy. Doctors can be poor Bayesians when information is presented in unintuitive ways.
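The Bayesian failure is worth spelling out. A minimal sketch of the base-rate arithmetic behind a positive diagnostic test, with all numbers assumed for illustration (1 percent prevalence, 90 percent sensitivity, 9 percent false-positive rate):

```python
# Base-rate arithmetic for a positive test result. Without the prior
# (prevalence), intuition tends to land near the 90% sensitivity figure;
# Bayes' rule gives an answer an order of magnitude lower.

prevalence = 0.01    # P(disease): assumed base rate
sensitivity = 0.90   # P(positive | disease)
false_pos = 0.09     # P(positive | no disease)

# Total probability of a positive result, then Bayes' rule:
p_positive = sensitivity * prevalence + false_pos * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(round(p_disease_given_positive, 3))  # 0.092 -- not ~0.9
```

A positive result here means roughly a 9 percent chance of disease, yet studies presenting the same numbers as conditional probabilities find many clinicians answering close to 90 percent.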

Second, humans demonstrate considerable bias toward outgroups. They often output different decisions based on the sex, age, or race of the subject of the decision, despite those factors not being relevant to the decision.

While algorithms have certainly been demonstrated to be biased in some circumstances, due to both the data from which they were developed and the biases of the analyst developing them, the bias of these algorithms has typically been less severe. Further, the ability to systematically audit algorithms and implement improvements has steadily reduced their level of bias.
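A hedged sketch of what such a systematic audit can look like: compare approval rates across groups in a decision log. The data and the four-fifths threshold below are illustrative assumptions, not a claim about any particular system.

```python
# Minimal disparity audit over a (made-up) decision log: compute each
# group's approval rate and flag if the worst/best ratio falls below
# the common "four-fifths" rule of thumb.
from collections import defaultdict

decisions = [  # (group, approved) -- invented audit log
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", True),
]

counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
for group, approved in decisions:
    counts[group][0] += int(approved)
    counts[group][1] += 1

rates = {g: ok / total for g, (ok, total) in counts.items()}
ratio = min(rates.values()) / max(rates.values())
print(rates, round(ratio, 2))  # {'A': 0.75, 'B': 0.5} 0.67 -> flagged
```

The point is not this particular check but that it can be run on every decision the algorithm makes, repeatedly and cheaply; no equivalent log exists for the human mind.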

Various techniques have been developed to reduce human bias. Unfortunately, these techniques have shown limited success at scale and may even backfire. Until these human debiasing techniques match the effectiveness of our regular auditing, review, and modification of algorithms, we should not implement these human decision systems.

Transparency and interpretability

Human minds are black boxes. While humans create the impression of transparency through the verbal and written explanations that they offer, there is strong evidence that these explanations cannot be trusted to provide the true basis for the decision.

One piece of evidence comes from the split-brain research of Michael Gazzaniga. Some patients with severe epilepsy have the corpus callosum that joins the two hemispheres of the brain surgically severed, effectively resulting in two independent hemispheres.

Gazzaniga showed images to split-brain patients. For patient P.S., he placed a chicken claw in the right visual field, which projects to the left hemisphere, and a snowy scene in the left visual field, projecting to the right hemisphere. He then presented an array of images and asked P.S. to point to the one that matched what he had seen. P.S.’s right hand, controlled by the left hemisphere, pointed to a chicken. The left hand, controlled by the right hemisphere, pointed to a shovel. Why did P.S. point to the shovel? Here P.S.’s left hemisphere, where language capabilities typically sit, took over: “Oh, that’s simple. The chicken claw goes with the chicken, and you need a shovel to clean out the chicken shed.”

The mind is a great improviser. Or to put it more bluntly, in the absence of knowledge, the left hemisphere simply made the reason up.
The lack of transparency of the human mind is also apparent from broader experiments on how we reason. Typically, intuitions come first, reasons later. For instance, when hypnotised to feel a flash of disgust when reading arbitrary words, subjects later made up absurd reasons to justify judgments they had made on the basis of the implanted words. Our gut feelings are integral to how we decide, yet the sources of these feelings are not observable or reliably reported.

Transparency is one of the most difficult principles to solve. We can never have access to the full training data that the human has been exposed to during their development. Absent this training data, we lack an understanding of the patterns that a particular human decision maker is likely to spot. Even if we can observe the full neural network that comprises the human brain, we have no present ability to extract reasons from this observation.

Conversely, the algorithms we have chosen to implement in the past are easier to understand. While some classes of algorithms, such as our deep neural nets, present difficulties in their interpretation, we have started to see the development of interpretative technologies. But more importantly, the simple algorithms we do tend to use still give us good results while maintaining interpretability and transparency.

Consistency

Humans are noisy decision makers. Two different humans confronted with the same decision will often come to a different conclusion. The same human confronted with a decision on different occasions will also often decide inconsistently.

As examples of the size of this variation, software programmers differed by a median of 71 percent in their estimates of the time to complete the same project. Pathologists assessing biopsy results had a correlation of only 0.63 with their own judgment of severity when shown the same case twice (giving the same answer each time would result in a correlation of 1).

We are starting to use human decision makers in many domains where decisions should be consistent. Gross differences in outcome are based on little more than the luck of the draw. And this burden does not fall equally: those who are more sophisticated are able to exploit this inconsistency.

Conversely, provided we correctly measure and code the inputs, algorithms provide the same decision every time. Those subject to the decision are not going to be at the whim of how much sleep the decision maker had the night before, the order of their application in the queue, or the time of day (factors which the black box human mind does not include in the explanation of why they made the decision).

Standards of scientific excellence

Human decision makers often fall short of the primary purpose for which they are implemented: they are typically outperformed by algorithms in decision-making quality. Today, as companies ride the zeitgeist and appoint their first Chief Human Officers, this underperformance is often forgotten or ignored, with a few classic stories of human success overshadowing the more mundane lack of performance. In fact, it is difficult to find domains where human decision makers are clearly the superior option.
Stories of hybrid decision-making, in which humans work with algorithms, are common. They are often provided as a reason why we should include humans in the loop. Yet despite the stories about successful human-algorithm teams, the typical case results in degraded performance relative to the algorithm alone. To develop successful teams involving humans, we need much more work on how to get the humans to effectively work with the algorithms and avoid interposing their judgment too often.

Shaping the future of human decision makers

These principles are only the start of the discussion we need to have about the use of human decision makers. Before we deploy, we need to accept the basic evidence of the harm they cause in many situations, and their low accuracy, transparency, and consistency relative to algorithms.

We also need to continue building our evidence base. We should be systematically reviewing the quality of human decisions where they are made, measuring performance, and comparing those measures against our algorithmic benchmarks. Humans may be a powerful technology with great potential. But until we have developed human decision-making systems that comply with some basic principles, we risk substantial harm.