Managing Automated Systems in Times of Crisis

Automated decision-making systems are increasingly pervasive in every aspect of our lives. Algorithms have taken center stage in flying our planes, handling our money, and monitoring our health. But crisis situations pose special challenges for the operators of such systems. It can take time to recognize when a system has moved past the boundaries of its ability, that is, to determine whether the system is facing a situation so different from anything in its training data that it cannot interpret it. The challenge for the operator is judging when and how to intervene to prevent major damage. As the operator of an automated, machine-learning-based systematic investment system, I kept a diary of my thinking and actions as the COVID-19 crisis unfolded. I used those notes to write this case study, which highlights the general challenges that designers and operators of AI systems face in times of crisis.

Introduction

Something didn’t seem right in the financial markets on Monday morning, February 24, 2020. I wondered whether there was a decimal error in the P&L number that the machine sends out periodically. The number was correct. I scoured the news. Financial markets finally appeared to be acknowledging the significance of COVID-19. But why today, a full seven weeks after China first made public the spread of the deadly virus, and almost a month after it started appearing in headlines globally?

I had been discussing the implications of the virus in my Systematic Investing class at NYU Stern since late January. Most of my students favored strategies such as shorting the airline sector in anticipation of plummeting demand, and shorting equity markets in general due to potential disruptions in global supply chains. But the market had ground steadily higher, hitting a high on February 19. The S&P500 chart below shows what the machine was seeing from December 2019 through Friday, February 21, 2020. Smooth sailing and new highs.

Over the weekend, South Korea had gone on high alert in response to the spread of the virus, and a story reported a jump in coronavirus cases in Italy. But the news had largely focused on the election, in particular on which of the Democratic candidates was the best challenger to Trump. Trump was center stage and Sanders was rising, as the word cloud shows.

If financial markets were shrugging off the virus, was I being paranoid? Was this just another tremor, like so many others? Should I step in and take control back from the automated trading system that had been running on autopilot reliably since 2009?

Should I continue to trust the robot or should I trust my gut?

When do we trust machines?

My research argues that we trust machines with decision making when their rates of error and the consequences of those errors, particularly in the worst case, are acceptable. As error rates increase, we require that the consequences of errors be less costly.

Figure 1 shows a “trust heatmap” that I developed several years ago to evaluate these trade-offs. The dark green zone of high predictability and low error costs allows higher trust in automation. Problems with higher error costs belong in the red zone, where humans make the final decisions.

The heatmap explains why my colleagues and I can automate financial trading decisions despite the low predictability of financial markets, as long as the costs of error are low. In contrast, most of us are still hesitant to fully trust driverless cars, even though they are rarely wrong, because of unacceptably high error costs. Most domains, such as healthcare, lending, fraud prediction, etc., lie somewhere in between these two extremes in terms of predictability and error cost.

The placement of problems on the heatmap is dynamic; more or better data and algorithms can shift problems towards the right in the heatmap. Regulation and changes in norms can move them up or down.

The “automation frontier” denotes the area below which automation becomes compelling.
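The heatmap logic described above can be sketched in code. This is an illustrative toy, not the author's actual model: the 0-to-1 scales for predictability and error cost, the zone names, and the linear frontier are all assumptions made for the example.

```python
# Illustrative sketch of the trust heatmap. The axes (0-1 predictability,
# 0-1 normalized error cost), the linear "automation frontier", and all
# thresholds are hypothetical assumptions, not the author's actual model.

def trust_zone(predictability: float, error_cost: float) -> str:
    """Classify a decision problem on the trust heatmap.

    The automation frontier is modeled as a line: automation becomes
    compelling when the error cost stays below a ceiling that rises
    with predictability.
    """
    frontier = 0.2 + 0.6 * predictability  # hypothetical frontier
    if error_cost < frontier - 0.2:
        return "green"   # automate: errors are cheap enough
    if error_cost < frontier:
        return "yellow"  # automate with human oversight
    return "red"         # humans make the final decision

# Systematic trading: low predictability, but low per-decision error cost
print(trust_zone(predictability=0.2, error_cost=0.1))   # green
# Driverless cars: rarely wrong, but unacceptably high error cost
print(trust_zone(predictability=0.9, error_cost=0.95))  # red
```

Under this toy frontier, "more or better data" raises predictability and can move a problem from yellow to green, while a crisis that spikes the cost of error pushes it into red, matching the dynamics the text describes.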

Crisis environments create considerable uncertainty about the cost of errors. An airline pilot (or algorithm) must make the right decision in the heat of the moment, mid-flight. The same applies to a healthcare professional (or decision-making system) when a patient is in critical condition and worsening, or to an investment professional (or algorithm) confronting a new “unknown unknown.” In a crisis, the spike in the cost of error shifts the problem upward into the red zone. Moreover, because the environment is changing quickly and is typically unfamiliar, the likelihood of errors also increases, pushing these decisions further toward the red zone.

What should a decision maker who relies on automated decision tools during normal times do in a crisis to avoid bad outcomes?

In the late 90s, I created one of the earliest machine-learning-based investment systems. That system, updated and improved, has been operating on “autopilot” since 2009, in the aftermath of the previous financial crisis. Autopilot means two things. First, the machine retrains its predictive model periodically, according to a well-defined algorithm and well-defined monitoring criteria, as more data become available. Second, the machine updates its investment positions regularly by applying its models to the most recent data. As the operators, my staff and I keep an eye on things.
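The two autopilot duties, periodic retraining and regular position updates, can be sketched as a simple loop. Everything here is a hypothetical stand-in for the fund's actual system: the "model" is just a trailing mean return, and the retraining cadence is a fixed interval rather than the monitoring criteria the text describes.

```python
# Toy sketch of the two autopilot duties: periodic retraining and regular
# position updates. The model (a trailing mean return) and the fixed
# retraining interval are illustrative assumptions, not the fund's code.

def fit_model(history):
    """'Retrain': here, simply the trailing mean return as a signal."""
    return sum(history) / len(history)

def target_position(signal):
    """Map the model's signal to a position: long, short, or flat."""
    if signal > 0:
        return 1    # long
    if signal < 0:
        return -1   # short
    return 0        # flat

def run_autopilot(returns, retrain_every=20):
    model = 0.0       # no signal until the first retrain
    positions = []
    for day, _ in enumerate(returns):
        if day > 0 and day % retrain_every == 0:
            model = fit_model(returns[:day])      # periodic retraining
        positions.append(target_position(model))  # regular position update
    return positions
```

The point of the sketch is structural: the machine runs unattended, and nothing in the loop asks whether the incoming data still resembles anything the model was trained on. That gap is what the operator watches for.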

Intelligent machines operate on autopilot in many parts of our lives, often driven by large volumes of data. During normal times, they are great boons. But during times of crisis, shocks induce substantial uncertainty and fear about the ability of automated systems to perform, and for good reason. The root of this fear is a fundamental concern that when confronted by a truly novel situation, the machine will do something stupid, dangerous, or costly (or all three).

This is not unique to financial trading. The two crashes of Boeing’s 737 MAX provided a prototypical case of automation failure combined with the absence of a well-known procedure that the pilots could invoke to deal with that type of failure. In the heat of the moment, the crews were unable to figure out how to disable the automation and retake control. Equally tragically, the machine lacked the intelligence to recognize that it needed to cede control to human experts who were capable of flying the aircraft manually in challenging situations. The result was a catastrophic failure of the human-machine interface.

COVID-19 is an entirely new crisis, whose health and economic consequences we are still trying to understand. But how did it look as it unfolded in real time?

As part of my trading practice, I kept a diary as the COVID-19 pandemic played out. It is a transcript of my experiences as I updated my beliefs with new information about the crisis, keeping a tally of the machine alone versus the human plus the machine.

The COVID-19 Crisis for Financial Markets: The Human and Machine Views

By the end of the day on February 24, the same day I was trying to understand whether something was wrong, the S&P500 had fallen almost 3.5%. It dropped another 3% on Tuesday and gyrated wildly the following day. Strangely, the market’s “fear index,” the VIX, ended the day at 25, which implies that the market expected the daily standard deviation of returns to be roughly 1.5% over the next month. Clearly, the drop of over 3% had not alarmed the market.
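The back-of-envelope conversion behind that 1.5% figure is worth making explicit. The VIX quotes expected annualized volatility in percentage points, so dividing by the square root of roughly 252 trading days gives an implied daily standard deviation of returns:

```python
# Convert a VIX level (annualized volatility, in %) to the implied daily
# standard deviation of returns, assuming ~252 trading days per year.

import math

def implied_daily_sigma(vix: float, trading_days: int = 252) -> float:
    """Implied daily return standard deviation, in percent."""
    return vix / math.sqrt(trading_days)

print(round(implied_daily_sigma(25), 2))  # ~1.57: "roughly 1.5%" per day
print(round(implied_daily_sigma(85), 2))  # ~5.35: daily swings of +/- 5%
```

The same arithmetic applies to the record VIX reading of 85 later in March, which translates to one-standard-deviation daily swings of about 5%.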

Three days into that week, the chart painted a bemusing picture. Despite the plunge, the VIX had barely risen to 27. Some short-term momentum algorithmic trading programs seemed to have started to short the equity market, a sign of low confidence and fear, but those with a longer view were agnostic or still long. Some algorithms saw the plunge as a buying opportunity.

While our trading machine had experienced big red bars like those in the chart before, to me the emerging news signaled that more uncertainty was on the way, not less. On that assumption, I concluded that the machine (and the market) were likely underestimating future risk. I “took back control” by cutting risk substantially across the portfolio in general, and in equities in particular.

As March began, the market continued to gyrate wildly as participants tried to reconcile two opposing forces. On the one hand, the economics weighed heavily, resulting in massive selloffs. On the other, the “step function” policy responses by federal agencies and President Trump, both determined to prop up the stock market by force of will and at any cost, produced record-breaking rallies. By March 8, however, the coronavirus was finally center stage.

Markets rollercoastered for the remainder of March, depressed by the health and economic toll and buoyed by monetary and fiscal shots in the arm. On March 18, the VIX hit a record 85, which can be interpreted as the market expecting one-standard-deviation return swings of +/- 5% each day for the next month. In the world’s most liquid financial markets, things don’t get more uncertain than that.

It is hard to overstate the magnitude of the change in the markets or how quickly it came upon market participants. Over a period of just 23 trading sessions, the S&P500, the benchmark measure of US equity markets, fell through the bottom of most charts, dropping a breathtaking 35% by March 18 from its peak on February 19. The market had indeed grossly underestimated future risk, relative to my own “common-sense” human reasoning which had made me uncomfortable on autopilot back in February.

It was not until March 31 that I was once again comfortable enough to return the system to autopilot, judging that the first-order impacts of the shock, the details of the government and Fed stimuli, and an expectation of further government support were by then incorporated into prices.

The chart below shows the relative performance of the machine and the human (augmenting the machine). It is based on my diary from February 24 to March 31, 2020, when I handed full control back to the machine.

The orange bars show what the fund actually did during this period, while the blue bars show what the machine would have done if left alone on autopilot. The actual magnitudes are somewhat beside the point. What is striking is how much more even the returns were with intervention.

The fact that the intervention turned out to be profitable is also beside the point: the big blue losing bars (below the axis) could easily have been big positive ones, in which case the machine’s strategy would have come out ahead. Rather, the primary objective of the operator is to take the right amount of risk under the circumstances, regardless of how the future unfolds.

Lessons and Challenges

The phenomenon of “automation bias” is prevalent among operators of highly automated systems. Automation bias is a form of complacency that can border on learned helplessness. After all, some operators reason, if the system has functioned correctly in the past, why shouldn’t it in the future? Is there really cause for alarm?

Because of this inherent bias and the infrequent occurrence of failure, it takes time for the operator to realize that he or she should assume control, and even longer to evaluate alternative courses of action.

And timing is essential. Act too slowly, and the costs could be catastrophic. Act too quickly, and the costs could also be very high. In settings in which a machine has little context, its decision making is severely disadvantaged. For the operator, who may have only slightly more context than the machine, the decision about whether and when to override must be driven by estimates of the costs of false positives (jumping in too soon or too often) and the costs of false negatives (waiting too long and possibly doing nothing).
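The override decision above has the shape of an expected-cost comparison. The sketch below makes that explicit; the probability and cost numbers are illustrative assumptions, not calibrated values, and real operators rarely have clean estimates of any of them.

```python
# Hedged sketch of the override trade-off: intervene when the expected
# cost of waiting (a false negative) exceeds the expected cost of acting
# (a false positive). All inputs are illustrative assumptions.

def should_intervene(p_crisis: float,
                     cost_false_negative: float,
                     cost_false_positive: float) -> bool:
    """Intervene when the expected cost of inaction exceeds that of action.

    p_crisis: the operator's belief that this tremor is a real crisis.
    cost_false_negative: loss if it is a crisis and the operator waits.
    cost_false_positive: cost of jumping in when it was only a tremor.
    """
    expected_cost_wait = p_crisis * cost_false_negative
    expected_cost_act = (1 - p_crisis) * cost_false_positive
    return expected_cost_wait > expected_cost_act

# When a false negative is catastrophic, even a modest belief that a
# crisis is underway justifies taking back control.
print(should_intervene(0.2, cost_false_negative=100, cost_false_positive=5))
```

The asymmetry is the point: when the cost of waiting dwarfs the cost of a premature override, the operator's bar for acting should be low, which is exactly the posture described for "unknown unknowns" below.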

The heatmap can provide some context for such unnerving intervention decisions.

For problems with high predictability and in which the operator is highly skilled, the machine could be designed to set a relatively low bar for what it identifies as exceptional, and to engage the operator, for example, to confirm that an aircraft’s flight sensor reading is correct. It would not be unreasonable for a pilot to engage with the machine and even assume control during such a diagnostic procedure. Indeed, such periodic engagement could have the accompanying benefit of mitigating the automation bias that sets in where error rates are low. Most critically, the interface can be designed to allow the operator to take control seamlessly. In retrospect, this seems to have been a failure of the Boeing 737 MAX interface, which made it difficult for the experts to retake control even after it became obvious that the automation was failing.

The same reasoning applies to autonomous vehicles. At this point in the evolution of self-driving vehicle technology, it is not unreasonable to expect autopilot systems to set a low bar for exceptions and for the interface to enable a seamless handover to a human driver. Errors such as the Uber crash in Arizona that killed a pedestrian might be avoidable in the future if human operators have sufficient time to take over. The costs of false positives should be low in situations involving an autonomous vehicle that pulls over to the side of the road and asks the human driver to take control when its level of operational risk exceeds a relatively low threshold.

For settings characterized by low predictability, such as investing, the decision about whether and when the operator should take control is far less clear cut. For this reason, these decisions can be extremely difficult, as I’ve tried to illustrate in this short case study. It can be exceedingly difficult for the machine to discriminate between the tremors that are the market equivalent of head fakes and those that will develop into full-blown crises.

These machines therefore need to set a relatively high bar for alerting the human in order to minimize the number of false alarms. In contrast, the operator’s bar should be low for acting in the presence of “unknown unknowns” where the machine may quickly become untrustworthy. This turns out to be the single hardest job for the operator of an automated trading platform.

The difficulty of predicting the implications of “discontinuities” such as COVID-19, and the burden they place on the operator of an automated decision-making system, cannot be overstated. Even though a handful of scientists and government officials issued dire warnings as early as January, it was impossible to predict the trajectory of the virus and its economic impact. Equally if not more difficult was predicting the nature of the step function interventions of leaders and central banks to avert the impending crisis, and the impact of such interventions, which are the current “unknown unknowns.” If there is one lesson for the operators of automated systems, it is this: as difficult as the decision might be about whether and how to intervene, they cannot afford to be complacent in assuming that all tremors are the same. You can never be too vigilant.