In the fight against the coronavirus pandemic, governments have not leveraged the advances in machine learning and AI, specifically the technology of personalized prediction, that have been such a feature of the private sector. Doing so would make management of a future pandemic far more effective, and could even help make the process of coming out of Covid-19-related social confinement safer and less expensive. For that to happen, governments need to revisit their current approaches to data privacy.


Over the past few months the world has experienced a series of Covid-19 outbreaks that have generally followed the same pathway: an initial phase with few infections and limited response, followed by a take-off of the famous epidemic curve accompanied by a country-wide lockdown to flatten the curve. Then, once the curve peaks, governments have to address what President Trump has called “the biggest decision” of his life: when and how to manage de-confinement.

Throughout the pandemic, great emphasis has been placed on the sharing (or lack of it) of critical information across countries — in particular from China — about the spread of the disease. By contrast, relatively little has been said about how Covid-19 could have been better managed by leveraging the advanced data technologies that have transformed businesses over the past 20 years. In this article we discuss one way that governments could leverage those technologies in managing a future pandemic — and perhaps even the closing phases of the current one.

The Power of Personalized Prediction

Policy makers could add another approach to their mix for battling Covid-19, one based on the technology of personalized prediction, which has transformed many industries over the last 20 years. Using machine learning and artificial intelligence (AI) technology, data-driven firms (from “Big Tech” to financial services, travel, insurance, retail, and media) make personalized recommendations for what to buy, and personalize pricing, risk assessment, credit decisions, and the like using the data that they have amassed about their customers.

In a recent HBR article, for example, Ming Zeng, Alibaba’s former chief strategy officer, described how Ant Financial, his company’s small business lending operation, can assess loan applicants in real time by analyzing their transaction and communications data on Alibaba’s e-commerce platforms. Meanwhile, companies like Netflix evaluate consumers’ past choices and characteristics to make predictions about what they’ll watch next.

The same approach could work for pandemics — and even the future of Covid-19. Using multiple sources of data, machine-learning models would be trained to measure an individual’s clinical risk of suffering severe outcomes (if infected with Covid): what is the probability they will need intensive care, for which there are limited resources? How likely is it that they will die? The data could include individuals’ basic medical histories (for Covid-19, the severity of the symptoms seems to increase with age and with the presence of co-morbidities such as diabetes or hypertension) as well as other data, such as household composition. For example, a young, healthy individual (who might otherwise be classified as “low risk”) could be classified as “high risk” if he or she lives with old or infirm people who would likely need intensive care should they get infected.
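As a minimal sketch of the household example above, one simple rule is to give every member of a household the highest individual score found in that household. The `Person` record, its field names, and the scores are hypothetical placeholders, and taking the maximum is our simplifying assumption rather than an established method.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    household_id: str
    individual_risk: float  # hypothetical model output in [0, 1]


def household_adjusted_risk(people):
    """Map each name to the highest individual risk found in that person's household."""
    worst = {}
    for p in people:
        worst[p.household_id] = max(worst.get(p.household_id, 0.0), p.individual_risk)
    return {p.name: worst[p.household_id] for p in people}


# Illustrative, made-up scores: a healthy student who lives with a grandparent.
people = [
    Person("student", "h1", 0.02),
    Person("grandparent", "h1", 0.40),
    Person("flatmate", "h2", 0.03),
]
adjusted = household_adjusted_risk(people)
```

Under this rule the student inherits the grandparent's score, while the flatmate in a separate household keeps a low one.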

These clinical risk predictions could then be used to customize policies and resource allocation at the individual/household level, appropriately accounting for standard medical liabilities and risks. They could, for instance, enable us to target social distancing and protection for those with high clinical risk scores, while allowing those with low scores to live more or less normally. The criteria for assigning individuals to high or low risk groups would, of course, need to be determined, also considering available resources, medical liability risks, and other risk trade-offs, but the data science approaches for this are standard and used in numerous applications.
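One way such a criterion might be tied to available resources is sketched below. It assumes each score is the probability of needing intensive care if infected, takes the worst case in which everyone in the low-risk group is infected, and picks the highest cutoff that keeps expected demand within capacity. The function names, scores, and capacity figure are illustrative assumptions; a real criterion would also have to weigh the liability and trade-off issues mentioned above.

```python
from itertools import groupby


def pick_threshold(scores, icu_capacity):
    """Return the highest cutoff whose low-risk group keeps expected ICU demand
    within capacity, or None if nobody can be placed in the low-risk group."""
    expected = 0.0
    threshold = None
    # Walk through distinct score values so tied scores are admitted together.
    for value, group in groupby(sorted(scores)):
        group_demand = value * len(list(group))
        if expected + group_demand > icu_capacity:
            break
        expected += group_demand
        threshold = value
    return threshold


def classify(scores, threshold):
    """Label each score 'low' if it is at or below the cutoff, else 'high'."""
    return ["low" if threshold is not None and s <= threshold else "high" for s in scores]


# Made-up scores for five people and a toy capacity of 0.1 expected ICU cases.
scores = [0.01, 0.02, 0.02, 0.30, 0.60]
threshold = pick_threshold(scores, icu_capacity=0.1)
groups = classify(scores, threshold)
```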

A personalized approach has multiple benefits. It may help build herd immunity with lower mortality — and fast. It would also allow better — and fairer — resource allocation, for example of scarce medical equipment (such as test kits, protective masks, and hospital beds) or other resources.

De-confinement strategies at later stages of a pandemic, a next key step for Covid-19 in most countries, can benefit in a similar way. Deciding which people should start the de-confinement process is, by nature, a classification problem similar to those familiar to most data-driven firms. Some governments are already approaching de-confinement by using age as a proxy for risk, a relatively crude classification that potentially misses other high-risk individuals (such as the above example of healthy young people living with the elderly).

Performing classification based on data and AI prediction models could lead to de-confinement decisions that are safe at the community level and far less costly for the individual and the economy. We know that a key feature of Covid-19 is that it has an exceptionally high transmission rate but a relatively low rate of severe symptoms or mortality. Data indicate that possibly more than 90% of infected people are either asymptomatic or experience mild symptoms when infected.

In theory, with a reliable prediction of who these 90% are, we could de-confine all these individuals. Even if they were to infect each other, they would not have severe symptoms and would not overwhelm the medical system or die. These 90% low clinical risk de-confined people would also help the rapid build-up of herd immunity, at which point the remaining 10% could also be de-confined.

If a prediction score were to prove wrong, the consequences would be limited to the “safest” individuals who were first released from confinement. They could be managed with available medical resources, which would not be overtaxed by treating the remaining 10% or more high-risk people who remained confined. In practice, of course, we would introduce de-confinement more gradually, starting from the lowest clinical risk groups first and building up herd immunity over time.
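The gradual approach can be sketched as sorting people by predicted risk and releasing them in waves, lowest risk first. The wave size, the names, and the scores below are hypothetical; in practice the timing of each wave would depend on how the earlier ones went.

```python
def release_waves(risk_by_person, wave_size):
    """Split people into de-confinement waves, lowest predicted risk first."""
    ordered = sorted(risk_by_person, key=risk_by_person.get)
    return [ordered[i:i + wave_size] for i in range(0, len(ordered), wave_size)]


# Made-up scores for five people, released two at a time.
risk = {"a": 0.30, "b": 0.01, "c": 0.05, "d": 0.70, "e": 0.02}
waves = release_waves(risk, wave_size=2)
```

Here the first wave contains the two lowest-risk people and the highest-risk person is released last.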

Of course we do not have perfect clinical risk prediction models, much like we do not have perfect hospital triage systems or credit-default prediction models. However, this does not stop the provision of credit to many businesses and individuals who, with good enough credit scoring tools, mostly do not default. To be sure, the stakes in this case are significantly higher than a loan default, so we need to make the models as robust as we can. But that does not mean we should not consider using them.

Unlike medical tests, which are scarce, expensive, and slow to deploy, this clinical data-driven digital personalization approach can be applied quickly and is easy to scale. It could enable, with the right models, safer de-confinement at a much faster rate than current test-track-isolate best practices for Covid-19, under which anyone infected and their contacts would remain in confinement, even if they are at low risk of suffering serious symptoms.

Getting the Data

At present, the data required for assessing an individual’s clinical risk from contracting a given virus are not easily accessed. Governments can certainly ramp up national health data gathering by creating or rolling out more comprehensive electronic medical records, but the value of these may be limited as it would take time for patterns to emerge between the historical data in medical records and the impact of a virus on its victims.

In the context of a pandemic that could rapidly affect millions on a global basis, a better approach might be to create and share a prediction model that is “trained” using the data from an initial outbreak. A dataset with tens of thousands of seriously affected individuals (those requiring an ICU), balanced with many more who are relatively less affected (those exhibiting mild symptoms), is large enough to enable some level of personalized prediction, the quality of which improves as more data is added.
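To make “training” concrete, here is a toy sketch that fits a logistic regression by plain gradient descent on ten synthetic records. The two features (age divided by 100 and a comorbidity flag), the labels, and the training settings are all invented for illustration; a real model would be trained on far more data, with proper validation, and quite possibly with a different technique.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Predicted probability of a severe outcome; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def train_logistic(rows, labels, epochs=2000, lr=0.1):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


# Synthetic records: (age / 100, has_comorbidity); label 1 = needed intensive care.
rows = [(0.2, 0), (0.3, 0), (0.4, 0), (0.5, 1), (0.3, 1),
        (0.7, 0), (0.8, 1), (0.9, 1), (0.6, 0), (0.7, 1)]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]

weights = train_logistic(rows, labels)
older_with_comorbidity = predict(weights, (0.85, 1))
younger_without = predict(weights, (0.25, 0))
```

On this toy data the fitted model assigns the older patient with a comorbidity a higher predicted risk than the younger patient without one.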

Once a model is up and running it can be shared to help other cities and even countries in the early stages of the spread, because the basic underlying biological and physiological data in people’s medical records do not vary much (everyone grows old, and diabetes in Wuhan is the same as diabetes in Baltimore). If a virus strikes two countries whose populations resemble each other, the outcomes are likely to be similar. Given this, the two countries could use the exact same prediction model without having to share the actual medical records that went into training the model. Of course data patterns across countries may vary due to, say, demographics (Japan has more old people than Mexico) and cultural or lifestyle differences (Italian grandparents may be more involved in child care than German ones), but data analysts can rework the model to accommodate these variations if the data were collected according to a commonly developed standard or protocol.
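A minimal sketch of what sharing a model rather than records could look like: the first country exports only feature names and fitted weights, and the second scores its own residents locally. The JSON layout, the function names, and the weight values are placeholders we made up for illustration, not a proposed standard or real coefficients.

```python
import json
import math


def export_model(weights, feature_names):
    """Serialize a logistic model (bias first) without any patient records."""
    return json.dumps({"features": feature_names, "weights": weights})


def load_and_score(payload, record):
    """Score one local record using a model received from elsewhere."""
    model = json.loads(payload)
    bias, *coefs = model["weights"]
    z = bias + sum(c * record[name] for c, name in zip(coefs, model["features"]))
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder weights standing in for a model trained during an initial outbreak.
payload = export_model([-9.0, 10.0, 3.0], ["age_scaled", "has_comorbidity"])

# The receiving country applies the model to its own data, which never leaves it.
score = load_and_score(payload, {"age_scaled": 0.8, "has_comorbidity": 1})
```

The payload carries a handful of numbers and field names, which is also why a common data standard matters: the receiving side must record `age_scaled` and `has_comorbidity` in the same way the model expects.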

Consider how this could have played out for Covid-19: When the coronavirus emerged in Wuhan, data was initially non-existent, making model-based personalization infeasible. At this point, the lockdown approach made sense: Shut down the cities, implement total social distancing, and monitor closely, making no major exceptions. This obviously helped contain the disease, but it also created the opportunity for the Chinese government to collect all available training data for clinical risk prediction models that it could then have shared with other countries, which could in turn have added their own training data to improve the model further.

The Challenge of Privacy

Implementing these technological innovations, however, will require policy changes. Existing policies covering data privacy and cybersecurity, and their differing interpretations across countries, will largely prohibit the kind of personalized pandemic management approach we are advocating.

This is largely because current policies do not differentiate between the input data (used to train a model), the prediction models themselves, and the “output data” (predictions from the trained model). When a policy, implicitly or explicitly, prohibits data sharing or requires data to be stored on servers within a country, it covers anything that can be legally interpreted as data, including models and their parameters. We would, therefore, urge policymakers to consider distinguishing between the sharing of models and the sharing of data.

We also encourage national governments to agree on a protocol for determining when data could be shared. For example, a declaration by the WHO or UN that a particular outbreak qualified as a pandemic could serve as a trigger to suspend normal privacy laws to allow the sharing of anonymized data. During such times, many people might be willing to exceptionally and temporarily provide their data, through appropriate and secure channels, for training models that can guide policy decisions with major life and economic consequences. If that happens, there is a great deal that modern data science and AI could do to mitigate the fallout from this pandemic and to prepare us for limiting the impact of the next.