AI has gotten something of a bad rap in recent years, but the Covid-19 pandemic illustrates how AI can do a world of good in the race to find a vaccine. AI is playing two important supporting roles in this quest: suggesting components of a vaccine by understanding viral protein structures, and helping medical researchers scour tens of thousands of relevant research papers at an unprecedented pace. Over the last few weeks, teams at the Allen Institute for AI, Google DeepMind, and elsewhere have created AI tools, data sets, and research results, and shared them freely with the global scientific community.

WIRED OPINION. About the authors: Oren Etzioni is the CEO of the nonprofit Allen Institute for AI and a professor of computer science at the University of Washington. Nicole DeCario is Senior Assistant to the CEO at the Allen Institute for AI.

Vaccines imitate an infection, causing the body to produce defensive white blood cells and antibodies. There are three main types of vaccines: whole-pathogen vaccines, like those for the flu or MMR, use killed or weakened pathogens to elicit an immune response; subunit vaccines (e.g., pertussis, shingles) use only part of the germ, such as a protein; and nucleic acid vaccines inject genetic material of the pathogen into human cells to stimulate an immune response. The last of these is the type of vaccine targeting Covid-19 that began trials this week in the United States. AI is useful in accelerating the development of subunit and nucleic acid vaccines.

Proteins are an essential part of viruses. Each protein is made up of a sequence of amino acids that determines its unique 3D shape. Understanding a protein’s structure is essential to understanding how it works. Once the shape is understood, scientists can develop drugs that work with the protein’s unique shape. But it would take longer than the age of the known universe to examine all possible shapes of a protein before finding its unique 3D structure. Enter AI.
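
The "longer than the age of the universe" claim can be checked with rough arithmetic, in the spirit of what biologists call Levinthal's paradox. The sketch below is only a back-of-the-envelope illustration: the protein length, the number of shapes per amino acid, and the sampling rate are all assumed round numbers, not measurements.

```python
# Back-of-the-envelope estimate of how long brute-force shape search would take.
# Every constant below is an illustrative assumption, not a measured value.

RESIDUES = 100                   # assumed length of a small protein, in amino acids
STATES_PER_RESIDUE = 3           # assumed number of shapes each amino acid can adopt
SAMPLES_PER_SECOND = 1e13        # assumed rate at which candidate shapes are tried
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # roughly 13.8 billion years


def years_to_enumerate(residues, states, rate):
    """Years needed to try every combination of per-residue shapes once."""
    total_shapes = states ** residues
    return total_shapes / rate / SECONDS_PER_YEAR


years = years_to_enumerate(RESIDUES, STATES_PER_RESIDUE, SAMPLES_PER_SECOND)
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"{years:.1e} years, about {ratio:.1e} times the age of the universe")
```

Even with these generous assumptions, exhaustive search is hopeless, which is why prediction systems try to infer the shape directly rather than enumerate every candidate.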

In January, Google DeepMind introduced AlphaFold, a cutting-edge system that predicts the 3D structure of a protein based on its genetic sequence. In early March, the system was put to the test on Covid-19. DeepMind released protein structure predictions of several under-studied proteins associated with SARS-CoV-2, the virus that causes Covid-19, to help the research community better understand the virus.

At the same time, researchers from The University of Texas at Austin and the National Institutes of Health used cryo-electron microscopy, a popular structural biology technique, to create the first 3D atomic-scale map of the part of the virus that attaches to and infects human cells: the spike protein. The team responsible for this critical breakthrough had spent years working on other coronaviruses, including SARS-CoV and MERS-CoV. One of the predictions released by AlphaFold accurately matched this spike structure.

Another effort, at the University of Washington’s Institute for Protein Design, also used computer models to develop 3D atomic-scale models of the SARS-CoV-2 spike protein that closely match those discovered in the UT Austin lab. The researchers are now building on this work by creating new proteins to neutralize the coronavirus. In theory, these proteins would stick to the spike protein, preventing viral particles from infecting healthy cells.

More broadly, scientific research on Covid-19 requires a Herculean effort to keep up with the results emerging from other labs. Learning about work at another lab can save months or even years of work by moving past a blind alley, avoiding reinventing the wheel, or suggesting a shortcut. Labs report their work via published articles and increasingly via preprint services like bioRxiv and medRxiv.

Several thousand papers relevant to Covid-19 have appeared in the first three months of 2020, and the scientific literature is growing rapidly. As a result, scientists struggle to find the papers relevant to their specific research, to review the breadth of recent findings, and to uncover insights. The first challenge is to collect the relevant literature and put it in a single, accessible location. In response, we at the Allen Institute for AI have partnered with several research organizations to produce the Covid-19 Open Research Dataset (CORD-19), a unique resource of over 44,000 scholarly articles about Covid-19, SARS-CoV-2, and related coronaviruses. It’s updated daily as new research is published. This freely available data set is machine-readable, so researchers can create and apply natural-language processing algorithms, and hopefully accelerate the discovery of a vaccine.
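
To give a sense of what "machine-readable" makes possible, here is a minimal sketch of a keyword search over paper metadata. It assumes the metadata arrives as a CSV file with `title` and `abstract` columns; the sample rows are invented placeholders rather than real papers, and counting matching words is only one simple way to rank results, not the method used by any particular CORD-19 tool.

```python
import csv
import io
import re
from collections import Counter

# Toy stand-in for a metadata file. The rows are invented placeholders,
# and the column names (title, abstract) are assumptions for this sketch.
SAMPLE_CSV = """title,abstract
Placeholder paper A,Structure of the spike protein and its receptor binding domain
Placeholder paper B,Hospital staffing models during an outbreak
Placeholder paper C,Spike protein antibodies and vaccine candidates targeting the spike
"""


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_papers(csv_text, query):
    """Return titles ordered by how often the query words appear in each record."""
    query_terms = set(tokenize(query))
    scored = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts = Counter(tokenize(row["title"] + " " + row["abstract"]))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, row["title"]))
    # Highest score first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


results = rank_papers(SAMPLE_CSV, "spike vaccine")
print(results)
```

Real natural-language processing systems go well beyond word counts, but the starting point is the same: the full collection can be read and scored by a program rather than skimmed by a person.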