No human, or team of humans, could possibly keep up with the avalanche of information produced by many of today’s physics and astronomy experiments. Some of them record terabytes of data every day—and the torrent is only increasing. The Square Kilometer Array, a radio telescope slated to switch on in the mid-2020s, will generate about as much data traffic each year as the entire internet.

The deluge has many scientists turning to artificial intelligence for help. With minimal human input, AI systems such as artificial neural networks—computer-simulated networks of neurons that mimic the function of brains—can plow through mountains of data, highlighting anomalies and detecting patterns that humans could never have spotted.

Of course, the use of computers to aid in scientific research goes back about 75 years, and the method of manually poring over data in search of meaningful patterns originated millennia earlier. But some scientists are arguing that the latest techniques in machine learning and AI represent a fundamentally new way of doing science. One such approach, known as generative modeling, can help identify the most plausible theory among competing explanations for observational data, based solely on the data, and, importantly, without any preprogrammed knowledge of what physical processes might be at work in the system under study. Proponents of generative modeling see it as novel enough to be considered a potential “third way” of learning about the universe.

Traditionally, we’ve learned about nature through observation. Think of Johannes Kepler poring over Tycho Brahe’s tables of planetary positions and trying to discern the underlying pattern. (He eventually deduced that planets move in elliptical orbits.) Science has also advanced through simulation. An astronomer might model the movement of the Milky Way and its neighboring galaxy, Andromeda, and predict that they’ll collide in a few billion years. Both observation and simulation help scientists generate hypotheses that can then be tested with further observations. Generative modeling differs from both of these approaches.

“It’s basically a third approach, between observation and simulation,” says Kevin Schawinski, an astrophysicist and one of generative modeling’s most enthusiastic proponents, who worked until recently at the Swiss Federal Institute of Technology in Zurich (ETH Zurich). “It’s a different way to attack a problem.”

Some scientists see generative modeling and other new techniques simply as power tools for doing traditional science. But most agree that AI is having an enormous impact, and that its role in science will only grow. Brian Nord, an astrophysicist at Fermi National Accelerator Laboratory who uses artificial neural networks to study the cosmos, is among those who fear there’s nothing a human scientist does that will be impossible to automate. “It’s a bit of a chilling thought,” he says.

Discovery by Generation

Ever since graduate school, Schawinski has been making a name for himself in data-driven science. While working on his doctorate, he faced the task of classifying thousands of galaxies based on their appearance. Because no readily available software existed for the job, he decided to crowdsource it—and so the Galaxy Zoo citizen science project was born. Beginning in 2007, ordinary computer users helped astronomers by logging their best guesses as to which galaxy belonged in which category, with majority rule typically leading to correct classifications. The project was a success, but, as Schawinski notes, AI has made it obsolete: “Today, a talented scientist with a background in machine learning and access to cloud computing could do the whole thing in an afternoon.”
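The majority-rule step can be illustrated with a short Python sketch. It assumes each volunteer casts one vote per galaxy and that the label with the most votes wins. The function name `majority_label`, the galaxy IDs, and the category labels are hypothetical, and this is not presented as Galaxy Zoo’s actual aggregation procedure, which may have weighted or filtered votes differently.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label among volunteer votes and its vote share.

    Ties go to whichever tied label was seen first, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)


# Hypothetical volunteer votes for three galaxy images (illustrative only).
votes_by_galaxy = {
    "galaxy_001": ["spiral", "spiral", "elliptical", "spiral", "merger"],
    "galaxy_002": ["elliptical", "elliptical", "elliptical", "spiral"],
    "galaxy_003": ["merger", "spiral", "merger"],
}

for galaxy_id, votes in votes_by_galaxy.items():
    label, share = majority_label(votes)
    print(f"{galaxy_id}: {label} ({share:.0%} of votes)")
```

The tie-breaking rule here is simply a side effect of how `Counter.most_common` orders equal counts; a real classification project would likely want a more deliberate rule, or would flag close votes for expert review.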

Schawinski turned to the powerful new tool of generative modeling in 2016. Essentially, generative modeling asks how likely it is, given condition X, that you’ll observe outcome Y. The approach has proved incredibly potent and versatile. As an example, suppose you feed a generative model a set of images of human faces, with each face labeled with the person’s age. As the computer program combs through these “training data,” it begins to draw a connection between older faces and an increased likelihood of wrinkles. Eventually it can “age” any face that it’s given—that is, it can predict what physical changes a given face of any age is likely to undergo.
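To make the “given condition X, how likely is outcome Y” idea concrete, here is a deliberately tiny Python sketch. It is not a neural network: it estimates P(wrinkles | age group) by counting labeled examples, then “generates” synthetic outcomes for a chosen age by sampling from that estimate. The training data are invented for illustration, and the names `fit_conditional` and `sample` are hypothetical. The image-based generative models described in this article learn far richer distributions over whole pictures, but the conditional-probability logic is similar in spirit.

```python
import random
from collections import defaultdict


def fit_conditional(pairs, smoothing=1.0):
    """Estimate P(Y=True | X=x) from (x, y) training pairs by counting.

    `smoothing` adds pseudo-counts (Laplace smoothing) so a condition seen
    only a few times does not get a probability of exactly 0 or 1.
    """
    hits = defaultdict(float)
    totals = defaultdict(float)
    for x, y in pairs:
        totals[x] += 1
        if y:
            hits[x] += 1
    return {x: (hits[x] + smoothing) / (totals[x] + 2 * smoothing) for x in totals}


def sample(model, x, n, rng):
    """Generate n synthetic True/False outcomes for condition x."""
    p = model[x]
    return [rng.random() < p for _ in range(n)]


# Hypothetical, made-up training data: (age group, does the face show wrinkles?)
training = (
    [("20s", False)] * 18 + [("20s", True)] * 2
    + [("50s", False)] * 8 + [("50s", True)] * 12
    + [("80s", False)] * 1 + [("80s", True)] * 19
)

model = fit_conditional(training)
for age in ("20s", "50s", "80s"):
    print(age, round(model[age], 2))

# Draw five synthetic outcomes for an 80-something face, seeded for repeatability.
rng = random.Random(0)
print(sample(model, "80s", 5, rng))
```

As in the face-aging example, the model never sees a rule like “older means more wrinkles”; the rising probability with age emerges from the labeled training data alone.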