There’s just one problem, say Jemma Geoghegan and Edward Holmes, two virologists based in Sydney. It won’t work.

In a new paper, Geoghegan and Holmes argue that these projects aren’t going to help preempt pandemics, for the simple reason that there are just too many viruses. About 4,400 have been identified; millions more have not, and only a tiny fraction of these could conceivably jump into humans. “The GVP will be great for understanding more about viruses and their evolution, but I don’t see how it’ll help us work out what’s going to infect us,” says Geoghegan. “We’re only just coming to terms with the vastness of the virosphere.”

There are ways of narrowing down the culprit list. Many teams have tried to map geographical hotspots from which diseases are most likely to emerge, pinpointing areas with tropical forests and lots of mammal species. Others have tried to find features in viruses that make it easier for them to spread between people. But having tried this approach themselves, Geoghegan and Holmes argue it’s not very useful.

Partly, that’s because the results of such studies are too broad to narrow down the list of suspicious viruses in a helpful way. Partly, it’s because such work is based on past epidemics—events that are relatively rare, and so difficult to draw reliable patterns from. For example, Saudi Arabia comes out as mostly cold in maps of disease hotspots, and yet it’s where MERS virus recently jumped into humans from an unlikely host: camels. “We’re trying to predict really, really rare events from not much information, which I think is going to fail,” Geoghegan says.

Ultimately, the odds that a given virus will cause an outbreak depend on the virus itself, the animals that host it, the people who stand to contract it, and the environment that all of them live in. “Within each of these categories, there are so many variables that could influence disease emergence,” says Jennifer Gardy, from the University of British Columbia. “It’s hard enough to model the effect of any one, and these factors likely interact in ways that we can’t possibly understand just by looking at each of them discretely.”

It’s even difficult to work out whether the viruses we already know about are going to cause outbreaks. Ebola and Zika, for example, were discovered in 1976 and 1947 respectively, but both managed to catch the world unawares this decade. “This is the easiest kind of prediction to make,” says Kristian Andersen, from the Scripps Research Institute, and yet we’re still about 10 to 20 years from doing it well. Next up in difficulty: predicting whether a virus like H7N9 bird flu, which can infect humans but isn’t known to cause major outbreaks, will eventually do so. Again, Andersen says that this isn’t feasible now, but should be with more research.

But predicting whether a newly discovered animal virus could jump into humans and cause a pandemic “is simply impossible,” he says. “What you’re trying to predict is likely something that happens maybe once out of tens of billions of encounters, with one virus out of millions of potential viruses. You will lose your fight against the numbers.” Even machine learning—using computers to divine patterns in data that humans might miss—won’t solve the problem because there isn’t enough good data for the computers to sift through.