The problem with the universe is that it's big. Not just large in terms of physical scale compared to the domain that humans occupy, but large in terms of the sheer numbers of atoms, molecules, asteroids, planets, stars, and galaxies. This means that if you want to look for a few special things out there in the cosmos you have to spend an awful lot of time and effort just sifting through everything else.

There is perhaps nowhere that this challenge is more starkly evident than in the quest to see whether or not there are other technological species out there in the universe. The numbers are staggering. Our galaxy, the Milky Way, harbors, by some estimates, as many as 400 billion stars. We now suspect that most will host planetary systems. If any of those places sustain complex, communicative life, then there are an absurd number of channels by which they might - either deliberately or inadvertently - betray their presence to the rest of the galaxy.

Radio transmissions might exist across thousands, if not millions, of discrete frequencies. Or in other electromagnetic radiation at visible, infrared, ultraviolet, even gamma-ray frequencies - all capable of being pulsed, modulated, polarized, and varied across time according to unknown strategies or accidents. There could be physical structures obscuring otherwise natural starlight, or glowing with thermal energy. Or deliberate neutrino emissions, gravitational waves, engineered molecular vibrational emissions, and isotopically encoded messages. The list is enormous.

Right now we've barely scratched the surface. Even quantifying our shortcomings is quite challenging. In 2018, a paper by Wright, Kanodia, and Lubar applied a multi-dimensional measure to radio searches and concluded that, in colloquial terms, we've thus far "looked" at the equivalent of a hot tub's worth of water out of all of Earth's oceans.
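To get a feel for just how small that hot tub is, here's a back-of-envelope version of the comparison. The volumes are illustrative assumptions, not numbers from the paper: Earth's oceans hold roughly 1.335 billion cubic kilometers of water, and a typical hot tub holds roughly 1.5 cubic meters.

```python
# Back-of-envelope version of the hot-tub-versus-oceans comparison.
# Assumed volumes (illustrative, not taken from the Wright et al. paper):
# Earth's oceans hold roughly 1.335 billion km^3; a hot tub roughly 1.5 m^3.
OCEAN_VOLUME_M3 = 1.335e9 * 1e9   # 1 km^3 = 1e9 m^3
HOT_TUB_VOLUME_M3 = 1.5

fraction_searched = HOT_TUB_VOLUME_M3 / OCEAN_VOLUME_M3
print(f"Fraction of the 'ocean' searched so far: {fraction_searched:.1e}")
# -> roughly 1.1e-18, or about a billionth of a billionth
```

In other words, by this measure our surveys to date cover about one part in 10^18 of the relevant search space.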

But the game is afoot to improve on this. Recently, the Breakthrough Listen project released a new dataset of some 2 petabytes of radio telescope data (from the Green Bank and Parkes observatories). That data consists of a series of measurements around the plane of the Milky Way (including 20 known exoplanetary systems that might witness Earth transiting the Sun), the galactic center, and the interstellar comet 2I/Borisov.

Yet, as impressive as this raw data is, it too represents just a droplet of what's needed to begin to truly constrain the odds of technologically visible life in our neighborhood. And that raises the question of how we are going to be able to crunch through SETI data as it gets bigger and bigger in scope.
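A quick arithmetic sketch shows why "crunching through" even today's data is nontrivial. The read rate here is an assumed figure for a single fast machine, chosen purely for illustration:

```python
# Illustrative only: how long would it take just to stream through a
# ~2 petabyte dataset at an assumed sustained read rate of 1 GB/s
# (a generous figure for a single machine)?
DATASET_BYTES = 2e15          # ~2 petabytes
READ_RATE_BYTES_PER_S = 1e9   # assumed: 1 gigabyte per second

seconds = DATASET_BYTES / READ_RATE_BYTES_PER_S
days = seconds / 86_400       # 86,400 seconds per day
print(f"~{days:.0f} days of continuous reading")
# -> ~23 days, and that's before any actual analysis happens
```

And that is merely reading the bytes once; any serious search has to run far more expensive computations over them, likely many times.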

One of the key issues is that we don't want to unduly bias the ways in which we scrutinize the data. Perhaps somewhere there's a beautifully clean "blip" that is irrefutably artificial and extraterrestrial in origin. But perhaps not. How do we look for what we don't know? Many researchers have thought hard about this problem. A particularly hopeful line of attack may be to exploit the extraordinary advances in machine learning over the past decade or so. Deep learning systems in particular (many-layered artificial neural networks) excel at sensing complex and subtle correlations and patterns in data.

Approaches like Generative Adversarial Networks and convolutional neural nets have already been investigated as ways to sift through SETI data - looking for out-of-the-ordinary structures without undue preconceptions. Once a system is trained it can churn through data quite efficiently. But there's the rub. Training can be a huge computational burden, especially when the "rulesets" are themselves being discovered by the deep-learning system - when we take humans as far out of the loop as possible.
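The core idea behind these anomaly-hunting approaches can be illustrated with a toy stand-in: instead of a deep neural network, the sketch below uses simple PCA to learn what "ordinary" background noise looks like, then flags any spectrum the model reconstructs poorly. Real SETI pipelines use far richer models, but the principle - score by deviation from a learned background, with no preconceived signal template - is the same. All the data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "background": 500 noise-only spectra, 64 frequency channels each.
background = rng.normal(0.0, 1.0, size=(500, 64))

# Learn the background: center the data, keep the top principal directions.
mean = background.mean(axis=0)
_, _, vt = np.linalg.svd(background - mean, full_matrices=False)
components = vt[:10]  # top 10 principal components of the background

def anomaly_score(spectrum):
    """Reconstruction error: large when a spectrum doesn't resemble background."""
    centered = spectrum - mean
    reconstruction = centered @ components.T @ components
    return float(np.linalg.norm(centered - reconstruction))

# An ordinary noise spectrum versus one with an injected narrowband "signal".
noise = rng.normal(0.0, 1.0, size=64)
signal = noise.copy()
signal[32] += 25.0  # a strong spike in a single frequency channel

print("noise score: ", anomaly_score(noise))
print("signal score:", anomaly_score(signal))
```

The spectrum with the injected spike scores far higher than plain noise, even though nothing in the detector was told what a "signal" should look like. That template-free quality is exactly what makes these methods appealing for open-ended searches.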

It may be that the real difficulty of the SETI challenge isn't just the scale of the datasets (and acquiring those in the first place) but also the open-endedness of the questions. Once we allow for the fact that we may not know how signals are encoded (whether in radio or optical light, or otherwise) the game quickly becomes very, very expensive. For example, Transformer-based deep-learning systems, pioneered at Google, can be run on thousands of processors to train hundreds of millions of parameters. But the energy costs are staggering, with training runs clocking up equivalent carbon footprints of hundreds of thousands of pounds of carbon dioxide.
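To put that carbon figure in more familiar units: a widely quoted estimate (from Strubell and colleagues, 2019) for training a large Transformer with neural architecture search is roughly 626,000 pounds of CO2-equivalent, which converts to a few hundred metric tons:

```python
# Unit conversion for the carbon-footprint figure cited above.
# The ~626,000 lb CO2e number is the widely quoted Strubell et al. (2019)
# estimate for training a large Transformer with architecture search.
CO2_LBS = 626_000
KG_PER_LB = 0.4536

co2_tonnes = CO2_LBS * KG_PER_LB / 1000
print(f"~{co2_tonnes:.0f} metric tons of CO2-equivalent")
# -> ~284 metric tons
```

That is for a single ambitious training campaign; an open-ended, many-model search strategy would multiply it.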

What's the answer? It could lie with quantum computing. As much as there is excruciating hype (and misunderstanding) around quantum computing, it does hold enormous promise for some of the tasks that current deep-learning systems rely on (for instance, very large matrix manipulations). It's possible that we'll eventually manage to use quantum computers to help train our machines, and then unleash them on huge SETI datasets, looking for signals in ways that we can't even imagine.

Of course, it might be that we get lucky before needing all of that. But if we don't, there is a certain poetry to the idea that our best tool for discovering other intelligences could stem from the deepest and most bewildering rules of nature.