Could deep learning help paleontologists and geneticists hunt for ghosts?

When modern humans first migrated out of Africa about 70,000 years ago, at least two related species, now extinct, were already waiting for them on the Eurasian landmass. These were the Neanderthals and Denisovans, archaic humans who interbred with those early moderns and left traces of their DNA in the genomes of people of non-African descent today.

But there have been growing hints of an even more convoluted and colorful history: A team of researchers reported in Nature last summer, for instance, that a bone fragment found in a Siberian cave belonged to the daughter of a Neanderthal mother and a Denisovan father. The finding marked the first fossil evidence of a first-generation human hybrid.

Unfortunately, such fossils are very rare. (Our knowledge of Denisovans, for instance, is based on DNA extracted from a mere finger bone.) Many other ancestral pairings could easily have occurred, including ones that involved hybrid groups from earlier crosses, yet they may have left practically no physical evidence. Clues to their occurrence may survive only in some people’s DNA, and even there, the signs may be subtler than those of Neanderthal and Denisovan genes. Statistical models have helped scientists infer the existence of a couple of these populations without fossil data: For example, according to research published in late 2013, patterns of genetic variation in ancient and modern humans point to an unknown human population having interbred with Denisovans (or their ancestors). But experts believe these methods inevitably overlook a great deal, too.

Who else contributed to today’s genomes? What did these so-called ghost populations look like, where did they live, and how often did they interact and mate with other human species?

In a paper published last month in Nature Communications, researchers showed how deep learning techniques could help fill in some of the missing pieces, including pieces that experts may not have known were missing. They used deep learning to sift out evidence of another ghost population: an unknown human ancestor in Eurasia, likely a Neanderthal-Denisovan hybrid or a relative of the Denisovan line.

The work points to the future usefulness of artificial intelligence in paleontology, not only for identifying unforeseen ghosts but also for uncovering the faint footprints of the evolutionary processes that shaped who we are.

The Search for Subtle Signatures

Current statistical methods examine four genomes at a time for shared genetic variants. It’s a test of similarity, but not necessarily of actual ancestry, because the small amounts of genetic mixture it uncovers can be interpreted in many different ways. For instance, such an analysis might suggest that a modern-day European shares certain variants with the Neanderthal genome that a modern-day African does not. But that doesn’t establish that those variants came from interbreeding between Neanderthals and the ancestors of Europeans. Those ancestors could instead have bred with a different population, one closely related to Neanderthals but not the Neanderthals themselves.


We just don’t know, because without physical evidence of when, where and how those hypothetical ancient sources of genetic variation lived, it’s difficult to say which of many possible ancestries is most probable. The technique “is powerful because of its simplicity, but it leaves a lot on the table in terms of understanding evolution,” said John Hawks, a paleoanthropologist at the University of Wisconsin-Madison.
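The article doesn't name the four-genome statistic it has in mind. One widely used example is Patterson's D, the so-called ABBA-BABA test, and the sketch below assumes that test. It uses four aligned haploid sequences with one base per site: P1 (say, a modern African), P2 (a modern European), P3 (a Neanderthal) and an outgroup such as a chimpanzee, whose base is treated as the ancestral allele. The function name `d_statistic` and the toy sequences are hypothetical. A positive D means P2 shares more derived alleles with P3 than P1 does. That is exactly the kind of similarity signal described above: on its own, it can't distinguish gene flow from P3 from gene flow from a close relative of P3.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Patterson's D for four aligned haploid sequences (toy version, one base per site).

    P1, P2 and P3 follow the usual ((P1, P2), P3) tree; the outgroup's base is
    taken as the ancestral allele.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # Keep only biallelic sites where P3 carries the derived (non-outgroup) allele.
        if len({a, b, c, o}) != 2 or c == o:
            continue
        if a == o and b == c:
            abba += 1  # P2 shares the derived allele with P3; P1 does not
        elif b == o and a == c:
            baba += 1  # P1 shares the derived allele with P3; P2 does not
    total = abba + baba
    # No informative sites means no evidence either way.
    return 0.0 if total == 0 else (abba - baba) / total


# Hypothetical toy sequences; real analyses use millions of sites.
african = "AAAGAAAA"
european = "GGGAAAAA"
neanderthal = "GGGGGGAA"
chimp = "AAAAAAAA"
print(d_statistic(african, european, neanderthal, chimp))  # 0.5 (3 ABBA vs 1 BABA sites)
```

Swapping P1 and P2 flips the sign. Real analyses work genome-wide, often with allele frequencies rather than single haploid sequences, and typically assess significance with a block jackknife.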

The new deep learning method is an attempt to do better, both by seeking to explain levels of gene flow that are too small for the usual statistical approaches to detect and by considering a far larger and more complicated range of models to explain them. Through training, the neural network can learn to classify patterns in genomic data according to the demographic histories most likely to have produced them, without being told how to make those connections.
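The article doesn't describe the network's architecture or training data. The sketch below is a loose illustration of learning to map genomic patterns to histories, and it rests on several assumptions. Training examples are simulated under two hypothetical histories, with and without gene flow. Each example is summarized only by its ABBA and BABA site counts. A single logistic unit, trained by gradient descent, stands in for a deep network. The rates, sample sizes and function names (`simulate`, `make_dataset`, `train_logistic`) are all invented for illustration. A real study would simulate whole genomes under many competing demographic models and feed far richer data to a much deeper network.

```python
import math
import random

# Hypothetical toy rates, chosen only for illustration: gene flow from an archaic
# source makes ABBA site patterns more common than BABA ones.
RATES = {"no_gene_flow": (0.04, 0.04), "gene_flow": (0.06, 0.04)}


def simulate(history, rng, n_sites=2000):
    """Toy stand-in for a genome simulator: ABBA and BABA site counts in one sample."""
    p_abba, p_baba = RATES[history]
    abba = sum(rng.random() < p_abba for _ in range(n_sites))
    baba = sum(rng.random() < p_baba for _ in range(n_sites))
    return [abba, baba]


def make_dataset(n_per_class, rng):
    """Label each simulated sample with the history that produced it (1 = gene flow)."""
    data = []
    for label, history in enumerate(["no_gene_flow", "gene_flow"]):
        data += [(simulate(history, rng), label) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


def standardize(rows, mean, std):
    """Center and scale each feature using training-set statistics."""
    return [[(x - m) / s for x, m, s in zip(r, mean, std)] for r in rows]


def train_logistic(xs, ys, lr=0.5, epochs=300):
    """Batch gradient descent on a single logistic unit (a stand-in for a deep network)."""
    w, b = [0.0] * len(xs[0]), 0.0
    n = len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            for i, xi in enumerate(x):
                gw[i] += (p - y) * xi
            gb += p - y
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    """Return 1 (gene flow) if the unit's pre-activation is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


rng = random.Random(0)
train, test = make_dataset(200, rng), make_dataset(100, rng)
xs, ys = [r for r, _ in train], [y for _, y in train]
mean = [sum(col) / len(col) for col in zip(*xs)]
std = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) for col, m in zip(zip(*xs), mean)]
w, b = train_logistic(standardize(xs, mean, std), ys)
test_x = standardize([r for r, _ in test], mean, std)
accuracy = sum(predict(w, b, x) == y for x, (_, y) in zip(test_x, test)) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The model is never told that an excess of ABBA sites signals gene flow. It infers that association from the labeled simulations, which is the property the paragraph above attributes to the neural network.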