Today, a teaspoon of spit and a hundred bucks is all you need to get a snapshot of your DNA. But getting the full picture—all 3 billion base pairs of your genome—requires a much more laborious process. One that, even with the aid of sophisticated statistics, scientists still struggle over. It’s exactly the kind of problem that makes sense to outsource to artificial intelligence.

On Monday, Google released a tool called DeepVariant that uses deep learning—the machine learning technique that now dominates AI—to identify all the mutations that an individual inherits from their parents.1 Modeled loosely on the networks of neurons in the human brain, these massive mathematical models have learned how to do things like identify faces posted to your Facebook news feed, transcribe your inane requests to Siri, and even fight internet trolls. And now, engineers at Google Brain and Verily (Alphabet’s life sciences spin-off) have taught one to take raw sequencing data and line up the billions of As, Ts, Cs, and Gs that make you you.

And oh yeah, it’s more accurate than all the existing methods out there. Last year, DeepVariant took first prize in an FDA contest promoting improvements in genetic sequencing. The open source version the Google Brain/Verily team introduced to the world Monday reduced the error rates even further, by more than 50 percent. Looks like grandmaster Ke Jie isn’t the only one getting bested by Google’s AI neural networks this year.

DeepVariant arrives at a time when healthcare providers, pharma firms, and medical diagnostic manufacturers are all racing to capture as much genomic information as they can. To meet the need, Google rivals like IBM and Microsoft are all moving into the healthcare AI space, with speculation about whether Apple and Amazon will follow suit. While DeepVariant’s code comes at no cost, that isn’t true of the computing power required to run it. Scientists say that expense is going to prevent it from becoming the standard anytime soon, especially for large-scale projects.

But DeepVariant is just the front end of a much wider deployment; genomics is about to go deep learning. And once you go deep learning, you don’t go back.

It’s been nearly two decades since high-throughput sequencing escaped the labs and went commercial. Today, you can get your whole genome for just $1,000 (quite a steal compared to the $1.5 million it cost to sequence James Watson’s in 2008).

But today’s machines still produce only incomplete, patchy, and glitch-riddled genomes. Errors can be introduced at each step of the process, and that makes it difficult for scientists to distinguish the natural mutations that make you you from random artifacts, especially in repetitive sections of a genome.

See, most modern sequencing technologies work by taking a sample of your DNA, chopping it up into millions of short snippets, and then using fluorescently tagged nucleotides to produce reads: the list of As, Ts, Cs, and Gs that corresponds to each snippet. Then those millions of reads have to be grouped into abutting sequences and aligned with a reference genome. From there they can go on to variant calling, which means identifying where an individual’s genes differ from the reference.1 A number of software programs exist to help do that. FreeBayes, VarDict, Samtools, and the most widely used, GATK, depend on sophisticated statistical approaches to spot mutations and filter out errors. Each tool has strengths and weaknesses, and scientists often wind up having to use them in conjunction.
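
To make the idea of variant calling a little more concrete, here is a deliberately simplified sketch in Python. It is not how GATK, FreeBayes, or DeepVariant work; it is a toy majority-vote caller. The function name `call_variants`, the `min_depth` and `min_fraction` thresholds, and the ten-letter reference are all made up for illustration. It assumes the reads have already been aligned, so each one arrives with its starting position on the reference.

```python
from collections import Counter, defaultdict


def call_variants(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Toy majority-vote variant caller (illustrative only).

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based position on the reference at which the read begins.
    """
    # Build a "pileup": for every reference position, count the bases
    # that the overlapping reads report there.
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1

    variants = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        base, support = counts.most_common(1)[0]
        # Call a variant only when enough reads cover the position and
        # nearly all of them agree on a base that differs from the reference.
        if (depth >= min_depth
                and base != reference[pos]
                and support / depth >= min_fraction):
            variants.append((pos, reference[pos], base))
    return variants


# Hypothetical data: a real A-to-T change at position 4, plus a single
# sequencing error (an A instead of a T) at position 7 in the last read.
reference = "ACGTACGTAC"
reads = [
    (0, "ACGTTC"),
    (2, "GTTCGT"),
    (3, "TTCGTA"),
    (4, "TCGTAC"),
    (5, "CGAAC"),
]

print(call_variants(reference, reads))  # [(4, 'A', 'T')]
```

The lone error at position 7 gets outvoted by the three reads that agree with the reference, while the change at position 4 shows up in every read that covers it. Real genomes are far messier: people carry two copies of most genes, coverage is uneven, and errors cluster in repetitive regions, which is why production tools lean on statistical models rather than a simple vote.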