It was hard work, not because the data didn’t exist, but because it was scattered. To date, scientists have probably sequenced at least 5,000 full genomes and some 500,000 exomes, but most are completely inaccessible to other researchers. There might be intellectual-property restrictions, or issues around consent. There’s the logistical hassle of shipping huge volumes of data on hard drives. And some scientists are just plain competitive.

Fortunately, MacArthur’s colleagues at the Broad Institute and beyond had deciphered so many exomes that he could gather thousands of sequences by personally popping into offices. Buoyed by that success, he started contacting people who were studying the genomes of people with cancer, heart disease, diabetes, schizophrenia, and more. “There’s a big swath of human genetics where people have learned that you either fail by yourself or succeed together, so they’re committed to sharing data,” MacArthur says.

By 2014, he had amassed more than 90,000 exomes from around a dozen sources, collectively called the Exome Aggregation Consortium (ExAC). Then he had to munge them together.

That was the worst bit. Researchers use very different technologies to sequence and annotate genomes, so combining disparate data sets is like mushing together the dishes from separate restaurants and hoping that the results will be palatable. Often, they won’t be.

Monkol Lek, a postdoc in MacArthur’s lab who himself has a genetic muscle disease, solved this problem by essentially starting from scratch. He took the raw data from 60,706 patients and analyzed their exomes, one position at a time. The raw sequences took up a petabyte of storage, and the final compressed file filled a three-terabyte hard disk.

The prize from all this data-wrangling was one of the most thorough portraits of human genetic variation ever produced. MacArthur went through the main results in the opening talk of this week’s Genome Science 2015 conference, in Birmingham, U.K. His team had identified around 10 million genetic variants scattered throughout the exome, most of which had never been described before. And most turned up just once in the data, meaning that they lurk within just one in every 60,000 people. “Human variation is dominated by these extremely rare variants,” says MacArthur. That’s where the secrets of many rare genetic disorders reside.

But unexpectedly, the most interesting variants turned out to be the ones that weren’t there.

The graduate student Kaitlin Samocha developed a mathematical model to predict how many variants you’d expect to find in a given gene, in a population of 60,000 people. The model was remarkably accurate at estimating neutral variants, which don’t change the protein that’s encoded by the gene, and so have minimal impact. But the model often wildly overestimated the number of “loss-of-function variants,” which severely disrupt the gene in question. Repeatedly, the ExAC data revealed far fewer of these variants than Samocha’s model predicted.