America is not the great melting pot that poets like Ralph Waldo Emerson once extolled. At least, that’s not the story that DNA tells, according to the genealogy company Ancestry. Using more than 770,000 spit samples taken from its customers over the last five years, its researchers mapped how people moved and married in post-colonial America. And their choices, especially the ones that kept communities apart, shaped today’s genetic landscape.

The study, published today in Nature Communications, combines a DNA database with family tree information collected over the company’s 34-year history. “We’re all living under the assumption that we are individual agents,” says Catherine Ball, chief scientific officer at Ancestry and the leader of the study. “But people actually are living in the course of history.” And from the moment they spit, send, and consent, DNA kit customers become actors in a much larger story, one told through the massive data sets that companies like Ancestry are accumulating from casual genealogists.

Ball’s team of geneticists and statisticians started by pulling out subsets of closely related people from their 770,000 spit samples. In that analysis, each person appears as a dot, while their genetic relationships to everyone else in the database are sticks. The result, Ball says, “looks like a giant hairball.”

From that hairball, her team pulled out more than 60 unique genetic communities: Germans in Iowa, Mennonites in Kansas, and Irish Catholics on the Eastern Seaboard. Then they mined their way through generations of family trees (also provided by their customers) to build a migratory map. Finally, they paired up with a Harvard historian to understand why communities moved and dispersed the ways they did. Religion and race were powerful deterrents to gene flow. But nothing, it turned out, was stronger than the Mason-Dixon Line.

“I have to admit I was surprised by that,” says Ball. “This political boundary had the same effect as what you’d expect from a huge desert or mountain range.”

Besides being a cool way to see how your ancestors’ dating pools forged today’s nation, Ancestry’s study has real applications for medical research. A lymphoma study pulling subjects from Minneapolis shouldn’t expect to see the same results as one that recruits in Miami, for example. Populations in different parts of the country have very different genetic makeups, and those differences could be incredibly valuable to a company building personalized cancer treatments, immunotherapy drugs, and other gene-targeted therapeutics.

Ancestry jumped in late to the genetic data game (it launched DNA analysis kits in 2012, five years after its chief competitor, 23andMe), but strong sales in the last few quarters have sent it sprinting ahead. Today, Ancestry is valued at $2.6 billion and has one of the largest biobanks in the world, with genetic data from 3 million people. For every saliva sample swimming with DNA, the company analyzes more than 700,000 genetic locations known as SNPs, or single-nucleotide polymorphisms. 23andMe, by comparison, is valued at $1 billion, has a database of more than 1.2 million individuals, and reports on roughly 650,000 SNPs per spit tube. But for research purposes, 23andMe says it uses a process called imputation to analyze 15 million variants per person.

“Research purposes” are a lucrative business for 23andMe. When its customers give permission, the company anonymizes their data and hands it over to more than a dozen pharma companies and academic institutions. One of those partnerships, a deal with Genentech to look at the genes of people with Parkinson’s disease, netted 23andMe $10 million.

Ancestry, on the other hand, makes the bulk of its revenue from its genealogical subscribers. It currently has only one research initiative in place, with Calico, the Alphabet spin-off focused on longevity science. The idea is that Calico will mine Ancestry’s customer data to figure out why some people live longer than others, then use that genetic information to develop life-extending therapeutics. But with studies like the one Ancestry published today, 2017 could bring more collaborators sniffing around its powerful database.

That should raise some questions for consumers, says Arthur Daemmrich, a healthcare historian at the Smithsonian Institution’s Lemelson Center for the Study of Invention and Innovation. “These companies can’t tell you today who they’re going to license your data to and for what purpose,” he says. “They’re just trying to be the holder of the data. But if they put samples on ice and keep them frozen forever, does consent cover that?”

Scientists and lawyers haven't agreed on the rules yet. But imagine for a moment what your yet-unborn great-grandchildren might learn from your DNA, preserved by a company like Ancestry. Maybe their robot doctors will use complex algorithms to identify the genes that will one day set their cells wild with cancer—and design a cure. Or maybe they'll just see that you married a fellow genealogy geek and lived happily ever after in California.