By using data culled from genealogy websites, computational biologist Yaniv Erlich has put together some of the largest family trees ever seen, including a single pedigree comprising 13 million individuals — some of whom date back 500 years.


Erlich recently presented his work at the American Society of Human Genetics annual meeting in Boston. It could provide a new tool for analyzing the extent to which genes influence certain traits, like personality, longevity, and facial features. He has also made his database available to other researchers, with all the names removed to protect privacy. Indeed, other work by Erlich has considered the end of genetic anonymity.


The database, called FamiLinx, is a crowd-sourced genealogy resource that contains pedigrees, demographic data, and simple phenotypic information. Websites used include MyHeritage and Geni.com. As yet, FamiLinx does not contain any DNA information, but the hope is that this will soon change.

Heidi Ledford from Nature News explains more:

Pedigrees provide clues about genetic inheritance. For instance, by comparing an individual to their more distant relatives on the family tree, the change in frequency of a given trait, such as fertility, can indicate to what extent the trait has its roots in genetics. It can also provide clues as to whether the trait is controlled by a few genes that have large effects, or by many genes that each make smaller contributions. But it takes years to assemble genealogical data for even just a few thousand individuals, said Erlich during a presentation at the meeting on 24 October. In the past, researchers have painstakingly gathered such data from church records and individual volunteers. Erlich and his team decided to streamline the process by collecting data from more than 43 million public profiles on the genealogy website geni.com. The profiles typically included birth and death dates, as well as locations and, occasionally, photos uploaded by the users. The team assembled the data into family trees that ranged from a few thousand individuals up to 13 million people in size. Erlich says that pedigrees previously available for genetic studies contained hundreds of thousands of family members at best.
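To give a rough sense of what "assembling the data into family trees" can involve, here is a minimal Python sketch. It is not Erlich's actual pipeline, which the article does not describe. It assumes, purely for illustration, that each public profile can be reduced to (child, parent) links, and it groups linked profiles into pedigrees with a standard union-find structure. The function name `pedigree_sizes` and the sample names are hypothetical.

```python
from collections import Counter


def pedigree_sizes(parent_links):
    """Group profiles into pedigrees and return the pedigree sizes, largest first.

    parent_links: iterable of (child_id, parent_id) pairs. This input format
    is an assumption made for illustration, not the real Geni.com schema.
    """
    root_of = {}  # union-find forest: profile id -> pointer toward its root

    def find(x):
        root_of.setdefault(x, x)
        # Walk up to the root, shortening the path along the way.
        while root_of[x] != x:
            root_of[x] = root_of[root_of[x]]
            x = root_of[x]
        return x

    for child, par in parent_links:
        root_child, root_parent = find(child), find(par)
        if root_child != root_parent:
            # Merge the two groups: these profiles belong to one pedigree.
            root_of[root_child] = root_parent

    # Count how many profiles end up under each root.
    sizes = Counter(find(x) for x in list(root_of))
    return sorted(sizes.values(), reverse=True)


links = [("ann", "bob"), ("ann", "cat"), ("dan", "bob"), ("eve", "fay")]
print(pedigree_sizes(links))  # [4, 2]
```

In this toy input, four profiles are connected through shared parents and form one pedigree, while the remaining two form another. Real crowd-sourced data would also need duplicate profiles merged and impossible links (such as a parent born after a child) filtered out before a step like this.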

It's not immediately clear how useful this data will be, or what kinds of experiments it will support, especially considering potential inaccuracies in the crowd-sourced reporting of family information. But as Ledford notes in her article, genealogical analysis will likely play a big part in genetic studies in the future, particularly as people become more willing to contribute data and medical records.

Top image: PSV/Shutterstock.