For more than ten years, genome-wide association studies (GWAS) have demonstrated that big numbers are crucial to success in using genetics to understand human health. Only by examining genes and genetic variants in hundreds of thousands of people or more do the roles of two important forms of variation become apparent:

common gene variants (generally defined as those found in more than five percent of the population) that make small contributions to the trait or disease in question, and

rare variants that appear infrequently but have much greater individual impact.
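To make the sample-size argument concrete, here is a minimal sketch of a textbook power approximation for a single common variant. It is not taken from any GIANT analysis. The function name `gwas_power`, the assumed effect (a variant explaining 0.01 percent of trait variance), and the conventional genome-wide significance threshold of 5e-8 are all illustrative assumptions.

```python
from statistics import NormalDist


def gwas_power(n_people, variance_explained, alpha=5e-8):
    """Approximate power of a two-sided test to detect one variant.

    Hypothetical helper for illustration only: it assumes a simple additive
    model in which the variant explains `variance_explained` of the trait.
    """
    norm = NormalDist()
    # Critical z value for the chosen significance threshold.
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # The expected z statistic grows with the square root of the sample size.
    expected_z = (n_people * variance_explained / (1 - variance_explained)) ** 0.5
    return norm.cdf(expected_z - z_crit) + norm.cdf(-expected_z - z_crit)


# Assumed effect size: a variant explaining 0.01% of trait variance.
for n in (10_000, 100_000, 700_000):
    print(f"{n:>9,} people: power = {gwas_power(n, 0.0001):.3f}")
```

Under these assumptions, a variant this subtle is essentially invisible in a study of ten thousand people and becomes very likely to be detected only once the study reaches hundreds of thousands.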

Some prime examples of the power of numbers are the studies run by the international Genetic Investigation of ANthropometric Traits (GIANT) Consortium, led at the Broad Institute by institute member and Metabolism Program co-director Joel Hirschhorn. For about a decade, GIANT studies have brought together massive amounts of data from GWAS and other large-scale sources to tease apart the genetics and biology of three physical human traits: height, body mass index (BMI), and waist circumference (the latter two of which relate closely to obesity).

"Massive" is not hyperbole. A pair of 2015 obesity studies sifted through data from more than 300,000 people, in the end identifying more than 140 loci (regions of the genome, which can span one or more genes) influencing obesity traits. For their most recent study of height conducted in 2017, the GIANT team amassed genetic data and measurements from more than 700,000 people to bring the totals for height to more than 800 signals in about 540 genetic loci. Now the dataset is poised to get a boost from commercial genotyping provider 23andMe, Inc., which began a collaboration with the GIANT consortium on height studies last year.

"The larger the sample size you have, the more discoveries you can make, and the more you can learn about the underlying biology," said Hirschhorn, who is also a pediatric endocrinologist at Boston Children's Hospital. "And the better job you can do at predicting a person's adult height or risk for obesity based on their genetics."

Rather than recruiting hundreds of thousands of participants in a single study, GIANT instead fosters collaboration between academic, non-profit, government, and commercial partners to collect, merge, and analyze numerous independently generated GWAS datasets as a single, enormous set. At the moment, GIANT studies incorporate data from nearly 200 established cohorts, with contributions from investigators at more than 150 institutions.
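The article does not describe how the merged analysis is actually carried out. One common way to pool per-cohort GWAS results is inverse-variance-weighted fixed-effects meta-analysis, sketched below as an assumption rather than as GIANT's method. The function name `combine_cohorts` and the effect sizes are made up for illustration.

```python
import math


def combine_cohorts(estimates):
    """Pool (effect, standard_error) pairs for one variant across cohorts.

    Hypothetical helper: inverse-variance weighting, so more precise
    cohorts (smaller standard errors) count for more.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_effect = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_effect, pooled_se


# Made-up per-cohort estimates for a single variant: (effect, standard error).
cohorts = [(0.030, 0.020), (0.018, 0.015), (0.025, 0.010)]
effect, se = combine_cohorts(cohorts)
print(f"pooled effect = {effect:.4f}, standard error = {se:.4f}, z = {effect / se:.2f}")
```

The pooled standard error is always smaller than that of any single contributing cohort, which is the statistical reason combining many modest datasets can reveal signals that none of them shows alone.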

"We want to assemble data from between one and three million people for height and measures of obesity," Hirschhorn explained. "We continue to gather collaborators from around the world, and especially want to capture data from additional ancestries and continents."

Once data from 23andMe customers who consented to participate in research have been integrated into GIANT's overall dataset, Hirschhorn estimates that data from more than 1 million people will become part of the consortium's analyses.

"23andMe's dataset is very large, it's very well genotyped, and because self-reported height and BMI data tend to be very accurate, we think the phenotypic data are likely to be quite accurate," he said. Even at a glance, he added, "their height data look really good. The signals we've already seen in our other GIANT studies are strongly confirmed, and there are many, many additional signals that we hadn't seen yet."

"It's very gratifying to have collaborators like those at the Broad and to see them delve into the data collected by 23andMe," said Adam Auton, principal scientist and collaboration lead at 23andMe. "We're tremendously privileged to have a large number of customers who have chosen to contribute to scientific research. We know our research model has the power to accelerate the pace of discovery, so we're eager to see how our data can aid the work being done by the GIANT consortium."

Hirschhorn explained that GIANT has already cataloged height- and BMI-related data for more than 1.5 million people, including recent data provided by the UK Biobank, a half-million-person cohort established by research charities and government agencies in Great Britain. In a recent manuscript posted to bioRxiv, Loic Yengo of the University of Queensland and GIANT colleagues noted that adding UK Biobank's data to the consortium's published results will raise the height totals to about 3,000 signals in around 700 genetic loci.

23andMe's data will push the GIANT catalog to a new level of scope by nearly doubling the size of the cohort. "Putting our data together with those from 23andMe will garner another huge jump toward completing locus discovery using the GWAS approach," Hirschhorn said.

Once combined, those data may represent the final word, biologically speaking, on height. "Say we find 1,000 or 1,500 loci for height," Hirschhorn said. "We're going to start seeing signals from the same loci over and over again; we're already starting to see it with our existing height data. It could be that by next year, we'll be just about done discovering the genetic loci where common variants contribute to the biology of height."

The same cannot be said, he added, for BMI and obesity. "Because it's such a major public health problem, we'd love to learn as much as we can about the biology of BMI and related traits, which means gathering as large a sample size as possible. That's where more data from a broad scope of public and corporate partners, including 23andMe, would be really helpful."

In the long run, Hirschhorn thinks that GIANT and its collaborators' efforts could also produce prediction or screening tools that help patients like his, many of whom come to him with concerns related to stature.

"Once we identify enough of the heritability of height, for example, we would be able to genotype a child and get a better sense of whether their slow growth is related to normal polygenic variation or to some other cause," he said.

And paradoxically, the more data GIANT collects and the more loci and variants they reveal, the easier it may be to connect the dots between GWAS-flagged loci and variants and the biology they actually affect.

"The more loci and variants you have to work with, the more you can look for commonalities that will give you clues about the biology of individual variants," Hirschhorn explained. "And some loci are easier to drill down on than others because just by chance there are only one or two variants that could plausibly explain the association. Being able to have a hundred examples of loci like that could be immensely helpful in proceeding from variant to function."