One of the world’s largest pharmaceutical companies has launched a massive effort to compile genome sequences and health records from two million people over the next decade. In doing so, AstraZeneca and its collaborators hope to unearth rare genetic sequences that are associated with disease and with responses to treatment.

It’s an unprecedented number of participants for this type of study, says Ruth March, vice-president and head of personalized health care and biomarkers at AstraZeneca, which is headquartered in London. “That’s necessary because we’re going to be looking for very rare differences among individuals.”

To achieve that ambitious goal, AstraZeneca will partner with research institutions including the Wellcome Trust Sanger Institute in Hinxton, UK, and Human Longevity, a biotechnology company founded in San Diego, California, by genomics pioneer Craig Venter. AstraZeneca also expects to draw on data from 500,000 participants in its own clinical trials, and on medical samples that it has accrued over the past 15 years.

The project follows a burgeoning trend in genetics research. For years, geneticists pursued common variations in human DNA sequences that are linked to complex diseases such as diabetes and heart disease. The approach yielded some important insights, but these common variations often accounted for only a small percentage of the genetic contribution to individual diseases.

Researchers are now increasingly focusing on the contribution of unusual genetic variants to disease. Combinations of these variants can hold the key to an individual’s traits, says Venter.

The hunt for important rare variants has led AstraZeneca to partner with the Institute for Molecular Medicine Finland, says Aarno Palotie, who heads the Human Genomics Program there. Finland’s population was geographically isolated until recently, he notes, which makes for a unique genetic make-up. As a result, some variations that are very rare in other populations may be more common in Finland, making them easier to detect and study.

Familiar road

AstraZeneca did not disclose exactly how much it would invest in the project: “hundreds of millions of dollars” over the course of ten years was all that Menelas Pangalos, executive vice-president of the company’s innovative medicines programme, would say. The company intends to use the data to inform drug development in all of its major disease areas, from diabetes to inflammation to cancer, says March.

It is not the first time that a large drug company has poured money into genomics in hopes of fuelling drug discovery, notes David Goldstein, who studies human genetics at Columbia University in New York City and is an adviser to AstraZeneca. “Genomicists have for decades now been promising that genomics is going to revolutionize the way that medicines are developed and the way that medicines are used,” he says. “We are now here saying it again.”

Those past efforts often disappointed, but the field has turned a corner, Goldstein adds. Genome sequencing is faster and cheaper than ever before, and researchers are armed with better bioinformatics tools to interpret the data. Advances in stem-cell biology and genome-editing methods such as CRISPR–Cas9 are making it much easier for researchers to determine how a particular change in a DNA sequence affects living cells.

In all, the project should generate about 5 petabytes of data. “If you put 5 petabytes on DVDs, it would be four times the height of the Shard,” says Pangalos, referring to a nearly 310-metre London skyscraper. “If you wanted to put it on your iPod, it would take about 5,000 years to listen to it all.”

Refined predictions

Much of that data will come from Human Longevity. The company, which ultimately hopes to accrue 10 million human genomes, already has 26,000 completed and paired with medical records. Its databases also contain additional partial genome sequences. “We’re adding one about every 15 minutes on average,” Venter says.

Venter says that, using DNA sequence alone, his company can now predict a person’s height, weight, eye colour and hair colour, and produce an approximate picture of their face. Much of that detail is lurking in rare sequence variations, says Venter, whose own genome has been in public databases for more than a decade.

Human Longevity’s databases are kept locked behind layers of security. “If I were advising a younger Craig Venter, I’d say, ‘Think carefully before you just dump your genome on the Internet’,” Venter says. “The levels of prediction are getting much more interesting.”