By Aaron Krol

April 17, 2014 | By the standards of the young field of genetic diagnostics, Sherri Bale is a seasoned veteran. In 2000, she left the NIH, where she worked as a geneticist studying skin disease, with her colleague John Compton to form GeneDx, a commercial lab in Maryland originally centered on rare hereditary disorders. “There was virtually nothing for rare disease [at the time],” she tells Bio-IT World. In the years since, GeneDx has expanded into cancer, cardiomyopathy, and a variety of other complex disease areas that badly need better diagnostic options.

Genetic diagnostics is often heralded as the future of medicine: it can be used to make diagnoses no other method can confidently capture, help predict health risks before symptoms develop, and personally tailor treatments to every patient’s unique profile. But in key ways, it struggles to keep pace with traditional testing. Instead of looking at one or a few well-understood biomarkers common across all samples, genetic tests have to contend with variants that may occur in less than one patient out of ten thousand.

“You can find things that have never been seen before, and there’s not much data in the literature,” says Bale. More crucially, the clinical meaning of these variants is rarely unambiguous. Each variant found in the course of testing – and in a mid-size gene panel there will be hundreds – has to be individually interpreted, taking into account factors like its population frequency, similarity to other variants known to be pathogenic, location within the gene, and, if samples from a patient’s family members are available, whether it segregates with disease.
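The interpretation factors Bale describes can be pictured as a simple evidence checklist. The sketch below is purely illustrative – the field names and the 1-in-10,000 frequency threshold are assumptions for the example, not GeneDx's actual pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VariantEvidence:
    """Evidence gathered for a single variant (illustrative fields only)."""
    population_frequency: float           # fraction of a reference population carrying the allele
    similar_to_known_pathogenic: bool     # resemblance to variants already classified as pathogenic
    in_functional_domain: bool            # location within the gene
    segregates_with_disease: Optional[bool]  # None when no family samples are available

def evidence_for_pathogenicity(ev: VariantEvidence) -> List[str]:
    """Collect the lines of evidence that argue for pathogenicity."""
    points = []
    if ev.population_frequency < 0.0001:  # rarer than roughly 1 in 10,000
        points.append("very rare in the population")
    if ev.similar_to_known_pathogenic:
        points.append("resembles known pathogenic variants")
    if ev.in_functional_domain:
        points.append("falls in a functional region of the gene")
    if ev.segregates_with_disease:
        points.append("segregates with disease in the family")
    return points
```

In practice each line of evidence is weighed, not merely counted, which is why interpretation remains a judgment call.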

As an industry veteran, Bale coauthored the widely followed recommendations from the American College of Medical Genetics (ACMG) that clinical labs use to classify variants in five categories: benign, likely benign, uncertain, likely pathogenic, or pathogenic. (The board of the ACMG is currently reviewing a revised document, also written with Bale’s input, which was first introduced at the annual ACMG meeting this March.) These guidelines provide some clarity and consistency across labs, but few calls in genetic diagnostics can be made with absolute confidence, and there is room for reasonable disagreement on where a given variant falls on the spectrum.

Too often, however, these differences of opinion never even come to light. While new variants are being discovered at an ever-increasing rate, this information is frequently siloed in the vaults of a single lab, leading to redundant work interpreting the same variant over and over again in different locations, with no opportunity to resolve discrepancies.

Labs like GeneDx rely heavily on online databases that try to gather information on variants in a single place. “We consult all public variant databases that we can get our hands on,” says Bale. “We use dbSNP, although I hate it. We use HGMD, although it’s dirty. We use all the LOVD databases that might be out there.”

Yet these resources are rarely curated to a clinical standard, and are prone to disagreement with one another. Worse, it has been a constant struggle to convince clinical labs to share their data in these open forums, where it could help geneticists reach consensus on which variants are involved in disease. For genetic diagnostics to become as reliable and accountable as traditional forms of testing, the whole ecosystem of variant data is in need of reform.

“If we cannot share our data, talk about what we find, and come to conclusions on what things mean, the whole genomics area is going to fall flat on its face,” says Bale. “We have to share the data.”

“The Gold Standard”

As a member of the ClinGen project, an NIH-funded consortium of genetic researchers that seeks to curate the reams of variant data contained in clinical labs and databases around the world, Heidi Rehm has been a high-profile advocate for data sharing. Rehm is a lead contributor and spokesperson for one of the newest variant databases, ClinVar, formed a little over a year ago under the umbrella of the National Center for Biotechnology Information (NCBI).

ClinVar is a top priority for the ClinGen project, because it is the first comprehensive genetic database designed explicitly for use in a clinical setting. Its aim is to catalogue every variant that has been found in the course of genetic diagnostics and research studies, along with a consensus decision on whether that variant is pathogenic, benign, or somewhere in between, and supporting evidence for that call.

Rehm is a prominent clinical geneticist in her own right, the director of the world-class Laboratory for Molecular Medicine (LMM) at the Partners HealthCare Center for Personalized Genetic Medicine in Cambridge, MA. She’s accustomed to wading through troves of disorganized data, trying to make calls that physicians can use to inform patient care. Like GeneDx, the LMM researches and classifies every variant it finds individually, a process that may take days for a given test. The results returned to the physician list and explain every variant not deemed “benign.”

Heidi Rehm, Chief Laboratory Director of the Laboratory for Molecular Medicine at Partners HealthCare Center for Personalized Genetic Medicine. Image credit: Partners HealthCare

Rehm says that in a broad gene panel, her group could find as many as 25 variants in the “clinical range,” from uncertain to pathogenic. Even when she thinks the case for a genetic culprit is clear, she stresses the importance of issuing an in-depth report on each variant. “The interpretation of this is really tricky for physicians, even those who are well-seasoned geneticists,” she says.

Knowing that a wealth of detailed clinical evidence exists in the test results her team produces at the LMM, Rehm is beginning to shift these analyses into ClinVar, where labs around the world can refer to them. She encourages other ClinVar submitters to include their evidence alongside variants, making it easier to resolve how each variant was classified and build on that knowledge over time.

“We knew that we couldn’t create solely a clinical-grade database, because unfortunately most variants aren’t understood at a clinical grade,” says Rehm. “That doesn’t mean the information we know about a variant isn’t still useful.” To help users understand the quality of the data in ClinVar, the NCBI has set up a four-star rating system. At three or four stars, a variant has been vetted by a multi-institutional expert panel focused on a key disease area, and a collaborative decision reached on how the variant should be classified. At one star, a variant has only been interpreted by a single submitter. An entry can also receive zero stars, if two different interpretations have been submitted and the discrepancy has not been resolved.
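The tiers Rehm describes can be sketched as a small decision function. This is a simplified illustration of the rating logic as described above, not NCBI's actual algorithm – in particular, the two-star tier for multiple agreeing submitters is an assumption, and the real system distinguishes three from four stars by review type.

```python
from typing import List

def star_rating(interpretations: List[str], expert_panel_reviewed: bool) -> int:
    """Assign a ClinVar-style star rating to a variant entry (illustrative sketch).

    interpretations: the classification each submitter assigned, e.g. ["pathogenic"].
    expert_panel_reviewed: whether a multi-institutional expert panel vetted the call.
    """
    if expert_panel_reviewed:
        return 3   # three (or four) stars: vetted by an expert panel
    if len(set(interpretations)) > 1:
        return 0   # zero stars: conflicting interpretations, discrepancy unresolved
    if len(interpretations) > 1:
        return 2   # assumed tier: multiple submitters in agreement
    return 1       # one star: a single submitter's interpretation
```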

GeneDx is one of the early contributors that have followed Rehm’s lead, submitting interpretation paragraphs along with hundreds of variants. “We are making an attempt to make this a highly curated database,” says Bale. “We are trying to encourage laboratories, and make it easier for laboratories, to submit the data that they have in their own internal databases – commercial labs like mine.”

“We have tons of data,” she adds, “and who has time to publish papers? We’re working very hard, and our hope is that ClinVar is going to be the gold standard.”

Conflicting Calls

When ClinVar was founded, it was unclear how consistent submissions would be from one lab to another. Since different labs have different evidence at hand when making variant calls, there is every reason to believe that one lab might call a variant pathogenic, while another calls it uncertain. It’s a situation Bale has encountered at GeneDx in the course of testing.

“We do find conflicts,” she says. “We resolve them by reading everything we can and making our own best decision. Sometimes we call the author of a paper. Sometimes I’ll call Heidi, and say, ‘This is in the literature this way, we see it that way, what do you have?’”

To gauge the extent of this issue, when the first data was fed into ClinVar, Rehm helped coordinate a study that looked at three different labs’ interpretations of variants in the same set of genes. The LMM and GeneDx were both participants.

“The goal was to both test out the process of data submission, and then to actually compare around a set of genes that we all did testing on,” says Rehm. The RASopathy genes, implicated in several developmental disorders, were chosen for the test run.

Altogether, two labs had submitted interpretations of the same RASopathy variant on 269 occasions. On 53 of these occasions – about one time out of five – the labs disagreed on how the variant should be classified.

Most of these disagreements have little bearing on care – one lab said “pathogenic” and another said “likely pathogenic,” or one said “benign” and another said “likely benign.” But for more than twenty variants, one lab called “uncertain” while another put the variant somewhere on the benign or pathogenic spectrum. “There we know that medical decisions can differ,” says Rehm, “when a physician is dealing with a variant of uncertain significance, versus a reasonably confident assertion.”
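The distinction between a minor disagreement and one that could change medical decisions can be made precise by treating the five ACMG categories as a spectrum. The following is a minimal sketch of that idea; the rule encoded here – any discordance that crosses the “uncertain” boundary counts as meaningful – is an interpretation of the study's framing, not its published methodology.

```python
from enum import IntEnum

class Call(IntEnum):
    """The five ACMG categories, ordered along the benign-to-pathogenic spectrum."""
    BENIGN = 0
    LIKELY_BENIGN = 1
    UNCERTAIN = 2
    LIKELY_PATHOGENIC = 3
    PATHOGENIC = 4

def clinically_meaningful_disagreement(a: Call, b: Call) -> bool:
    """True when two labs' calls could drive different medical decisions.

    "pathogenic" vs "likely pathogenic" (or "benign" vs "likely benign")
    is a minor disagreement; any pair that straddles "uncertain" is not.
    """
    if a == b:
        return False
    both_benign_side = a <= Call.LIKELY_BENIGN and b <= Call.LIKELY_BENIGN
    both_pathogenic_side = a >= Call.LIKELY_PATHOGENIC and b >= Call.LIKELY_PATHOGENIC
    return not (both_benign_side or both_pathogenic_side)
```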

A visualization in ClinVar of the genome location of a single-base substitution implicated in hearing loss

Rehm suspects that this pilot study actually understates the problem in ClinVar as a whole. “I would say those rates are much, much better than what we deal with in the literature,” she says. “Comparing what clinical labs say with each other is a lot more consistent than what we deal with in the literature.”

Much of the data currently in ClinVar was aggregated from earlier databases, such as OMIM (Online Mendelian Inheritance in Man), which were not designed for active clinical use. While they contain large volumes of information, and are valuable for linking specific genes to diseases, these databases only repeat findings from the medical literature, which are rarely as detailed as the clinical reports generated at a lab like LMM or GeneDx.

A common pattern is for a researcher to publish a large set of variants that have all been detected in a group of patients with a common disorder. The variants haven’t been individually interpreted, but the implication is that they are all pathogenic – an assumption that is carried into OMIM. In ClinVar, this is a far cry from the necessary level of rigor, and bound to come into conflict with new interpretations, either from clinical labs or elsewhere in the literature. Rehm estimates that, of 420 discrepancies currently flagged in ClinVar, around half involve data from publications referenced in OMIM. All of these discrepancies have to be examined and resolved.

A Web of Databases

While Rehm is hopeful that more and more labs will become involved in improving ClinVar, it is far from the only database where information could be funneled. There are literally hundreds of online repositories for data on genetic variants, each with different standards and a different guiding purpose. Large ones like OMIM, HGMD and dbSNP serve as preliminary references for research, collecting together as many variants as possible.

Other databases are locus-specific, focusing on a specific gene or related set of genes. Rehm estimates that between six and seven hundred locus-specific databases exist – each a potential source of knowledge, but also a potential dead end for data.

“Many of them are defunct,” she says. “Somebody put data in them, and they’ve never submitted data again, and they’re not accepting submissions. They’re just sitting there. Others are actively being used.” To try to make some sense of this proliferation of databases, Rehm has reached out to Johan den Dunnen.

Den Dunnen maintains one of the oldest and most far-reaching variant databases, the Leiden Open Variation Database (LOVD), which seeks to catalogue every known variant, whether clinically significant or not. In his lab at Leiden University Medical Center in the Netherlands, den Dunnen researches the genetic causes of muscular dystrophy, and was one of the first geneticists to share his data online.

“As soon as the Internet was there, in 1995 or so, I decided to put all our findings, and the protocols we used, online,” he tells Bio-IT World. “I thought it would be great to have a resource that you could access within a minute, where you could find all knowledge about variants in the gene, and their relation to the disease.” As he adopted more sophisticated software, and became active with organizations like the Human Variome Project (HVP), he extended his curation efforts to more genes and other labs’ data.

In the process, he also became a harsh critic of researchers who are unwilling to share their data, and of small databases that don’t make an effort to vet their information and connect with the wider genetics community. “Everybody just starts a new database, without looking around at what’s available and suggesting collaborations to improve the existing databases,” he says. “I do think these small databases have a value, but they should move toward the future. A lot of them are bad on standards, on the software they use, so they would be even more useful when they are standardized more.”

Den Dunnen doesn’t agree perfectly with every aspect of ClinVar’s approach. For example, his work on non-clinical genes has given him a new perspective on terms like “pathogenic,” which he warns is easily misunderstood. “Everybody thinks ‘pathogenic’ means disease-causing,” he says, although even a variant firmly labeled pathogenic in ClinVar may not always cause disease, especially when dealing with recessive traits. “There’s a lot of meaning behind what you write that laypersons don’t have in mind.”

This becomes even more urgent in a database like LOVD, which also contains variants involved in traits like blood type and hair color. To avoid assigning any of these traits a “pathogenic” label, LOVD has adopted the neutral term “affects function.”

Nevertheless, the goals of large, rigorously-curated databases like LOVD and ClinVar align more often than not. Den Dunnen and Rehm have now teamed up to evaluate all of the small, locus-specific databases out there, reach out to their curators, and try to convince them to connect with their larger counterparts. “Hopefully we can clean up some of the problems that exist there, and for databases that don’t want to maintain the data, we can simply stick it into a centralized place,” says Rehm. “In other places, there are locus-specific databases that are maintaining a lot of detail on genes and variants that is not available in ClinVar… In that case, what I think is a valuable process is that the overall assertions are submitted into ClinVar, but we’re linking back to their databases where that richer detail can be found.”

“The model that I would like to support, and move towards, is one where there are a small number of well-maintained databases,” she adds. “It should be few enough that those databases can have real-time interfaces for data sharing between them, so they’re always kept in sync.”

“Competing on Knowledge”

Writing to the curators of each locus-specific database is a massive effort, but den Dunnen has always made this kind of outreach a priority. His biggest concern centers on labs that are discovering and studying new variants, but keep their data locked in private servers.

“I use a lot of my spare time writing mail asking people to share what they know, or inviting people to become a curator of a database,” he says. “I email people saying, ‘I know you have 10,000 variants in your database – why not share it with us, and make it known to the world?’… I’m always surprised at people who do DNA diagnostics, who don’t realize that the reason they can give an opinion on a variant is because others shared information on what they know.”

This frustration with data hoarding is a common theme in clinical genetics, especially because labs themselves stand to gain from providing information to databases like ClinVar, which provide quality control by flagging discrepancies. Rehm recalls a precursor project to ClinVar that found some labs had discrepancies within their own databases, classifying the same variant differently on separate reports.

Despite these advantages, it does take some investment for a lab to get started in data sharing. “It’s a question of resources,” says Rehm. “It does require effort to get your data out of your system, and put it into a format that is shareable, and that is not a simple and straightforward process. When a lab has limited resources, and they’re trying to decide how to spend their energy, sharing data may not be at the top of the list.”

Den Dunnen also suggests that researchers may not always realize their data is valuable. “For most people, it’s standard to only report new findings, or variants that have not been seen before,” he says. “They forget that reporting a variant for the second or third or fifth time is also evidence, which is useful.”

In a few cases, labs may be trying to gain a proprietary edge by keeping their data to themselves. The most famous case of this is Myriad Genetics, which held a patent on the BRCA1 and 2 genes involved in breast cancer until that patent was overturned by the Supreme Court last year. BRCA mutations are among the most popularly known disease-risk variants, but surprisingly little consensus has been reached on which ones contribute to cancer risk, because so little data has been shared among researchers. Asked about BRCA testing, Bale told Bio-IT World, “We’re all on a learning curve with BRCA.”

“Because so many people are putting their data into ClinVar now, I think it’s going to get better,” she adds. Den Dunnen also notes that the LOVD has been receiving more data on BRCA recently from labs in the Netherlands and Belgium.

The traditional approach to convincing reluctant labs to contribute their data has been to reach out individually, and appeal to the benefits of data sharing. Lately, however, both LOVD and ClinGen have been trying more stringent measures. Den Dunnen has been contacting scientific journals on behalf of HVP, to ask that they make data sharing a requirement for publishing papers. Human Mutation was the first journal to adopt the requirement, in 2010, and the European Journal of Human Genetics has gone a step further, hiring LOVD curators to check that each paper’s variant descriptions have been accurately transmitted to a public database.

Rehm, meanwhile, is negotiating with the College of American Pathologists (CAP), a leading accreditation organization for diagnostic labs, to make data sharing a condition of membership. “I think it will be a stepwise process,” says Rehm, where CAP first surveys its members on data sharing, then recommends the practice, and eventually requires it. Healthcare providers like Geisinger Health System have also pitched in, agreeing to only order tests from labs that share their data.

If all else fails, open data advocates are prepared to take the nuclear option. “Some of the insurance companies are considering making it a requirement for reimbursement,” says Rehm. “So that’s going to be a big motivator.”

She cautions that, for genetic diagnostics to thrive, clinical labs like hers can’t afford to turn data silos into a sales pitch. Despite growing curation and outreach efforts, resistance to data sharing is still a tenacious obstacle to understanding the thousands of known variants that affect human health.

“How you go about creating a service – the turnaround time, the quality of your reports, what you charge, how you combine panels, what your offerings are – that’s what we should be competing on as laboratories,” she says. “We should not be competing on knowledge that saves patients’ lives.”