Three families were identified by the Manton Center for Orphan Disease Research to serve as test cases for the CLARITY challenge on the basis of having a child with clinical manifestations and/or pedigree structure suggestive of a likely genetic disease (Table 1). The clinical study reported here was performed under the auspices of the Boston Children’s Hospital Institutional Review Board (IRB) under Protocol IRB-P00000167. The organizing team worked closely with the IRB to define a protocol that protected the families’ interests, as well as the patients’ rights and prerogatives, yet allowed them to share their de-identified medical histories and DNA sequences with teams of qualified competitors around the world.

Table 1 Clinical findings in challenge families

DNA samples and medical records from 12 individuals in total were collected under informed consent. Probands and their parents (i.e., trios) were enrolled from Families 1 and 3, and two affected first cousins and their parents were enrolled for Family 2. WES for all 12 participants was performed and donated by Life Technologies (Carlsbad, CA, USA), using standard protocols for the LIFE Library Builder, and sequenced with Exact Call Chemistry on SOLiD 5500xl machines. Both raw reads (XSQ format) and aligned reads (BAM format, generated with LifeScope [22]) were provided.

WGS for ten individuals (excluding an affected male cousin of the Family 2 proband and the cousin’s unaffected mother, for whom sufficient DNA was not available) was donated by Complete Genomics Incorporated (Mountain View, CA, USA), utilizing their standard proprietary protocols, and generated using their Standard Pipeline v. 2.0. Variant call files, along with aligned reads in Complete Genomics’ proprietary ‘masterVarBeta’ format, were provided.

Comprehensive clinical summaries providing clinical and diagnostic data for the presenting complaints and significant secondary findings were prepared by Manton Center staff from the primary medical records and made available on a secure server to the contestants, together with the genomic data described above.

Contestants were solicited from around the world via professional contacts, word of mouth, and an external website [14]. Forty teams applied to participate in the Challenge, 32 of the most experienced multidisciplinary groups were invited to compete, and 30 accepted the offer. Participants – working either independently or as teams – were tasked with producing an analysis, interpretation, and report suitable for use in a clinical setting.

At the conclusion of the Challenge, 23 teams successfully submitted entries that included descriptive reports of their bioinformatic analytical strategies with rationale, examples of data output and tables of variants, and clinical diagnostic reports for each family. Some groups also provided examples of their patient education materials, informed consent forms, preference setting documents, plans for revisable reporting, and protocols for dealing with incidental findings. Reasons given by four of the seven non-completing teams for dropping out were: technical and management issues, personnel changes within the team, inability to finish on time, or difficulty re-aligning the WES datasets (N = 1 each). The other three teams gave no reason.

The 23 completed entries represented a diverse group of approaches and treatments, with some groups focusing almost entirely on bioinformatic issues and others on clinical and ethical considerations. The most compelling entries included a detailed description of the bioinformatic pipelines coupled with clear, concise, and understandable clinical reports. Among the 23 entries, multiple genes were listed as possibly causative for all families (25 for Family 1, 42 for Family 2 and 29 for Family 3). Nevertheless, a consensus was achieved regarding probable pathogenic variants in two of the families. In Family 1, mutations of the titin gene, TTN [Online Mendelian Inheritance in Man (OMIM) 188840/603689], recently reported to cause a form of centronuclear myopathy [23], were identified as possibly or likely pathogenic by 8/23 groups, and 6/23 groups reported GJB2 (OMIM 121011/220290) variants as the likely cause of the hearing loss in the proband. Similarly, 13/23 groups identified and reported a variant in TRPM4 (OMIM 606936/604559) [24] as likely responsible for the cardiac conduction defects in Family 2. Although no convincing pathogenic variants were identified for Family 3, there were two plausible candidates requiring further study, OBSCN and TTN, mentioned by six groups each (Table 2).

Table 2 Genetic variants

Following the independent review and discussion by the panel of judges, one ‘winner’, the multi-institution team led by the Division of Genetics at Brigham and Women’s Hospital (Boston), was selected, largely on the basis of having a solid pipeline that correctly identified most of the genes judged to be likely pathogenic, as well as for having clear and concise clinical reports that were judged to be best at conveying the complex genetic information in a clinically meaningful and understandable format. Two runners-up were also cited. The first was a combined team from Genomatix (Munich, Germany), CeGaT (Tübingen, Germany) and the University Hospital of Bonn (Bonn, Germany), which had a robust pipeline that correctly identified every relevant gene in clear clinical reports. The second was a team from the Iowa Institute of Human Genetics at the University of Iowa, which had an outstanding array of patient education materials, procedures for patient preference setting and dealing with incidental findings, and policies for transfer of results of uncertain significance to an appropriate research setting if so desired by the patients. The content of the three winning entries is available as Additional files 1, 2 and 3. Five additional teams were cited for ‘honorable mention’ for having pipelines that identified one or more of the likely ‘correct’ genes and for providing clear clinical reporting (Table 3). These eight teams recognized by the judges are defined as ‘finalists’ in the text and for purposes of statistical analysis.

Table 3 Challenge participants

Criterion 1 (pipeline): what methods did each team use to analyze and interpret the genome sequences?

Bioinformatic analysis

The particulars of the bioinformatic pipelines, variant annotation and report generation approaches employed by the contestants are summarized in Table 4.

Table 4 Pipeline elements and characteristics of successful CLARITY entries

Alignment

The majority of contestants chose to use the supplied alignments of the data. This is not surprising, since the read data from Complete Genomics and SOLiD require special handling due to the nature of the sequencing: split reads in the former and color-space reads in the latter. However, three teams were unable to read the data formats provided and did not submit complete entries.

Alignments were recomputed for the Complete Genomics data by 5 out of 21 teams, with only one team reporting use of the aligner DNAnexus (Palo Alto, CA, USA), while 8 out of 21 teams recomputed alignments for the SOLiD data. For the SOLiD data, five teams recomputed alignments with software aware of color-space, and two teams indicated that they compared their color-space results against a base-space aligner. Reported aligners used for SOLiD data included the LifeScope aligner, BFAST [25], BWA [26–28], Novocraft’s novoalignCS (Selangor, Malaysia) and the Genomatix aligner (Munich, Germany), with some teams utilizing multiple tools for comparison. One team performed error correction prior to alignment for the SOLiD data using LifeScope’s SAET (SOLiD Accuracy Enhancement Tool, Carlsbad, CA, USA).
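
For teams that did realign, the workflow is conceptually uniform regardless of aligner. Below is a minimal sketch in Python, assuming base-space FASTQ input and locally installed BWA and SAMtools; the file names and paths are hypothetical, and the SOLiD color-space data would instead require a color-space-aware aligner such as novoalignCS or LifeScope.

```python
import subprocess

REF = "GRCh37.fa"          # hypothetical reference FASTA (indexed with `bwa index`)
READS = "proband.fastq"    # hypothetical base-space reads

def realign(sample: str) -> str:
    """Align reads and produce a sorted, indexed BAM ready for variant calling."""
    bam = f"{sample}.sorted.bam"
    # Align with BWA-MEM (tagging a read group) and stream into SAMtools to
    # sort, avoiding an intermediate SAM file on disk.
    align = subprocess.Popen(
        ["bwa", "mem", "-R", f"@RG\tID:{sample}\tSM:{sample}", REF, READS],
        stdout=subprocess.PIPE,
    )
    subprocess.run(["samtools", "sort", "-o", bam, "-"],
                   stdin=align.stdout, check=True)
    align.stdout.close()
    if align.wait() != 0:
        raise RuntimeError("bwa mem failed")
    subprocess.run(["samtools", "index", bam], check=True)
    return bam

if __name__ == "__main__":
    print(realign("family1_proband"))
```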

Prior to variant calling, many teams removed read duplicates using Picard [29] or SAMtools [30], while some teams omitted this step due to the danger of removing non-duplicate reads from single-end data. Using WGS and WES data together gave an additional way to account for PCR duplication. Limited quality control (QC) was performed prior to variant calling, with a single team using BEDTools [31] to analyze coverage QC metrics, and one other team reporting custom mapping QC filters.
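
As a sketch of the duplicate-marking step itself, assuming Picard is available on the PATH via its `picard` wrapper script (invocation details vary by installation, and file names here are hypothetical):

```python
import subprocess

def mark_duplicates(bam: str) -> str:
    """Mark likely PCR duplicates in a coordinate-sorted BAM."""
    out = bam.replace(".bam", ".dedup.bam")
    # Caveat: for single-end data, reads sharing a 5' position are flagged as
    # duplicates even when they are genuine independent fragments -- the risk
    # that led some teams to skip this step for the SOLiD libraries.
    subprocess.run(
        ["picard", "MarkDuplicates",
         f"I={bam}", f"O={out}", "M=duplicate_metrics.txt"],
        check=True,
    )
    subprocess.run(["samtools", "index", out], check=True)
    return out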

Variant calling

O’Rawe et al. suggested that the choice of pipeline might be a significant source of variability in the outcome of NGS analyses [32]. Of the teams, 40% used both the Genome Analysis Toolkit (GATK) [33, 34] and SAMtools [30] for variant calling, with the majority using at least one or the other. This indicates that while there is not complete consensus, using GATK, SAMtools or both produced acceptable results for the challenge. While GATK and SAMtools are the most popular variant callers used today and reported in this survey, their relative performance has been shown to vary with sequencing depth [35, 36], and direct comparison of variant calls resulting from a parallel analysis of the same raw data by different variant-calling pipelines has revealed remarkably low concordance [32], leading to words of caution in interpreting individual genomes for genomic medicine.

SAMtools was used by some teams to jointly call SNPs and indels while recalibrating quality scores, while other teams used GATK to call SNPs and indels separately. Teams using GATK typically followed the Broad Institute’s best-practice guidelines, performing indel realignment prior to indel calling, base quality score recalibration prior to SNP calling, and variant quality score recalibration after variant calling. Some teams skipped GATK’s base quality score recalibration, noting that at the time GATK did not support SOLiD error profiles. LifeScope, which incorporates DiBayes, was also used on SOLiD data to call SNPs and, with local realignment, to call small indels. In some cases, multiple variant-calling methods were used and compared, with all but one team using GATK, SAMtools or some combination thereof. Other tools used, with one mention each, were the DNAnexus variant caller, FreeBayes [37] and Avadis NGS (v1.3.1). A number of teams utilized the WGS results from Complete Genomics to look for potentially pathogenic de novo copy number variants, but none were found.
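
As an illustration of the two dominant calling routes, here is a hedged sketch using current tool syntax (GATK4 and bcftools are shown for clarity; contest-era pipelines used GATK 2.x and `samtools mpileup`, whose command-line flags differ). Reference and file names are hypothetical.

```python
import subprocess

REF = "GRCh37.fa"   # hypothetical reference; .fai and .dict indexes assumed

def call_gatk(bam: str, vcf: str) -> None:
    """Per-sample calling with GATK's HaplotypeCaller."""
    subprocess.run(
        ["gatk", "HaplotypeCaller", "-R", REF, "-I", bam, "-O", vcf],
        check=True,
    )

def call_samtools(bam: str, vcf: str) -> None:
    """Modern samtools/bcftools equivalent of the mpileup-based calling
    most teams reported."""
    mpileup = subprocess.Popen(
        ["bcftools", "mpileup", "-f", REF, bam], stdout=subprocess.PIPE
    )
    subprocess.run(["bcftools", "call", "-mv", "-o", vcf],
                   stdin=mpileup.stdout, check=True)
    mpileup.stdout.close()
    mpileup.wait()
```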

A significant source of variation among the different entries was the number of de novo mutations reported. Fewer than five de novo mutations per exome, and only about 75 per genome, are expected for each trio [38, 39], yet some groups reported much higher numbers, many of which fell within regions of low or poor coverage. Groups that used a family-aware zygosity-calling approach, such as the GATK module ‘Phase by Transmission’, produced much more refined lists of only a few potential de novo variants per proband, demonstrating the importance of this approach. However, several teams reported problems using the SOLiD data for this analysis, as the BAM format provided by SOLiD differed from that expected by GATK, limiting the analysis to Complete Genomics data in those cases.
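
The underlying trio logic is simple to state, even though a production implementation must also weigh genotype likelihoods and coverage. A minimal sketch follows (the genotype representation is a hypothetical simplification; tools such as GATK’s ‘Phase by Transmission’ implement a likelihood-based version of this):

```python
from typing import Tuple

Genotype = Tuple[str, str]  # unphased allele pair, e.g. ("A", "G")

def mendelian_consistent(child: Genotype, mother: Genotype,
                         father: Genotype) -> bool:
    """True if the child's genotype can be drawn as one allele per parent."""
    c = tuple(sorted(child))
    return any(tuple(sorted((m, f))) == c for m in mother for f in father)

def candidate_de_novo(child: Genotype, mother: Genotype,
                      father: Genotype) -> bool:
    """Flag sites where the child's genotype cannot be inherited as-is.
    Real pipelines additionally require good coverage and genotype quality
    in all three samples, which is what shrinks the list to a handful of
    candidates per trio."""
    return not mendelian_consistent(child, mother, father)

assert candidate_de_novo(("A", "G"), ("A", "A"), ("A", "A"))       # de novo G
assert not candidate_de_novo(("A", "G"), ("A", "A"), ("G", "G"))   # inherited
```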

Variant filtering or recalibration after initial variant calls was performed by 16 out of 20 teams. Six teams used GATK variant quality score recalibration, with other teams reporting use of custom tools. Some teams used BEDTools for coverage QC metrics, but there was no consensus on tools to report sequencing and analysis QC metrics for post-alignment and variant calling.

Teams were asked if they employed any reference datasets in calling variants or comparing datasets to known variants (e.g., batched variant calls, known variant lists, etc.). The most common reference data reported included variants from the 1000 Genomes Project, dbSNP [40], HapMap Project [41], NHLBI Grand Opportunity Exome Sequencing Project (Bethesda, MD, USA), and the GATK Resource Bundle (distributed with GATK). Other reference datasets mentioned were the Mills Indel Gold Standard [42], NCBI ClinVar (Bethesda, MD, USA) as well as public sequencing data produced from the technologies used in this challenge.

Coverage analysis

One limitation of exome and genome sequencing is that regions of low or absent coverage can lead to false-positive or false-negative results (sometimes 7% to 10% of the exons of the genes of interest have insufficient sequence reads to make a variant call [43]). Only 42% of teams quantified and reported on regions with insufficient coverage or data quality, though 50% of the finalists and two of the top three teams did.
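
Coverage reporting of this kind requires no specialized tooling. A minimal sketch that collapses per-base output from `samtools depth` into ‘no-call’ intervals (the depth threshold and file handling are illustrative, not prescriptive):

```python
import sys

MIN_DEPTH = 20  # illustrative callability threshold

def low_coverage_regions(depth_lines, min_depth=MIN_DEPTH):
    """Collapse per-base (chrom, pos, depth) records into intervals where a
    confident variant call is not possible."""
    chrom = start = prev = None
    for line in depth_lines:
        if not line.strip():
            continue
        c, p, d = line.split()[:3]
        pos, depth = int(p), int(d)
        below = depth < min_depth
        if below and c == chrom and prev is not None and pos == prev + 1:
            prev = pos                      # extend the open interval
            continue
        if start is not None:
            yield chrom, start, prev        # close the open interval
            start = prev = None
        if below:
            chrom, start, prev = c, pos, pos
        else:
            chrom = c
    if start is not None:
        yield chrom, start, prev

if __name__ == "__main__":
    # Typical use:  samtools depth -a sample.bam | python coverage_gaps.py
    for chrom, lo, hi in low_coverage_regions(sys.stdin):
        print(f"{chrom}\t{lo}\t{hi}\tinsufficient coverage for variant calling")
```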

Variant validation

Many clinical diagnostic protocols still require independent confirmation of NGS results, often by Sanger-based resequencing studies, to validate clinically relevant findings. Although this was not possible in the context of a competition where the contestants did not have access to DNA from the participants, 11 groups took advantage of the independently derived WES and WGS datasets to cross-check and validate their findings. In every instance except two, the teams reported concordance between the variant calls for the TTN, GJB2, and TRPM4 mutations that were considered likely pathogenic. The exceptions were both related to calls that were considered false positives in the SOLiD data due to poor quality or coverage at the GJB2 and TRPM4 loci, respectively. The GJB2 findings had previously been clinically confirmed and the contest organizers subsequently arranged for independent research and clinical testing, which confirmed the TTN and TRPM4 variants as well.
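
A sketch of that cross-check, reduced to its essentials: index both call sets by site and compare genotypes, treating platform-unique calls as candidates for closer inspection rather than automatic rejection. (The variant keys and genotype strings below are simplified stand-ins for real VCF records, and the coordinates are placeholders.)

```python
def variant_key(chrom: str, pos: int, ref: str, alt: str):
    """Normalize a variant to a comparable site key."""
    return (chrom, pos, ref.upper(), alt.upper())

def cross_validate(wes_calls: dict, wgs_calls: dict):
    """Each dict maps variant_key -> genotype string (e.g. '0/1')."""
    for key, wes_gt in sorted(wes_calls.items()):
        wgs_gt = wgs_calls.get(key)
        if wgs_gt is None:
            yield key, "WES-only: check WGS coverage before scoring a false positive"
        elif wgs_gt != wes_gt:
            yield key, f"discordant genotype: WES {wes_gt} vs WGS {wgs_gt}"
        else:
            yield key, "concordant: platform-confirmed"

wes = {variant_key("chr1", 123456, "G", "A"): "0/1"}
wgs = {variant_key("chr1", 123456, "G", "A"): "0/1"}
for key, status in cross_validate(wes, wgs):
    print(key, status)
```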

Medical interpretation of variant lists

The most frequently reported tools for annotating variants were ANNOVAR [44] (52%), in-house developed software (17%), and Ingenuity (Redwood City, CA, USA) (12%). Other tools reported were Variant Tools [45], KggSeq [46], SG-ADVISER (Scripps Genome Annotation and Distributed Variant Interpretation Server, La Jolla, CA, USA), Genome Trax (Wolfenbüttel, Germany), VAAST (Variant Annotation and Search Tool) [47], Omicia Opal [48], MapSNPs [49], in-house pipelines, and combinations thereof. There was a large variety of annotation sources (see Table 4), including but not limited to: OMIM [50], Uniprot [51], SeattleSeq [52], SNPedia [53], NCBI ClinVar, PharmGKB [54], Human Gene Mutation Database [55], dbNSFP [56], and in-house annotations. Notably, most teams (14/20, 70%) performed their own curation of annotations, for example, by performing a medical literature review or by checking for errors in externally accessed databases. Thus, a manual review of annotations was deemed necessary by most contestants. Many teams considered the family pedigree structure an important input for evaluating variants, as it allowed identification of potential de novo mutations, filtering for dominant inheritance in Family 2, and verification of Mendelian segregation and parental carrier status for recessive mutations. This function was largely performed manually; use of automated tools such as the GATK module ‘Phase by Transmission’ was considered by some groups, although the underlying structure of the SOLiD data led to problems with the analysis.
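
The segregation logic itself is mechanical once genotypes are trusted. A minimal sketch, with genotypes coded 0/1/2 (hom-ref/het/hom-alt), hypothetical sample labels, and complications such as incomplete penetrance deliberately ignored:

```python
def fits_dominant(genotypes: dict) -> bool:
    """Keep a candidate under a dominant model: every affected individual
    carries the alternate allele and no unaffected individual does."""
    return (all(gt >= 1 for s, gt in genotypes.items() if s.startswith("affected"))
            and all(gt == 0 for s, gt in genotypes.items() if s.startswith("unaffected")))

def fits_recessive_trio(child: int, mother: int, father: int) -> bool:
    """Classic recessive trio pattern: affected child homozygous for the
    alternate allele, both unaffected parents heterozygous carriers."""
    return child == 2 and mother == 1 and father == 1

# Dominant pattern of the Family 2 kind: affected cousins share a het variant.
print(fits_dominant({"affected_proband": 1, "affected_cousin": 1,
                     "unaffected_sibling": 0}))   # True
print(fits_recessive_trio(child=2, mother=1, father=1))  # True
```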

Reasons given for why teams did not report each of the likely pathogenic variants in Families 1 and 2 varied by gene and by team, but in many instances were due to decisions made during the medical interpretation phase of analysis. Of the 15 teams that did not report the TTN variants and for which survey data were available, three generated variant calls that failed to identify them. Twelve groups reported that their variant callers identified the two variants, but in six of these, automatic filters eliminated the gene from further consideration because the frequency of potentially pathogenic variants in this enormous gene was considered too high for it to be credible as a likely disease gene. Of the six instances where the automated pipelines reported the variants as potentially pathogenic, five were subsequently eliminated manually because the medical consultants either lacked the relevant clinical expertise or discounted the published association with cardio- or skeletal myopathy, given the high frequency of missense changes in the normal population. Notably, in none of the exclusions based on the gene’s high degree of heterogeneity was a distinction made between predicted truncating mutations, which are much rarer, and the more common missense changes. In one instance, a simple programming error prevented TTN from rising to the top of the candidate gene list in an automated expert system, and subsequent correction of this mistake resulted in a correct call of likely pathogenicity for the TTN variants in Family 1.
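
The missed distinction is easy to encode. A sketch of gene-level triage that separates predicted truncating variants from the missense background (consequence terms follow common annotator vocabulary such as the Sequence Ontology; the frequency cutoff is illustrative):

```python
# Predicted loss-of-function consequence classes, which remain rare even in
# large, polymorphic genes such as TTN.
TRUNCATING = {"stop_gained", "frameshift_variant",
              "splice_acceptor_variant", "splice_donor_variant"}

def triage(variant: dict) -> str:
    """variant: {'gene': str, 'consequence': str, 'max_af': float}"""
    if variant["max_af"] > 0.01:          # too common for a rare disease
        return "filtered: common"
    if variant["consequence"] in TRUNCATING:
        return "review: predicted truncating (rare even in large genes)"
    return "review: missense (weigh against the gene's background variation)"

print(triage({"gene": "TTN", "consequence": "frameshift_variant",
              "max_af": 0.0001}))
```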

Seventeen teams reported not flagging the GJB2 mutations as likely causative for hearing loss in the proband of Family 1. Remarkably, the variant callers employed by ten teams failed to identify these changes, even though seven of these teams used GATK, SAMtools, or both. Among the remaining seven teams, two ignored the findings because they were considered irrelevant to the ‘primary phenotype’ of skeletal myopathy, and two reported lacking the clinical expertise necessary to recognize that hearing loss was a distinct phenotype. The remaining three teams reported that one of the previously published known pathogenic variants was automatically filtered out due to its high minor allele frequency in normal populations.
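
One defensive pattern against this failure mode is to consult a curated pathogenic list before applying any frequency cutoff, since some well-established recessive disease alleles are relatively common in the general population. A minimal sketch with hypothetical data structures (a real pipeline would populate the set from ClinVar or a locus-specific database):

```python
MAF_CUTOFF = 0.005  # illustrative rare-disease frequency threshold

# Hypothetical curated set of (gene, cDNA-level HGVS) known pathogenic alleles;
# the entry below is a placeholder, not a variant from this study.
KNOWN_PATHOGENIC = {("GJB2", "c.PLACEHOLDER")}

def keep_variant(gene: str, hgvs_c: str, maf: float) -> bool:
    """Frequency filter that never discards a curated pathogenic allele."""
    if (gene, hgvs_c) in KNOWN_PATHOGENIC:
        return True
    return maf <= MAF_CUTOFF

# A known pathogenic allele survives even above the cutoff:
print(keep_variant("GJB2", "c.PLACEHOLDER", maf=0.01))   # True
print(keep_variant("GJB2", "c.OTHER", maf=0.01))         # False: filtered
```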

The TRPM4 variant in Family 2 was clinically reported by 13 of the 23 teams. Only two teams cited failure of their variant callers to identify this mutation, but five more reported that the variant was discarded due to poor quality data (low depth and a noisy location with multiple non-reference alleles in the SOLiD data) in one or more of the individuals, which led to inconsistent calls among the different affected family members. Two groups failed to recognize the likely pathogenicity of this variant: one reported it as a variant of unknown significance, while the other’s computational genetic predictive scoring simply failed to weight this gene highly enough to pass the cutoff, given the phenotypic parameters entered. The remaining group identified the TRPM4 variant but strongly favored another variant, in the NOS3 gene, as a better explanation for the structural heart defects.

Pathogenicity prediction of missense variants

The most common tools used to determine the effect of amino acid substitutions on protein function for missense mutations were SIFT [57] and PolyPhen [49]. While 80% of teams used both SIFT and PolyPhen to predict pathogenicity, there was no significant difference in success between the teams using both tools and those who used one or the other, or some other tool entirely. Other tools listed by teams were PhyloP [58], likelihood ratio test (LRT) scores [59], MutationTaster [60], GERP [61], and in-house developed tools. Also of note: 45% of teams attempted to assess the statistical confidence of assignments of pathogenicity (63% of finalists). Methods named included custom in-house methods (N = 3), considering gene size (N = 2), utilizing known predictions of pathogenicity (N = 3) and allele frequencies (N = 2), assessing commonly mutated segments (N = 2), and using true positive and neutral datasets within a Bayesian framework (N = 1).
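
A sketch of such a consensus call, using the conventional score orientations (SIFT scores near 0 are deleterious, commonly cut at < 0.05; PolyPhen-2 scores near 1 are damaging, commonly cut at > 0.85 for ‘probably damaging’); the cutoffs are conventions, not validated thresholds:

```python
def consensus(sift: float, polyphen: float) -> str:
    """Combine two missense predictors; discordance triggers manual review."""
    sift_deleterious = sift < 0.05
    polyphen_damaging = polyphen > 0.85
    if sift_deleterious and polyphen_damaging:
        return "concordant: predicted damaging"
    if not sift_deleterious and not polyphen_damaging:
        return "concordant: predicted tolerated"
    return "discordant: manual review advised"

print(consensus(sift=0.01, polyphen=0.98))  # concordant: predicted damaging
print(consensus(sift=0.30, polyphen=0.95))  # discordant: manual review advised
```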

Use of splice prediction tools is particularly important, as approximately 14% to 15% of all hereditary disease alleles are annotated as splicing mutations [55]. Groups that utilized a suite of splice prediction tools, such as the maximum entropy model MAXENT [62], ExonScan [63] or positional distribution analysis [64, 65], were more likely to have identified potentially pathogenic mutations, particularly in the TTN gene in Family 1.
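
Short of running a dedicated predictor, even a crude proximity flag catches the candidates worth rescoring. A sketch that marks variants falling near annotated exon boundaries (the window size and coordinates are illustrative; a real pipeline would take exon intervals from a transcript annotation such as a RefSeq or Ensembl GTF):

```python
SPLICE_WINDOW = 8  # +/- bp around each exon edge worth splice-aware rescoring

def near_splice_site(pos: int, exons: list) -> bool:
    """exons: list of (start, end) genomic intervals for one transcript."""
    return any(
        abs(pos - edge) <= SPLICE_WINDOW
        for start, end in exons
        for edge in (start, end)
    )

exons = [(100, 250), (400, 520)]      # toy transcript
print(near_splice_site(395, exons))   # True: 5 bp into the intron, flag it
print(near_splice_site(175, exons))   # False: deep inside an exon
```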

It was well recognized by all groups that allele frequency is an important consideration in assessing pathogenicity (though specific cutoffs were not mentioned). All groups also agreed that conservation of amino acid sequence across species is useful for interpretation of missense variants. Half of the teams (63% of finalists) took advantage of the whole genomic sequences to analyze non-coding variants, but none of the teams reported potentially pathogenic changes in deep intronic or intergenic regions, even for Family 3, likely due largely to the undefined and uncertain status of such variants. Of the teams that reported methods for predicting pathogenicity of non-coding variants, the most frequently used methods were splicing prediction algorithms (85%) and transcription factor binding site prediction (46%), with 23% also considering changes in known promoter/enhancer elements, and one team each assessing evolutionary conservation, DNase hypersensitivity sites and microRNA-binding sites.

Medical interpretation and correlation of pathogenic variants with the clinical presentations

Almost all entrants performed a clinical correlation at the level of a single general diagnosis, such as ‘myopathy’, ‘centronuclear myopathy’ or ‘nemaline myopathy’, matched against a list of predetermined candidate genes. From a clinical perspective, this reduces clinical diagnostic decision support to a list or panel and counts on that subset being complete for maximum sensitivity. However, in the case of Family 1, for example, the likely pathogenic gene was not generally recognized as causative for centronuclear myopathy at the time of the contest. In contrast, one entrant used clinically driven diagnostic decision support [66], in which the clinical analysis was carried out based on a description of the patient’s various pertinent positive and pertinent negative findings, including their age of onset. This was then paired with the genome analysis in a way that used a novel pertinence calculation to find the one or more genes, among those with described phenotypes, that best explain the set of pertinent positive and negative findings [66]. As they become refined and validated, such automated approaches will be a critical aid for reducing analysis times to the manageable levels necessary to support the higher throughputs required in a clinical diagnostic setting. Indeed, the reported person-hours required for medical interpretation ranged from 1 to 50 per case, with the automated approach requiring less than 4 hours on average.

Attitudes and remarks

As noted above, three teams were unable to read the data formats provided and did not submit complete entries. This likely reflects the unique nature and format of the SOLiD and Complete Genomics data and suggests that greater adoption of standard formats (FASTQ, SAM/BAM and VCF) by bioinformatics tools is required.

We observed that finalists were significantly more likely to express a preference for generating their own sequencing data instead of having it generated by an external sequencing provider (75% versus 27%, P = 0.041). The main reason expressed for in-house data generation was control over the sequencing process to ensure production and assessment of high quality data. Other reasons expressed included cost, turnaround time, and ability for reanalysis. This preference may also reflect a tendency for the most experienced groups to have legacy capacity to generate sequence data, and thus a bias towards using it. However, it also raises the reasonable possibility that integrated control of the process from sequence generation through variant calling is important for producing the highest quality variant calls.

Overall, when asked for the reasons behind their preferred sequencing technology, teams mentioned accuracy and standardized software tools, highlighting the need for standard methods and tools for primary bioinformatic analysis. Furthermore, the majority of teams (13/18) felt that NGS should be combined with classical techniques (e.g. Sanger sequencing and PCR methods) for confirmatory testing in clinical situations. However, a few recognized that, with increasing depth of coverage and accuracy of alignment, NGS, particularly of less complex libraries such as gene panels and possibly exomes, has the potential to be utilized as a stand-alone test once QC studies demonstrate sufficient concordance with traditional methods.

Interestingly, all four of the finalists that did not report low-coverage or uncallable regions stated that they planned to begin doing so, and one of the non-finalists mentioned that they were going to add coverage quality to their reports. Regions in which sequencing-technology- or reference-genome-specific difficulties exist are important considerations for accurate variant detection. Moreover, it is critical to report the locations in which variant calling is not possible due to gaps in coverage.

Teams had different opinions on the level of coverage they felt was necessary for accurate variant calling from NGS of whole genomes. The finalists felt that a higher level of coverage was necessary (59× on average) than did the rest of the teams (38× on average). Similarly, the finalists differed on the coverage required for whole exomes (74× versus 49×) and gene panels (121× versus 69×).

A large majority of the teams used SIFT and PolyPhen to predict the pathogenicity of a variant, which is a sound strategy given that the programs do not always agree in their predictions and that, for both, specificity is reported to be high but sensitivity low [67].

When asked about their process used to validate pathogenicity predictions, 58% of teams reported that they did not use any validation method, or did not have any datasets to compare estimates against. The finalists were more likely to have had in-house datasets to work against, which may be due to differences in analytical resources that could be devoted to this problem. Overall, this process was reported as manual for the majority of the teams.

The diversity of approaches to preparing the contest entries made direct comparisons of methods difficult, so the post-contest survey was designed to elicit a more homogeneous dataset. Nevertheless, several contestants neglected to respond to some of the questions, and the responses to others were variable, indicating some confusion on the part of respondents regarding the intent of the query.

Criterion 2: were the methods used efficient, scalable and replicable?

There are still manual elements in many pipelines that inhibit scalability. For an average case, teams reported that the interpretation process ranged from 1 to 50 hours (mean 15 ± 16 hours). For the CLARITY challenge, the time spent was much greater: each case took from 1 to 200 hours (mean 63 ± 59 hours). The average CPU time required for the analyses was difficult to estimate, as contestants utilized different approaches and not every entry was normalized for the number of parallel processors, but contestants reported utilizing 306 ± 965 CPU hours per case (range 6 to 8,700 hours). Reported costs to run the pipeline also varied considerably, ranging from USD 100 to USD 16,000 (average USD 3,754 ± 4,589), though some contestants were unable to calculate salary costs, leading to some lower estimates. Although costs have fallen dramatically and computational resources are becoming increasingly available, the requirement for manual curation and interpretation of variant lists remains a considerable barrier to scalability, which could inhibit widespread use of NGS exome and genome diagnostics in the clinic if well-validated and substantially automated annotation tools do not emerge.

Criterion 3: was the interpretive report produced from genomic sequencing understandable and clinically useful?

Consent and return of results

When asked in the survey about their approach to consent and the return of results, teams’ responses varied considerably. The question was irrelevant for a number of contestants (9/21) whose activities were restricted to research or contract sequencing without direct patient contact. Finalists were more likely to ask patients undergoing WES/WGS to sign a specific consent form or to provide them with specific explanatory materials for the methodology (P = 0.057). Finalists were much more likely to detail how they would handle incidental (i.e., unanticipated) results (P = 0.002). However, only 35% of teams reported that their consent materials include an option for patients to express their preferences around the return of incidental results. Most teams (76%) reported that they did not provide examples of consent and/or explanatory materials for patients with their CLARITY submissions, and since patient interaction was not allowed for the challenge, a number of contestants simply considered the issue moot. However, upon reflection, many teams agreed that including consent and explanatory materials would have strengthened their entries.

Overall, it is notable that most teams’ submissions did not include specific consent and explanatory materials, did not detail a predetermined approach for handling incidental results, and did not describe any options for patient preferences. In some cases, survey responses indicated that such materials and plans are used in practice but were not included in the CLARITY Challenge submission because it was not clear that such content was in the scope of the challenge. In other cases, teams reported that they have not developed these materials and plans or they do not routinely focus on this aspect of the process. These findings highlight the fact that these components, though they are essential for the patient-facing implementation of clinical sequencing, are not consistently prioritized or highlighted by many groups involved in the clinical use of NGS.

Reporting methods

Reporting methods were not uniform amongst teams. Reporting the accession number for cDNA reference sequences was significantly more frequent among finalists than non-finalists (87% versus 22%, P = 0.009). However, teams did converge on some items: reporting zygosity was standard, with 88% of responding teams doing so, and the genome build was specified by 72%. That said, genome build reporting was problematic even among the winning teams; two of the finalists submitted elegant reports, clearly stating the variants found and summarizing the location, classification and parental inheritance, with a short interpretation (Figure 1). However, the accession numbers reported differed: each report used a different genome build without specifying it, so it would take considerable effort to discern whether the two reports were truly referring to the same variants.
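
The fix is structural rather than clever: make the genome build and a versioned transcript accession mandatory fields of every reported variant. A sketch of such a record (all values below are placeholders, not variants from this study):

```python
from dataclasses import dataclass

@dataclass
class ReportedVariant:
    build: str        # e.g., "GRCh37" -- without this, coordinates are ambiguous
    chrom: str
    pos: int
    ref: str
    alt: str
    transcript: str   # versioned accession, e.g., "NM_000000.3" (placeholder)
    hgvs_c: str
    hgvs_p: str
    zygosity: str

    def report_line(self) -> str:
        """One unambiguous, cross-report-comparable line per variant."""
        return (f"{self.build} {self.chrom}:g.{self.pos}{self.ref}>{self.alt} | "
                f"{self.transcript}:{self.hgvs_c} ({self.hgvs_p}) | {self.zygosity}")

v = ReportedVariant("GRCh37", "chr1", 123456, "G", "A",
                    "NM_000000.3", "c.100G>A", "p.Gly34Ser", "heterozygous")
print(v.report_line())
```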

Figure 1 Representative clinical reports from two of the finalist teams (A and B). Desirable elements include subject demographics, indication for testing, use of HUGO-approved gene symbols, specification of the relevant variants at the genomic DNA, cDNA and protein levels including reference sequences and dbSNP identifiers, description of zygosity, estimation of insufficient coverage for candidate genes, and a succinct clinical interpretation and interpretative summary. Note that the use of different reference sequences, and the lack of specification in (B), makes direct correlation between the reports difficult.

Clinical reports

Finalists were more likely to present a clinical summary report with their entry, with the trend approaching significance (100% versus 69%, P = 0.089). Perhaps in response to recently published guidelines [68], there was striking concordance in interpretation and reporting philosophy, with all finalist and most non-finalist teams gearing their reports towards a clinical geneticist, genetic counselor or non-geneticist clinician. Almost all teams agreed that a non-geneticist clinician should be the target audience of clinical summary reports (75% of finalists and 89% of non-finalists). Finalists were more likely to feel that their clinical summary report could be used in clinical care (100% versus 67%, P = 0.08), though there was overall agreement that it was important that NGS studies produce a clinical summary report that can be implemented in the clinic (95% ranked this as ‘important’ or ‘extremely important’). Most of the teams (80%) filtered their variant list by relevance to phenotype, with more successful teams more likely to do so (P = 0.074). All teams but one finalist (95%) agreed that filtering the variant list by relevance to phenotype is an appropriate method for communicating information to clinicians.

It is still not commonplace to consult with an expert physician during report preparation, but doing so clearly correlated with success. Only 61% of teams routinely consult with a medical doctor in a relevant disease area, and finalists were significantly more likely to involve clinicians on a regular basis (100% versus 36%, P = 0.001). Perhaps relatedly, in their reports prior to the survey, all but one of the finalists considered the hearing loss to be a phenotype separate from the myopathy in Family 1, while only 36% of the less successful teams did (P = 0.059). Of those who treated it as a separate phenotype, 75% of finalists and 63% of non-finalists pursued its genetic basis.