Abstract The role of rare genetic variation in the etiology of complex disease remains unclear. However, the development of next-generation sequencing technologies offers the experimental opportunity to address this question. Several novel statistical methodologies have been recently proposed to assess the contribution of rare variation to complex disease etiology. Nevertheless, no empirical estimates comparing their relative power are available. We therefore assessed the parameters that influence their statistical power in 1,998 individuals Sanger-sequenced at seven genes by modeling different distributions of effect, proportions of causal variants, and direction of the associations (deleterious, protective, or both) in simulated continuous trait and case/control phenotypes. Our results demonstrate that the power of recently proposed statistical methods depend strongly on the underlying hypotheses concerning the relationship of phenotypes with each of these three factors. No method demonstrates consistently acceptable power despite this large sample size, and the performance of each method depends upon the underlying assumption of the relationship between rare variants and complex traits. Sensitivity analyses are therefore recommended to compare the stability of the results arising from different methods, and promising results should be replicated using the same method in an independent sample. These findings provide guidance in the analysis and interpretation of the role of rare base-pair variation in the etiology of complex traits and diseases.

Author Summary There is now evidence that rare variants can contribute to the etiology of complex disease. Next generation sequencing technologies have enabled their detection in large cohorts, and new statistical methods have been proposed to ascertain their association with complex diseases and traits in order to improve power over single-marker analysis. Each of these new methods assumes a particular nature of the relationship between rare variants and complex disease, yet these hypotheses have been largely unverified. Therefore we sought to compare the power of commonly used and novel statistical methods for rare variants using Sanger sequencing data from 1,998 individuals sequenced at 7 genes by simulating several phenotypes under models spanning a spectrum of the common hypotheses concerning such associations. While all methods perform reasonably well under their own model-specific hypotheses, no single method gives consistently acceptable power when these hypotheses are violated. Unlike GWAS, wherein all variants can often be tested using the same method across the entire genome, the analysis and interpretation of sequencing studies will therefore be considerably more challenging.

Citation: Ladouceur M, Dastani Z, Aulchenko YS, Greenwood CMT, Richards JB (2012) The Empirical Power of Rare Variant Association Methods: Results from Sanger Sequencing in 1,998 Individuals. PLoS Genet 8(2): e1002496. https://doi.org/10.1371/journal.pgen.1002496 Editor: Eleftheria Zeggini, Wellcome Trust Sanger Institute, United Kingdom Received: May 26, 2011; Accepted: December 8, 2011; Published: February 2, 2012 Copyright: © 2012 Ladouceur et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This work was support by grants from the Canadian Foundation for Innovation, the Canadian Institutes of Health Research (CIHR), Fonds de la recherche en santé du Québec, Ministère du Développement économique, Innovation et Exportation du Québec, the Lady Davis Institute and the Jewish General Hospital. ML, JBR, and ZD are supported by the CIHR. Data were provided by GSK. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction There is growing evidence that rare variants contribute to the etiology of complex diseases [1], [2], [3], [4]. A striking difference in the distributions of the odds ratios (ORs) for common and rare variants has been illustrated in a wide range of recent publications, favoring higher ORs for some rare variants (reviewed elsewhere [5], [6], [7]). As well, it has been demonstrated that rare coding variants associated with complex traits are sometimes causal through amino acid substitution [3], [8], [9]. For these reasons, rare variants hold promise as a source of heritability which is not explained by common base-pair variants. Identifying rare variants associated with disease requires large sample sizes since few individuals harbor such polymorphisms. In addition, for rare variants, the power of single-marker tests, such as those performed by genome-wide association studies (GWAS), is poor. Development of alternative methods is thus essential. Over the past two years, a growing body of methods [2], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20] seeking to overcome this limitation has emerged. These methods generally employ three main strategies: collapsing markers across a region, weighting and/or prioritizing markers, and distribution-based approaches. Li and Leal [20], for example, proposed a method to collapse rare variants across a region. This and other collapsing methods are based upon the hypothesis that low-frequency variants are rare, but in aggregate, they may be common enough to account for variation in common traits. Under such models, it is assumed that the probability of being diseased increases with the number of rare minor alleles. However, this might not always be the case [21]. Weighting methods assign more importance to alleles based on many possible criteria, such as minor allele frequency (MAF) in the control population [17], or possible alterations in protein function, including measures produced by SIFT and Polyphen2 [11], [22]. More recently, methods examining changes in distributions associated with rare variants [2], [23] have been proposed. Liu and Leal [2] based their novel method on multi-locus genotypic configurations, where each unique pattern of genotypes is tabulated, and the associated risk of disease for each configuration is modeled using a mixture distribution. Liu and Leal refer to their method as a kernel-based approach (KBAC), since part of the mixture distribution is modeled by nonparametric kernel density estimation. Neale et al. [23] showed that a test of association can be based on binomial over-dispersion of variance, conditional on the number of rare variants present in a region. Another innovative and flexible method has been developed by Wu et al [24]. These authors proposed the sequence kernel association test (SKAT), a supervised, flexible, and computationally efficient regression model (with the possibility of adjusting for covariates), to test the association between rare and common variants and traits or disease status. SKAT is similar to a classical mixed model, and is based on a score test for non-zero variance associated with the effects of all the rare variants under consideration. These recently proposed models have often relied upon unverifiable (and sometimes unnecessary) hypotheses in order to simulate sequence data. Certainly, simulation of large sets of sequence data is a complex task and depends on hypotheses concerning the evolution of human genomic regions. The validity of any particular set of evolutionary hypotheses is unlikely to be consistently true across the [4] genome, as each gene demonstrates a large variance in these parameters [25], [26]. The performance of these newly proposed models using real sequence data in a large sample has not been independently evaluated. We therefore tested the power of commonly-used statistical methods designed to assess the impact of rare variants on continuous and dichotomous traits in 1,998 individuals Sanger-sequenced at seven genes. We employed a variety of possible relationships between genotype and phenotype in order to fully investigate the performance of such models under different realistic scenarios. We selectively chose some of the recently proposed statistical methods for rare variant association. These included: collapsing methods (with and without a variable minor allele frequency [MAF] threshold for defining rare variants), a weighting method (which assigns weights variants inversely proportional to their MAF), a variance-based approach [2], [11], [17], as well as a regression method using the Kernel association test (SKAT) [24]. We used the software provided by [11] to implement the collapsing and weighting methods. Four models were first investigated: a collapsing method using a threshold of 1% (T1) and 5% (T5), a weighted approach (WE), and a variable-threshold approach (VT) (see http://genetics.bwh.harvard.edu/rare_variants). (Note that while the WE method was implemented by [11], the model was proposed by Madsen and Browning [17]). In addition, we developed an approach for detection of rare-variant association with continuous traits that was inspired by KBAC [2], that we call “weighted outlier detection” (WOD). Two different MAF thresholds were applied to this new WOD method at 1% (WOD1) and 5% (WOD5) (see Text S1 for details). The last method we tested is the regression model (SKAT) developed by [24]. The relative power of each of these methods was then compared assuming different possible relationships between rare variants and continuous traits or disease status.

Discussion While many large-scale sequencing studies are now underway to identify rare variants associated with complex diseases and traits, our results demonstrate that assessing the association between rare variants and complex disease is a challenging task. Standard single-marker association methods exhibit low power and the power of the statistical methods tailored for rare variants varies tremendously depending on the true nature of the relationship between the rare genetic variants and the phenotype. These findings provide guidance in the design, analysis and interpretation of sequencing studies for complex disease. As it is still unknown how rare variants influence complex disease, we have simulated several phenotypes under models spanning a spectrum of the common hypotheses concerning such associations. It is likely that the nature of the relationship between rare variants and a phenotype varies from gene-to-gene. Our findings suggest that no single method gives consistently acceptable power across the range of these relationships, even in a large sample size. Analysis using different methods clearly imposes an additional multiple testing burden, which cannot be easily addressed. One, though somewhat cumbersome, way to solve this problem would be by derivation of empirical P-values taking into account the variety of methods tested. Another, more straightforward, approach would be to undertake replication in an independent sample, using the method which demonstrated best results at the discovery stage. In this paper, we have also developed a new method conceptually based on Liu and Leal's KBAC method [2] to detect the association between rare variants and quantitative traits. Our extension of [2] is implemented in R and is available from the authors. We have also developed a simulation framework to compare all major novel statistical methods to identify the contribution of rare variants to continuous phenotypes under identical conditions. Our new approach performs poorly if all rare variants act in the same direction, but performs well when variants can either increase or decrease phenotype and have large effect. We note that the presence of randomly assigned rare variants of smaller effect in size, all tests have a distribution of test statistics that follows the null distribution (see Text S1). Collapsing methods demonstrate increasing power when the trait varies with an increasing number of rare alleles. However, examples exist where protective and deleterious rare alleles are present in a gene [21], and in such situations, collapsing methods do not perform well. On the other hand, SKAT and WOD performed extremely well compared to other methods in the continuous traits scenarios, and dichotomous traits (SKAT only) scenarios, respectively. SKAT in particular, was the only method that performed well for dichotomous traits when variants could be protective or deleterious. Methods like WE that assign more weight to rarer alleles are promising, but only if the gene harbors several causal variants whose effects are each inversely proportional to their MAF. However, we note that the VT method still outperforms WE even when employing this assumption. Our study also provides empirical data to judge the value of dichotomizing a continuous trait and sequencing only its extremes. While our design included the extreme quarters of the distribution, thereby eliminating the need to sequence half the study population and consequently reducing sequencing costs substantially, we note that power was similar to that derived from the entire distribution particularly only when the proportion of causal variants was high and the effect sizes moderate. Nonetheless, sampling of the extremes remains an attractive study design, particularly if the sampled population is large and a more extreme sub-population is selected. Methods have been proposed to weight the relative importance of rare variants based on various parameters including their estimated deleterious effect on protein function [17], [27]. For example, the incorporation of estimated functional information, such as the potential effect of an amino acid change as estimated by Polyphen or SIFT, might improve power. However, these scores have been criticized for their high level of misclassification [22]. Moreover, functional prediction is more challenging when the variants are non-coding. The spectrum and frequencies of rare genetic variants are known to depend on ancestry and age of the population studied [28]. In this work, we have assumed that our sample consists of a homogeneous population without stratification into population subgroups. All the methods that we have examined could find false associations if population sub-strata existed and were associated with the phenotype, therefore particular attention must be paid to population structure when designing rare variant studies. One of the strengths of our study is the use of Sanger sequencing data, rather than simulated genotyping data. We have been able to avoid the simulation of such data by using fully Sanger-sequenced data on nearly 2,000 individuals at seven genes. Therefore, no genotypic hypotheses were made to generate the sequence data. Furthermore, the sample size employed is among the largest sequenced datasets in the world at present. Despite the fact that gene 3 had more missing data and fewer variants, we note that the power results derived from this gene are similar to all other genes. We note that our simulations assumed no additive effects when an individual carries multiple rare variants. However, we note that very few individuals carry 2 or more rare variants (Table 3). In addition, we assumed that rare variant effects take precedence over common variant effects. In light of our results, we recommend that single-marker tests should not be used alone when rare variants are present and are assumed to have small-to-moderate effects on the trait of interest. On the other hand, as power across all novel rare variants methods is generally low, the potential for identifying rare variant associations using gene-based analysis strategies requires improvement. Ideally, the true underlying nature of the association between the gene and the phenotype should determine the choice of statistical method, however, this relationship is almost always unknown. Therefore, performing sensitivity analyses, i.e., assessing different methods that perform differently under various conditions might be a helpful option in order to interpret the results. Furthermore we suggest that if one method identifies a gene of interest that replication of this result should be performed in an independent sample using the same statistical method. All methods seemed to perform adequately under their specific model hypotheses, but do not perform as well when these hypotheses are violated. In the next few years, advances in sequencing technology will enable the production of large quantities of sequence data on large numbers of individuals, allowing for the cost-effective identification of rare variants. These data will enable researchers to investigate the role that rare variants play in disease etiology, in addition to uncovering functional variants. Our results may provide guidance in the planning, analysis and interpretation of these large-scale initiatives.

Materials and Methods Ethics statement The work described in this manuscript represents a re-use of data and no new human interventions were conducted. No additional IRB approvals were sought for this specific portion of the work. The Committee on Ethics in Clinical Research, CHUV, Lausanne University, Lausanne, Switzerland approved the original protocols for sample collection. Study sample The subjects used in this paper are a subset of the CoLaus study, a population-based study of 6,188 Lausanne residents aged 35 to 75 years [29]. Sanger sequencing data Sanger sequence data for the exons and flanking regions of seven genes including PLA2G7 from 1,998 individuals were provided by GlaxoSmithKline (GSK). Methods for performing the sequencing for the PLA2G7 gene and the additional 6 genes have been described [30]. The identity of the remaining genes was not disclosed for proprietary reasons. Sanger sequencing has a low error rate and is considered a gold-standard for comparison to high-throughput sequencing studies [31], [32]. For simplicity, and since rare variants are not expected to be in high linkage disequilibrium (LD) with surrounding variants, we imputed the missing values of each rare variant independently from others based on the computed MAF. The percentage of missing genotypes per variant in a gene ranged from 3% to 11%, with an average of 5.5% individual missing genotype information per variant, across all genes (Table 2). All non-polymorphic base-pair markers were removed from the sequence data. All seven genes contained both rare and common variants: the number of polymorphic variants ranged from 29 to 128, and the proportion of variants with a MAF≤1% ranged from 81% to 93%. The majority of these variants were extremely rare, with an average of 55% of all variants across all genes being singletons. Table 2 and Table 3 describe the allelic frequencies, and rare variant distribution of all seven genes. We used these known genotypes combined with phenotype simulations to compare several commonly-used and novel statistical methods developed for rare variants and continuous phenotypes. Parameters influencing rare variant associations with complex traits We developed two simulation sets to illustrate the power of a variety of commonly-held hypotheses about the possible effects of rare variants on complex traits. In the first set, we tested collapsing and weighting designs and a range of general concepts about the potential role of rare variants, whereas in the second set, we varied the effect and the proportion of causal variants in across a grid of values. Control simulation sets We proposed different phenotype simulation scenarios to explore popular hypotheses regarding the mechanism by which rare variants could influence complex disorders, namely (a) the assumption that risk of disease increases with more rare alleles (collapsing design), (b) the assumption that the magnitude of the effect depends on MAF (such as equation (1) in [17] for the weighting design), and (c) performance when a mixture of deleterious and protective causal rare variants influences phenotypes (Table 1). Here we describe the motivation behind our choice of scenarios. Scenario 1, the null model, contains no causal variants. Scenario 2 assumes that any rare variant increases the risk of disease, which reflects the hypothesis underlying many of the proposed statistical methods. Scenario 3 investigates a mixture of common and rare causal variants, Scenario 4 investigates a mixture of deleterious and protective effects, and Scenarios 5 and 6 explore the assumption that variants with lower MAF have larger effect. In these cases, the effects were derived from equation 1 in [17]. In our simulation of phenotypes, the following rules were applied in all scenarios. We assumed that all non-carriers of a causal allele (deleterious or protective) variant have a normally distributed trait with mean zero and variance of one, using a standard normal random variable. When one or more common variant(s) is/are assumed to have a deleterious effect, and an individual is carrying at least one of these causal alleles, we randomly drew a phenotypic value from a normal distribution having a mean of −0.07 and a standard deviation of 1.01, which allows for an effect typically identified in GWAS studies of continuous traits [33], [34], [35]. When a rare variant is assumed to be deleterious, carriers of at least one rare causal allele had a phenotypic value randomly sampled from a normal distribution with mean at −1.64, and standard deviation of 0.2. Relative to the phenotype distribution of individuals with no causal variants, these means correspond to the bottom 5% of the distribution. Similarly, to model protective effects of a rare variant, the assigned effect was normal with mean +1.64 and a standard deviation of 0.2. Such effects for rare variants have been observed in the lipid literature [36], [37]. Deleterious variants were randomly sampled from the pool of variants for each simulation. Rare variants were defined as those having a MAF 1%, and common variants were defined as >1%. While other thresholds can be used, GWAS have often used a 1% threshold to define rare variants [35]. We allowed all rare variants to be possibly causal, including singletons. Table 1 summarizes the parameters investigated. By varying hypotheses about the sampling of causal variants and their effect, we created these 6 simulation scenarios. We randomly generated a set of 250 phenotypes per individual, per scenario, per gene. In each case, we randomly selected causal variants associated with the traits, and then randomly generated a set of phenotypes based on the corresponding parameters for each iteration. Systematic simulation sets In our second series of simulations, we varied the proportion of causal rare variants and their average effect on the phenotype across a grid of values, i.e., where proportions (10, 15, 20, and 30%) of causal rare (MAF≤1%) variants were combined with values (0.5, 0.75, 1, 1.25, 1.5, 2, and 2.5 standard deviations) for the mean effects. We also report in Table 4 the proportion of variance explained by rare variants for each combination of proportion of causal rare variants and their effect. An individual carrying at least one rare causal allele has their phenotype value chosen randomly from a normal distribution with one of these seven means and with a standard deviation of 0.2. All 28 combinations between the proportion of causal (four values) and effect (seven values) were simulated for the seven genes. Two hundred and fifty sets of phenotypes were generated. Multiple-testing was taken into account for single-marker test analyses, using a conservative approach with Bonferroni correction for the number of single-nucleotide polymorphisms (SNPs) tested. As for other rare variant methods, permutation was used to control for type-I error in all statistical methods. Alpha level was set to 0.05. We also simulated dichotomous phenotypes by assuming selection from the extremes of a quantitative distribution. In each of the 196 scenarios presented above for continuous traits, we have defined cases as being the 500 individuals with the lowest continuous phenotypes, and the controls as being the 500 individuals with the highest continuous phenotypes. This study design allows direct comparison of the relative utility of sequencing only the extremes of a distribution, as compared to the entire distribution, which has considerable financial ramifications

Supporting Information Text S1. Details on the WOD method, additional details on how the phenotypes were simulated for each scenario, and QQ plots under the null hypothesis for all methods, and for genes 6 and 7. https://doi.org/10.1371/journal.pgen.1002496.s001 (DOCX)

Acknowledgments The authors are indebted to GSK for the provision of data. The authors thank GlaxoSmithKline and co-PIs of the CoLaus study, Gerard Waeber and Peter Vollenweider, for the use of the resequencing data, as well as Drs. Matthew R. Nelson and Margaret G. Ehm for their helpful suggestions. We would also like to thank Ms. Renee Atallah for her efforts with the writing and correction of the manuscript.

Author Contributions Conceived and designed the experiments: ML YSA CMTG JBR. Performed the experiments: ML CMTG. Analyzed the data: ML. Contributed reagents/materials/analysis tools: ML ZD YSA CMTG JBR. Wrote the paper: ML ZD YSA CMTG JBR. Designed the software used in analysis: ML CMTG. Developed the WOD algorithm: CMTG. Obtained permission for the data: JBR. Performed statistical analysis and simulations: ML.