Persistently low white blood cell count (WBC) and neutrophil count is a well-described phenomenon in persons of African ancestry, whose etiology remains unknown. We recently used admixture mapping to identify an approximately 1-megabase region on chromosome 1, where ancestry status (African or European) almost entirely accounted for the difference in WBC between African Americans and European Americans. To identify the specific genetic change responsible for this association, we analyzed genotype and phenotype data from 6,005 African Americans from the Jackson Heart Study (JHS), the Health, Aging and Body Composition (Health ABC) Study, and the Atherosclerosis Risk in Communities (ARIC) Study. We demonstrate that the causal variant must be at least 91% different in frequency between West Africans and European Americans. An excellent candidate is the Duffy Null polymorphism (SNP rs2814778 at chromosome 1q23.2), which is the only polymorphism in the region known to be so differentiated in frequency and is already known to protect against Plasmodium vivax malaria. We confirm that rs2814778 is predictive of WBC and neutrophil count in African Americans above and beyond the previously described admixture association (P = 3.8×10⁻⁵), establishing a novel phenotype for this genetic variant.

Many African Americans have white blood cell counts (WBC) that are persistently below the normal range for people of European descent, a condition called “benign ethnic neutropenia.” Because most African Americans have both African and European ancestors, selected genetic variants can be analyzed to assign probable African or European origin to each region of each such person's chromosomes. Previously, we found a region on chromosome 1 where increased local African ancestry completely accounted for differences in WBC between African and European Americans, suggesting the presence of an African-derived variant causing low WBC. Here, we show that low neutrophil count is predominantly responsible for low WBC; that a dominant, European-derived allele contributes to high neutrophil count; and that the frequency of this allele differs in Africans and Europeans by >91%. Across the chromosome 1 locus, only the well-characterized “Duffy” polymorphism was this differentiated. Neutrophil count was more strongly associated to the Duffy variant than to ancestry, suggesting that the variant itself causes benign ethnic neutropenia. The African, or “null,” form of this variant abolishes expression of the “Duffy Antigen Receptor for Chemokines” on red blood cells, perhaps altering the concentrations and distribution of chemokines that regulate neutrophil production or migration.

Funding: Research support for JHS was provided by R01-HL-084107 (JGW) from the National Heart, Lung, and Blood Institute and contracts N01-HC-95170, N01-HC-95171, and N01-HC-95172 from the National Heart, Lung, and Blood Institute and the National Center on Minority Health and Health Disparities. Research support for Health ABC was provided by the Intramural Research Program of the National Institute on Aging, and contracts N01-AG-6-2101, N01-AG-6-2103, and N01-AG-6-2106. The Atherosclerosis Risk in Communities Study is a collaborative study supported by National Heart, Lung, and Blood Institute contracts N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, and N01-HC-55022. Support for the ARIC admixture mapping studies was provided by R21DK073482 and K01DK067207 (WHLK). Genotyping for both the JHS and Health ABC was supported by grant U54 RR020278 from the National Center for Research Resources to the Broad Institute of Harvard and MIT; a subsidy from this grant covered half the cost of Health ABC genotyping. DR was supported by a Burroughs Wellcome Career Development Award in the Biomedical Sciences, and methodological and statistical analysis was supported by grant U01-HG004168.

This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.

In the present study, we narrowed the region of association from 900 kb to a single base pair substitution that is likely to have a strong effect on variation in WBC. To achieve this, we increased our sample size from 1,550 in the initial study to 6,005, by pooling samples from the Jackson Heart Study (JHS), the Health ABC Study, and the Atherosclerosis Risk in Communities (ARIC) Study. We found that neutrophil count is responsible for the vast majority of the WBC association at the locus, and therefore focused on neutrophil count in the current analysis. We also showed that the genetic change that is probably responsible is the Duffy Null polymorphism (rs2814778, also called FY+/−), which is already known to protect individuals of African descent against Plasmodium vivax malaria infection [7], [8], and which has recently been associated with susceptibility to HIV infection and rate of progression to AIDS [9]. Our identification of this polymorphism as the probable cause of benign ethnic neutropenia should prompt further investigation of its effects on hematopoiesis and immunity.

A large proportion of healthy African Americans have been observed to have a white blood cell count (WBC) that is persistently lower than the normal range defined for individuals of European ancestry [1]–[5]. This condition, called “benign ethnic neutropenia”, can have important effects on medical decision-making, since WBC is a valuable indicator of immunocompetence, infection, and inflammation. To seek the genetic basis of benign ethnic neutropenia, we recently carried out an admixture mapping analysis in which we identified a locus on chromosome 1 where local inheritance of African or European ancestry is sufficient to account entirely for the epidemiological differences in WBC levels between African Americans and European Americans [6]. By genotyping samples from two epidemiological cohorts, the Health Aging and Body Composition Study (Health ABC) and the Jackson Heart Study (JHS), at a panel of markers that were extremely differentiated in frequency between Africans and Europeans, we identified an approximately 900 kilobase locus on chromosome 1 (99% credible interval of 155.46–156.36 Mb) where individuals with low WBC had increased African ancestry compared with the average in the genome.

Results

Merging Samples across the JHS, ARIC, and Health ABC Studies

We pooled 6,005 African American samples from three cohort studies: the Jackson Heart Study (JHS), the Atherosclerosis Risk in Communities (ARIC) Study, and the Health, Aging and Body Composition (Health ABC) Study. For each sample, we required a high quality genome-wide admixture scan (Materials and Methods), a genotype at SNP rs2814778, body mass index (BMI), age, gender, and a full differential white blood cell count (with the exception that for Health ABC samples we did not require a measurement of bands). To explore correlations between the genetics and the phenotype, we first used the genotype at SNP rs2814778, which occurs at position 155,987,755 in Build 35 of the human genome reference sequence, within the 99% credible interval defined by our previous admixture mapping study [6]. This SNP is also known as the “FY+/−” or “Duffy” variant, and the FY− allele is very highly correlated to West African ancestry. For example, it is completely fixed in frequency in West African and European American samples from the International Haplotype Map [10] (although it is not completely fixed in larger sample sizes from these populations; see below). For Figure 1 and Tables 1 and 2, we used the genotype at rs2814778 as a surrogate for ancestry because the genotype can be conveniently read out as a discrete value (0, 1 or 2 copies) rather than as a continuous value, and is extraordinarily correlated to ancestry (r²>0.99). Later, we demonstrate that there is in fact a slightly stronger association to neutrophil count for rs2814778 than for ancestry, which is important in showing that the FY− allele at this polymorphism may actually be responsible for low neutrophil counts, and is not just in admixture linkage disequilibrium with the causal allele.


Figure 1. Relationship between ancestry and the distribution of neutrophil count. (A) Distribution of normally transformed absolute neutrophil count for the three classes of genotype at rs2814778. Individuals who are homozygous for the null allele have distinctly lower neutrophil count (−0.35±0.89 standard deviations compared with the mean) than individuals who are carriers for the functional allele (0.76±0.89). We were able to place constraints on the frequency of the high neutrophil count allele in (B) West Africans, and (C) European Americans by assuming that the observed distributions of neutrophil count for each ancestry class (which we marked in practice by the genotype at rs2814778) are a mixture of distributions specified by the underlying allele frequency. The results indicate a 99% probability that the frequency is <4.9% in Africans and also a 99% probability that the frequency is >95.2% in Europeans. https://doi.org/10.1371/journal.pgen.1000360.g001


Table 1. Comparison of phenotypic characteristics for the four sets of samples used in this study. https://doi.org/10.1371/journal.pgen.1000360.t001


Table 2. Effect of the chromosome 1 locus on white blood cell counts. https://doi.org/10.1371/journal.pgen.1000360.t002

To test for heterogeneity in the strength of the genetic association to WBC among the different sample sets that comprised our study, we divided the samples into four groups. There were 658 samples from the Health Aging and Body Composition Study (“Health ABC”), 1,969 samples from the JHS cohort only (after randomly dropping samples until there was only one from each pedigree; “JHS only”), 2,476 samples from the ARIC cohort only (“ARIC only”), and 902 samples that overlapped between JHS and ARIC (“JHS-ARIC overlap”). For the JHS-ARIC overlap samples, we averaged all phenotype measurements, taken an average of 14 years apart at the time the participant entered each study (and having a correlation coefficient of r² = 0.37), to provide a more precise estimate of the phenotype than would be available from either measurement alone. Table 1 presents the characteristics of each of the groups of samples. We found that all sets of samples showed quantitatively similar associations to the chromosome 1 locus. In particular, for neutrophil count, individuals carrying at least one European-type (“FY+”) allele of rs2814778 had 1.58–1.65 times higher values, depending on the study, than individuals homozygous for African ancestry (FY−/−), a tight enough range that we decided to pool all four groups of samples for subsequent analyses. Despite the similar correlation of neutrophil count to local ancestry across studies, we observed that the correlation coefficient to “European carrier status” was significantly higher for JHS-ARIC overlap samples, ρ = 0.57, than for the samples for which only one measurement was made: ρ = 0.52 for JHS-only (P = 0.03 for a reduction) and ρ = 0.50 for ARIC-only (P = 0.002 for a reduction) (Table 1).
This is likely to reflect a more accurate assessment of basal neutrophil count when it was measured twice and averaged over different environmental conditions (the baseline measurements in JHS and ARIC studies taken an average of 14 years apart) than when it was measured only once. In support of this hypothesis, the JHS-ARIC overlap samples contributed more per sample to the statistical signal than those measured in only one cohort: 28% more per sample on average, which we calculated by dividing the LOD score they contributed by the total number of samples.

Dominant Effect of European Ancestry on WBC Counts

Combining all samples (n = 6,005 in total) and working with normally transformed cell counts for each white blood cell lineage (Materials and Methods) we explored how counts of total WBC and each of the 6 differential counts were associated with ancestry at the locus, using the genotype at rs2814778 as a surrogate for ancestry (Table 2). There was strong evidence that the allele at the locus that contributed to high white blood cell count had an almost purely dominant effect. As shown in Table 2, there was no significant difference in leukocyte counts between 247 African Americans with two copies of European ancestry at this locus (FY+/+) and 1,647 African Americans who were FY+/− (P>0.08 for WBC and all differential counts). By contrast, being FY+/+ or +/− (1,894 African Americans) vs. FY−/− (4,111 African Americans) was strongly associated to counts of all white blood cell types except bands (P ≪ 10⁻⁴; Table 2). The dominant effect of European ancestry on white blood cell count is also visually apparent in Figure 1, which shows the distribution of neutrophil count for individuals grouped according to genotype at rs2814778. Persons carrying at least one FY+ allele had a distribution of neutrophil counts that was shifted by 1.3 standard deviations above that of persons who were FY−/− (this was extraordinarily statistically significant: Z = 49.7). By contrast, there was no significant difference between individuals who carried either one or two FY+ alleles (Z = 0.6). For further analysis, we pooled individuals who were carriers of the FY+ allele at this locus.

Low Neutrophil Count Is the Main Phenotype Underlying the WBC Association

The differential white blood cell count that was most significantly associated with ancestry was absolute neutrophil count (calculated as total WBC multiplied by the percentage of neutrophils). The correlation (ρ) of normally transformed absolute neutrophil count to carrier status for the FY+ allele was 0.519, which was higher than that of the general WBC phenotype originally mapped to the locus [6] (ρ = 0.458). In the 952 African Americans who had absolute neutrophil counts at least 1 s.d. below the mean (roughly <1,800/mm³), the proportion of FY+ allele carriers was reduced by more than an order of magnitude compared with the genome-wide average. Neutrophil count was responsible for the vast majority of WBC association at the locus. After controlling for neutrophil count in a regression analysis, only monocyte count (ρ = 0.025, P = 0.05) and basophil count (ρ = −0.034, P = 0.009) remained nominally associated, and these associations were not significant after correcting for the 6 hypotheses tested (Table 2). The weak evidence of association to monocyte and basophil counts may reflect a real effect, or may be a false positive due to multiple hypothesis testing. It is also possible that the result may be an experimental artifact related to the Coulter Counter technology used to measure differential WBC. In these measurements, the positions of monocytes and basophils were near those of neutrophils in the plots used for cell classification. Even a small amount of misclassification among neutrophils, monocytes, and basophils (a couple of percent) could cause their counts to be artifactually correlated, contributing to the signals we observe in the context of measurements in large sample sizes. Since neutrophil count appears to drive at least the great majority of association, we focused on this WBC phenotype for all further analysis.

Epidemiological Impact of the Chromosome 1 Locus on Neutrophil Count

To assess whether the higher neutrophil count observed in European Americans compared with African Americans can be entirely accounted for by ancestry at the chromosome 1 locus, as is the case with total WBC [6], we examined samples from the Health ABC study (1,331 European Americans and 658 African Americans). Among African Americans who could be classified with confidence as carrying at least one chromosome of European ancestry at the locus, the absolute neutrophil count did not differ from that of European Americans (P = 0.99). Thus, genetic variation at the chromosome 1 locus was sufficient to account for the entire epidemiological difference across these populations. The predictive effect of ancestry at the chromosome 1 locus was profound. Carrier status for the European-type (FY+) allele at the rs2814778 variant predicted 26.95% of the variance in normally transformed neutrophil count, which was far more than the 3.37% predicted by genome-wide European ancestry proportion. After controlling for rs2814778 genotype, there was no longer any association to genome-wide European ancestry. Similarly, after controlling for rs2814778 genotype, BMI and gender only predicted 0.79% and 0.14% of variance in neutrophil count respectively, while smoking (analyzed in JHS only) only predicted 0.8% of the variance. Age was not significantly associated to neutrophil count in our data (P = 0.25). We did not analyze other phenotypes like hypertension and coronary artery disease status for their correlation to neutrophil count. Because of the relatively weak contributions of all the non-genetic predictors we analyzed, we focused subsequent analyses on genotype at the chromosome 1 locus uncorrected for covariates.

The Causal Allele must be >91% Differentiated between Africans and Europeans

We were able to place strong constraints on the frequency of the variant affecting neutrophil count by analyzing the distributions of neutrophil count for individuals with 0, 1 and 2 copies of European ancestry at the chromosome 1 locus, which in practice we marked by the genotype at rs2814778. The analysis in Figure 1A provides strong evidence of a dominant allele of European origin contributing to high neutrophil count. We modeled the frequency of the variant that causes high neutrophil count by defining 6 parameters. The frequency of this variant in Africans was specified as P_A and its frequency in Europeans as P_E. Individuals who were homozygous for the other allele were assumed to have a normal distribution of neutrophil count with mean μ_L and standard deviation σ_L, and carriers of the “high neutrophil” allele were assumed to have a normal distribution of neutrophil count with mean μ_H and standard deviation σ_H. Studying a grid of values of P_A (Figure 1B), and another grid of values of P_E (Figure 1C), we found the combination of the remaining variables that provided the best fit to the data, as assessed by a chi-square goodness-of-fit statistic. Given each set of 6 model parameters, we calculated a likelihood of the data for all 6,005 individuals. This resulted in a marginal likelihood surface for P_A (Figure 1B) and P_E (Figure 1C), which we used to place constraints on these parameters. Fitting this 6-parameter model to the data, we inferred that the frequency of the allele contributing to high neutrophil counts was <4.9% in Africans and >95.2% in Europeans (Figure 1B, C), and that the difference in frequency between populations was >91.9%.
Among the 3.54 million autosomal SNPs in the November 2006 Phase 2 HapMap data set [10], there were only 115 SNPs with a frequency differentiation at least this extreme, and only one in the region of admixture association: the SNP rs2814778 (at position 155.99 Mb), the same SNP we used as a marker of ancestry. This variant already has a known phenotype (susceptibility to Plasmodium vivax malaria), but it had not been hypothesized to be associated with low white blood cell count until it was found to lie within this locus [6]. While rs2814778 is a plausible candidate, the locus we described previously [6] spans 900 kb, and there could in principle be other variants within this span, unreported in the literature or in genome variation databases, that have a high enough frequency differentiation to explain the signal. In what follows, we present additional lines of evidence that rule out the great majority of sites other than rs2814778 as explanations for the signal.
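The dominant mixture model described in this section can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy parameter values, and the simulated data are all assumptions made for the example, and the sketch only evaluates the likelihood at fixed parameter values rather than optimizing the remaining parameters over a grid of P_A or P_E as the study did.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def carrier_prob(k_eur, p_a, p_e):
    """Probability of carrying at least one high-count allele, given k_eur
    European-ancestry chromosomes (0, 1 or 2) at the locus."""
    non_carrier = (1.0 - p_e) ** k_eur * (1.0 - p_a) ** (2 - k_eur)
    return 1.0 - non_carrier


def log_likelihood(data, p_a, p_e, mu_l, sd_l, mu_h, sd_h):
    """Log-likelihood of (k_eur, count) pairs under a dominant two-component mixture."""
    total = 0.0
    for k_eur, x in data:
        w = carrier_prob(k_eur, p_a, p_e)
        density = w * normal_pdf(x, mu_h, sd_h) + (1.0 - w) * normal_pdf(x, mu_l, sd_l)
        total += math.log(density)
    return total


def simulate(n, p_a, p_e, eur_fraction, mu_l, sd_l, mu_h, sd_h, seed=0):
    """Toy data generator (hypothetical; stands in for the real cohort data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        k_eur = sum(rng.random() < eur_fraction for _ in range(2))
        is_carrier = rng.random() < carrier_prob(k_eur, p_a, p_e)
        mu, sd = (mu_h, sd_h) if is_carrier else (mu_l, sd_l)
        data.append((k_eur, rng.gauss(mu, sd)))
    return data


# Illustrative parameter values only; they are not estimates from the study.
toy = simulate(2000, 0.01, 0.99, 0.18, -0.4, 0.9, 0.8, 0.9)
ll_true = log_likelihood(toy, 0.01, 0.99, -0.4, 0.9, 0.8, 0.9)
ll_flat = log_likelihood(toy, 0.50, 0.50, -0.4, 0.9, 0.8, 0.9)
print(ll_true > ll_flat)
```

On the simulated data, strongly differentiated allele frequencies fit far better than undifferentiated ones, which is the kind of contrast a likelihood surface over P_A and P_E is meant to expose.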

rs2814778 Is Significantly More Predictive of Neutrophil Count than Is Ancestry

We exploited the large sample size (6,005 individuals) to test whether the rs2814778 variant predicted low neutrophil count more than would be expected from the association to ancestry [6]. This is a difficult problem since the genotype at this SNP is highly correlated to ancestry. By using the ANCESTRYMAP software and the data from all 6,005 African Americans, we estimated that the frequency of the FY+ allele at rs2814778 is 0.2±0.1% in Africans and 99.3±0.4% in Europeans (this frequency distribution is consistent with the allele frequencies inferred for the causal allele based on modeling of neutrophil counts in Figures 1B and 1C). Thus, if rs2814778 is the causal variant, there should be a small handful of individuals for whom the genotype at rs2814778 is discrepant with ancestry, who will be informative for our analyses. To estimate the number of individuals whom we expect to be informative for testing association of rs2814778 above and beyond ancestry, we used the fact that the cohort has 18.2% European ancestry on average (Table 2). Thus, we expected there to be about 13 individuals who are homozygous for the Duffy null allele at rs2814778 but heterozygous for European ancestry: 13 = (6005)×(2×18.2%×81.8%)×(0.7%). Similarly, we expected there to be about 8 individuals who are heterozygous at rs2814778 but homozygous for local African ancestry: 8 = (6005)×(81.8%×81.8%)×(0.2%). To test for association to rs2814778 above and beyond ancestry, we first obtained estimates of European ancestry at the position of the SNP using the ANCESTRYMAP software [11]. We included rs2814778 in the ancestry estimation so that we could explicitly test whether the genotype at this SNP alone was more predictive of neutrophil count than this SNP plus flanking markers. This would be evidence that it was more associated than African ancestry itself.
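The two expected counts quoted above follow directly from the sample size, the average ancestry proportion, and the estimated allele frequencies. A short sketch that reproduces the arithmetic exactly as stated in the text (variable names are illustrative):

```python
n_samples = 6005
eur_ancestry = 0.182          # average European ancestry in the cohort (Table 2)
afr_ancestry = 1.0 - eur_ancestry
fy_plus_freq_eur = 0.993      # estimated FY+ frequency on European chromosomes
fy_plus_freq_afr = 0.002      # estimated FY+ frequency on African chromosomes

# Heterozygous for local ancestry, but the European chromosome carries the null allele.
p_het_ancestry = 2.0 * eur_ancestry * afr_ancestry
expected_null_hom_het_ancestry = n_samples * p_het_ancestry * (1.0 - fy_plus_freq_eur)

# Homozygous for local African ancestry, but carrying an FY+ allele
# (using the same single factor of 0.2% that appears in the text).
p_hom_african = afr_ancestry * afr_ancestry
expected_het_hom_african = n_samples * p_hom_african * fy_plus_freq_afr

print(round(expected_null_hom_het_ancestry, 1), round(expected_het_hom_african, 1))
```

Rounded to whole individuals, these give the approximately 13 and 8 discordant individuals cited in the text.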
Our power to detect a signal was highest for JHS samples, which were genotyped at a high density at the chromosome 1 locus. Consistent with this observation, the 7 samples for which we could state with >50% confidence that the local ancestry was discrepant with the expectation from the rs2814778 genotype were all from JHS. We performed three regression analyses (Table 4) to explore whether rs2814778 or ancestry status at the chromosome 1 locus was a better predictor of neutrophil count. (a) First, we obtained a χ² statistic for association of carrier status for the rs2814778 FY+ allele to neutrophil count; (b) second, we obtained a χ² statistic for association of carrier status for European ancestry to neutrophil count (using the rs2814778 genotype in the estimate); and (c) third, we obtained a χ² statistic for association of both predictors together. We found that there was a significant difference between the strength of association of ancestry alone and ancestry and genotype together: (c)−(b) = 15.7 (P = 3.8×10⁻⁵). Testing for the reverse effect of ancestry above and beyond the genotype of rs2814778 produced no signal: (c)−(a) = 0.4 (P = 0.74). These results confirm that rs2814778 is predictive of neutrophil count, above and beyond the effect of ancestry.


Table 4. Reduced neutrophil count is more associated to the Duffy null polymorphism than to ancestry. https://doi.org/10.1371/journal.pgen.1000360.t004
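The nested comparison in this section rests on differences between χ² statistics. As a rough sketch of how such a difference maps onto a tail probability, the snippet below refers each difference to a χ² distribution with one degree of freedom. Both the single degree of freedom and the two-sided form of the tail are assumptions of this illustration; the text does not state how the reported P-values were computed, so the values produced here need not match them exactly.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


chi2_gain_genotype = 15.7   # (c) - (b): genotype added on top of ancestry
chi2_gain_ancestry = 0.4    # (c) - (a): ancestry added on top of genotype

p_genotype = chi2_sf_1df(chi2_gain_genotype)
p_ancestry = chi2_sf_1df(chi2_gain_ancestry)
print(p_genotype, p_ancestry)
```

Under these assumptions the first tail probability falls below 10⁻⁴ while the second is far from significant, which matches the qualitative conclusion of the section.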

A Fine-Mapping Scan Fails to Find Additional Signals of Association

To search for additional alleles in the admixture peak that might be associated to neutrophil count beyond the main effect, we genotyped a dense panel of 193 SNPs across the region in 148 individuals with low neutrophil count (more than 0.7 standard deviations below the mean) and 74 individuals with high neutrophil count (1.3–2.8 standard deviations above the mean). We chose only individuals for whom we were >99% confident of all African ancestry at the locus, based on genotyping information at flanking markers excluding rs2814778, so that ancestry would not be a confounder of the analysis. We genotyped these individuals for a set of SNPs chosen using Tagger [12] to capture the great majority of common variation across the admixture peak in both West Africans and Europeans (Materials and Methods). After the genotyping was complete, we had captured 94% of SNPs of >5% minor allele frequency in West Africans, and 96% of SNPs of >5% minor allele frequency in European Americans, both at a correlation of r²>0.8 (Figure 3B).


Figure 3. Fine mapping reveals rs2814778 as the only significant association. (A) Results of case-control association analysis for 193 SNPs genotyped in 148 individuals with low neutrophil count (<2,100/mm³), which we compared with 74 controls with high neutrophil count (5,000–9,000/mm³). All samples were selected to have a confident estimate of all African ancestry at the chromosome 1 locus (>99% probability) based on ANCESTRYMAP analysis at flanking markers outside the admixture peak. (B) HapMap SNPs of >5% minor allele frequency are well captured by this genotyping. We find that 94% of West African SNPs and 96% of European American SNPs are correlated with r²>0.8 to one of the SNPs we genotyped. https://doi.org/10.1371/journal.pgen.1000360.g003

Case-control association analysis of these 193 SNPs identified only one, rs2814778, that was significantly associated (nominal P = 2.1×10⁻⁵; Figure 3A) after a Bonferroni correction for 193 multiple hypothesis tests. Thus, there was no evidence of any allele in the region that is associated to neutrophil count beyond the effect that is already captured by rs2814778.
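The Bonferroni step can be made concrete with a short check. The family-wise significance level of 0.05 used here is an assumption of this sketch, since the text does not state the level the authors applied; the number of tests and the nominal P-value are taken from the text.

```python
n_tests = 193                 # SNPs tested in the fine-mapping scan
family_wise_alpha = 0.05      # assumed significance level (not stated in the text)
nominal_p = 2.1e-5            # nominal P-value reported for rs2814778

# Bonferroni: a test is significant if its nominal P is below alpha / number of tests.
per_test_threshold = family_wise_alpha / n_tests
corrected_p = min(1.0, nominal_p * n_tests)
is_significant = nominal_p < per_test_threshold

print(per_test_threshold, corrected_p, is_significant)
```

Under that assumed level, the per-test threshold is roughly 2.6×10⁻⁴, so the nominal P-value for rs2814778 clears it by about an order of magnitude.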

Genotyping of rs2814778 in >10,000 European Americans with a Neutrophil Count

We genotyped 10,062 self-identified European Americans in the ARIC study for rs2814778, searching for a decreased neutrophil count in association with the null allele. This analysis should have little power if the European American population is in Hardy-Weinberg equilibrium, since FY−/− homozygotes are expected to occur very rarely among Europeans: less than 1/10,000 based on the observed frequency of the null allele in this population (0.34 = 10,062×0.58%×0.58%). Interestingly, we observed 7 European Americans with FY−/− genotypes, a significant excess compared with expectation (P<4×10⁻⁹) suggesting that European Americans harbor population substructure with variable levels of African ancestry. Among the FY−/− homozygotes we found a non-significant reduction in WBC associated with the null allele: WBC was observed to be 5.9±2.6 for the 7 FY−/− homozygotes, 5.9±1.8 for the 103 FY+/− heterozygotes, and 6.3±1.9 for the 9,952 FY+/+ homozygotes (P = 0.06 with an additive model and P = 0.35 with a dominant model using 1-sided tests). Genotyping of rs2814778 in 1,339 self-identified European Americans from the Health ABC study identified 26 heterozygous individuals, and none homozygous for FY−/−.
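The Hardy-Weinberg expectation quoted above can be reproduced from the genotype counts given in the text. The sketch below estimates the null allele frequency by allele counting and then computes the expected number of FY−/− homozygotes; it does not attempt to reproduce the reported P-value for the excess, since the test used is not described here.

```python
# Genotype counts among self-identified European Americans in ARIC (from the text).
n_null_hom = 7       # FY-/-
n_het = 103          # FY+/-
n_plus_hom = 9952    # FY+/+
n_total = n_null_hom + n_het + n_plus_hom

# Allele counting: each homozygote contributes two null alleles, each heterozygote one.
null_allele_freq = (2 * n_null_hom + n_het) / (2 * n_total)

# Under Hardy-Weinberg equilibrium, homozygote frequency is the allele frequency squared.
expected_null_hom = n_total * null_allele_freq ** 2

print(n_total, round(null_allele_freq * 100, 2), round(expected_null_hom, 2))
```

The expected count is well below one individual, so the 7 observed homozygotes represent roughly a twentyfold excess over the Hardy-Weinberg expectation.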