Study sample and ordinal phenotypes

Our study sample is part of the CANDELA cohort collected in Latin America13 (Supplementary Table 1). Using facial photographs of 6,275 individuals, we assessed 14 facial features on an ordered categorical scale reflecting the distinctiveness of each trait (Fig. 1, Supplementary Table 2). We included features of the lower face: chin shape, chin protrusion and upper/lower lip thickness; the middle face: cheekbone protrusion, breadth of nasal root, bridge and wing, columella inclination, nose protrusion, nose profile and nose tip shape; and the upper face: brow-ridge protrusion and forehead profile. These features were selected based on their documented variation in Europeans5. We found them to be reliably scored (Supplementary Table 3) and to also show extensive variation in the CANDELA sample (Supplementary Fig. 1). Individuals were genotyped on Illumina’s OmniExpress BeadChip and imputation performed using 1000 Genomes data. After quality-control filters, final analyses were carried out on 671,038 genotyped single-nucleotide polymorphisms (SNPs) and 9,117,642 imputed SNPs in 5,958 individuals. On the basis of the genome-wide SNP data, average autosomal admixture proportions for the full sample were estimated as: 50% European, 45% Native American and 5% African (Supplementary Fig. 2).

Figure 1: Overview of GWAS for facial features in the CANDELA sample. We first carried out a GWAS using data for 14 ordinal facial features from the lower, middle and upper face in 5,958 individuals. For follow-up, we obtained quantitative proxies for 9 of the 14 ordinal traits initially examined (and also obtained a measure of nasion position) in a subset of 2,955 individuals, and performed another GWAS. For convenience, we summarize results across traits on a single ‘composite’ Manhattan plot shown at the bottom of the figure (ordinal traits on the left and quantitative traits on the right). Each Manhattan plot displays all the SNPs with P values exceeding thresholds for genome-wide suggestive (10−5, blue line) or genome-wide significance (5 × 10−8, red line) for any trait. To avoid cluttering the figure, P values not reaching the suggestive threshold (that is, whose significance can be disregarded) are shown only for one trait (upper lip thickness). The names of the candidate genes closest to each association peak are provided (Table 1). These genes are connected with the list of associated facial features via lines of different colour. The location of these features is illustrated on the face drawings shown at the top of the figure. Face drawings were prepared by Emiliano Bellini. PAR, pseudo-autosomal region. Full size image

Significant correlations were observed between the ordinal phenotypes (using a Bonferroni-adjusted permutation P value threshold for significance of 6 × 10−4, Supplementary Table 4A). Strongest correlation was observed between upper and lower lip thickness (r=0.72), followed by forehead profile and brow ridge protrusion (r=0.57). The three traits related to nose width (root, bridge and wing breadth) show positive correlations among them (r=0.16–0.37) and negative correlations with nose protrusion (r=−0.08 to −0.25). Several of the facial traits examined also show moderate (and significant) correlations with age, sex, body mass index (BMI) and genetic ancestry (Supplementary Table 4B). The strongest correlation with sex was seen for brow ridge protrusion and forehead profile (r=−0.62 and r=−0.47, respectively). Age correlates most strongly with upper and lower lip thickness (r=−0.19 and r=−0.24, respectively), while the strongest correlation for BMI was seen with brow-ridge protrusion (r=0.17). Genetic ancestry has strongest correlation with lip thickness (European ancestry being negatively correlated with upper and lower lip thickness, r=−0.25 and r=−0.16, respectively). European ancestry is also significantly correlated with all the nose features examined, particularly with nose protrusion (r=0.18) and nose wing breadth (r=−0.15). On the basis of a kinship matrix derived from the SNP data14, we estimated narrow-sense heritability for the facial traits using GCTA15. We found moderate (and significant) values for all traits, with the highest heritability being estimated for nose protrusion (0.47) and the lowest for columella inclination (0.20; Supplementary Table 5). Similar (or higher) heritabilities have been estimated for a range of facial traits using family data7,8,16.

GWAS for ordinal phenotypes

We performed genome-wide association tests using multivariate linear regression, as implemented in PLINK17, using an additive genetic model adjusting for: age, sex, BMI and the first five principal components (PCs, Supplementary Fig. 3) computed from the SNP data. The resulting statistics showed no evidence of residual population stratification for any of the traits (Supplementary Fig. 3). Three of the nose traits examined (columella inclination, nose bridge and wing breadth) showed genome-wide significant association (P values<5 × 10−8) with SNPs in at least one genomic region (Fig. 1, Table 1). Columella inclination and nose bridge breadth show association with SNPs in a single region (4q31 and 6p21, respectively), while nose wing breadth shows association with SNPs in two genomic regions (7p13 and 20p11). To account for the multiple phenotypes tested, we performed a global false-discovery rate test across all traits and SNPs and identified the same significantly associated regions (Supplementary Table 6). We examined association for each index SNP (the variant with the lowest P value in a chromosomal region; Table 1) in all countries sampled separately and combined results as a meta-analysis using METAL (Supplementary Table 7) (ref. 18). For all associations, significant effects were in the same direction in all countries, the variability of effect size across countries reflecting sample size (Fig. 2). There was no significant effect size heterogeneity across countries for any of the associations. To exploit the correlations observed between various facial traits, we performed a multivariate GWAS19, but this approach did not identify any additional associated regions (Supplementary Table 8).

Table 1 Properties of index SNPs in chromosomal regions showing genome-wide significant association to ordinal facial traits. Full size table

Figure 2: Effect sizes (regression coefficients) for the derived allele at index SNPs in the genome regions associated with ordinal face traits. (a) 4q31 rs12644248, (b) 6p21 rs1852985, (c) 7p13 rs17640804, (d) 20p11 rs927833. Estimates obtained in each country are shown as blue boxes. Red boxes indicate estimates obtained in the meta-analysis. Box size is proportional to sample size. Horizontal bars indicate confidence intervals representing 2 × standard errors. Intervals that include zero (that is, non-significant effects) are shown in light blue. Full size image

Follow-up analyses

Subsequent to the GWAS described above, we obtained data from an additional set of 501 individuals from the same countries as for the GWAS and used this as a replication sample (descriptive features of this sample are presented in Supplementary Fig. 4). These individuals were phenotyped and genotyped as for the GWAS sample. Association tests for the four index SNPs in Table 1 were performed using the same regression model as for the GWAS, with a Bonferroni-adjusted threshold for significance of 0.05/4=0.0125. All tests were found to be significant in this replication sample (Table 1).

We also followed-up the ordinal facial trait GWAS by obtaining facial measurements (distances and angles) related to the ordinal traits initially examined and performing a GWAS on these quantitative data. These measurements were obtained mainly using three-dimensional (3D) anatomical landmark coordinates available for 2,955 of the individuals included in the ordinal trait GWAS20 (Supplementary Fig. 5a). These landmarks allowed us to define quantitative proxies for seven of the ordinal facial traits, the other traits having no appropriate 3D landmarks allowing related measurements to be obtained (Supplementary Table 9). Since the ordinal assessment of nose root and bridge breadth produced genome-wide significant associations (but could not be measured with the 3D landmarks available), we carried out 2D landmarking of the frontal photographs of these 2,955 individuals and also obtained measurements for these two traits (Supplementary Table 10, Supplementary Fig. 5b). In addition, we used the 3D landmark coordinates to obtain a measure of nasion position so as to evaluate in our sample the reported association of this feature with SNPs in the PAX3 gene region9,11.

The ordinal variables showed a moderate-to-high (and significant) correlation with the quantitative variables (all permutation P values<0.0005; Supplementary Table 11 and Supplementary Fig. 6). Correlation between ordinal and quantitative traits was strongest for nose wing breadth and lower lip thickness (both with r=0.70) and lowest for columella inclination (r=0.16). The pattern of correlation among quantitative traits was similar to that observed for the ordinal traits, as was the correlation between quantitative traits and covariates (Supplementary Table 12). As expected for continuous variables, heritability estimates based on the quantitative phenotypes (Supplementary Table 13) are higher than obtained for the ordinal phenotypes and more in line with published estimates7,8,16.

As before, we performed a GWAS for the quantitative traits using an additive multivariate regression model adjusting for age, sex, BMI and the first five PCs. We replicated the reported association of nasion position with SNPs in 2q35 overlapping the PAX3 gene region, with strongest association seen for rs7559271 (P value of 4 × 10−11, Fig. 1, Table 2, Supplementary Fig. 7a). This is the same SNP producing strongest association in the Paternoster et al.11 GWAS. In addition, we observed genome-wide significant association for six of the nine quantitative proxies of the ordinal traits initially examined (Fig. 1, Table 2). As for the ordinal assessments, the quantitative analysis of columella inclination, nose bridge breadth and nose wing breadth produced genome-wide significant associations with SNPs in 4q31, 6p21 and 7p13, respectively (Fig. 1, Tables 1 and 2). In addition, the 4q31 region also showed genome-wide significant association to two other measurements related to nose morphology: nose protrusion and nose tip angle, with strongest P values for SNPs rs2045323 of 1 × 10−8 and 2 × 10−8, respectively. SNPs in 4q31 produced small but not genome-wide significant P values in the ordinal assessment of nose protrusion and nose tip angle (strongest P values of 4 × 10−4 and 3 × 10−4, respectively). The 20p11 region, showing genome-wide significant association in the ordinal assessment of nose wing breadth, showed genome-wide suggestive association in the quantitative trait GWAS (strongest P value of 6 × 10−7 for SNP rs927833). Other than reproducing the associations detected with ordinal traits, the quantitative analyses detected a genome-wide significant association to chin protrusion for markers in 2q12 (strongest P value of 4 × 10−10, for rs3827760; Fig. 1 and Table 2). This marker had an association P value of 1 × 10−4 in the ordinal assessment of chin protrusion.

Table 2 Properties of index SNPs in regions showing genome-wide significant association to quantitative facial traits. Full size table

A regression model similar to the one used in the GWAS analyses explains up to ∼30% of the phenotypic variation for the traits with significant SNP associations, with each of the associated SNPs explaining about 1% of variation in the trait (Tables 1 and 2, Supplementary Table 14). The estimates of trait variance explained by associated SNPs are similar to those calculated for other anthropometric traits and are very close to the estimates obtained in a previous GWAS for facial features11.

To assess independent evidence of association for the regions implicated here, we examined SNPs that produced at least genome-wide suggestive P values in the two GWAS for facial features that have been published9,11. We found that SNP rs2108166, 5.5 kb from and in high LD (r2=0.77, D′=1) with the index SNP of the 7p13 region we found associated with nose wing breadth (rs17640804), produced an association P value of 5 × 10−7 with the same trait in the study of Liu et al.9 In addition, evidence of association between rs3827760 and chin shape has recently been reported in a candidate gene study of a Central Asian population21.

It has been suggested that gene regions associated with non-syndromic cleft lip and palate (NSCL/P) might impact on normal variation in facial morphology9,22. Although the regions reported to be associated with NSCL/P do not overlap with those identified here, we selected index SNPs in each NSCL/P region and tested for association of these SNPs with the facial traits that we examined (Supplementary Table 15). Few tests survived Bonferroni correction, mostly involving SNPs associated with quantitative nose-breadth traits (nose root, nose bridge and nose wing breadth; Supplementary Table 15A). A global one-sided Kolmogorov–Smirnoff test was significant both for ordinal and quantitative traits (P value ∼10−3; Supplementary Table 15B) and a polygenic risk score test combining all 15 index SNPs was significant for the nose-breadth traits (Supplementary Table 15C). A more precise evaluation of the impact of NSCL/P-associated variants on facial variation in the general population requires further investigation.

Candidate genes in regions associated with facial morphology

SNPs in 2q12 associated with chin protrusion show extensive LD and overlap the 3′-half of the EctodysplasinA (EDA) receptor gene (EDAR; Fig. 3a). The derived G allele at the index SNP in this region (rs3827760) encodes a functional substitution in the intracellular death domain of EDAR (370A) and is associated with reduced chin protrusion (Table 2). EDAR is part of the EDA signalling pathway (comprising EDA, EDAR and EDARADD (the EDAR-binding death domain adaptor protein)) which specifies prenatally the location, size and shape of ectodermal appendages (such as hair follicles, teeth and glands)23. The death domain has been shown to be involved in the interaction of EDAR with EDARADD, the 370A form having higher activity than the ancestral variant24. The G allele at rs3827760 is not present in Europeans and Africans but is seen at high frequency in East Asians and is essentially fixed in Native Americans (Table 3). This SNP has been associated in East Asians with characteristic tooth morphologies, hair type and sweat gland density25,26,27. Recently, we showed, in the same study sample examined here, that rs3827760 impacts on aspects of pinna morphology, including: lobe size and attachment, ear protrusion and helix rolling12. Mutations in the EDA pathway cause hypohidrotic ectodermal dysplasia28. This disorder is characterized by a reduced number of sweat glands, oligodontia, decrease in the amount of hair and facial dysmorphia, including a markedly protrusive chin29.

Figure 3: Genomic regions showing genome-wide significant association to face traits. For each facial feature we show the results that achieved strongest statistical significance regardless of the type of variable analysed (ordinal, O; or quantitative, Q). (a) 2q12 (Q), (b) 4q31 (O), (c) 4q31 (Q), (d) 6p21 (O), (e) 7p13(Q), (f) 20p11 (O). Plots not shown here are shown in Supplementary Fig. 7. Association results (on a −log 10 P scale; left y-axis) are shown for SNPs ∼500 kb on either side of the index SNP (purple diamond; Table 1) with the marker (dot) colour indicating the strength of LD (r2) between the index SNP and that SNP in the 1000 genomes AMR data set. Local recombination rate in the AMR data is shown as a continuous blue line (scale on the right y-axis). Genes in each region, their intron–exon structure, direction of transcription and genomic coordinates (in Mb, using the NCBI human genome sequence, Build 37, as reference) are shown at the bottom. Plots were produced with LocusZoom68. Below each region we also show an LD heatmap (using r2, ranging from red indicating r2=1 to white indicating r2=0) produced using a MATLAB59 implementation similar to Haploview69. Full size image

Table 3 Population frequency of derived alleles at index SNPs associated with facial features in the CANDELA sample. Full size table

Mouse Edar mutant and transgenic lines with either abolished or increased expression of Edar have been described and these mice show features related to several of the phenotypes associated with EDAR in humans12,30,31. Of particular interest, we recently documented that these mice show changes in ear morphology consistent with the effects of EDAR on human ear shape variation12. We therefore compared mandible length in Edar wild-type mice with EdardlJ and EdarTg951 mutant mice (Supplementary Figs 8 and 9), which have a loss and a gain of Edar function, respectively31,32. We found a significant association of mandible length with genotype, with the length decreasing at greater Edar function, consistent with the association of the 370A variant with decreased chin protrusion detected in the CANDELA sample (Fig. 4, Supplementary Table 16). Consistent with the mandible length changes we detect in Edar mutant lines, it has been reported that Eda mouse mutants also show mandibular morphology alterations33. The impact of the Eda pathway on mandibular morphology has been interpreted as resulting from epithelial–mesenchymal interactions during mouse craniofacial development33.

Figure 4: Effect of Edar genotype on mouse mandible length. We show boxplots of mandible length (y-axis) in mice with different Edar genotypes (x-axis). The measure of mandible length shown is the projected distance between head landmarks 5 and 10 (Supplementary Figs 8 and 9). Regression analysis indicates a significant effect of Edar genotype on mandible length (P value 1.7 × 10−4). Significant results were also obtained for other measurements of mandible length (Supplementary Table 16). Boxplot whiskers extend to data points within 1.5 times the interquartile range on both sides. The numbers in parenthesis below genotypic categories refer to the number of mice examined for each genotype. Full size image

SNPs in the 4q31 region with P values above the suggestive association threshold in the ordinal trait assessment of columella inclination extend over ∼400 kb from the 3′-half of the Dachsous Cadherin-Related 2 gene (DCHS2) into the DCHS2–SFRP2 (Secreted Frizzled-related protein 2) intergenic region (Fig. 3b), with strongest association seen for SNP rs12644248 within DCHS2 (P value 7 × 10−9). Noticeably, although association analyses based on the quantitative assessment of columella inclination also show genome-wide significant association for rs12644248 (P value of 4 × 10−8), the quantitative analyses show that SNPs in the DCHS2–SFRP2 intergenic region have an even stronger association, peaking at rs2045323 (P value of 3 × 10−9, Table 2, Fig. 3c). A similar pattern of association is seen for the quantitative assessments of nose protrusion and nose tip angle, with strongest association for both traits being observed for rs2045323 (P values of 1 × 10−8 and 2 × 10−8, respectively, Table 2, Supplementary Fig. 7), association with rs12644248 only exceeding the genome-wide suggestive threshold (P values of 8 × 10−6 and of 6 × 10−6 for nose protrusion and nose tip angle, respectively). SNP rs2045323 is not in strong LD with rs12644248 and tests conditioned on either SNP attenuate the signal of association at the other SNP but do not abolish it entirely (Supplementary Fig. 10). These observations suggest that the signal of association around rs2045323 in the DCHS2–SFRP2 intergenic region is somewhat independent from that peaking at rs12644248 within DCHS2. Intergenic SNP rs2045323 is located in an evolutionarily conserved region (Supplementary Fig. 11), suggesting that this SNP could play a role in the regulation of genes in the region. DCHS2 is a calcium-dependent cell-adhesion protein which has recently been shown to participate in a regulatory network controlling cartilage differentiation and polarity during vertebrate craniofacial development34. This network includes SOX9, a well-known regulator of cartilage differentiation, mutations of which lead in humans to Campomelic Dysplasia (OMIM #114290) a disorder characterized by a range of craniofacial defects. Although DCHS2 seems the strongest candidate in the 4q31 region, SFRP2 is also an interesting candidate, in that it has been shown that this gene is expressed in osteoblasts, participates in the regulation of Wnt signaling35 and craniofacial malformations have been reported in Sfrp2 mutant mice36.

The 6p21.1 region associated with nose bridge breadth extends across ∼500 kb overlapping the suppressor of Ty 3 homologue (S. cerevisiae; SUPT3H) gene and the 5′-half of the Runt-related transcription factor 2 (RUNX2) gene (Fig. 3d). Strongest association is seen for SNPs in the region of SUPT3H/RUNX2 overlap, peaking at SNP rs1852985 for both the ordinal and the quantitative assessment of nose bridge breadth (Fig. 3d, Supplementary Fig. 7). This region is known to contain key RUNX2 regulatory elements37 (Supplementary Fig. 12). Rare mutations in RUNX2 cause Cleidocranial dysplasia, an autosomal dominant disorder involving alterations of cranial ossification (OMIM #119600). Runx2 has been shown to participate in the differentiation of mouse osteoblasts, chondrocyte and mesenchymal stem cells and bone development38, null Runx2 mutants showing a range of chondrocyte proliferation and maturation defects39. Interestingly, the length of a functional glutamine/alanine repeat in RUNX2 has been shown to correlate strongly with the evolution of facial length in dog breeds and, more broadly, in Carnivora40.

SNPs in the 7p13 region associated with nose wing breadth extend over ∼80 kb within the third intron of the GLI Family Zinc-Finger 3 gene (GLI3; Fig. 3e), a DNA-binding transcription factor. Strongest association for both the ordinal and quantitative assessments of nose wing breadth is observed for SNP rs17640804 (Tables 1 and 2, Fig. 3e, Supplementary Fig. 7), located in a genomic region with strong evolutionary conservation (Supplementary Fig. 13). Chromatin immunoprecipitation experiments have shown that rs17640804 can affect the binding of regulatory proteins41. GLI3 is known to act both as activator and repressor in the sonic hedgehog signalling pathway, a key regulatory of chondrocyte differentiation42. Interestingly, it has been shown experimentally that Gli3 interacts with Runx2 in the regulation of mouse osteoblast differentiation43. We therefore tested for statistical interaction between the GLI3 and RUNX2 index SNPs on nose bridge breadth and found it to be significant (P value=0.004, Supplementary Table 17), even though the GLI3 index SNP by itself does not have a significant effect on nose bridge breadth. Mutations in GLI3 have been shown to cause several Mendelian disorders associated with craniofacial and limb abnormalities, including GCPS (Greig cephalopolysyndactyly syndrome). GCPS is characterized by a range of craniofacial abnormalities including a broad nose44. A mouse null Gli3 mutant has been reported to show a range of craniofacial abnormalities, including a wider nose45.

Strongest association in 20p11 with the ordinal assessment of nose wing breadth was observed for SNP rs927833 located in LOC100270679, a long intergenic non-protein coding RNA (LINC01432). There is substantial LD around this SNP and suggestive evidence of association (that is, P values <10−5), for SNPs over a region of ∼400 kb extending to the Paired-box gene 1 (PAX1; Fig. 3f), a strong candidate gene in this region. PAX1 is a key developmental transcription factor which has been shown experimentally to affect chondrocyte differentiation through its participation in a regulatory pathway that also includes RUNX2 and SOX9 (ref. 46). More broadly, a Pax-Six-Eya-Dach (Dachshund) network, involving protein–protein and protein–DNA interactions impacting on a range of basic developmental processes has been described47. As indicated above, another PAX gene (PAX3) has been twice reported to impact on nasion position9,11, and we replicate that association here. A missense mutation in PAX1 has been shown to cause autosomal recessive oto-facio-cervical syndrome, a disorder characterized by various skeletal and facial abnormalities48. It has also been reported that mouse embryos with Gli3-null mutations display drastically reduced Pax1 expression, possibly mediated through Gli3’s involvement in the sonic hedgehog signalling pathway49. Consistent with these experimental findings, we observe a significant statistical interaction of the GLI3 and PAX1 index SNPs on nose wing breadth (P value=0.005, Supplementary Table 17).