Abstract Facial attractiveness is a complex human trait of great interest in both academia and industry. Literature on sociological and phenotypic factors associated with facial attractiveness is rich, but its genetic basis is poorly understood. In this paper, we conducted a genome-wide association study to discover genetic variants associated with facial attractiveness using 4,383 samples in the Wisconsin Longitudinal Study. We identified two genome-wide significant loci, highlighted a handful of candidate genes, and demonstrated enrichment for heritability in human tissues involved in reproduction and hormone synthesis. Additionally, facial attractiveness showed strong and negative genetic correlations with BMI in females and with blood lipids in males. Our analysis also suggested sex-specific selection pressure on variants associated with lower male attractiveness. These results revealed sex-specific genetic architecture of facial attractiveness and provided fundamental new insights into its genetic basis.

Author summary Facial attractiveness is a complex human trait well integrated into people’s daily life experience with profound influence on human behavior. Despite being widely studied in sociology, psychology, and related fields, its genetic basis remains poorly understood. Using carefully-measured facial attractiveness and dense genotyping data from the Wisconsin Longitudinal Study, we identified novel genes for facial attractiveness, assessed the selection signature, and dissected the shared genetic architecture between facial attractiveness and various human traits. Interestingly, sex-specific genetic architecture of facial attractiveness was a recurrent pattern observed in almost all our analyses. Our results provided new insights into the genetic basis of facial attractiveness and have broad implications for the complex relationships between attractiveness and various human traits.

Citation: Hu B, Shen N, Li JJ, Kang H, Hong J, Fletcher J, et al. (2019) Genome-wide association study reveals sex-specific genetic architecture of facial attractiveness. PLoS Genet 15(4): e1007973. https://doi.org/10.1371/journal.pgen.1007973 Editor: Seth M. Weinberg, University of Pittsburgh School of Dental Medicine, UNITED STATES Received: September 10, 2018; Accepted: January 17, 2019; Published: April 4, 2019 Copyright: © 2019 Hu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: Genotype data from WLS are available to the research community through the dbGaP controlled-access repository at accession phs001157.v1.p1. Phenotypic data in WLS can be accessed via the WLS data portal (https://www.ssc.wisc.edu/wlsresearch/). Summary statistics for facial attractiveness are available at (ftp://ftp.biostat.wisc.edu/pub/lu_group/Projects/Attractiveness). Funding: This study was supported by the Clinical and Translational Science Award (CTSA) program, through the NIH National Center for Advancing Translational Sciences (NCATS), grant UL1TR000427. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist

Introduction Facial attractiveness is a complex human trait of great interest in sociology, psychology, and related fields due to its profound influence on human behavior. Although variability exists across individuals and cultures, it has been suggested that some commonly agreed cues are used by people everywhere to judge facial beauty [1, 2]. As a trait that is well integrated into people’s daily life experience, it is unsurprising that facial attractiveness influences a variety of sociological outcomes. Studies have suggested that facial attractiveness is associated with job-related outcomes [3–6], academic performance [7], and economic mobility [8]. It affects human psychological adaptations and serves as an important influence of mate preference and reproductive success [9–14]. Even attractive babies receive more nurturing from their mothers than unattractive babies [15]. Further, people all over the world highly prize beauty. The annual revenue of the cosmetic industry is around 18 billion dollars in the US [16]. Fashion and beauty dominate daily discussions on traditional media as well as social media posts. Understanding the perception of attractiveness is a great interest in both academia and industry. The genetic basis of facial attractiveness may provide new and mechanistic insights into this complex human trait. Evidence suggested that attractiveness is heritable and genetic variations explain a substantial fraction of its variability [17, 18]. However, no genetic variant or gene underlying the biology of facial attractiveness has been identified to date. Our understanding of its genetic architecture is certainly far from complete. In this study, we utilized data from the Wisconsin Longitudinal Study (WLS), a longitudinal study of a 1/3 random sample of over ten thousand Wisconsin high school graduates in 1957. Facial attractiveness in WLS was measured by 12 coders using an 11-point rating scale based on each individual’s 1957 high school yearbook photo. These well-characterized facial attractiveness data from WLS have expanded our knowledge on the complex relationship between facial attractiveness and various sociological traits including educational aspirations and occupation [19–22]. Recently, dense genotype data have been made available in WLS. These advances make it possible for the first time to identify specific genetic components associated with facial attractiveness and probe its genetic architecture. We performed a genome-wide association study (GWAS) on 4,383 samples from WLS to identify single-nucleotide polymorphisms (SNPs) associated with facial attractiveness. In addition, sex-stratified analyses suggested distinct genetic architecture between the perception of male and female attractiveness. Integrated analysis of GWAS results and transcriptomic and epigenetic functional annotations also provided mechanistic insights into how genetics may influence facial attractiveness.

Discussion Despite tremendous interests from both academia and industry, the genetic basis of facial attractiveness is poorly understood, partly due to the scarcity of well-phenotyped facial attractiveness in large-scale cohorts with genetic information. Carefully-measured human facial attractiveness, in conjunction with dense genotype data in WLS, made it possible to identify specific genetic components for facial attractiveness. In this paper, we conducted a GWAS to identify DNA variants associated with human facial attractiveness. We identified two genome-wide significant loci on 10q11.22 and 2p22.2 and highlighted several genes via eQTL analysis and transcriptome-wide association analysis. Human tissues involved in reproduction and hormone production were implicated in heritability enrichment analysis. Additionally, we identified evidence for shared genetics between attractiveness and other complex traits. Top SNPs for attractiveness were enriched for associations with dermatological traits related to skin and hair pigmentation. Via a genome-wide genetic covariance estimation approach, we identified strong evidence for shared genetic architecture of facial attractiveness with BMI and lipid traits. Of note, sex-specific genetic architecture of facial attractiveness was a recurrent pattern observed in almost all our analyses. The loci that reached significance in analyses based on all samples showed consistent effects between males and females, but sex-specific analyses revealed a list of new loci. The leading SNP at genome-wide significant locus 2p22.2 showed a significant interaction effect with sex. In multi-trait analyses, SNPs associated with female coders’ attractiveness ratings were enriched for associations with hair color while top SNPs for male coders’ ratings were enriched for associations with skin pigmentation. Additionally, only female attractiveness (especially MC-FS) showed strong and negative genetic correlation with BMI while male attractiveness was more strongly correlated with blood cholesterol levels which are known to be involved in the synthesis of testosterone and other steroid hormones [48]. Finally, variants and genes were identified for both male and female attractiveness but the selection pressure on negative associations of male attractiveness seemed to be particularly strong. These results not only provided fundamental new insights into the genetic basis of facial attractiveness, but also revealed the complex relationship between attractiveness and a variety of human traits. This study was not without limitations. First, although WLS provided a great opportunity to study the genetics of facial attractiveness, the sample size was moderate and we did not find an external cohort to replicate our association findings due to the uniqueness of this phenotype. Although heritability of facial attractiveness has been demonstrated in twin studies [17, 18], we were unable to obtain statistically significant results on chip heritability in our study. Due to weak effect sizes, extreme multiple testing, and ubiquitous confounding, external replication and validation are critical steps in studies of complex trait genetics. In our analysis, we used 455 samples with genetically confirmed European ancestry in WLS to replicate genome-wide significant findings and performed a variety of analyses to assess the heterogeneity of identified associations, including comparing association signals between males and females as well as across different coders. The effect directions of both genome-wide significant SNPs were consistent between the discovery and replication stages and the p-values became more significant in the meta-analysis. Still, spurious associations remain a possibility and the validity of our findings needs to be further investigated using independent samples. Second, attractiveness measurements in WLS were based on high-school yearbook photos. Although it is a common practice to use photos as the basis of attractiveness measurements [11, 18, 49], our phenotyping approach does not cover every aspect of attractiveness and the results need to be interpreted with caution. In our study, each photo was rated by 12 different coders and the rating scores were consistent across coders. These results suggest the robustness of the phenotypic measurements in our study, but many questions remain unanswered. What are the roles of age, physical body shape, facial expression, and make-up in the perception of attractiveness? What is the impact of assortative mating on the genetics of attractiveness [50]? And what is the shared and distinct genetics between attractiveness and closely related facial phenotypes such as symmetry, averageness, and sexually dimorphic features [14]? These are just a handful of questions beyond the scope of this study. We also note that since each yearbook photo in WLS was rated by 6 female and 6 male coders, we were able to obtain robust phenotypic measurements based on male and female coders separately. This stratification proved critical for some analyses we conducted in this study. However, for future replication using other cohorts without sufficient numbers of male and female-specific ratings, it may be necessary to conduct additional GWAS by combining all coders’ ratings. Additionally, we note that both the raters and the people being rated were from one state that was racially and ethnically quite homogeneous and we only included samples with European ancestry in this study. Further, the yearbook photos in WLS were collected more than sixty years ago. It is unclear how well our results can be generalized to other populations, age groups, and generations. If the same high school yearbook photos were to be rated for facial attractiveness by a more ethnically or racially diverse set of raters, and if the findings were to be replicated, then the inference regarding genetic association of attractiveness would be strengthened. Nevertheless, this study was a successful attempt to pin down genetic components of human facial attractiveness. Many of these unanswered questions will be exciting directions to explore in the future. We have little doubt that robust and comprehensive phenotypic measurements, coupled with larger sample sizes from diverse populations, will further advance our understanding of this interesting human trait.

Methods WLS data details WLS is a longitudinal study of a 1/3 random sample of over ten thousand Wisconsin high school graduates in 1957. Facial attractiveness in WLS was measured based on each individual’s 1957 high school yearbook photo by 12 coders (six females and six males) selected from the same cohort in 2004 and 2008. The subjects in the photos were of the same age and the photos had the same yearbook format. In total, 80 coders were involved in the study and not all photos were rated by the same group of coders. An 11-point rating scale was used to quantify attractiveness. End-points of rating were labeled as “not at all attractive” and “extremely attractive” for 1 and 11, respectively. In this study, we used normalized average ratings from female coders and normalized average ratings from male coders as two quantitative traits for facial attractiveness. Normalization was performed in a prevailing fashion as subtracting mean and then dividing by standard deviation. Genetic data were obtained from saliva samples in the years 2006 and 2007 using Oragene kits and a mail-back protocol. All participants provided informed consent under a protocol approved by the Institutional Review Board of the University of Wisconsin-Madison. Genotyping was conducted using the Illumina Human Omni Express Bead Chip. 713,014 SNPs were genotyped. The quality control process was previously conducted for a published GWAS on educational attainment [51]. We used genotype data imputed against the Haplotype Reference Consortium (HRC) panel. Individuals were removed if they met one of the following criteria: 1) genotype missingness rate > 0.05; 2) surveyed sex did not match genetic sex; 3) surveyed relatedness did not match genetic relatedness; 4) the individual was an outlier in respect of heterozygosity or homozygosity (F statistic > 0.03 or < -0.03); 5) the individual was an ancestral outlier–we iteratively dropped individuals with nearest neighbor z-score < -5 until no more individuals with a z-score < -5 remain. In addition to these quality control criteria, we only included individuals with available attractiveness ratings, self-reported European ancestry, and birthday between 1937–1940 in the study. SNPs were removed if: 1) call rate < 0.95; 2) Hardy-Weinberg exact test p-value < 1.0e-5; 3) minor allele frequency < 0.01; or 4) imputation quality score < 0.8. After quality control, 7,251,583 autosomal SNPs and 3,928 individuals remained in the discovery stage. GWAS analysis In the analysis using all samples (i.e. MC-AS and FC-AS), we applied linear mixed model (LMM) implemented in the GCTA software [52] to perform association analysis while correcting for relatedness among samples. In addition, sex, round of coding (i.e. was attractiveness rated in 2004 or 2008), dummy variables for birth year were included in the model as covariates. In sex-stratified association analyses, we applied linear regression instead of LMM due to the reduction in sample size and the consequent non-convergence of the restricted maximum likelihood algorithm and add the first two principal components into covariates. We used the prevailing p-value cutoff 5e-8 to claim genome-wide significance and 8.3e-9 as the study-wise significance cutoff to further adjust for 6 traits we analyzed. In addition, we used 1e-6 as a suggestive significance cutoff. WLS data were collected on high school graduates of the same year and distant cousins may be involved due to the study design. To adjust for family structure in linear regression analysis, we kept only one individual in each pair of samples with relatedness coefficient greater than 0.05. Relatedness coefficients among samples were estimated using PLINK [53]. After these additional quality control steps, 1,792 male samples and 2,062 female samples remained in sex-stratified analyses, PLINK was used to perform association analysis with sex, round of coding, birth year, and the first two principal components (PCs) included as covariates. We also used PLINK to test the interaction between genome-wide significant SNPs and sex using all samples. Males were coded as 1 and females were coded as 2. Significance was determined by a Bonferroni-corrected p-value cutoff (i.e. 0.05/4) which adjusted for two SNPs and two traits (i.e. attractiveness ratings based on male and female coders). Replication We replicated genome-wide significant findings from the discovery stage using WLS samples who did not report ancestry information but had genetically confirmed European ancestry. Scatter plot based on top two PCs for WLS and 1000 Genome samples [54] is shown in S17 Fig. Quality control procedure in the replication dataset is the same with that in the discovery stage. 455 individuals passed quality control and were used to replicate the association of rs2999422 with FC-AS, and 213 female samples were used to replicate the sex-stratified association between rs10165224 and MC-FS. We used linear regression implemented in PLINK to perform association analysis. Inverse variance-weighted method was applied to meta-analyze results from the discovery and replication stages. XWAS SNPs on the X-chromosome were imputed against HRC panel using the Michigan Imputation Server [55]. Variants were removed if 1) missing call rates > 0.1; 2) MAF < 0.005; 3) significant deviation from Hardy-Weinberg equilibrium in women (p<1e-6); 4) imputation quality score < 0.8. 5) located in the pseudo-autosomal regions (PARs), or 6) MAF between males and females was significant (p<0.001). Additionally, individuals were removed if their reported sex did not match the heterozygosity rates observed on chromosome X. After these quality control steps, 156,615 SNPs and 3,921 samples (2,102 females and 1,819 males) were included in our analyses. We used XWAS software [56, 57] to perform sex-stratified tests on X-chromosome. We added the first two PCs calculated from autosomes as covariates to adjust for population stratification. Round of coding, birth year were also included in the model as covariates. Fisher’s method and Stouffer's method implemented in XWAS were used to meta-analyze male and female samples (i.e. FC-AS and MC-AS). eQTL data and transcriptome-wide association analysis Multi-tissue gene expression and eQTL data were acquired from data portal of the Genotype-Tissue Expression (GTEx) project (https://www.gtexportal.org). We applied UTMOST [43] to perform cross-tissue transcriptome-wide association analysis for six facial attractiveness traits. We used cross-tissue gene expression imputation models trained from 44 tissues in GTEx [58]. Gene-level association meta-analysis was performed using generalized Berk-Jones test [59] implemented in UTMOST software. Statistical significance was determined using a Bonferroni corrected p-value cutoff 3.2e-6. Heritability and multi-trait analysis The GREML method implemented in GCTA software was used to estimate heritability of facial attractiveness [60, 61]. We also used GEMMA as an alternative approach to validate the results [28]. Sex, round of coding, and dummy variables for birth year were included as covariates. We applied stratified LD score regression [62] implemented in the LDSC software to perform heritability enrichment analysis and identify biologically relevant tissues for facial attractiveness. Tissues with sample sizes greater than 100 in GTEx were included in the analyses. In sex-stratified analyses, non-existent tissues (e.g. testis for females and ovary for males) were removed from the analyses. For each tissue, functional annotation was defined as regions near highly expressed genes (within 50,000 bp up- or downstream). We used median transcripts per million (TPM) as the criterion to select top 10% highly expressed genes. We then estimated partitioned heritability using functional annotation for each tissue while including 53 baseline annotations in the model. P-values were calculated using z-scores of regression coefficient as previously suggested [62]. GWAS summary statistics for six dermatological traits in the UK biobank were downloaded from GWAS atlas (http://atlas.ctglab.nl). After clumping the data using an LD cutoff of 0.1, we tested if top SNPs associated with each attractiveness trait (p < 0.05 in attractiveness GWAS) were enriched for SNPs associated with skin and hair pigmentation (p < 0.05 in the corresponding GWAS). We used hypergeometric test to assess enrichment and a Bonferroni-corrected p-value cutoff to claim statistical significance (p<0.05/36 = 0.0014). We used the GNOVA method [63] to estimate genetic covariance between complex traits. Association statistics of six facial attractiveness traits were jointly analyzed with publicly accessible GWAS summary statistics for 50 complex traits (S11 Table). Since samples in WLS were not used in those 50 published datasets, uncorrected genetic covariance estimates were reported in our analyses. Additionally, due to numerically unstable estimates for heritability, we report genetic covariance instead of genetic correlation throughout the paper. We used MR-Egger [64] approach implemented in ‘MendelianRadomization’ R package to perform causal inference between complex traits. We selected instrumental SNP variables by applying a LD cutoff of 0.05 and a p-value cutoff of 1.0e-9. Based on these criteria, 31 top SNPs for BMI were included in our analysis. Other bioinformatics tools Manhattan plots and QQ plots were generated using ‘qqman’ package in R [65]. Locus plots for GWAS loci were generated using LocusZoom [66]. Data availability Genotype data from WLS are available to the research community through the dbGaP controlled-access repository at accession phs001157.v1.p1. Phenotypic data in WLS can be accessed via the WLS data portal (https://www.ssc.wisc.edu/wlsresearch). Summary statistics for facial attractiveness are available at (ftp://ftp.biostat.wisc.edu/pub/lu_group/Projects/Attractiveness).

Acknowledgments We thank Pamela Herd, Geoffrey Hayes, Joe Savard, Carol Roan, and Kamil Sicinski for their assistance in WLS data curation. We also thank Maryam Zekavat for her assistance in interpreting the involvement of cholesterols in steroid hormone synthesis.