Abstract Compelling evidence from many animal taxa indicates that male genitalia are often under postcopulatory sexual selection for characteristics that increase a male’s relative fertilization success. There could, however, also be direct precopulatory female mate choice based on male genital traits. Before clothing, the nonretractable human penis would have been conspicuous to potential mates. This observation has generated suggestions that human penis size partly evolved because of female choice. Here we show, based upon female assessment of digitally projected life-size, computer-generated images, that penis size interacts with body shape and height to determine male sexual attractiveness. Positive linear selection was detected for penis size, but the marginal increase in attractiveness eventually declined with greater penis size (i.e., quadratic selection). Penis size had a stronger effect on attractiveness in taller men than in shorter men. There was a similar increase in the positive effect of penis size on attractiveness with a more masculine body shape (i.e., greater shoulder-to-hip ratio). Surprisingly, larger penis size and greater height had almost equivalent positive effects on male attractiveness. Our results support the hypothesis that female mate choice could have driven the evolution of larger penises in humans. More broadly, our results show that precopulatory sexual selection can play a role in the evolution of genital traits.

Male genitalia show great variation among closely related species (1). This variation is typically attributed to copulatory and postcopulatory sexual selection to increase male fertilization success under sperm competition (2) or cryptic female choice (3). There might, however, also be premating sexual selection on male genitalia. Precopulatory processes can influence genital morphology (4, 5), but it is unknown whether these results are due to direct female choice or sexual conflict. In species where genitalia are externally visible, sexual selection might also act if females prefer males with specific genital morphology. Despite this potential effect, relatively little research has tested whether primary sexual characters influence male attractiveness (6–8).

How female choice acts on any given male trait, and hence the strength and direction of selection, can be influenced by several, nonmutually exclusive factors. First, females use multiple cues during the mate choice process (9). Overall male attractiveness is unlikely to be determined by individual traits (e.g., refs. 10 and 11), so manipulating traits in isolation can lead to faulty conclusions about net male attractiveness (but see also ref. 12). Second, traits within individuals are phenotypically and genetically correlated. These relationships can influence evolution via correlational selection (13). Third, there might be a size contrast effect such that female assessment of attractiveness varies if the trait of interest is viewed differently in relation to other traits, analogous to the Ebbinghaus–Titchener effect (14). For example, the same sized penis might be perceived differently on short and tall men. Finally, a female’s own phenotype might influence her mate choice decisions. Humans mate assortatively based on numerous traits, including height (15), facial symmetry (16), and body shape (17, 18). Hence, it is likely that how a female rates a male’s attractiveness will partly depend upon her own phenotype.

The upright body posture and protruding, nonretractable genitalia of male humans make the penis particularly conspicuous, even when flaccid. This observation has generated suggestions by evolutionary biologists that the comparatively large human penis evolved under premating sexual selection (19, 20). Furthermore, novels, magazines, and popular articles often allude to the existence of a relationship between penis size and sexual attractiveness or masculinity (21, 22). Many cultures have fashion items, like penis sheaths and codpieces, that draw attention toward male genitalia (20), highlighting the potential for female choice to influence the evolution of male genitalia. There are numerous psychological studies directly asking females for their preference regarding male penis size. The results are mixed, with studies finding that females prefer longer penises (23), wider penises (24, 25), or that penis size is unimportant (26). These studies, however, all use self-reported, direct questioning and are therefore susceptible to biases of self-censorship and pressure to conform to socially desirable responses to sensitive issues (e.g., refs. 27–29).

The only scientific studies to attempt to test experimentally whether flaccid penis length affects male attractiveness asked women to rate five images created by modifying a single drawing of a male figure so that the test figures differed only in penis length (30–32). These important studies were not designed to quantify directly the relative effect of penis length on attractiveness compared with other sexually selected male traits, such as height and body shape (30–32). Therefore, it is still unknown whether penis size affects attractiveness when there is substantive variation in other, arguably more important, body traits, or whether interactions between these traits and penis size determine net attractiveness. For example, does a given increase in penis length have an equivalent effect on the attractiveness of a short and tall man? In addition, the use of small photographs to quantify size-based preferences might lead to different estimates than those obtained when viewing fully life-sized male bodies.

To address these issues, we presented a sample of heterosexual Australian women with projected life-size, computer-generated male figures (Fig. 1). Each figure was an animated 4-s video in which the figure rotated 30° to each side to allow participants to more easily evaluate the figure. We tested for the effects of flaccid penis size, body shape (shoulder-to-hip ratio), and height on male sexual attractiveness. The latter two traits have regularly been investigated and are known to influence male attractiveness or reproductive success [height (15, 33–35), shape (18, 36, 37)]. Each trait had seven possible values that were within the natural range (±2 SD) based on survey data (36, 39). We generated figures for all 343 (= 7³) possible trait combinations by varying each trait independently. This process eliminated any correlation between the three traits across the set of figures. Penis width did, however, covary positively with length in the program used to generate the figures, so we refer to overall “penis size” (but see also Materials and Methods). The women (n = 105), who were not told which traits varied, were then asked to sequentially view a random subset of 53 figures, including 4 of the same control figure, and to rate their attractiveness as sexual partners (Likert scale: 1–7). Figure rating was conducted in the absence of an interviewer and was completely anonymous. We then used standard evolutionary selection analyses to estimate multivariate linear, nonlinear, and correlational (interactive) selection (using the attractiveness score as a measure of “fitness”) arising from female sexual preferences (e.g., ref. 38).
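The fully crossed design described above can be illustrated with a minimal sketch. It assumes that each trait takes seven evenly spaced standardized levels between −2 and +2 SD (the raw units are given in Materials and Methods, and body shape was spaced along a program-specific scale, so standardized levels are a simplification). The variable names are illustrative only and are not taken from the study's materials.

```python
import itertools
import numpy as np

# Assumption: seven evenly spaced standardized levels from -2 SD to +2 SD.
levels = np.linspace(-2.0, 2.0, 7)
step = float(levels[1] - levels[0])  # spacing between adjacent levels, in SD units

# Full factorial design: one row per figure, columns are
# (height, shoulder-to-hip ratio, penis size) in SD units.
design = np.array(list(itertools.product(levels, repeat=3)))
n_figures = design.shape[0]

# Correlations between the three traits across the whole set of figures.
corr = np.corrcoef(design, rowvar=False)
off_diag_max = float(np.abs(corr - np.eye(3)).max())

print(n_figures, round(step, 3), off_diag_max)
```

Because every level of each trait is paired with every combination of the other two, the off-diagonal correlations are zero up to floating-point error, which is the property the design relies on when separating the effects of the three traits.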

Fig. 1. Figures representing the most extreme height, shoulder-to-hip ratio, and penis size (±2 SD) (Right and Left) in comparison with the average (Center figure) trait values.

Results Selection Analysis. There were highly significant positive linear effects of height, penis size, and shoulder-to-hip ratio on male attractiveness (Table 1). Linear selection was very strong on the shoulder-to-hip ratio, with weaker selection on height and penis size (Table 1). There were diminishing returns to increased height, penis size, and shoulder-to-hip ratio (quadratic selection: P = 0.010, 0.006 and < 0.0001) [“B” in Table 1] and, given the good fit of the linear and quadratic models, the optimum values appear to lie outside the tested range (i.e., maxima are >2 SD from the population mean for each trait) (Fig. 2). A model using only linear and quadratic selection on the shoulder-to-hip ratio accounted for 79.6% of variation in relative attractiveness scores (centered to remove differences among women in their average attractiveness scores). The explanatory power of height and penis size when added separately to this model was almost identical. Both traits significantly improved the fit of the model (log-likelihood ratio tests: height: χ² = 106.5, df = 3, P < 0.0001; penis: χ² = 83.7, df = 3, P < 0.0001). Each trait, respectively, explained an extra 6.1% and 5.1% of the total variation in relative attractiveness.

Table 1. Linear selection gradients and the matrix of quadratic and correlational selection gradients based on average rating for each of the 343 figures and means of gradients generated separately for each participant

Fig. 2. Relationship between attractiveness and penis size controlling for height and shoulder-to-hip ratio (95% confidence intervals) indicating quadratic selection acting on penis size.

The effects of the three traits on relative attractiveness were not independent because of correlational selection (all P < 0.013) [“B” in Table 1]. Controlling for height, there was a small but significant difference in the rate of increase in relative attractiveness with penis size for a given shoulder-to-hip ratio (Fig. 3A).
More compellingly, after controlling for shoulder-to-hip ratio, greater penis size elevated relative attractiveness far more strongly for taller men (Fig. 3B).

Fig. 3. Contour map of the fitness surface (red: more attractive) for (A) penis length and shoulder-to-hip ratio (height controlled) and (B) penis length and height (shoulder-to-hip ratio controlled) (1 = mean attractiveness).

Participant and Response Time Analysis. The average age of female participants was 26.2 ± 6.8 y (mean ± SD). The participants were 71.8% European, 20.9% Asian, and 7.3% from elsewhere with respect to ethnic origins. Female height was positively correlated with the linear effect that male height had on her rating of his relative attractiveness (i.e., the linear selection gradient for height calculated separately for each female) (Pearson’s r = 0.292, P < 0.0001) (Table 2). Females that were heavier than expected for their height (i.e., high relative weight/body mass index) showed a stronger linear effect of penis size on their rating of a male’s relative attractiveness (Pearson’s r = 0.227, P = 0.021) (Table 2). Female age was not correlated with the linear effect that any of the three male traits had on her rating of a male’s relative attractiveness (all P > 0.164) (Table 2). There was no effect of either the use of hormonal contraception or menstrual state on the linear effect of any of the three male traits on how a female rated relative attractiveness (all P > 0.166) (Table S1). We note, however, that these tests have limited power to detect a cycle effect, as women were not repeatedly surveyed during both the high and low fertility phases.

Table 2. Correlations between female traits and the strength of linear selection on male traits

The average latency to respond and rank a figure when pooled across all trials was 3.08 ± 0.028 s (mean ± SD) (n = 5,142).
Controlling for baseline variation in response time among women, the response time was significantly greater for figures with a larger penis (F(1, 5034) = 15.099, P < 0.001), greater height (F(1, 5034) = 23.819, P < 0.001), and a greater shoulder-to-hip ratio (F(1, 5034) = 316.878, P < 0.001). Given that all three male traits were positively correlated with relative attractiveness, it is not surprising that, on average, there was also a significant positive correlation between a female’s attractiveness rating for a figure and her response time (mean correlation: r = 0.219, t(104) = 8.734, P < 0.001, n = 105 females). Controlling for differences among women in their average attractiveness scores (i.e., using relative attractiveness), we found significant repeatability of the ratings given to the 343 figures (n = 14–16 ratings per figure) (F(342, 4799) = 6.859, P < 0.001; intraclass correlation: r = 0.281). For example, the absolute difference in the rating score for the first and last (fourth) presentation of the control figure to the same female was 1.21 ± 0.10 (mean ± SE) (n = 105) on a seven-point scale. This is a high level of repeatability, as most figures had six adjacent figures that were identical except that they differed for one trait by 0.66 of a SD.
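The selection gradients summarized in Table 1 come from a multiple regression of relative attractiveness on standardized traits, their squares, and their pairwise products. The sketch below shows one way such a regression could be set up with ordinary least squares. The function name `selection_gradients`, the simulated data, and the "true" gradient values are all hypothetical and chosen only for illustration; they are not the study's data or the estimates in Table 1, and the authors' own software and model-fitting details are not specified here.

```python
import itertools
import numpy as np

def selection_gradients(Z, w):
    """Least-squares fit of w on traits, squared traits, and pairwise products.

    Z: (n, k) array of standardized traits; w: (n,) relative fitness scores.
    Returns (beta, gamma): linear gradients and the symmetric matrix of
    quadratic (diagonal) and correlational (off-diagonal) gradients.
    """
    n, k = Z.shape
    pairs = list(itertools.combinations(range(k), 2))
    X = np.column_stack(
        [np.ones(n)]
        + [Z[:, i] for i in range(k)]
        + [Z[:, i] ** 2 for i in range(k)]
        + [Z[:, i] * Z[:, j] for i, j in pairs]
    )
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    beta = coef[1 : 1 + k]
    gamma = np.zeros((k, k))
    # Quadratic gradients are twice the regression coefficients on z_i^2.
    gamma[np.diag_indices(k)] = 2.0 * coef[1 + k : 1 + 2 * k]
    for (i, j), c in zip(pairs, coef[1 + 2 * k :]):
        gamma[i, j] = gamma[j, i] = c
    return beta, gamma

# Hypothetical demonstration on a noiseless 7 x 7 x 7 factorial design.
levels = np.linspace(-2.0, 2.0, 7)
design = np.array(list(itertools.product(levels, repeat=3)))
Z = design / design.std(axis=0)  # standardize: mean 0, SD 1

beta_true = np.array([0.10, 0.30, 0.08])       # made-up values
gamma_true = np.array([[-0.04, 0.02, 0.03],    # made-up values
                       [0.02, -0.10, 0.01],
                       [0.03, 0.01, -0.05]])
w = 1.0 + Z @ beta_true + 0.5 * np.einsum("ni,ij,nj->n", Z, gamma_true, Z)

beta_hat, gamma_hat = selection_gradients(Z, w)
print(np.round(beta_hat, 3))
print(np.round(gamma_hat, 3))
```

With noiseless simulated scores the fit recovers the made-up gradients exactly, which makes the doubling of the squared-term coefficients easy to check; negative diagonal entries of `gamma` correspond to the diminishing returns reported above.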

Discussion We found that flaccid penis size had a significant influence on male attractiveness. Males with a larger penis were rated as being relatively more attractive. This relationship is nonlinear, however, indicating that the proportional increase in attractiveness begins to decrease after a size of ∼7.6 cm (Fig. 2), which is a below-average penis size based on a large-scale survey of Italian men (39). Although we detected quadratic selection on penis size, any potential peak (i.e., the most attractive penis size) appears to fall outside the range used in our study. A preference for a larger-than-average penis is qualitatively consistent with some previous studies (30–32), but our results differ in showing that the most attractive size appears to lie more than 2 SDs from the mean (i.e., no evidence for stabilizing sexual selection, in contrast to refs. 30–32). Our results are further supported by the analysis of response time. We found a significantly positive, albeit small, correlation between penis size and response time. This finding is consistent with a pattern in adults whereby attractive stimuli are viewed for longer periods (40). A tendency to view attractive stimuli for longer is a generalized phenomenon that starts in infancy (41, 42). Height and shoulder-to-hip ratio also influenced a male’s relative attractiveness, with taller men and those with a greater shoulder-to-hip ratio being rated as more attractive by women. As with penis size, the proportional increase in attractiveness declined as both male height and shoulder-to-hip ratio increased. These results are consistent with previous findings of sexual selection on male height based on evidence from attractiveness rankings and patterns of actual mate choice (15, 37; but see also refs. 43 and 44). Our results corroborate previously reported quadratic relationships between male height and reproductive success (34, 45; but see also refs. 33 and 35).
Our results for shoulder-to-hip ratio are also broadly consistent with previous attractiveness studies on body shape (36, 46–48). Again, the correlations between response time and height and shoulder-to-hip ratio, respectively, were both significantly positive, indicating that females made quicker decisions when viewing less attractive figures (40). Our study found no significant difference in the proportion of variance accounted for in our model by height and penis size (6.1% vs. 5.1%), indicating that both traits had equivalent effects on relative attractiveness. This finding is intriguing given that height is one of the most widely investigated and well-documented traits known to affect male reproductive success (15, 33–35, 37, 43, 44). The finding suggests that selection on penis size is potentially as strong as selection on stature. The shoulder-to-hip ratio, however, accounted for a much larger proportion of variance in attractiveness in our model (79.6%). This result might be because our figures extended too far into the feminine range of body shapes (36), as those with a low shoulder-to-hip ratio were highly unattractive. However, given increasing waistlines (49), the values we used are well within the range now seen in many Western countries. We detected correlational selection between all three traits, so the effects of each trait on attractiveness were not independent of one another. The effect of penis size on attractiveness varied with both height and body shape (Fig. 3B). After controlling for the shoulder-to-hip ratio, larger penis size had a greater effect on attractiveness for taller men. This result could be because perceived penis size was smaller when assessed relative to the height of a taller man; or because of general discrimination against short men irrespective of the value of other traits, so that even a larger penis did little to increase their net attractiveness.
A similar relationship between penis size and shoulder-to-hip ratio was also detected (Fig. 3A). Attractiveness scores were not independent of the female participant’s phenotype. Most importantly, a female’s height was significantly positively correlated with the strength of her tendency to rate taller men as being relatively more attractive. This result is consistent with evidence that humans mate assortatively based on height (15). There was also a weak, albeit significant (P = 0.021), positive relationship between a female’s relative weight (comparable to body mass index) and the effect that penis size had on her assessment of male attractiveness. This relationship was far stronger if we included two outliers (>4 SD from mean; r = 0.333, P = 0.001, n = 105). The relationship was also stronger if we used a more stringent criterion to exclude four outliers (>2 SD from mean; r = 0.296, P < 0.01, n = 101). This result is intriguing but should be viewed with caution given that we conducted multiple tests. In sum, we show that flaccid penis size alongside its interaction with shoulder-to-hip ratio and height significantly influenced a male’s relative attractiveness. Our results directly contradict claims that penis size is unimportant to most females (22, 26, 50). Some studies indicate that preference for a larger penis might arise because penis size is associated with higher rates of vaginal orgasm (23, 51). In turn, vaginal orgasms are associated with higher levels of associated sexual satisfaction (52). The proximate basis of the decisions leading to the reported attractiveness scores is unknown. General preexisting aesthetic preferences, either innate or acquired through cultural norms, might account for the observed patterns. Another possibility is that females use previous sexual experiences to infer a link between penis size and desirable male properties [e.g., the likelihood of (vaginal) orgasm]. 
Arguing against this theory is the lack of a correlation between a woman’s age and the magnitude of the effect of penis size on her rating of male attractiveness. Regardless of the exact mechanism, however, our results show that female mate choice could have played a role in the evolution of the relatively large human male penis. More broadly, our study adds to growing evidence from several species that precopulatory sexual selection can influence the evolution of primary sexual traits in animals (4–8).

Materials and Methods MakeHuman (v0.9.1RC1) was used to generate anatomically correct wire-frame figures. Body shape, height, and flaccid penis size were manipulated on each frame. We aimed to generate figures that encompassed the typical range of variation in these three traits in populations of Caucasian males. The penis and height values used stem from a large-scale study of an Italian male population, but these values fit within the standard range for Caucasians (reviewed in ref. 39). These values should capture ∼95% of the variation that females are likely to encounter, although they do not encompass the full range of variation, and the mean values are known to vary among different human populations. For height and penis length, seven values were evenly spaced between ±2 SD of the population mean (range: height: 1.63–1.87 m; flaccid penis: 5–13 cm) (39). Using this program we could not generate penises that only increased in length, so we refer to penis “size,” as there was a slight increase in width of 1.2 cm between the shortest and longest penis, whereas there was an 8-cm change in length. Body shapes were generated as seven evenly spaced values along the “masculinity” function of MakeHuman. We then summarized these figures using the shoulder-to-hip ratio in our analysis (range: 1.13–1.45; i.e., pear to V-shaped). These values fell within the natural range (36). Figures were imported into LightWave 3D (v9.6), colored gray, modified to reduce pixilation, and standardized for testicle size. We then generated videos where a forward facing figure took 4 s to rotate 30° to each side. Rotation increased the ability of participants to gauge penis size. Full details are available upon request. Female participants were recruited at Monash University and the Australian National University (students, staff, and nonuniversity). The experiment was briefly described to participants as a study of male attractiveness, but they were not told which male traits varied. 
Females were instructed to stand 6.5 m directly in front of a wall where figures were projected at full (life) size. Before data collection and after the interviewer left the room, participants filled out a questionnaire and were asked about their height, weight, and age (SI Text). A scale and tape measure (for height) were provided in the room. The participants were also asked whether they were using chemical/hormonal contraception and what stage of their menstrual cycle they were in. After the questionnaire, and before data collection began, all participants viewed the same set of 13 videos that spanned the range in male trait values to gain familiarity with the figures. Before testing, participants were then asked: “Please rate each figure based on how sexually attractive they are to you” (Likert scale: 1–7). During the test, each participant was shown a unique, randomly ordered set of 53 videos: 49 test videos and 4 control (all traits at mean) videos. After the participant entered a rating score (by pressing a keyboard button) the next figure in the sequence appeared. The system automatically recorded the time between the figure first appearing and a score for it being entered. We obtained data from 105 participants who self-identified as (i) heterosexual or (ii) exclusively attracted to men in a pretest questionnaire (data from other participants were excluded: n = 13). Hence, all 343 figures were each viewed by approximately 15 women (n = 5,145 ratings). Stimuli were displayed at life size using a digital projector in a private viewing room. Data were collected using SuperLab (v4.5). Data collection was anonymous so that no answers could be traced back to participants. Ethics approval was granted through Monash University (MUHREC Approval CF11/1378–2011000764). Data Analysis. Data on attractiveness were analyzed using standard multivariate selection procedures (13, 53). 
Our analyses clearly showed strong nonlinear and correlational selection, so we did not conduct canonical rotations of the data to generate eigenvectors (e.g., refs. 53–55). We conducted two analyses. First, we used a standard analysis based on a multiple regression of “relative attractiveness” on standardized trait values (mean = 0, SD = 1). We centered the rating scores from each participant (i.e., the mean rating for each participant was then zero). This process generated participant-corrected scores to control for variation among participants in their tendency to give higher or lower than average scores. For relative attractiveness we then calculated the mean participant-corrected attractiveness score for each of the 343 figures (an average of 15 participants viewed each figure). The mean score of the 343 figures is 0, so we added 1 to each figure’s score to generate the final relative attractiveness score. This addition was done purely for presentation reasons, as the convention in selection analyses is that the average individual has a value of 1. Adding 1 does not change estimates of selection gradients (i.e., regression coefficients). The relative attractiveness score is the dependent variable that we used as a surrogate measure of “fitness.” We estimated selection gradients (13, 54) and associated P values from standard tests for regression coefficients (13) [see “A” in Table 1]. Because we present the results as a selection analysis, the regression coefficients for the squared terms of individual traits are doubled (54). The selection gradients in Table 1 can therefore be read as the increase in attractiveness score (on the original 1–7 scale) with a one SD increase in the focal trait. Second, we used the same multiple-regression approach to calculate a unique fitness surface for relative attractiveness for each participant. We did this to control for the fact that our first analysis did not account for participant identity.
The dependent variable was simply the centered attractiveness for each participant. The three traits were each standardized for the set of figures that the participants viewed. We then calculated the mean value for each selection gradient (i.e., each mean was based on 105 independent estimates) and used one-sample t tests to determine whether means differed from zero (all distributions were normal, Kolmogorov–Smirnov tests, P = 0.23–0.94) [see “B” in Table 1]. Both methods yielded very similar estimates of selection gradients [compare “A” and “B” in Table 1]. In Figs. 2 and 3 we present data based on the relative attractiveness of the 343 figures. We generated attractiveness contour maps (Fig. 3) with thin-plate splines in the fields package of R (56). To investigate the relationship between female traits and attractiveness scores, we used Pearson’s correlations to measure the relationship between the linear selection gradients (calculated using the second method) for each male trait (penis size, height, and shoulder-to-hip ratio) and each of three female traits (age, height, and weight). Weight and height are correlated (r = 0.322, P = 0.001), so to control for height, we used the residuals from a regression of weight on height. These residuals can be considered broadly equivalent to a measure of body mass index. We identified two females that showed a strong deviation from the regression line (residuals >4 SD). We excluded these participants from all of the results presented in Table 2. Finally, we used two-sample t tests to compare selection gradients between females assigned to one of two categories for contraception (using or not using chemical/hormonal contraception) and stage in the menstrual cycle [peak of cycle (1–7 d after the start of menstrual cycle) or not peak cycle (8–28 d after the start of the menstrual cycle)], respectively (Table S1).

Response Time and Repeatability Analysis.
We analyzed the effect of penis size on female latency to rate a figure in two ways. First, we ran a general linear mixed model with response time as the dependent variable and the three standardized male traits as fixed covariates. We included female participant identity as a random effect to control for multiple trials per female. To improve the model fit, we log-transformed response time (analyses on untransformed data yielded the same conclusions). We also ran the model excluding all cases (n = 246 of 5,142) where the response time was less than 0.1 s (this was a natural break in the data, as the log-transformed response time then showed a very close fit to a normal distribution). Again, the model yielded the same conclusions. Second, to determine how figure attractiveness influenced response time, we calculated the Pearson’s correlation between the 53 attractiveness scores and log response time for each female. These 105 correlations were then compiled and a one-sample t test conducted to test whether the mean correlation was significantly different from zero. Use of Spearman rank-order correlations yielded the same conclusion. Data on response time were missing for 3 of the 5,145 trials. To determine the repeatability of ratings of a figure’s attractiveness across females, a repeatability analysis was performed for the 343 figures. We used participant-corrected attractiveness scores as the dependent variable in a one-way ANOVA (with figure identity as the categorical factor) to estimate the intraclass correlation. This correlation is a measure of the agreement among females in how they rate a figure’s attractiveness. See Dataset S1 for the original data (n = 5,145 ratings from 105 participants), Dataset S2 for the relative attractiveness scores and trait values for the 343 figures, and Dataset S3 for selection gradients and questionnaire responses for the 105 participants.
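The repeatability estimate described above can be sketched as follows. This assumes the common ANOVA-based estimator of the intraclass correlation with an adjusted group size n0 for unequal numbers of ratings per figure; the text does not state which estimator was used, so this is an illustration rather than the authors' exact procedure. The function name `repeatability` and the toy data are hypothetical.

```python
import numpy as np

def repeatability(scores, groups):
    """One-way ANOVA F ratio and intraclass correlation (repeatability).

    scores: ratings (e.g., participant-corrected attractiveness scores).
    groups: group label for each rating (e.g., figure identity).
    """
    scores = np.asarray(scores, dtype=float)
    labels, inverse = np.unique(np.asarray(groups), return_inverse=True)
    a = labels.size                      # number of groups
    N = scores.size                      # total number of ratings
    n_i = np.bincount(inverse)           # ratings per group
    group_means = np.bincount(inverse, weights=scores) / n_i
    ss_among = np.sum(n_i * (group_means - scores.mean()) ** 2)
    ss_within = np.sum((scores - group_means[inverse]) ** 2)
    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (N - a)
    # Adjusted group size for unbalanced designs.
    n0 = (N - np.sum(n_i ** 2) / N) / (a - 1)
    icc = (ms_among - ms_within) / (ms_among + (n0 - 1) * ms_within)
    return ms_among / ms_within, icc

# Toy data (made up): two groups with two ratings each.
F_toy, icc_toy = repeatability([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])

# Consistency check against the reported F ratio, assuming about 15
# ratings per figure (an approximation; the reported range is 14-16).
F_reported, n0_assumed = 6.859, 15.0
icc_from_F = (F_reported - 1.0) / (F_reported + n0_assumed - 1.0)
print(F_toy, round(icc_toy, 4), round(icc_from_F, 3))
```

Dividing the estimator through by the within-group mean square gives (F − 1)/(F + n0 − 1), so the reported F ratio and roughly 15 ratings per figure are consistent with the reported intraclass correlation of 0.281.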

Acknowledgments We thank J. Burchell, J. Irons, H. Kokko, E. McKone, and R. Reynolds for technical support; P. Backwell, I. Booksmythe, R. Catullo, and R. Lanfear for comments on previous drafts of the manuscript; and Geoff Miller and one anonymous referee for their thoughtful and constructive comments on our manuscript. This project was funded by the Australian Research Council; ethics approval was granted through Monash University (MUHREC Approval CF11/1378 – 2011000764).