We identified six regions containing SNPs significantly associated with breast size, using a threshold of 5·10−8 for genome-wide significance (Figure 1). The genomic control inflation factor for this study was 1.047 (see Additional file 1 for the quantile-quantile plot).
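A genomic control inflation factor such as the one reported above is conventionally the ratio of the median observed association statistic to its expected median under the null. The following is a minimal sketch of that calculation, assuming two-sided p-values from 1-degree-of-freedom tests. The function name is hypothetical, and this is not necessarily the pipeline used in this study.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median observed 1-df chi-square statistic over its expected null median.

    Assumes two-sided p-values strictly between 0 and 1 (inclusive of 1).
    """
    norm = NormalDist()
    # A two-sided p-value corresponds to |z| = -inv_cdf(p / 2); chi-square = z squared.
    chi2 = [norm.inv_cdf(p / 2) ** 2 for p in p_values]
    # Median of a 1-df chi-square distribution (about 0.455).
    expected_median = norm.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Hypothetical check: evenly spaced (null-like) p-values give a factor close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lambda_gc = genomic_inflation(null_like)
```

Values near 1, such as the 1.047 reported here, indicate little overall inflation of the test statistics.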

Figure 1 Manhattan plot of association with breast size. −log10 p-values across all SNPs tested. SNPs shown in red are genome-wide significant (p<5·10−8). Regions are named with the postulated candidate gene.

The SNPs with the smallest p-values in these regions are rs7816345 in 8p12 (a region that is amplified in breast tumors and contains the breast cancer oncogene ZNF703 (zinc finger protein 703)), rs4849887 near INHBB (inhibin, beta B), rs12173570 near ESR1 (estrogen receptor 1), rs7089814 in ZNF365 (zinc finger protein 365), rs12371778 near PTHLH (parathyroid hormone-like hormone), and rs62314947 near AREG (amphiregulin); see Table 1 for details and Figure 2 for plots of p-values in these regions. All SNPs with p-values under 10−4 are shown in Additional file 2.

Figure 2 Associations with breast size in six regions with genome-wide significant SNPs. Colors depict the squared correlation (r2) of each SNP with the most associated SNP (which is shown in purple). Gray indicates SNPs for which r2 information was missing. For the plot labeled with rs7816345, the gene ZNF703 lies about 400kb outside the region displayed.

First, rs7816345 (p=1.64·10−13) lies within the 8p12 region commonly amplified in the luminal B subtype of estrogen receptor (ER) positive breast cancers that have a poor clinical outcome [20]. ZNF703, the only gene in the minimal amplified region, is the likely oncogene driving this amplification [21]. ZNF703 is up-regulated by estrogen and is a co-factor for a nuclear repressor complex that plays a role in the regulation of ER activity. It has also been implicated in the regulation of cell proliferation, and its overexpression leads to an increase in breast cancer stem cells [21, 22]. Interestingly, ZNF703 exerts a downstream effect on the TGF beta signaling pathway [22] and also cooperates with a form of p53 [23].

rs4849887 (p=3.31·10−11) lies 140kb downstream of the closest gene, INHBB. INHBB is a subunit of both inhibin and activin, hormones in the TGF beta superfamily that are important for many endocrine functions. While both INHBA (inhibin, beta A) and INHBB are expressed in normal breast tissue, only INHBB is up-regulated by estrogen [24]. Activin A (an inhibin beta A homodimer) is more highly expressed in breast cancer [25], though INHBB has been implicated in the carcinogenesis of non-endometrial uterine cancer [26]. INHBB is also highly expressed in fat cells, and its expression is reduced by weight loss [27]. A conditional analysis in this region, controlling for rs4849887, revealed a second, independent association with breast size: rs17625845 (p-value 4.7·10−10 in the initial analysis and 5.85·10−10 controlling for rs4849887). This SNP is located upstream of INHBB.

The next SNP associated with breast size is rs12173570, located near ESR1 (p=5.58·10−11). rs12173570 is in LD with rs9397435 (r2=0.56), which is also associated with breast size (p=1.15·10−9). rs9397435 has previously been associated with breast cancer in European, Asian, and African populations and affects the expression of ESR1 [28]. The G allele of rs9397435 corresponds to larger breast size, increased cancer risk, and increased expression. ESR1 is of great importance in normal breast development and cancer, and there is evidence that rs2046210, which is in LD with rs9397435 in Asian populations (r2=0.73) but less so in European populations (r2=0.13), may be associated with breast density [9].

The fourth region associated with breast size is centered around ZNF365 (rs7089814, p=3.30·10−9). This SNP lies in an intron of ZNF365, about 90kb away from rs10995190. rs10995190 has been associated with both breast cancer [29] and breast density [9]. rs7089814 and rs10995190 are not in LD (r2=0.035), and there is some evidence that rs10995190 is associated with breast size independently from rs7089814 (p=8.5·10−4 initially (Table 2) and 1.7·10−3 after correction for rs7089814).

Table 2 Association with breast size for SNPs previously associated with breast cancer

rs12371778, near PTHLH, is associated with breast size (p=1.03·10−8), and is in LD (r2=0.82) with rs10771399, which has previously been associated with breast cancer [30]. The A allele of rs10771399 is the risk allele for breast cancer and corresponds to the C allele of rs12371778, which is associated with larger breast size. PTHLH encodes a member of the parathyroid hormone family that plays a key role in embryonic mammary development [31] as well as lactation [32].

Finally, rs62314947 (p=4.79·10−8), near AREG, falls just under our threshold for genome-wide significance. Amphiregulin is related to the epidermal growth factor and TGF alpha families. It mediates ER function in mammary development [33, 34].

Three SNPs have p-values under 10−6 but are not genome-wide significant (Additional file 3). First, rs4820792 (p=4.17·10−7) lies 25kb upstream of CHEK2 (checkpoint kinase 2), which is involved in the response to DNA damage. The 1100delC mutation in CHEK2 is strongly associated with breast cancer; however, 1100delC and rs4820792 are not in LD. Next, chr22:40779964 (p=5.47·10−7) lies in SGSM3 (small G protein signaling modulator 3), near MKL1 (megakaryoblastic leukemia (translocation) 1). Finally, rs61280460 (p=8.30·10−7) lies near SERPINA6 (serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 6).

Motivated by the above overlaps between breast cancer and breast size SNPs, we analyzed 29 SNPs that have previously been associated with breast cancer (from [29, 30] and the supplement of [9]) for association with breast size in our data (Table 2). Of these 29 SNPs, only four were significant after correcting for 29 tests; these are the SNPs mentioned above near ESR1 (two SNPs), PTHLH, and ZNF365.
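To make the multiple-testing correction for these 29 SNPs concrete, here is a minimal sketch of a Bonferroni-style per-SNP threshold. Both the use of Bonferroni and the family-wise level of 0.05 are assumptions, since the text states only that the results were corrected for 29 tests. The single p-value used is the one quoted above for rs10995190.

```python
ALPHA = 0.05   # assumed family-wise error rate (not stated in the text)
N_TESTS = 29   # number of breast cancer SNPs tested (from the text)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_TESTS)  # roughly 1.7e-3

# p-value for rs10995190 (ZNF365) as quoted in the text
p_rs10995190 = 8.5e-4
passes = p_rs10995190 < threshold
```

Under these assumptions rs10995190 passes the corrected threshold, in line with ZNF365 being listed among the four significant SNPs.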

There is a strong relationship in our data between BMI and breast size: each additional BMI unit corresponds to an increase of about 0.1 cup sizes on average. However, the SNPs in Table 1 are not in LD with any variants previously associated with BMI [35]; this is expected due to the inclusion of bra band size (which is correlated with BMI) as a covariate. Furthermore, even if we did not control for BMI, the strongest associations with BMI (e.g., rs1558902 near FTO) have effects of about 0.4 BMI units per allele. This would correspond to an expected β of about 0.04 for breast size for these SNPs, which is below the effect sizes we are powered to detect here. Indeed, if bra band size is not included as a covariate, rs1558902 has an estimated β of 0.07 (95% CI: 0.04–0.10) for breast size and a p-value of 8·10−6, as compared to a β of 0.04 (95% CI: 0.01–0.07) with bra band size included.
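The expected effect size quoted above is the product of two figures given in the text. The short sketch below spells out that arithmetic. The variable names are illustrative, and the calculation assumes the SNP influences breast size only through BMI.

```python
# Figures taken from the text; variable names are illustrative.
cup_sizes_per_bmi_unit = 0.1   # average change in cup size per BMI unit
bmi_units_per_allele = 0.4     # approximate per-allele effect of rs1558902 on BMI

# Expected per-allele effect on breast size if mediated entirely by BMI
expected_beta = bmi_units_per_allele * cup_sizes_per_bmi_unit
expected_beta_rounded = round(expected_beta, 2)
```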

The covariates included in the analysis explain about 9.7% of the variance in breast size in our data; including the 7 SNPs in Table 1 that are genome-wide significant increases this to 10.9%. We used these 7 SNPs to compute a genetic propensity score for breast size by counting the number of alleles associated with larger size that each participant carried. The average cup size among women in the top 5% of this score (women carrying 9 or more of the 14 possible “large” alleles) was 0.83 cup sizes bigger (5.39 versus 4.56) than the average cup size among women in the bottom 5% of this score (women carrying 4 or fewer “large” alleles).
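The allele-counting score described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes hard-called genotypes coded as the number of "large" alleles (0, 1, or 2) at each of the 7 SNPs, the function names are hypothetical, and the toy participants are invented. The cutoffs of 4 or fewer and 9 or more alleles are the ones given in the text.

```python
from statistics import mean


def propensity_score(large_allele_counts):
    """Total number of 'large' alleles carried across all SNPs (unweighted)."""
    if any(c not in (0, 1, 2) for c in large_allele_counts):
        raise ValueError("each genotype must be coded as 0, 1, or 2 alleles")
    return sum(large_allele_counts)


def mean_cup_size_by_group(scores, cup_sizes, low_max=4, high_min=9):
    """Mean cup size among low scorers (<= low_max) and high scorers (>= high_min)."""
    low = [c for s, c in zip(scores, cup_sizes) if s <= low_max]
    high = [c for s, c in zip(scores, cup_sizes) if s >= high_min]
    return mean(low), mean(high)


# Toy data (invented): one row of 7 genotypes per participant.
genotypes = [
    [0, 1, 0, 1, 0, 1, 0],  # score 3
    [2, 1, 0, 1, 2, 0, 1],  # score 7
    [2, 2, 1, 1, 2, 1, 1],  # score 10
]
cup_sizes = [4.0, 4.5, 5.5]
scores = [propensity_score(g) for g in genotypes]
low_mean, high_mean = mean_cup_size_by_group(scores, cup_sizes)
```

Each allele contributes equally in this score. Weighting alleles by their estimated effect sizes would be an alternative, but the text describes a simple count.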

We note that the estimation of breast volume via self-reported bra size is likely to be far from perfect. Thus, it would be interesting to see what effects the SNPs found here would have in a more precisely phenotyped population. Likewise, many of the SNPs reported here were only imputed and not directly typed. While the estimated r2 values are generally quite high, indicating good imputation quality, ideally these SNPs would be directly typed in a replication cohort.