Methods Summary

Our analysis was restricted to individuals of >80% estimated European ancestry based on a principal component analysis of population substructure. EWS cases were confirmed by medical record review, which included checking for the presence of EWSR1-ETS fusions when data were available. For each EWS case, principal component matching was performed to select a genetically homogeneous set of adult controls who were cancer-free as of age 50. Sample and SNP quality control exclusions were carried out to ensure unrelated, high-quality samples with accurate genotype assays for association analysis. Missing genotypes were imputed using 1000 Genomes Phase 3 haplotypes as a reference24. We combined results across studies using a fixed-effects meta-analysis. Variants with minor allele frequencies <5% or significant evidence for heterogeneity were filtered from the final results. A more detailed description of our experimental methods and analysis technique is available in Methods.
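As an illustration of the fixed-effects combination step, the following Python sketch pools per-study log odds ratios by inverse-variance weighting and reports Cochran's Q and I2 as heterogeneity summaries. It is a minimal sketch, not the pipeline used in this study: the function names, the example inputs and the use of I2 are assumptions made for illustration, and the text does not state which heterogeneity threshold was applied. Only the 5% minor allele frequency cut-off is taken from the text.

```python
import math


def fixed_effects_meta(log_ors, ses):
    """Inverse-variance-weighted fixed-effects pooling of per-study log odds ratios."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, log_ors)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q measures between-study spread around the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "or": math.exp(beta),
        "ci": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "se": se,
        "z": z,
        "p": p,
        "q": q,
        "i2": i2,
    }


def passes_maf_filter(maf, threshold=0.05):
    """Keep variants whose minor allele frequency is at least the threshold."""
    return maf >= threshold


# Hypothetical per-study estimates (not taken from the study data).
result = fixed_effects_meta([math.log(1.8), math.log(1.7)], [0.12, 0.15])
print(result["or"], result["ci"], result["p"], result["i2"])
```

Because each study is weighted by the inverse of its variance, larger studies with smaller standard errors dominate the pooled estimate, which is why a fixed-effects model is usually paired with a heterogeneity filter.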

Replication of prior EWS GWAS

Our analysis provided strong replication of three previously discovered EWS susceptibility loci22 and aided in refining the association signals. We observed rs113663169 as the most significant variant tagging the 1p36.22 locus (OR = 2.05, 95% CI = 1.71–2.45, P-value meta = 4.32×10−15). This variant is in high linkage disequilibrium (LD) with the original reported variant rs9430161 (R2 CEU = 0.97, D′ CEU = 1.00;25 OR = 2.03, 95% CI = 1.70–2.42, P-value assoc = 6.3×10−15)22 and is located upstream of TARDBP, a transcriptional repressor that shares structural similarities with EWSR1 and binds RNA regulatory elements. At the 10q21 locus, we observed rs10822056 with the strongest association (OR = 1.76, 95% CI = 1.54–2.02, P-value meta = 1.92×10−16). This variant is correlated with the reported variant from the original GWAS, rs224278 (R2 CEU = 0.52, D′ CEU = 0.92;25 OR = 1.71, 95% CI = 1.49–1.96, P-value assoc = 6.9×10−15)22, as well as the putatively functional variant, rs79965208 (R2 CEU = 0.24, D′ CEU = 0.57;25 OR = 1.42, 95% CI = 1.24–1.63, P-value assoc = 5.3×10−7)23. Interestingly, as indicated previously23, the conditional analysis at 10q21.3 suggests evidence for a residual independent signal in this region, although larger EWS GWAS are needed to confirm the presence of multiple independent signals. Finally, at 15q15.1 we observed rs2412476, a tagging variant strongly associated with EWS (OR = 1.73, 95% CI = 1.48–2.01, P-value meta = 1.45×10−12). This variant is in moderate LD with rs4924410 from the original GWAS (R2 CEU = 0.18, D′ CEU = 1.00;25 OR = 1.62, 95% CI = 1.41–1.86, P-value assoc = 5.4×10−12)22 and is located near several genes including BMF, BUB1B and PAK6.
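The LD statistics quoted above can look contradictory when D′ is 1.00 but R2 is low, as for the 15q15.1 pair. The following Python sketch uses the standard definitions of D, D′ and r2 to show how this arises when the two variants have different allele frequencies. The function name and the haplotype frequencies in the example are hypothetical and are not derived from the 1000 Genomes CEU data.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (D', r^2) for two biallelic loci.

    p_a  -- frequency of allele A at the first locus
    p_b  -- frequency of allele B at the second locus
    p_ab -- frequency of the haplotype carrying both A and B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d_prime, r2


# Hypothetical case: allele A (10%) occurs only on haplotypes carrying allele B (40%).
d_prime, r2 = ld_stats(p_a=0.10, p_b=0.40, p_ab=0.10)
print(round(d_prime, 2), round(r2, 2))
```

In this toy case one of the four possible haplotypes is absent, so D′ reaches its maximum, yet r2 stays low because the rarer allele explains little of the variation at the more common one.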

Newly identified EWS susceptibility loci

Our analysis identified suggestive evidence for novel genomic associations (P-value meta < 5 × 10−7) in four genomic regions (Table 1, Supplementary Table 1): 6p25.1, 8q24.23, 20p11.22 and 20p11.23. To validate signals from imputed variants in these regions, we performed allele-specific TaqMan PCR for a subset of 335 GWAS samples on the following variants: rs7744366 (6p25.1), rs7832583 (8q24.23), rs12106193 (20p11.22) and rs6106336 (20p11.23). PCR-validated genotypes had 93–99% concordance with imputed genotypes, indicating high accuracy of imputation in these regions (Supplementary Table 2). Additionally, these signals were tested for replication in two independent series of EWS cases and controls: a European set from the Institut Curie containing 480 EWS cases and 576 controls22, and a German set from LMU Munich containing 177 EWS cases and 3502 controls. All combined association P-values (GWAS discovery + independent replication sets) reached genome-wide significance (P-value meta < 5 × 10−8, Supplementary Table 1, Supplementary Figures 2–5) except for the 8q24.23 locus (P-value meta = 1.44 × 10−7). The 6p25.1 and 20p11.22 signals were independently replicated in both the German and European replication sets; however, the 8q24.23 signal was only significant in the European set (P-value assoc = 0.007) and the 20p11.23 signal was only replicated in the German set (P-value assoc = 0.036).
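The concordance check between imputed and PCR-typed genotypes can be sketched as follows. This minimal Python example assumes genotypes are coded as risk-allele counts (0, 1 or 2) with None for failed or missing calls. The coding, the function name and the toy data are assumptions for illustration, since the text does not describe how concordance was computed.

```python
def genotype_concordance(imputed, typed):
    """Fraction of samples whose imputed and PCR-typed genotypes agree.

    Samples with a missing call (None) in either list are skipped.
    """
    if len(imputed) != len(typed):
        raise ValueError("genotype lists must have the same length")
    pairs = [(i, t) for i, t in zip(imputed, typed)
             if i is not None and t is not None]
    if not pairs:
        raise ValueError("no samples with both genotypes available")
    matches = sum(1 for i, t in pairs if i == t)
    return matches / len(pairs)


# Toy data for five samples; the last sample failed PCR genotyping.
imputed_calls = [0, 1, 2, 1, 0]
pcr_calls = [0, 1, 2, 2, None]
print(genotype_concordance(imputed_calls, pcr_calls))
```

Skipping samples with missing calls keeps the denominator restricted to genotype pairs that can actually be compared.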

EWS susceptibility locus at 6p25

We identified a new locus on 6p25.1 tagged by rs7742053 (OR = 1.80, 95% CI = 1.48–2.18, P-value meta = 2.78×10−9) with the A allele being the risk-associated allele (Supplementary Table 3). The marker variant rs7742053 is telomeric to RREB1, SSR1 and CAGE1. Expression quantitative trait locus (eQTL) analysis using rs1286037, a correlated surrogate for rs7742053 (R2 CEU = 0.49, D′ CEU = 1.00)25, identified allele-specific expression differences in RREB1, with the risk A allele of rs7742053 corresponding to increased levels of RREB1 expression (P-value Wald = 0.01, Table 2). RREB1 encodes the RAS responsive element (RRE) binding protein 1, a zinc-finger transcription factor that binds to RRE in gene promoters26. RREB1 is expressed in EWS tumors at higher levels than in other pediatric sarcomas (Supplementary Figure 6), suggesting regulation of RREB1 may be particularly important for EWS. In addition, the 6p25.1 locus shows evidence for an interaction between germline variation and EWSR1-FLI1 fusion proteins. ChIP-seq of acetylated H3K27 (H3K27ac) indicates an area of open chromatin that spans a polymorphic GGAA microsatellite near rs7742053 (Supplementary Figures 7–8, Supplementary Tables 4–5). ChIP-seq analysis of EWSR1-FLI1 in the A673 and TC-71 EWS cell lines confirms EWSR1-FLI1 binding to this GGAA microsatellite at 6p25.1. Further, knockdown of EWSR1-FLI1 in xenografts derived from the A673/TR/shEF1 EWS cell line results in strong downregulation of RREB1 in vivo (Supplementary Figure 9). Several variants correlated with rs7742053 are in contiguity with the GGAA repeat and may be candidate functional variants that disrupt EWSR1-FLI1 binding (Supplementary Table 5). One such variant, rs10541084, a -/GAAG indel, is located at the telomeric end of the nearest GGAA microsatellite, is in LD with rs7742053 (R2 CEU = 0.15, D′ CEU = 0.92)25, and is nominally associated with EWS (OR = 1.20, 95% CI = 1.04–1.37, P-value meta = 0.01).
Interestingly, the rs7742053 risk A allele is correlated with the rs10541084 GAAG allele, which is more common in Europeans, extends the microsatellite GGAA repeat sequence, and could enhance binding of EWSR1-FLI1. This evidence suggests that a mechanism similar to that at the 10q21 locus23 may be acting at the 6p25.1 locus, in which variation of a GGAA repeat affects EWSR1-FLI1 binding, leading to altered expression of RREB1 or an alternative nearby gene. Further functional work at 6p25.1 is required to clarify which variants are functionally responsible for the susceptibility signal.

Table 2 Functional associations for newly identified EWS susceptibility loci

EWS susceptibility locus at 20p11

We identified an association signal spanning chromosome 20p11.22-23. The strongest association signal was on 20p11.22 tagged by rs6047482 (OR = 1.74, 95% CI = 1.49–2.04, P-value meta = 2.55×10−12). The A allele is the risk allele, with a higher frequency observed in 1000 Genomes Europeans than in Africans (Supplementary Table 3). While no statistically significant eQTL was observed between this locus and nearby genes (Table 2), the nearest transcript, NKX2-2, is of high interest; NKX2-2, NK2 homeobox 2, encodes a homeobox domain protein that is a likely nuclear transcription factor, which is overexpressed in the presence of EWSR1-FLI1 fusions in EWS tumors27,28. Our analysis did not detect significant allele-specific expression differences for NKX2-2 in association with rs6047482 (eQTL P-value Wald with rs12106193 = 0.17, R2 CEU and D′ CEU between rs6047482 and rs12106193 = 0.67 and 1.00, respectively)25. We explored eQTLs for other tissue types in GTEx with surrogate SNPs in moderate to high linkage disequilibrium with rs6047482, but found no evidence for an eQTL with NKX2-2 in these tissues, likely due to EWS-specific expression of NKX2-2 (Supplementary Table 6)29. It is plausible that EWSR1-FLI1-induced elevated NKX2-2 expression levels in EWS cells hamper our ability to detect allele-specific expression patterns of NKX2-2 that may be important for EWS transformation in the EWS progenitor cells. Further eQTL analyses in a large set of mesenchymal stem cells, the suspected EWS cell-of-origin, should enable this hypothesis to be tested. As with the 6p25.1 locus, ChIP-seq data show that EWSR1-FLI1 binds to one or more polymorphic GGAA microsatellites proximal to the tagging variants (Supplementary Figure 10), suggesting that variation in this region could exert an effect through NKX2-2 gene regulation in EWS progenitors and in turn through EWSR1-FLI1 binding in EWS cells.
Importantly, the six lead SNPs are on average significantly closer to EWSR1-FLI1 bound elements than would be expected by chance on a chromosome-wide level (P-values Wilcoxon = 0.0025 and 0.0009 in the A673 and TC-71 cell lines, respectively) (Supplementary Figure 8, Supplementary Table 4).

Independent EWS susceptibility signal at 20p11

In the search for additional independent loci at each EWS susceptibility locus (Supplementary Figure 4), we identified a second, independent signal on 20p11.23 tagged by rs6106336 based on a conditional analysis using the discovery marker, rs6047482 (R2 CEU = 0.003, D′ CEU = 0.23; OR = 1.74, 95% CI = 1.43–2.12, P-value meta = 2.33×10−8, P-value conditional = 5.2×10−8, Fig. 2) with the G allele acting as the risk associated allele. A distinct eQTL was observed between a highly correlated surrogate for rs6106336, rs6047241 (R2 CEU = 1.00, D′ CEU = 1.00), and KIZ, kizuna centrosomal protein, (also known as PLK1S1) with the risk G allele associated with increased expression (P-value Wald = 0.01, Table 2). This eQTL at 20p11.23 with KIZ does not appear to be restricted to EWS and was observed in other GTEx tissues (e.g., artery, sun-exposed skin, testis and whole blood; Supplementary Table 6). KIZ localizes to the centrosomes and functions to strengthen and stabilize the pericentriolar region prior to spindle formation30. While limited evidence suggests EWSR1-FLI1 binding in this region, H3K27ac patterns suggest areas of open chromatin that may harbor variants important for regulation of nearby gene products (Supplementary Figure 11).

Fig. 2 Conditional analysis at the 20p11.22-23 region. Overall meta-analysis –log 10 P-values are plotted in gray in the background. In the foreground, meta-analysis –log 10 P-values after conditioning on the top tagging SNP in the region (rs6047482) are plotted in blue. A second independent signal, tagged by rs6106336, remains.

EWS genetic risk score

In light of the observed set of EWS loci, all with high estimated effect sizes, we generated a genetic risk score (GRS) combining risk alleles from the six EWS susceptibility loci to test the ability of an EWS GRS to discriminate between EWS cases and cancer-free adult controls (Supplementary Figure 12). On average, EWS cases carried 1.08 more risk alleles than controls (7.08 average risk alleles in EWS cases, 6.01 average risk alleles in controls; P-value T-test = 2.44 × 10−63). Due to the rarity of EWS and the relatively high frequency of these common susceptibility alleles, absolute risks of EWS associated with these six EWS susceptibility loci are low, suggesting population-based screening using these six variants is unlikely to be effective.
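A risk-allele-count score of this kind can be sketched in a few lines of Python. The example below assumes an unweighted score (the sum of risk alleles across the six loci, ranging from 0 to 12) and compares group means with Welch's unequal-variance t statistic. The text reports average risk-allele counts and a t-test but does not state the weighting or the t-test variant, so both choices, along with the function names and toy genotypes, are assumptions.

```python
import math
from statistics import mean, variance


def risk_score(genotypes):
    """Unweighted GRS: total risk-allele count across loci (each entry 0, 1 or 2)."""
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("each genotype must be a risk-allele count of 0, 1 or 2")
    return sum(genotypes)


def welch_t(a, b):
    """Welch's t statistic for the difference in means between two samples."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Toy genotypes at six loci (hypothetical, not study data).
cases = [[2, 1, 1, 2, 1, 1], [1, 1, 2, 1, 1, 1], [2, 2, 1, 1, 1, 2]]
controls = [[1, 1, 1, 1, 1, 1], [1, 0, 1, 1, 1, 1], [2, 1, 1, 1, 1, 1]]

case_scores = [risk_score(g) for g in cases]
control_scores = [risk_score(g) for g in controls]
print(mean(case_scores) - mean(control_scores), welch_t(case_scores, control_scores))
```

An unweighted count treats every locus as contributing equally; weighting each allele by its log odds ratio is a common alternative when effect sizes differ substantially between loci.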

Genetic architecture of EWS

Our new, expanded GWAS of Ewing sarcoma has identified three new loci and also validated the three previously reported susceptibility regions. In analyses of the new loci, there is evidence of informative eQTLs with nearby biologically plausible candidate genes that are likely target genes for future functional investigations. Additionally, EWSR1-FLI1 ChIP-seq data suggest potential interactions of germline variation at the 6p25.1 and 20p11.22 loci with the EWSR1-FLI1 fusion protein, as recently discovered at the 10q21 locus23. It is remarkable that six independent susceptibility regions with relatively large effect sizes (estimated OR > 1.7) have been discovered in a sample of 733 EWS cases. These results provide a strong contrast to GWAS findings for the vast majority of cancers, which report estimated effect sizes less than 1.2. Interestingly, GWAS in two highly heritable cancers (testicular and thyroid)31,32 have also identified susceptibility alleles with effect sizes in the range of what is observed for Ewing sarcoma. The efficiency of our discovery as well as the higher estimated EWS odds ratios could be related to the lack of tumor heterogeneity in our Ewing sarcoma GWAS, because most EWS cases studied had a pathologically confirmed EWSR1-ETS fusion, a pathognomonic molecular feature of the EWS diagnosis. Furthermore, our results suggest the underlying EWS genetic susceptibility architecture harbors a substantial number of moderate-effect common variants, which is striking because Ewing sarcoma has not been considered to be highly heritable. In conclusion, our study provides support for a strong inherited genetic component to EWS risk and suggests interactions between germline variation and somatically acquired EWSR1-FLI1 translocations are important etiologic contributors to EWS risk.