Identification of prognostic biomarkers

To profile genomic alterations of neuroblastoma, we genotyped 641 neuroblastoma tumor DNA samples, using Illumina HumanHap550 or 610 SNP arrays including 442 samples with matched blood DNA samples (Supplementary Data 1). Tumor-based copy number segments were first calculated by ASCAT and OncoSNP, and then curated manually. Tumor samples were excluded if the copy number results were suggestive of noise or if no amplification/deletion events were detected. Following thorough quality control, 437 tumor samples were kept for analysis, including 300 with matched blood DNA genotyping data. A total of 154 neuroblastoma tumors were identified with MNA, 245 with 11q deletion, and 167 with 1p deletion, respectively. The inverse correlation between MNA and 11q deletion events was also confirmed (Pearson correlation coefficient = −0.83), only 27 neuroblastoma tumors were identified with both MNA and 11q deletion. Besides the commonly observed MNA, 11q deletion and 1q deletion variations in neuroblastoma (Supplementary Fig. 1), we also identified a number of genomic regions targeted by low frequency somatic aberrations (Supplementary Figs. 2–7 and Supplementary Tables 1 and 2). In this regard, focal amplifications of ALK, CCND1, LIN28B, MDM2, and 19q13.42 observed in our study have been previously implicated in individual neuroblastoma studies (Supplementary Figs. 2 and 4)8,9,10,11,12. However, to our knowledge, recurrent focal amplifications of MYC, ZFHX3, KRAS, RRAS2, and CYTH1 have not been reported in neuroblastoma primary tumors before (Supplementary Note 1 and Supplementary Fig. 3). Interestingly, we observed a number of cases (56/628) with low-level amplification of CCND1 (8.9%). In most of those cases, the amplification of CCND1 co-occurred with 11q deletion (53/56, 94.6%), suggesting these two events are highly correlated (Supplementary Note 1).

Discovery stage

We subsequently identified 113 11q-deletion cases of European-American ancestry with available blood DNA genotyping data, and we performed GWAS in this subset together with 5109 ancestry-matched controls (Supplementary Fig. 8). The genomic inflation factor was 1.0 (Supplementary Fig. 9). Three SNPs within MMP20 at 11q22.2 (rs10895322, P = 2.62 × 10−9, OR = 2.858; rs3781788, P = 2.46 × 10−8, OR = 2.505; rs2280211, P = 3.11 × 10−9, OR = 2.604, logistic regression test) were found to surpass the conventional genome-wide significance threshold (P = 5 × 10−8, logistic regression test, Table 1 and Supplementary Fig. 10). All three SNPs map to the intronic regions of MMP20 and showed a high degree of linkage disequilibrium (Supplementary Fig. 11). Considering that three case subgroups were investigated in this study, we used an adjusted threshold by dividing the conventional threshold by three (P = 1.7 × 10−8, logistic regression test) with the top SNP rs10895322 remaining significant. The reported SNPs at previously identified neuroblastoma susceptibility loci, including 2q35 (BARD1), 6p22.3 (CASC15), and 11p15.4 (LMO1) were also nominally associated with 11q deletion (Supplementary Table 3). As the 11q22.2 region harboring MMP20 is commonly deleted in 11q-deletion neuroblastomas, we investigated 11q-deletion cases that are heterozygous (G/A) for rs10895322 and found that the risk allele (G) is preferentially retained in tumors (P = 2.09 × 10−3, binomial test). This result is consistent with the observation of preferential allelic imbalance in other cancers (Supplementary Fig. 12)13, 14. We did not observe evidence for epistasis between the previously reported loci and the newly discovered MMP20 locus (Supplementary Tables 4 and 5), suggesting that these susceptibility loci contributed to disease risk, independently.

Table 1 Association results for the top three genotyped SNPs at 11q22.2 Full size table

We also performed SNP association analysis in cases with MNA and 1p-deletion, respectively. The previously reported loci on 2q35 (BARD1) and 6p22.3 (CASC15)2, 3, reached genome-wide significance (Supplementary Fig. 3 and Supplementary Table 3) in the GWAS of 260 MNA cases (Methods) and 5109 controls and the 6p22.3 locus also showed a P value near genome-wide significance in the GWAS of 69 1p-deletion cases and 5109 controls. However, no significant associations at 11q22.2 was detected in either MNA or 1p-deletion neuroblastoma (Supplementary Table 3). When GWAS was performed in the 113 11q-deletion cases and 282 controls (78 undeleted 11q and 204 MNA neuroblastomas, Methods) of European-American ancestry, the 11q22.2 locus was still nominally significant (rs10895322, P = 5.50 × 10−5, OR = 2.811, logistic regression test, Supplementary Table 6), indicating a unique and independent role of the 11q22.2 locus in the 11q deletion cases. We subsequently applied the subset-based approach ASSET to investigate the impact of the 11q22.2 (MMP20), 2q35 (BARD1), 6p22.3 (CASC15), and 11p15.4 (LMO1) loci on the subtypes. ASSET is designed for a single case–control study in which cases are treated as distinct disease subtypes15. This permits both case–control and case–case comparisons (among subsets of disease subtypes) for the detection of the strongest association signals. Here, 11q deletion and MNA subtypes were included for the ASSET analysis, as they are negatively correlated and represent two distinct subtypes of neuroblastoma. However, 1p deletion was excluded as an independent subtype, since 1p deletion often co-occurs with 11q deletion or MNA. The ASSET results confirm the association between 11q22.2 and 11q-deletion subtype (Table 2).

Table 2 Results from subtype analysis of neuroblastoma risk loci Full size table

Replication stage

To replicate our findings, we genotyped the tumor and blood DNA samples of additional 192 neuroblastoma cases using OMNI-Express SNP array. Copy number analysis of the tumor samples identified 80 of them with 11q deletion, Among the 80 11q-deleltion cases, we identified 44 cases of European-American ancestry. The 44 cases and 1902 ancestry-matched controls were included for GWAS replication (Supplementary Fig. 8). All of the three genome-wide significant SNPs (rs10895322, P = 2.62 × 10−9, OR = 2.858; rs3781788, P = 2.46 × 10−8, OR = 2.505; rs2280211, P = 3.11 × 10−9, OR = 2.604, logistic regression test) at 11q22.2 in the discovery cohort showed the same direction of association in the replication set, and two of them had a P value <0.05 (Table 1). The association was further strengthened by a meta-analysis combing the discovery and replication studies (Table 1).

Imputation analysis

To evaluate additional variants not assayed directly on the genotyping arrays, we imputed unobserved genotypes at 11q22.2 using 1000 Genomes Project data as the reference. Imputation identified 13 additional SNPs that were significantly associated with neuroblastoma (Table 3). The top SNPs, both genotyped and imputed were all located in a strong linkage disequilibrium region (r 2 > 0.8) within MMP20 (Fig. 1 and Supplementary Fig. 11).

Table 3 Imputed SNPs surpassing genome-wide significance within and near MMP20 Full size table

Fig. 1 Regional association plot including both genotyped and imputed SNPs at 11q22.2. a Plotted are the regional association results from the meta-analysis of discovery and replication cohorts (−log10 transformed P values) and the recombination rate. SNPs are colored to reflect pairwise LD (r 2) with the most significantly associated SNP. The most significant SNP is shown in purple. The tracks of b CHROMHMM (Celline: HepG2, HSMM, K562, GM12878), c wgEncodeRegTfbsClusteredV3 and tfbsConsSites annotation are plotted on the bottom Full size image

Exome sequencing analysis

To investigate the possibility of association being a consequence of a rare variant, we analyzed exome sequencing data from 229 neuroblastoma samples and whole-genome sequencing data from 143 neuroblastoma samples from the TARGET database (https://ocg.cancer.gov/programs/target/). We examined for the presence of low frequency variants (<0.01 in 1000 Genomes Database) detected in the MMP20 coding region. Only one pathogenic variant was detected, which is associated with amelogenesis imperfecta. In addition, the frequency of rare variants presents in neurobalstoma cases is not significantly different in the 1000 Genome Database, indicating that the association we report here is unlikely a consequence of a rare variant (Supplementary Data 2).

eQTL analysis

To gain further insight into the biological mechanisms of the association at 11q22.2, we performed an expression quantitative trait loci (eQTL) analysis using gene expression data from 34 11q-deletion neuroblastoma cases. eQTL analysis showed that rs10895322 is significantly associated with MMP20 expression in samples with 11q deletion (P = 7.7 × 10−5, Student’s t-test, Fig. 2). We did not observe association between rs10895322 genotypes and the expression level of other genes near MMP20 in the 11q-deletion samples. In addition, no significant association was detected between rs10895322 and MMP20 expression levels in undeleted 11q neuroblastoma samples (P = 0.94, Student’s t-test, Fig. 2).