Study characteristics

We meta-analysed Illumina’s HumanMethylation450-array results from 17 independent cohorts with data on newborn DNA methylation status, and 10 cohorts with data on DNA methylation in older children (age 4 to 18 years), including 4 cohorts with DNA methylation data both at birth and at an older age (Fig. 1). Table 1 summarizes the characteristics of participating cohorts. A summary of methods used by each cohort is provided in Additional file 1: Tables S1 and S2. In our main “no complications” model, we excluded participants exposed to maternal pregnancy complications (maternal diabetes, hypertension or pre-eclampsia) and whose labour was induced or who were delivered by caesarean section. With continuous gestational age in the number of days as the exposure (gestational age range 186–294 days corresponding to 27–42 weeks), we analysed results from 3648 newborns and from 2481 older children. This model was selected as the main model because associations of DNA methylation with gestational age related to pregnancy complications or potentially influenced by obstetric interventions may be less reflective of normal developmental processes than newborns with spontaneous uncomplicated delivery. However, we also analysed a larger dataset of 6885 newborns from 20 independent cohorts, including pregnancies with pregnancy complications and obstetric interventions, referred to as the “all births model” (see below).

Table 1 Characteristics of each cohort included in the association meta-analysis between gestational age (GA) and DNA methylation in newborns and older children Full size table

Associations between gestational age and newborn DNA methylation

We identified 8899 CpGs in cord blood that were associated with gestational age (range 27–42 weeks), at Bonferroni significance, P < 1.06 × 10–7, of which 3343 were novel. These were annotated to 4966 genes. CpGs associated with gestational age had a modest predominance of negative (60%) versus positive (40%) direction of effect, with an overall absolute median difference in mean methylation of 0.36% per gestational week, IQR = [0.26%–0.49%] (Fig. 2a). In general, results were highly homogeneous; evidence of high between-study heterogeneity, using a criterion of I2 > 50%, was seen for only 319 of the 8899 CpGs (Additional file 1: Table S3). Leave one out analyses did not indicate an influential effect on meta-analysis results of any single study. However, we replaced fixed effects values with random effects estimates for those CpGs with between study I2 > 50%, as these are more conservative in the case of heterogeneity.

Fig. 2 A, B Volcano (A) and Manhattan (B) plots for the meta-analysis of gestational age and offspring DNA methylation association at birth, after adjustment for covariates and estimated cell proportions. The effect size represents methylation change per gestational week Full size image

Differentially methylated CpGs spanned all chromosomes (Fig. 2b). The CpG with the lowest P value (P = 2.7 × 10− 129 for cg16103712; Table 2) was annotated to MATN2 on chr 8, and the difference in mean methylation at this CpG was 2.13% lower per additional gestational week (equal to 0.30% per day). The CpG with the largest negative association was cg04347477, annotated to NCOR2 on chr 12 (Table 3), with a lower mean methylation of 2.53% per additional gestational week. B3GALT4 (chr 6) had the largest number of significant CpGs negatively associated with gestational age (21 out of 52 (40%) tested CpGs annotated to B3GALT4). The largest positive association was observed for cg13036381 annotated to LOC401097 (chr 3) (Table 3) with a difference in mean methylation of 1.95% per additional gestational week. DDR1 (chr 6) had the largest number of significant CpGs positively associated with gestational age (26/95 (27%) CpGs). A complete list of associated CpGs is presented in Additional file 1: Table S3 and the CpG variation across cohorts in Additional file 3: Figure S1 (top CpGs).

Table 2 The top 10 Bonferroni-significant CpGs from the meta-analysis on the association between continuous GA and offspring DNA methylation at birth adjusted for estimated cell proportions Full size table

Table 3 The top 10 Bonferroni-significant CpGs ranked by the magnitude of positive and negative effect (5 CpGs each) from the meta-analysis on the association between continuous GA and offspring DNA methylation at birth adjusted for estimated cell proportions Full size table

We performed a sensitivity analysis by excluding cohorts that were included in previous EWAS of gestational age [29, 30] (three cohorts: MoBa1, MoBa2 and ALSPAC) in order to evaluate associations not driven by previous results, and found a high correlation (r = 0.89) of effect estimates (Additional file 3: Figure S2) compared with results from all cohorts included in the no complication model.

Next, we performed a meta-analysis of the larger dataset of 6885 participants from 20 studies without excluding maternal complications and caesarean section delivery or induced delivery. In this “all births model”, 17,095 CpGs located in or near 7931 genes were associated with gestational age after Bonferroni correction (P < 1.06 × 10− 7). Not surprisingly given the higher levels of statistical significance in this much larger data set, we found somewhat more between-study heterogeneity than in the no complications model, but high levels (I2 > 50%) were observed for only 1784 out of these 17,095 CpGs (Additional file 1: Table S4). We also observed a considerable overlap of CpGs between the two models with 93% of the 8899 CpGs in the no complication model also reaching Bonferroni significance in the all birth model and showing the same direction of effect.

CpG localization and regulatory region analyses

The 8899 differentially methylated CpGs in relation to continuous gestational age in the no complications model were enriched for localization to CpG island shores (33% of the 8899 CpGs are in shores, whereas 23% of all CpGs on the 450 K array are in shores, P enrichment = 4.1× 10− 100, Fig. 3), open sea (45% versus 37%, P enrichment = 1.4 × 10− 63), enhancers (37% versus 22%, P enrichment = 1.05 × 10− 236), DNase hypersensitivity sites (18% versus 12%, P enrichment = 1.3× 10− 56) and CpG island shelves (12% versus 10%, P enrichment = 1.2 × 10− 11) (Fig. 3). In contrast, we found relative depletion in CpG islands (10% versus 31%, P enrichment = 2.2 × 10− 308), FANTOM 4 promoters (2.3% versus 6.7%, P enrichment = 6.7 × 10− 79) and promoter-associated regions (11% versus 19%, P enrichment = 2.2 × 10− 104).

Fig. 3 Position enrichment analyses for CpGs. Salmon: all CpGs in the Illumina450k annotation file, green: CpGs significantly associated with GA after Bonferroni correction (P < 1.06 × 10− 7) and blue: three or more adjacent CpGs associated with GA after Bonferroni correction (P < 1.06 × 10− 7). “**” represent significant two-sided doubling mid P value of the hypergeometric test Full size image

Analysis restricted to term-births

To evaluate whether observed DNA methylation differences in relation to continuous gestational age were driven by preterm birth, we repeated the no complication model including only infants born at term (gestational age 37 to 42 weeks). In this analysis, we meta-analysed results from 18 cohorts (one additional cohort with term-birth data only was included; GEN3G) (n = 3593). We identified 5930 sites significantly associated with gestational age at Bonferroni correction (P < 1.06 × 10− 7, median difference in mean methylation per additional gestational week = 0.43%, IQR = [0.32%–0.58%]). The vast majority (5399; 91%) of these differentially methylated CpGs overlapped with those found in the main analyses (no complications model) without exclusion of those born preterm (Fig. 4).

Fig. 4 Overlap between Bonferroni-significant CpG sites from two different analyses after exclusion of maternal and delivery start with induction or caesarean section (“no complication” model). The blue colour represents the continuous gestational age main model, and the green represents the continuous model restricted to term only. Overlap of findings alters the colour Full size image

Selection of CpGs for downstream analyses

Given the large number of significant associations in our main model (8899 CpGs), we focused subsequent analyses on loci including at least three adjacent CpGs that survived Bonferroni correction [43]. There were 1276 differentially methylated CpGs in 325 unique genes that fulfilled this criterion (Additional file 1: Table S5). As in the overall data, we observed a slight predominance of negative (n = 702; 55%) versus positive (n = 574; 45%) directions of effect (Fig. 2a). The lowest P value, P = 1.2 × 10− 93, was observed for cg04276536 (CCDC102A, chromosome 16). As for the full EWAS results, the largest negative and positive association effect sizes were observed for cg04347477 (NCOR2) and cg13036381 (LOC401097), respectively. These 1276 CpGs had the same CpG localization enrichment pattern as the full set of Bonferroni-significant CpGs (n = 8899), except that there was a relative depletion in CpG island shelves (7.6% versus 10% overall, P enrichment = 2.3 × 10− 12) and open sea (32% versus 37%, P enrichment = 2.4 × 10− 12) (Fig. 3).

Differentially methylated region (DMR) analyses

Using two different methods for DMR analysis of gestational age in relation to newborn DNA methylation, we identified 4479 significant (Šidák-corrected P < 0.01) DMRs from the comb-p method and 14,671 significant (FDR P < 0.01) DMRs from DMRcate, respectively, including 2375 DMRs (representing 11,861 CpGs) that were significant based on both approaches (Additional file 1: Table S6). Out of the 8899 Bonferroni significant single CpGs, 2289 CpGs overlapped with CpGs in identified in the combined DMR analyses (11,861 CpGs). Moreover, from loci included by the three or more adjacent CpG selection (n = 1276), 521 CpGs overlapped with those identified in the combined DMR analyses. Of note, out of the 1276 CpGs, 1223 and 1231 CpGs were captured by DMRs identified using the comb-p and DMRcate independent approaches, respectively.

Assessment of CpG methylation in earlier embryonic stages

We examined whether the CpGs detected in cord blood (that originate from embryonic germ layer mesoderm) were differentially methylated in relation to gestational age in other fetal tissues, lung and brain that originate from the two other embryonic germ layers, ectoderm and endoderm, respectively, collected prenatally [47, 48]. To this end, we performed look-up analyses in DNA methylation data for 74 fetal lung samples representing gestational age 59 to 122 days (~ 8 to 17 completed gestational weeks) [47]. Out of the 1276 CpGs, selected based on three or more adjacent CpGs from our no complications model, 1030 CpGs were available in the fetal lung dataset. We observed associations at Bonferroni look-up level correction significance (0.05/1030; P < 4.85 × 10− 5) between DNA methylation levels in fetal lung tissue and gestational age at tissue collection for 151 (15%) CpGs (Additional file 1: Table S7). Of these 151 (58 negatively and 93 positively associated), 78 showed the same direction of association with gestational age in cord blood and fetal lung tissue. The look-up analyses of fetal brain tissue were undertaken in 179 samples representing 23 to 184 days (~ 3 to 26 completed weeks) [48]. Out of the 1276 CpGs, we found significant associations (using Bonferroni correction P < 1.06 × 10− 7 cut-off since only this data was available for analyses; Additional file 1: Table S8) for 268 CpGs (21%) in relation to gestational age at tissue collection. Of these 268 sites, 227 had same direction of effect in the cord blood and fetal brain data. We found enrichment more than expected by chance for our cord blood gestational age associated CpGs (n = 1276) in fetal lung (P = 2.1 × 10− 4) and brain (P = 3.9 × 10− 57) tissue. Thirty CpGs showed significant associations with gestational age in all three tissues (cord blood, fetal lung and fetal brain).

Assessment of CpG methylation in older children

We examined whether the differentially methylated CpGs detected in cord blood samples were associated with gestational age at birth in whole blood from older children. We conducted three separate meta-analyses (no complications model) reflecting different age periods in a total of 2481 children: (i) Early childhood (4–5 years; n = 453 from 4 cohorts); (ii) school age (7–9 years; n = 899 from 5 cohorts) and (iii) adolescence (16–18 years; n = 1129 from 5 cohorts), Additional file 1: Table S1. Of the 1276 three or more adjacent genome-wide significant CpGs from our analyses in cord blood, 1258 CpGs were available for analyses in all older age groups. Out of these CpGs, we observed 40 sites in early childhood, 60 sites in school age, and 60 sites in adolescence to be associated with gestational age at the nominal significance level, P < 0.05 with the same direction of effect (Additional file 1: Table S9). However, no CpG survived Bonferroni look-up level correction (0.05/1258; P < 3.97 × 10− 5). One CpG (cg26385222 annotated to TMEM176B) previously associated with gestational age at birth [27] was nominally significant in all age groups with same direction of effect.

Longitudinal analysis

The results of the longitudinal analyses of blood DNA methylation in the INMA Study (n = 177 with paired samples from birth and 4 years) and the ALSPAC Study (n = 281 with samples collected at birth, 7 and 17 years) are provided in Additional file 1: Table S10. The vast majority of gestational age associated CpGs (n = 1054/1276; 83%) underwent changes in methylation levels with age. Both increasing and decreasing patterns of change during early childhood (4 years) were observed, followed by stabilization during school age (7 years). For example, for cg08943494 in PRR5L on chr 11, an initial level of 61.5% and 51.4% in cord blood DNA methylation in INMA and ALSPAC respectively, decreased by 8.2% per year on average during early childhood in INMA and by 3.3% per year on average up to school age in ALSPAC, but then negligible further changes were seen from 7 to 17 years (Fig. 5A). In contrast, increasing levels were seen for cg18183624 (chr 17; IGF2BP1), from an initial 48.8% and 38.7% in cord blood DNA methylation in INMA and ALSPAC, respectively, with a 5.1% per year on average between birth to 4 years in INMA and 1.9% per year on average between birth to 7 years, but after that no changes from 7 to 17 years. (Fig. 5B).

Fig. 5 Change in DNA methylation during childhood and adolescence for selected CpG sites associated with gestational age. A Decreasing methylation levels from birth to childhood (A.1) and stabilization during adolescence (A.2). B Increasing methylation levels from birth to childhood and stabilization during adolescence. C Stable CpGs that did not change during childhood or adolescence; (1) INMA from birth to early childhood and (2) ALSPAC from birth to adolescence. The figures show representative single CpGs for each category (A–C) Full size image

Of the 1054 CpGs displaying changes in DNA methylation levels with age, there were 589 CpGs where gestational age was associated with changes in DNA methylation levels (i.e. where an interaction between gestational age and age was found) from birth to 4 years (INMA) and 460 CpGs with changes from birth to 7 years (ALSPAC). However, only 30 of the 1054 CpGs changed significantly in DNA methylation between 7 and 17 years (ALSPAC), suggesting that gestational age-related changes in DNA methylation levels had largely stabilized by age 7.

We identified 222 stable CpGs out of 1276 (17%) that did not change appreciably from birth to adolescence. As an example, the stable DNA methylation at cg27058497 (RUNX3, chromosome 1) is shown in Fig. 5C. A much lower proportion of the gestational age associated CpGs were stable from birth to adolescence compared to all CpGs on the array (17% versus 71%, P enrichment = 2.23× 10− 308).

Enrichment for biological processes and pathways

Using the complete list of 8899 CpGs annotated to 4966 genes, these were enriched for 1784 GO terms including regulation of cellular and biological processes, system development, different signaling pathways and organ development (Additional file 1: Table S11). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses revealed 124 significant terms at FDR < 0.05 representing a variety of human diseases, most notably various cancers, viral infections, metabolic processes and immune-related disorders (Additional file 1: Table S12). The 325 genes annotated to the 1276 CpGs, selected by virtue of three or more CpGs being localized to the same gene, were enriched for 198 Gene Ontology (GO) terms very similar to those identified using Bonferroni significant CpGs (Additional file 1: Table S13). When restricting analyses to the 222 longitudinally stable CpGs, corresponding to 139 genes, 13 significant KEGG terms were revealed, primarily representing infection- and immune-related disorders (Additional file 1: Table S14). For 186 genes annotated to the 1054 CpGs changing with postnatal age, only one KEGG terms were identified as statistically significant (P = 1.2 × 10− 3 for the term MAPK signaling pathways; Additional file 1: Table S14).

Correlation of DNA methylation and gene expression

For the 1276 CpGs differentially methylated in relation to gestational age with at least 3 adjacent CpGs, we assessed correlations between DNA methylation and gene expression (cis-eQTMs). From a publicly available dataset of expression and DNA methylation measured in 38 cord blood samples [51,52,53], 1174 out of the 1276 CpGs were located within a 500-kb (+/− 250 kb) window of a transcript cluster. Of these 1174, 246 unique CpGs (367 total CpG-transcript associations) correlated significantly with gene expression (Bonferroni P < 0.05, Additional file 1: Table S15). Forty-six percent of these DNA methylation-expression correlations were negative, with the lowest P = 3.55 × 10− 6 coeff = − 6.03 for cg01332054 and SEMA7A expression and the largest negative effect estimate (− 12.69) for cg26179948 and JAZF1 expression (Additional file 3: Figure S3 A, B). Fifty-four percent were positive, with the lowest P = 1.04 × 10− 5 coeff = 2.88 for cg20139800 and MOG expression and the largest positive effect estimate (19.35) for cg03665259 and CDSN expression (Additional file 3: Figure S3 C, D).