All procedures were approved by the joint Gambian Government/MRC Ethics Committee, and written informed consent was obtained from all participants or their guardians.

Study setting and population

This was an observational prospective cohort study, conducted between July 2009 and July 2011 in 34 villages across the rural West Kiang district of The Gambia, within the catchment area of the MRC International Nutrition Group’s field station at MRC Keneba (http://www.ing.mrc.ac.uk). This study was registered at http://www.clinicaltrials.gov, reference number NCT01811641, as a proof-of-principle observational study.

‘Indicator’ group women: In a parallel study, thirty non-pregnant, non-lactating women from three villages were followed monthly for one full calendar year to assess, by season, the dietary intake (48-h weighed records) of nutrients involved in methyl-donor pathways and its effect on the corresponding plasma biomarker concentrations (for full details see ref. 13).

‘Main’ group mothers: All women of reproductive age (18–45 years) registered in the MRC ING’s Demographic Surveillance System for West Kiang in The Gambia (DSS, http://www.ing.mrc.ac.uk/research_areas/west_kiang_dss.aspx) were invited to participate; 2,040 women consented. Exclusion criteria included confirmed pregnancy at time of recruitment, menopause or likely migration (short or long term) away from West Kiang. Each month all 2,040 women were assessed at the village health post for weight (Tanita DH305 scales; Tanita Corporation, Japan) and height (Leicester stadiometer, Seca 214, UK), and answered a short questionnaire on the date of their last menstrual period. On the first report of a missed menses, a 10 ml fasting venous blood sample was collected for plasma biomarker assessment. Upon reporting of a second consecutive missed period the following month, a urine sample was collected for pregnancy testing (this system was set up to avoid early disclosure of pregnancy, to which some women objected). If the test was negative, the woman continued to be visited monthly and her blood sample from the previous month was discarded. If the test was positive, the woman was invited to the MRC Keneba field station for confirmation and dating of pregnancy by ultrasound examination and a full antenatal check. Women who conceived during the peak of the rainy (July–September 2009) or dry (February–April 2010) season and who had a maternal blood sample collected within the first 16 weeks from conception were then fully enrolled. Multiple pregnancies were excluded. In total, 166 women, recruited across 24 villages, conceived during the a priori selected months and had a blood sample collected during the first 16 weeks of pregnancy from conception. By study design, conceptions were randomly allocated to the different seasons and therefore village was not considered as a covariate in the analyses.

‘Main’ group infants: Between 2 and 8 months (mean±s.d., 3.6±0.9 months) after delivery, infant samples of venous blood (3 ml) and HFs were collected by a trained nurse for DNA extraction. A total of 126 PBL and 87 HF samples were obtained. Fewer HF samples were collected because some mothers objected, as a number of children had too little hair for sampling, which led to insufficient DNA being harvested from HF.

Summary statistics of the study population are shown in Supplementary Table 1.

Maternal blood methyl-donor and co-factor concentrations (biomarkers)

The first day of the last menses is estimated to be 14 days before fertilization. Conception date was thus calculated by adding 14 days to the estimated date of onset of a woman’s last menses, based on the gestational age determined by ultrasound at the time of the first antenatal check.
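The dating rule above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names are hypothetical, and it assumes that the ultrasound gestational age is expressed in days counted from the first day of the last menses (the usual obstetric convention). The peak-season windows are those given in the study population description.

```python
from datetime import date, timedelta

# Assumption: fertilization occurs 14 days after the onset of the last menses.
DAYS_FROM_LMP_TO_CONCEPTION = 14


def conception_date(scan_date, gestational_age_days):
    """Estimate the conception date from an ultrasound dating scan.

    gestational_age_days is assumed to be counted from the first day of
    the last menstrual period (LMP).
    """
    estimated_lmp = scan_date - timedelta(days=gestational_age_days)
    return estimated_lmp + timedelta(days=DAYS_FROM_LMP_TO_CONCEPTION)


def peak_season(conceived_on):
    """Classify a conception date into the peak seasons used for enrolment."""
    if date(2009, 7, 1) <= conceived_on <= date(2009, 9, 30):
        return "rainy"
    if date(2010, 2, 1) <= conceived_on <= date(2010, 4, 30):
        return "dry"
    return None  # conceived outside the a priori selected months
```

For example, a scan on 1 October 2009 showing a gestational age of 70 days gives an estimated last menses of 23 July 2009 and a conception date of 6 August 2009, which falls in the rainy-season window.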

Plasma biomarker measurements included FOL, B2 (by functional test, see below), B6, B12, ACTB12 (holotranscobalamin, the biologically active form of B12), CHOL, BET and MET, as well as HCY, SAM, SAH and DMG. Maternal blood biomarker assessment was carried out using the same methodologies as for the indicator group women, as described previously (ref. 13). Briefly, maternal blood samples (10 ml in EDTA tubes) were collected in the field and transported on ice to the MRC Keneba laboratory for processing and freezing within a maximum of 2 h, to avoid decay of any of the biomarkers (for example, SAM to SAH conversion). Blood samples were spun at 2,750 g for 10 min, and the plasma was removed and frozen at −80 °C immediately. A sample of red blood cells was removed from the plasma-depleted blood fraction, washed and stored at −80 °C. All plasma biomarkers except B2 were assessed at the Department of Paediatrics, University of British Columbia, Canada. SAM, SAH, CHOL, BET, DMG, HCY, MET, CYS and B6 were analyzed by liquid chromatography–tandem mass spectrometry. B12, ACTB12 and FOL were analyzed by a microparticle enzyme intrinsic factor assay and by ion capture assay respectively, on an AxSYM analyzer (Abbott Laboratories, Chicago, IL, USA). B2 status was determined in red blood cell lysate at MRC Human Nutrition Research (HNR), Cambridge, UK, using the erythrocyte glutathione reductase activation coefficient (EGRAC) assay, performed on a microplate. Higher EGRAC values denote B2 deficiency.

DNA methylation

Infant PBL DNA was extracted from venous blood using a standard salting-out method (ref. 18), and extracted DNA was cleaned using the Chelex-100 (BIO-RAD) protocol. Infant DNA from HFs was extracted by phenol chloroform extraction and ethanol precipitation, as previously described (ref. 7). DNA methylation analysis was carried out at the Baylor College of Medicine, USDA/ARS Children's Nutrition Research Center, Houston, Texas, USA. Four previously described MEs (ref. 7), namely BOLA3, LOC654433, EXD3 and ZFYVE28, were assessed in the current study. (An additional locus, SLITRK1, was eliminated based on a strict cutoff (R²>0.50) in the inter-tissue correlation comparison of DNA methylation in the expanded set of Vietnamese adults.) In addition, three newly identified MEs (RBM46, PARD6G and ZNF678) were investigated. These new MEs were determined as previously described, employing a custom methylation-specific amplification microarray (ref. 14) combined with a multiple-tissue screening procedure with validation by bisulfite pyrosequencing (ref. 3). CpG site-specific methylation in the current infant DNA samples was measured by quantitative bisulfite pyrosequencing (Pyro Gold reagents and a PSQ™ HS 96 pyrosequencer, both from Biotage), as described elsewhere (ref. 2). Briefly, 0.5–2 μg of genomic DNA was bisulfite treated, followed by locus-specific PCR amplification and pyrosequencing to measure methylation at 4 to 12 CpG sites per candidate locus (Supplementary Table 3). Each pyrosequencing assay covered 50–70 bp of DNA and was initially validated by analyzing 0, 25, 50, 75 and 100% methylated human genomic DNA standards (ref. 19) (Supplementary Fig. 4).

The MZ twin peripheral blood DNA samples were drawn from the Northern California Twin Registry (ref. 15); polymorphisms were genotyped to determine zygosity. Post-mortem liver, kidney and brain tissues from Vietnamese motor vehicle accident victims were obtained from a human tissue bank (ILSbio, LLC, Chestertown, MD, USA) (ref. 7).

Statistical analyses

We set out to test: (i) whether methylation of the seven MEs varied according to the season of conception; (ii) whether plasma biomarkers associated with one-carbon metabolism were predictive of ME methylation; and (iii) whether changes in the availability of methyl donors (as reflected by plasma biomarkers) explain the seasonal difference in ME methylation. The analysis followed seven steps:

1 Back extrapolation of biomarker concentrations: Blood samples for biomarker measurements were collected within 0–16 weeks (mean 8.6±4.0 weeks) post conception. To estimate biomarker concentrations at the time of conception we back-extrapolated along a trajectory parallel to the seasonal patterns (fitted by Fourier regression; ref. 12) derived from a separate group of non-pregnant women recruited specifically for this purpose (the indicator group, described in detail in ref. 13). To account for pregnancy-mediated changes in biomarkers, the values were further adjusted for the gestational age of the infant at the time of measurement. This was achieved by regressing the biomarker on the first three orthogonal polynomials of gestational age and subtracting the predicted value so obtained from each woman’s seasonally adjusted value. Comparisons of seasonal patterns in the plasma biomarker concentrations between the indicator and the main groups are shown in Fig. 1b and Supplementary Fig. 1.

2 Confirmation of the seasonal variation in biomarkers using analysis of variance: These data are shown in Supplementary Table 2.

3 Testing the validity of a single methylation score averaged over all six MEs (PARD6G excluded): This analysis was carried out using PBL DNA methylation data only, given the smaller HF data set. A simple score for the methylation percentage at each ME was derived for each infant by taking the logit of the mean methylation percentage over all CpG sites within the ME. We then fitted these scores simultaneously to each biomarker and covariate, one at a time, using seemingly unrelated regression (SUR; ref. 20). In each case we compared two models: one in which a different coefficient was fitted for each ME, and the other in which all the coefficients were constrained to be the same. In every case, except for offspring sex (SUR P=0.004), the unconstrained model fitted no better than the constrained one (that is, the biomarkers and other covariates did not have a differential effect on different MEs). Supplementary Table 4 shows the difference in individual ME methylation in males and females, with generally lower methylation in males. These results justified the use of an overall methylation score for each infant, based on the sex-adjusted individual ME methylation scores, for subsequent analysis. The overall score was calculated by standardising each ME score (subtracting the sex-specific mean and dividing by the sex-specific standard deviation) and taking the mean over all six MEs.

4 Regression of MEs on maternal biomarkers and covariates: We used simple linear least squares regression to fit the methylation score to each biomarker, season and other covariates in turn. We also fitted two derived variables predicted a priori to be important in the regulation of methylation: SAM:SAH and BET:DMG ratios. These data are presented in Table 1. Two measures of effect size are presented to facilitate different interpretations: (i) the standardised β-coefficient, which gives the change in mean methylation score for each 1 s.d. change in the predictor and thus facilitates comparison of effect size between predictors; and (ii) the odds ratio, which was derived from the SUR analysis and gives the factor by which the odds of methylation, that is, percent methylation/(100−percent methylation), is expected to change for each unit change in the predictor. Significant maternal predictors of infant DNA methylation in PBL showed a consistent dose-responsiveness by maternal biomarker quartiles (Supplementary Fig. 3).

5 Exploring interactions between biomarkers: Simple models including the main effects for two variables and their interaction term were fitted (using PBL DNA methylation data). The motivation was to capture some of the complexities of one-carbon metabolism, for instance possible switching of the source of methyl groups between the betaine and the FOL cycle. Since there is a prohibitively large number of biomarker pairs, we examined only a limited number of interactions, selected based on a priori knowledge of their relationships in the metabolic pathways: B2*FOL; B2*B12; B12*FOL; B2*HCY; B12*HCY; and FOL*HCY. Least angle regression, LASSO or other methods might have allowed us to select the best predictors among many correlated main effects and interactions. However, applying the LASSO to these data added little to the reported analysis (data not shown).

6 Estimation of the total association between biomarkers and methylation: Since no interactions between biomarkers, or between biomarkers and sex of the infant, were found to be significant, we calculated R² for the multiple regression of the methylation score on all the main biomarker terms. We then ran the same model on 1,000 bootstrap samples and used the 2.5th and 97.5th centiles of the R² estimates to derive the 95% confidence interval for R². Models with terms for season, sex of infant and age of mother yielded very similar partial R² values for the biomarkers, so these terms were not included in the bootstrap analyses.

7 Testing for seasonal differences in biomarkers: To examine whether seasonal differences in methylation might be owing to seasonal differences in biomarkers, we fitted a model including season and all biomarker main effects.
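The back extrapolation in step 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the function names are hypothetical, two harmonics with a 365.25-day period are used for the Fourier regression, the parallel shift is applied to log concentrations, and the gestational-age adjustment subtracts the full prediction (intercept included) of a cubic polynomial. A raw cubic fit gives the same fitted values as a regression on an intercept plus the first three orthogonal polynomials, because both span the same column space.

```python
import numpy as np

PERIOD_DAYS = 365.25  # assumed length of one seasonal cycle


def fourier_design(day_of_year, n_harmonics=2):
    """Design matrix: intercept plus a sine/cosine pair per harmonic."""
    days = np.atleast_1d(np.asarray(day_of_year, dtype=float))
    t = 2.0 * np.pi * days / PERIOD_DAYS
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.extend([np.sin(k * t), np.cos(k * t)])
    return np.column_stack(cols)


def fit_seasonal_curve(day_of_year, log_conc, n_harmonics=2):
    """Least-squares Fourier fit to the indicator-group measurements."""
    X = fourier_design(day_of_year, n_harmonics)
    y = np.asarray(log_conc, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def back_extrapolate(log_measured, day_measured, day_conception, coef):
    """Move each value parallel to the fitted curve, from sampling day to conception day."""
    n_harmonics = (len(coef) - 1) // 2
    at_conception = fourier_design(day_conception, n_harmonics) @ coef
    at_measurement = fourier_design(day_measured, n_harmonics) @ coef
    return np.asarray(log_measured, dtype=float) + at_conception - at_measurement


def adjust_for_gestational_age(values, gest_age_weeks, degree=3):
    """Subtract the polynomial prediction from gestational age (returns residuals)."""
    values = np.asarray(values, dtype=float)
    ga = np.asarray(gest_age_weeks, dtype=float)
    poly_coefs = np.polyfit(ga, values, degree)
    return values - np.polyval(poly_coefs, ga)
```

The parallel shift keeps each woman's offset from the seasonal curve constant: a woman measured 0.5 log units above the curve on the sampling day is placed 0.5 log units above the curve on her conception day.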

All biomarkers were analyzed on the logarithmic scale. The main analysis was performed using Stata v12 (StataCorp, College Station, TX, USA), and the sklearn package in Python was used to implement the LASSO.
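As an illustration of steps 3 and 6, the overall methylation score and the percentile bootstrap interval for R² can be sketched in Python. The main analysis was run in Stata, so this is a re-expression under stated assumptions rather than the original code: the function names and the data layout (one row per infant, one column per ME holding the mean methylation percentage over its CpG sites) are hypothetical, and the predictor matrix is assumed to contain log-transformed biomarkers.

```python
import numpy as np
import pandas as pd


def overall_methylation_score(me_pct, sex):
    """Sex-standardised mean of logit ME methylation.

    me_pct: DataFrame (infants x MEs) of mean methylation percentages.
    sex:    Series aligned with me_pct, giving each infant's sex.
    """
    logit = np.log(me_pct / (100.0 - me_pct))
    # Standardise each ME within sex: subtract the mean, divide by the s.d.
    z = logit.groupby(sex).transform(lambda col: (col - col.mean()) / col.std(ddof=1))
    return z.mean(axis=1)


def r_squared(X, y):
    """R-squared of an ordinary least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def bootstrap_r2_ci(X, y, n_boot=1000, seed=0):
    """2.5th and 97.5th centiles of R-squared over bootstrap resamples of infants."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        estimates[b] = r_squared(X[idx], y[idx])
    lower, upper = np.percentile(estimates, [2.5, 97.5])
    return lower, upper
```

Here `X` would be a NumPy array of log biomarker concentrations and `y` the overall score as a NumPy array (for example `score.to_numpy()`). Resampling whole rows keeps each infant's biomarkers and score together.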