Ethics statement

The study was conducted according to the guidelines laid down in the Declaration of Helsinki, and all procedures involving human subjects were approved by the ethical committee of the University Hospital in Heraklion, Crete, Greece. Written informed consent was obtained from all women participating in the study.

The mother-child cohort in Crete, Rhea study

The Rhea project is a mother–child study, prospectively examining a population-based cohort of pregnant women and their children in the prefecture of Heraklion, Crete, Greece [18]. Female residents (native Greeks and immigrants) who became pregnant during the 12-month period starting in February 2007 were contacted at the four maternity clinics (two public and two private) in Heraklion, and asked to participate in the study. Study enrolment and urine collection took place at the end of the first trimester, at the time of the first major ultrasound examination (mean ± SD 11.96 ± 1.49 weeks). Questionnaires on health behaviours, pregnancy history, lifestyle characteristics, and dietary habits during pregnancy were administered by trained interviewers at enrolment, during the third trimester, and at delivery.

During this study period, 1,317 women were followed up until delivery. Women with incomplete diagnostic information, multiple pregnancies, diagnosed pre-eclampsia (a condition associated with PB), or spontaneous or induced abortion, and women who gave birth to stillborn infants, were not included in the study [18]. Our metabolomics study was designed as a case–control study nested within the Rhea cohort. Mothers who gave birth preterm and for whom early pregnancy urine samples were available were matched with controls (in a ratio of approximately 1:3) based on age (±2 years), country of origin and parity (n = 464). Proton nuclear magnetic resonance (1H-NMR) spectra were acquired from these urine specimens; 26 spectra were excluded (because of high dilution or high excretion of drug metabolites), leaving 438 spectra available for modelling the metabolite profile with respect to birth outcome.

Definition of the outcomes

PB, the primary outcome of interest, is defined as delivery at less than 37 weeks of gestation [24]. Gestational age was estimated as the period between the most recent menstruation and delivery. When this estimate differed from the ultrasound-based estimate by 7 days or more, the gestational age was corrected using its relationship to the crown–rump length [18]. PBs were classified as spontaneous deliveries (SPB; n = 88) when the birth was vaginal or when the labour was not documented as having been induced. Any PBs requiring an induction of labour, a pre-labour caesarean, or both, were defined as medically induced deliveries (IPB; n = 26) [25]. In addition, neonates were classified as FGR if their birth weight fell below the 10th percentile of their predicted birth-weight distribution, adjusted for genetic growth potential. This customised estimation of growth impairment allows better detection of neonates who fail to reach their genetic or constitutional growth potential because of maternal, fetal, placental or external factors, and excludes constitutionally small babies [26].
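The classification rules above can be sketched as follows. This is a minimal illustration rather than the study's actual code: the function names and arguments are hypothetical, the ultrasound-based estimate is taken as already derived from the crown–rump length, and any preterm birth without documented induction or pre-labour caesarean is treated as spontaneous.

```python
PRETERM_THRESHOLD_WEEKS = 37.0   # PB: delivery before 37 completed weeks
DATING_TOLERANCE_DAYS = 7        # disagreement that triggers the correction


def gestational_age_days(lmp_based_days, ultrasound_based_days):
    """Return the gestational age in days.

    Assumption: the ultrasound-based value is supplied by the caller
    (derived elsewhere from the crown-rump length) and replaces the
    menstruation-based value when the two differ by 7 days or more.
    """
    if abs(lmp_based_days - ultrasound_based_days) >= DATING_TOLERANCE_DAYS:
        return ultrasound_based_days
    return lmp_based_days


def classify_birth(gestational_days, induced_labour, prelabour_caesarean):
    """Classify a delivery as 'term', 'SPB' or 'IPB'."""
    if gestational_days / 7.0 >= PRETERM_THRESHOLD_WEEKS:
        return "term"
    if induced_labour or prelabour_caesarean:
        return "IPB"   # medically induced preterm birth
    return "SPB"       # spontaneous preterm birth
```

For example, `classify_birth(gestational_age_days(250, 252), False, False)` keeps the menstruation-based estimate of 250 days (35.7 weeks) and returns `"SPB"`.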

A multivariable fractional polynomial linear regression model was used to predict birth weight, allowing polynomial terms for continuous variables in the linear regression models. The final model included as covariates the gestational age, infant gender, maternal and paternal height, pre-pregnancy maternal weight, and the interaction of gestational age with maternal weight. Gestational age and type of PB were known for 438 women, whereas FGR data were available for only 401 women because a number of values necessary to define the outcome were missing.

Metabolic syndrome variables

Data on plasma triglycerides, total cholesterol, high density lipoprotein cholesterol (HDL-C) and low density lipoprotein cholesterol (LDL-C) were available for 227 fasting pregnant women at the first prenatal visit [18]. Insulin concentrations were measured for 369 women, and diastolic and systolic blood pressures (BPs) were available for 338 participants. The body mass index (BMI), calculated from reported weight before pregnancy and height measured at the first prenatal visit, was used to classify women as underweight (BMI <18.5 kg/m²), normal weight (BMI >18.5 to <25 kg/m²), overweight (BMI 25 to 30 kg/m²) or obese (BMI >30 kg/m²), according to the standard international classification.
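The BMI classification can be illustrated with a short sketch. The function names are hypothetical, and the handling of values falling exactly on a cut-off (18.5, 25 or 30 kg/m²) is an assumption: here each boundary value is assigned to the higher category, which the text above does not specify.

```python
def body_mass_index(weight_kg, height_m):
    """BMI in kg/m^2 from pre-pregnancy weight and measured height."""
    return weight_kg / height_m ** 2


def bmi_category(bmi):
    """Map a BMI value to one of the four classes used in the text.

    Assumption: boundary values go to the higher class
    (18.5 -> normal weight, 25 -> overweight, 30 -> obese).
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal weight"
    if bmi < 30.0:
        return "overweight"
    return "obese"
```

A woman reporting 60 kg before pregnancy and measuring 1.65 m has a BMI of about 22.0 kg/m² and is classified as normal weight.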

1H NMR spectroscopic analysis of urine

Sample handling and preparation

Urine samples were stored at −80 °C until analysis. An aliquot of 400 μL of urine was added to 200 μL phosphate buffer solution (0.2 M Na2HPO4/NaH2PO4, pH 7.4) to minimise variations in chemical shift values in the acquired 1H NMR spectra due to minor pH differences. This buffer contained 1 mM sodium 3-trimethylsilyl-(2H4)-1-propionate (TSP) in 20% D2O and 3 mM of the bacteriostatic agent sodium azide (NaN3). TSP served as the chemical shift reference (δ = 0.00) and D2O provided a field-frequency lock. The buffered urine sample was then centrifuged at 16,000 × g for 5 minutes to remove any debris, and 550 μL of the resulting supernatant was pipetted into standard 5 mm NMR tubes [27].

1H NMR experiments and data processing

1H NMR spectra of the urine samples were acquired on a Bruker Avance 600 spectrometer (Bruker Biospin, Rheinstetten, Germany) operating at 600.13 MHz, using a standard one-dimensional pulse sequence (recycle delay-90°-t1-90°-tm-90°-acquisition; XWIN-NMR 3.5) with water pre-saturation applied during both the recycle delay (2 seconds) and the mixing time (tm, 100 milliseconds). The 90° pulse length was adjusted to approximately 10 μs and t1 was set to 3 μs. For each sample, 128 free induction decays (FIDs) were collected into 32 K data points using a spectral width of 12,000 Hz. The FIDs were multiplied by an exponential weighting function corresponding to a line broadening of 0.3 Hz prior to Fourier transformation [27].
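The exponential weighting step can be sketched as below. This is an illustration under stated assumptions, not the spectrometer software's implementation: it assumes the common form w(t) = exp(−π · LB · t) and a dwell time of 1/SW between successive complex points, neither of which is spelled out in the text. The function names are hypothetical.

```python
import math


def line_broadening_weights(n_points, spectral_width_hz, lb_hz):
    """Exponential weights w_k = exp(-pi * LB * t_k) with t_k = k / SW.

    Assumption: points are complex and sampled every 1/SW seconds.
    """
    dwell = 1.0 / spectral_width_hz
    return [math.exp(-math.pi * lb_hz * k * dwell) for k in range(n_points)]


def apodize(fid, spectral_width_hz, lb_hz):
    """Multiply each FID point (real or complex) by its weight."""
    weights = line_broadening_weights(len(fid), spectral_width_hz, lb_hz)
    return [point * w for point, w in zip(fid, weights)]


# Parameters quoted in the text: SW = 12,000 Hz, LB = 0.3 Hz.
weights = line_broadening_weights(16, 12000.0, 0.3)
```

The first weight is exactly 1 and later points are progressively attenuated, which suppresses noise in the tail of the FID at the cost of slightly broader lines after Fourier transformation.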

All NMR spectra (spectral region δ 10 to 0.5) were imported into MATLAB 7.3.1 (MathWorks), and were referenced and corrected for phase and baseline distortion using an in-house script (developed by Drs Rachel Cavill, Hector Keun and Tim Ebbels, Imperial College, London, UK). The spectral region δ 4.0 to 5.4, containing residual water and urea resonances, was removed prior to median fold change normalisation [28]. Integrals of well-resolved peaks were calculated. Certain metabolites (specifically creatine, creatinine, tyrosine, dimethylamine (DMA) and 1-methylnicotinamide) were quantified using the Profiler and Library Manager modules in Chenomx NMRSuite 5.11 (Chenomx Inc, Edmonton, Canada), because overlapping signals were present in the integration window or because the signal-to-noise ratio was low. The advantage of using Chenomx for these metabolites is that it accounts for quantification error by fitting experimental spectra of pure compounds to all the resonant peaks for the metabolite [29]. The statistical analysis presented later was applied to the peak integrals for all metabolites except those listed above, for which Chenomx values were used.
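Median fold change normalisation can be sketched as follows. This is a simplified illustration and not the in-house MATLAB script: it assumes the reference is the variable-wise median spectrum across all samples, and that each spectrum is divided by the median of its fold changes relative to that reference. The function name is hypothetical.

```python
from statistics import median


def median_fold_change_normalise(spectra):
    """Normalise each spectrum by its median fold change to a reference.

    spectra: list of equal-length lists of intensities (one per sample).
    Assumption: the reference is the median intensity of each variable
    across samples; variables with a non-positive reference are skipped
    when computing fold changes.
    """
    n_vars = len(spectra[0])
    reference = [median(s[j] for s in spectra) for j in range(n_vars)]
    normalised = []
    for spectrum in spectra:
        folds = [spectrum[j] / reference[j]
                 for j in range(n_vars) if reference[j] > 0]
        factor = median(folds)   # estimated dilution factor of this sample
        normalised.append([value / factor for value in spectrum])
    return normalised


# Three samples that differ only by dilution collapse onto one profile.
samples = [[1.0, 2.0, 3.0, 4.0],
           [2.0, 4.0, 6.0, 8.0],
           [4.0, 8.0, 12.0, 16.0]]
result = median_fold_change_normalise(samples)
```

Because the scaling factor is a median over many variables, a few strongly altered metabolites have little influence on it, which is the usual motivation for preferring this approach to total-area normalisation in urine.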

1H NMR spectroscopic signals were assigned to metabolites after reference to the literature [30, 31] or online databases (HMDB) [32], and/or confirmation by 2D NMR experiments on a selected sample, including homonuclear 1H-1H correlation spectroscopy and 1H-1H total correlation spectroscopy.

Statistical analysis

All statistical analyses were performed using R project software [33]. Continuously distributed variables were displayed as median with interquartile range and were tested using the Mann–Whitney non-parametric test. Categorical variables were tested using the χ2 test. The threshold for statistical significance was set at P < 0.05, and all tests were two-sided.

Statistical analyses were conducted on 34 metabolites to assess their variation in relation to birth outcomes (PB, IPB, SPB and FGR) and to maternal parameters (biochemical measures and dietary intake). A five-step analysis was conducted to select metabolites that were significantly associated with birth outcomes and with metabolic syndrome. To identify metabolites associated with birth outcomes, a non-parametric test (Mann–Whitney U-test) was used because of the non-normal distribution of the metabolite relative concentrations. The effect of multiple testing was considered by calculating the false discovery rate (FDR; that is, the expected proportion of the tests misclassified as significant for any given P value cut-off) [34]. To test for a dose–response association between metabolite levels and birth outcomes, a χ2 test for trend in proportions was used to assess the frequency distribution of women with each pregnancy outcome across quartiles of the metabolites [35]. For the metabolites identified as ‘of interest’ by the above analyses, the association with birth outcomes was tested after adjusting for confounding factors using multivariate logistic regression models. Interquartile range odds ratios (IORs) with 95% confidence intervals (CIs) were calculated for PB, IPB, SPB and FGR, using the interquartile range for standardisation. We used the change between the outer quartiles as the unit of comparison because metabolite integrals (the predictors) are not always normally distributed. Because the OR is scaled to the difference between the 0.25 and 0.75 quantiles, it is referred to as the interquartile range OR or half-sample OR. Potential confounders with an established or potential association with PB or FGR were included in the logistic regression models.
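Two of the steps above lend themselves to a short sketch. The text does not name the FDR procedure used, so the Benjamini–Hochberg step-up adjustment is assumed here purely for illustration. The IOR is computed as exp(β × IQR), where β is a hypothetical log-odds coefficient per unit of metabolite taken from an already fitted logistic model; the quantile rule (linear interpolation, as in R's default) is likewise an assumption.

```python
import math


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    data = sorted(values)
    h = (len(data) - 1) * q
    lo = int(math.floor(h))
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (h - lo) * (data[hi] - data[lo])


def interquartile_odds_ratio(beta, metabolite_values):
    """Odds ratio for a change from the 0.25 to the 0.75 quantile."""
    iqr = quantile(metabolite_values, 0.75) - quantile(metabolite_values, 0.25)
    return math.exp(beta * iqr)
```

Scaling the coefficient by the interquartile range expresses the effect as the odds for a woman at the third quartile of a metabolite relative to a woman at the first quartile, which makes odds ratios comparable across metabolites measured on different scales.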
Receiver operating characteristic (ROC) curves and 95% CIs based on candidate metabolites (significant in logistic regression) were calculated for cases versus healthy controls using the package pROC in R [36].
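The area under a ROC curve has a simple rank interpretation, sketched below in plain Python rather than with pROC. The sketch assumes that higher values indicate cases, counts ties as one half, and omits the confidence interval; the function name is hypothetical.

```python
def area_under_roc(case_values, control_values):
    """AUC as the probability that a case scores above a control.

    Assumption: larger values point towards case status. Ties between a
    case and a control contribute one half.
    """
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))
```

An AUC of 0.5 corresponds to no discrimination between cases and controls, and 1.0 to perfect separation.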

To assess whether the metabolite panel associated with birth outcomes is also associated with known metabolic syndrome traits (BMI, BP, blood glucose, insulin, lipids), Spearman’s correlation coefficients were calculated. Metabolites with a significant association with birth outcomes in logistic regression models and significant correlation coefficients with metabolic syndrome traits were selected for the final analysis. A stratified analysis by maternal BMI before pregnancy and maternal insulin levels at the first prenatal visit was performed using multivariate logistic regression models on log-transformed metabolite levels, correcting for potential confounders (as described above).
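Spearman's coefficient is the Pearson correlation of the ranks, which can be sketched as follows. This is an illustrative implementation with hypothetical function names; tied values receive the average of the ranks they span, and no significance test is included.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks enter the calculation, the coefficient captures any monotonic relationship between a metabolite and a trait, and is unaffected by the skewed distributions noted above.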