Considering that performing a moderator analysis is generally not advisable with fewer than 10 studies, 53 we performed a moderator analysis for the whole sample of studies rather than separately for the affective domain. The following moderators were initially included in single-predictor models: publication year, child age at assessment, outcome domain and diet category. Instrument category was originally considered, but only questionnaires were used in the studies within the affective domain, so this moderator was deemed unnecessary. Child age at assessment explained none of the heterogeneity and was excluded from further analysis. Separately, outcome domain, publication year and diet category each accounted for some of the heterogeneity present, and when included together in a moderator analysis they explained approximately 30% of the heterogeneity (p=0.0471). However, a significant degree of heterogeneity remained (p<0.0001), indicating that moderators not considered in the model were influencing the outcome effect sizes.

The imputed effect sizes (open circles) are all smaller than the summary effect size, and the trim and fill analysis suggests that the adjusted overall summary effect size would be 0.075 (cf table 3). This adjusted estimate is still significant (p<0.0001), but smaller than the originally calculated summary effect size (g=0.112). Additionally, even with the imputed effect sizes, significant levels of heterogeneity remain, indicating that factors other than publication bias are contributing to the observed heterogeneity. Table 3 also shows that only the studies in the affective domain appear to be afflicted by publication bias, as the summary effect size for the cognitive domain remains unchanged after the trim and fill analysis. The results from the regression-based adjustment for publication bias are consistent with the trim and fill analysis in that they show a similar overall effect size, a clear association in the cognitive domain and a noticeably weaker association in the affective domain.

If no publication bias were present, approximately 95% of the points for the original effect sizes should be located within the white funnel area 25 and should be distributed roughly evenly to the left and right of the vertical line illustrating the overall summary effect size. Because this is not the case, a trim and fill analysis was performed; the results are displayed in figure 4.

As can be seen from the Q and I² statistics in table 3, there is a significant degree of heterogeneity present for the overall summary effect size, indicating a systematic difference in effect sizes between the studies. As possible sources of this heterogeneity, we investigated publication bias and performed a moderator analysis on all included studies.

Three separate REMs were fit: one across all studies to yield an overall summary effect size, one for the cognitive domain and one for the affective domain. Table 3 provides a summary of the three REMs, including test results for heterogeneity (Q and I² statistics). In the original REM, the summary effect size is larger for the cognitive domain (g=0.14) than for the affective domain (g=0.093), while the summary effect size across both domains is g=0.112.
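The REM computations described above were run with metafor; as an illustration only, the core of a random-effects model with Q and I² statistics can be sketched with a minimal DerSimonian-Laird estimator in Python (a simplified stand-in for the actual routines; function and variable names are ours):

```python
import numpy as np

def random_effects(y, v):
    """DerSimonian-Laird random-effects summary of effect sizes y with
    sampling variances v; returns (mu, Q, I2, tau2)."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1.0 / v                                   # inverse-variance weights
    mu_fe = np.sum(w * y) / np.sum(w)             # fixed-effect estimate
    Q = np.sum(w * (y - mu_fe) ** 2)              # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - df) / c)                 # between-study variance
    I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0   # % heterogeneity
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    mu = np.sum(w_re * y) / np.sum(w_re)          # summary effect size
    return mu, Q, I2, tau2
```

With homogeneous inputs Q and I² are zero and the summary equals the common effect; heterogeneous inputs yield Q above its degrees of freedom and a positive τ².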

For the two REMs investigating publication bias (trim and fill, and with SE as moderator), the adjusted SEs based on the above formulae were used. For the original REM we used the reported effect sizes and corresponding adjusted variances as the basis for the calculations and obtained robust SEs 56 using Metafor's 'robust' function. We chose this robust estimator because it is appropriate for models with unspecified heteroscedasticity, 23 which is the case for all studies in this meta-analysis that report multiple effect sizes.

We used a weighting scheme and calculated robust SEs to account for these sources of dependency. 56 Weights were adjusted for studies that contribute multiple effect sizes by recalculating them such that the sum of the weights of all effect sizes from a study reflects the sample size of that study. When using the Metafor package, which calculates weights from effect size variances or SEs, this can be achieved by calculating effect size variances with an adjusted N. In particular, we adjusted the N underlying each effect size from a study sample, where k is the number of effect sizes from that study sample (eg, ALSPAC) and N_j are the sample sizes for the different effect sizes. When estimating average effect sizes for specific domains, this approach corrects for multiple effect sizes for one domain coming from one study sample. When estimating the overall effect size, this approach corrects for multiple contributions of effect sizes from one or more domains from one study sample. Hence, one study could be allocated different weights, depending on the meta-analytic model. Optimally, the calculation of overall effect sizes would also account for the covariance between effects in different domains; however, the reviewed articles did not provide this information. The employed weighting scheme implies an assumed correlation of ρ=0.5.
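One concrete way to implement such an adjustment, consistent with the stated assumption of ρ=0.5, divides each N_j by 1+(k−1)ρ, so that the k effect sizes from one sample jointly receive the weight of the average of k equicorrelated estimates. The sketch below is a hypothetical reconstruction for illustration, not necessarily the authors' exact formula:

```python
import numpy as np

def adjusted_n(n, k, rho=0.5):
    """Shrink per-effect-size sample sizes n (length k) from one study
    sample so that, when the effect sizes are later treated as
    independent, their combined inverse-variance weight equals that of
    the average of k estimates with pairwise correlation rho.
    Hypothetical reconstruction, assuming variance proportional to 1/N."""
    n = np.asarray(n, float)
    return n / (1.0 + (k - 1) * rho)   # rho=0.5 gives 2*n/(k+1)
```

For a single effect size (k=1) the sample size is unchanged; for three effect sizes from the same sample under ρ=0.5, each N is halved.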

There are several sources of effect size dependency within our sample of studies. First, some of the included studies used subpopulations of the same cohort sample, while using different outcome measures at different time points: four of the studies 30 31 34 39 were based on subsamples of the ALSPAC cohort, and two studies 37 47 were based on subsamples of the Project Viva cohort. Second, some of the studies included in this meta-analysis 33–35 40 report multiple effect sizes.

The 18 studies included in the final meta-analysis comprised a total of 63 861 participants and 26 separate effect sizes divided into four different cognitive or affective dimensions. These effect sizes with corresponding CIs are depicted in the forest plot in figure 2. The size of each square reflects the precision of the effect size estimate via the weight assigned to the respective study when the summary effect size is computed; a larger square indicates a larger weight assigned to that study.

Application of these rules resulted in a total of 26 separate effect sizes. Only the fully adjusted effect sizes from each study were chosen. We initially considered including the corresponding unadjusted effect sizes for each study; however, only four studies provided this information, 33 35 41 42 and the studies that reported minimally adjusted results had adjusted for different variables.

Reported effect sizes whose corresponding outcome dimensions were unclear (eg, owing to inadequate reporting of instrument properties), and for which this could not be resolved through discussion among the reviewers, were excluded. This occurred for three separate instruments, each from a different study. 43 44 46

For studies reporting more than one effect size relevant for inclusion in one domain based on the same type of instrument, the weighted average effect size across those reported was included after Hedges' g had been calculated.
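The within-domain averaging described above can be sketched as follows; this is a minimal Python illustration assuming inverse-variance weighting (the exact weighting used for these averages is not specified here, so treat this as one plausible implementation, with correlation between the averaged estimates ignored):

```python
import numpy as np

def weighted_average_g(g, v):
    """Inverse-variance weighted average of several Hedges' g values
    reported for the same domain and instrument type, together with
    the variance of the combined estimate."""
    g, v = np.asarray(g, float), np.asarray(v, float)
    w = 1.0 / v                            # precision weights
    g_bar = np.sum(w * g) / np.sum(w)      # combined effect size
    v_bar = 1.0 / np.sum(w)                # variance of the combination
    return g_bar, v_bar
```

For example, two equally precise estimates of 0.2 and 0.4 combine to 0.3, with half the variance of either single estimate.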

If effect sizes were reported both for all types of fish and for oily fish only, only the effect size for all types of fish was included in the final analysis, so as to make the exposure definition as homogeneous as possible.

For fish/seafood intake, if more than two intake groups had been defined, we only included the group that best corresponded to what is considered a healthy diet by the national health authorities, namely 2–3 servings of fish per week, of which about half should be fatty fish. 11

After Hedges' g had been calculated, the effect sizes were further summarised, as many studies reported several effect sizes for the same exposure–outcome combination. Preferably, only one effect size per study should be retained; 22 however, this was not feasible, owing to the large variation in neurodevelopmental outcomes assessed across the studies. After careful consideration, four outcome dimensions (externalising, internalising, socioemotional, cognitive) covering the affective and general cognitive domains were chosen. Assignment of the outcome measures to each respective dimension was based on (1) a thorough review of the properties of each instrument with regard to the area of development it is aimed at measuring, based on the manual for each instrument, and (2) research indicating that language, cognition and executive functions are more strongly correlated with each other than with affective functioning. 54 55

Some studies did not report all the information required to calculate Hedges' g in the compute.es package. For studies reporting ORs without group sample sizes, the method for converting an OR to Cohen's d proposed by Chinn 49 was used. For mean differences in standardised test scores, Hedges' g was calculated using the SD and mean difference. For regression coefficients (β), Hedges' g was calculated using the p value, SE and z-statistic. For Cohen's d, Hedges' g was calculated using the formula proposed by Lakens. 50 For studies lacking p values, CIs, SDs or SEs, the required inferential statistics were calculated in advance of the final effect size estimation by using appropriate formulas. 51–53
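The two named conversions follow well-known closed forms: Chinn's OR-to-d conversion, d = ln(OR)·√3/π, and the small-sample correction from Cohen's d to Hedges' g, g = d·(1 − 3/(4(n1+n2) − 9)). A minimal sketch (function names are ours):

```python
import math

def d_from_or(odds_ratio):
    """Chinn's conversion of an odds ratio to Cohen's d:
    d = ln(OR) * sqrt(3) / pi."""
    return math.log(odds_ratio) * math.sqrt(3.0) / math.pi

def g_from_d(d, n1, n2):
    """Small-sample correction from Cohen's d to Hedges' g
    for two groups of sizes n1 and n2:
    g = d * (1 - 3 / (4*(n1 + n2) - 9))."""
    return d * (1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0))
```

An OR of 1 maps to d = 0, as expected, and the g correction shrinks d only slightly unless the total sample is small.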

None of the included studies reported Hedges' g as their effect size. The compute.es package 48 was used to calculate Hedges' g for studies reporting the following: (1) for ORs, Hedges' g was calculated using the 'lores' function (based on the log OR and its corresponding variance); (2) for p values (with information on group sample sizes), Hedges' g was calculated using the 'pes' function; and (3) for correlation coefficients (r), Hedges' g was calculated using the 'res' function.

Each study was evaluated with the NOS checklist; please see online supplementary table 3 for individual study scores. Of the 18 included studies, nine were rated as being of 'fair' quality and nine as being of 'poor' quality. No study received the rating 'good' because none of the studies that used high-quality measurements also adequately dealt with self-selection into the study and selective dropout of participants. All 'poor' ratings were due to insufficiencies in the 'Outcome' dimension of the NOS in studies that measured outcomes through self-reports (independent blind assessment or record linkage is preferred by the NOS) and that additionally did not account for selective dropout between exposure and outcome assessments.

The factors the studies controlled for are depicted in online supplementary table 2. Studies varied greatly in which confounders they included in their analyses, with SES being the only confounder considered by all studies.

Table 2 summarises the wide range of different neuropsychological instruments that were used across the studies to assess cognitive and behavioural functions. A total of 18 original instruments were used in addition to one self-developed instrument, comprising both questionnaires and neuropsychological tests.

Eleven studies used maternal fish intake as the exposure, 31–34 36–38 41 42 46 47 categorising fish intake into groups based on meals/portions or grams eaten per day or week. The remaining three studies used Ω-6/Ω-3 fatty acid ratio, 43 saturated fat intake 45 and fruit intake 44 as their exposure variables. All studies were published in the period from 2004 to 2016, with study populations ranging from 48 to 23 020 mother–child pairs (table 1).

Four studies used maternal dietary patterns as exposure variables, 30 35 39 40 defined by the use of either principal component analysis or confirmatory factor analysis. In three studies, 30 35 40 two distinct dietary patterns were identified, one 'healthy' and one 'unhealthy', while only an unhealthy dietary pattern was defined in the fourth. 39 The healthy dietary patterns were generally characterised by higher intakes of vegetables, fish, legumes, wholegrains and vegetable oils, while the unhealthy dietary patterns consisted of higher intakes of processed foods (fried foods, French fries, meats), confectionery foods (cakes, candy, sugary drinks), refined cereals and salty snacks.

All studies collected information on maternal dietary intake with a food frequency questionnaire (FFQ), either self-administered or administered by a trained interviewer (some used validated FFQs), and/or a food diary. Data obtained with these instruments were used as the basis for defining dietary patterns and for estimating fish/seafood intake, fruit intake, saturated fat intake and the Ω-6/Ω-3 fatty acid ratio.

All included studies were observational in nature and based on a prospective cohort design or case-control design, with baseline measures of maternal dietary intake during pregnancy and subsequent measurement of child cognitive or affective functioning, at one or more time points.

Discussion

The aim of this meta-analysis was to systematically review and summarise the existing literature on the association between maternal diet quality and different child neurodevelopmental outcomes. When dietary exposures believed to be appropriate proxies for maternal diet quality were included, a total of 18 studies comprising 63 861 participants were found relevant for inclusion in this meta-analysis.

The meta-analysis showed that better maternal diet quality had a small, statistically significant association with child neurodevelopment. The summary effect size for the cognitive domain was larger than the overall summary effect size, with no significant heterogeneity present. This positive association with cognitive outcomes is in line with findings from a recent narrative review investigating the association between maternal fish intake and child cognitive outcomes.57 The important contribution of our quantitative meta-analysis is the calculation of average effect sizes, which show a small but robust association, also after correcting for publication bias. The summary effect size for the affective domain was smaller than the overall summary effect size, with a large and significant degree of heterogeneity present. Considering that an overall summary effect size is most appropriate for studies with little heterogeneity,22 this summary effect size should be interpreted with caution. Looking at the effect sizes for all four outcome dimensions (cf figure 2), we find that maternal diet quality is associated with all neurodevelopmental dimensions except the internalising dimension, with the strongest associations seen for socioemotional and general cognitive functioning. However, these effect sizes are still considered small according to Cohen's interpretative guidelines.58

In the moderator analysis, outcome domain, publication year and diet category (type of dietary classification: dietary pattern or its proxies (fish intake, fruit intake, saturated fat intake or Ω-6/Ω-3 fatty acid ratio)) contributed significantly to the heterogeneity present in the total sample of studies, explaining 30% of the heterogeneity. However, a large degree of heterogeneity remained. As only the fully adjusted effect sizes from each study were included in the meta-analysis, unmeasured or unreported variables may have contributed to the remaining heterogeneity. Furthermore, these results may indicate that maternal diet is of more importance for certain neurodevelopmental outcomes than for others. However, we emphasise that this moderator analysis is only exploratory and the results should be seen as preliminary, given the small number of eligible studies included in the meta-analysis and the large number of potential moderators.

The majority of the studies included in this meta-analysis that used proxies of a maternal dietary pattern during pregnancy had fish or seafood intake as their exposure measure. Although fish intake is most likely a good marker of diet quality, there are limitations involved. One major limitation is that the studies investigating fish intake varied greatly with regard to intake group definitions: fish intake was divided into two, three or four groups, with most groups compared against a reference group (generally those who never or rarely consumed fish), or was included as a continuous variable in a linear regression model. Some studies also compared extreme groups (lowest vs highest quintile), and these were the studies reporting the largest effect sizes. Owing to this varying dietary exposure definition, it is likely that the amount of heterogeneity the diet category accounts for is underestimated in the moderator analysis. Ideally, we would have used a more elaborate classification of categories to reflect the actual diversity of the exposure measures, but this was not appropriate considering the small number of studies included in this meta-analysis.59

As seen from the funnel plot in figure 3, there is a clear correlation between effect size and SE: the larger the sample size (ie, the smaller the SE), the smaller the association between maternal diet quality and the outcome measure. This is not surprising, considering that the effect size of a study with a small sample needs to be large to reach statistical significance, whereas in studies with large samples even very small effect sizes reach significance. However, even if the observed pattern has a statistical explanation, a clear visual indication of publication bias remains. Accordingly, analyses that corrected for publication bias through a trim and fill procedure and through meta-regression resulted in overall effect size estimates that were around 30% lower than the estimates from the original REM.
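The meta-regression correction referred to above can be illustrated with a precision-weighted regression of effect sizes on their SEs, where the intercept estimates the effect for a hypothetical study with SE approaching zero. This is a simplified Egger/PET-style sketch, not the exact model fitted in the analysis:

```python
import numpy as np

def pet_adjusted_effect(y, se):
    """Regression-based small-study adjustment: weighted least squares
    of effect sizes y on their standard errors se, with inverse-variance
    weights. The intercept is the bias-adjusted effect estimate (the
    predicted effect at SE = 0); the slope captures funnel asymmetry."""
    y, se = np.asarray(y, float), np.asarray(se, float)
    w = 1.0 / se ** 2                     # precision weights
    X = np.column_stack([np.ones_like(se), se])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0], beta[1]               # intercept, slope
```

With effect sizes that grow linearly with SE (the small-study pattern), the intercept recovers the effect net of that trend.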