We have presented evidence that healthy, adequately nourished, well-educated pregnant women, recruited from five diverse geographical and cultural study sites, who receive recommended antenatal care, have children that display consistent similarities at 2 years of age across a comprehensive set of neurodevelopmental outcomes. The evidence complements previous reports from the INTERGROWTH-21st Project demonstrating equivalent similarities across these study sites for skeletal growth from the first trimester of pregnancy to 2 years of age9.

In 14 of the 16 domains evaluated, the percentage of variance explained by between-site differences ranged from 1.3% (cognitive score) to 9.2% (behaviour score) of the total variance. Of the 80 comparisons using SSDs, only six exceeded ±0.50 units of the pooled SD for the corresponding item, two of them only marginally, and they showed no specific pattern. The higher percentage of variance explained by site differences in the remaining two domains, emotional reactivity and negative behaviour (14%), could be because the scoring of these items is more culturally dependent, i.e. negative behaviour could be perceived and hence scored differently across settings20.

It is evident that across developmental and growth parameters, only a very small percentage (around 10%) of the total variance in these fundamental human functions can be explained by differences among these populations (Fig. 3). The present results and previous publications, presented together in Fig. 3, support the position that most of the observed differences in growth and neurodevelopment across general populations or countries are primarily due to socioeconomic, educational and class disparities, i.e. postal codes define the health profiles of humans better than their genetic code21.

Fig. 3 Variance components analysis of 16 neurodevelopmental domains evaluated in the present study (upper part) and variance components analysis of 7 measures of fetal, newborn, infant and child growth (lower part). Red bars are the % of total variance explained by between sites variability for each domain or growth measure. Data for the seven growth measures: INTERGROWTH-21st Project (eight study sites)2,3,5,9; WHO Multicentre Growth Reference Study (five study sites)8; Habicht et al.41

Our study has some unique features. It is based on a set of multi-site, prospective cohorts of healthy, adequately nourished, well-educated mothers and their babies, who were enrolled to test a specific hypothesis and followed from early pregnancy to 2 years of age. Anthropometric, visual and gross motor development scores agreed well within recommended limits for normal populations. Detailed information was obtained about the socioeconomic, education and environmental backgrounds of the families selected at both population and individual levels3. All clinical and anthropometric measures, feeding practices and monitoring procedures were rigorously standardised across time and sites. The samples described are not representative of whole countries, regions or cities, nor were they intended to be. Instead, reflecting our conceptual approach to the issue, they were specifically selected to include geographical areas populated by low-risk pregnant women and their children within each country or city. Hence, the samples are intended to represent a theoretical, healthy, low-risk population within a country or city, rather than one that includes both low- and high-risk populations.

We initially considered other short assessment tools during the preparation of the study14. While some employ a global approach, our study design required a tool based on a multi-country sample9 that could separately evaluate a range of domains. Moreover, we were interested in combining psychometric methodologies, i.e. the direct administration of tasks, concurrent observation of child performance and caregiver reports, to balance the risk of recall and reporter bias13. Our efforts at scrutinising the pre-existing literature showed that such a tool was not available14. Hence, we developed, standardised and validated a new, domain-focused, culturally neutral, and simple-to-implement assessment tool13,14. For maternally reported items on attentional problems and emotional reactivity, local language versions from validated Child Behaviour Checklist translations were produced22.

We selected a priori a set of seven primary domains. Among them, the cognitive domain of the INTER-NDA was considered the primary outcome because its constituent items are directly administered to the child in task-based sequences, it is less affected by cultural factors, and it is scored objectively and so is not affected by recall or parental report bias. The cognitive domain is also strongly associated with adverse pregnancy outcomes such as impaired fetal growth23.

Interestingly, the two language domains were considered a priori secondary outcomes for comparing the cohorts because of their strong association (particularly expressive language) with children’s interactions with their cultural environment and the influence of care providers. However, our observation of similarities in the children’s performance on the language domains suggests that, overall, children may not have differed dramatically in the levels of early stimulation they received from these urban, well-educated families (median of 15 years of maternal formal education) (http://data.uis.unesco.org/Index.aspx?queryid=242).

Alternatively, the findings could support the concept that language acquisition is mostly a function of cognitive maturation24,25, affective and social connectedness to other persons (universal to all cultural contexts) and, as such, children with high levels of cognitive and social functioning would, in most cases, display similarly high levels of language skills26,27.

We selected 2 years of age as the time-point for the key development assessment of the entire study because growth markers at this age have been found to be predictive of intelligence, school performance, adult nutrition and human capital in high-, middle- and low-income settings28,29,30,31,32. Moreover, this is the earliest age at which the assessment of development is not confounded by transient neurological syndromes of prematurity, and at which conventionally used developmental instruments, such as the Bayley Scales of Infant Development, have been found to possess an acceptable level of medium- and long-term predictive validity13,33; this age also corresponds to the end of Piaget’s sensorimotor stage34.

The psychometric properties of the measure we developed, the INTER-NDA, including internal consistency and construct validity, have been evaluated, and its performance has been validated against the Bayley Scales of Infant Development using intraclass correlations for absolute agreement, Bland−Altman analyses for bias and limits of agreement, and sensitivity and specificity analyses for accuracy14,15.
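For readers unfamiliar with the agreement statistics named above, the following Python sketch shows the standard Bland−Altman computation: the bias is the mean of the paired differences between two instruments, and the 95% limits of agreement are the bias ± 1.96 SD of those differences. The function name and the paired scores are hypothetical and are not data from the validation studies14,15.

```python
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Return (bias, lower_limit, upper_limit) for paired scores from two instruments."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)          # systematic difference between instruments
    sd = stdev(diffs)           # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired scores for four children on two instruments (not study data).
new_tool = [10, 12, 14, 16]
reference = [9, 12, 13, 16]
bias, lower, upper = bland_altman(new_tool, reference)
print(f"bias = {bias:.2f}, 95% limits of agreement = ({lower:.2f}, {upper:.2f})")
```

A bias close to zero with narrow limits indicates that the two instruments give interchangeable scores for most children; what counts as narrow depends on the scale of the instrument.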

In our statistical analyses, we adjusted for the age of the child at the developmental assessment, which controls for self-selection of families attending follow-up clinics, i.e. health-care-seeking patterns. We did not adjust for any characteristics at recruitment early in pregnancy because the main aim of the paper is to evaluate the differences among study samples that were selected using the same entry criteria. We opted for this conservative approach because further adjustment for any “residual confounding” would have made the samples more “artificially” similar.

We successfully evaluated 76% of the eligible children, despite the known difficulties in retaining healthy participants in large-scale, multi-national, follow-up studies. Selection bias is unlikely in view of the baseline similarities between the children evaluated and those lost to follow-up. Unfortunately, we could not include three of the original sites that participated in the INTERGROWTH-21st Project because initiation of the follow-up study was delayed due to the funding process. This reduced the external validity of our observations although we retained five study sites on four continents, with distinct geographical and cultural identities. It also affected the total size of the original cohort; however, this is less relevant to our hypothesis because the comparisons were focused on the sites that contributed data, which retained a large proportion of their original sample.

An additional limitation of the analysis is that we did not have repeated developmental measures. Hence, while we were able to estimate the percentage of variance explained by “between-site variation”, i.e. the main hypothesis of this study, we could not estimate the “within-site variation”, which would have been informative but requires individual repeated measures at different ages. There are considerable conceptual difficulties in identifying comparable repeated measures of developmental domains at both 1 and 2 years of age. We have, however, recently published results comparing the percentage of “within” versus “between” sites variance for repeated skeletal growth measures, i.e. infant length for the same cohort2. These data for the “between” variance component (5.5% of the total variance for length), which is considerably lower than the “within” variance (42.9% of the total variance for length), are very similar to the results presented here.

An inevitable constraint, inherent in the brief psychometric tools required to evaluate large populations, is that some domains are based on only a few items. This was the case in our study for negative behaviour, positive affect and receptive language, which were evaluated by two items. Therefore, we concentrated our interpretation and conclusions on the consistent similarities observed overall.

Traditionally, developmental comparisons during childhood across populations have been made between developing and developed countries; low and high socioeconomic populations; children of parents with high and medium to low levels of education, or immigrant populations with native children in the USA or Europe35,36,37,38. When attempts were made across contexts to compare levels of parental stimulation, which is key to early child development (ECD), samples have come from very different socioeconomic contexts and likely health conditions39,40. These studies suggested that cognition, language, play and sociability are heavily influenced by the socioeconomic structures of the society in which the children are growing up. Such evidence has been used to explain developmental differences between socioeconomic levels in industrialised societies, where most of the research has been conducted.

We have approached this fundamental question differently, although in a complementary manner. We studied cohorts of healthy, well-educated, adequately nourished women receiving evidence-based pregnancy care from culturally different geographical areas that were selected because of their low morbidity and environmental risks. We have documented that under similar socioeconomic, health and nutrition conditions, there are remarkable similarities in the physical growth of their children up to 2 years of age9, to which we now add evidence of comparable similarities in the attainment of neurodevelopmental functions and associated behaviours.

The initial INTERGROWTH-21st Project publications stressed that the relevant question when comparing healthy, low-risk populations to identify physical growth differences or similarities is: whether or not the variability in skeletal growth within a population, i.e. inter-individual difference, is larger than the variability between populations, i.e. inter-population difference, when nutritional, socioeconomic, environmental and health-care needs are met. There is now consistent evidence that the variability in human skeletal growth within a population is seven times larger than that between populations (genetic variability), which represents less than 10% of the total variance2,8,41,42.

The magnitude of the results presented is entirely consistent with studies of the genetics of human growth. For example, genome-wide association studies (GWAS) testing for common fetal variant effects on birthweight have to date identified as many as 60 associated genetic loci43, and estimation of the variance in birthweight due to a distinct maternal genetic contribution ranges from 3 to 22%43,44. Similarly, GWAS have identified nearly 700 independent variants within over 400 genetic loci that together explain only 20% of the heritability of adult height45.

Interestingly, a very recent report studied a cross-sectional sample of clinically healthy, 0-to-42-month-old children using a short, pre-coded interview with caregivers to assess age of achievement of 106 developmental milestones. The study was conducted in 22 health-care clinics in four diverse low-middle income countries (LMICs) with ethnic, cultural, and language differences46. Using predefined criteria of practical equivalence, almost all milestones at 1 year of age and 76% of the milestones up to 3 years of age were attained at similar ages across the four study sites. Despite the considerable methodological differences with our study, this report provides evidence from four other populations of the similarities of early developmental patterns among healthy children. The authors concluded with the statement that internationally applicable tools were needed to “assess children’s development to guide policy, service delivery, and intervention research that might help narrow the gap between high-income countries and LMICs in addressing early childhood development”.

We agree completely with this statement, especially in the light of the similarities in results obtained for anthropometric measures across populations when health and nutritional needs are met. International prescriptive standards for ECD are recommended for both clinical practice and research because many of the tools currently used are based on very skewed populations from developed countries, which bear little resemblance to present-day, culturally mixed societies. For example, personality tests, which have been used on millions of subjects worldwide, were constructed using a few hundred Swiss-German patients in the 1960s in the case of the Rorschach test, and white, rural, Protestant “Minnesotan normal” hospital visitors in the 1930s in the case of the Minnesota Multiphasic Personality Inventory47,48.

Clearly, in clinical settings, it is important to focus on an individual child as the unit of diagnosis and interventions. At population level, the first step is to identify children at risk, i.e. those that require further assessment and are most likely to benefit from an intervention. However, the heterogeneity of ECD measures presently in use and their reliance on specialists has made it difficult to carry out population-based ECD screening.

Our results strongly support the construction of international, psychometric, ECD standards (manuscript submitted). Our strategy not only provides more variability to the distribution of scores, which is desirable for a first-level screening tool, but also better reflects the underlying constitution of modern multi-cultural societies.

Finally, it is worth noting that we have deliberately avoided reporting the race/ethnicity of the participants despite some widely held beliefs that it influences ECD. We have consistently argued that the use of self-reported race/ethnicity in scientific publications is problematic in most non-isolated populations because, in addition to the inherent biases of self-reporting, there is large ancestral admixture due to global migration, invasions and other population movements. Furthermore, there are at least 116 definitions of self-reported race/ethnicity in the biomedical literature49.

In short, our neurodevelopmental and skeletal growth results from conception to childhood strongly support our conclusions, as does the genetic evidence summarised by Craig Venter in reference to possible links between race and intelligence: “There is no basis in scientific fact or in the human genetic code for the notion that skin colour will be predictive of intelligence” (https://www.theguardian.com/news/2018/mar/02/the-unwelcome-revival-of-race-science).