Participants

MRI data and behavioural data of 43 normally developing 3- and 4-year-old children (17 children aged 3–3.5 years, M=3.32, s.d.=0.19, 10 female; and 26 children aged 4–4.5 years, M=4.29, s.d.=0.17, 15 female) were analysed for the present study. The data of another five 3-year-olds and one 4-year-old were acquired but not analysed due to artefacts in the dMRI data set. Children were excluded if more than 10 out of 60 acquired directions in the dMRI data set were corrupted. Directions were removed due to intensity dropout caused by head motion48 or due to artefacts detected in a visual inspection49,50. The sample size was based on previous developmental dMRI studies with approximately 20 children per age group, assuming a dropout rate of 10–20% due to motion artefacts. A power analysis with G*Power51 showed that the computed correlations with N=43, an effect size of r=0.5, and an α-error of 5% had a power of 1-β=97%. Parental informed consent was obtained for all children in accordance with approval from the Ethics Committee at the Faculty of Medicine of the University of Leipzig.
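The reported power can be checked approximately with the Fisher z transformation. The following is a minimal sketch, not the exact routine used by G*Power; the function name `correlation_power` is hypothetical, and whether the original calculation was one- or two-tailed is an assumption here (the text does not say).

```python
from math import atanh, sqrt
from statistics import NormalDist

def correlation_power(r, n, alpha=0.05, two_sided=False):
    """Approximate power to detect a true correlation r with n subjects.

    Uses the Fisher z approximation: atanh(r_observed) is roughly normal
    with standard error 1 / sqrt(n - 3).
    """
    std_normal = NormalDist()
    expected_z = atanh(r) * sqrt(n - 3)        # expected test statistic under H1
    tail = alpha / 2 if two_sided else alpha
    critical_z = std_normal.inv_cdf(1 - tail)  # rejection threshold under H0
    # The negligible probability of rejecting in the wrong tail is ignored.
    return std_normal.cdf(expected_z - critical_z)

one_sided = correlation_power(r=0.5, n=43)
two_sided = correlation_power(r=0.5, n=43, two_sided=True)
print(f"one-sided: {one_sided:.3f}, two-sided: {two_sided:.3f}")
```

Under this approximation the one-sided value comes out close to the reported 97%, while the two-sided value is somewhat lower.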

Cognitive assessment of the false belief score

The children performed two standard tests of explicit false belief understanding27 (a false location task4 and a false content task5) on the same day as their MRI scans. In the false location task, each child was introduced to a mouse puppet, and they were both shown a sweet in a little bag and an empty box. The mouse then left the room, and the sweet was moved from the bag to the box. When the mouse returned, the child was asked three probe questions about where the mouse would look for the sweet, whether she knew where it was and where she believed it was, along with a control question to make sure the child remembered the actual location of the sweet. In the false content task, the children were shown a familiar chocolate box and were asked what they believed was inside the box. Every child expected chocolates to be inside the box. They were then shown that the box actually contained pencils. The mouse puppet then entered the scene and the children were asked three probe questions: whether the mouse knew what was in the box, what she believed was in it, and what the child itself had originally believed, along with a control question on the actual content of the box. All children answered the control questions correctly in both tasks. In each of the tasks, children could obtain a total of three points, one for each of the three probe questions. The performance on the two tasks was highly intercorrelated (Spearman's ρ(43)=0.879; P=8 × 10−15). We therefore combined them into a total false belief score with equal weight for each of the six probe questions. This yielded a sufficiently varied and highly reliable measure (Cronbach α=0.894) suited to study correlations with other measures.

The three questions in each of the false belief tasks could have led to carry-over effects or pragmatic pressure to give different answers to consecutive very similar questions. To exclude the possibility that such effects might have influenced our results, we replicated our analyses with a false belief score that only included the first question of each of the two false belief tasks. This scoring yielded very similar results (see Supplementary Tables 2 and 3).
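As an illustration of how a six-item false belief score and its reliability can be computed, the sketch below sums the probe questions per child and evaluates Cronbach's α with the standard formula. The response matrix is made up for illustration and is not the study's data; the function name `cronbach_alpha` is hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-child item lists (one entry per item)."""
    n_items = len(item_scores[0])
    # Variance of each item across children.
    item_variances = [
        pvariance([child[i] for child in item_scores]) for i in range(n_items)
    ]
    # Variance of the summed score across children.
    total_variance = pvariance([sum(child) for child in item_scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 1 = probe question passed, 0 = failed.
# Columns 0-2: false location task, columns 3-5: false content task.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1],
]

false_belief_scores = [sum(child) for child in responses]  # equal weight per question
alpha = cronbach_alpha(responses)
print(false_belief_scores, round(alpha, 3))
```

The first-question-only variant described above would correspond to summing columns 0 and 3 instead of all six.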

Other cognitive abilities

To control for co-developing abilities, the children additionally performed a battery of three executive function tasks27 as well as a standardized test of language development (SETK 3-5, Sprachentwicklungstest für drei- bis fünfjährige Kinder)33. Moreover, children's implicit expectations of the actions of an agent with a false belief—known to precede explicit false belief reasoning in development—were tested with an anticipatory looking false belief task27. These additional tests were conducted in two separate sessions before the MRI scan, all within an average period of 14.7 days (s.d.=6.8).

Executive functions

The children were tested on a battery of three executive function tasks27—a Reverse Categorization task52, a Go-NoGo task53 and a Delay of Gratification task54. The tasks were chosen to tap into the children's inhibitory control, response selection and cognitive flexibility, which have been argued and shown to be particularly relevant for mastering standard false belief tests25,27,28.

In the Reverse Categorization task, the children were asked to sort blue and red cubes of two different sizes into a big blue box and a small red box with changing rules: first matching the colours of the cubes with the boxes, then with the colour rule reversed, next according to cube size, and finally with the size rule reversed. The percentage of correct trials in the three rounds following a rule change was coded as the dependent variable (M=89.9%, s.d.=11.7%). This measure had a very high reliability (Cronbach α=0.899).

In the Go-NoGo task, children were asked to perform actions a duck puppet asked them to do (for example, 'Clap your hands!'), but not to do anything the nasty crocodile asked them to. It was checked before that children understood the rules and were able to perform the movements. A d-prime value was calculated with correct NoGo-trials as hits and incorrect Go-trials as false alarms (M=0.886, s.d.=0.172). This task was highly reliable (Cronbach α=0.843).
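A d-prime value is the difference between the z-transformed hit rate and the z-transformed false alarm rate. The sketch below follows the mapping given in the text (correct NoGo trials as hits, incorrect Go trials as false alarms). The log-linear correction for rates of 0 or 1 and the trial counts are assumptions for illustration; the text does not state how extreme rates were handled.

```python
from statistics import NormalDist

def d_prime(hits, n_signal_trials, false_alarms, n_noise_trials):
    """d' = z(hit rate) - z(false alarm rate).

    A log-linear correction (add 0.5 to counts, 1 to trial totals) keeps the
    rates away from 0 and 1, where the z-transform is undefined.
    This correction is an assumption, not taken from the source.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (n_signal_trials + 1)
    false_alarm_rate = (false_alarms + 0.5) / (n_noise_trials + 1)
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical child: withheld the action on 9 of 10 NoGo trials (hits)
# and failed to act on 1 of 10 Go trials (false alarms).
example = d_prime(hits=9, n_signal_trials=10, false_alarms=1, n_noise_trials=10)
print(round(example, 2))
```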

In the Delay of Gratification task, children were seated in a small room with a small portion of their preferred sweets (gummy bears or chocolate bars) and a bell on a table in front of them. A bigger portion of the sweets was placed in a locked transparent box next to it. The experimenter told the children that she had to leave for a while, but if they waited until the experimenter came back without eating the sweets or ringing the bell to call the experimenter, they would get the big portion of sweets. Task comprehension was checked with two control questions before the children were left alone for a maximum of 5 min. The children's mean waiting time was M=233 s (s.d.=107 s).

We formed an aggregate executive function score for further analysis by taking the mean of the z-scores of all three tasks (3-year-olds: M=−0.63, s.d.=1.16; 4-year-olds: M=0.33, s.d.=0.76; age effect: t(42)=−3.30, P=0.002). The aggregate executive function score explained a significant amount of variance in the children's false belief scores (Spearman's ρ=0.520, P=0.0004), indicating that it indeed allowed us to control for the variance in false belief understanding due to the children's executive function abilities.
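The aggregation step can be sketched as follows: each task is z-standardized across children, and the three z-scores are averaged per child. The values are invented for illustration, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize one task across children (sample standard deviation assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def aggregate_score(*tasks):
    """Mean of the per-task z-scores for each child."""
    standardized = [z_scores(task) for task in tasks]
    return [mean(child) for child in zip(*standardized)]

# Hypothetical data for four children (one list per task, same child order).
reverse_categorization = [70.0, 85.0, 90.0, 100.0]  # % correct
go_nogo = [0.6, 0.8, 0.9, 1.0]                      # d-prime
delay_of_gratification = [60.0, 200.0, 300.0, 300.0]  # waiting time in s

aggregate = aggregate_score(reverse_categorization, go_nogo, delay_of_gratification)
print([round(a, 2) for a in aggregate])
```

Standardizing first puts percentages, d-prime values and seconds on a common scale, so that no single task dominates the average.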

Language

As a measure of language abilities, we administered the standardized test of language development for 3- to 5-year-old children SETK 3-5 (Sprachentwicklungstest für drei- bis fünfjährige Kinder)33. The test included sentence comprehension and production, vocabulary comprehension and production, morphological rule building and phonological working memory. The mean standardized T-value of all subtests served as independent variable to control for children's language abilities (M=57.4, s.d.=7.4). This measure was significantly correlated with the false belief score (Spearman's ρ=0.306, P=0.046).

Belief-related anticipation

In an implicit belief-related anticipatory looking task27, the children were presented with short film clips on a Tobii T120 eye-tracker monitor showing an animal agent observing and following a mouse through a y-shaped tunnel to one of two boxes at the two exits of the tunnel (see Fig. 3). The children were first familiarized with the fact that the animal agent would go to the box with the mouse. Then, the children were shown film clips in which the agent had a false belief about the location of the mouse, which had actually left the scene in the animal's absence. The children's anticipatory looking was evaluated as a measure of their expectations as to where the agent would look for the mouse. There were two different false belief conditions (FB1 and FB2), respectively controlling for different non-belief-related strategies27,29. Every child was presented with a total of 10 familiarization (FAM) trials, 12 FB trials (six of each condition), and six true belief trials (TB1 and TB2) analogous to the FB trials, except that the agent held a true belief (TB) about the mouse's location.

Figure 3: Implicit belief-related anticipatory looking task. Selected scenes from the two false belief conditions FB1 and FB2. Arrows indicate the movement of the animals; check marks and crosses indicate whether or not the agent animal can see what happens.

Gaze data were analysed for a time of interest from the moment when the agent had disappeared in the tunnel until its reappearance in the FAM and TB conditions or until the end of the trial in the FB conditions. Two regions of interest (ROIs) were defined, each covering one of the tunnel exits and the corresponding box. During the time of interest, the ROI in which the child looked first (first look), as well as the ROI with the longer gaze duration (longer look), was coded. Since both measures were highly intercorrelated (r(43)=0.444, P=0.003), the measures were collapsed to the mean of first and longer look for subsequent analyses27.

The children performed significantly above chance in the FAM and TB control conditions (M=67.8%, s.d.=12.8%, t(42)=9.09, P<0.001), confirming that they had understood the events displayed in the film clips and showed correct anticipation when no false belief was involved. The children also performed above chance in the FB trials (M=53.7%, s.d.=11.2%, t(42)=2.14, P=0.038). As opposed to the standard tasks of explicit false belief understanding, there was no significant difference between age groups (3-year-olds: M=54.1%, s.d.=11.7%; 4-year-olds: M=53.4%, s.d.=11.1%; t(41)=0.214, P=0.83). This is in line with previous literature that shows that belief-related anticipation is already achieved before the age of 2 years28,29.
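The comparisons against chance are one-sample t-tests against 50%. As a rough sketch, the statistic can be recomputed from the rounded summary values given above; small deviations from the reported t-values are to be expected because the means and standard deviations are rounded. The function name is hypothetical.

```python
from math import sqrt

def one_sample_t(sample_mean, sample_sd, n, chance_level=50.0):
    """t statistic for testing a sample mean against a fixed chance level."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - chance_level) / standard_error

# Rounded summary statistics from the text (N=43, so 42 degrees of freedom).
t_control = one_sample_t(sample_mean=67.8, sample_sd=12.8, n=43)
t_false_belief = one_sample_t(sample_mean=53.7, sample_sd=11.2, n=43)
print(round(t_control, 2), round(t_false_belief, 2))
```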

MRI data acquisition

The dMRI data were acquired on a Siemens 3 T TIM Trio scanner using the multiplexed echo planar imaging sequence55,56 with a resolution of 1.9 mm isotropic (TR=4,000 ms; TE=75.4 ms; b-value=1,000 s mm−2; 60 directions; GRAPPA 2) reducing the scanning time to 5:32 min. A field map was acquired directly after the dMRI scan. Additionally, an anatomical scan was acquired using the MP2RAGE sequence57 at 1.2 × 1 × 1 mm resolution (TR=5,000 ms; TE=3.24 ms; GRAPPA 3; 5:22 min). Children were acquainted with the scanning procedure by performing a mock scan a few days before the actual scan and watched a movie of their choice on MR-compatible goggles during the scan.

dMRI data analysis

Before preprocessing the dMRI data, volumes affected by artefacts due to motion were removed manually, as described above. Motion itself was corrected for by rigidly aligning all volumes to the last one without diffusion weighting (b0) using flirt58 from the FSL software package59. The dMRI data were then rigidly aligned to the anatomical image, which again had been rigidly aligned to the Montreal Neurological Institute standard space and was interpolated to 1 mm isotropic voxel space. Distortions were corrected using the corresponding field map. All these transformations were combined before being applied to the data to require only a single step of interpolation. The diffusion tensor was computed in every voxel within the brain volume and FA maps were derived. A common group template of the participants' FA maps was created using ANTs (Advanced Normalization Tools)60.
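Fractional anisotropy (FA) is derived from the three eigenvalues of the diffusion tensor in each voxel. The sketch below applies the standard FA formula to a single voxel; in practice this is done by the tensor-fitting software for the whole volume, and the eigenvalues shown are illustrative values, not data from the study.

```python
from math import sqrt

def fractional_anisotropy(l1, l2, l3):
    """FA from the three eigenvalues of a diffusion tensor.

    FA = sqrt(3/2) * sqrt(sum((l_i - mean)^2)) / sqrt(sum(l_i^2)),
    ranging from 0 (isotropic diffusion) to 1 (diffusion along one axis only).
    """
    mean_diffusivity = (l1 + l2 + l3) / 3
    numerator = sum((l - mean_diffusivity) ** 2 for l in (l1, l2, l3))
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denominator == 0:
        return 0.0  # no diffusion signal, e.g. a background voxel
    return sqrt(1.5 * numerator / denominator)

# Illustrative eigenvalues in mm^2/s for a strongly anisotropic voxel.
fa_example = fractional_anisotropy(1.7e-3, 0.3e-3, 0.3e-3)
print(round(fa_example, 2))
```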

TBSS analysis

The participants' FA maps were then correlated voxelwise with their false belief scores using TBSS34. TBSS projects the individual subject's maximal FA values onto a common white matter skeleton, before applying voxelwise cross-subject statistics. The skeleton was thresholded at an FA value of 0.2. The nonlinear registration was done using the group-specific template as a target image. Voxelwise statistics were then carried out with a non-parametric permutation test61 implemented in FSL59 with the false belief score as the dependent variable, taking into account the non-normal distribution of the data. In a next step, we controlled for the language and executive function scores, as well as for implicit belief-related anticipation by including them as covariates in the linear model61. In addition, we computed linear regressions where we controlled for each of the executive function tasks separately, and for all the language subtests of the SETK as separate covariates. This was done to make sure that we did not miss out on variance that was only explained by one of the subtests due to the aggregate scores. Reported clusters on the skeleton were significant at P<0.01 at voxel-level and exceeded a cluster size significant at P<0.05 based on local smoothness estimation on the skeleton with AFNI (3dClustSim and 3dLocalstat)62. In addition, a similar TBSS analysis was performed for the other cognitive domains. The correlations of FA with the executive function score and the standardized language test are reported in the Supplementary Methods. No regions of significant correlation of FA with the implicit anticipatory looking false belief task were found.
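The logic of a non-parametric permutation test can be illustrated for a single skeleton voxel. This is a simplified sketch, not the FSL implementation: it tests a plain correlation without covariates or cluster-level inference, and the data, function names and number of permutations are assumptions chosen for illustration.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def permutation_p_value(fa, scores, n_permutations=2000, seed=0):
    """Two-sided p-value for the FA-score correlation in one voxel.

    The scores are shuffled across children to build the null distribution,
    so no normality assumption is needed.
    """
    rng = random.Random(seed)
    observed = abs(pearson(fa, scores))
    shuffled = list(scores)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(fa, shuffled)) >= observed:
            extreme += 1
    # Counting the observed labelling itself avoids a p-value of exactly 0.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical FA values in one voxel and false belief scores of ten children.
fa_values = [0.31, 0.35, 0.33, 0.40, 0.42, 0.38, 0.45, 0.47, 0.44, 0.50]
fb_scores = [0, 1, 1, 2, 3, 2, 4, 5, 4, 6]
p_value = permutation_p_value(fa_values, fb_scores)
print(p_value)
```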

Commonality analysis

To get a better understanding of the role of developmental change in the effects found in the TBSS analysis, a commonality analysis63 was computed voxelwise on the skeleton including age and FA as predictors for the false belief score. A commonality analysis allows the decomposition of the contributions of several, possibly intercorrelating, linear predictors into subcomponents explained by the unique variance of the individual predictors, as well as subcomponents explained by the shared variance of all possible combinations of the predictors. Our commonality analysis thus allowed us to determine whether the children's false belief scores were explained by age-related increases in FA in a given voxel (common contribution of age and FA) or by age-independent individual differences in FA (unique contribution of FA), while in both cases the variance explained uniquely by age (unique contribution of age) was controlled for. In addition, the children's language, executive function and belief-related anticipation scores were included as covariates in the commonality analysis to ensure that differences in FA were specifically related to false belief understanding, independently of more general cognitive development. This analysis revealed that age-related increases in FA in the respective regions significantly explained between 4 and 10% of the variance in the false belief score, over and above the unique contribution of age and of the other cognitive abilities (for details, see Supplementary Table 1).
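For two predictors, the commonality decomposition reduces to differences between R² values of the full and the single-predictor models. The sketch below shows this for one voxel with age and FA as predictors. It is a simplified illustration under stated assumptions: it omits the covariates and significance testing of the actual analysis, it is not the authors' script, and the data are invented.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def commonality_two_predictors(outcome, age, fa):
    """Split the R^2 of outcome ~ age + fa into unique and common parts."""
    r_age, r_fa = pearson(outcome, age), pearson(outcome, fa)
    r_between = pearson(age, fa)
    # R^2 of the two-predictor linear model, expressed through correlations.
    r2_full = (r_age ** 2 + r_fa ** 2 - 2 * r_age * r_fa * r_between) / (
        1 - r_between ** 2
    )
    return {
        "unique_age": r2_full - r_fa ** 2,   # explained by age beyond FA
        "unique_fa": r2_full - r_age ** 2,   # explained by FA beyond age
        "common": r_age ** 2 + r_fa ** 2 - r2_full,  # shared by age and FA
        "total": r2_full,
    }

# Hypothetical values for eight children in one skeleton voxel.
age = [3.0, 3.1, 3.3, 3.4, 4.0, 4.1, 4.3, 4.4]
fa = [0.30, 0.34, 0.31, 0.36, 0.38, 0.37, 0.43, 0.41]
false_belief = [0, 1, 1, 2, 3, 4, 5, 6]
parts = commonality_two_predictors(false_belief, age, fa)
print({name: round(value, 3) for name, value in parts.items()})
```

In this decomposition, the common component corresponds to variance explained by age-related differences in FA, and the unique FA component to age-independent differences in FA.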

Connectivity analysis

To see within which tracts the significant clusters from the TBSS analysis were located, these regions were taken as seeds for probabilistic tractography with MRtrix64 using Constrained Spherical Deconvolution as a local model65 with the default parameters. Streamlines were started in randomly selected initialization points within the seed regions until 100,000 streamlines with a minimum length of 10 mm were obtained. The tracking followed directions with a maximum fibre orientation density (FOD) value of 0.1 and a curvature radius of at least 1 mm. The tractography was restricted to white matter. This analysis yielded the tracts shown in Fig. 4.

Figure 4: Streamline density maps resulting from probabilistic tractography seeded in the regions depicted in grey with significant correlation of FA and the false belief scores in the TBSS analysis. Left hemisphere: (a) arcuate fascicle seeded in the MTG, (b) IFOF seeded in the STG, (c) ILF, IFOF and fornix seeded in WM below the ITG. Right hemisphere: (d) arcuate fascicle seeded in the pMTG/TPJ, (e) arcuate fascicle seeded in TPJ, (f) IFOF seeded in vMPFC. Medial: (g) corpus callosum seeded in the right SPL (WM near PC), (h) left anterior thalamic radiation seeded in the thalamus.

Streamline density maps of the individual subjects' resulting tracts were masked in the common template space by requiring that at least half the subjects have nonzero values in every voxel. This was done to ensure that correlations were not outlier-driven. The individual subjects' streamline density maps were then correlated with their false belief scores using FSL randomise61, while controlling for the mean FA in the seed region of the tractography. This was done in order to ensure that correlations with streamline density were not driven by the correlation of the false belief score and FA found in the TBSS analysis. We then controlled for age, the language, executive function, and implicit belief-related anticipation scores by including them as covariates in the linear model. In addition, we computed linear models where we controlled for each of the executive function tasks separately, and for all the language subtests of the SETK as separate covariates. Reported clusters in the tract volumes were significant at P<0.001 at voxel-level and exceeded a cluster size significant at P<0.05, in addition taking into account the number of streamline density maps according to Bonferroni correction.

To confirm that the effect observed in the anterior IFG stemmed from dorsal streamlines of the arcuate fascicle, we computed an additional tractography from the seed regions in the left MTG and right TPJ, which we restricted to dorsal pathways. For this, a termination mask was defined as a plane parallel to the Sylvian fissure (see Supplementary Fig. 1).

We localized and named the clusters and tracts based on the MRI Atlas of Human White Matter17.

Data availability

Anonymized data (in accordance with the data protection policy in the ethics agreement) are available upon request. The publication of the script for the voxelwise commonality analysis is in preparation, and the script is available from the authors on request.