Participants

For each linguistic group, we recruited 30 native speakers (with the exception of South Korea, where only 25 participants were tested due to logistical problems). Participants were of both sexes and aged between 14 and 43. They resided either in a village/town (i.e. <100,000 inhabitants) or in a city (i.e. >100,000 inhabitants), and had between 0 and 16 siblings. Participants differed in their education level, occupation and monthly income. They also varied in the second languages they spoke and in their level of proficiency. English was the most common second language in all linguistic groups, with the exception of Khoekhoe speakers (who mostly spoke Afrikaans as a second language) and Sidaama speakers (who mostly spoke Amharic as a second language). For more details, see Table 1 and Supplementary Information.

Table 1 Information on the subjects included in the analyses.

All experimental procedures were approved by the ethical committee at the University of Bern, Switzerland (2016-06-00006). All experiments were performed in accordance with European guidelines and regulations, and informed consent was obtained from all participants.

Experimental protocol

Testing took place in surroundings that were familiar to the participants, such as schools, community centers and private homes. Individuals were generally tested alone, unless they felt uncomfortable and asked for other people to be present, in which case these people were seated at a distance behind the computer screen and instructed not to interfere in any way with the testing procedure. For each population, one research assistant collected the data, together with a local research assistant who translated the procedure when needed (i.e. in Cambodia, Ethiopia, Japan, Korea and Namibia). In Italy and Thailand no local research assistant was needed, as the research assistant collecting the data was a native speaker of the language tested. Overall, a native speaker of the local language conducted recruiting, consenting and testing for all populations tested. Written consent was obtained before testing, while biographical information was obtained at the end of the tasks, by noting participants’ name, sex, age, residence, number of siblings, main occupation, approximate monthly income, educational level, native language and proficiency in other languages.

Each participant was tested in six different memory tasks, administered one after the other on a laptop, with breaks of approximately one minute in between. The six tasks were three short-term memory (STM) tasks with words as stimuli (WS = word span), with numbers as stimuli (DS = digit span), or with spatial stimuli (MS = matrix span); and three working memory (WM) tasks with words as stimuli (OS = operation span), with numbers as stimuli (CS = counting span), or with spatial stimuli (SS = symmetry span). For these tasks, we adapted the classic automated span tasks programmed with E-prime and implemented in the Attention & Working Memory Lab by Engle’s research group80,81. All tasks have been validated across a variety of studies and test STM and WM by requiring individuals to observe a series of stimuli and recall them immediately afterwards, in the same order in which they were presented. Before each task started, participants were instructed about the procedure and provided with two examples containing two stimuli. They were also reminded that stimuli had to be recalled sequentially, in the same order as they were presented. If the procedure was not clear, it was explained again until the participant understood it. Throughout the tasks, the experimenter made no suggestions, but could motivate participants regardless of their performance by reassuring them that they were doing fine. The order of tasks was pseudo-randomized and counterbalanced across subjects, but the order of stimuli and trials within each task was the same for all participants (see Supplementary Information for more details).

STM tasks

In the STM-WS task, participants were presented with 18 test trials, each one containing 2–7 stimuli. The stimuli consisted of 600 px × 800 px pictures with images of common animals and objects (e.g. a cat, a hen, a leaf, an ant, a cloth), which were visible for 2000 ms in the middle of the screen. Before the task started, individuals were instructed to observe the series of pictures on the screen, name each of them aloud as soon as it appeared, and recall them aloud in the same order in which they had appeared, as soon as question marks appeared on the screen. The experimenter audio-recorded all trials.

In the STM-DS task, participants were presented with 21 test trials containing 3–9 stimuli. The stimuli consisted of numbers from 1 to 9 (presented as 100 px × 150 px images with a black number on a white background), which were visible for 2000 ms in the middle of the screen. Before the task started, individuals were instructed to observe the series of numbers on the screen and then recall them in the same order they had appeared, as in the previous task. Participants provided their response on coding sheets with series of 9 squares, so that each square could contain one number.

In the STM-MS task, participants were presented with 18 test trials containing 2–7 stimuli. The stimuli consisted of 4 × 4 square matrices (presented as 400 px × 300 px images) with a black grid on a white background, with one of the 16 squares colored red in each stimulus (the position of the red square differed between stimuli). Each stimulus was visible for 2000 ms in the middle of the screen. Before the task started, individuals were instructed to observe the series of matrices on the screen and then recall the position of each red square in the same order in which they had appeared, by marking the positions on a coding sheet as soon as question marks appeared on the screen.

WM tasks

In the WM-OS task, participants were presented with 12 test trials containing 2–5 stimuli. The stimuli consisted of 600 px × 800 px pictures with images of common animals and objects (as in the STM-WS task), and three small boxes with a variable number of red dots inside, which served as stimuli for the distracting task. Before the task started, individuals were instructed to observe the series of pictures on the screen, name each of them aloud as soon as it appeared, solve the distracting task (i.e. subtract the red dots in one box from the red dots in another, and say aloud whether the result corresponded to the number of red dots in the third box), and then recall the names of the pictures aloud in the same order in which they had appeared, as soon as question marks appeared on the screen. In this task, each stimulus remained in the middle of the screen until it was named and the mathematical operation was solved. The experimenter audio-recorded all trials.

In the WM-CS task, participants were presented with 15 test trials containing 2–6 stimuli. The stimuli consisted of 600 px × 800 px pictures with a grey background and a varying number of blue circles, blue squares and green circles (with the number of blue circles in each image varying from 3 to 9). Before the task started, individuals were instructed to observe the series of images on the screen, count aloud the number of blue circles among other figures in each image (i.e. distracting task), repeat this number aloud and then recall aloud the series of final numbers in the same order they had appeared, as soon as question marks appeared on the screen. Each stimulus remained in the middle of the screen until the blue circles had been counted. The experimenter audio-recorded all trials.

In the WM-SS task, participants were presented with 12 test trials containing 2–5 stimuli. The stimuli consisted of 4 × 4 square matrices (presented as 400 px × 300 px images) with a black grid on a white background (as in the STM-MS task), with one of the 16 squares colored red in each stimulus. These matrices alternated with 8 × 8 square matrices of the same size, which served as stimuli for the distracting task: some of the 64 squares were colored black, forming a pattern that could be either symmetrical or asymmetrical along the vertical axis. Before the task started, individuals were instructed to observe the series of 4 × 4 matrices on the screen, say aloud whether each 8 × 8 matrix was symmetrical or not (i.e. distracting task), and then recall the position of each red square in the 4 × 4 matrices in the same order in which they had appeared, by marking the positions on a coding sheet as soon as the question marks appeared on the screen. All matrices were visible for 2000 ms in the middle of the screen, but each 4 × 4 matrix only became visible after the previous symmetry judgment had been made. The experimenter also noted the participants’ responses to the distracting task on a piece of paper.
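As a compact summary of the six task descriptions above, the following minimal sketch collects the stated trial counts and list lengths in one data structure. The class and field names are hypothetical and only serve the illustration. The helper also shows that the stated numbers are consistent with three trials per list length in every task; this is an inference from the counts, not something the text states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanTask:
    """One span task; field names are illustrative, not from the source."""
    code: str      # task abbreviation used in the text
    memory: str    # "STM" or "WM"
    stimuli: str   # "word", "numerical" or "spatial"
    n_trials: int  # number of test trials
    min_len: int   # fewest stimuli in a trial
    max_len: int   # most stimuli in a trial


# Values taken from the task descriptions in the text.
TASKS = [
    SpanTask("WS", "STM", "word", 18, 2, 7),
    SpanTask("DS", "STM", "numerical", 21, 3, 9),
    SpanTask("MS", "STM", "spatial", 18, 2, 7),
    SpanTask("OS", "WM", "word", 12, 2, 5),
    SpanTask("CS", "WM", "numerical", 15, 2, 6),
    SpanTask("SS", "WM", "spatial", 12, 2, 5),
]


def trials_per_length(task: SpanTask) -> float:
    """Average number of trials per list length (assumes an even spread)."""
    n_lengths = task.max_len - task.min_len + 1
    return task.n_trials / n_lengths


def total_trials(memory: str) -> int:
    """Total number of test trials across all tasks of one memory type."""
    return sum(t.n_trials for t in TASKS if t.memory == memory)


if __name__ == "__main__":
    for task in TASKS:
        print(task.memory, task.code, trials_per_length(task))
    print("STM trials:", total_trials("STM"), "WM trials:", total_trials("WM"))
```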

Scoring

We transcribed all participants’ responses from the audio recordings and coding sheets. We then compared the recalled stimuli to the stimuli as named during stimulus presentation. For each trial, we divided the list of presented stimuli into two halves and separately coded the number of correct responses for the first half (i.e. initial stimuli) and for the second half (i.e. final stimuli). For the first half, we coded whether the first stimulus recalled corresponded to the first stimulus presented, whether the second stimulus recalled corresponded to the second stimulus presented, and so on. For the second half, we coded whether the last stimulus recalled corresponded to the last stimulus presented, whether the second-to-last stimulus recalled corresponded to the second-to-last stimulus presented, and so on. Crucially, coding the final stimuli starting from the end ensured that mistakes in recalling initial stimuli did not affect the score for the final stimuli, given that a correct response required both the identity and the order of stimuli to be recalled correctly.
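The scoring rule can be illustrated with a short sketch. It assumes that, for lists with an odd number of stimuli, the middle stimulus is counted with the first half; the text does not specify how odd-length lists were split, so this choice and the function name `score_trial` are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def score_trial(presented: Sequence[str], recalled: Sequence[str]) -> Tuple[int, int]:
    """Return (correct initial stimuli, correct final stimuli) for one trial.

    Initial stimuli are compared position by position from the start of the
    lists; final stimuli are compared position by position from the end.
    ASSUMPTION: for odd-length lists the middle item belongs to the first half.
    """
    n = len(presented)
    n_initial = (n + 1) // 2   # first half (gets the middle item if n is odd)
    n_final = n - n_initial    # second half

    # Forward comparison: 1st recalled vs 1st presented, 2nd vs 2nd, ...
    initial_correct = sum(
        1
        for i in range(n_initial)
        if i < len(recalled) and recalled[i] == presented[i]
    )

    # Backward comparison: last recalled vs last presented, and so on.
    final_correct = sum(
        1
        for k in range(1, n_final + 1)
        if k <= len(recalled) and recalled[-k] == presented[-k]
    )
    return initial_correct, final_correct


if __name__ == "__main__":
    shown: List[str] = ["cat", "hen", "leaf", "ant"]
    # The participant omits "hen"; the final items are still scored as correct
    # because they are aligned from the end of the recalled list.
    print(score_trial(shown, ["cat", "leaf", "ant"]))  # (1, 2)
```

In the usage example, the omission of one initial stimulus lowers the initial score but leaves the final score intact, which is the property the backward coding is meant to guarantee.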

Inter-observer reliability

A second observer recoded 11.6% of all trials, and inter-observer reliability was excellent (for the sum of correct initial stimuli in each trial: Cohen’s κ = 0.955, N = 2592, p < 0.001; for the sum of correct final stimuli in each trial: Cohen’s κ = 0.940, N = 2592, p < 0.001).
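For readers unfamiliar with the statistic, the following sketch computes an unweighted Cohen's kappa from two coders' scores for the same trials. Whether the reported values are unweighted or weighted is not stated in the text, so the unweighted form is an assumption, and the example data are invented.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two coders rating the same items."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must rate the same non-empty set of items")
    n = len(coder_a)

    # Observed agreement: proportion of items given identical scores.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: product of the coders' marginal proportions,
    # summed over all categories.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical category: kappa is undefined.
        return math.nan
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Invented example: number of correct initial stimuli in four trials.
    first_coder = [1, 1, 0, 0]
    second_coder = [1, 1, 0, 1]
    print(cohens_kappa(first_coder, second_coder))  # 0.5
```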

Statistical analyses

Before conducting the analyses, we excluded some participants from the sample. Although all participants reported being native speakers of the language in which they were going to be tested, our interactions with them indicated that one Korean speaker and one Khoekhoe speaker were not native speakers of those languages, and we therefore dropped them from the analyses. We further excluded one Sidaama speaker who failed to count the blue circles aloud in the WM-CS task (as the distracting task was then not implemented, which changed the nature of the WM task). Finally, we excluded 68 trials (i.e. 0.3% of the remaining trials) due to problems with the audio recordings, the participant’s failure to understand the procedure, the participant’s distraction or others’ interference in the task.

All analyses were conducted using generalized linear mixed models (GLMM)93 and were run in R (version 3.2.3) with the lme4 package94. We ran one model for the WM tasks and one for the STM tasks, both with a Poisson error structure. In the models, we included participants’ performance for initial and final stimuli in each trial of the WM numerical, spatial and word tasks (N = 18050), and in each trial of the STM numerical, spatial and word tasks (N = 26470), respectively. All numerical variables were z-transformed to obtain comparable and more easily interpretable coefficients95. To analyze the effect of the test predictors (i.e. the predictors of interest) on the response, we compared each full model (including both control and test predictors) to a corresponding null model (including only control predictors). If the test predictors have a significant effect on the response, the full-null model comparison is significant. To obtain the p values for the individual fixed effects, we conducted likelihood-ratio tests96. To rule out collinearity, we checked variance inflation factors (VIF)97; VIF values were generally close to one (maximum VIF = 3.26). All models were stable.
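The logic of a full-null comparison by likelihood-ratio test can be sketched as follows: twice the difference in log-likelihood between the two nested models is compared against a chi-square distribution whose degrees of freedom equal the number of additional parameters in the full model. The sketch below is not the authors' R code; the function names and the log-likelihood values in the example are hypothetical, and the chi-square tail probability is computed with the standard library only.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df."""
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    half = x / 2.0
    # Closed forms for the two base cases (df = 1 and df = 2).
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5
    else:
        q, a = math.exp(-half), 1.0
    # Recurrence of the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_full: float, loglik_null: float,
                          df_diff: int) -> Tuple[float, float]:
    """Return (chi-square statistic, p value) for two nested models."""
    statistic = max(0.0, 2.0 * (loglik_full - loglik_null))
    return statistic, chi2_sf(statistic, df_diff)


if __name__ == "__main__":
    # Hypothetical log-likelihoods; the full model has 2 extra parameters.
    stat, p = likelihood_ratio_test(-100.0, -103.0, df_diff=2)
    print(round(stat, 3), round(p, 4))
```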

In both models, the dependent variable was the number of stimuli correctly recalled (initial and final). Moreover, in both models, we included three test predictors: branching direction (right or left), kind of stimuli (numerical, spatial and word), and stimuli position (initial or final), as well as their 2- and 3-way interactions. Main branching direction was determined based on (i) the SVO/SOV order, (ii) the presence of head nouns preceding/following (iii) genitive and (iv) relative clauses, and (v) separate adverbial subordinators at the beginning/end of subordinate clauses79. See Supplementary Information for more details.
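To make the structure of the test predictors explicit, the sketch below enumerates the terms implied by three predictors with all their 2- and 3-way interactions, together with the resulting design cells. The variable names are hypothetical labels, and joining names with a colon merely follows a common convention for writing interaction terms.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

# Test predictors and their levels, as described in the text.
TEST_PREDICTORS: Dict[str, List[str]] = {
    "branching": ["right", "left"],
    "stimuli": ["numerical", "spatial", "word"],
    "position": ["initial", "final"],
}


def interaction_terms(predictors: Dict[str, List[str]]) -> List[str]:
    """List main effects followed by all 2-way and higher interactions."""
    names = list(predictors)
    terms = []
    for order in range(1, len(names) + 1):
        for combo in combinations(names, order):
            terms.append(":".join(combo))
    return terms


def design_cells(predictors: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """All combinations of predictor levels."""
    return list(product(*predictors.values()))


if __name__ == "__main__":
    print(interaction_terms(TEST_PREDICTORS))
    print(len(design_cells(TEST_PREDICTORS)), "design cells")
```

With three predictors this yields three main effects, three 2-way interactions and one 3-way interaction, spread over 2 × 3 × 2 = 12 design cells.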

As control predictors we included (i) fixed effects known to potentially affect WM and/or STM, crucially including all possible random slopes, and (ii) random effects. In this way, we could (i) assess the effect of our test predictors after controlling for the effect of other potentially confounding variables, and (ii) account for the non-independence of data points. As fixed effect variables we included: participant’s sex (2 levels; e.g.98,99,100), participant’s age (from 14 to 43 years old; e.g.101,102,103), number of siblings (from 0 to 16; see104), residence (village/town or city, with the threshold set at 100,000 inhabitants, as living in cities may favour enhanced spatial memory), level of education (depending on the years spent at school/university; e.g.105,106), occupation (unemployed, working in the primary sector, in the secondary sector, in commerce or tourism, in other areas of the tertiary sector, students; e.g.107,108,109), centered income (as the deviation of each participant’s monthly income from the average national income; e.g.110,111), average national income (from 58 to 2588 €, as calculated by the International Labour Organization), knowledge of a language with the opposite branching (none, low to middle, middle to high, based on a simplified version of the Interagency Language Roundtable scale for Language Proficiency by the U.S. Department of State, as such knowledge could reduce the effect of the native branching), number of stimuli in each trial (from 2 to 9; see e.g.80), trial number within each task (from 1 to 21), and (only in the WM tasks) the percentage of correct choices in the distracting trials. Note that the inclusion of all these fixed effects makes our results especially robust, as the models assess the effect of the test predictors (which are defined a priori) independently of other potential confounding factors, which are also defined a priori.

As random effect variables, we included language, participant’s identity and trial identity (given that each trial was coded twice: the first half starting from the beginning, and the second half starting from the end), to account for the non-independence of data points.
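Two of the preprocessing steps described in this section, centering income on the national average and z-transforming numerical variables, can be sketched as follows. The sketch assumes that z-transformation uses the sample standard deviation (with n − 1 in the denominator), which the text does not specify; the function names and the income values in the example are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def center_income(monthly_income: float, national_average: float) -> float:
    """Deviation of a participant's monthly income from the national average."""
    return monthly_income - national_average


def z_transform(values: Sequence[float]) -> List[float]:
    """Scale values to mean 0 and standard deviation 1.

    ASSUMPTION: the sample standard deviation (n - 1 denominator) is used.
    """
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("cannot z-transform a constant variable")
    return [(v - m) / s for v in values]


if __name__ == "__main__":
    # Hypothetical monthly incomes (in euros) for three participants from a
    # country with an average national income of 58 euros.
    incomes = [40.0, 58.0, 100.0]
    centered = [center_income(x, 58.0) for x in incomes]
    print(centered)
    print(z_transform(centered))
```

Centering is applied per participant against that participant's own national average, whereas the z-transformation is applied to the whole column of a numerical predictor across the data set.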