Adult gender differences in science, technology, engineering, and math (STEM) career representation sometimes are thought to originate from inborn differences between the sexes in aptitude for STEM fields.1,2,3,4,5 Gender differences could be biological differences that are present at birth, or they might emerge over time with maturation.4 In this study, we focus on gender differences in early childhood. Although adult STEM talent is derived from a large suite of cognitive abilities and unlikely to be traceable to a single domain or skill, if intrinsic differences between the sexes are indeed a root cause for the under-representation of women in STEM, one expectation is that gender differences in quantitative cognition will emerge early in human development.

Understanding the nature of gender differences in mathematics has been a focus of research for many years. However, differences in measurements, analyses, and participant samples have led to a variety of findings. For one, differences can emerge in mean performance on mathematical tasks,6,7,8 and small differences in favor of boys have been reported in a range of numerical skills by the end of kindergarten.9 Although most studies of school-aged children that find gender differences report higher performance in boys, some studies have only found advantages for boys when tasks involve more reasoning or are more spatial in nature.2,10 In contrast, elementary school girls sometimes show an advantage on computational tasks and when performance is assessed using school grades.11 Other studies find no differences, trivial differences, or differences in older children, but not younger children.10,12,13,14 Group differences can sometimes be attributed to cohort effects. For instance, some studies show that differences between US and Chinese children in mathematics depend on generation or school,15,16 and a recent study showed that the strength of any advantage in mathematics for boys vs. girls varies by country.17 Gender differences may also emerge in the variability of mathematical performance across boys and girls. When these gender differences in cognition are observed, boys tend to show greater variability than girls, resulting in more boys than girls at the high-performing and low-performing ends of distributions.6,7,8,17,18 This may cause gender differences in mean performance to be absent at the group level12,14 but detectable at the high-performing and low-performing ends of the distributions.18

Another major obstacle in assessing such gender differences on school-based mathematics metrics is that sociocultural influences, such as stereotype threat and the influence of teachers and parents, make it difficult to tease apart gender differences in experience from differences in intrinsic abilities.19,20,21,22,23,24,25,26,27,28,29 For example, school-aged children could show gender differences in mathematics abilities because girls are given less or different exposure to mathematics than boys or are told that “math is not for girls.” Therefore, it is unclear whether differences in mathematics abilities are rooted in intrinsic differences in numerical reasoning in early childhood or whether gender differences emerge as a result of differences in cultural exposure to mathematical concepts. Understanding the sources of any gender differences is crucial for optimizing early childhood math and science curricula.

Previous research30 described evidence against the existence of gender differences in visuospatial reasoning in early childhood. Across six tasks, boys and girls performed similarly on measures of object tracking (the ability to follow multiple, independent moving objects), early numerical processing, and core geometric abilities (Fig. 1). Those data revealed no gender differences in some basic cognitive abilities of children aged 3–10 years. However, that research leaves open key areas for investigating gender differences in core numerical processing, including patterns of looking at quantitative information during infancy, early discrimination acuity during quantity processing, and formal mathematics learning.

Fig. 1 Previously described gender similarities. Redrawn data30 showing no gender differences in early childhood on measures of object processing, numerical processing, and geometric reasoning.

We examined children’s early mathematical cognition during infancy and early childhood to provide insight into whether gender differences are evident in early childhood. With the exception of the infant data, these data were collected as part of standard testing batteries measuring numerical processing skills. While we acknowledge that there are other ways to measure mathematical thinking in this age range, we combined published data31,32,33,34,35 with unpublished data from our longitudinal records that measured children’s performance in three key areas of numerical processing from our standard testing battery of early childhood numerical cognition. First, we assessed numerosity perception and acuity in infancy and childhood. Numerosity perception allows us to estimate the quantity of a set without knowing exactly how many items are in the set; we measured children’s acuity to detect differences in numerosity. Next, we examined two aspects of verbal counting acquisition during preschool, which is the earliest emerging exact understanding of quantities. Finally, we evaluated school-based mathematics during the first few years of schooling, when children learn to manipulate numbers. School-based mathematics refers to comprehensive, standardized testing of a variety of mathematical skills including counting proficiency, numeral knowledge, concrete set comparison and transformation, word problems with numerical comparisons and basic arithmetic transformations, and part-whole concepts. Because the tests are age-based, the tasks completed by each child varied. Most of these data are unpublished; together with the published data, they allowed us to examine gender differences in over 500 children.

We conducted several analyses to test for statistical differences and statistical equivalence in performance, the emergence or disappearance of differences with age, and statistical differences in variability between groups. Similarities and differences between boys’ and girls’ performance were assessed using independent-samples t tests to identify statistical differences in mean performance and Schuirmann’s two one-sided tests of equivalence36 to identify statistical equivalence in mean performance (similarity within ½ standard deviation (s.d.) of the group data; see ref. 37 for an implementation of this test with SAT-Math scores). Testing for both statistical differences and statistical equivalence is important. A non-significant t test only allows us to conclude that there is not enough evidence to reject the assumption that performance is equivalent between groups; it does not necessarily mean that the groups are statistically equivalent. By including tests of equivalence, we can determine whether the lack of a significant difference between groups reflects statistically equivalent distributions of scores between groups. To date, tests of equivalence have not been conducted on data on mathematical abilities in early childhood, but these tests are especially important for informing the “Gender Similarities Hypothesis.”38,39 To determine whether the results of the t test were consistent across age, we also conducted simultaneous linear regressions with age, gender, and their interaction entered as predictors. A main effect of gender would suggest that there is a difference between boys and girls when controlling for age, and an interaction would suggest that differences may emerge only at one end of the age range. In addition to assessing children’s mean performance, we determined whether boys and girls showed equal variance in performance using Levene’s test. Testing for equality of variance is particularly important in light of previous work suggesting that there are more high-performing and low-performing males than females because males show greater variability in measures of quantitative processing.4 For thoroughness, tests of statistical equivalence and differences in variability on scores controlled for age are reported in Supplement 1 (statistical differences in age-controlled scores should be evident in the regression analyses). Finally, for visualization purposes, we calculated growth curves at the group level following previous work.40 Because these curves were calculated at the group level, we do not statistically test for differences between boys’ and girls’ growth rates and simply provide these curves as a way to visualize changes in performance with age.
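To make the structure of the regression concrete, the following is a minimal sketch of a model with age, gender, and their interaction as predictors. It is not the analysis code used here: the data are hypothetical, the 0/1 gender coding is an assumption, and the function names are illustrative. It recovers only the coefficients (b); the t statistics and p values reported in the text would require standard errors, which are omitted for brevity.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_age_gender_model(ages, genders, scores):
    """Least-squares fit of score = b0 + b1*age + b2*gender + b3*age*gender."""
    rows = [[1.0, age, g, age * g] for age, g in zip(ages, genders)]
    k = 4
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: scores rise with age identically in both groups,
# so the gender and interaction coefficients should be near zero.
ages = [3.0, 4.0, 5.0, 6.0, 3.5, 4.5, 5.5, 6.5]
genders = [0, 0, 0, 0, 1, 1, 1, 1]  # arbitrary 0/1 coding (assumption)
scores = [5.0 + 10.0 * a for a in ages]

b0, b_age, b_gender, b_interaction = fit_age_gender_model(ages, genders, scores)
print(b0, b_age, b_gender, b_interaction)
```

In this framing, a non-zero gender coefficient corresponds to a difference when controlling for age, and a non-zero interaction coefficient corresponds to a gender difference that changes across the age range.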

Across all three aspects of early mathematical cognition assessed here, we would expect that if boys and girls truly differ in their capacities for numerical processing, we should find evidence of statistical differences in mean performance (independent-samples t tests), and we should see that this effect is consistent across age (main effect of gender in the linear regressions) or driven by one end of the age range (interaction between gender and age in the linear regression). However, the cross-sectional analysis indicates that there are no robust gender differences in early numerical processing including preverbal numerosity perception, counting acquisition, and school-based mathematics ability.

Core numerosity perception

Humans have the ability to perceptually estimate the numerical magnitude of a set of objects without counting. For example, without counting, people can rapidly determine that a set of 20 objects is numerically greater than a set of 10. Because numerosity representations are only noisy estimations of number, discrimination between quantities depends on the numerical ratio of the sets based on Weber’s law.41 For example, using estimation it is equally easy for people to choose the larger quantity of 10 vs. 5 as 20 vs. 10, because they have the same ratio (2:1 ratio)—quantities with finer ratios like 7 vs. 5 or 15 vs. 10 will be more difficult to discriminate. Research has shown that this ability to represent and discriminate numerosities emerges within the first year of life42,43,44,45 and that it is evident in nonhuman animals,46,47,48,49,50,51 suggesting an evolutionarily primitive origin. At 6 months, human infants can discriminate quantities that differ by a ratio of 2:1 (e.g., 16 vs. 8 dots),44,45 but by 9 months, infants can discriminate quantities at a 3:2 ratio.45 Numerosity representations become more refined with age such that 4-year-old children can discriminate at a 4:3 ratio and adults can discriminate at a ratio of 10:9.52 The visuospatial nature of numerosity perception makes it an important ability to investigate in children because gender differences in mathematics have sometimes been attributed to fundamental visuospatial skills, such as mental rotation.53 Moreover, because the acuity of these representations has been shown to relate to math ability54,55,56 (but note opposing views57), understanding whether there are gender differences in early numerical processing is essential to understanding the fundamental nature of gender differences in math achievement. Here we examined data from infants, preschool children, and early school-aged children.
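The ratio dependence described above can be illustrated with a simple psychophysical model. The sketch below assumes that each numerosity n is represented as a Gaussian with mean n and standard deviation w × n, where w is the Weber fraction; this is a commonly used formulation rather than one stated in the text, and the function name is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def predicted_accuracy(n1, n2, w):
    """Predicted proportion correct when choosing the larger of two sets.

    Assumes Gaussian numerosity representations with s.d. = w * n
    (a modeling assumption, not taken from the source).
    """
    difference = abs(n1 - n2)
    noise = w * (n1 ** 2 + n2 ** 2) ** 0.5
    return _STD_NORMAL.cdf(difference / noise)


w = 0.5  # illustrative Weber fraction
for pair in [(10, 5), (20, 10), (7, 5), (15, 10)]:
    print(pair, round(predicted_accuracy(*pair, w), 3))
```

Under this model, 10 vs. 5 and 20 vs. 10 yield the same predicted accuracy because only the ratio enters the expression, whereas 7 vs. 5 and 15 vs. 10 yield lower predicted accuracy.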

To test for gender differences in numerosity representations in infancy, we analyzed previously published data from 80 6-month-old infants35 (range = 5 months 13 days–6 months 17 days, 38 girls, 42 boys). The precision of infants’ numerosity representations was assessed using a preferential looking paradigm in which infants were presented with two image streams: one in which numerosities alternated between images and one in which numerosity was constant (see Fig. 2a). Infants preferred to look to the numerically alternating image stream if numerosities differed by at least a 2:1 ratio, and there were individual differences in infants’ preferences.35 An independent-samples t test and Schuirmann’s test of equivalence revealed no gender differences (t test: t(78) = 0.14, p = 0.89, difference = 0.41%, 95% confidence interval (CI) = −6 to 7; equivalence test: t1(78) = 2.36, p = 0.01; t2(78) = −2.08, p = 0.026), with boys showing a mean preference for the numerically alternating image stream of 7.5% (±14.6) and girls showing a mean preference of 7.09% (±14.3). Levene’s test of Equality of Variances revealed no significant differences in variance between girls and boys (F(1, 78) < 0.01, p = 0.94, boys’ s.d. = 14.65, girls’ s.d. = 14.27). This is consistent with the previous work showing no overall differences between boys’ and girls’ sensitivity to numerosity in infancy.58,59,60
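As a rough check, the equivalence statistics above can be approximated from the reported summary statistics alone. The sketch below assumes that the equivalence bounds are ±½ of the pooled s.d. and that a pooled-variance t statistic is used; neither detail is spelled out in the text, and the function names are hypothetical. Because the reported means and s.d.s are rounded, the output should only be expected to land near the published values.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, by Simpson integration of the density."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 - area if t >= 0 else 0.5 + area


def tost_from_summary(mean1, sd1, n1, mean2, sd2, n2, margin_in_sd=0.5):
    """Two one-sided tests with bounds of +/- margin_in_sd pooled s.d."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    bound = margin_in_sd * pooled_sd
    t_lower = (diff + bound) / se  # tests H0: diff <= -bound
    t_upper = (diff - bound) / se  # tests H0: diff >= +bound
    p_lower = t_upper_tail(t_lower, df)      # P(T > t_lower)
    p_upper = 1 - t_upper_tail(t_upper, df)  # P(T < t_upper)
    return t_lower, p_lower, t_upper, p_upper


# Infant looking-time preference (%), taken from the summary statistics above.
t_lower, p_lower, t_upper, p_upper = tost_from_summary(
    7.5, 14.65, 42,   # boys: mean, s.d., n
    7.09, 14.27, 38,  # girls: mean, s.d., n
)
print(t_lower, p_lower, t_upper, p_upper)
```

Equivalence is concluded only when both one-sided tests are significant, which is why two t statistics are reported for each comparison.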

Fig. 2 Infant numerosity. a Depiction of numerosity change detection task. b Average percentage of time looking at the numerically changing image stream for girls (red) and boys (blue). Error bars represent standard error of the mean. c Density distributions for percentage of girls (red) and boys (blue) at a given % looking time preference.

We also tested for gender differences in numerosity perception in the earliest years of formal education. Two hundred forty-one scores were collected from 3- to 7-year-old children (mean age = 5.48 years, 125 girls, 116 boys; data from 68 children have been previously reported31,32). All children completed a computerized numerical comparison task. In this task, children were shown two side-by-side dot arrays and were asked to choose the side that had more dots. The numerical ratio between dot arrays varied between 4:1 and 10:9. This type of numerical discrimination task permits a psychophysical evaluation of numerosity representation and is consistent with previous literature using this task in adults and children.52,61,62,63,64,65 Furthermore, performance on this type of task has been shown to be similar to neural measures of numerosity encoding,31,63 indicating that this is a fundamental aspect of numerical cognition. Although previous work found that women and girls performed better than men and boys,52 sample sizes were small (n = 16 per age group), so it is unclear whether these differences are representative of the general population.

To assess the acuity of boys’ and girls’ numerosity representations, Weber fractions (w) were calculated for each child.66 The w score represents the acuity of numerosity representations such that a smaller w indicates greater acuity. An independent-samples t test and Schuirmann’s equivalence test revealed that boys and girls showed equal acuity of numerosity representations in early childhood (Fig. 3; t test: t(239) = 0.23, p = 0.82, boys’ mean = 0.56, girls’ mean = 0.58, difference = 0.02, 95% CI = −0.12 to 0.15; equivalence test: t1(239) = 4.10, p = 0.00003; t2(239) = −3.64, p = 0.0002). A simultaneous regression further revealed that while acuity improves with age, the effect of age on acuity of numerosity representations does not differ between boys and girls (F(3, 237) = 32.03, p < 0.0001, R² = 0.29; Gender: b = 0.10, t(237) = 0.33, p = 0.74; Age: b = −0.27, t(237) = −7.33, p < 0.0001; Age × Gender: b = 0.02, t(237) = 0.30, p = 0.77). Finally, Levene’s test of Equality of Variances did not reveal a difference in variance between boys and girls (F(1, 239) = 0.09, p = 0.76; boys’ s.d. = 0.57, girls’ s.d. = 0.49).
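For readers unfamiliar with Weber fractions, the sketch below shows one way a w value could be estimated from a child’s trial-by-trial choices. The fitting procedure actually used follows ref. 66 and is not described here, so everything in this example is an assumption: the Gaussian model of numerosity representations, the grid-search maximum-likelihood fit, the function names, and the trial counts, which are hypothetical values chosen to be roughly what the model predicts for w = 0.5.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def p_correct(n1, n2, w):
    """Model-predicted accuracy, assuming Gaussian noise with s.d. = w * n."""
    return _STD_NORMAL.cdf(abs(n1 - n2) / (w * math.hypot(n1, n2)))


def fit_weber_fraction(trials, grid=None):
    """Return the w on the grid that maximizes the likelihood of the trials.

    trials: iterable of (n1, n2, correct) tuples, with correct a bool.
    """
    trials = list(trials)
    if grid is None:
        grid = [i / 100 for i in range(5, 301)]  # candidate w from 0.05 to 3.00

    def log_likelihood(w):
        total = 0.0
        for n1, n2, correct in trials:
            # Clamp to avoid log(0) when the model predicts certainty.
            p = min(max(p_correct(n1, n2, w), 1e-9), 1 - 1e-9)
            total += math.log(p if correct else 1 - p)
        return total

    return max(grid, key=log_likelihood)


# Hypothetical child: number of correct choices out of 100 trials per pair.
correct_counts = {(16, 4): 93, (12, 6): 81, (12, 8): 71, (12, 9): 66, (20, 18): 56}
trials = []
for (n1, n2), n_correct in correct_counts.items():
    trials += [(n1, n2, True)] * n_correct + [(n1, n2, False)] * (100 - n_correct)

w_hat = fit_weber_fraction(trials)
print(w_hat)
```

A smaller fitted w corresponds to a steeper rise in accuracy as the ratio between the two arrays increases, that is, to greater acuity.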

Fig. 3 Early childhood numerosity. a Average Weber fraction for girls (red) and boys (blue). Error bars represent standard error of the mean. b Growth curves for Weber fractions calculated across girls (red), boys (blue), and all children (black). Lightly shaded areas around girls’ and boys’ growth curves indicate 1 standard deviation above and below the mean growth curve. c Density distributions for percentage of boys (blue) and girls (red) at a given Weber fraction.

Taken together, we find that from infancy into early childhood, boys and girls do not differ in their earliest numerosity perceptions. Boys and girls are equally capable of discriminating numerosities.

Culturally trained counting

Verbal counting is the first culturally trained symbolic mathematics concept to develop in children. Knowledge of the verbal counting routine emerges gradually between the ages of 2 and 5 years. First, children learn to rote recite the count list (2–2.5 years). Over the next 6 to 12 months, children begin to acquire the meanings of the number words one at a time: they learn that the number word “one” corresponds to exactly one item, then that the word “two” corresponds to exactly two items, then “three,” and finally “four.” Around 3.5 years, children seemingly suddenly become cardinal-principle knowers in that they learn that each number word refers to a specific quantity and that a number word can be used to label the size of a set as determined by counting.67,68,69,70,71 We tested children’s knowledge of the rote-memorized counting sequence with the “How High?” task, and we tested their cardinal knowledge of number and counting principle knowledge with the “Give-N” task.69,70 Although there are other ways to assess counting skills and knowledge of the cardinal principles,9,69,70,71,72 these tasks are commonly used and standardized across the literature. These two measures of culturally trained counting allowed us to determine whether boys or girls show a general advantage for early number word learning or whether there are different patterns of gender differences in memorizing the counting sequence (“How High” task) vs. learning the meanings of number words (“Give-N” task). A general advantage for early number word learning would be supported by differences in favor of one gender on both measures of early number word knowledge. An advantage on only one test would suggest the advantage is isolated to a specific skill.

For the “How High?” task, children were asked to count as high as they could until they reached 100. One hundred forty-three children aged 2–5.5 years old were tested (mean age = 4.10 years, 71 girls, 72 boys). An independent-samples t test revealed that boys and girls did not show a difference in their ability to memorize the verbal counting sequence (t(141) = 1.48, p = 0.14, boys’ mean = 30, girls’ mean = 23, difference = 7, 95% CI = −2 to 16), and Schuirmann’s tests of equivalence found marginal statistical equivalence (t1(141) = 4.48, p = 0.00001; t2(141) = −1.52, p = 0.06). A simultaneous regression confirmed that although children’s ability to recite the count list improves across age, differences do not emerge when controlling for age or at one end of the age range (Fig. 4a for scatterplot of data by age; F(3, 139) = 20.96, p < 0.0001, R² = 0.31; Gender: b = 12.75, t(139) = 0.58, p = 0.56; Age: b = 18.26, t(139) = 5.10, p < 0.0001; Age × Gender: b = 4.20, t(139) = 0.80, p = 0.43). Furthermore, Levene’s test of Equality of Variances revealed no difference in variability (F(1, 141) = 1.41, p = 0.24; boys’ s.d. = 30, girls’ s.d. = 25). Taken together, this suggests that from 2 to 5.5 years of age, boys and girls show equal proficiency in memorizing and reciting the count list.
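Levene’s test, used above to compare variability, is a one-way ANOVA on each score’s absolute deviation from its own group center. The sketch below computes the test statistic for two groups using the group mean as the center; whether the mean or the median was used in the analyses here is not stated, so that choice is an assumption, as are the function name and the data. With two groups, the statistic is referred to an F(1, N − 2) distribution; the p value is omitted to keep the example short.

```python
from statistics import mean


def levene_statistic(group_a, group_b):
    """Levene's W for two groups, centering each group on its own mean."""
    groups = [list(group_a), list(group_b)]
    # Absolute deviation of every score from its group's mean.
    deviations = []
    for g in groups:
        center = mean(g)
        deviations.append([abs(x - center) for x in g])

    n_total = sum(len(z) for z in deviations)
    k = len(deviations)
    group_means = [mean(z) for z in deviations]
    grand_mean = mean(v for z in deviations for v in z)

    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(deviations, group_means))
    within = sum((v - m) ** 2
                 for z, m in zip(deviations, group_means) for v in z)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical scores: same spread in both groups, then a wider second group.
same_spread = levene_statistic([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
wider_group = levene_statistic([1, 2, 3, 4, 5], [2, 6, 10, 14, 18])
print(same_spread, wider_group)
```

A difference in group means alone leaves the statistic at zero, as in the first comparison, because only the spread around each group’s center enters the calculation.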

Fig. 4 Early childhood counting. Growth curves for performance on the a “How High?” task and b “Give-N” task. Growth curves are calculated across girls (red), boys (blue), and all children (black). Lightly shaded areas around boys’ and girls’ growth curves indicate 1 standard deviation above and below the mean growth curve.

Performance on the “How High?” task only represents verbal learning of the culturally trained, rote-memorized list of count terms and is not an index of children’s quantitative or logical reasoning during counting. To test children’s understanding of the counting procedure, we tested children on the “Give-N” task. In the “Give-N” task,69,70 children were asked to count in order to produce sets of 1 to 10 objects. One hundred and twenty-three children aged 2.98–5.47 years completed the task (mean age = 3.87 years, 65 girls, 58 boys). Children were scored by the highest set size that they could correctly produce. An independent-samples t test revealed no statistical difference between boys and girls, but Schuirmann’s tests of equivalence failed to find statistical equivalence (t test: t(121) = 1.67, p = 0.097, boys’ mean = 6.38, girls’ mean = 5.26, difference = 1.12, 95% CI = −0.2 to 2.44; equivalence tests: t1(121) = 4.46, p = 0.00001; t2(121) = −1.12, p = 0.13). The simultaneous regression revealed a main effect of age, but no effect of gender or interaction between gender and age (Fig. 4b for scatterplot of data by age; F(3, 119) = 31.63, p < 0.0001, R² = 0.44; Gender: b = 1.67, t(119) = 0.57, p = 0.57; Age: b = 3.58, t(119) = 7.45, p < 0.0001; Age × Gender: b = 0.23, t(119) = 0.30, p = 0.76). In addition, we did not detect differences in variance between boys and girls (F(1, 121) = 0, p = 0.99; boys’ s.d. = 3.61, girls’ s.d. = 3.78). Overall, there are no strong differences between boys and girls in their ability to use counting to produce sets.

Thus, boys and girls do not significantly differ in their cardinal and logical knowledge of the counting sequence during early childhood. The lack of a difference between boys and girls is consistent with the findings depicted in Fig. 1 that tested 194 3-year-old children on similar counting tasks.30

In sum, we find that boys and girls show equal proficiency in memorizing and reciting the count list, and comparable abilities to learn the logic of the counting sequence. We conclude that there is no true gender difference in children’s early counting.

Formal and informal early elementary mathematics

Children begin to learn school-based numerical and mathematical concepts shortly after acquiring the counting principles. To test for early gender differences in the foundations of school-based mathematical concepts, we administered the Test of Early Mathematics Ability Third Edition (TEMA-3; ref. 73) to 275 children aged 3.07–7.92 years (mean age = 5.45 years, 133 boys, 142 girls; data from 77 children have been previously reported32,33,34). The TEMA-3 is a comprehensive test of school-based mathematical knowledge for children aged 3–9 years. Items are categorized as “formal” and “informal”: Formal items tap into knowledge that is formally taught such as numeral names, numeral writing, and arithmetic facts. Informal items tap into children’s abilities to count and reason about quantitative relations and transformations that draw on acquired knowledge but are not explicitly trained or memorized. Although some test items overlap with the skills measured in the previous section on verbal counting acquisition, the TEMA-3 represents math achievement at a broader level. Importantly, the achievement scores that result from the TEMA-3 reflect knowledge on a wide range of mathematical skills including, but not limited to, counting ability. We compared boys’ and girls’ performance on the TEMA-3 overall and on items tapping into formal vs. informal math achievement separately.

Boys and girls did not differ in overall math achievement, suggesting that children show equal understanding of math concepts in early childhood (Fig. 5; t test: t(273) = 1.11, p = 0.27, boys’ mean = 32.32, girls’ mean = 30.04, difference = 2.28, 95% CI = −1.76 to 6.31; equivalence test: t1(273) = 5.25, p < 0.001; t2(273) = −3.04, p = 0.001; test of equality of variances: F(1, 273) = 0.002, p = 0.99; boys’ s.d. = 16.96, girls’ s.d. = 17.02). This pattern was consistent across age, suggesting that during early childhood boys and girls show equal competency for math concepts (regression: F(3, 271) = 224.3, p < 0.00001, R² = 0.71; Gender: b = 3.81, t(271) = 0.70, p = 0.49; Age: b = 12.72, t(271) = 19.00, p < 0.0001; Gender × Age: b = 0.19, t(271) = 0.19, p = 0.85).

Fig. 5 Early childhood mathematics. a Average raw TEMA score for girls (red) and boys (blue). Error bars represent standard error of the mean. b Growth curves for performance on TEMA calculated across girls (red), boys (blue), and all children (black). Lightly shaded areas around boys’ and girls’ growth curves indicate 1 standard deviation above and below the mean growth curve. c Density distributions for percentage of girls (red) and boys (blue) at a given raw TEMA score.

To look at differences in boys’ and girls’ performance by question type, we compared formal vs. informal math scores. We conducted a 2 (Formal/Informal) × 2 (Boys/Girls) repeated-measures analysis of variance (ANOVA) on a subset of the data from children who answered at least four formal questions and at least four informal questions (for a similar approach, see ref. 74). We found no interaction between gender and question type, nor did we find a main effect of gender (Fig. 6; Gender: F(1, 207) = 0.56, p = 0.46; Question Type: F(1, 207) = 235.98, p < 0.0001; Gender × Question Type: F(1, 207) = 0.30, p = 0.58). Furthermore, we found statistical equivalence between boys’ scores and girls’ scores and no differences in variances for both formal and informal questions (Formal Questions: equivalence tests: t1(207) = 3.77, p = 0.0001; t2(207) = −3.44, p = 0.0003, boys’ mean = 0.46, girls’ mean = 0.46; variance test: F(1, 207) = 0.24, p = 0.63, boys’ s.d. = 0.17, girls’ s.d. = 0.17; Informal Questions: equivalence tests: t1(207) = 4.57, p < 0.00001; t2(207) = −2.65, p = 0.004, boys’ mean = 0.71, girls’ mean = 0.69; variance test: F(1, 207) = 0.32, p = 0.57; boys’ s.d. = 0.15, girls’ s.d. = 0.16). In addition, differences did not emerge for either question type when controlling for age or testing for interactions between gender and age (Formal Questions: F(3, 205) = 8.25, p = 0.00003, R² = 0.12; Gender: b = 0.05, t(205) = 0.14, p = 0.74; Age: b = 0.06, t(205) = 3.70, p = 0.0003; Gender × Age: b = −0.01, t(205) = −0.25, p = 0.81; Informal Questions: F(3, 205) = 19.62, p < 0.0001, R² = 0.22; Gender: b = 0.16, t(205) = 1.29, p = 0.20; Age: b = 0.09, t(205) = 6.09, p < 0.00001; Gender × Age: b = −0.02, t(205) = −1.05, p = 0.29).

Fig. 6 Early childhood formal and informal mathematics. a Average proportion correct (mean score) for administered TEMA informal and formal questions for girls (red) and boys (blue). Error bars represent standard error of the mean. b Proportion correct (mean score) for administered Informal and Formal Questions plotted by age. Lightly shaded areas around the regression lines indicate the 95% confidence interval.

In sum, we did not find any robust performance differences in early childhood math ability between boys and girls. Differences did not emerge with age or by question type. This suggests that boys and girls show equal competency forming mathematics concepts in early childhood.