Effect sizes at the extreme tails of the score distribution

Figure 1 shows the gender gaps in reading, mathematics and science, by education level. Detailed results (by survey) are presented in Table 1. In reading, the effect size on the mean is 0.23 at the primary level and much larger at the secondary level (0.40), both effects being in favour of females. In mathematics and science, the effect sizes on the means are much smaller in magnitude and are in favour of males. In mathematics, the effect size is −0.04 in the international studies of primary education and −0.07 in the studies of secondary education. In science, the effect size is −0.05 for the youngest students and −0.07 for the oldest.

Fig. 1 Gender effect sizes in mean and extreme tails of the distribution in reading, mathematics and science. Gender effect sizes at percentiles 5, 10, 90, 95 and on the mean scores, by content area and education level. Positive values indicate higher scores for females; negative values indicate higher scores for males. Data: IEA (TIMSS/PIRLS) and OECD (PISA) surveys, 1995–2012

Table 1 Gender effect sizes in mean and extreme tails of the distribution in reading, mathematics and science, by survey

The effect sizes computed at the extreme tails of the distribution show that the size of the gender gap also varies according to the proficiency level of the students. In reading, although the gender differences were fairly large along the entire proficiency distribution, they were particularly large at the lower tail, where effect sizes were sometimes about twice as large as at the upper tail. The largest gap in reading was observed in PISA 2012 (0.58) for the weakest students (percentile 5). The phenomenon was more pronounced in PISA and/or at the secondary level of education; the available data did not make it possible to distinguish the effect of educational level from the survey effect, because at the primary level only IEA data were available, while at the secondary level only PISA data were available.
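As an illustration of how effect sizes on the mean and at given percentiles can be obtained, the following minimal Python sketch assumes that each effect size is the female value minus the male value, divided by the pooled standard deviation. This matches the sign convention of Fig. 1 but is an assumption, not necessarily the exact procedure applied to the survey data. The sketch ignores sampling weights and plausible values, and all function names are hypothetical.

```python
import math
import statistics


def percentile(scores, p):
    """Percentile p (0..100) of a list of scores, by linear interpolation."""
    ordered = sorted(scores)
    rank = (len(ordered) - 1) * p / 100
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def pooled_sd(female, male):
    """Pooled standard deviation of the two groups (unweighted; an assumption)."""
    nf, nm = len(female), len(male)
    vf = statistics.variance(female)
    vm = statistics.variance(male)
    return math.sqrt(((nf - 1) * vf + (nm - 1) * vm) / (nf + nm - 2))


def effect_size_mean(female, male):
    """Standardised mean difference; positive values favour females."""
    diff = statistics.mean(female) - statistics.mean(male)
    return diff / pooled_sd(female, male)


def effect_size_percentile(female, male, p):
    """Standardised difference between the two groups' p-th percentiles."""
    diff = percentile(female, p) - percentile(male, p)
    return diff / pooled_sd(female, male)
```

With toy scores such as female = [2, 3, 4] and male = [1, 3, 5], the effect size on the mean is zero, while the effect size is positive at percentile 5 and negative at percentile 95. Equal means can therefore coexist with opposite gaps at the two tails when one group is more dispersed.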

In mathematics, effect sizes were smaller than in reading, but again the size of the effect varied according to the proficiency level of the students. At the lower tail of the distribution, effect sizes were close to zero or in favour of females, whereas at the upper tail males were systematically more proficient. The largest gap in mathematics was observed in PISA 2003 (−0.24) for the most proficient students (percentile 95). This tendency was more pronounced at the secondary level of education and in the PISA surveys. In the IEA Population II studies, the tendency for males to outperform females at the upper end of the distribution decreased over time.

Science results appear similar to mathematics results in Fig. 1, but looking at the data by survey (Table 1) reveals a somewhat more complex situation. At the primary level, a slight tendency for females to be more proficient at the lower tail and for males to be more proficient at the upper tail was observed. At the secondary level, the gender difference in favour of males observed up to TIMSS 1999, at both the lower and the upper tail of the distribution, has changed since the year 2000 in both the IEA and PISA surveys: at the lower tail, females tend to perform somewhat better than males, and the male advantage at the upper tail tends to fade over time.

Gender differences in variability

Table 2 focuses on gender variability. Four categories are presented: (1) the proportion of countries where the gender variance ratio is significantly greater than 1 (i.e. males’ variance is significantly greater than females’ variance); (2) the proportion of countries where the gender variance ratio is greater than 1, but not significantly; (3) the proportion of countries where the gender variance ratio is lower than 1, but not significantly; (4) the proportion of countries where the gender variance ratio is significantly lower than 1 (i.e. females’ variance is significantly greater than males’ variance). For each study, the mean of the country variance ratios and its standard error are also provided.
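The four categories can be illustrated with a short Python sketch. It assumes that the variance ratio is the male variance divided by the female variance, and that significance is judged by comparing (ratio − 1) / standard error with a critical value of 1.96. The test actually applied to the survey data may differ, and the standard error is taken here as given (in such surveys it would typically come from a replication method). Function names are hypothetical.

```python
import statistics


def variance_ratio(male, female):
    """Male variance divided by female variance (unweighted; an assumption)."""
    return statistics.variance(male) / statistics.variance(female)


def classify_ratio(ratio, standard_error, critical=1.96):
    """Assign a variance ratio to one of the four categories of Table 2.

    Assumes a simple z-type test of the ratio against 1; a ratio exactly
    equal to 1 falls in the 'lower than 1, not significant' category here.
    """
    z = (ratio - 1.0) / standard_error
    if ratio > 1.0:
        if z > critical:
            return "significantly greater than 1"
        return "greater than 1, not significant"
    if z < -critical:
        return "significantly lower than 1"
    return "lower than 1, not significant"
```

For example, with a hypothetical standard error of 0.05, a ratio of 1.14 would be classified as significantly greater than 1, whereas a ratio of 1.05 would be greater than 1 but not significantly so.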

Table 2 Gender differences in variance ratios in reading, mathematics, and science

In 93 % of the 1654 cases, variance ratios are greater than one, which means that males’ variance is larger than females’: males’ results are more dispersed than females’ results. The difference is statistically significant in 48 % of cases. This pattern is found whatever the content area (reading, mathematics and science), the educational level (primary/secondary), the year of the survey, or the study sample design (grade-/age-based samples). In only two of the 1654 cases is the variance of the female population significantly higher than the variance of the male population. The variance ratios are lower than one but not significantly in 7 % of cases, which means that on those few occasions, female variance is larger in absolute terms than male variance.

As can be seen in Table 2, the general pattern of greater variance for males changes, sometimes substantially, according to the domain, the educational level, or the organisation (IEA or OECD). For instance, in reading, male variance is significantly greater than female variance in 238 (or 58 %) of the 410 cases. This proportion is larger than that observed for science (49 %) and for mathematics (42 %). At the secondary level, males’ variance is significantly larger than females’ in 54 % of the cases, compared with 30 % at the primary level. In terms of the agencies organising the surveys, the PISA surveys present many more variance ratios significantly greater than 1 (64 %) than the IEA surveys (33 %).

The high proportion of variance ratios greater than 1 does not indicate how much larger the variance of the male subpopulation is compared to the variance of the female subpopulation. On average, for all studies and countries, the variance ratio is 1.14. This means that on average, male variance is 14 % higher than female variance. Mean variance ratios range from 1.08, for PIRLS 2006, to 1.22, for PISA 2012, both in reading.
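The interpretation of a variance ratio of 1.14 can be written out as a short worked calculation. The symbols \sigma^2_M and \sigma^2_F (male and female score variances) are notation introduced here for illustration only.

```latex
\[
VR = \frac{\sigma^2_M}{\sigma^2_F} = 1.14
\quad\Longrightarrow\quad
\sigma^2_M = 1.14\,\sigma^2_F ,
\qquad
\frac{\sigma_M}{\sigma_F} = \sqrt{1.14} \approx 1.07 .
\]
```

Expressed in standard deviations rather than variances, a ratio of 1.14 therefore corresponds to male scores being about 7 % more dispersed than female scores.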

The variance ratios do not change much according to the content assessed, nor according to the agency organising the survey. There is almost no difference between educational levels in science and mathematics. In reading, however, the mean variance ratio at the primary level is 1.09 and increases to 1.19 at the secondary level.

Looking at the year of the surveys, no clear trend appears, either by content area or organisation.

One main finding emerges from this analysis: there are almost no exceptions to the higher male variance. The differences between content areas, educational levels, organisations, and surveys are quite slight, except for the difference between PISA and PIRLS in reading. The smallest variance ratio is found in primary reading, computed on PIRLS data, while the largest is found in secondary reading, computed on PISA data. This result might suggest that in reading the gender difference in variability increases with student age.

Nowell and Hedges (1998) found a strong correlation (0.74) between the variance ratio and the effect size of the mean gender difference. We also computed this correlation. With the data used in this study, the correlation is 0.42. It is worth noting that the correlation ranges from 0.50 (between the variance ratio and the effect size at percentile 5) to 0.31 (between the variance ratio and the effect size at percentile 95). This indicates that the more males’ scores vary compared to females’ scores, the larger the difference between males and females at the lower end of the distribution.
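A correlation of this kind can be sketched as a Pearson coefficient computed across cases, pairing each case's variance ratio with its effect size. The code below is a minimal, unweighted illustration. The function name is hypothetical, and the text does not specify which correlation coefficient or weighting was used, so Pearson's r is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Calling pearson(variance_ratios, effect_sizes_p5), where both arguments are hypothetical lists holding one value per case, would give the coefficient between the variance ratio and the effect size at percentile 5. The same call with the effect sizes on the mean or at percentile 95 gives the other coefficients.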