In a January 2005 speech, Harvard President Lawrence Summers provoked the proverbial firestorm by suggesting that women lacked the ‘intrinsic aptitude’ for math, science and engineering (story in the Boston Globe on the incident). Summers was merely stating out loud what many people believe: that inherent differences between men and women cause significant inequalities in aptitude for math (and presumably also for art history, Coptic studies, or cultural anthropology, but those usually get a lot less attention…).

A recent report in Science by Janet S. Hyde and colleagues, ‘Gender Similarities Characterize Math Performance,’ used a mass of standardized testing data generated under the No Child Left Behind program to compare male and female performance and found that the scores were more similar than different. The gap in average performance on math tests has shrunk significantly since the 1970s, disappearing in most states and grades for which the research team could get good data. According to Marcia C. Linn of the University of California, Berkeley, one of the co-authors of the study: ‘Now that enrollment in advanced math courses is equalized, we don’t see gender differences in test performance. But people are surprised by these findings, which suggests to me that the stereotypes are still there.’

From the way that this report has been discussed, it seems clear that the data has not settled this question in many people’s minds. Tamar Lewin of The New York Times covered the story (‘Math Scores Show No Gap for Girls, Study Finds’), provoking comments on a wide range of websites, including some who insisted that the team led by Hyde entirely missed the point Summers was making, or that Lewin had misread the study (some accusing her of feminist bias). In contrast, Keith J. Winstein of The Wall Street Journal focused not on the average scores but on the results at the top end of the bell curve, in ‘Boys’ Math Scores Hit Highs and Lows,’ which highlights the discussion of variance in boys’ scores.

Although I briefly want to go over the study and the way it’s being interpreted, I’m more interested in the shift in test scores over time, because I think that the movements in these numbers, including gaps that disappear over time (or don’t), point to a basic problem in the tests themselves. Well, not a problem in the tests—they’re very sophisticated instruments for assessing certain kinds of performance on selected tasks—but rather with the common assumption about what these tests actually reveal and the nature of ‘math ability.’ For me, this larger point is more important for neuroanthropology because it applies to far more than just the ‘math gap.’



The gap in test scores: sample bias?

The gap between boys and girls in standardized math tests in the 1970s and 1980s seemed to open wider as children grew older. From a statistical dead heat early in grade school, a pronounced inequality developed by high school that only grew worse at successive stages, up to a near total male dominance of PhDs, math contests, and high status faculty positions in fields like engineering, physics, and mathematics. A number of people point to the results of things like mathematics olympiads or major prizes for theoretical physics to show that, at the upper end of ability, male dominance is complete.

Although some explained the ‘math gap’ as the result of brain differences between boys and girls (such as differences in spatial sense or abstract reasoning) that affected math ability, others insisted that the inequality was caused by social forces, stereotypes, or other factors. One explanation was that the ‘math gap’ was a self-fulfilling prejudice; the assumption that boys would do better both encouraged boys’ performance and discouraged girls from developing ability in mathematics. Another argument was that girls were discouraged from publicly demonstrating academic gifts in general because they brought stigma (as a geeky young man, I find it hard to believe that my female peers were more severely stereotyped, but that’s a different story). Some critics pointed to the way that changing patterns of enrollment in advanced math classes seemed to be steadily eating away at the ‘math gap’; they felt that if girls pursued the same educational opportunities, they would have similar results.

But another explanation for the ‘math gap’ in performance was a sample bias problem that Hyde and colleagues sought to address in this study. The research team pointed out that, traditionally, only college-bound students took the SAT (and ACT), and more girls than boys took the test (100,000 or so more every year for the SAT). The lower average for girls’ test scores might have arisen from the fact that the larger number of girls taking the tests meant that their scores reflected a deeper dip into the talent pool, with a larger percentage of all female students being scored. The additional students — perhaps not the most intellectually gifted girls — were pulling down the girls’ average score.

Changing policies for administering standardized tests, especially making them mandatory, might provide a better sample, less prone to the bias of one group participating at a higher frequency. For example, in 2002, Colorado and Illinois mandated that every graduating high school senior take the ACT; the gender gap disappeared when girls were no longer over-represented among test-takers. In fact, girls demonstrated a slightly higher average math score than boys. As the research team writes: ‘These findings support the conclusion that the male advantage on the SAT mathematics test is largely an artifact of sampling.’ In other words, boys scored better because fewer took the test.
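This selection effect is easy to demonstrate with a toy simulation (a sketch of the general mechanism, not the team’s actual analysis — the participation thresholds below are invented for illustration): give boys and girls identical ability distributions, but let the girls’ pool of test-takers reach deeper into the population.

```python
import random

random.seed(42)
N = 200_000  # students per gender; both groups draw ability from the same N(0, 1)

boys = [random.gauss(0, 1) for _ in range(N)]
girls = [random.gauss(0, 1) for _ in range(N)]

# Hypothetical participation rule: only students above an ability threshold take
# the (optional) test, and the girls' threshold is lower, so more girls sit it.
boy_takers = [a for a in boys if a > 0.5]    # roughly the top ~31%
girl_takers = [a for a in girls if a > 0.2]  # roughly the top ~42%

def mean(xs):
    return sum(xs) / len(xs)

# Identical populations, yet the more deeply sampled group posts the lower average.
print(len(girl_takers) > len(boy_takers))    # → True (more girls take the test)
print(mean(girl_takers) < mean(boy_takers))  # → True (and their average is lower)
```

Nothing about ability differs between the two groups here; the gap in observed averages is produced entirely by who shows up to be measured.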

Research results from the Hyde team on US scores

The No Child Left Behind program forced states to administer standardized tests broadly. The team led by Hyde, drawn from faculty at the University of Wisconsin and the University of California, Berkeley, contacted all 50 states to try to get access to these scores, but only 10 gave them enough information to make their samples useful. Based on test scores for 7 million students, the research team found no difference in the average math scores; this seemed to be the culmination of a trend of girls steadily gaining ground in math scores, first among younger age groups and recently through adolescence. The following chart shows the gaps at different grade levels for the ten states:



This chart originally appeared in Science magazine with the article. From the UC Berkeley News website, we have the following explanation:

Each square represents a grade level in one of 10 U.S. states. At the center of the chart (the 0 mark), the two genders performed equally in math, with increasing differences between boys and girls toward the left (where girls outperformed boys) and right (where boys outperformed girls). When researchers averaged the results, they found no difference between the two genders in their math proficiency. The 10 states were New Mexico (olive), Kentucky (fuchsia), Wyoming (tan), Minnesota (blue), Missouri (red), West Virginia (lavender), Connecticut (green), California (yellow), Indiana (aqua), and New Jersey (purple).

Hyde’s earlier work (e.g., Hyde 2005) also tested the ‘innate sex differences in math ability’ hypothesis, but a meta-analysis of extant studies is just never going to get the traction that 7 million test scores will. The current study seems to show convincingly that there is no inherent difference between boys and girls on average in mathematics proficiency. This hardly demonstrates that boys and girls have identical brains or thought processes, nor does it really refute the kind of argument that Harvard’s president Summers was making, but it does seem to suggest that the stereotype that girls can’t do maths is inaccurate as a generalization (not just in a few exceptional cases).

Math and reading gaps and gender equality globally

In fact, the story of the change in the ‘math gap’ in the United States seems to mirror a pattern that is also seen across cultures: changing status of women seems to correlate pretty strongly with the math gap. When women are treated more equally, it shows up in girls’ math scores. Studies of students in different societies show that the difference between boys’ and girls’ averages is not constant, and can be reversed. The Economist recently ran a story on gender achievement gaps in different places based on another Science report by Luigi Guiso of the European University Institute in Florence and colleagues: Diversity: Culture, Gender, and Math.

Guiso’s team used results from the Programme for International Student Assessment (PISA) run by the Organization for Economic Cooperation and Development (OECD). A total of more than 275,000 15-year-olds took the PISA exam in 40 countries. The gap between girls and boys on math was, on average, 2 percent, although the results varied, and the size of the gap correlated with measures of gender inequality. In addition, girls scored on average 7 percent higher than boys on reading; in not a single country did boys’ average reading scores match girls’.



The diagram shows a pattern that continued across the pool (see the supplementary data to the original report for the averages on all of the countries participating in the PISA testing).

Interestingly, the gap between boys and girls in geometry was immune to the effects of social equality: boys enjoyed a consistent advantage in the subject. And the female ‘reading gap’ actually grew with greater equality for women, suggesting that greater equality led to better education for girls, and better performance, across subjects (not a real surprise). As the conclusion to Guiso’s paper makes clear:

This evidence suggests that intra-gender performance differences in reading versus mathematics and in arithmetic versus geometry are not eliminated in a more gender-equal culture. By contrast, girls’ underperformance in math relative to boys is eliminated in more gender-equal cultures. In more gender-equal societies, girls perform as well as boys in mathematics and much better than them in reading.

If the math gap explains male dominance in physics and math, and academic hiring was relatively unbiased, then why isn’t there an even more pronounced preference for admitting women into PhD programs and hiring them as faculty in fields like English literature, law, and history?

The gap between men and women in reading scores also suggests another explanation for the math gap: if choice of career is based on comparative advantage (rather than just absolute fitness), then women would likely choose careers where their much greater advantage in literary skill might produce a clearer superiority. That is, even if their math scores go up, their reading scores go up too, so they still enjoy a greater comparative advantage over men in fields that might be considered ‘traditionally female.’

Performance at the extremes

Alex Tabarrok at Marginal Revolution, in a post titled, Summers Vindicated (again), highlights the crucial issue of variance in test scores. That is, for understanding the highest achievers, the averages don’t really matter; we’re talking about the extremes, the geniuses, the Nobel Prize-winning physicists. If the variation in boys’ performance is larger than that of girls, more boys will likely wind up in the ‘upper tail’ of performance, posting the highest scores (and lowest), while the averages could be dead even (or for that matter, male averages could even be lower).
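The arithmetic behind this ‘upper tail’ point can be sketched with a quick simulation (the 12% wider male spread is an invented number for illustration, not an empirical estimate): give both groups identical means, give one a modestly larger variance, and count who clears the combined 99th percentile.

```python
import random

random.seed(1)
N = 500_000  # students per group

# Identical average 'ability'; boys get a hypothetical ~12% wider spread.
girls = [random.gauss(0, 1.00) for _ in range(N)]
boys = [random.gauss(0, 1.12) for _ in range(N)]

# Cutoff: the 99th percentile of the combined pool
combined = sorted(girls + boys)
cutoff = combined[int(len(combined) * 0.99)]

girls_top = sum(1 for a in girls if a >= cutoff)
boys_top = sum(1 for a in boys if a >= cutoff)

# Means are equal by construction, yet boys outnumber girls roughly 2 to 1 in
# the top percentile — in the neighborhood of the ratio Hyde's team reports.
print(round(boys_top / girls_top, 1))
```

The same construction produces a surplus of boys at the bottom tail too, which is exactly the ‘more geniuses and more dullards’ pattern the variance argument predicts.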

Among white students, Hyde and the team from the University of Wisconsin and the University of California, Berkeley, found that the ratio of girls to boys in the top percentile was 1 to 2; twice as many exceptional scores (99th-percentile scores) were turned in by white boys as by white girls. Among Asian students, however, slightly more girls than boys reached the top percentile. Not enough Latino or African-American students turned in scores in the top percentile to provide a ratio. (Another interesting wrinkle was that white boys significantly outperformed Asian and Pacific Islander students, both boys and girls, running against stereotypes in some parts of the United States, but that’s for another day…)

The Wisconsin-Berkeley team go on to suggest that the gap in achievement, although it might explain part of the disparity between men and women in certain occupations, doesn’t explain the severe gap in some fields. For example…

If a particular specialty required mathematical skills at the 99th percentile, and the gender ratio is 2.0, we would expect 67% men in the occupation and 33% women. Yet today, for example, Ph.D. programs in engineering average only about 15% women.

I don’t find this a particularly compelling argument — if the specialty required mathematical skill at a slightly higher proficiency, then there’s no inherent reason why 15% would not be the ‘right’ proportion of women in a program, but we’ll come back to that…
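My quibble can be made concrete with the same toy normal model (equal means, a hypothetical 10% wider male spread — these numbers are illustrative assumptions, not estimates from the study): the expected female share above a cutoff keeps shrinking as the cutoff rises, so the ‘right’ proportion depends entirely on where you set the bar.

```python
import math

def upper_tail(z, sd=1.0):
    """P(X > z) for X ~ Normal(0, sd), via the complementary error function."""
    return 0.5 * math.erfc(z / (sd * math.sqrt(2)))

SD_BOYS, SD_GIRLS = 1.1, 1.0  # illustrative: equal means, 10% wider male spread

shares = []
for cutoff in (2.33, 3.0, 3.5):  # roughly the 99th percentile and beyond
    boys = upper_tail(cutoff, SD_BOYS)
    girls = upper_tail(cutoff, SD_GIRLS)
    shares.append(girls / (boys + girls))  # expected female share above the cutoff

# The female share shrinks as the bar rises, with no change in average ability.
print([round(s, 2) for s in shares])  # → [0.37, 0.3, 0.24]
```

So the 33% figure in the quoted passage is an artifact of stopping at the 99th percentile: push the hypothetical bar higher and the ‘expected’ share of women falls with it, which is why the 15% observed in engineering PhD programs can’t be declared anomalous by this arithmetic alone.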

Some commentators point to this gap between (white) boys and girls in the top percentile to argue that, in fact, Summers’ tempest-provoking speech was inadvertently supported by the report in Science. Whereas the NYTimes report says boys and girls are equal, Heather MacDonald of the Manhattan Institute begs to differ:

Actually, the study, summarized in the July 25 issue of Science, shows something quite different: while boys’ and girls’ average scores are similar, boys outnumber girls among students in both the highest and the lowest score ranges. Either the Times is deliberately concealing the results of the study or its reporter cannot understand the most basic science reporting.

According to MacDonald’s reading, ‘Science’s analysis of math test scores only confirms the hypothesis that cost Summers his Harvard post: that boys are found more often than girls at the outer reaches of the bell curve of abstract reasoning ability.’ Or, as Alex Tabarrok puts it more directly, ‘we can expect that there will be more math geniuses and more dullards, among males than among females.’

MacDonald and others, like Luboš Motl at The Reference Frame, argue that even this ‘upper tail’ effect underestimates the impact of higher male variance, because these tests are simply too easy to illuminate the differences between men’s and women’s brains at the extreme of performance. In my day, if you scored over about 700 on the SAT math test (I believe — please don’t quote me on that; it’s been more than a couple of decades), you were in the 99th percentile. In fact, a lot of variety was masked by the ‘same’ percentile score. This lack of resolution makes it difficult to identify the really extraordinary performers from these tests.

Moreover, the Wisconsin-Berkeley researchers found that most standardized state math tests did not include the most difficult sorts of problems, those demanding complex reasoning or problem solving ability, so it is unlikely that the tests even could differentiate among elite levels of performance. In addition, the simple problems are a sad commentary on the state of American secondary education and the stultifying effect of the No Child Left Behind testing regimen, which demands that students be tested but, through a set of perverse incentives, encourages dilution of the tests to make sure that states meet the goals tied to funding.

Luboš Motl, who writes as a ‘conservative physicist,’ takes Hyde and her colleagues to task for not really focusing on measures of extraordinary mathematical talent (Motl’s pretty negative on the research, alleging that the authors have been ‘writing similar cargo cult scientific papers for quite some time’). Motl, in Janet Hyde: boys = girls in math? Not really, points to the extreme gender gap in winners of the US Mathematical Olympiads, major math prizes, and other forums for demonstrating math ability far more elite than the SAT. As he puts it near the end of his post, ‘the more selective your math-related tests become, the lower percentage of females you will obtain (the boys have a greater variance of the distribution).’ Some of the disparities are startling; some can’t even be called ‘disparities,’ as there is simply no female representation in the history of certain prizes, and this dominance persists in spite of social change, changes in education, and a host of other factors.

Motl serves up a fair amount of vitriol for ‘political correctness’ and references a study showing that certain fields in the humanities and social sciences have higher numbers of faculty espousing ‘politically correct’ views (see ‘Defining Political Correctness and Its Non-Impact’ on Inside Higher Ed). Although Motl’s attacks on ‘political correctness’ may be heavy-handed, I found it fascinating and ironic that one of the key views that the researchers on ‘political correctness’ used as a diagnostic was specifically about the ‘math gap.’ That is, one of the key questions used to identify which professors were ‘politically correct’ was whether the professor believed that ‘gender gaps in math and science fields are largely due to discrimination.’

What to make of this research and the critiques?

In the first place, the research by Hyde and colleagues tends to poke a gaping hole in the argument that there is an unvarying difference between all girls and all boys in mathematics ability (please note before you start writing hate mail: I know that this is not what most of the bloggers critical of the studies are arguing for…). Although the argument for sex differences may have shifted so that the current debate is focused on variance and extreme performances, let’s not forget that for a very long time, and still in a lot of people’s minds, this basic argument—that boys are better at math than girls, period—was taken for granted because of the ‘math gap’ in the average scores. That gap is now gone.

The Science article makes this case using a new data set (from the NCLB program) to overcome a sample bias in tests like the SAT. On this front, the case is pretty compelling, as is the Guiso-led research on European scores. I would hope that people supporting Summers’ perspective with reference to the data on extreme male performance in Hyde’s study (such as Winstein in The Wall Street Journal) will be equally passionate about denying this very common ‘boys-are-better-than-girls-in-math’ argument as they are about the alleged higher variance of boys.

Summers’ defenders have argued that his comments (and their defense) focus only on extreme performance, only on math geniuses; although the Hyde study discusses the 2-to-1 gap among white students in achieving scores in the 99th percentile, the authors argue that this gap does not explain an even greater disparity in women’s participation in PhD programs and faculty positions in mathematics-related fields. As I suggested above, I don’t find this part of the Wisconsin-Berkeley team’s argument terribly compelling. The 99th-percentile cut-off is arbitrary; if a field required an even rarer level of mathematics achievement, then, following from this logic, the disparity would be justified. The 2-to-1 gap in performance with such a large pool of (white) students is statistically significant.

But maybe because I work at a university, I don’t think university academic hiring is always a process of finding geniuses (god, I wish it were). Even if there are more male math geniuses, I’m still not sure that explains disparities in hiring or PhD programs; I’d want to see some proof that university professors are reliably geniuses or some concrete studies of social conditions within departments in these fields before I’d rule out the possibility that the gender gaps are at least partially ‘due to discrimination’ (Does that make me PC, or would I need to be more one-sided?). That is, even if women were participating at higher rates, it seems to me that studies of hiring and promotion are more convincing in demonstrating discrimination (or its absence) than statistical arguments about distribution in the field. In fact, there is statistical evidence to argue both ways (see, for example, studies of promotion and hiring compared to the percentage of female candidates in the pools).

The people with the highest math scores don’t necessarily wind up with careers in mathematics. Back in 2006, Jake Young at Pure Pedantry wrote an excellent post on the problems with the ‘upper tail’ hypothesis, Debunking the Upper Tail: More on the Gender Disparity, arguing that it depends upon two key assumptions that don’t hold: a) that you have to perform in the upper tail to go on to a career in these professions, and b) that people with high scores on tests of mathematical ‘aptitude’ wind up in math, physics, and engineering. Check out his post for an excellent discussion (see the comments as well).

In the case of certain sorts of awards for creativity in scientific theory (like the Nobel Prize in physics), ability to do mathematics at the highest level may not (may not) be the key intellectual quality that produces the performance. That is, theoretical creativity may be a constellation of factors, including some personality traits as well as abilities considered classically ‘intellectual,’ that trumps pure mathematical problem solving ability. Ironically, male dominance in these awards may not be due to an advantage in mathematical ability (although this advantage might still exist) but because of other characteristics, which may or may not be innate or attributable to being male. In order to know this, we’d have to actually study Nobel Prize winners (we run into a similar issue with study of elite athletes being treated as representative of gender or ethnic groups when we don’t really have a clear sense of why these athletes are winning and whether there’s a consistent source of superiority).

Testing and changes in group ability over time

The bigger point for me, however, is that I’m still not convinced that math tests are testing innate ability. The whole shift in the US ‘math gap’ over time, like the shifts in European countries linked to greater gender equality, is a powerful demonstration that intellectual qualities of an entire group, as measured by standardized tests, can shift over time, even on a scale so enormous that one might reasonably expect environmental factors to be swamped. The fact that the math gap between boys and girls shrank so markedly between the 1970s and 2003, where the clear causes are educational and social, puts the burden of proof squarely on those who want to argue that mathematics abilities are genetic. Even the Flynn Effect, the fact that IQ scores tend to rise over time (discussed here and here at Neuroanthropology), strongly suggests that ‘intelligence’ is a moving target, likely a synergy of different abilities and traits, some of which are plastic.

The terribly consistent geometry gap has the hallmark of some sort of innate difference, but even this might be deceptive. Science Daily recently ran a story, Plastic Brain Outsmarts Experts: Training Can Increase Fluid Intelligence, Once Thought To Be Fixed At Birth, detailing how one dimension of intelligence thought to be relatively fixed could, in fact, be molded by training (and not terribly demanding training at that, unlike a really good mathematics education).

Even if girls and boys had identical mathematics scores, that wouldn’t mean that we’ve proven their brains are indistinguishable (they’re not), nor even that they both have the ‘same’ mathematical ability (they may be achieving comparable scores in different ways). The Hyde-led research team tried to determine whether boys and girls were doing better or worse than each other on different types of questions, but the tests were too remedial to even discern this kind of effect because they had no demanding test questions (although the Guiso-led team’s information on a ‘geometry gap’ being resistant to change is suggestive).

In this sense, I agree strongly with Luboš Motl: the big story here is not the scores, but the fact that the tests are so watered down, likely reflecting the disintegration of high expectations and rigorous mathematics training. I have a suspicion that, no matter how bad the overall education, you’re likely to still get some exceptional performers, but the effect on the vast majority of children will be tragic. Without rigorous math training, one of the opportunities we have to encourage students to develop a whole range of neurological capacities is being squandered.

One neuroanthropological perspective on ‘intelligence’ (or ‘mathematic ability’)

As a neuroanthropologist, I have no problem acknowledging differences between men and women; I think it’s simply a statistical fact that men and women are performing differently, as Motl points out. Nor would I deny that there might be significant cognitive differences between groups of people, including men and women, or that these differences might be grounded in biological differences in the brain. But there seems to be a very quick rush to believing that these tests demonstrate some constant and innate difference between boys and girls; the changes highlighted in both studies published in Science should, at the very least, slow the quick jump from a disparity in scores to assumptions about permanent sex differences. The gap in math scores (and reading scores for that matter) moves, shrinks, changes, grows, or even disappears over time; one would think that this might give those who think ‘intelligence’ is innate reason for pause.

My greater objection to some of the discussion is the lack of a more fine-grained analysis of how any particular difference (such as on a test) arises—it’s often simply attributed to a kind of ‘boy-ness’ or ‘girl-ness.’ Yes, a difference in elite performance might arise because there’s more ‘math power’ in a boy’s brain, but that still leaves a whole lot of open questions: what is ‘math power’? how did they get it? why don’t all boys have it? can we give it to girls, or to more boys? do the exceptional girls, however rare, have the same mental ability? if the girls do, is it specifically ‘male’ in some sense, or is it just more likely to occur in boys’ brains?

For example, there’s often an assumption that greater variance in boys is either genetic or hormonal, and although that could be the case, there are other potential explanations, including mixed explanations that partially, but not wholly, rely on genetics or hormones.

We could offer an explanation that tips towards social psychology; for example, just as the boy math geek may be less stigmatized than the girl math geek (as some people argue), the boy math tragedy may be more de-motivated than a girl performing badly. Boys’ exit from education may be supported by male peers or even influenced by innate non-mathematical factors, like having a stronger temper, more pronounced need to defend the sense of self, or more well developed ‘screw-school’ independence. In the ‘math prodigy’ boy, the same temper, desire to defend the self, and independence could boost achievement, especially mathematical creativity.

We could offer an explanation that relies more on fit with the testing dynamic itself rather than brain ‘math power’; for example, the high pressure and tight time constraints of very challenging tests (the kinds in math contests) could reward certain kinds of behavioural patterns (high risk, ‘sloppy’ but quick problem solving) or a specific constellation of traits in addition to mathematics ability, so that they are testing other behavioural traits or intellectual abilities in addition to problem solving. Boys’ bodies might respond differently to stress than girls’, producing a better mental state in which to take tests. These other traits might even be ‘innate’ differences between boys and girls (they might not be), but they would be recorded on the test as ‘mathematical ability.’

That is, showing greater variance, even showing that it’s rooted in inherent male-female differences, still doesn’t demonstrate that male brains are inherently better at specific math functions. The effect of other sorts of differences may be indistinguishable on a test, but the implications are profound for how you might redress inequality, test ability, or even how you might teach mathematics to the boys who don’t do well.

In addition, although someone’s liable to criticize this line taken out of context, I’ll write it: there’s an assumption that math tests are testing ‘math ability’ when they might be testing something else in addition to ‘math ability’ (like behavioral responses to stress, test taking strategy, or a number of other things). That is, there’s an assumption that a math test tests a trait, ‘math ability,’ and that it’s distinguishable from other traits, like ‘reading ability,’ in some meaningful sense. I think that it’s more likely that math tests test the ability to take math tests; not only does this include other traits, but even ‘math ability’ itself may be composed of a cluster of functions and abilities, which are themselves diverse in terms of how a psychologist might describe them (‘fixed’ or ‘fluid’ intelligence, ‘procedural’ and ‘propositional’ memory, etc.). Likely, not everyone who does well on a math test is doing the same thing. When I won a math test without a calculator, and the guy who finished second used one, I’m pretty sure that we were engaging in different calculating practices and techniques as we moved through the question.

Although some people may be uninterested in these differences between ways that people achieve solutions in mathematics (perhaps because they’re more interested about differences between boys and girls), from a neuroanthropological perspective, this variation is crucial. Because we are interested in cognitive variation, and in how different practices (like variations in mathematics pedagogy) shape distinct forms of competency, identical scores may actually mask significant cultural-cognitive (or gender-cognitive, for that matter) differences. It may be as important to analyze the apparent parities as to mind the gaps….

I’m going to do another post on ‘intelligence’ soon. This one has taken me too much time to finish, and it’s gotten too long. If you’re in the mood for more, however…



Additional resources

Read more at No Gender Differences In Math Performance and at CNN: Study: Girls equal to boys in math skills. On the blog Ars Technica, Math gender gap gone in grade school, persists in college by Yun Xie (interesting thoughts in that one).

P. Z. Myers at Pharyngula also has a post on a closely related debate (on science, as well as mathematics, and gender disparity): Motivating students (and motivating women) to pursue science careers. Myers does a nifty rhetorical judo take-down on those who use test scores to argue for innate differences; if test scores demonstrate unchangeable ability, then why don’t Americans stop spending money on math and science education for US students and import more Asian kids?

I would never want to be accused of not giving opposing perspectives adequate space: along those lines, there’s a discussion of bias in the sciences over at the ‘Tierney Lab’ at The New York Times website: Intellectual Dishonesty on Sex Bias?


Still haven’t had enough? There’s an update: Women on tests update: response to stress.

Credits:

Chart 1: Univ. of Wisconsin and UC Berkeley; published in Science magazine; downloaded from http://www.berkeley.edu/news/media/releases/2008/07/24_math.shtml on 31 July 2008.

Chart 2: The Economist, Education and Sex: Vital Statistics.

Photo from a University of Michigan news release from 2003.

References

Guiso, Luigi, Ferdinando Monte, Paola Sapienza, and Luigi Zingales. 2008. Diversity: Culture, Gender, and Math. Science 320 (5880): 1164-1165. doi 10.1126/science.1154094

Hyde, Janet Shibley. 2005. The Gender Similarities Hypothesis. American Psychologist 60(6): 581-592. doi 10.1037/0003-066X.60.6.581.

Hyde, Janet S., Sara M. Lindberg, Marcia C. Linn, Amy B. Ellis, and Caroline C. Williams. 2008. Gender Similarities Characterize Math Performance. Science 321 (5888): 494-495. doi 10.1126/science.1160364