Executive summary

Education policymakers and analysts express great concern about the performance of U.S. students on international tests. Education reformers frequently invoke the relatively poor performance of U.S. students to justify school policy changes.

In December 2012, the International Association for the Evaluation of Educational Achievement (IEA) released national average results from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS). U.S. Secretary of Education Arne Duncan promptly issued a press release calling the results “unacceptable,” saying that they “underscore the urgency of accelerating achievement in secondary school and the need to close large and persistent achievement gaps,” and calling particular attention to the fact that the 8th-grade scores in mathematics for U.S. students failed to improve since the previous administration of the TIMSS.

This is a corrected version of a report initially posted on January 15, 2013. The corrections do not affect the report’s conclusions. Details can be found in “Response from Martin Carnoy and Richard Rothstein to OECD/PISA comments” (PDF).

Two years earlier, the Organization for Economic Cooperation and Development (OECD) released results from another international test, the 2009 administration of the Program for International Student Assessment (PISA). Secretary Duncan’s statement was similar. The results, he said, “show that American students are poorly prepared to compete in today’s knowledge economy. … Americans need to wake up to this educational reality—instead of napping at the wheel while emerging competitors prepare their students for economic leadership.” In particular, Duncan stressed results for disadvantaged U.S. students: “As disturbing as these national trends are for America, enormous achievement gaps among black and Hispanic students portend even more trouble for the U.S. in the years ahead.”

However, conclusions like these, which are often drawn from international test comparisons, are oversimplified, frequently exaggerated, and misleading. They ignore the complexity of test results and may lead policymakers to pursue inappropriate and even harmful reforms.

Both TIMSS and PISA eventually released not only the average national scores on their tests but also a rich international database from which analysts can disaggregate test scores by students’ social and economic characteristics, their school composition, and other informative criteria. Such analysis can lead to very different and more nuanced conclusions than those suggested from average national scores alone. For some reason, however, although TIMSS released its average national results in December, it scheduled release of the international database for five weeks later. This puzzling strategy ensured that policymakers and commentators would draw quick and perhaps misleading interpretations from the results. This is especially the case because analysis of the international database takes time, and headlines from the initial release are likely to have hardened into conventional wisdom by the time scholars have had the opportunity to complete a careful study.

While we await the release of the TIMSS international database, this report describes a detailed analysis we have conducted of the 2009 PISA database. It offers a different picture of the 2009 PISA results than the one suggested by Secretary Duncan’s reaction to the average national scores of the United States and other nations.

Because of the complexity and size of the PISA international database, this report’s analysis is restricted to the comparative test performance of adolescents in the United States, in three top-scoring countries, and in three other post-industrial countries similar to the United States. These countries are illustrative of those with which the United States is usually compared. We compare the performance of adolescents in these seven countries who have similar social class characteristics. We compare performance on the most recent test for which data are available, as well as trends in performance over nearly two decades.

In general, we find that test data are too complex, and too often oversimplified in public discussion, to permit meaningful policy conclusions regarding U.S. educational performance without deeper study of test results and methodology. However, a clear set of findings stands out and is supported by all data we have available:

Because social class inequality is greater in the United States than in any of the countries with which we can reasonably be compared, the relative performance of U.S. adolescents is better than it appears when countries’ national average performance is conventionally compared.

Because in every country, students at the bottom of the social class distribution perform worse than students higher in that distribution, U.S. average performance appears to be relatively low partly because we have so many more test takers from the bottom of the social class distribution.

A sampling error in the U.S. administration of the most recent international (PISA) test resulted in students from the most disadvantaged schools being over-represented in the overall U.S. test-taker sample. This error further depressed the reported average U.S. test score.

If U.S. adolescents had a social class distribution that was similar to the distribution in countries to which the United States is frequently compared, average reading scores in the United States would be higher than average reading scores in the similar post-industrial countries we examined (France, Germany, and the United Kingdom), and average math scores in the United States would be about the same as average math scores in similar post-industrial countries.

Re-estimating the U.S. average PISA score to adjust for a student population that is more disadvantaged than populations in otherwise similar post-industrial countries, and for the over-sampling of students from the most-disadvantaged schools in a recent U.S. international assessment sample, yields a U.S. average score in both reading and mathematics that is higher than official reports indicate (in the case of mathematics, substantially higher).

This re-estimate would also improve the U.S. place in the international ranking of all OECD countries, bringing the U.S. average score to sixth in reading and 13th in math. Conventional ranking reports based on PISA, which make no adjustments for social class composition or for sampling errors, and which rank countries irrespective of whether score differences are large enough to be meaningful, report that the U.S. average score is 14th in reading and 25th in math.

Disadvantaged and lower-middle-class U.S. students perform better (and in most cases, substantially better) than comparable students in similar post-industrial countries in reading. In math, disadvantaged and lower-middle-class U.S. students perform about the same as comparable students in similar post-industrial countries.

At all points in the social class distribution, U.S. students perform worse, and in many cases substantially worse, than students in a group of top-scoring countries (Canada, Finland, and Korea). Although controlling for social class distribution would narrow the difference in average scores between these countries and the United States, it would not eliminate it.

U.S. students from disadvantaged social class backgrounds perform better relative to their social class peers in the three similar post-industrial countries than advantaged U.S. students perform relative to their social class peers. But U.S. students from advantaged social class backgrounds perform better relative to their social class peers in the top-scoring countries of Finland and Canada than disadvantaged U.S. students perform relative to their social class peers.

On average, and for almost every social class group, U.S. students do relatively better in reading than in math, compared to students in both the top-scoring and the similar post-industrial countries.

Because not only educational effectiveness but also countries’ social class composition changes over time, comparisons of test score trends over time by social class group provide more useful information to policymakers than comparisons of total average test scores at one point in time or even of changes in total average test scores over time.

The performance of the lowest social class U.S. students has been improving over time, while the performance of such students in both top-scoring and similar post-industrial countries has been falling.

Over time, in some middle and advantaged social class groups where U.S. performance has not improved, comparable social class groups in some top-scoring and similar post-industrial countries have had declines in performance.

Performance levels and trends in Germany are an exception to the trends just described. Average math scores in Germany would still be higher than average U.S. math scores, even after standardizing for a similar social class distribution. Although the performance of disadvantaged students in the two countries is about the same, lower-middle-class students in Germany perform substantially better than comparable social class U.S. students. Over time, scores of German adolescents from all social class groups have been improving, and at a faster rate than U.S. improvement, even for social class groups and subjects where U.S. performance has also been improving. But the causes of German improvement (concentrated among immigrants and perhaps also attributable to East and West German integration) may be idiosyncratic, without lessons for other countries, and not necessarily predictive of the future. Whether German rates of improvement can be sustained to the point where that country’s scores by social class group uniformly exceed those of the United States remains to be seen. As of 2009, this was not the case.

Great policy attention in recent years has been focused on the high average performance of adolescents in Finland. This attention may be justified, because both math and reading scores in Finland are higher for every social class group than in the United States. However, Finland’s scores have been falling for the most disadvantaged students while U.S. scores have been improving for similar social class students. This should lead to greater caution in applying presumed lessons from Finland. At first glance, it may seem that the decline in scores of disadvantaged students in Finland results in part from a recent influx of lower-class immigrants. However, average scores for all social class groups have been falling in Finland, and the gap in scores between Finland and the United States has narrowed in each social class group. Further, during the same period in which scores for the lowest social class group have declined, the share of all Finnish students in this group has also declined, which should have made the national challenge of educating the lowest social class students more manageable, so immigration is unlikely to provide much of the explanation for declining performance.

Although this report’s primary focus is on reading and mathematics performance on PISA, it also examines mathematics test score performance in earlier administrations of the TIMSS. Where relevant, we also discuss what can already be learned from the limited information now available from the 2011 TIMSS. To help with the interpretation of these PISA and TIMSS data, we also explore reading and mathematics performance on two forms of the U.S. domestic National Assessment of Educational Progress (NAEP).

Relevant complexities are too often ignored when policymakers draw conclusions from international comparisons. Different international tests yield different rankings among countries and over time. PISA, TIMSS, and NAEP all purport to reflect the achievement of adolescents in mathematics (and PISA and NAEP in reading), yet results on different tests can vary greatly—in the most extreme cases, countries’ scores can go up on one test and down on another that purports to assess the same students in the same subject matter—and scholars have not investigated what causes such discrepancies. These differences can be caused by the content of the tests themselves (for example, differences in the specific skills that test makers consider to represent adolescent “mathematics”) or by flaws in sampling and test administration. Because these differences are revealed in the most cursory examination of test results, policymakers should exercise greater caution in drawing policy conclusions from international score comparisons.

To arrive at our conclusions, we made a number of explicit and transparent methodological decisions that reflect our best judgment. Three are of particular importance: our definition of social class groups, our selection of comparison countries, and our determination of when differences in test scores are meaningful.

There is no clear way to divide test takers from different countries into social class groups that reflect comparable social background characteristics relevant to academic performance. For this report, we chose differences in the number of books in adolescents’ homes to distinguish them by social class group; we consider that children in different countries have similar social class backgrounds if their homes have similar numbers of books. We think that this indicator of household literacy is plausibly relevant to student academic performance, and it has been used frequently for this purpose by social scientists. We show in a technical appendix that supplementing it with other plausible measures (mother’s educational level, and an index of “economic, social, and cultural status” created by PISA’s statisticians) does not provide better estimates. Also influencing our decision is that the number of books in the home is a social class measure common to both PISA and TIMSS, so its use permits us to explore longer trend lines and more international comparisons. As noted, however, data on these background characteristics were not released along with the national average scores on the 2011 TIMSS, and so our information on the performance of students from different social class groups on TIMSS must end with the previous, 2007, test administration.

In this report, we focus particularly on comparisons of U.S. performance in math and reading in PISA with performance in three “top-scoring countries” (Canada, Finland, and Korea) whose average scores are generally higher than U.S. scores, and with performance in three “similar post-industrial countries” (France, Germany, and the United Kingdom) whose scores are generally similar to those of the United States. We employed no sophisticated statistical methodology to identify these six comparison countries. Assembling and disaggregating data for this report was time consuming, and we were not able to consider additional countries. We think our choices include countries to which the United States is commonly compared, and we are reasonably confident that adding other countries would not appreciably change our conclusions. If other scholars wish to develop data for other countries, we would gladly offer them methodological advice.

Technical reports on test scores typically distinguish differences that are “significant” from those that are not. But this distinction is not always useful for policy purposes and is frequently misunderstood by policymakers. To a technical expert, a score difference can be minuscule but still “significant” if it can be reproduced 95 percent of the time when a comparison is repeated. But minuscule score differences should be of little interest to policymakers. In general, social scientists consider an intervention to be worthwhile if it improves a median subject’s performance enough to be superior to the performance of about 57 percent or more of all subjects prior to the intervention. Such an intervention should be considered “significant” for policy purposes, but, to avoid confusion, we avoid the term “significant” altogether. Instead, for PISA, we consider countries’ (or social class groups’) average scores to be “about the same” if they are less than 8 test scale points different (even if this small difference would be repeated in 95 of 100 test administrations), to be “better” or “worse” if they are at least 8 but less than 18 scale points different, and “substantially better” or “substantially worse” if they differ by 18 scale points or more. Eighteen scale points in most cases is approximately equivalent to the difference social scientists generally consider to be the minimum result of a worthwhile intervention (an effect size of about 0.2 standard deviations). The TIMSS scale is slightly different from the PISA scale; for TIMSS, the cut points used in this report are 7 and 17 rather than 8 and 18.
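The cut-point rule just described can be expressed as a simple decision procedure. The sketch below is our own illustration of the report’s terminology (the function name and interface are not part of PISA’s or TIMSS’s methodology); the default thresholds are the PISA cut points, and the TIMSS cut points can be passed in their place.

```python
def classify_gap(diff_points, small=8, large=18):
    """Classify a score-point difference using the report's cut points.

    Defaults are the PISA cut points (8 and 18 scale points); for TIMSS,
    the report uses 7 and 17 instead. On the PISA scale (SD set at 100),
    18 points roughly corresponds to an effect size of 0.2 standard
    deviations, the minimum social scientists generally consider a
    worthwhile intervention effect.
    """
    gap = abs(diff_points)
    if gap < small:
        return "about the same"
    elif gap < large:
        return "better or worse"
    else:
        return "substantially better or worse"
```

For example, a 5-point PISA difference is “about the same” even if it is statistically significant, while a 25-point difference is “substantially better or worse”; for TIMSS, `classify_gap(17, small=7, large=17)` already crosses the “substantially” threshold.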

With regard to these and other methodological decisions we have made, scholars and policymakers may choose different approaches. We are only certain of this: To make judgments only on the basis of statistically significant differences in national average scores, on only one test, at only one point in time, without regard to social class context or curricular or population sampling methodologies, is the worst possible choice. But, unfortunately, this is how most policymakers and analysts approach the field.

The most recent test for which an international database is presently available is PISA, administered in 2009. As noted, the database for TIMSS 2011 is scheduled for release later this month (January 2013). In December 2013, PISA will announce results and make data available from its 2012 test administration. Scholars will then be able to dig into TIMSS 2011 and PISA 2012 databases and place the publicly promoted average national results in proper context. The analyses that follow in this report should caution policymakers to await understanding of this context before drawing conclusions about lessons from TIMSS or PISA assessments. We plan to conduct our own analyses of these data when they become available, and publish supplements to this report as soon as it is practical to do so, given the care that should be taken with these complex databases.

Part I. Introduction

A 2009 international test of reading and math showed that American 15-year-olds perform more poorly, on average, than 15-year-olds in many other countries. This finding, from the Program for International Student Assessment (PISA), is consistent with previous PISA results, as well as with results from another international assessment of 8th-graders, the Trends in International Mathematics and Science Study (TIMSS).

From such tests, many journalists and policymakers have concluded that American student achievement lags woefully behind that in many comparable industrialized nations, that this shortcoming threatens the nation’s economic future, and that these test results therefore suggest an urgent need for radical school reform.

Upon release of the 2011 TIMSS results, for example, U.S. Secretary of Education Arne Duncan called them “unacceptable,” saying that they “underscore the urgency of accelerating achievement in secondary school and the need to close large and persistent achievement gaps” (Duncan 2012). Two years before, upon release of 2009 PISA scores, Duncan said that “…the 2009 PISA results show that American students are poorly prepared to compete in today’s knowledge economy. … Americans need to wake up to this educational reality—instead of napping at the wheel while emerging competitors prepare their students for economic leadership.” In particular, Duncan stressed the PISA results for disadvantaged U.S. students: “As disturbing as these national trends are for America, enormous achievement gaps among black and Hispanic students portend even more trouble for the U.S. in the years ahead. Last year, McKinsey & Company released an analysis which concluded that America’s failure to close achievement gaps had imposed—and here I quote—‘the economic equivalent of a permanent national recession.’” The PISA results, Duncan concluded, justify the reform policies he has been pursuing: “I was struck by the convergence between the practices of high-performing countries and many of the reforms that state and local leaders have pursued in the last two years” (Duncan 2010).

This conclusion, however, is oversimplified, exaggerated, and misleading. It ignores the complexity of the content of test results and may well be leading policymakers to pursue inappropriate and even harmful reforms that change aspects of the U.S. education system that may be working well and neglect aspects that may be working poorly.

For example, as Secretary Duncan said, U.S. educational reform policy is motivated by a belief that the U.S. educational system is particularly failing disadvantaged children. Yet an analysis of international test score levels and trends shows that in important ways disadvantaged U.S. children perform better, relative to children in comparable nations, than do middle-class and advantaged children. More careful analysis of these levels and trends may lead policymakers to reconsider their assumption that almost all improvement efforts should be directed to the education of disadvantaged children and few such efforts to the education of middle-class and advantaged children.

Education analysts in the United States pay close attention to the level and trends of test scores disaggregated by socioeconomic groupings. Indeed, a central element of U.S. domestic education policy is the requirement that average scores be reported separately for racial and ethnic groups and for children who are from families whose incomes are low enough to qualify for the subsidized lunch program. We understand that a school with high proportions of disadvantaged children may be able to produce great “value-added” for its pupils, although its average test score levels may be low. It would be foolish to fail to apply this same understanding to comparisons of international test scores.

Extensive educational research in the United States has demonstrated that students’ family and community characteristics powerfully influence their school performance. Children whose parents read to them at home, whose health is good and who can attend school regularly, who do not live in fear of crime and violence, who enjoy stable housing and continuous school attendance, whose parents’ regular employment creates security, who are exposed to museums, libraries, music and art lessons, who travel outside their immediate neighborhoods, and who are surrounded by adults who model high educational achievement and attainment will, on average, achieve at higher levels than children without these educationally relevant advantages. We know much less about the extent to which similar factors affect achievement in other countries, but we should assume, in the absence of evidence to the contrary, that they do.

It is also the case that countries’ educational effectiveness and their social class composition change over time. Consequently, comparisons of test score trends over time by social class group provide more useful information to policymakers than comparisons of total average test scores at one point in time or even of changes in total average test scores over time.

Unfortunately, our conversation about international test score comparisons has ignored such questions. It would be foolish, for example, to let international comparisons motivate radical changes in educational policies in a country whose social class subgroup average scores were below those of other nations, if that country’s subgroups had been improving their performance at a more rapid rate than similar subgroups in other nations, even if the country’s overall average still had not caught up. Just as a domestic U.S. school’s average performance is influenced by its social class composition, so too might a country’s average performance be influenced by its social class composition.

The policy responses of educational reformers should be sufficiently nuanced to respond to such considerations, because policy initiatives can improve when they are informed by more sophisticated inquiry.

For example, consider Country C. Its affluent students achieve better than affluent students in comparable countries, but not as much better as in the past; the performance of affluent students in Country C, while still relatively high, has been declining relative to the performance of affluent students in comparable countries. Country C’s socioeconomically disadvantaged students achieve less than disadvantaged children in comparable countries, but not as much less as in the past. The performance of disadvantaged students in Country C, while still relatively low, has been improving relative to the performance of disadvantaged students in comparable nations. In such circumstances, unsophisticated reformers in Country C might well decide to revamp how disadvantaged students are being taught, even though teaching methods have been successfully raising such students’ achievement relative to the achievement of similarly disadvantaged students in other countries and relative to the achievement of wealthier students in Country C itself. Such unsophisticated reformers might also ignore the condition of education of affluent students, believing that their relatively high performance suggests that no reform is needed, while overlooking the decline of such performance over time. Sophisticated education policymakers, in contrast, who have studied the data trends, might direct their reform efforts to the high-scoring rather than the low-scoring students.

Thus, in evaluating a country’s educational performance, we should want to know how children from different social class groups perform, in comparison to other social class groups within their own country and in comparison to children from similar social class groups in other countries. Describing only an “average” national score obscures what is likely to be more useful information. Yet it is only in terms of national averages that policy discussion of international test scores typically proceeds. U.S. policymakers would learn more if they also studied the performances of demographic (socioeconomic) subgroups and compared these to the performances of similar subgroups in other nations. To the extent international comparisons are important, it is critical to know whether each subgroup in the United States performs above or below the level of socioeconomically similar subgroups in comparable industrialized nations.

If we identify subgroups that perform relatively well or relatively poorly in one country or another, we should also ask how the performances of these subgroups, compared to the performances of similar subgroups in other nations, are changing over time. Are some subgroups improving their performance unusually rapidly, in comparison to socioeconomically similar subgroups in other nations, while other subgroups are exhibiting unusual deterioration in performance? Are various subgroups improving or declining in performance at different rates, and are these differences masked when we look only at national averages?

In this report, we also identify inconsistencies between various international tests that may well be related to inaccurate population sampling that has caused some tests to oversample some social class groups and undersample others. Such sampling errors inevitably lead to inaccuracies in reports of how students in a particular country perform, relative to those in other countries where the sampling may have been more accurate.

Other considerations, rarely raised in public debate, also influence the care we should take in the interpretation of international comparisons. One is how the curriculum is sampled in the framework for any particular test. Because the full range of knowledge and skills that we describe as “mathematics” cannot possibly be covered in a single brief test, policymakers should also carefully examine whether an assessment called a “mathematics” test necessarily covers knowledge and skills similar to those covered by other assessments also called “mathematics” tests, and whether performance on these different assessments can reasonably be compared. For example, American adolescents perform relatively well on algebra questions, and relatively poorly on geometry questions, compared to adolescents in other countries. Reports on how the United States compares to other countries show the United States in a more favorable light to the extent a test has more algebra items and fewer geometry items. Whether there is an appropriate balance between these topics on any particular international assessment is rarely considered by policymakers who draw conclusions about the relative performance of U.S. students from that assessment. Similar questions arise with regard to a “reading” test.

Whether U.S. policymakers want to reorient the curriculum to place more emphasis on geometry is a decision they should make without regard to whether such reorientation might influence comparative scores on an international test. It certainly might not be good public policy to reduce curricular emphasis on statistics and probability, skills essential to an educated citizenry in a democracy, in order to make more time available for geometry. There are undoubtedly other sub-skills covered by international reading and math tests on which some countries are relatively stronger and others are relatively weaker. Investigation of these differences should be undertaken before drawing policy conclusions from international test scores.

To stimulate an examination and discussion of these and several other complexities, we analyze data on the performance of adolescents from PISA and TIMSS, as well as from two forms of the National Assessment of Educational Progress (NAEP), a test given exclusively to a sample of U.S. students. The first form, Main NAEP, is modified in small ways over time, so that its coverage tracks modifications in the math curriculum. The second form, Long-Term Trend NAEP (LTT), which changes much less over time, assesses how students’ competence changes over time on a more nearly identical set of skills. The Main NAEP has been administered since 1990, and the LTT since the early 1970s.

Part II. PISA 2009—the comparative performance of U.S. students by social class group

Disaggregation of PISA test scores by social class group reveals some patterns that many education policymakers will find surprising. Average U.S. test scores are lower than average scores in countries to which the United States is frequently compared, in part because the share of disadvantaged students in the overall national population is greater in the United States than in comparison countries. If the social class distribution of the United States were similar to that of top-scoring countries, the average test score gap between the United States and these top-scoring countries would be cut in half in reading and by one-third in mathematics. Disadvantaged U.S. students perform comparatively better than do disadvantaged students in important comparison countries. The test score gap between advantaged and disadvantaged students in the United States is smaller than the gap in similar post-industrial countries; it is generally, although not always, greater than the gap in top-scoring countries. This section explores these findings in greater detail.
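The standardization described above amounts to reweighting each social class group’s average score by a reference population’s social class shares, rather than by the country’s own shares. The sketch below illustrates the arithmetic with hypothetical subgroup means and shares (these numbers are invented for illustration and are not actual PISA results).

```python
def standardized_average(group_means, shares):
    """Weight subgroup mean scores by a set of subgroup population
    shares (shares should sum to 1). Using a reference population's
    shares yields a composition-adjusted national average."""
    return sum(group_means[g] * shares[g] for g in group_means)

# Hypothetical mean scores by social class group (e.g., books-in-home
# groups) for one country -- illustrative values only, not PISA data:
means = {"disadvantaged": 450, "middle": 500, "advantaged": 560}

own_shares = {"disadvantaged": 0.40, "middle": 0.35, "advantaged": 0.25}
ref_shares = {"disadvantaged": 0.25, "middle": 0.40, "advantaged": 0.35}

actual = standardized_average(means, own_shares)    # 495.0
adjusted = standardized_average(means, ref_shares)  # 508.5
```

In this illustration, subgroup performance is identical in both calculations; the 13.5-point difference between the actual and adjusted averages comes entirely from the larger share of disadvantaged test takers in the country’s own population, which is the mechanism the report argues depresses the reported U.S. average.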

To simplify our comparisons of national average PISA scores and of these scores disaggregated by social class, we focus on the United States and six other countries—Canada, Finland, South Korea (hereinafter simply Korea), France, Germany, and the United Kingdom.

We refer to three of these countries (Canada, Finland, and Korea) as “top-scoring countries” because they score much better overall than the United States in reading and math—about a third of a standard deviation better. Canada, Finland, and Korea are also the three “consistent high-performers” that U.S. Secretary of Education Arne Duncan highlighted when he released the U.S. PISA results (Duncan 2010).

We call the other three (France, Germany, and the United Kingdom) “similar post-industrial countries” because they score similarly overall to the United States. They also are countries whose firms are major competitors of U.S. firms in the production of higher-end manufactured goods and services for world markets. Their firms are not the only competitors of U.S. firms, but if the educational preparation of young workers is a factor in national firms’ competitiveness, it is worth comparing student performance in these countries with student performance in the United States to see if these countries’ educational systems, so different from that in the United States, play a role in their firms’ success.

PISA is scored on a scale that covers a very wide range of ability in math and reading. When the scales were created for reading in 2000 and for math in 2003, the mean for all test takers from countries in the Organization for Economic Cooperation and Development (OECD), the sponsor of PISA, was set at 500 with a standard deviation of 100. When statisticians describe score comparisons, they generally talk about differences that are “significant.” Yet while “significance” is a useful term for technical discussion, it can be misleading for policy purposes, because a difference can be statistically significant but too small to matter for policy. Therefore, in this report, we avoid describing differences in terms of statistical significance. Instead, we use terms like “better (or worse)” and “substantially better (or worse)” (both of which denote differences that are also statistically significant), and “about the same.”

In general, in this report, we use the term “about the same” to describe average score differences in PISA that are less than 8 scale points; we use the term “better (or worse)” to describe differences that are at least 8 points but less than 18 scale points, and we use the term “substantially (or much) better (or worse)” to describe differences that are 18 scale points or more. Of course, any fixed cut point is arbitrary, and readers may find it strange when we say, for example, that when two countries have an average difference of 7 scale points they perform about the same, whereas when their average difference is 8 scale points one performs better than the other. This is a necessary consequence of any descriptive system using cut points. However, this caution is in order: Readers without statistical sophistication will be tempted to think that a difference of 7 scale points is almost “better.” This is true. But a difference of 8 scale points is also almost “about the same.” Many readers, accustomed to finding differences where there are none, will be more reluctant to consider the latter than the former, but both are equally true.
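The cut points described above amount to a simple classification rule. A minimal Python sketch (the function name and structure are ours, not the report's):

```python
def describe_difference(diff):
    """Map a PISA scale-point difference onto the report's descriptive terms.

    Cut points follow the text: under 8 points is "about the same,"
    8 to under 18 is "better (or worse)," and 18 or more is
    "substantially better (or worse)."
    """
    d = abs(diff)
    if d < 8:
        return "about the same"
    elif d < 18:
        return "better (or worse)"
    return "substantially better (or worse)"

# A 7-point and an 8-point difference fall on opposite sides of an
# admittedly arbitrary cut point:
print(describe_difference(7))    # about the same
print(describe_difference(8))    # better (or worse)
print(describe_difference(-33))  # substantially better (or worse)
```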

Table 1 displays overall average scores in reading and math reported by PISA for 2009. These are the basis (without any socioeconomic disaggregation) of most commonplace comparisons.

Table 1. Overall average national scale scores, reading and math, for U.S. and six comparison countries, PISA 2009

                      Top-scoring                  Similar post-industrial              U.S. versus:
          Canada  Finland  Korea  Avg.*   France  Germany  U.K.  Avg.*   U.S.   Top-scoring avg.  Similar post-ind. avg.
Reading    524      536     539    533     496      497     494   496    500         -33                  +4
Math       527      541     546    538     497      513     492   501    487         -50                 -13

* Simple (unweighted) average of three countries
Source: Authors' analysis of OECD Program for International Student Assessment (PISA) (2010a)

The table shows that, on average, U.S. performance was substantially worse than performance in the top-scoring countries in both math and reading, was about the same as performance in the similar post-industrial countries in reading, and was worse than performance in the similar post-industrial countries in math.

We next disaggregate scores in the United States and in the six comparison countries by an approximation of the social class status of test takers, dividing them into six groups, from the least to the most advantaged. We refer to these as Group 1 (lowest social class), 2 (lower social class), 3 (lower-middle social class), 4 (upper-middle social class), 5 (higher social class), and 6 (highest social class). We also refer to Groups 1 and 2 together as disadvantaged students, to Groups 3 and 4 together as middle-class students, and to Groups 5 and 6 together as advantaged students.

There is no precise way to make social class comparisons between countries. PISA collects data on many characteristics that are arguably related to social class status, and also assembles them into an overall index. Although none of the possible indicators of social class differences is entirely satisfactory, we think one, the number of books in the home (BH), is probably superior for purposes of international test score comparisons, and we use it for our analysis. A very high fraction of students in both the PISA and TIMSS surveys answer the BH question, something less true for other important social class indicator questions asked on the student questionnaires. As we explain in greater detail below, we also examine whether other social class indicators, such as mother’s education or PISA’s overall index, in addition to BH, would produce meaningfully different results, and determine that they would not. We conclude that BH serves as a reasonable representation of social class (home) influences on students’ academic performance.

Our examination of 2009 PISA scores, disaggregated by social class group, reveals that:

In every country, students from more-advantaged social class groups outperform students from less-advantaged social class groups. The social class performance gap is large. In each country we study, the reading gap between the highest (Group 6) and the lowest (Group 1) social class groups is more than a full standard deviation. The math gap is also more than a full standard deviation in the United States and in four of the six comparison countries. In the other two, Canada and Finland, the gap is also large, almost a full standard deviation. The reading and math gaps are larger in France than in any country we studied.

The reading and math gaps are smaller in the United States than in each of the three similar post-industrial countries we studied.

The average U.S. scores in reading and math were about the same as or lower than those in the six comparison countries in considerable part because a disproportionately large share of U.S. students comes from disadvantaged social class groups, larger than in any of the six comparison countries.

If the United States had the same social class distribution as the average of the three top-scoring countries, or as the average of the three similar post-industrial countries, its average reading and math scores would have been higher than its reported averages.

Table 2A displays the share by social class group of the national samples for the United States and the six comparison countries.

Table 2A. Share of PISA 2009 sample in each social class group, by country

Social class group   Canada  Finland  Korea  France  Germany  U.K.  U.S.
Group 1 (Lowest)       9%      6%      5%     15%     12%     14%   20%
Group 2               13      11       9      17      13      16    18
Group 3               31      34      31      31      29      29    28
Group 4               21      23      23      18      19      18    16
Group 5               17      20      22      13      16      15    12
Group 6 (Highest)      9       6       9       7      10       8     6

Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2009 database for each country

Table 2B. Share of PISA 2009 sample in each social class group, for U.S., three top-scoring countries, and three similar post-industrial countries

Social class group   Avg. distribution,          Avg. distribution,                U.S.
                     three top-scoring countries three similar post-ind. countries
Group 1 (Lowest)              7%                          14%                      20%
Group 2                      11                           15                       18
Group 3                      32                           30                       28
Group 4                      22                           18                       16
Group 5                      20                           15                       12
Group 6 (Highest)             8                            8                        6

Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2009 database for each country

Table 2B summarizes the data by grouping the comparison countries in Table 2A. Column (a) shows the average distribution by social class in the three top-scoring countries, and column (b) shows the average distribution by social class in the three similar post-industrial countries.

From these tables we can see that more U.S. 15-year-olds (37 percent) are in the disadvantaged (Groups 1 and 2) social class groups than in any of the six comparison countries, and we can therefore see why comparisons that do not control for differences in social class distributions between countries may differ greatly from those that do. There are fewer U.S. students in the middle (Groups 3 and 4) social class groups than in the middle social class groups of the three similar post-industrial countries (Germany, France, and the United Kingdom), although the differences are small. Differences in the size of middle-class groups are larger when the United States is compared with the three top-scoring countries (Korea, Finland, and Canada). And in the advantaged (Groups 5 and 6) social class groups there are substantially fewer U.S. students than in these groups in all six of the comparison countries.

Any meaningful comparison of average performance should be adjusted for these differences. To clarify why, consider two countries, in both of which affluent students score higher than poor students. Country A’s most affluent (social class Group 6) students score higher than Country B’s Group 6 students. Similarly, Country A’s least advantaged (Group 1) students score higher than Country B’s Group 1 students. Yet if the proportion of poor children in Country A is higher than the proportion of poor children in Country B, the average score of all students in Country A may be lower than the average score of all students in Country B, even though both affluent and poor students in Country A achieve at higher levels than socioeconomically similar students in Country B. Such apparent anomalies are termed “composition effects.”
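The arithmetic behind a composition effect can be made concrete. The shares and scores below are invented for illustration, not drawn from PISA:

```python
# Hypothetical Country A: higher scores in BOTH groups, but a larger
# share of disadvantaged students.
shares_A = {"disadvantaged": 0.6, "advantaged": 0.4}
means_A  = {"disadvantaged": 450, "advantaged": 560}

# Hypothetical Country B: lower scores in both groups, but a smaller
# share of disadvantaged students.
shares_B = {"disadvantaged": 0.2, "advantaged": 0.8}
means_B  = {"disadvantaged": 430, "advantaged": 540}

def overall_mean(means, shares):
    """Population average as the share-weighted mean of group averages."""
    return sum(means[g] * shares[g] for g in means)

avg_A = overall_mean(means_A, shares_A)  # 0.6*450 + 0.4*560 = 494.0
avg_B = overall_mean(means_B, shares_B)  # 0.2*430 + 0.8*540 = 518.0
# Country A outperforms Country B within every group, yet trails
# by 24 points in the overall average.
```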

Before pursuing policies to address seemingly poor American student achievement in comparison to other nations, we should ask to what extent, if any, lower average U.S. performance is attributable to composition effects. In fact, a part, though small, of the apparently lower U.S. average performance is attributable to composition effects.

We can judge the importance of this composition effect by standardizing the social class distribution of the United States and the comparison countries. If we reweight the average country scores from Table 1, substituting the average social class weights of the top-scoring and similar post-industrial comparison countries from Table 2B, the country scores would be as shown in Tables 3A-D. Tables 3A and 3C show what the 2009 PISA reading and math scores, respectively, would have been if each country had an identical social class distribution to that of the average of the top-scoring countries. Tables 3B and 3D show what the 2009 PISA reading and math scores, respectively, would have been if each country had an identical social class distribution to that of the average of the similar post-industrial countries. Figures A1 and A2 (for reading) illustrate the data in Tables 3A and 3B; Figures A3 and A4 (for math) illustrate the data in Tables 3C and 3D.
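The standardization itself is a straightforward weighted average of group means. As a check on the method, this sketch uses the rounded U.S. reading means by social class group (Table 5) and the social class shares from Table 2B; because the published figures are rounded, results can differ by a point from the report's unrounded calculations:

```python
# U.S. PISA 2009 reading means by social class group, Groups 1-6 (Table 5).
us_reading = [442, 471, 504, 529, 563, 563]

# Social class shares, Groups 1-6 (Table 2B).
us_shares  = [0.20, 0.18, 0.28, 0.16, 0.12, 0.06]  # actual U.S. distribution
top_shares = [0.07, 0.11, 0.32, 0.22, 0.20, 0.08]  # top-scoring country average

def weighted_average(scores, shares):
    """Average score under a given social class distribution."""
    return sum(s * w for s, w in zip(scores, shares))

# The actual distribution reproduces the reported U.S. average of about 500;
# substituting the top-scoring distribution raises it to about 518.
print(round(weighted_average(us_reading, us_shares)))   # 500
print(round(weighted_average(us_reading, top_shares)))  # 518
```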

Table 3A. Overall average scale scores, reading, for U.S. and six comparison countries, PISA 2009 (with standardization for average social class distribution in top-scoring countries)

                           Canada  Finland  Korea  Top avg.*  France  Germany  U.K.  Post-ind. avg.*  U.S.   U.S. vs. top  U.S. vs. post-ind.
Actual average (Table 1)    524      536     539     533       496      497    494        496         500        -33              +4
Standardized average        529      536     536     534       513      508    507        510         518        -16              +9
Difference                   +5        0      -3      +1       +18      +11    +13        +14         +19

Note: "Standardized average" is the national average reading score after reweighting each country's social class group scores by the top-scoring countries' average social class distribution; "Difference" is standardized minus actual.
* Simple (unweighted) average of three countries
Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2009 database for each country

Table 3B. Overall average scale scores, reading, for U.S. and six comparison countries, PISA 2009 (with standardization for average social class distribution in similar post-industrial countries)

                           Canada  Finland  Korea  Top avg.*  France  Germany  U.K.  Post-ind. avg.*  U.S.   U.S. vs. top  U.S. vs. post-ind.
Actual average (Table 1)    524      536     539     533       496      497    494        496         500        -33              +4
Standardized average        521      527     528     525       501      496    497        498         509        -17             +11
Difference                   -4       -8     -11      -8        +5       -1     +3         +2          +9

Note: "Standardized average" is the national average reading score after reweighting each country's social class group scores by the similar post-industrial countries' average social class distribution; "Difference" is standardized minus actual.
* Simple (unweighted) average of three countries
Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2009 database for each country

Table 3C. Overall average scale scores, mathematics, for U.S. and six comparison countries, PISA 2009 (with standardization for average social class distribution in top-scoring countries)

                           Canada  Finland  Korea  Top avg.*  France  Germany  U.K.  Post-ind. avg.*  U.S.   U.S. vs. top  U.S. vs. post-ind.
Actual average (Table 1)    527      541     546     538       497      513    492        501         487        -50             -13
Standardized average        531      541     543     538       513      522    504        513         504        -34              -9
Difference                   +4        0      -3       0       +17      +10    +11        +13         +17

Note: "Standardized average" is the national average math score after reweighting each country's social class group scores by the top-scoring countries' average social class distribution; "Difference" is standardized minus actual.
* Simple (unweighted) average of three countries
Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2009 database for each country

Table 3D. Overall average scale scores, mathematics, for U.S. and six comparison countries, PISA 2009 (with standardization for average social class distribution in similar post-industrial countries)

                           Canada  Finland  Korea  Top avg.*  France  Germany  U.K.  Post-ind. avg.*  U.S.   U.S. vs. top  U.S. vs. post-ind.
Actual average (Table 1)    527      541     546     538       497      513    492        501         487        -50             -13
Standardized average        523      534     533     530       502      511    495        502         495        -35              -7
Difference                   -3       -6     -13      -8        +5       -2     +2         +2          +8

Note: "Standardized average" is the national average math score after reweighting each country's social class group scores by the similar post-industrial countries' average social class distribution; "Difference" is standardized minus actual.
* Simple (unweighted) average of three countries
Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2009 database for each country

The result of this reweighting is generally to increase scores in France and in the United States and to reduce scores in Korea. With reweighting, the U.S. average reading and math performance would still be below that of the top-scoring countries, although the U.S. deficit in reading in comparison to Canada would no longer be substantial. The U.S. average reading performance would now seem to be better than that in Germany or the United Kingdom, whereas before social class standardization the reading scores in these two countries were about the same as those in the United States.

Figure A1. Average national reading scores, actual and re-weighted using top-scoring country average social class group distribution, for U.S. and six comparison countries, PISA 2009. Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2009 database for each country.

Figure A2. Average national reading scores, actual and re-weighted using similar post-industrial country average social class group distribution, for U.S. and six comparison countries, PISA 2009. Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2009 database for each country.

Figure A3. Average national math scores, actual and re-weighted using top-scoring country average social class group distribution, for U.S. and six comparison countries, PISA 2009. Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2009 database for each country.

Figure A4. Average national math scores, actual and re-weighted using similar post-industrial country average social class group distribution, for U.S. and six comparison countries, PISA 2009. Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2009 database for each country.

Tables 3A and 3C show that if the U.S. PISA sample had the same social class weights as the average of the three top-scoring countries, and if the average performance of each social class group were the same as it was in actuality, the U.S. average reading score would not have been 500, but substantially better at 518, and the U.S. average math score would not have been 487, but better at 504.

Table 4. Scale scores by social class group for U.S. and similar post-industrial countries, PISA 2009

                            France  Germany  U.K.  U.S.
Reading
  Group 1 (Lowest)           403     413     424   442
  Group 2                    458     455     455   471
  Group 3                    498     496     490   504
  Group 4                    533     523     522   529
  Group 5                    559     555     555   563
  Group 6 (Highest)          573     551     562   563
  Gap (Group 6 – Group 1)    170     137     138   121
  Gap (Group 5 – Group 2)    101     100     100    93
Math
  Group 1 (Lowest)           413     433     435   434
  Group 2                    460     466     455   464
  Group 3                    498     509     487   491
  Group 4                    529     535     517   510
  Group 5                    562     571     547   548
  Group 6 (Highest)          569     570     551   548
  Gap (Group 6 – Group 1)    156     137     116   114
  Gap (Group 5 – Group 2)    102     104      92    84

Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2009 database for each country

Table 5. Scale scores by social class group for U.S. and top-scoring countries, PISA 2009

                            Canada  Finland  Korea  U.S.
Reading
  Group 1 (Lowest)           459     466      461   442
  Group 2                    492     495      501   471
  Group 3                    518     523      529   504
  Group 4                    543     552      546   529
  Group 5                    561     571      564   563
  Group 6 (Highest)          567     572      581   563
  Gap (Group 6 – Group 1)    108     106      119   121
  Gap (Group 5 – Group 2)     70      75       63    93
Math
  Group 1 (Lowest)           471     490      452   434
  Group 2                    493     507      504   464
  Group 3                    521     528      531   491
  Group 4                    543     552      553   510
  Group 5                    560     570      579   548
  Group 6 (Highest)          567     580      602   548
  Gap (Group 6 – Group 1)     96      90      149   114
  Gap (Group 5 – Group 2)     67      63       75    84

Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2009 database for each country

Tables 3B and 3D show that if the U.S. PISA sample had the same social class weights as the average of the three similar post-industrial countries, and if the average performance of each social class group were the same as it was in actuality, the U.S. average reading score would not have been 500, but better at 509, and the U.S. average math score would not have been 487, but better at 495.

Tables 3A and 3B show that, in reading, if all countries in our study had the same social class composition as the average social class composition of the three top-scoring countries, or had the same social class composition as the average social class composition of the three similar post-industrial countries, the positive test score gap between the top-scoring countries and the United States would be cut in half, and the positive test score gap between the United States and similar post-industrial countries would at least double to become meaningful.

Tables 3C and 3D show that, in math, if all countries in our study had the same social class composition as the average social class composition of the three top-scoring countries, or had the same social class composition as the average social class composition of the three similar post-industrial countries, the positive test score gap between the top-scoring countries and the United States would be cut by a third or more, and the positive test score gap between the similar post-industrial countries and the United States would also be cut by a third or more.

Tables 3A-D show how the U.S. average PISA reading and math scores might improve if the United States had the more favorable social class distributions of similar post-industrial countries. In Appendix A, we perform an opposite exercise, showing how much the scores of other countries might decline if they had the less favorable social class distribution of the United States. There is no single correct way to standardize scores by social class distribution. Other weighting methods generate somewhat different results, but the pattern is the same. Because of this distortion of average scores from social class composition, for the balance of this report, we focus on scores by social class group, not on average national scores.

Table 4 displays the 2009 reading and math scores for the United States and three similar post-industrial countries, disaggregated by comparable social class groups in each country.

In reading, in comparison to students in the three similar post-industrial countries, U.S. students from the lowest (Group 1) social class group scored substantially better than comparable social class students in each of the three similar post-industrial countries. U.S. students from the lower (Group 2) social class group performed better than comparable social class students in each of the three similar post-industrial countries. U.S. students in the lower-middle (Group 3) social class group performed better than comparable social class students in Germany and in the United Kingdom, and about the same as comparable social class students in France. U.S. students in the upper middle (Group 4) social class group performed about the same as comparable social class students in the three similar post-industrial countries. U.S. students in the higher (Group 5) social class group performed better than comparable social class students in Germany and in the United Kingdom, and about the same as comparable social class students in France. U.S. students in the highest (Group 6) social class group performed about the same as comparable social class students in the United Kingdom, better than comparable social class students in Germany, and worse than comparable social class students in France.

Tables 3A-B showed that the U.S. average reading score was higher than reported when social class distribution was controlled for. Table 4 shows that, in reading, U.S. students performed as well or better than students in the three similar post-industrial countries at every social class level. The only exception is students in France in the highest (Group 6) social class group, who performed better in reading than students in the United States.

In math, in comparison to students in the three similar post-industrial countries, U.S. students from the lowest (Group 1) social class group performed substantially better than comparable social class students in France and about the same as comparable social class students in Germany and the United Kingdom. U.S. students from the lower (Group 2) social class group performed about the same as comparable social class students in France and Germany and better than comparable social class students in the United Kingdom. In all other (Groups 3-6) social class groups, U.S. students performed substantially worse than comparable social class students in Germany, and about the same as comparable social class students in the United Kingdom. U.S. students in the upper-middle (Group 4) and highest (Group 6) social class groups performed substantially worse than comparable social class students in France, and U.S. students in the higher (Group 5) social class group performed worse than comparable social class students in France.

Unlike in reading, however, in math U.S. students underperformed students from middle and advantaged (Groups 3-6) social class groups in France and Germany, and mostly performed about the same as students from similar social class groups in the United Kingdom. Only in a comparison with the lowest (Group 1) social class students in France were comparable social class U.S. students substantially superior in math performance.

Table 4 also displays the test score gradient (commonly referred to as the “achievement gap”), measured in two ways: the gap in average scores between students in Group 1 and students in Group 6, and the gap in average scores between students in Group 2 and students in Group 5.
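Both gap measures are simple differences of group means. As a check, a short sketch using the U.S. math column of Table 4 (Groups 1 through 6); because the published group means are rounded, a recomputed gap can occasionally differ from the table by a point:

```python
# U.S. PISA 2009 math means by social class group, Groups 1-6 (Table 4).
us_math = {1: 434, 2: 464, 3: 491, 4: 510, 5: 548, 6: 548}

def gradients(by_group):
    """Return the two achievement-gap measures used in Tables 4 and 5:
    (Group 6 minus Group 1, Group 5 minus Group 2)."""
    return by_group[6] - by_group[1], by_group[5] - by_group[2]

gap_6_1, gap_5_2 = gradients(us_math)
print(gap_6_1, gap_5_2)  # 114 84, matching the U.S. math gaps in Table 4
```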

In reading, the Group 1/Group 6 achievement gap is smaller in the United States than in the three similar post-industrial countries, and much smaller than in France. The Group 2/Group 5 reading achievement gap is smaller in the United States than in France or the United Kingdom. In math, the Group 1/Group 6 achievement gap is smaller in the United States than in France or Germany, and about the same as in the United Kingdom. The Group 2/Group 5 math achievement gap is smaller in the United States than in each of the similar post-industrial countries.

Careful examination of these gradients, however, should serve as a warning to be cautious about interpretation of “achievement gaps,” the subject of frequent policy comment in the United States. One interpretation of these gradients, mostly larger in the similar post-industrial countries than in the United States, suggests that social class has a bigger impact on reading and math performance in the similar post-industrial countries than it does in the United States. Perhaps this is because the United States has a more equal school system than have the similar post-industrial countries, or because non-school social class characteristics have a bigger impact in the similar post-industrial countries than they do in the United States. Either of these explanations is at variance with commonplace assumptions in U.S. policy discussion. This finding is especially noteworthy because income inequality is probably larger in the United States than in the similar post-industrial countries.

However, having a more equal school system is not necessarily the same as having a superior school system. Consider the Group 2/Group 5 gradients for the United States and France: In reading, the U.S. gap is smaller than the gap in France. This is attributable to the United States having higher reading achievement in Group 2 and about the same reading achievement in Group 5. This seems to be a desirable relative (to France) outcome for the United States. But in math, the smaller U.S. gap is attributable to Group 2 mathematics achievement that is about the same in the two countries, with Group 5 mathematics achievement that is lower in the United States than in France. Generating a smaller gap by having lower achievement in the higher social class group is probably not a result most policymakers would seek.

The U.S.-Germany reading gradient comparison is even more favorable to the United States than the U.S.-France gradient comparison, with U.S. achievement higher both for Group 2 and Group 5 students. Because the Group 2 U.S. superiority is greater than the Group 5 superiority, the U.S. gap is smaller. This is a desirable result.

But in math, the smaller U.S. gap relative to the German gap is attributable to Group 2 scores that are about the same in the two countries while Group 5 scores are substantially lower in the United States than in Germany. Although the United States has a smaller achievement gap, this is not a desirable result.

Comparing the U.S. and U.K. gradients, in reading the result is similar to that in the German comparison—desirable for the United States, because U.S. Group 2 achievement is higher than that in the United Kingdom, while U.S. Group 5 achievement is also higher than in the United Kingdom, but not as much so. In math, U.S. achievement in Group 2 is higher than that in the United Kingdom, while Group 5 achievement in the two countries is about the same. This, too, is a desirable result for the United States, but not as desirable as it would be if Group 5 achievement were higher as well.

Table 5 displays the 2009 reading and math scores for the United States and three top-scoring countries, disaggregated by comparable social class groups in each country.

In reading, disadvantaged (Groups 1 and 2) students in the U.S. score substantially worse than comparable students in the three top-scoring countries, the only exception being the lowest (Group 1) social class students, where U.S. students score worse but not substantially worse than their social class counterparts in Canada. Likewise for middle (Groups 3 and 4) social class students: U.S. students score worse than comparable students in Canada and substantially worse than comparable students in Finland and Korea. Higher (Group 5) social class students in the United States score about the same as comparable social class students in the three top-scoring countries, while the highest (Group 6) social class students in the United States score worse than comparable social class students in Finland and Korea and about the same as comparable social class students in Canada.

In comparing the United States and the three top-scoring countries in math, the picture is consistent across all social class groups and countries: U.S. students score substantially worse than comparable students in each social class group in the three top-scoring countries, the exception being that U.S. higher social class (Group 5) students score worse than comparable social class students in Canada.

Table 5 also displays the test score gradients between advantaged and disadvantaged students in the United States and the top-scoring countries.

Unlike the gradients in the similar post-industrial countries, the gradients in the top-scoring countries are generally smaller than those in the United States. In reading, the Group 6/Group 1 gap is smaller in Canada and in Finland than in the United States and about the same in Korea as in the United States. The Group 5/Group 2 reading gradient is smaller in Finland than in the United States and much smaller in Canada and Korea than in the United States.

In math, the Group 6/Group 1 gradient is much smaller in Canada and Finland than in the United States, as is the Group 5/Group 2 math gradient in Finland. The Group 5/Group 2 math gradients in Canada and Korea are smaller than in the United States.

What stands out most, however, is the unusually large gap in achievement between Korean students in Group 6 and those in Group 1. This gradient of 149 scale points is larger than in any other comparison we have made, and it results both from the unusually low relative math performance of Korean students in Group 1 and from the unusually high relative math performance of Korean students in Group 6. Although the lowest (Group 1) social class students in Korea score substantially better than similar social class students in the United States, the Korean advantage is much more pronounced at the highest (Group 6) social class level.

We cannot say whether this Korea–United States difference is attributable to the United States having a more equal school system than Korea's, or to non-school characteristics of the highest social class students having a bigger positive impact on students in Korea than on students in the United States. For example, the widely reported access to out-of-school tutoring may have an unusually large impact on the highest social class students in Korea.

The comparisons described in this part of the report show that, to some extent, the widely reported disparity between the performance of U.S. students and that of comparable countries’ students on the PISA is attributable to the U.S. sample of test takers being more heavily weighted toward disadvantaged students than the samples of comparable countries. Although adjustment for these social class differences does not eliminate the gap between the performance of United States and top-scoring country students, it narrows the gap. And relative to the performance of students in similar post-industrial countries, the performance of U.S. students in many cases no longer seems deficient once social class composition is taken into account.

In this connection, we note here, but reserve for detailed discussion in Part IV, an apparent flaw in the 2009 U.S. PISA sampling methodology. Although the U.S. sample included disadvantaged students in appropriate proportion to their actual representation in the U.S. 15-year-old population, it included a disproportionate number of disadvantaged students enrolled in schools with unusually large concentrations of such students. Because students from low social class families perform more poorly, even after controlling for their own social class status, when they attend schools with large concentrations of similarly disadvantaged students, this sampling flaw probably reduced the reported average score of students in the bottom social class groups (perhaps Groups 1-3). With available data, we cannot say to what extent this occurred. We do conclude, however, that this distortion probably depressed the reported average scores of U.S. students beyond the composition effect discussed in this section, artificially reducing the reported U.S. average score and its international ranking.

A consistent pattern in the 2009 PISA scores is the better performance of U.S. students on the reading than on the math test, relative to the comparison countries. Table 6 displays this pattern.

Table 6 Reading vs. math, U.S. compared with other countries, PISA 2009

U.S. versus:        Canada   Finland   Korea   France   Germany   U.K.
Group 1 (Lowest)      19       32       -1       18       28       19
Group 2                8       18        9        9       18        7
Group 3               17       19       15       14       27       11
Group 4               19       18       26       14       32       15
Group 5               14       15       30       18       32        7
Group 6 (Highest)     15       22       36       11       35        4

Note: Numbers in this table are the reading gap less the math gap for each social class group. The reading (math) gap is the U.S. average reading (math) score for a given social class group less the comparison country's reading (math) score for that social class group.
Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2009 database for each country

For each social class group in each comparison country, the table shows the difference between the reading gap and the math gap for a U.S. comparison. For example, for the lowest (Group 1) social class, the Canada–U.S. reading gap is 17 scale points (from Table 5, the U.S. Group 1 reading score is 442 and the Canadian Group 1 reading score is 459). The Canada–U.S. math gap is 37 scale points (from Table 5, the U.S. Group 1 math score is 434 and the Canadian Group 1 math score is 471). The difference between the reading gap of 17 scale points and the math gap of 37 scale points corresponds to the 19 scale points shown in Table 6 for Group 1, Canada (the rounded scores imply a difference of 20; the published entry of 19 presumably reflects computation from unrounded scores). Wherever a positive number appears in Table 6, the reading gap is smaller than the math gap. Note that a positive number does not necessarily signify that U.S. students perform better in reading than students in the same social class group in a comparison country; it may mean that, or it may simply mean that the U.S. comparative deficit is smaller in reading than in math for that particular social class group and country.
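The arithmetic behind each Table 6 entry can be sketched in a few lines of Python. The scores below are the rounded Group 1 values quoted in the text; the function and its names are our own illustrative construction.

```python
# Table 6 entry for a given social class group and comparison country:
#   (U.S. reading - comparison reading) - (U.S. math - comparison math)

def reading_minus_math_gap(us_read, c_read, us_math, c_math):
    """Reading gap less math gap. A positive result means the U.S.
    position relative to the comparison country is more favorable in
    reading than in math for that social class group."""
    return (us_read - c_read) - (us_math - c_math)

# U.S. vs. Canada, Group 1 (lowest social class), PISA 2009 rounded scores
print(reading_minus_math_gap(us_read=442, c_read=459, us_math=434, c_math=471))
# prints 20; Table 6 reports 19, presumably because the published table
# is computed from unrounded scores
```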

Table 6 shows that, on average and for almost every social class group, U.S. students do relatively better in reading than in math, compared to students in both the top-scoring and the similar post-industrial countries. The only exceptions are social class Group 1 in Korea and social class Groups 2, 5, and 6 in the United Kingdom; in these four cases, the reading and math gaps are about the same. In all other comparisons (each social class group in each of the six comparison countries), the United States does relatively better in reading than in math, either because the U.S. reading advantage over the comparison group exceeds any U.S. math advantage, or because the U.S. reading deficit is smaller than the U.S. math deficit.

Part III. PISA trends from 2000 to 2009

Data are now available for four administrations of PISA: 2000, 2003, 2006, and 2009. Score trends over this decade may seem surprising. We would ordinarily expect instruction to become more difficult as the concentration of disadvantaged students increases. Yet while the social class composition of the national PISA sample deteriorated more in the United States than in any other country, disadvantaged U.S. students nonetheless saw their scores improve, while scores of similarly disadvantaged students in countries to which the United States is frequently compared declined. PISA reported that the U.S. average reading score was about the same in 2009 as in 2000, but if U.S. social class composition had not deteriorated, the average U.S. reading score would have improved from 2000 to 2009. PISA reported that the U.S. average math score was worse in 2009 than in 2000, but this decline was entirely attributable to deteriorating social class composition. Had this deterioration not occurred, U.S. average math performance would have been about the same in 2009 as in 2000.

The test score gaps between disadvantaged students in the United States and in top-scoring countries generally narrowed, but the gaps between advantaged students in the United States and in these top-scoring countries widened in some cases. In comparison to similar post-industrial countries, the United States also narrowed the gap more at the bottom than at the top, and in some cases ended the decade with clear superiority over similar social class groups toward the bottom of the scale. This section explores these findings in greater detail.

Score trends over time are as important for policy purposes as score levels at the current time. We want to know not only in which countries adolescents perform better than in other countries, but also whether there are socioeconomic factors or educational policies and practices that are causing a country’s performance to improve or deteriorate. If one country has lower 2009 PISA scores than another, but if scores in the lower-scoring country have been improving over the previous decade while scores in the higher-scoring country have been declining, policymakers in the lower-scoring country might be ill-advised to look exclusively to the higher-scoring country for model school improvement policies. At the very least, policymakers should attempt to understand why the higher-scoring country’s superior achievement appears, at least to some extent, to be unsustainable.

PISA has been administered every three years since 2000, and the multiple years of data provide policymakers an opportunity to make more useful judgments than would be allowed by a single year of data. Unfortunately, there are no U.S. reading data for 2006 because of an error in test administration. Thus, we can look at changes in U.S. students’ math performance on PISA from 2000 to 2003, to 2006, and to 2009, but at reading performance only from 2000 to 2003 and then to 2009.

Students who were 15 years old and took the PISA in 2000 would have been affected by their families’ social, economic, and community environments beginning in about 1985, and would have entered school in about 1990. PISA score changes from 2000 to 2003 could have been influenced by socioeconomic or instructional or other educational changes that took place anywhere from the mid-1980s to 2003. Likewise, PISA score changes from 2003 to 2006 could have been influenced by socioeconomic or instructional changes that took place anywhere from the late 1980s to 2006. And PISA score changes from 2006 to 2009 could have been influenced by socioeconomic or instructional changes that took place anywhere from the mid-1990s to 2009.

In this report, we are unable to attribute causes to trends in scores; we can only describe them. We review trends in reading and math for the United States and each of the six comparison countries in the discussion and tables that follow.

As was the case when we examined comparative score levels in 2009, our main conclusion from this review is that there are few consistent patterns in these score trends that can be used to inspire policy. Simplistic judgments based on selective or overly generalized data can (and do) mask critical aspects of U.S. relative performance, and they can support policy changes that can undermine U.S. sources of strength and exacerbate U.S. sources of weakness.

As in the previous section of this report, we focus on trends by social class group, because changes over time in the composition of a country’s test takers by social class can affect a country’s average score while masking real changes (or lack of change) in the performance of that country’s students. Composition effects can distort changes over time as well as comparisons between countries at a given time.

In fact, the proportion of students sampled in different social class groups from 2000 to 2009 in the United States and in the six comparison countries has changed, and these shifts influence changes in the overall average score of each country over time.

Understanding these changes is made somewhat more difficult by the fact that PISA modified its books-in-the-home (BH) group definitions after the 2000 assessment. Table 7 displays these changed definitions.

Table 7 PISA group definitions by books in the home

              Number of books in home
              2000         2003 and after
Group 0       0            –
Group 1       1–10         0–10
Group 2       11–50        11–25
Group 3       51–100       26–100
Group 4       101–250      101–200
Group 5       251–500      201–500
Group 6       >500         >500

Source: OECD Program for International Student Assessment (PISA) 2000, 2003, 2006, and 2009 databases

We can make some comparisons of social class distributions of test takers in 2000 and 2009 because four categories are consistent over this period: a combination of Groups 0 and 1, which includes test takers from homes with 10 books or fewer; a combination of Groups 2 and 3, which includes test takers from homes with 11 to 100 books; a combination of Groups 4 and 5, which includes test takers from homes with 101 to 500 books; and Group 6, which includes test takers from homes with more than 500 books.
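This regrouping can be expressed as a simple mapping. The band labels and the `collapse` helper below are our own illustrative constructions, and the sample shares in the example are invented.

```python
# Map PISA books-in-the-home (BH) groups onto the four bands that are
# comparable across the 2000 and 2003-and-after category definitions.

BANDS_2000 = {
    0: "0-10 books", 1: "0-10 books",        # Groups 0 and 1 combined
    2: "11-100 books", 3: "11-100 books",    # 11-50 plus 51-100 books
    4: "101-500 books", 5: "101-500 books",  # 101-250 plus 251-500 books
    6: ">500 books",
}

BANDS_2003 = {
    1: "0-10 books",
    2: "11-100 books", 3: "11-100 books",    # 11-25 plus 26-100 books
    4: "101-500 books", 5: "101-500 books",  # 101-200 plus 201-500 books
    6: ">500 books",
}

def collapse(shares, mapping):
    """Aggregate per-group sample shares into the comparable bands."""
    out = {}
    for group, share in shares.items():
        band = mapping[group]
        out[band] = out.get(band, 0.0) + share
    return out

# Invented illustration: hypothetical 2000 sample shares by BH group.
shares_2000 = {0: 0.02, 1: 0.08, 2: 0.20, 3: 0.25, 4: 0.25, 5: 0.12, 6: 0.08}
print(collapse(shares_2000, BANDS_2000))
```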

Table 8A shows how the distribution of test takers by these four books-in-the-home categories in each of the seven countries changed from 2000 to 2009.

Table 8A Changes in PISA sample social class composition by books in the home, U.S. and six comparison countries, 2000–2009 (percentage points)

                 Canada   Finland   Korea   France   Germany   U.K.   U.S.
0–10 books         +2       -2       -3       +3       +4       +5     +7
11–100 books       +6       -4       -1       +3        0       +3     +5
101–500 books      -5       +6       +2       -5       -2       -3     -7
>500 books         -4        0       +2       -1       -2       -5     -4

Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2000 and 2009 databases for each country

The table shows that the share of students whose homes had the fewest (0-10) books declined in Finland and Korea but increased in Canada, France, Germany, the United Kingdom, and, most of all, the United States. The share of students from homes with only 11-100 books also increased in the United States and Canada. Correspondingly, the share of students whose homes had more than 100 books increased in Finland and Korea but declined everywhere else, with the largest decline in the United States. By these measures of change in the sample proportions of students from homes with fewer and more books, U.S. students' average social class declined more than that of any comparison country from 2000 to 2009, with the United Kingdom a close second. Finland's and Korea's average social class increased.

Because the BH categories remained consistent from 2003 onward, Table 8B shows how the distribution of test takers by social class in these countries changed from 2003 to 2009.

Table 8B Changes in PISA sample social class composition by books-in-the-home group, U.S. and six comparison countries, 2003–2009 (percentage points)

                                Canada   Finland   Korea   France   Germany   U.K.   U.S.
Group 1 (Lowest)                  +2       +1       -1       +6       +5       +5     +7
Group 2                           +2       -2       -2       +1        0       +2     +2
Group 3                           +1       -3       -2       -3       -1       -1     -3
Group 4                           -1       +1       -2       -2       -2        0     -3
Group 5                           -1       +4       +4        0       -1       -3     -2
Group 6 (Highest)                 -3        0       +3       -1       -2       -2     -2
Disadvantaged (Groups 1 and 2)    +4       -1       -3       +6       +5       +7    +10
Middle class (Groups 3 and 4)      0       -2       -4       -5       -3       -1     -6
Advantaged (Groups 5 and 6)       -4       +3       +6       -1       -3       -5     -4

Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2000 and 2009 databases for each country, with authors' interpolations for 2000 social class composition to match 2009 books-in-the-home groupings

We can see from Table 8B that, during the six-year period 2003–2009, the average social class of the test-taking samples in Canada, in the three similar post-industrial countries (France, Germany, and the United Kingdom), and in the United States declined, with the U.S. decline larger than in any of the comparison countries.

Because of such social class compositional changes, comparisons of test score trends over time by social class group provide more useful information to policymakers than comparisons of total average test scores at one point in time or even of changes in total average test scores over time.

For reading and math, we examine trends in the United States by BH categories compared to the six comparison countries for the 2000 to 2009 period. The paths by which performance changed from 2000 to 2009 varied by country, so an investigation of why these 2000 to 2009 changes occurred in specific countries should also examine disaggregated scores. For the United States, because no data are available for reading in 2006, such an investigation should disaggregate the reading trends by examining the 2000 to 2003 and 2003 to 2009 periods separately. For mathematics, a similar investigation would be appropriate, with the addition of disaggregating trends for the 2003 to 2006 and 2006 to 2009 periods.

In the next series of tables, we show how, for each social class group, PISA achievement in reading and math changed in the United States and in each of the comparison countries from 2000 to 2009. Because, as noted above, PISA changed its books-in-the-home categories in 2003, social class groups in 2000 do not exactly match the categories in 2009. Thus, to make an estimate of average social class group score changes from 2000 to 2009, we interpolate average scores for books-in-the-home categories in 2000 in order to create average test scores by social class groups that are comparable to those in 2009. We use these estimates to calculate test score differences by social class groups from 2000 to 2009.
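The interpolation step can be illustrated with a minimal sketch. The report does not publish its exact procedure, so the weighted-average allocation rule and every number below are our own assumptions, purely for illustration.

```python
# Hypothetical sketch: estimate a 2000 mean score for a 2009-style BH
# group as the weighted mean of the 2000 groups it spans. All inputs
# here are invented; the actual report's procedure may differ.

def blended_mean(parts):
    """parts: (mean_score, sample_share) pairs for the 2000 BH groups
    overlapping the target 2009-style group; returns the weighted mean."""
    total = sum(share for _, share in parts)
    return sum(mean * share for mean, share in parts) / total

# 2009-style Group 1 spans 0-10 books, i.e. 2000 Group 0 (0 books) and
# Group 1 (1-10 books). Invented inputs: means 400 and 425, shares 3% and 9%.
estimate = blended_mean([(400, 0.03), (425, 0.09)])
print(round(estimate, 2))
```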

Reading, 2000–2009

Table 9A displays how reading achievement changed from 2000 to 2009 in the United States and the three similar post-industrial countries.

Table 9A Reading score changes, scale scores by social class group for U.S. and similar post-industrial countries, PISA 2000–2009

                      France               Germany              U.K.                 U.S.
                   2000  2009  Change   2000  2009  Change   2000  2009  Change   2000  2009  Change
Group 1 (Lowest)    430   403    -27     361   413     52     440   424    -17     418   442     23
Group 2             464   458     -7     404   455     52     470   455    -16     455   471     15
Group 3             503   498     -5     465   496     31     508   490    -19     499   504      5
Group 4             526   533      8     502   523     21     539   522    -17     528   529      1
Group 5             553   559      6     536   555     19     565   555    -10     556   563      7
Group 6 (Highest)   548   573     26     549   551      1     577   562    -15     560   563      3
National average reading score
                    505   496     -9     484   497     13     523   494    -29     504   500     -5

Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2000 database, with authors' interpolations of average test scores; Tables 1 and 4 for 2009 data

Table 9B displays data on how reading gaps between U.S. students and comparable social class students in the three similar post-industrial countries changed from 2000 to 2009. (Positive numbers describe gains for U.S. performance relative to the performance of comparison countries. Negative numbers describe deteriorated U.S. performance relative to that of comparison countries.)

Table 9B Reading score gap changes, U.S. vs. similar post-industrial countries, PISA 2000–2009

Gap changes, U.S. versus:   France   Germany   U.K.
Group 1 (Lowest)              +50      -29      +40
Group 2                       +22      -36      +31
Group 3                       +10      -26      +24
Group 4                        -7      -19      +18
Group 5                        +1      -12      +17
Group 6 (Highest)             -23       +1      +18

Note: Numbers in this table take the 2009 U.S. average score for a social class group, less the 2009 comparison country's average score for the same social class group, and subtract from this result the 2000 U.S. average score for that social class group, less the 2000 comparison country's average score for the same social class group.
Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2000 database, with authors' interpolations of average test scores; Tables 1 and 4 for 2009 data
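The formula in Table 9B's note can be sketched as follows, using the rounded Group 1 reading scores for France from Table 9A. The function is our own construction; the result of 51 differs slightly from the published +50, presumably because the report computes from unrounded scores.

```python
# Table 9B entry: (2009 U.S. score - 2009 comparison score)
#           minus (2000 U.S. score - 2000 comparison score).
# Positive = the U.S. gained ground on the comparison country.

def gap_change(us_2009, c_2009, us_2000, c_2000):
    return (us_2009 - c_2009) - (us_2000 - c_2000)

# U.S. vs. France, Group 1 reading, using the rounded scores in Table 9A
print(gap_change(us_2009=442, c_2009=403, us_2000=418, c_2000=430))
# prints 51; Table 9B reports +50, presumably because the published
# table is computed from unrounded scores
```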

Considering the full 2000 to 2009 period, U.S. reading scores improved for disadvantaged social class (Groups 1-2) students, including a substantial improvement for the lowest social class (Group 1); U.S. reading scores were about the same for middle-class and advantaged social class (Groups 3-6) students.

Considering trends in the three similar post-industrial countries in the full 2000 to 2009 period:

In France, reading scores declined substantially for the lowest social class (Group 1) students, improved for upper-middle social class (Group 4) students, improved substantially for the highest social class (Group 6) students, and were mostly unchanged for lower-middle and higher social class (Groups 3 and 5) students. Thus, whereas in 2000 U.S. disadvantaged social class (Groups 1-2) students performed below comparable French students, in 2009 these students in the United States performed better than disadvantaged students in France and, in the case of the lowest social class (Group 1) students, substantially better. Whereas in 2000 the highest social class (Group 6) students in the United States performed better than comparable French students, in 2009 they performed worse. Middle and higher social class students (Groups 3-5) in the United States and France performed at about the same level in both years.

In Germany, reading scores were mostly unchanged from 2000 to 2009 for the highest social class (Group 6) students but improved substantially for other social class (Groups 1-5) students. There were extraordinarily large gains—half a standard deviation—for disadvantaged social class group (Groups 1-2) students. Thus, although U.S. students still had higher reading scores than German students in each social class group in 2009 (except for upper-middle social class [Group 4] students, who scored about the same in the two countries in 2009), and although the lowest social class (Group 1) students in the United States continued to perform substantially better than comparable German students, German students closed the gap in all social class groups (except for Group 6) from 2000 to 2009.

In the United Kingdom, reading scores declined in every social class group, with substantial declines for lower-middle social class (Group 3) students. Thus, whereas in 2000 U.S. students performed worse than U.K. students in each social class group, by 2009 the lowest social class (Group 1) students in the United States performed substantially better than comparable students in the United Kingdom, and lower, lower-middle, and higher social class (Groups 2, 3, and 5) students in the United States performed better than comparable social class students in the United Kingdom. Upper-middle and the highest social class (Groups 4 and 6) students in the United States performed about the same in 2009 as comparable social class students in the United Kingdom.

Table 10A displays how reading achievement changed from 2000 to 2009 in the United States and the three top-scoring countries.

Table 10A Reading score changes, scale scores by social class group for U.S. and top-scoring countries, PISA 2000–2009

                      Canada               Finland              Korea                U.S.
                   2000  2009  Change   2000  2009  Change   2000  2009  Change   2000  2009  Change
Group 1 (Lowest)    467   459     -8     497   466    -31     464   461     -3     418   442     23
Group 2             490   492      1     514   495    -19     490   501     11     455   471     15
Group 3             522   518     -5     534   523    -11     518   529     11     499   504      5
Group 4             542   543      1     558   552     -6     532   546     14     528   529      1
Group 5             560   561      1     575   571     -4     546   564     19     556   563      7
Group 6 (Highest)   563   567      4     581   572     -9     556   581     25     560   563      3
National average reading score
                    534   524    -10     546   536    -11     525   539     15     504   500     -5

Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2000 database, with authors' interpolations of average test scores; Tables 1 and 5 for 2009 data

Table 10B displays the data on how reading gaps between U.S. students and comparable social class students in the top-scoring countries changed from 2000 to 2009. (Positive numbers describe gains for U.S. performance relative to the performance of comparison countries. Negative numbers describe deteriorated U.S. performance relative to that of comparison countries.)

Table 10B Reading score gap changes, U.S. vs. top-scoring countries, PISA 2000–2009

Gap changes, U.S. versus:   Canada   Finland   Korea
Group 1 (Lowest)              +31      +54      +26
Group 2                       +14      +34       +4
Group 3                       +10      +16       -6
Group 4                        +0       +7      -13
Group 5                        +6      +12      -11
Group 6 (Highest)              -1      +11      -22

Note: Numbers in this table take the 2009 U.S. average score for a social class group, less the 2009 comparison country's average score for the same social class group, and subtract from this result the 2000 U.S. average score for that social class group, less the 2000 comparison country's average score for the same social class group.
Source: Table 10A

Considering trends in the three top-scoring countries in the full 2000 to 2009 period:

In Canada, reading scores declined for the lowest social class (Group 1) students, and were mostly unchanged for all others (Groups 2-6). Thus, while the lowest social class (Group 1) students in the United States still performed below comparable social class students in Canada, the gap between these U.S. and Canadian students was cut by two-thirds during this period. Gaps were also narrowed for lower- and lower-middle-class (Groups 2 and 3) students, while for upper-middle and advantaged social class (Groups 4-6) students, the gap was mostly unchanged from 2000 to 2009.

In Finland, reading scores declined for disadvantaged, lower-middle, and the highest social class (Groups 1-3 and 6) students, with substantial declines for disadvantaged social class (Groups 1-2) students. Reading scores for upper-middle and higher social class (Group 4 and 5) students were about the same in both years. U.S. disadvantaged and middle social class (Groups 1-4) students still scored substantially below comparable students in Finland in 2009. The highest social class (Group 6) students also scored below comparable students in Finland, but higher social class (Group 5) students now scored about the same in the United States and Finland. The U.S.-Finland reading gap was cut by about two-thirds for disadvantaged social class (Groups 1 and 2) students, was cut in half for lower-middle and advantaged social class (Groups 3, 5, and 6) students, and by about a third for upper-middle social class (Group 4) students from 2000 to 2009.

In Korea, reading scores improved for lower and middle social class (Groups 2-4) students and improved substantially for advantaged social class (Groups 5-6) students. Korean reading scores remained the same for the lowest social class (Group 1) students. U.S. lowest social class (Group 1) students substantially narrowed (but did not eliminate) their negative performance gap relative to comparable students in Korea, but the U.S. negative performance gap grew for upper-middle and advantaged social class (Groups 4-6) students, with substantial growth in this gap for the highest social class (Group 6) students. While U.S. higher social class (Group 5) students outperformed comparable social class students in Korea in 2000, by 2009 this social class group performed about the same in the two countries.

Thus, although U.S. students still scored below each of the three top-scoring countries in reading in almost all social class groups, U.S. students narrowed the gap in many groups from 2000 to 2009. Of particular note is the substantial gap closing between the lowest social class (Group 1) students in the United States and in each top-scoring country.

The 2000 to 2009 trends just described are not always linear; indeed, they rarely are. For each social class group in each country, performance may have risen and then fallen during the period, making the causes of these trends even more difficult to understand. Figures B1 and B2 illustrate reading trends in the United States and the six comparison countries from 2000 to 2003 to 2006 to 2009.

To make the figures easier to understand, we display trends for disadvantaged social class (Groups 1 and 2) students and advantaged social class (Groups 5 and 6) students only. As the previous discussion has made clear, it would not be accurate to assume that the trends for middle social class (Groups 3 and 4) students, not shown, in each case parallel the trends for advantaged and disadvantaged students.

Before reasonable policy conclusions can be based on PISA reading score trends from 2000 to 2009, we should attempt to understand why, in the lowest (Group 1) social class group, reading scores improved substantially for U.S. and German students but declined for U.K. and Canadian students and declined substantially for students in France and Finland. Likewise, we should attempt to understand why reading scores for U.S., German, and Canadian students in the highest (Group 6) social class group were unchanged but improved substantially for comparable social class students in France and Korea and declined for students in Finland and the United Kingdom. We should understand why there was a collapse in reading performance across all social class groups in the United Kingdom, and we should understand why in Korea there was improvement for upper-middle and advantaged social class (Groups 4-6) students only. We are not aware of differing socioeconomic trends or changes in instructional or educational policies that can help to explain these disparate reading results, and so are not persuaded by policymakers who draw conclusions from these test score trends. Simple and seemingly obvious explanations cannot account for these complex results. If curricular or instructional changes are responsible, why should they have affected different social class groups within a country differently? If (in the case of Finland, for example) immigration of less literate families explains the drop in Group 1 scores, how does this explain why Group 6 scores fell as well?

Figure B1 Reading scores, by social class group, U.S. compared with similar post-industrial countries, PISA 2000–2009
Note: U.S. data for 2006 are unavailable and therefore linearly interpolated.
Source: Authors' analysis of PISA 2000, 2003, 2006, and 2009 databases; authors' calculations of mean test scores by books in the home (BH)

Figure B2 Reading scores, by social class group, U.S. compared with top-scoring countries, PISA 2000–2009
Note: U.S. data for 2006 are unavailable and therefore linearly interpolated.
Source: Authors' analysis of PISA 2000, 2003, 2006, and 2009 databases; authors' calculations of mean test scores by books in the home (BH)

As noted above, socioeconomic, instructional, or educational changes anywhere from 1985 (the birth year of students taking PISA in 2000) through 2009 could help explain these changes in performance of 15-year-olds over the nine years from 2000 to 2009. Complicating matters further, average scores for countries, or for separate social class groups, did not trend in a straight line from 2000 to 2009. In some cases an overall increase was the consequence of a drop during one interim period but a larger gain in another. Attempting to explain changes in performance over interim periods, however, would be even more difficult than attempting to explain them over the full nine years.

Mathematics, 2000–2009

Table 11A displays how math achievement changed from 2000 to 2009 in the United States and the three similar post-industrial countries.

Table 11A Mathematics score changes, scale scores by social class group for U.S. and similar post-industrial countries, PISA 2000–2009

                             France              Germany               U.K.                 U.S.
                        2000  2009  Change   2000  2009  Change   2000  2009  Change   2000  2009  Change
Group 1 (Lowest)         458   413    -45     381   433    +52     458   435    -23     416   434    +18
Group 2                  484   460    -24     418   466    +49     483   455    -29     446   464    +17
Group 3                  517   498    -19     471   509    +39     519   487    -32     490   491     +1
Group 4                  537   529     -8     500   535    +36     540   517    -23     510   510      0
Group 5                  558   562     +4     537   571    +34     563   547    -16     543   548     +4
Group 6 (Highest)        544   569    +24     550   570    +20     579   551    -29     554   548     -6
National average
math score               517   497    -20     490   513    +23     529   492    -37     493   487     -6

Source: Authors' analysis of OECD Program for International Student Assessment (PISA) 2000 database, with authors' interpolations of average test scores, and Tables 1 and 4 for 2009 data

Table 11B displays how the math gaps between U.S. students and comparable social class students in the three similar post-industrial countries changed from 2000 to 2009. (Positive numbers indicate that U.S. performance improved relative to the comparison country; negative numbers indicate that U.S. performance deteriorated relative to the comparison country.)

Table 11B Math score gap changes, U.S. vs. similar post-industrial countries, PISA 2000–2009

Gap changes, U.S. versus:   France   Germany   U.K.
Group 1 (Lowest)              +63      -35     +41
Group 2                       +41      -31     +46
Group 3                       +20      -38     +33
Group 4                        +8      -35     +23
Group 5                        +1      -29     +21
Group 6 (Highest)             -30      -27     +22

Note: Numbers in this table take the 2009 U.S. average score for a social class group, less the 2009 comparison country's average score for the same social class group, and subtract from this result the 2000 U.S. average score for that social class group, less the 2000 comparison country's average score for the same social class group.
Source: Table 11A
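The gap-change computation described in the note to Table 11B can be sketched in a few lines of code. This is purely illustrative: it uses the rounded scores printed in Table 11A, whereas the published Table 11B was computed from unrounded scores, so a few recomputed cells may differ from the published figures by a point.

```python
# Recomputing Table 11B's gap changes from the rounded scores in Table 11A.
# Each entry maps social class group -> (2000 score, 2009 score).
scores = {
    "U.S.":    {1: (416, 434), 2: (446, 464), 3: (490, 491),
                4: (510, 510), 5: (543, 548), 6: (554, 548)},
    "France":  {1: (458, 413), 2: (484, 460), 3: (517, 498),
                4: (537, 529), 5: (558, 562), 6: (544, 569)},
    "Germany": {1: (381, 433), 2: (418, 466), 3: (471, 509),
                4: (500, 535), 5: (537, 571), 6: (550, 570)},
    "U.K.":    {1: (458, 435), 2: (483, 455), 3: (519, 487),
                4: (540, 517), 5: (563, 547), 6: (579, 551)},
}

def gap_change(country: str, group: int) -> int:
    """(U.S. 2009 - country 2009) - (U.S. 2000 - country 2000).

    Positive = the U.S. gained relative to the comparison country;
    negative = U.S. performance deteriorated relative to it.
    """
    us_2000, us_2009 = scores["U.S."][group]
    c_2000, c_2009 = scores[country][group]
    return (us_2009 - c_2009) - (us_2000 - c_2000)

for country in ("France", "Germany", "U.K."):
    print(country, [gap_change(country, g) for g in range(1, 7)])
```

For example, for Group 1 versus France the computation is (434 - 413) - (416 - 458) = 21 + 42 = +63, matching the published table.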
