Political leaders are fond of saying the United States is in an education crisis.

The U.S. is often shown to be losing ground internationally. We revisit a Sputnik moment every time international test scores are released, and some of the Sturm und Drang over our decline is a response to America’s middling ranking among other wealthy countries. However, the U.S. has historically underperformed on such cross-national comparisons. We came in 11th out of 12 on the first international assessment of math in 1964, for instance.

“People like the simple story,” said Jack Buckley, the head of research at the College Board, who previously led the U.S. Department of Education’s research arm. “And the simple story is we’re treading water while the others are pushing ahead of us. I think [that] is the narrative of the times.”

But the truth is more complicated than the image of a U.S. education system stuck in the mire. And by one important measure, the nation’s students have been improving at a steady pace for decades.

“I don’t think there’s much evidence of decline, is the bottom line,” said Dan Goldhaber, a prominent education scholar and director of the National Center for Analysis of Longitudinal Data in Education Research. “And I would characterize it as a not very nuanced assessment.”

Since the Nixon administration, federal education officials have administered the National Assessment of Educational Progress’s long-term trend assessments every few years as part of a government project called the Nation’s Report Card. The assessments capture how well a representative sample of U.S. students can answer a range of rigorous questions in mathematics and reading. Between 1973 and 2012, the average reading score increased by 13 points for 9-year-olds and eight points for 13-year-olds but stalled among 17-year-olds. In math, the gains since the 1970s were even larger for 9- and 13-year-olds, while scores for 17-year-olds again stayed virtually flat.

Slight gains on a 500-point scale, right? But the data is telling you to look deeper. Upon closer inspection, you’ll notice that black and Hispanic students have made tremendous gains in math and reading on the nation’s gold standard for measuring these skills.

While the overall math averages for 9-year-olds grew by 25 points between 1978 and 2012, average scores among black and Hispanic students increased by 34 and 31 points, respectively.

Among 13-year-olds, math scores for white students increased by 21 points, while results for blacks and Hispanics increased by 34 points and 33 points, respectively. Overall, 13-year-olds improved by 26 points in math.

Seventeen-year-olds, many of whom are one year away from enrolling in college, nudged upward by six points overall between 1978 and 2012 on the math portion of NAEP, but scores for black and Hispanic students increased by 20 and 18 points, respectively.

Overall, scores for 9-year-olds taking the reading assessment grew by 11 points between 1975 and 2012; the scores for black and Hispanic students each rose by 25 points in that same period.

While scores for all 13-year-olds and white students increased by less than 10 points in reading, scores for blacks and Hispanics grew by 21 and 17 points, respectively.

Among 17-year-olds, reading scores for the overall tested population and white students grew by no more than two points between 1975 and 2012; scores for both black and Hispanic students grew by more than 20 points.

So, why haven’t minority students’ numbers boosted the overall average? There are two main reasons: Black and Hispanic students have grown as a share of all students in the U.S., and, despite their improvements, their scores are still lower than those of white students. Together, those two facts mean the overall average understates the considerable growth within each group.

In statistics, this phenomenon is called Simpson’s paradox.

“The minority students tend to do worse on the NAEP test, and they’re growing as a proportion of the population,” said Goldhaber, who also studies education issues at the University of Washington Bothell. “So, the fact that they are growing and have test scores that are below the average of white students, they’re going to drag the overall average down, even if their average is rising over time.”

Since the early 1970s, the share of white students captured by NAEP has declined from over 80 percent of U.S. students to just over half. That trend helps explain why even in the case of 17-year-olds, a flat aggregate score over time masks improvements by white, black and Hispanic students.
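The arithmetic behind that masking effect can be sketched in a few lines of Python. The numbers below are hypothetical, chosen only to mimic the pattern described here (one group shrinking from 80 percent of test-takers to 55 percent, both groups improving); they are not NAEP results, and the names `weighted_average`, `early` and `late` are illustrative.

```python
def weighted_average(groups):
    """Overall mean from (share_of_students, group_mean_score) pairs."""
    total_share = sum(share for share, _ in groups)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("group shares must sum to 1")
    return sum(share * score for share, score in groups)

# HYPOTHETICAL data, not actual NAEP scores.
# Each entry: (share of test-takers, average score)
early = [(0.80, 300), (0.20, 270)]  # higher-scoring group is 80% of students
late = [(0.55, 303), (0.45, 285)]   # both groups improved; the mix shifted

early_avg = weighted_average(early)  # 0.80*300 + 0.20*270 = 294.0
late_avg = weighted_average(late)    # 0.55*303 + 0.45*285, about 294.9

# Counterfactual: the later scores combined with the earlier population mix.
fixed_mix = [(0.80, 303), (0.20, 285)]
fixed_mix_avg = weighted_average(fixed_mix)  # 0.80*303 + 0.20*285, about 299.4

group_gains = [new[1] - old[1] for old, new in zip(early, late)]

print(f"Group gains: {group_gains}")
print(f"Overall gain with shifting mix: {late_avg - early_avg:.1f}")
print(f"Overall gain with mix held fixed: {fixed_mix_avg - early_avg:.1f}")
```

In this made-up case the two groups gain 3 and 15 points, yet the overall average rises by less than one point because the lower-scoring group makes up a larger share of the later sample. Holding the mix fixed at its earlier proportions, the same group scores would lift the average by more than five points.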

Fine, we’re not in neutral rolling down a hill toward academic ruin. But are we doing well enough? That comes down to how we define progress. It’s one thing to observe certain groups improving, but it’s also clear that whites are still outperforming blacks and Hispanics, in no small part because there are serious disparities in the quality of education low-income and minority students receive compared to their peers.

“When we look at achievement gaps, it’s really important to look at how those gaps are closing. We want to see all groups getting better,” said Allison Horowitz, a policy analyst at Education Trust, a think tank. “But we want to make sure that students who are low income or of color, who are too often at the bottom of the achievement gap, we want to see them closing that gap by increasing faster than their white or affluent counterparts.”

“This question gets raised in the labor market in terms of wages all the time,” Goldhaber said. “Do you care about whether your wage is going up year over year, or do you care where you stand relative to other people? And I think it’s not an either/or: We care about both. And the degree to which somebody cares about one versus the other depends on the person.”

Although NAEP shows that minority students are improving, the story is more mixed along socioeconomic lines. Stanford researcher Sean Reardon has argued that while the U.S. has made strides in closing the racial achievement gap, the achievement gap between rich and poor students has widened. He looked not only at comprehensive test scores but also at other measures of academic success, such as entry into competitive universities and earning a college degree.

“I don’t think you’d find anybody serious who would dispute the fact that it’s sort of general socio-economic gaps that matter,” said Buckley, of the College Board. “Unfortunately in the U.S., a lot of those are mirrored in our race and ethnic gaps. But even within races, there are profound socio-economic gaps as well.”

U.S. students considered low income have been showing some improvement. While the NAEP’s long-term trend assessment only began capturing student performance along income lines in 2004, another version of NAEP that has been administered to students every two to four years since the 1990s shows roughly equal and substantial gains for students who are and aren’t eligible for school lunch subsidies, which researchers use as a proxy for family income.

Math results for fourth-graders who qualify for the federal subsidized lunch program increased by 23 points. Scores for students who weren’t eligible for the subsidy increased by 22 points. Overall, fourth-graders improved in the subject by 18 points.

Eighth-graders improved by 15 points overall in math; scores for lunch-subsidy-eligible students and their wealthier peers each rose by 20 points.

The data-keepers of NAEP discourage comparing 12th-grade math results from before 2005 to those from after 2005 because the test changed substantially. That makes the comparisons statistically unsound.

Still, the slight improvements between 2005 and 2013 are considered statistically significant for students who do and don’t qualify for the federal lunch subsidy.

Reading scores for fourth-graders rose by similar levels between 1998 and 2013. The improvements among low-income students and their wealthier peers are considered statistically significant. While reading scores for eighth-graders overall rose by five points between 1998 and 2013, improvements were nearly double that for each subgroup. The gains among the subgroups are considered statistically significant.

Here, Simpson’s paradox is also at play: While overall reading scores for 12th-graders actually dipped, results for the subgroups improved in the same window of time. All changes between 1998 and 2013 are considered statistically significant.

Although NAEP shows us what’s happening, it can’t lend insight into why. NAEP captures the improved performance of low-income and minority students, but getting at the causes behind the rising test scores, and chronic achievement gaps, is beyond the Nation’s Report Card’s purview.

Of course, that doesn’t stop education movers and shakers from ascribing positive results to their preferred approaches to policy. The misuse of NAEP results is so widespread that education scholars have coined a term for the data abuse: misNAEPery.

“I’ve always said that the fundamental purpose for the use of these assessments is as an indicator,” Buckley said. “It’s telling you where you stand and whether you’re making progress or not.”

Drawing conclusions from NAEP that don’t hold water isn’t limited to political leaders, of course. Journalists receive a bevy of news releases announcing key correlations between this academic intervention and that propitious bump in test scores. But scholars such as Buckley caution against attributing a rise or decline in NAEP scores to any particular policy.

Still, Buckley said data from NAEP can set scholars on a path of further exploration, cluing them in to possible relationships between instructional tweaks and outcomes that buoy student performance. “We take that information and create a better research design,” he said.