Secretary of Education Arne Duncan has issued a warning to state-level school leaders, telling them:

I am writing to urge you to do everything you can to ensure the integrity of the data used to measure student achievement and ensure meaningful educational accountability in your State. As I'm sure you know, even the hint of testing irregularities and misconduct in the test administration process could call into question school reform efforts and undermine the State accountability systems that you have painstakingly built over the past decade.

He goes on to say,

The successful implementation of Title I and other key programs administered by the U.S. Department of Education (Department) relies heavily on using data that are valid, reliable, and consistent with professional and technical standards.

The letter then describes a series of steps the Department of Education wants states to take to ensure the integrity and security of test data.

Unfortunately, Mr. Duncan is missing the biggest threat to the validity of the data being secured: the very high stakes that his own policies place on these test scores. Through Race to the Top, the Department of Education is pushing for test scores to make up significant portions of teacher evaluations and pay. It is also promoting policies that reward or punish schools of education based on the test scores of their graduates, and, of course, it continues to label high-poverty schools as failures based on test scores. All of this creates intense pressure to raise test scores, which no doubt accounts for Mr. Duncan's concern about test security. As was recently seen in Washington, DC, under Michelle Rhee, increased pressure to boost scores often leads to cheating.

But even in the absence of outright cheating, attaching such high stakes to test scores reduces their value as indicators of learning. As Campbell's Law states, "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."

Harvard scholar Daniel Koretz explained the problem in detail two years ago:

As I explain in more detail in Measuring Up, a test is a small sample of behavior that we use to estimate mastery of a much larger "domain" of achievement, such as mathematics. In this sense, it is very much like a political poll, in which the preferences of a very small number of carefully chosen people are used to estimate the likely voting of millions of others. In the same way, a student's performance on a small number of test items is used to estimate her mastery of the larger domain. Under ideal circumstances, these small samples, whether of people or of test items, can work pretty well to estimate the larger quantity that we are really interested in.

However, when the pressure to raise scores is high enough, people often start focusing too much on the small sample in the test rather than on the domain it is intended to represent. What would happen if a presidential campaign devoted a lot of its resources to trying to win over the 1,000 voters who participated in a recent poll, while ignoring the 120 million other voters? They would get good poll numbers if the same people were asked again, but those results would no longer represent the electorate, and they would lose. By the same token, if you focus too much on the tested sample of mathematics, at the expense of the broader domain it represents, you get inflated scores. Scores no longer represent real achievement, and if you give students another measure--another test, or real-world tasks involving the same skills--they don't perform as well. And remember, we don't send kids to school so that they will score well on their particular state's test; we send them to school to learn things that they can use in the real world, for example, in later education and in their work.

We have seen evidence of this in New York, where it appears that schools became better at teaching what would be on the high-stakes tests, causing those scores to rise. Meanwhile, actual student learning, as measured by the low-stakes National Assessment of Educational Progress (NAEP), did not increase.

All the test security in the world does not make this problem go away. Attaching high stakes to test scores undermines their validity as indicators of student learning. If Secretary Duncan and President Obama are sincere when they say they do not want educators "teaching to the test," they should stop pursuing policies that make teacher pay and job security dependent on these scores.

This is one of the messages we will be taking to Washington, DC, at the Save Our Schools March on July 30th.

What do you think? Does making test data more secure ensure its validity? Or do the high stakes we attach to the data destroy its value?

