Guest post by Educators for Shared Accountability.

A new group, Educators for Shared Accountability (ESA), has issued the first-ever Value-Added Measurement (VAM) evaluation of Secretary of Education Arne Duncan. Secretary Duncan was rated "ineffective," based on four indicators.

The United States Department of Education had a discretionary budget of $68.3 billion in fiscal year 2011. This amount was up from $64.1 billion the year before, and up from $29.4 billion in 2000. When the Department of Education was established in 1979, Congress appropriated an annual budget of $14.2 billion.

In the past 33 years, the budget for the Department of Education (DoEd) has increased almost fivefold.

Overseeing this massive department is US Secretary of Education Arne Duncan. Secretary Duncan earns a base salary of $179,700.

Mr. Duncan's time at the DoEd has coincided with a dramatic emphasis on efficiency and accountability in K-12 education at the local level. Since the dawn of NCLB ten years ago, schools have been rated by the federal government according to their effectiveness in raising student test scores. Most public school districts are also rated by their states. The rating systems vary from state to state, but virtually all of them are based on some combination of student standardized test scores, graduation rates, and other student-level data.

More recently, there has been a concerted effort to move from school-level accountability to government entities (as opposed to local school administrators) holding individual teachers accountable for student performance. Under Arne Duncan, the US Department of Education has offered financial incentives in an effort to coerce states into guaranteeing that a percentage of teachers' evaluations will be based on the performance of their students on standardized tests, using complex formulas known as Value-Added Measures (VAM). VAM formulas are intended to ensure that a teacher isn't rated poorly just because he or she teaches a greater proportion of struggling students. Basically, instead of rating teachers on students' absolute scores, a VAM formula is used to rate teachers by gauging student performance against a performance level anticipated for each student by looking at prior performance and the performance of peers.

VAM is serious business. In the case of New York City and Los Angeles, individual teachers have seen their performance levels published in local newspapers. One poor lady was named by the New York Post as the "city's worst teacher" based on her VAM scores. After the release of teacher ratings created by the Los Angeles Times, one teacher committed suicide.

The validity of VAM ratings is doubted in some quarters. Not only has prominent reform skeptic Diane Ravitch questioned VAM's accuracy, but even the New York Times called the ratings "controversial" and noted that education officials "cautioned against drawing conclusions" from them. The Times went on to note various data integrity problems reported by teachers, pointing out for example that English teacher Donna Lubniewski was actually rated based on her students' math scores.

Many education officials and pundits appear undeterred by data integrity questions. In the aftermath of the publication of teacher ratings in Los Angeles, Arne Duncan praised the action and said "Silence is not an option." He recently did an about-face and criticized the posting of VAM scores in New York, but his policies have pushed these systems into schools across the nation. Duncan has long been a proponent of accountability for student outcomes at the local level.

Strangely missing in all of this, of course, is any sort of mechanism for holding people like Arne Duncan publicly accountable for student-level data attributable to their performance in an important position of leadership. Duncan draws a salary significantly higher than that of the highest-paid teacher in America and oversees a budget that dwarfs that of any local school district anywhere on the planet, yet we are supposed to just take his word for it that he is doing a good job.

One would think Duncan would lead teachers across the nation by example and submit himself to a system rather similar to the type he advocates for teachers in the trenches.

Apparently, however, at the US Department of Education silence IS an option.

Teachers have been waiting for years for education policymakers and bureaucrats--those who are paid hefty salaries to theoretically improve educational outcomes for American children--to join them on the front lines of punitive data-driven accountability. Last July, one school administrator at the Save Our Schools march in Washington, DC, asked officials this question: "Why don't you...join me in the crucible of accountability?" Many front-line educators find it disconcerting and demoralizing when this nation's educator-in-chief urges the states to pass out labels to teachers but goes out of his way to avoid any label for his own efforts. While teachers are "objectively" rated by independent auditors using "scientific" formulas, personnel at the DoEd are content to have their actions judged subjectively on the basis of breathless press releases and heavily-massaged conclusions drawn from carefully selected data.

In the absence of any sort of effort on the part of Duncan or his staff to develop a means of legitimately holding themselves publicly accountable for positive student outcomes, a group calling itself Educators for Shared Accountability (ESA) has stepped into the gap. Their "Outcomes-Based, Value-Added Measurement of US Secretaries of Education" weighs every US Secretary of Education in the history of the department by comparing data at the beginning of each of their terms with data from the end of their terms. The data used to measure the effectiveness of each secretary includes two "student quality of life" data points and two data points based on student performance on academic tests. These four data points reflect improvement (or lack thereof) in the following areas during a secretary's term:

1. Student employability

2. Student pregnancy rates

3. Math performance

4. Reading performance

Some may wonder why the first two data points are included. One of the primary goals of K-12 education, of course, is to produce students who are ultimately employable. Furthermore, data suggests that "school achievement...helps reduce the risk of teen pregnancy." If the policies pursued by a US Secretary of Education result in effective schooling for American children, one could fairly assume that more employers would want to hire those students and fewer of those student-aged Americans would get pregnant.

The designers of this system felt it was important to include multiple measures (instead of looking exclusively at test scores) in order to arrive at a fair assessment of which education secretaries most effectively improved life outcomes for American students during their times in office.

ESA acknowledges that questions may arise regarding the validity of these rankings. Nevertheless, in the interest of sunlight, ESA has decided to release its outcomes-based ratings for US secretaries of education as they are. ESA is pleased to share this information and feels that any discussion following the publication of these rankings will be healthy. Like many in the field of education today, ESA is determined not to let the perfect be the enemy of the good. Students simply can't wait another day for the US Department of Education to develop an objective means for measuring the effectiveness of its highest-paid staff member. Citizens, parents, and taxpayers have a right to know this information.



Data Point 1: Teen Employability

One critical aim of the American education system is the holistic development of children. While test scores indicate the content area knowledge and/or the test-taking prowess of students, few dependable measures of a truly well-rounded education exist. How can one measure students' critical thinking skills, communication skills, interpersonal social aptitude, and problem-solving abilities? Fortunately, there is an arena where those precise skills are valued and rewarded--the job market. That being the case, the first data point examined in this study is the employability of the American teen. Using data from the Bureau of Labor Statistics, ESA's crack research team analyzed the seasonally adjusted employment-population ratio for Americans aged 16-19 years. Each secretary of education was assigned a number of points equal to this ratio for the quarter immediately before he or she took office (which was tallied as the "Beginning" value), and for the quarter immediately after he or she left office (tallied as the "Ending" value).

Note: the latest quarter available for the current secretary of education, Arne Duncan, was the fourth quarter of 2011.



The data used for this portion of the value-added measure was gathered with the search feature found here, using these search criteria: both sexes, all races, all origins, 16-19 years, all educational levels, all marital statuses, Employment-population ratio, seasonally adjusted, quarterly.

Data Point 2: Teen Pregnancy

The second data point--also tallied as a "Beginning" and "Ending" value--is the teen birth rate, found here. Each secretary was assigned a "Beginning" teen birth rate and an "Ending" teen birth rate for his or her term in office. Unfortunately, at the time this study was conducted, the latest teen birth rate data available from the CDC was for the year 2008. Since the teen birth rate for 2009 and later was unavailable, Margaret Spellings was assigned the figure from 2008 as her ending figure. For Arne Duncan--who took office in 2009--teen birth rate data was entirely unavailable for his term. That being the case, Duncan was assigned the respective average of all other secretaries' figures for his "Beginning" and "Ending" figures. (This solution was inspired by a similar method used by the state of Tennessee to compensate for a lack of testing data for teachers of non-tested subjects. Such teachers are assigned the average scores of teachers in their school who teach a tested subject, in order to come up with a number for them and determine whether or not they add value to their students.)
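The averaging workaround described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `impute_from_peers` and every figure below are hypothetical, not ESA's actual code or data.

```python
from statistics import mean


def impute_from_peers(known_values):
    """Return the average of the figures recorded for the other secretaries."""
    if not known_values:
        raise ValueError("need at least one known figure to average")
    return mean(known_values)


# Hypothetical teen birth rate figures for the other secretaries
# (illustration only; these are not ESA's numbers).
other_beginning = [50.0, 52.0, 54.0]
other_ending = [40.0, 42.0, 44.0]

# A secretary with no data of his own simply inherits the averages.
imputed_beginning = impute_from_peers(other_beginning)
imputed_ending = impute_from_peers(other_ending)
```

Note that a secretary scored this way can only ever reflect the record of his peers, which is precisely the property the Tennessee comparison highlights.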

Note: unlike the other data points used in this study, teen birth rates improve by dropping. As a result, the teen birth rate immediately preceding a secretary of education's term was listed under the "Ending" category, and the teen birth rate immediately following each secretary's term was listed under the "Beginning" category. Reversing the placement of the two figures enabled an increase in the number to indicate improvement. This was necessary so that the teen birth rate could be combined with the other three data points to generate an overall increase or decrease in the aggregate score of each secretary.
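The reversal described in this note amounts to a conditional swap. A minimal sketch, assuming a hypothetical helper named `orient_for_improvement` and made-up rates:

```python
def orient_for_improvement(before_term, after_term, lower_is_better=False):
    """Return (beginning, ending) so that ending > beginning means improvement.

    For a lower-is-better measure such as the teen birth rate, the two
    figures trade places, as the note above describes.
    """
    if lower_is_better:
        return after_term, before_term
    return before_term, after_term


# Hypothetical rates: 60.0 before the term, 55.0 after it.
beginning, ending = orient_for_improvement(60.0, 55.0, lower_is_better=True)
change = ending - beginning  # positive, because the rate fell during the term
```

With the figures swapped, a falling birth rate adds to a secretary's aggregate score in the same way a rising NAEP score does.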

Data Points 3 and 4: Math and Reading Proficiency

Progress in the mathematics and reading proficiency of students during each secretary of education's time in office was gauged based on overall NAEP scores. The "Beginning" figure was the NAEP score immediately prior to each secretary's taking office. Similarly, the "Ending" score consisted of NAEP results for the test administration immediately following a secretary's departure from office. Specifically, this study looked at the nationwide NAEP scores of 13-year-olds. The scores themselves can be viewed here.

Unfortunately, NAEP scores were only available through the year 2008 at the time this study was conducted. Because of this, Margaret Spellings was assigned the latest data available for her "Ending" score, and Arne Duncan was assigned the latest data available for both his "Beginning" and "Ending" scores. ESA fully intends to adjust the VAM scores for both Spellings and Duncan when new NAEP scores become available.

Methodology

The "Beginning" and "Ending" data for the four data points described above were summed, and a total "Beginning" and "Ending" figure was determined for each secretary. An increase in the figure from "Beginning" to "Ending" indicated improvement; a decrease indicated a downgrade in student performance.

Absolute improvement in the data was considered an insufficient measure for establishing whether a secretary of education added value to students during his or her term in office. Instead, ESA researchers determined an average rate of improvement in the data across all secretaries of education. That average rate of improvement--2.8763888889, to be exact--became the target for each secretary of education to attain, the measure by which all were judged. A secretary who exceeded the average rate of improvement in his or her data was considered to have been an effective educational leader, while a secretary whose rate of improvement fell short of that target was judged to be an ineffective leader.

The actual VAM score of a secretary of education is the difference in points (positive or negative) between his or her rate of improvement from the beginning to the end of his or her term and the average rate of improvement for all secretaries of education.
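The calculation described in this section can be sketched as follows. The sketch assumes that "rate of improvement" means the "Ending" total minus the "Beginning" total; the secretaries, field names, and figures are all hypothetical and stand in for ESA's actual data.

```python
from statistics import mean


def total(points):
    """Sum the four data points for one side of a term (beginning or ending)."""
    return sum(points.values())


def vam_scores(records):
    """Return (scores, target): each improvement minus the average improvement."""
    improvements = {
        name: total(rec["ending"]) - total(rec["beginning"])
        for name, rec in records.items()
    }
    target = mean(list(improvements.values()))
    scores = {name: imp - target for name, imp in improvements.items()}
    return scores, target


# Hypothetical figures. The birth_rate values are assumed to be already
# swapped so that a higher number means improvement.
start = {"employment": 40, "birth_rate": 50, "math": 270, "reading": 255}
records = {
    "Secretary A": {
        "beginning": start,
        "ending": {"employment": 42, "birth_rate": 52, "math": 272, "reading": 256},
    },
    "Secretary B": {
        "beginning": start,
        "ending": {"employment": 39, "birth_rate": 50, "math": 270, "reading": 255},
    },
    "Secretary C": {
        "beginning": start,
        "ending": {"employment": 41, "birth_rate": 51, "math": 271, "reading": 255},
    },
}

scores, target = vam_scores(records)
```

Because the target is the group's own average, the scores always sum to zero: some secretaries must land below the target no matter how much every one of them improves.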

In order to assist the public in interpreting these VAM scores, clear and easy-to-understand labels were applied to a simple distribution of the scores. All VAM scores ranged from -12.9764 on the low end to 20.62361 on the high end. The label "Superior" was assigned to scores between 1.523611 and 20.62361. The label "Average" was assigned to scores between -2.07639 and 1.523610. The label "Inferior" was assigned to scores from -12.9764 to -2.07638. Additionally, any score below 0--i.e., any VAM that fell below the average rate of improvement for all secretaries of education--was assigned the label "Ineffective," a label which trumped all other labels. (This trumping mechanism was inspired by the policy devised in New York State that requires that, while student test scores account for 40% of a teacher's evaluation, a failing mark on that 40% will invalidate the remaining 60% of the evaluation and require the teacher to be found ineffective.)
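The labeling scheme, including the trumping rule, can be sketched as below. The cutoffs come from the ranges quoted above; how a score exactly on a boundary is handled is an assumption of this sketch, and the function names are hypothetical.

```python
# Cutoffs taken from the ranges quoted in the text.
SUPERIOR_MIN = 1.523611
AVERAGE_MIN = -2.07639


def base_label(score):
    """Label a VAM score by the distribution bands alone."""
    if score >= SUPERIOR_MIN:
        return "Superior"
    if score >= AVERAGE_MIN:
        return "Average"
    return "Inferior"


def final_label(score):
    """Apply the trumping rule: any score below 0 is 'Ineffective'."""
    if score < 0:
        return "Ineffective"
    return base_label(score)
```

Under this reading, "Inferior" can never appear as a final label, since every score in that band is negative, and "Average" survives only for scores from 0 up to the "Superior" cutoff.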

You can download the VAM report for all nine secretaries of education here.

For ease of reference, the US secretaries of education were also ranked from first to ninth based on their VAM scores.

ESA wishes to congratulate former education secretary Richard Riley for the outstanding performance revealed by his value-added score. During his time in office, student data soared to remarkable heights: Riley's data surpassed the VAM target by over 20 points, a margin roughly six times that of the second-best secretary of education, Lamar Alexander. Current and future secretaries of education in the United States would do well to examine their practices and policies against those of Mr. Riley, who added far more value to American students than any education secretary before or since. Richard Riley was a Blue Ribbon secretary of education.

On a sad note, the data indicates that five out of the nine secretaries of education we have had actually reduced value for their students, forcing researchers at ESA to conclude that the nation would have been better off with no education secretary at all during their terms.

What do you think of the Value-Added Measurement ratings for Secretary Duncan? Do they indicate he has been ineffective during his term?

Image provided by Educators for Shared Accountability, used by permission.