Nothing, however, was neglected by the anxious father, and by the men of virtue and learning whom he summoned to his assistance, to expand the narrow mind of young Commodus, to correct his growing vices, and to render him worthy of the throne for which he was designed. But the power of instruction is seldom of much efficacy, except in those happy dispositions where it is almost superfluous. -Edward Gibbon, The History of the Decline and Fall of the Roman Empire, 1776.



There are plenty of professions in which individual human performance matters a great deal but is difficult to measure directly in a simple or obvious way. However, when that level of performance becomes important – which usually means there is a huge amount of popular interest and/or money at stake – that creates the incentive for statistics and quant types to plunge into the numbers and develop those metrics.

Team sports provide a good example of a case in which an individual player’s contributions to the desired end state – victory – can be difficult to assess. But there is interest and money, and so, for example, in the sport of baseball, we have the famous Bill James with his Sabermetrics and Win Shares, and Michael Lewis’ sketch of Oakland A’s general manager Billy Beane’s analytical magic in Moneyball. And you can bet that all other sports now have their own metrics cults and prophets.

But when it comes to the recent fad of measuring individual teacher or school performance, or the efficacy of alternative pedagogical styles, it seems to me that governments are using a completely misguided approach in simply looking at student test scores.

This is because the fair and accurate way to assess teachers is not PC, but whatever way we use must be PC, so PC makes us dumb yet again, and unfortunate and innocent teachers are the victims of the collateral damage who bear the brunt of our collective insanity and unwillingness to come to grips with reality.

And we really need that fair and accurate way to measure teacher performance, because (1) teachers are particularly vulnerable to getting tarnished with an unjustified bad reputation courtesy of teenage angst, self-perpetuating warped perceptions, and the school rumor mill, and (2) the knee-jerk reaction is to go get smarter teachers based on test scores, certification exams, and the place they received their diploma under the assumption that these characteristics will also make them ‘better’ teachers, but (2)(A) there is little evidence to support that assumption (see Education Realist on the subject) and (2)(B) bizarrely, and most un-PC of all, it will definitely mean the unjustifiable replacement of lots of perfectly adequate black and hispanic teachers by whites and asians but without any likely gains in student achievement. The value and power of a sane and reasonable metric to prevent this inequitable nonsense cannot be overstated.

How should we assess teachers then, if we’re going to go beyond end-of-year standardized test scores? We should borrow a page from finance. We don’t care about a fund manager’s gains alone. We care whether he can consistently beat the market at the same level of risk. That’s called ‘Alpha’. And we aren’t looking for a teacher’s test scores either; we are looking for his Educational Alpha. How do we find it? We move there in a sequence of steps.

1. From Scores to Yield



First, we should recognize that it’s not a teacher’s fault if a student arrives in his class with knowledge well below the standard expectation, and neither should it be to that teacher’s credit if his student arrives knowing 50% of the class material on day one. That leads us to the concept of ‘value-added‘, which is nothing really new. You generate two standardized exams which are distinct but test the same range of knowledge, and you give the kid one at the beginning of the term and one at the end and measure the difference in scores. That’s analogous to ‘yield’ in finance. If I tell you the price of a stock at the end of the year when you sold it, that means nothing to you unless you also know the price at the beginning of the year when you bought it.
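As a minimal sketch of the yield idea, assuming two comparable exams scored on the same scale (the student labels and scores here are invented purely for illustration):

```python
def score_yield(pre_score, post_score):
    """Value-added 'yield': end-of-term score minus start-of-term score."""
    return post_score - pre_score

# Hypothetical students: (pre-test, post-test) on the same 0-100 scale.
students = {"A": (20, 55), "B": (50, 70), "C": (35, 40)}

yields = {name: score_yield(pre, post) for name, (pre, post) in students.items()}
# Student B finishes with the highest final score,
# but student A has the largest yield over the term.
```

Ranking by final score and ranking by yield give different answers here, which is the whole point of looking at the change rather than the end price.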

2. From Yield to Expected Yield

But yield isn’t enough. Some horses are thoroughbreds and will get a lot faster in their first racing season. Others are draft-horses and won’t. That’s not the fault of the jockey, that’s the fault of the quality of the material being worked and the hand the trainer is dealt. Kids are the same; some are quick and smart, and some aren’t. We are used to hearing about the awful teachers in America’s urban public schools, but it’s possible that some of them are doing the best that anyone can do with the kids they’ve got and should be getting medals and applause instead of criticism and disdain. On the basis of yield alone, if you give a teacher a bunch of Einsteins, that instructor is going to look great even if they do nothing, while another teacher given a class full of Beavis and Butthead clones is going to look awful.

So one needs to figure out to what degree one expects a particular child to improve during the course of a term, and then compare the student’s actual performance to that forecast. The expected yield is like the market benchmark in the finance analogy.

3. Generating Expected Yield

But how do we calculate such a forecast for a child? The best we can do is continuously collect a vast amount of data on a wide variety of variables from a large number of students and perform some kind of statistical regression analysis. This regression analysis will show us which variables have the strongest correlation coefficients and are most explanatory and predictive of yields, and a subsequent factor analysis will help tell us which of those variables are strongly correlated with each other so we can reduce the forecasting model to the absolute minimum number of factors while retaining the accuracy of the prediction.
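To make the forecasting step concrete, here is a deliberately simplified sketch. It assumes a single predictor (a hypothetical aptitude proxy) and ordinary least squares, whereas the approach described above would use many variables; the data and the names `fit_line` and `expected_yield` are invented for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical historical data: an aptitude proxy for each past student
# and the yield that student actually achieved over a term.
aptitude = [80, 90, 100, 110, 120]
observed_yield = [10, 14, 21, 24, 31]

intercept, slope = fit_line(aptitude, observed_yield)

def expected_yield(aptitude_score):
    """Forecast a new student's yield from the fitted line."""
    return intercept + slope * aptitude_score
```

With these made-up numbers, a student with an aptitude proxy of 100 would be forecast a yield of about 20 points. A real model would need far more data and more than one factor.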

This model isn’t going to be perfect, but it will probably get us in the right ballpark most of the time for most kids.

The problem is the question of which variables one will be throwing into the statistical sausage factory. If I had to guess, the strongest predictor would be IQ, but good luck giving every kid an IQ test in this climate. Fortunately, there are reasonably good proxies for IQ in the results of certain standardized tests that are given to young students, so that’s one way to get around the politics.

But what about other factors? Some things – like peer groups and life circumstances – are just hard to capture. Other things are easy to capture, but are politically sensitive and tend to give rise to controversy: race, gender, socioeconomic status, family situation, height and weight, etc.

For the latter group of factors, one faces two main problems. The first is that this kind of regression analysis is going to produce some very unpopular and taboo results that contradict some of society’s most important pretty lies in a way that will threaten the careers of anyone involved in the process of producing them, and the second is that using those results to generate different profiles and expectations for different students is going to drive the usual suspects completely crazy when they notice certain patterns.

But this is the minimum of what you have to do if you are genuinely interested in measuring teacher quality and performance. The fact that no one is doing it is evidence that, despite all the signalling to the contrary, no one is really interested in measuring teachers if it means we have to look squarely in the face of the part of the problem which lies in the students themselves. I don’t completely agree with Robin Hanson’s quip that “School Isn’t About Learning” but advocating for school quality isn’t about teacher performance if one isn’t willing to adjust completely accurately for the composition of the class that teacher has to manage, and based on the sometimes ugly truths of reality instead of utopian fantasies.

So, the profile and the model are the hard part, but let’s assume we get it done anyway, and for any student we can plug his vitals into the computer and out pops his expected yield. That is like an individualized, custom ‘Beta’ of 1.

4. From β to Δ

So, for any particular student in a teacher’s class, we have an expected change in subject test score and, at the end of the year, the actual change. The difference is Delta – Δ, and we would expect a lot of statistical noise, and small positive and negative deltas amongst the various students. But we aren’t measuring students, we’re measuring the teacher’s performance, so we need to add up all the student deltas and take an average, Δ̄. And you would want to normalize the deltas to measure them in terms of the standard deviation of the normal student distribution of test scores for that subject.
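A rough sketch of that calculation, assuming we already have each student's actual and expected yield plus the standard deviation of scores for the subject (all numbers and names below are hypothetical):

```python
from statistics import mean

def class_mean_delta(actual_yields, expected_yields, score_sd):
    """Average of (actual - expected) yield, in units of the test's standard deviation."""
    deltas = [(a - e) / score_sd for a, e in zip(actual_yields, expected_yields)]
    return mean(deltas)

# Hypothetical class of four students. score_sd is the assumed standard
# deviation of scores on this subject's test across all students.
actual = [22, 15, 30, 13]
expected = [20, 18, 25, 12]

delta_bar = class_mean_delta(actual, expected, score_sd=10)
```

Here the individual deltas are 0.2, -0.3, 0.5 and 0.1 standard deviations, so the class average works out to roughly 0.125.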

5. From Δ̄ to α

One expects a teacher to have good classes and bad classes, and good years and bad years. But if you take all the class-average deltas (the Δ̄’s) and average them as well, then the ups and downs should cancel out, and what you have left is the sustained ability to impact students above or below what would have been expected with a merely ‘average’ teacher. That’s Educational Alpha, that’s fair and accurate, and that’s what we should be measuring. But we’re not.
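As an illustrative sketch only, assuming a teacher's record is simply a list of normalized class-average deltas, one per class taught (the values are invented):

```python
from statistics import mean, stdev

def educational_alpha(class_mean_deltas):
    """Average a teacher's per-class mean deltas across classes and years."""
    return mean(class_mean_deltas)

# Hypothetical teacher: one normalized class-average delta per class taught.
history = [0.10, -0.05, 0.20, 0.00, 0.15]

alpha = educational_alpha(history)

# The spread of the per-class values gives a rough sense of how noisy
# the alpha estimate is (standard error of the mean).
standard_error = stdev(history) / len(history) ** 0.5
```

For this made-up record the alpha is about 0.08 standard deviations. The standard error is an addition of mine rather than part of the scheme above, but it is a reminder that a handful of classes gives only a rough estimate.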

And there are definitely some political reasons why we’re not, and why we probably won’t be doing it any time in the future either. However, since the No Child Left Behind era we have been collecting oodles of data on students and teachers alike (here’s an example from LA), and while they are still doing this wrong, I’m sure some enterprising statistician among you can extract the Alpha scores through a little clever manipulation of the existing dataset. What would we see then?

Some Predictions.

1. The Null Hypothesis In Education == The Efficient Market Hypothesis

Bryan Caplan has his signalling model of and case against education and Arnold Kling has what he calls the null hypothesis in education (see here: 1, 2, and 3). The basic idea from both concepts is that, on average, school quality, teacher performance, pedagogical style, teacher test scores, and dozens of other usual suspect considerations in fact make very little difference for test scores and life outcomes, and the primary driver of those outcomes is the cognitive talent and character of the student himself, on which the educational system – really any educational system – can only have the smallest of impacts, if any. Mostly, the kids are born bright or dull, and unless you stunt them, they’re going to develop their minds and mental skill at their innate rates, no matter what you do.

In other words, it’s really hard for a teacher to beat the student market.

What that means is that we would predict most Alpha scores to be close to zero, with just a few slightly negative or slightly positive, and I’d guess a bias to the negative since one would reasonably expect it’s easier to skunk an entire class than to bring everyone up above their expected level of performance.

And as with repeatedly successful fund managers, there will be a few teachers with sustained and consistently high alpha scores, and it will be very difficult to explain why, what they are doing that is so special, or whether in fact their cases are mere statistical flukes. In either case, whatever the secret sauce is to their magic, it will prove impossible to replicate and scale across the educator population.
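The statistical-fluke possibility can be illustrated with a small simulation. This is a sketch under strong assumptions that are mine, not measured facts: every teacher's true alpha is exactly zero, each student's normalized delta is independent noise with standard deviation 1, and the teacher count, class size, and number of years are arbitrary.

```python
import random
from statistics import mean

def simulate_null_alphas(n_teachers=1000, n_years=5, class_size=25, seed=0):
    """Observed alphas when every teacher's true alpha is exactly zero."""
    rng = random.Random(seed)
    alphas = []
    for _ in range(n_teachers):
        yearly = []
        for _ in range(n_years):
            # Each student's normalized delta is pure noise: mean 0, sd 1.
            deltas = [rng.gauss(0.0, 1.0) for _ in range(class_size)]
            yearly.append(mean(deltas))
        alphas.append(mean(yearly))
    return alphas

alphas = simulate_null_alphas()
average_alpha = mean(alphas)  # should sit very close to zero
best_alpha = max(alphas)      # the luckiest teacher still looks 'talented'
```

Even in this world where no teacher has any real effect, the top observed alpha among a thousand teachers comes out clearly positive, which is why a few standout scores cannot be taken at face value.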

If this is true, then the frame of our entire education debate and all our over-politicized discourse is completely wrong. And this is something we could, conceivably, discover right now.

Teachers are right to push back against unfair evaluations and obsession with test scores, but they should be agitating for this kind of evaluation program so they can prove their case instead of constantly appearing like they have something embarrassing to hide and are just trying to avoid scrutiny.

2. Losing The Alibis

One of the terrific shames of our age is that PC makes it impossible for most people to speak forthrightly about their core interests lest in the course of conversation they accidentally step on one of a multiplying number of taboo land mines. That gives rise to an insatiable demand for alibi-frames, or cover stories that allow us to ‘justify’ our actions and desires in the modes our society currently tolerates, whether or not they make any sense or correspond to reality.

But if people invent these alibis out of whole cloth, they’ll just be accused of using racist code-words and dog-whistles and such, and so they have little choice but to ride the wave generated by the influential people who control the bounds of respectable discourse and the direction of political policy, and use rhetorical judo to leverage those ‘acceptable concerns’ into a rationale that will also allow them to get a little of what they want too.

Here’s what happened. Education reform advocates, social scientists, and progressive policy makers have been facing down the full standard deviation racial gap in test scores for generations under the assumption of the neurological uniformity of all population groups and the corollary belief that they could close the gap through ‘resources’ (i.e. money) and ‘the best teachers’ and pedagogical methods.

It hasn’t happened. Nothing seems to work. But that hasn’t stopped the reformers, who can’t be convinced to pull the plug and thus keep trying increasingly desperate interventions to save their patient. But all of those efforts rely on keeping a certain seductive myth alive: that the explanation for the gap is not genes but a certain kind of ‘privilege’, namely that all the smartest teachers with all the positive alpha are locked up in the nice white and Asian suburban schools. And, if only we could get Harvard’s finest to do a single tour in the ghetto before predictably burning out and bailing for jobs in administration or academia, we could solve this problem once and for all.

It’s a fairy tale. But if you keep the myth of untapped alpha alive, don’t be surprised when other people start using it in ways you don’t appreciate. That’s practically the only way to get a non-progressive initiative accomplished in this political environment.

There is a lot of dissatisfaction with the current public school system and a lot of people want out and the ability to pursue alternatives, but without having to pay for private school on their own, which they can’t afford, or to buy a house in an elite school district, which they also can’t afford. What do these parents really want for their children out of the educational system? Who knows – lots of different things. Some want out from under the government’s thumb so they can choose their own curriculum and disciplinary rules. Others want their kids to have the highest quality peer group. There are a thousand different desires. But the one thing these parents are allowed to say they want is better quality teachers and better quality schools, relying on the assumption that these things are meaningful concepts and, you know, exist.

That is, they are allowed to say they want to go to a place where the teachers have more alpha. How can you tell them no when you’ve been running a massive ‘get more alpha’ campaign for generations? Hence charters and vouchers and so on. And a brain-dead never-ending education policy debate.

However, when we actually start measuring teacher alphas, and if we fail to reject the null hypothesis in education, then the legitimacy of the frame of all these arguments and alibis and cover stories will suddenly evaporate.

On the one hand, that’s an unfortunate result for someone like me who supports the maximum amount of educational variety, freedom, and entrepreneurship. A genuine free market in education won’t produce a company that can magically make Johnny smarter, but it would satisfy what his parents want, instead of what some school board bureaucrat wants. But progressives will use the result to shut down charters and vouchers as ‘unjustifiable’ based on performance, and thus force everyone into identical public schools for the sake of their collectivist and egalitarian principles and for propagating narratives most compatible with their own ideological perspective. They’ll also stop anything the unions don’t like, such as the evaluations themselves, and experiments like performance pay.

On the other hand, they might just stop obsessing about ‘the gap’ and let schools go back to tracking students by ability so that teachers can have more cognitively homogeneous classes, which are easier and more efficient to teach. If we could even catch up with where we were 50 years ago, we’d be far ahead of where we are today.