is there such a thing as static teacher quality?

I have been arguing for some time now that there’s a fundamental and essential missing step in our educational debates: trying to better understand how much individual student outcomes on performance metrics actually depend on teacher inputs. It’s an immensely important question. For all of the effort school reformers have invested in treating education policy as a simplistic matter of ECON 101, there’s a broad difference between the kind of control a factory or worker has over the quality of a widget and the kind of control a school or teacher has over the outcomes of a student. The former enjoys far, far more control than the latter.

I believe that a preponderance of the extant evidence suggests that in fact student-side variables are vastly more determinative than teacher-side variables, that most people involved in the debate significantly overestimate how much control a teacher actually has over outcomes, and that this has to condition how we approach education on the level of pedagogy and on the level of policy. I say that purely from an empirical and theoretical standpoint, totally independent of my thoughts on unions and workplace fairness. But when I make this argument, I am typically accused of protecting a favored political constituency.

So consider this report that comes courtesy of Diane Ravitch. The post is simply essential and I encourage absolutely anyone to read the whole thing. I find it comprehensively damning of New York’s current teacher assessment efforts, describing basic failures to utilize appropriate statistical and sampling techniques. But I want to pay attention to this essential question: can we identify such a thing as static teacher quality that operates independent of student inputs and which can be fairly assessed in a context in which the portion of the variance controlled by teachers might be quite small? Consider this passage.

There was no correlation between a teacher’s value-add score from year to year. It was, in fact, close to random. A teacher in the 90+ percentile one year had only a 1 in 4 chance of remaining there the next. A teacher in the bottom 10% had only a 7% chance of remaining there the following year. Only 7% of teachers landed above the median for 3 years in a row, with lots of movement between the upper half and the bottom third. Predictions about future student achievement assumed by the formula were not accurate. Scores were biased against teachers of high performing students. There was a 3+:1 ratio of teachers who taught high-performing students rated below average versus above average. A single extra question correct on the exams of a teacher in this group raised their value-add score by 10-20 points while an incorrect answer lowered their score by 20-50 points.

The reports did not control for school level factors and class size. For example, a teacher was 7.3% less likely to receive a good rating for each additional student increase in average class size. A teacher’s score one year predicted only 5-8% of the next year’s score. The value-added scores of teachers who taught similar groups of students with similar pre-test scores for two years in a row showed almost no correlation. 43% of teachers with very high value-add scores in 2009 did not meet that mark in 2010. Of the thousands of teachers in the top 20% in 2005-06 only 14 math teachers and 5 ELA teachers remained there each year through 2009-10.

Now, these numbers are particularly stark, but this is not really a surprising result, if you’ve been paying attention. Why did New York end its teacher performance pay program in the first place? In large part because of incoherent results: teachers would be rated as terrible in one class and excellent in another, within the same semester. Teachers that had been among the top performers one year would be among the worst performers the next. Teachers that were believed by administrators and parents to have serious performance issues would be rated highly; teachers that were believed by administrators and parents to be among a school’s best would be rated poorly. On and on.
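To see why this kind of instability is what you would expect when the teacher-controlled share of the variance is small, here is a minimal simulation sketch. It is not the model New York used, and every number in it is an assumption of mine: each hypothetical teacher gets a fixed "true" quality that accounts for a chosen share of score variance (`stable_share`), and everything else is fresh noise each year. Under that setup the year-to-year correlation is roughly `stable_share`, so if the quoted "predicted only 5-8%" is read as an R-squared, a share of about 0.25 lands in that range.

```python
import random


def simulate_scores(n_teachers, stable_share, seed=0):
    """Two years of noisy scores for the same hypothetical teachers.

    stable_share is an assumed fraction of score variance that comes
    from a fixed teacher effect; the remainder is redrawn every year.
    """
    rng = random.Random(seed)
    stable_sd = stable_share ** 0.5
    noise_sd = (1 - stable_share) ** 0.5
    year1, year2 = [], []
    for _ in range(n_teachers):
        quality = rng.gauss(0, stable_sd)  # the same teacher in both years
        year1.append(quality + rng.gauss(0, noise_sd))
        year2.append(quality + rng.gauss(0, noise_sd))
    return year1, year2


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def top_decile_persistence(year1, year2):
    """Share of year-1 top-decile teachers who are also top decile in year 2."""
    n = len(year1)
    k = n // 10
    top1 = set(sorted(range(n), key=lambda i: year1[i], reverse=True)[:k])
    top2 = set(sorted(range(n), key=lambda i: year2[i], reverse=True)[:k])
    return len(top1 & top2) / k


if __name__ == "__main__":
    y1, y2 = simulate_scores(20000, stable_share=0.25, seed=1)
    r = pearson(y1, y2)
    print("year-to-year correlation:", round(r, 3))
    print("share of variance predicted (r squared):", round(r * r, 3))
    print("top-decile persistence:", round(top_decile_persistence(y1, y2), 3))
```

Even though every simulated teacher here has a real and perfectly stable quality, most of the teachers rated in the top tenth one year fall out of it the next. Raising `stable_share` makes the rankings settle down; lowering it makes them look close to random.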

Now this is not an argument that teachers don’t matter at all or that there’s no difference between one teacher and the next. The question is, given the definition of teacher quality as merely the ability to improve student standardized test scores, whether meaningful, statistically robust, and consistent metrics for teacher quality can ever be developed. I’m increasingly skeptical. In a more holistic, natural sense, I think teacher quality is real and important. But unfortunately, most of those within the reform world are uninterested in any definition of quality that does not make it easier to break teacher unions or fire teachers.

More than anything, I would like to be able to talk about whether teacher quality can ever really be validly and reliably separated from student inputs without being accused of politicizing or arguing in bad faith, and without people making appeals to incredulity.


  1. I could imagine teacher input being a relatively small contributor to student performance compared to student-side inputs, but it’s like Kevin Drum’s Lead Crime Argument: it doesn’t really suit the needs of the various parties who really argue about the topic. It almost suggests that schooling doesn’t need to be more than cheap and perfunctory, because as long as the students aren’t messed up they’ll learn one way or another.

  2. Although the Ravitch article is indeed full of interesting data about the Bloomberg reforms, and has a lot to say about how the reforms have been implemented unequally and unfairly, I think on the point of teacher performance, it has very little to add to a difficult question. In particular, I’m looking at her dismissal of value-add ratings at the same time that she makes sweeping use of student standardized test performance data to measure the effectiveness of new teachers from programs such as TFA and the NY Teaching Fellows program. Maybe I’m missing something, but isn’t this just about having your cake and eating it too?

    Unfortunately, at the end of the day we need a way to evaluate teacher performance. It may not be the method envisioned by Bloomberg (and probably isn’t), and it may even be impossible to reach a level of reliability and consistency that this blogger can countenance. However, it is the only way to maintain accountability in a system that desperately needs it. Not to make an appeal to incredulity, just my two cents.
