man is the most dangerous game hardest variable to control

As in many other aspects of my intellectual life, when it comes to educational assessment I often play two contradictory roles, depending on my immediate audience. Because my primary scholarly interest is writing assessment, in the academy I'm surrounded by many people who are invested in the traditional values of the humanities, in qualitative inquiry, and in political skepticism towards many of the forces powering the assessment movement. There I often find myself making the case for a careful, internally skeptical embrace of educational assessment: as a natural and inevitable part of the educational enterprise, and, if used well, as a tool for egalitarianism. On the other hand, I often have to work the other side of the aisle in my political life. I frequently argue with those who imagine that educational testing is a simple business and that a test score represents an unambiguous, objective truth. I counsel people to recognize that there are serious empirical hurdles to separating teacher performance from student-side factors. I warn them that all test design involves value-laden choices that affect interpretation. In fact, I think the relationship between teacher performance and student outcomes is one of the most difficult questions we face in public policy, even from the most apolitical perspective of pure social science.

Yet so many people in the discussion insist on treating the relationship between teacher performance and student outcomes as simple, direct, and consistent from one student to the next. Among the abundant acts of dishonesty and distortion in Waiting for Superman, to pick one example, is the absurd analogy of education as pouring knowledge into the heads of children, as if teaching were a simple task of equal difficulty regardless of the student. But that’s par for the course.

One frustrating tic of the ed reform crowd is to invoke Econ 101 bromides when advancing their solutions. They'll say, for example, that schools are like factories that make widgets: let the market sort them out, based on which factory customers choose to buy their widgets from. The factories that build the best widgets get the most customers, and so the schools that prepare the best students will get the most business. Why, it works for [X consumer product]!

Let's take that analogy seriously for a second. First, for the sake of this discussion, let's suppose that the sole value of a widget (student) lies in how well it performs academically on quantitative indicators. Now then: if we are really to apply this logic, we need to recognize that factories (schools) don't create their widgets. Widgets show up at their door after spending 9 months being assembled and 4 or 5 years being finished by the people who created them, or sometimes by those who have to look out for the widgets because their creators are gone. In those pre-factory years, different widgets are kept in vastly different environments, constrained by historical and demographic factors that are outside the control not only of the widgets but of their creators. That inequality persists in the factories, which have widely different resources. Which widgets get assigned to which factories is itself a matter of conditions that neither the widgets nor their creators nor the factory employees control. And the widgets don't get handed over to the factory workers permanently, either. They get dropped off for 6 to 8 hours a day and then handed right back to their creators, or to those the creators have enlisted to help bring up the widgets.

Most of the creators of the widgets are good people who want the best for their widgets. But they are constrained, as all humans are, by circumstance. Many of them simply can't both provide for their widgets financially and spend as much time with them as they would like. Many of them just don't have the resources to give their widgets all the things a young widget needs. Some of them suffer from understandable human problems, like drug addiction, mental illness, or simple loneliness and unhappiness. And I'm sorry to say that, while most of them are truly trying their best, some of them just aren't good people. Some of them beat their widgets. Some of them hurt their widgets through neglect. Some, for whatever sad reason, just don't care about their widgets. It's an unfortunate fact of life: not all widgets have good creators, and having a good creator has a big impact on how well a widget will perform.

Now, all of that being true, tell me: how confident would you be that the performance of your widget as a widget was actually an indication of the quality of the employees in the factory?

Some true believers are adamant that, say, value-added measures are sufficiently robust to all of this noise to settle the question. I find that, frankly, laughably optimistic. But at the very least we should acknowledge that there is an immense amount of construct-irrelevant variance to sort out! This is not a political argument. It's an argument about empiricism, about how we make knowledge. It's an argument about what strikes me as a truly unfortunate lack of skepticism in how we interpret and act on quantitative indicators that are, even in the eyes of their creators, limited and contingent metrics.

The degree to which a student's teachers, environment, luck, parents, and own behavior influence outcomes is a matter of considerable and enduring controversy. My own belief is that the last three items on that list are dramatically underrepresented in our conversations, because we have no real policy levers with which to change them. In that sense, education policymakers are like the guy looking for his keys under the streetlight, not because that's where he dropped them but because that's where it's easiest to look. But whatever the admixture, it would be nice if we all acknowledged that the question is unresolved, complex, and powerfully important.