notes for 3/31/17

  • The first book review should be out for subscribers today! It’s a reprint of an academic review I wrote, but in the future they’ll all be new content. Just need a week to read a couple more books appropriate for this blog. This is the first time I’ve ever distributed a reward through Patreon so please let me know if you didn’t receive an email or can’t access the review. If you’re not a subscriber yet, think it over! I’m exploring some cool options for other rewards and I hope to let you know about some of them soon.
  • Be sure to spread the word about this project if you like it.
  • Some readers have pointed out that the rates for SAT participation in some states are so low (mentioned in this post) because those states require the ACT as a learning assessment. Which is certainly true! But note that the point of that post isn’t to say “look at how low these participation rates are” but rather to explore selection bias, which in the case of ACT-dominant regions would be even more pronounced – only the very motivated students, particularly those looking to attend elite private institutions, would be likely to take the SAT.
  • I have gotten a fair amount of pushback on the idea that randomized trials of charter school efficacy aren’t really random. I agree that this is an idea that I need to explore at greater length in the future. In addition to what I suspect is lurking non-random distribution, I think the bigger question is whether “charter school” even makes sense as a condition suitable for randomization. More to come.
  • On the other side, I appear to have been too kind to the CREDO studies. To call survivorship bias a demonstration of quality on the part of charters is just… not cool.
  • The first Study of the Week post should come out on Monday. It’s a big meaty one and I’m really happy with how it’s shaping up. Not 100% sure but I’m guessing I’ll distribute book reviews on the weekend and do Study of the Week on Monday or Tuesday. And feel free to email me with suggestions or requests.

looking beyond test scores in defense of after school programs

The Trump administration has proposed cutting funding for a program that provides after school programs for low-income students. At the Atlantic, Leah Askarinam defends the programs. I’m on board with continuing to fund them, but I find her defense counterproductive.

Askarinam’s argument is kind of strange. The Brookings Institution ran several large-n studies in the middle of the aughts that showed, without much ambiguity at all, that the quantitative learning gains from these programs are minimal. Askarinam fixates on the age of those studies as a reason to question their validity. It’s true that the latest study on the efficacy of these programs is about a decade old, which isn’t ideal, but also isn’t unusual; it’s really hard, far harder than most people think, to run effective large-scale social science research projects. More to the point, why would we assume that something fundamental has changed in the outcomes of these programs in the past 10 years? She notes that the federal policy situation was different then, but that hardly seems sufficient explanation to me – the federal education policy situation changes all the time without producing systematic differences in student outcomes. (Indeed, the irrelevance of federal education policy to student outcomes is the source of great lamentation.) Consider the standard here: if decade-old studies had shown robust learning gains, would Askarinam now say that they were too old to be trusted? Such a standard would cut both ways, after all.

And while it’s true that absence of evidence isn’t evidence of absence, absence of evidence is… absence of evidence. Askarinam offers some anecdotal evidence of academic improvement, discusses internal research, and speaks generally of gains not captured by those older studies. That’s fine as far as it goes, but none of it amounts to responsible evidence for the kind of quantitative gains the Brookings studies were looking for. More study is needed, obviously – you can use that phrase like a comma when you’re talking about ed research – but as with pre-K programs, I think if the question is “are test score and other quantitative gains in outcomes sufficient to justify the expense of publicly-funded after school programs?” the answer is clearly no.

So am I opposed to funding for after school programs? No, not at all. I just think we should fund them for defensible reasons. Askarinam quotes David Muhlhausen of the conservative Heritage Foundation: “It’s a place to have their kids while the parents are at work,” Muhlhausen said. “That’s the real key to these programs and why they’re popular—not that they provide any benefits to the students. It’s basically a babysitting program for parents who aren’t home.”

Sounds good to me.

The birth of publicly-funded, federally-guaranteed education for children aged 5-18 was one of the greatest advancements in human well-being in history. It helped move millions of children into formal education, providing not only the various benefits of schooling to them but also the essential ancillary benefit of childcare. This in turn made it easier for both parents to work. While we might lament the fact that it’s now necessary for most households to have two incomes to survive, the fact is that it is necessary, and without the free childcare that public schools provide, family life would be impossible for much of the country. Public education also helps our slow, imperfect march towards gender equality. And in a world where digital technologies make it easier and easier to avoid interacting with people who are outside our immediate familial and friend networks, formal schooling can help make the kinds of connections between people from radically different backgrounds that are essential for a functioning democracy.

The cost of these programs is around $1 billion a year, or about one quarter of one percent of what we’ve spent on the failed F-35 jet project in the past 15 years.

In an era of stagnant real incomes for most workers and spiraling costs of housing, healthcare, and higher education, programs that provide safe supervision for children are worth supporting. “Traditional values” conservatives should embrace programs that make child rearing feasible for more families; liberals and leftists should appreciate expanding government assistance and taking more social goods (like childcare) out of the market.

Askarinam’s defensiveness, it seems to me, reflects the way that the widespread acceptance of test score obsession has boxed us in. Too many well-meaning progressives have adopted this reductive view of the purpose of education; they then end up unable to defend programs they favor when the effects of those programs on test scores inevitably turn out to be small or nonexistent. The universal pre-K debate is a perfect example. The endless back-and-forth involves credible arguments from both supporters and skeptics, but few would question that the test score and other quantitative gains we’re arguing over are modest. So stop arguing through that frame. As long as test scores are taken as the criterion of interest, we’ll be playing defense. Instead, we should argue that the basic benefit of pre-K and after school programs is to provide essential childcare support to struggling families, and to provide social and personal enrichment that has value even if it’s uncorrelated with test score increases. We need to expand our definition of the purpose of education beyond the quantitative, rather than staying rooted in a frame that often doesn’t help us. Askarinam describes an after school program that offers social and emotional health benefits. That’s worth fighting for on its own. So articulate that case, and do the same with pre-K. Argue from strength, not weakness.

why selection bias is the most powerful force in education

Imagine that you are a gubernatorial candidate who is making education and college preparedness a key facet of your campaign. Consider these two state average SAT scores.

              Quantitative   Verbal   Total
Connecticut        450         480     930
Mississippi        530         550    1080

Your data analysts assure you that this difference is statistically significant. You know that SAT scores are a strong overall metric for educational aptitude in general, and particularly that they are highly correlated with freshman year performance and overall college outcomes. Those who score higher on the test tend to receive higher college grades, are less likely to drop out in their freshman year, are more likely to complete their degrees in four or six years, and are more likely to gain full-time employment when they’re done.

You believe that making your state’s high school graduates more competitive in college admissions is a key aspect of improving the economy of the state. You also note that Connecticut has powerful teacher unions which represent almost all of the public teachers in the state, while Mississippi’s public schools are largely free of public teacher unions. You resolve to make opposing teacher unions in your state a key aspect of your educational platform, out of a conviction that getting rid of the unions will ultimately benefit your students based on this data.

Is this a reasonable course of action?

Anyone who follows major educational trends would likely be surprised at these SAT results. After all, Connecticut consistently places among the highest-achieving states in educational outcomes, Mississippi among the worst. In fact, on the National Assessment of Educational Progress (NAEP), widely considered the gold standard of American educational testing, Connecticut recently ranked as the second-best state for 4th graders and the best for 8th graders. Mississippi ranked second-to-worst for both 4th graders and 8th graders. So what’s going on?

The key is participation rate, or the percentage of eligible juniors and seniors taking the SAT, as this scatter plot shows.

As can be seen, there is a strong negative relationship between participation rate and average SAT score: generally, the higher the percentage of students taking the test in a given state, the lower the average score. Why? Think about what it means for students in Mississippi, where the participation rate is 3%, to take the SAT. Those students are the ones who are most motivated to attend college and the most college-ready. In contrast, in Connecticut 88% of eligible juniors and seniors take the test. (Data.) This means that almost everyone of appropriate age takes the SAT in Connecticut, including many students who are unprepared or only marginally prepared for college. Most Mississippi students self-select themselves out of the sample. The top-performing quintile (20%) of Connecticut students handily outperforms the top-performing quintile of Mississippi students. Typically, the highest state average in the country is that of North Dakota—where only 2% of those eligible take the SAT at all.

In other words, what we might have perceived as a difference in education quality was really the product of systematic differences in how the populations being compared were assembled. The groups we considered had a hidden non-random distribution. This is selection bias.
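The mechanism is easy to demonstrate with a toy simulation (the score scale, participation model, and all numbers here are invented for illustration): two states draw students from the exact same underlying ability distribution, but in one of them only the most college-ready sliver sits for the test.

```python
import random
import statistics

random.seed(0)

def observed_average(n_students, participation_rate, mean=500, sd=100):
    """Give every student an underlying score, then assume -- per the
    self-selection story above -- that only the top slice of the
    distribution actually takes the test."""
    scores = sorted(random.gauss(mean, sd) for _ in range(n_students))
    n_takers = max(1, int(n_students * participation_rate))
    return statistics.mean(scores[-n_takers:])

# Identical underlying populations; only the participation rate differs.
low_participation = observed_average(100_000, 0.03)   # Mississippi-like: 3%
high_participation = observed_average(100_000, 0.88)  # Connecticut-like: 88%

print(f"3% participation:  {low_participation:.0f}")
print(f"88% participation: {high_participation:.0f}")
```

The low-participation state posts a much higher observed average despite an identical underlying population – the same reversal as in the table above.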


My hometown had three high schools – the local coed public high school (where I went) and two private Catholic high schools, one for boys and one for girls. People involved with the private high schools liked to brag about the high scores their students earned on standardized tests – without bothering to mention that you had to score well on such a test to get into them in the first place. This is, as I’ve said before, akin to having a height requirement for your school and then bragging about how tall your student body is. And of course, there’s another set of screens involved here that also powerfully shapes outcomes: private schools cost a lot of money, and so students who can’t afford to attend are screened out. Students from lower socioeconomic backgrounds have consistently lower performance on a broad variety of metrics, and so private schools are again advantaged in comparison to public ones. To draw conclusions about educational quality from student outcomes without rigorous attempts to control for differences in which students are sorted into which schools, programs, or pedagogies – without randomization – is to ensure that you’ll draw unjustified conclusions.

Here’s an image that I often use to illustrate a far broader set of realities in education. It’s a regression analysis showing institutional averages for the Collegiate Learning Assessment, a standardized test of college learning and the subject of my dissertation. Each dot is a college’s average score. The blue dots are average scores for freshmen; the red dots, for seniors. The gap between the red and blue dots shows the degree of learning going on in this data set, which is robust for essentially all institutions. The very strong relationship between SAT scores and CLA scores shows the extent to which different incoming student populations – the inherent, powerful selection bias of the college admissions process – determine different test outcomes. (Note that very similar relationships are observed in similar tests such as ETS’s Proficiency Profile.) To blame educators at a school on the left-hand side of the regression for failing to match the schools on the right-hand side of the graphic is to punish them for differences in the prerequisite ability of their students.

Harvard students have remarkable post-collegiate outcomes, academically and professionally. But then, Harvard invests millions of dollars carefully managing their incoming student bodies. The truth is most Harvard students are going to be fine wherever they go, and so our assumptions about the quality of Harvard’s education itself are called into question. Or consider exclusive public high schools like New York’s Stuyvesant, a remarkably competitive institution where the city’s best and brightest students compete to enroll, thanks to the great educational benefits of attending. After all, the alumni of high schools such as Stuyvesant are a veritable Who’s Who of high achievers and success stories; those schools must be of unusually high quality. Except that attending those high schools simply doesn’t matter in terms of conventional educational outcomes. When you look at the edge cases – when you restrict your analysis to those students who are among the last let into such schools and those who are among the last left out – you find no statistically meaningful differences between them. Of course, when you have a mechanism in place to screen out all of the students with the biggest disadvantages, you end up with an impressive-looking set of alumni. The admissions procedures at these schools don’t determine which students get the benefit of a better education; the perception of a better education is itself an artifact of the admissions procedure. The screening mechanism is the educational mechanism.
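The edge-case logic described above can be sketched with a simulated example (the cutoff, score scale, and noise level are all invented): admit students by an entrance-exam threshold, generate later outcomes that depend only on entering ability, and then compare the students just inside and just outside the cutoff.

```python
import random
import statistics

random.seed(1)

CUTOFF = 600   # hypothetical entrance-exam admission threshold
BAND = 10      # "edge cases": students within this margin of the cutoff

# Later outcomes depend only on entering ability plus noise -- the school
# itself adds nothing in this simulation.
students = []
for _ in range(50_000):
    ability = random.gauss(500, 100)
    outcome = ability + random.gauss(0, 50)
    students.append((ability, outcome))

alumni = [o for a, o in students if a >= CUTOFF]
everyone_else = [o for a, o in students if a < CUTOFF]
just_in = [o for a, o in students if CUTOFF <= a < CUTOFF + BAND]
just_out = [o for a, o in students if CUTOFF - BAND <= a < CUTOFF]

# Naive comparison: looks like an enormous school effect.
print(f"alumni vs everyone else: {statistics.mean(alumni):.0f} "
      f"vs {statistics.mean(everyone_else):.0f}")
# Edge-case comparison: the apparent effect nearly vanishes.
print(f"just admitted vs just rejected: {statistics.mean(just_in):.0f} "
      f"vs {statistics.mean(just_out):.0f}")
```

The naive alumni comparison shows a gap of well over a hundred points; the edge-case comparison shrinks it to almost nothing. The screening mechanism, not the school, produced the impressive-looking alumni.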

Thinking about selection bias compels us to consider our perceptions of educational cause and effect in general. A common complaint of liberal education reformers is that students who face consistent achievement gaps, such as poor minority students, suffer because they are systematically excluded from the best schools, screened out by high housing prices in these affluent, white districts. But what if this confuses cause and effect? Isn’t it more likely that we perceive those districts to be the best precisely because they effectively exclude students who suffer under the burdens of racial discrimination and poverty? Of course schools look good when, through geography and policy, they are responsible for educating only those students who receive the greatest socioeconomic advantages our society provides. But this reversal of perceived cause and effect is almost entirely absent from education talk, in either liberal or conservative media.

Immigrant students in American schools outperform their domestic peers, and the reason is about culture and attitude, the immigrant’s willingness to strive and persevere, right? Nah. Selection bias. So-called alternative charters have helped struggling districts turn it around, right? Not really; they’ve just artificially created selection bias. At Purdue, where there is a large Chinese student population, I always chuckled to hear domestic students say “Chinese people are all so rich!” It didn’t seem to occur to them that attending a school that costs better than $40,000 a year for international students acted as a natural screen to exclude the vast number of Chinese people who live in deep poverty. And I had to remind myself that my 8:00 AM writing classes weren’t going so much better than my 2:00 PM classes because I was somehow a better teacher in the mornings, but because the students who would sign up for an 8:00 AM class were probably the most motivated and prepared. There’s plenty of detailed work by people who know more than I do about the actual statistical impact of these issues and how to correct for them. But we all need to be aware of how deeply unequal populations influence our perceptions of educational quality.

Selection bias hides everywhere in education. Sometimes, in fact, it is deliberately hidden in education. A few years ago, Reuters undertook an exhaustive investigation of the ways that charter schools deliberately exclude the hardest-to-educate students, despite the fact that most are ostensibly required to accept all kinds of students, just as public schools are. For all the talk of charters as some sort of revolution in effective public schooling, what we find is that charter administrators work feverishly to tip the scales, finding all kinds of crafty ways to ensure that they don’t have to educate the hardest students to educate. And even when we look past all of the dirty tricks they use – like, say, requiring parents to attend meetings held at specific times when most working parents can’t – there are all sorts of ways in which students are assigned to charter schools non-randomly and in ways that advantage those schools. Excluding students with cognitive and developmental disabilities is a notorious example. (Despite what many people presume, a majority of students with special needs take state-mandated standardized tests and are included in data like graduation rates, in most locales.) Simply the fact that parents typically have to opt in to charter school lotteries for their students to attend functions as a screening mechanism.

Large-scale studies of charter efficacy such as Stanford’s CREDO project argue confidently that they have controlled for the enormous number of potential screening mechanisms that hide in large-scale education research. These researchers are among the best in the world and I don’t mean to disparage their work. But given the size of the stakes and the truth of Campbell’s Law, I have to report that I remain skeptical that we have ever truly controlled effectively for all the ways that schools and their leaders cook the books and achieve non-random student populations. Given that random assignment to condition is the single most essential aspect of responsible social scientific study, I think caution is warranted. And as I’ll discuss in a future post, the observed impact of school quality on student outcomes in those cases where we have the most confidence in truly random assignment to condition is not encouraging.

I find it’s nearly impossible to get people to think about selection bias when they consider schools and their quality. Parents look at a private school and say, look, all these kids are doing so well, I’ll send my troubled child and he’ll do well, too. They look at the army of strivers marching out of Stanford with their diplomas held high and say, boy, that’s a great school. And they look at the Harlem Children’s Zone schools and celebrate their outcome metrics, without pausing to consider that it’s a lot easier to get those outcomes when you’re constantly expelling the students most predisposed to fail. But we need to look deeper and recognize these dynamics if we want to evaluate the use of scarce educational resources fairly and effectively.

Tell me how your students are getting assigned to your school, and I can predict your outcomes – not perfectly, but well enough that it calls into question many of our core presumptions about how education works.

welcome to the ANOVA

Hi there, my name is Freddie deBoer. I’ve been blogging off and on since 2008. I’ve also written for many newspapers, magazines, and websites. (You can see some of my published writing by clicking the My Work tab above.) In my professional life, I work at Brooklyn College in the City University of New York in the Office of Academic Assessment, where I help faculty develop and implement faculty-led assessments of student learning, and where I serve as coordinator of the Writing Across the Curriculum program. This project, a new blog called the ANOVA, is designed to combine those two parts of my life while narrowing and focusing my engagement.

The ANOVA will be about education research and education policy. That way, I can continue to work and research in education in my professional life, and take the reading and engagement I’m doing and make them useful for a popular audience. I will discuss major trends in education, legislation and federal policy related to education, new and existing research in the field, and the philosophy and purpose of education. I expect I will post 3-4 times a week. One of these posts will be a Study of the Week, where I look at a prominent, problematic, or interesting research study in education, whether old or new, discussing the findings and what they mean for the broader world.

I will be attempting to monetize this blog through Patreon, so please consider pledging to support this project financially. Those who contribute $5 a month or more will get access to a weekly book review. If the amount of contributions exceeds my expectations, I will think of other ways to reward patrons. You can also make a one-time donation on PayPal.

Why “the ANOVA”? Because the term, which stands for ANalysis Of VAriance, refers to a statistical technique commonly used in education research; because the attempt to define how variance in educational outcomes is determined by predictor variables is perhaps the essential question in the quantitative study of education; and because it’s a beautiful word.

I will not avoid talking about the political dimensions of education. Education is an inherently political topic. However, this will not be a political blog and will feature no political writing that is not narrowly focused on education. I will not, for example, weigh in on the campus political wars in this space. When in doubt, I will err on the side of not engaging if a subject is not clearly directly concerned with education. Please bear that in mind if you’re thinking about contributing. It should go without saying that this project will not be affiliated with or endorsed by Brooklyn College in any way, and that I will not be working on it during my regular work hours.

I’ve gotten a lot out of writing online, but it has had downsides, especially concerning people targeting my employment. Online politics are not good for my mental well-being. As someone with poor impulse control and bipolar disorder, I do best when I limit my political engagement in digital mediums that favor immediacy over thoughtfulness. I’ve also found much better ways to use my political energy in recent months. Since moving to New York I’ve gotten involved in my own union, in a tenants’ union, and in local education politics, along with attending many protests. This has been wonderful for my mood and sense of political purpose. Online politics leave me discouraged and unhappy; offline politics make me hopeful and energized. So I intend to keep my political engagement squarely offline.

This is a modest project with modest goals. I want an outlet where I can write for a small audience of interested people and share a little of my expertise and my opinions. I’m hoping to carve out a niche where I can engage productively and professionally with topics related to my expertise that I’m passionate about. I hope you join me.