Study of the Week: When It Comes to Student Satisfaction, Faculty Matter Most

In an excellent piece from years back arguing against the endless push for online-only college education, Williams College President Adam Falk wrote:

At Williams College, where I work, we’ve analyzed which educational inputs best predict progress in these deeper aspects of student learning. The answer is unambiguous: By far, the factor that correlates most highly with gains in these skills is the amount of personal contact a student has with professors. Not virtual contact, but interaction with real, live human beings, whether in the classroom, or in faculty offices, or in the dining halls. Nothing else—not the details of the curriculum, not the choice of major, not the student’s GPA—predicts self-reported gains in these critical capacities nearly as well as how much time a student spent with professors.

I was always intrigued by this, and it certainly matched my experience as both a student and an instructor – that what really makes a difference to individual college students is meaningful interaction with faculty. I always wanted to be able to share this impression in a persuasive way. But Falk was referring to internal research at one small college, and I had no way to responsibly discuss those findings. We just didn’t have the research to back it up.

But we do now, and we have for a couple years, and that research is our Study of the Week.

The Gallup-Purdue Index came together while I was getting my doctorate at the school. (Humorously, it was always referred to as the Purdue-Gallup Index there, and no one would ever make note of the discrepancy.) It’s the biggest and highest-quality survey of post-graduate satisfaction with one’s collegiate education ever done, with about 30,000 responses in both 2014 and 2015. The survey is nationally representative across different educational and demographic groups. That might not leap out at you, but understand that in this kind of longitudinal survey data, that’s all very impressive. It’s incredibly difficult to get a good response rate in that kind of alumni research. For example, a division of my current institution sent out an alumni survey seeking some simple information on post-college outcomes. They restricted their call to alumni for whom they had contact information known to be less than two years old and who had had some form of interaction with the university in that span. Their response rate was something like 3%. That’s not at all unusual. The Gallup-Purdue Index had an advantage: its researchers had the resources necessary to provide incentives for participants to complete the survey.

Why student satisfaction and not, say, income or unemployment rate? To the credit of the researchers, they wanted to use a deeper concept of the value of college than just the pecuniary. Treating college as a simple dollars-and-cents investment happens in ed talk all the time – take this piece by Derek Thompson on the value of elite schools, relative to others, once you control for student ability effects. (There isn’t much of a difference.) I get why Thompson frames it this way, but this sort of thinking inevitably degrades the humanistic values on which college was built. It also unfairly discounts the value of the humanities and arts, which often attract precisely the kinds of students who are most likely to choose non-financial reward over financial. There’s still a lot of value talk in there – this was, it’s worth mentioning, a Mitch Daniels-influenced project, and Mitch is as corporate as higher ed gets – but I’ll take it as an antidote to the Anthony Carnevale-style obsession with financial outcomes.

So what do the researchers find? To begin with: from the perspective of student satisfaction, American colleges and universities are doing a pretty good job – unless they’re for-profit schools.

alumni rated on a five-point scale whether they agreed their education was worth the cost. Given that many families invest heavily in higher education for their children, there should be little doubt about its value. However, only half of graduates overall (50%) were unequivocally positive in their response, giving the statement a 5 rating on the scale ranging from strongly disagree (1) to strongly agree (5). Another 27% rated their agreement at 4, while 23% gave it a 3 rating or less.

This figure varies only slightly between alumni of public universities (52%) and alumni of private nonprofit universities (47%), but it drops sharply to 26% among graduates of private for-profit universities. Alumni from for-profit schools are disproportionately minorities or first-generation college students and are substantially more likely than those from public or private nonprofit schools to have taken out $25,000 or more in student loans.

This tracks with a broader sense I’ve cultivated for years, particularly in assembling my dissertation research on standardized testing of college learning, that the common perception that we are in some sort of college quality crisis is incorrect. I suspect that the average college graduate, as opposed to student, learns a great deal and goes on to do OK. The problems, financial and otherwise, mostly afflict the students we have to sadly label “some college” – those who have taken on student loan debt for degrees that they then don’t complete. They are a tragic case and, like student loan borrowers writ large, they desperately need government-provided financial relief.

(Incidentally, this dynamic where people think there’s a crisis with colleges writ large but rate their own school highly happens with parents and public schools too.)

More important than the overall quality findings, this research tells us what ought to have always been clear: that faculty, and the ability of faculty to form meaningful relationships with students, are the most important part of a satisfying education. Check it out.

The three strongest predictors of seeing your college education as worth the cost are about faculty. And not just about faculty in general, but about relationships with faculty, the ability to have meaningful interactions with them.

There are certainly conclusions you could draw from this that not all faculty would like. In particular, this would lend credence to those who think that faculty at research universities are too disconnected from their students because of the drive for prestige and the pressure to publish. But two conclusions that most any faculty members could get on board with are these: one, that the adjunctification of the American university cuts directly against the interests of students, and two, that online-only or online-dominant education cuts students out of the most important aspect of college.

The all-adjunct teaching corps is bad not because adjuncts are bad educators or because they don’t care; many adjuncts are incredibly skilled teachers who care deeply for their students. Rather, adjuncts must teach so much, and balance such hectic schedules, that reaching out to students personally in this way is just not possible. At CUNY, I believe the rule is that adjuncts can work three classes on one campus and two at another. I know it’s difficult for people who don’t teach to understand how much work and time a college class takes up, but 5 classes in a semester is a ton. (Even at my most efficient, as a teacher I could never spend less than about three hours out of class for every one hour in class, between lesson planning, logistics, and grading.) Add to that having to go from, say, Hostos Community College in the Bronx to Brooklyn College here in south Brooklyn, and you can get a sense of the schedule constraints. Many adjuncts don’t even have an office space to meet students in outside of class. How could they be expected to form these essential relationships, then?

Online courses, meanwhile, cut out face-to-face interaction entirely. Can those relationships still be fostered? I’m sure it’s not impossible, but I’m also sure that it’s much, much harder. People have expressed confusion to me over my skepticism towards online classes for a long time, but I’m confused by their confusion. Teaching is a social activity. There are many aspects of face-to-face teaching that we don’t even attempt to replicate in online spaces. As both a teacher and a student, I have found my online courses to be deeply alienating, lacking any of the organic sense of mutual commitment that is essential to good pedagogy. My biggest concern is the lack of human accountability in online education. As an educator, I often felt that my first and most important role was not as a dispenser of information, much less as an evaluator of progress, but as a human face – one that brought with it the sort of sympathetic responsibility that underwrites so much of social life. What I could offer students was support, yes, but of a certain kind: the support that implies reciprocity, the support that comes packaged with the expectation that the student would then recognize his or her commitment to doing the work. Not for me, but for the project of education, for the process.

I know that this is all the sort of airy talk that a lot of policy and technology types wave away. But in a completely grounded and pragmatic sense, in my own teaching the students who did best were those who demonstrated a sense of personal accountability to me and my course. They were the ones who realized that a class is an agreement between two people, teacher and student, and that each owed something to the other. How could I foster that if I never saw anything else beyond text and numbers on a screen? And note too that the financial motivation for online courses is often put in stark terms: you can dramatically upscale the number of students in an online course per instructor. Well, you can’t dramatically upscale a teacher’s attention, and there is no digital substitute for personal connection.

Does this mean there’s no place for online courses? No. In big lecture hall-style classes, there isn’t a lot of mutual accountability and social interaction, either. I don’t doubt that online courses can be part of a strong education. But I feel very bad for any student who is forced to go through college without repeated and extended opportunities for faculty mentorship – particularly students who aren’t among the top high school graduates, given the durable finding that the students who do best in online courses are those who need the least help, likely because they already have the soft skills so many in the policy world take for granted.

The contemporary university is under enormous pressures, external and internal. We ask it for all sorts of things that cut against each other – educate more students, but be more selective; keep standards high, but graduate a higher percentage; move students through the system more cheaply, more quickly, and with higher quality. Meanwhile, we lack a healthy job market for those who don’t have a college education, which only makes the pressure more intense. There are no magic bullets for this situation. But it is remarkable how few of our attempts to make progress involve recognizing that teachers are the heart of any institution of learning. We’ve systematically devalued the profession, doing everything we can to replace experienced, full-time, career academics with cheaper alternatives. Perhaps it’s time to listen to our intuition and to our alumni and begin to rebuild the professoriate, the ones who will ultimately shepherd our students to graduation. If we’re going to invest in college, let’s start there.

selection bias: a parable

To return to this blog’s favorite theme, how selection bias dictates educational outcomes, let me tell you a true story about a real public university in a large public system. (Not my school, for the record.) Let’s call it University X. University X, like most, has an Education department. Like at most universities, University X’s Education majors largely plan on becoming teachers in the public school system of that state. Unfortunately, like at many schools across the country, University X’s Education department has poor metrics – low graduation rate, discouraging post-graduation employment and income measures, poor performance on assessments of student learning relative to same-school peers. (The reasons for this dynamic are complex and outside of the scope of this post.)

The biggest problem, though, is this: far too many of University X’s graduates from the Education program go on to fail the test that the state mandates as part of teacher accreditation. This means that they are effectively barred from their chosen profession. Worse, going to another state to work as a public school teacher is often not feasible, as accreditation standards will sometimes require them to take a year or more of classes in that state’s system post-graduation to become accredited. So you end up with a lot of students with degrees designed to get them into a profession they can’t get into, and eventually the powers that be began to look askance.

So what did University X’s Education department do? Their move was ingenious, really: they required students applying to the major to take a practice version of the accreditation test, one built from real questions on the real test and otherwise identical to the real thing. They then accepted into the major only those students who scored above benchmarks on the test. And the problem, in a few years, was solved! They saw their graduates succeed on the accreditation test at vastly higher rates. Unfortunately, they also saw their number of majors decline precipitously, which in turn put them in a tough spot with the administration of University X, which used declining enrollment as a reason to cut their funding.

Now here’s the question for all of you: was introducing the screening mechanism of the practice accreditation test a cynical ploy to artificially boost their metrics? Or by preventing students from entering a program designed to lead to a job they ultimately couldn’t get, were they doing the only moral thing?

the Official Dogma of Education (version 1.0)

What follows is my first crack at articulating what I call the Official Dogma of Education. The Official Dogma is a set of presumptions and values that operate in the background of our educational discourse and which are accepted as true by most everyone without often being voiced out loud. The Official Dogma is ideology, in the old sense – that is, these are political stances that are not recognized as such; they are human points of view that are unconsciously accepted as truths of the universe. The idea is not that any individual person has ever expressed these beliefs explicitly. Indeed, the point of the Official Dogma is that its tacit nature helps to make it impossible for us to examine and critique it. The Official Dogma and its constituent elements are not universally accepted by all individuals, but an embrace of something like the Official Dogma is bipartisan, cross-ideological, and generally uncontroversial in contemporary American life. In particular, the Official Dogma is the doctrine of institutions. It is the philosophy of the nonprofits, the corporations, the political parties, the unsigned editorials and the corporate mission statements, the institutional cultures of organizations that shape policy. The Official Dogma, in other words, is the educational philosophy of managerialism, which is the truly dominant ideology of our times.

I expect that I will tinker with and refine this many times in the future.

The Official Dogma of Education

1. All students regardless of context have essentially the same prerequisite ability to meet arbitrary performance benchmarks in all educational tasks. The persistence of variation in academic outcomes is the result of pathology, whether systemic (bad schools, bad teachers) or individual (bad work ethic, lack of grit, refusal to delay gratification).

2. Academic outcomes are permanently and universally plastic; that is, no matter where they are currently, any given student or group of students can be moved to any rank or performance benchmark in any given academic ranking or task.

3. As there is limited or no ability to affect parents and parenting through policy, parents and parenting are not to be discussed in consideration of academic outcomes.

4. Academic outcomes are dominantly or exclusively the product of school-side variables or teacher-side variables, not student-side. That is, teachers and schools control most or all of the variation in quantitative educational metrics for any given student.

5. Accordingly, teachers and schools are to blame for achievement gaps between groups, a failure to excel in international academic comparisons, or a general sense that students are not learning sufficiently. Efforts to address these issues are thus to be exclusively focused on teachers, schools, and their presumed failings.

6. The purpose of education, from a policy perspective, is predominately or purely financial/vocational; civic education, humanistic inquiry, socialization, aesthetic appreciation, cultivation of emotional intelligence or compassion, or similar are presumed to be of secondary importance if they are deemed important at all.

7. To the degree that the non-financial/vocational virtues listed in point 6 are to be valued, they are to be valued as proxies/predictors, not for their own sake. Skills in subjects like the arts or humanities are presumed to be valuable only to the degree that they buttress skills that are measured quantitatively and are already valued by the policy apparatus.

8. Quantitative indicators are presumed to be most predictive of the financial/vocational success that is the first or only priority of education and thus quantitative indicators are to take first priority in policy discussion.

9. Education is both a system for creating broad societal equality and for separating individuals into rigid tiers of relative performance. The tensions between these functions are to remain unexamined.

10. Our educational policy succeeds when it improves the academic performance of all students, and when individual students rise above and leave their peers behind. The tensions between these goals are to remain unexamined.

11. Education is the cure for poverty at the societal level no matter what the empirical evidence tells us about the relationship between the two.

12. Education is the cure for income inequality at the societal level no matter what the empirical evidence tells us about the relationship between the two.

13. Education is the cure for slow economic growth at the societal level no matter what the empirical evidence tells us about the relationship between the two.

14. Relative performance on international comparisons of educational outcomes dictates relative economic performance between countries no matter what the empirical evidence tells us about the relationship between the two.

15. Economic and social inequalities between students may exist, but they are never to be seen as equally dispositive of student outcomes as school or teacher quality. Dwelling on economic and social inequalities between students represents an attempt to evade accountability and demonstrates a lack of commitment to educational progress.

16. Education is always to be considered the cure for those economic and social inequalities. Addressing economic and social inequalities is never to be considered a necessary step in addressing inequalities in educational outcomes.

17. The primary or sole determiner of overall education quality in a given society is that society’s will. A society that genuinely commits to achieving particular educational outcomes will do so. A society that has not achieved particular educational outcomes has not really committed to doing so. Therefore the task of improving education lies primarily or solely in marshaling the political will to do so, and the enemies of educational progress are those who question, critique, or otherwise oppose the agenda of those who accept the Official Dogma as truth.

FYI

Hey all, I just wanted to say that my old Purdue email address is officially dead, despite my protestations. A few people still write to me there. Please switch to freddie7 AT gmail DOT com going forward. If you’re writing about issues relating to my job at Brooklyn College, you can email me at fredrik.deboer AT brooklyn.cuny.edu.

it turns out we’re all locked up in here together

In this Washington Post piece on American University shutting down a fraternity event for appropriating the 17th-century French term “bourgeois,” I discover a sentence that has left me sitting in stunned silence, overcome with awe and sincere admiration.

“I want to continue empowering a culture of controversy prevention among [Greek] groups.”

This sentence, by assistant director of fraternity and sorority life Colin Gerker, is in its own way achingly beautiful, if only for its ability to pack in so much information about late capitalism and our culture without really intending to. I could not machine together such a sentence if I labored on it for a year. It is perfect.

I am studiously avoiding the campus politics wars on this blog. But I am willing to talk about the nature of the modern university. So many campus controversies are represented as clashes between different kinds of people – liberals vs. conservatives, activists vs. educators, the black bloc vs. the alt right. But all of those groups, ultimately, are powerless within the system. Neither Milo Yiannopoulos’s little brood nor the black hoodies that came to meet them will decide the future of college. The 21st century university is owned by the chief litigation officers, by the media liaisons, by the marketing department. Whose values will win? What do values have to do with it? Somebody’s crisis response manual somewhere, carefully put together through the actuarial science of risk prevention, says who wins and who loses on campus. I hate to say I told you so.

People ask me questions. How are the kids these days, really? Are they principled activists or coddled children of affluence? Are they really so deeply opposed to free speech and intellectual freedom? When I read some strategic action plan, put together by a consultant who learned about intersectionality from a PowerPoint at a conference about cultivating the alumni donors of tomorrow, I feel compelled to answer back… what’s the difference?

Study of the Week: Discipline Reform and Test Score Mania

This week’s study considers how quantitative educational indicators (read: test scores) are affected by serious disciplinary action against students.

The context

We’re in the midst of a criminal justice reform movement in this country. The regularity of police killings, particularly of black men, and our immense prison population have led to civic unrest and a widespread perception that something needs to change. We’ve even seen rare bipartisan admissions that something has gone wrong, at least to a point. But we’ve made frustratingly little in the way of actual progress thus far.

One of the salutary aspects of this movement has been the insistence, by activists, on seeing crime and policing as part of broader social systems. You can’t meaningfully talk about how crime happens, or why abusive policing or over-incarceration happen, without talking about the socioeconomic conditions of this country. In particular, there’s been a great deal of interest in the “school to prison pipeline,” the way that some kids – particularly poor students of color – are set up to fail by the system. One aspect of our school system that clearly connects with criminal justice reform is our discipline system. Students who receive suspensions and other serious disciplinary action frequently struggle academically, and they are disproportionately likely to be students of color. As activists have argued, in many ways these students begin their lives in an overly punitive system and continue to suffer in that condition into adulthood.

In an era of test score mania, it’s inevitable that people will ask – how does academic discipline impact test scores? And how can we assess this relationship when there are such obvious confounds? In this week’s paper, the University of Arkansas’s Kaitlin P. Anderson, Gary W. Ritter, and Gema Zamarro attempt to explore that relationship, and arrive at surprising results.

The data

What’s really remarkable about this research is the size and quality of the data set, summarized like this:

This study uses six years of de-identified demographic, achievement (test score), and disciplinary data from all K-12 schools in Arkansas provided by the Arkansas Department of Education (2008-09 through 2013-14). Demographic data include race, gender, grade, special education status, limited English proficiency-status, and free-and-reduced-lunch (FRL) status.

That’s some serious data! We’re talking in the hundreds of thousands of observed students, with longitudinal (multiple observations over time) information and a good amount of demographic data for stratification. I’d kill for a data set like this. (If I could get such anonymized data for students who go through NYC public schools and enroll in CUNY, man.)

Why does the longitudinal aspect matter? Because of endogeneity.

The arrow of causation again, or endogeneity

In last week’s Study of the Week, I pointed out that experimental research in education is rare. Sometimes this is a resource issue, but mostly it’s an issue of ethics and practicality. Suspensions and their impact on academic performance are a perfect example: you can’t go randomly assigning the experimental condition of suspension to kids. That means that the negative academic outcomes typically associated with suspensions, discussed in the literature review of this study, might be caused by them – or it may be that kids who struggle academically are more likely to be suspended. You might presume that if one follows the other, that’s sufficient to prove causation, but what if there was some preceding event that caused both? (Parents divorcing, say.) It’s tricky. Because experimental designs physically intervene and are randomly controlled, they don’t have this problem, but again, nobody should be suspending kids at random in the name of science.

This research question has an endogeneity problem, in other words. Endogeneity is a fancy statistical term that, like many, is often not used in a particularly precise way. The Wikipedia article for it is awful, but the first couple pages here are a good resource. In general, endogeneity means that something in your model depends on something else in your model in a way that isn’t expressed in the relationship you’re studying. That is, there’s a hidden relationship within your model that potentially confounds your ability to assess causation. Often this is defined as your error term being correlated with your independent variable(s) (your input variables, the predictors, the variables you suspect may influence the output variable you’re looking at).

Say you’re running a simple linear regression analysis and your model looks at the relationship between income and happiness as expressed on some interval scale. Your model will always include an error term, which contains all the stuff impacting your variable of interest (here happiness) that’s not captured by your model. That’s OK – real world models are never fully explanatory. Uncontrolled variability is inevitable and fine in many research situations. The trouble is that some of the untested variables, the error portion, are likely to be correlated with income. If you’re healthy you’re likely to have a better income. If you’re healthy you’re likely to be happier (depending on type of illness). If you just plug income in as a predictor of happiness and income correlates with health and health correlates with happiness then you can end up overestimating the impact of income. If you’re really just looking for associations, no harm done. But if you want to make a causal inference, you’re asking for trouble. That’s (one type of) endogeneity.
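Here’s a minimal simulation of that omitted-variable story. Everything in it is invented for illustration – the coefficients, the noise levels, the variable names – but it shows how leaving a confounder like health out of the model inflates the apparent effect of income:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data: health is the unmeasured confounder.
health = rng.normal(size=n)
income = health + rng.normal(size=n)                    # health raises income
happiness = 0.2 * income + health + rng.normal(size=n)  # true income effect: 0.2

def ols(y, *predictors):
    """OLS coefficients for y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Naive model: happiness ~ income. Health lurks in the error term and is
# correlated with income, so the income coefficient is badly inflated.
print(ols(happiness, income))          # income coefficient comes out near 0.7

# Including the confounder recovers something close to the true effect.
print(ols(happiness, income, health))  # income coefficient comes out near 0.2
```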

Now, there, you have an endogeneity problem that could be solved by putting more variables into your model, like some sort of indicator of health. But sometimes you have endogeneity that stems from the kind of arrow of causation question that I’ve talked about in this space before. The resource I linked to above details a common example. Actors who have more status are perceived as having more skill at acting. But of course having more skill at acting gives you more status as an actor. Again, if you’re just looking for association, no problem. But there’s no way to really dig out causation – and it doesn’t matter if you add more variables to the model. That problem pops up again and again in education research.

Endogeneity is discussed at length in this paper. The study’s authors attempt to confront it, first, by throwing in demographic variables that may help control for unexplained variation, such as whether students qualify for federal school lunch assistance (a standard proxy for socioeconomic status). They also use a fixed effects panel data model. Fixed effects models attempt to account for unexplained variation by looking at how particular variables change over time for an individual research subject/observation. Fixed effects data are longitudinal, in other words, rather than cross-sectional (looking at each subject/observation only once). Why does this matter? There’s a great explanation by example in this resource regarding demand for a service and that service’s cost. By using a fixed effects model, you can look at correlations over time within a given subject or observation.

Suppose I took a bunch of homeless kids and looked at the relationship between the calories they consumed and their self-reported happiness. I do a regression and I find, surprisingly, that even among kids with an unusually high chance of being malnourished, calories are inversely correlated with self-reported happiness – the more calories, the lower the happiness. But we know that different kids have different metabolisms and different caloric needs. So now I take multiple observations of the same kids. I find that for each individual kid, rising caloric intake is positively correlated with happiness. Kids who consume fewer calories might be happier on average, but the idea that fewer calories cause greater happiness has proven to be an illusion. Looking longitudinally shows that for each kid, more calories are associated with more happiness. That’s the benefit of a fixed effects model. Make sense?
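A toy sketch of that logic, with made-up numbers: kids with higher baseline caloric needs are less happy overall, but within any one kid, more calories track with more happiness. Demeaning each kid’s own series – the “within” transformation that fixed effects models rely on – flips the sign:

```python
import numpy as np

rng = np.random.default_rng(1)
n_kids, n_waves = 200, 6

need = rng.normal(size=n_kids)            # stable per-kid trait (caloric need)
kid_effect = -2.0 * need                  # high-need kids are less happy overall
calories = need[:, None] + 0.5 * rng.normal(size=(n_kids, n_waves))
happiness = kid_effect[:, None] + calories + rng.normal(size=(n_kids, n_waves))

def slope(x, y):
    """Simple regression slope of y on x."""
    return np.polyfit(x.ravel(), y.ravel(), 1)[0]

# Pooled, cross-sectional slope: negative, because the per-kid trait confounds it.
print(slope(calories, happiness))

# Fixed effects via the "within" transformation: subtract each kid's own means.
cal_within = calories - calories.mean(axis=1, keepdims=True)
hap_within = happiness - happiness.mean(axis=1, keepdims=True)
print(slope(cal_within, hap_within))      # positive, near the true effect of 1.0
```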

The authors of this study use a panel data (read: contains longitudinal data) fixed effects model as an attempt to confront the obvious confounds here. As they say, most prior research is simply correlational, using cross-sectional approaches that merely compare incidence of suspensions to academic outcomes. By introducing longitudinal data, they can use a fixed effects model to look at variation within particular students, which helps address endogeneity concerns. Do I understand everything going on in their model, statistically? I most certainly do not. So if I’ve gotten something wrong about how they’re attempting to control endogeneity with a fixed effects model, please write me an email and I’ll run it and credit you by name.

The findings

What the authors find, having used their complex model and their longitudinal data, is counterintuitive and goes against the large pool of correlational studies: students who receive serious disciplinary actions don’t suffer academically, at least in terms of test scores, when you control for other variables. In fact, there are statistically significant but very small increases in test scores associated with serious disciplinary action. This is true for both math and language arts. The effects are so small that, in my view, they aren’t worth characterizing as positive. (This is why we need to report effect sizes.) But still, the research directly cuts against the idea that serious disciplinary action hurts test scores.
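A quick illustration of why I harp on effect sizes, with entirely invented numbers: given samples in the hundreds of thousands, as in statewide administrative data, a difference of three-hundredths of a standard deviation is highly “significant” and practically negligible at the same time:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 100_000                            # huge samples, as in statewide records

a = rng.normal(0.00, 1.0, n)           # standardized scores, comparison group
b = rng.normal(0.03, 1.0, n)           # focal group: 0.03 SD higher on average

t, p = stats.ttest_ind(a, b)
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
d = (b.mean() - a.mean()) / pooled_sd  # Cohen's d, a standard effect size

print(f"p = {p:.2e}, Cohen's d = {d:.3f}")
# p is vanishingly small ("significant!") while d is ~0.03: trivial in practice.
```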

This is just one study and will need replication. But it utilizes a very large, high-quality data set and attempts methodologically to address consistent problems with prior correlational research. So I would lend a good deal of credence to its findings. The question is, what do we do with it?

Keeping results separate from conclusions

To state the obvious: it’s important that we do research like this. We need to know these relationships. But it’s also incredibly important that we recognize the difference between the empirical outcomes of a given study and what our policy response should be. We need, in other words, to separate results from conclusions.

This research occurs in a period of dogged fixation on test scores by policy types. This is, as I’ve argued many times, a mistake. The tail has clearly come to wag the dog when it comes to test scores, with those quantitative indicators of success now pursued so doggedly that they have overwhelmed our interest in the lives of the very children we are meant to be advocating for. And while I don’t think many people will come out and say, “suspensions don’t hurt test scores, so let’s keep suspending so many kids,” this research comes in a policy context where test scores loom so large that they dominate the conversation.

To their credit, the authors of this study express the direct conclusion in a limited and fair way: “Based on our results, if policymakers continue to push for changes to disciplinary policies, they should do so for reasons other than the hypothesized negative impacts of exclusionary discipline on all students.” This is, I think, precisely the right way to frame this. We should not change disciplinary policy out of a concern for test scores. We should change disciplinary policy out of a concern for justice. Do the authors agree? They are cagey, but I suspect they show their hands several times in this research. They caution that the discipline reform movement is leading to deteriorating “school climate” measures and in general concern troll their way through the final paragraphs of their paper. I wish they would state what seems to me to be the most important point: that while we should empirically assess the relationship between discipline and test scores, as they have just done admirably well, the moral question of discipline reform is simply not related to that empirical question. When it comes to asking if we’re suspending too many kids, test scores are simply irrelevant.

I am not a “no testing, ever” guy. That would be strange, given that I spend a considerable portion of my professional life researching educational testing. I see tests as a useful tool – that is, they exist to satisfy some specific pragmatic human purpose and are valuable to us as long as they fulfill that purpose and their side effects are not so onerous that they overwhelm that positive benefit. As I have said for years, “no testing” or “test everyone all the time” is a false binary; we enjoy the power of inferential statistics, which makes it possible to know how our students are doing at scale with great precision. And since relative standardized testing outcomes (that is, individual student performance relative to peers) tend to be remarkably static over life, we don’t have much reason to worry about test data suddenly going obsolete. Careful, responsibly implemented random sampling with stratification can give us useful data without the social and emotional costs to children that limitless testing imposes. No kids lie awake at night crying because they’re stressed about having to take the NAEP.
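Here’s a minimal sketch of that sampling argument. The population, strata, and scores below are all made up: test a 1% stratified sample instead of every student, weight each stratum’s sample mean by its population share, and the estimate lands within a few points of the census value:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical district: three strata of schools with different score profiles.
strata = {
    "stratum_a": rng.normal(500, 90, 60_000),
    "stratum_b": rng.normal(460, 90, 30_000),
    "stratum_c": rng.normal(430, 90, 10_000),
}
n_total = sum(len(s) for s in strata.values())
census_mean = np.concatenate(list(strata.values())).mean()

# Test a 1% random sample within each stratum, then weight each stratum's
# sample mean by that stratum's share of the population.
estimate = sum(
    (len(scores) / n_total)
    * rng.choice(scores, size=len(scores) // 100, replace=False).mean()
    for scores in strata.values()
)

print(f"census mean: {census_mean:.1f}   stratified 1% estimate: {estimate:.1f}")
# The two typically agree within a few points; testing everyone adds little.
```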

The only people who are harmed by reducing the amount of testing in this way are the for-profit testing companies and ancillary businesses that suck up public funds for dubious value, and the politically motivated who use test scores as an instrument to bash teachers and schools. Whether you see their interests as equal to those of the people most directly affected – the students who must endure days of stress and boredom and the teachers who must turn their classes into little more than test prep factories – is an issue for you and your conscience.

Ultimately, the conclusions we must draw about the use of suspensions and other serious disciplinary actions must be moral and political in their nature. As such, good empiricism can function as evidence, context, and support, but it cannot solve the questions for us. To their credit, the researchers behind this study conclude by saying “as we seek to better understand these relationships, we must also consider the systemic effects.” Though I might not reach the same political conclusions as they do, I agree completely.

Many thanks to the American Prospect’s outstanding education reporter Rachel Cohen for bringing this study to my attention.

lots of fields have p-value problems, not just psychology

You likely have heard of the replication crisis going on, in which past research findings cannot be reproduced by other researchers using the same methods. The issue, typically, lies with the p-value, an essential but limited statistic that we use to establish statistical significance. (There are other replication problems besides p-value, but that’s the one you read about the most.) You can read about p-value here and the replication crisis here.

These problems are often associated with the social sciences in general and the fields of psychology and education specifically. This is largely due to the inherent complexities of human-subject research, which typically involves many variables that researchers cannot control; the inability to perform true control-grouped experimental studies due to practical or ethical limitations; and the relatively high alpha thresholds associated with these fields, typically .05, which are necessary because effects studied in the social sciences are often weak compared to those in the natural or applied sciences.
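To see what that .05 threshold means mechanically, here’s a quick simulation, with all numbers invented: run a thousand small two-group studies in which the true effect is exactly zero, and roughly fifty of them will still come back “significant”:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_studies, n_per_group = 1000, 30

false_positives = 0
for _ in range(n_studies):
    a = rng.normal(size=n_per_group)   # both groups drawn from the same
    b = rng.normal(size=n_per_group)   # distribution: the null is true
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:                       # the conventional alpha threshold
        false_positives += 1

print(f"{false_positives} of {n_studies} true-null studies were 'significant'")
# Expect roughly 50: about 5% of null comparisons clear the .05 bar by chance.
```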

However, it is important to be clear that the p-value problem exists in all manner of fields, including some that are among the “hardest” of scientific disciplines. In a 2016 story for Slate, Daniel Engber writes that “much of cancer research in the lab—maybe even most of it—simply can’t be trusted. The data are corrupt. The findings are unstable. The science doesn’t work,” because of p-value and associated problems. In a 2016 article for the Proceedings of the National Academy of Sciences, Eklund, Nichols, and Knutsson found that inferences drawn from fMRI brain imaging are frequently invalid, sharing concerns voiced in a 2016 eNeuro article by Katherine S. Button about replication problems across the biomedical sciences. A 2016 paper by Eric Turkheimer, an expert in the genetic heritability of behavioral traits, discussed the ways that even replicable weak associations between genes and behavior prevent researchers from drawing meaningful conclusions about the relationship between the two. In a 2014 article for Science, Erik Stokstad expressed concern that the ecology literature was more and more likely to list p-values, but that the actual explained effects were becoming weaker and weaker, and that p-values were not adequately contextualized through reference to other statistics.

Clearly, we can’t reassure ourselves that p-value problems are found only in the “soft” sciences. There is a far broader problem with basic approaches to statistical inference that affects a large number of fields. The implications of this are complex; as I have said and will say again, research nihilism is not the answer. But neither is laughing it off as a problem inherent just to those “soft” sciences. More here.

I am not, it turns out, the agent of faculty death at Brooklyn College

When I got this job, one of the excitable, obsessive boys at Lawyers Guns and Money, Erik Loomis, announced that I was now a neoliberal administrator bent on destroying the professoriate. Now, this has much less to do with my actual job and more to do with the team over there’s ongoing, weird fixation on me – Loomis got tenure the other day and immediately rushed to his blog to talk about me, which is sadder and weirder than I can imagine. (Bear in mind that this is a man who once mocked the quality of my education despite the fact that I got my MA at the university that employs him.) But still, having been here going on eight months now, perhaps it’s time to take stock of his accusations.

I’ve been at my current job since the end of September. I can tell you that, despite some bumps and snags, things have gone pretty well and I’m happy with the position and the job I’m doing. I can also tell you, happily, that the predictions of Erik Loomis have not come true. Not that there was ever much chance of that.

When I went to interview with Brooklyn College, I had already been on the academic job market for almost two years. Though my CV was a perfect match for the job, I was hesitant. My position is administrative, and I am on record as believing (and still believe) that there’s too much administrative hiring in the academy. (Of course, I wanted a tenure track job more than anything, but I couldn’t get one in two years of trying.) Plus, assessment is a touchy subject, an endeavor that if undertaken clumsily and without faculty oversight can indeed erode faculty control. So when I went on my campus interview I reiterated a point to the hiring committee that I had made in my Skype interview: that I would accept the position only under the condition that the job would be mutually understood to be a faculty support position. I said this to the hiring committee; I said it to the Associate Provost who would go on to be my boss. They assured me this was what they were looking for, and they have held up their end of the bargain.

Faculty control just about every step of the assessment process here. Faculty write the mission statements for their departments. Faculty devise the student learning outcomes. Faculty decide on the assessment tools they want to use. Faculty decide how best to analyze the data. Faculty ultimately decide what the data means and what changes to make because of it. Assessment involves shared governance between faculty and administrators, but the particulars of how given departments are assessed are firmly in the control of faculty, and should be.

What do I do? I take on a lot of the grunt work that faculty don’t want to. When faculty are crafting student learning outcomes, I give them advice about what I think will make learning easily measurable, when asked. When faculty choose particular tools like tests, portfolios, or surveys, I talk to them about what some of the options are and what I think is most pragmatically feasible, when asked. I research what other institutions do and lay out what are commonly thought of as best practices for a given field. I do a lot of the busywork for actual data collection and analysis – I wrangle spreadsheets, I organize shared folders on servers, I assign numbers to anonymized student work, I let department chairs know when documents have been turned in. Sometimes, I’m the one that does the stats work, again only if asked. I don’t insist on doing any of this, and there are departments here that choose to handle all of that themselves. They just send me reports when they have them, and that’s fine. Other departments have asked me to do a lot of the heavy lifting, out of concern for the workload of faculty who are already stretched far too thin by teaching and research requirements. I’m happy to help when they do. The point is that in every way that matters, it is faculty who ultimately control the assessment process. And while I work underneath the Provost’s office, I report to an Academic Assessment Council where faculty have a substantial ability to dictate policy.

Maybe the most important point is that, regardless of your take on my character or my commitment to faculty independence, I’m just not important enough here to do the kind of destructive work Loomis claimed. I don’t have that kind of power. Brooklyn College, I’m happy to say, has an unusually powerful faculty. Curricular decisions, to a rare degree, are made by the professors. It helps that it’s a public school in a state where public sector workers are powerful. We also have an activist faculty union – a union that I’m a member of. Despite Loomis’s contention that I would work against the union, I have in fact been active in the PSC from the start; I’ve attended every meeting of my own chapter since I arrived, and I’m starting to get involved in the Brooklyn College chapter too. I hope to help organize during our upcoming contract fight. In any event, trust me: no one will soon be made to bow down before my great power here, and I would never attempt such a thing, given my academic beliefs, my investment in labor solidarity, and my conscience.

Put it this way: I’m sure there are many faculty members here who don’t know I exist. All-conquering administrators should be far less anonymous.

There were other complaints. Loomis says I’ll be a provost someday. But, of course, I wouldn’t ever take such a job. I know because I’m me. I have no interest in that. I could stay in this position permanently and, thanks to the benefits and a collectively bargained contract, feel pretty good about that. Or I may in the future look for other positions within CUNY, as appropriate. Who can say? But I will never be looking at executive jobs, because I am not interested in doing so. Commenters insisted that I’d be overpaid, but I am in fact on the exact same salary steps as CUNY faculty of equivalent experience. That was a selling point for me: it helps to know that I share compensation levels with professors. I just can’t get tenure. So if you’re saying I’m overpaid, you’re saying that CUNY professors are overpaid – which, well, that’s a remarkable idea given our endless contract battles and the precarious state of our funding.

Ultimately my job is like a lot of jobs: it’s not perfect, it can be frustrating, but I can see real ways that I’m helping the larger community. Faculty that I’ve worked with have been universally cordial, and I’ve enjoyed helping them develop assessment plans for their departments. Besides: this work is going to be done. The question is whether it’s done well and whether it’s done in a way that is minimally invasive to faculty. The fact is that assessment is inevitable, particularly in large public systems. The accreditation agencies mandate (and have always mandated) regular assessment. And for reasons I won’t get into, in recent years Brooklyn College has been under immense pressure to improve our assessment efforts for accreditors. You can lament the impact of accreditation agencies but they are a fact of life. Another fact of life is that a lot of faculty simply don’t want to do the kind of work that I do. I can’t blame them! They’re already brutally overworked. That’s why my job exists, so that I can use my expertise and experience in assessment of student learning to take on some of the inevitable burden that is coming down from the college, from CUNY, from the state, from our accreditation agency, and from the feds. Is that worth the cost of my salary? I can’t possibly be the one to judge. Paying my rent depends on my believing that it is worth it. Members of this community will just have to judge for themselves.

It happens that I also think there is a profound social justice component to assessment writ large – that an American higher education system that leaves millions of students with loan debt but no degree needs to take a hard look at its learning systems and come together, as a community, to figure out how to fix things, not in a way antagonistic to faculty but with faculty as the inevitable and essential leaders of such a project. But that’s a bigger issue and one for another day.

None of this, of course, will matter to Loomis. I could have gotten a job that perfectly matched with his politics – say, Assistant Professor of Centrist Democrat Studies at Rahm Emanuel University – and he would have been mad. But it matters to me. I got a good job at a great college in a wonderful city, and I’m slowly becoming part of a community of teachers and researchers that I respect and admire. I’m thrilled to have it. It’s not perfect but I’m making the most of it. And I’m so grateful to be here.

quick and dirty: economic inequality and test scores

Is there a relationship between a country’s performance on international education benchmarks like the PISA tests and that country’s economic inequality as measured by the Gini coefficient?

[Three scatter plots: country-level PISA Math, Reading, and Science scores plotted against the Gini coefficient.]

Sure looks like it! Those are some healthy correlations there. (Lower Gini = lower inequality.) The math plot in particular is striking. I’m sure there’s noise here and if I get scolded by somebody I’ll update the post.
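For the curious, the analysis behind plots like these really is quick and dirty. A hedged sketch of how it might be done – the file names and column names below are hypothetical stand-ins for downloaded PISA country means and Gini figures, merged by country:

```python
import pandas as pd

# Hypothetical inputs: country-level PISA means and Gini coefficients,
# downloaded separately and merged on a shared country column.
pisa = pd.read_csv("pisa_means.csv")   # columns: country, math, reading, science
gini = pd.read_csv("gini.csv")         # columns: country, gini

df = pisa.merge(gini, on="country")
for subject in ["math", "reading", "science"]:
    r = df["gini"].corr(df[subject])   # Pearson correlation across countries
    print(f"{subject}: r = {r:.2f}")
```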

Of course, this invites the classic question about the arrow of causation in education: are these societies more equal because they have better education? Or are their education results better because their economies are more equal? You can probably guess what I think.

Plots by me. Data: PISA, Gini.

Update: But take care because, as Scott Alexander points out to me, measures of inequality are tricky to use in correlational work like this, so don’t take it too seriously. I’m gonna expand the scope of the analysis and see what we can see, but like I said – quick and dirty, so don’t hold me to it!

from the archives: physical restraint as the least bad option

This piece originally appeared on my blog in July of 2014.

I have seen now some dozen people share this ProPublica map, about the use of restraining holds on school children, on various social networks and websites. It makes me sad, because this issue is sad. But the kind of reactions that are being provoked also make me sad, because they demonstrate the ways in which the world of sharing and likes and shallow understanding destroys nuance and creates a bogus conception of a black-and-white world.

It happens that I have some experience in this regard. For about a year and a half, I worked in a public school that had a special, segregated section for kids with severe emotional disturbance. Some of the students were significantly mainstreamed into the general ed population, but many couldn’t be, as they posed too much of a risk to other students and to themselves.

Those risks were neither hypothetical nor minor. The more severe of these cases were children who typically could not last a single school day without inflicting harm on themselves or on others. I have personally witnessed a 10-year-old lift his 40-pound desk from the floor and hurl it towards the head of another student. I have witnessed a student jump from her seat to claw and bite at another, with almost no provocation. I have seen kids go from seeming calm to punching other kids repeatedly in the back of the head without warning. The self-harm was even worse. I had to intervene when a child, frustrated with his multiplication homework, struck himself repeatedly in the face with a heavy fake gold medallion, to the point where he drew his own blood. I saw a student try to cut his own lip with safety scissors. I saw a girl tear padding from a padded wall and eat it; when she eventually had to be removed from the school via ambulance, she urinated on herself, rubbed her face with her urine, and attempted to do the same to paramedics.

Mental illness is powerful and terrible and that’s the world we live in.

Part of the response to this kind of behavior was restraint. I didn’t enjoy doing it; none of the staff did. Hated it, in fact. We were all trained in how to provide restraint as safely as possible, but that didn’t mean we were under any illusions: we knew that these techniques were uncomfortable and potentially harmful to students. Injuries to staff members were common. A fellow staff member badly broke her tailbone in the process of restraining a child, an injury that left her unable to work for a calendar year. There was something gross about the euphemism “therapeutic hold,” and we talked about the trainings with black humor. I left, after that year and a half or so, because I could not take the emotional toll. There were women there who had been working with such children for over 30 years. I couldn’t make it two. The notion that these women were somehow callous or unconcerned about these children is ludicrous and defamatory. They had dedicated their lives to helping these kids, for terribly low pay. They had to watch these kids grow up and get shipped to the middle school level, where there was no similar program. And we were the last stop for these kids before the state mental health system. That was the stark choice: if it didn’t work here, the only alternatives were special private schools – which, given that the students were overwhelmingly from poverty, were not an option at all – or commitment to the state mental health system, which most likely meant institutionalization and constant medication. Those were the stakes.

I have struggled to write about that period of my life for years, as I am still unable to adequately process the emotions I felt. I do know and will loudly say that the women (and besides me they were almost all women) who worked as teachers and paraprofessionals were an inspiration in the true sense, working quietly and without celebration to bring a little education and relief to children whom life had treated terribly. They shame me with their dedication. To see them and people like them repeatedly represented as serial abusers who don’t care if they harm children is infuriating, baseless, and wrong.

The question I have for someone like Heather Vogell, who wrote this sensationalistic and damaging piece for ProPublica, and for all of the people sharing that map with breathless outrage, is this: what alternative would you propose? I am not kidding when I tell you that dozens of times, there was no choice but to physically restrain a child. The only alternative was to allow that child to badly hurt another or him- or herself. If you think that a 7-year-old is incapable of badly harming another person, I assure you, you’re wrong. I have seen many people arguing that there is never a situation where such restraint is necessary, and all I can say is that you’re ignorant, and that your ignorance is dangerous. To say that all children can be verbally calmed in all situations is to betray a stunning lack of understanding of the reality of childhood mental illness. Vogell mentions in passing that there are situations in which restraint is necessary, then spends thousands of words ignoring that fact. At every point where she is faced with a journalistic or stylistic choice, she opts for the most sensationalistic and unsympathetic presentation possible, minimizing the other side and failing to even pretend to have genuinely wrestled with the topic before coming to a conclusion. It’s not just that she insults thousands of nameless, faceless public servants who have no capacity to fight back or even be seen as potentially sympathetic human beings. It’s also just lousy journalism, written for a clickbait culture, utterly credulous toward one set of opinions and utterly dismissive of another. It’s an embarrassment.

Meanwhile, childhood mental illness continues to wreak its terrible havoc, and educators will be forced to make terrible choices. I hated restraining those children, but I saw with my own two eyes the incredible violence that mental illness made possible, and I do not for one minute regret properly restraining children when that was the only way to save that child or another from bodily harm. I invite Vogell, or any of the people loudly expressing their outrage, to take jobs in special education or child mental health services. You can actually get involved, you know. See it with your own eyes. Help actual human lives get a little bit better. See what choice you’re able to make when it is clear that you must intervene or allow injury to another person. But I’m afraid that takes more time and effort than launching a tweet.

Years from now, when people like Vogell are no longer wasting a second of their time thinking about physical restraint of children who are a danger to themselves and others, the women in my old program will be working, quietly and selflessly and for awful compensation, trying to help the children they are now accused of abusing.