The CLA+ and the Two Cultures: Writing Assessment and Educational Testing

Whenever I start a sentence “My dissertation…,” at least when speaking to a non-academic, I smile a little inside. It makes me think I’m in an episode of 30 Rock. I guess it’s just that I’ve absorbed the ambient cultural critique that grad students are inherently ridiculous. Like a lot of grad students I have a kind of built-in presumption of other people’s antipathy towards what I do. It’s self-defense. Well: I’ll still risk saying that my dissertation is finished, it’s been in the hands of my committee for the last 10 days or so, and I think it’s good. I’m happy with it and I think it talks about important stuff and that it was worth doing.

That’s not universally true. A lot of people find their dissertations are compromised documents, whether because of a lack of resources (such as inadequate sample size) or institutional constraints or disagreements with their advisors or just because it doesn’t turn out the way they thought it would. I know someone who had more than half of her sample drop out over the course of her research, for example. So I feel particularly fortunate that I’ve ended up with a couple hundred pages that I think represent high-quality work about issues that are of pressing importance in our society. And while researching and writing it was often draining and frustrating, and there were several times when I thought I would simply not be able to complete the necessary research, it was also one of the most fulfilling things I’ve ever done. It ate all my time and kept me from a hundred different other projects, but then it’s supposed to; it’s supposed to be an obsession, after all. That enables a kind of deep dive that is very rare for most adults. For example, at one point I literally read six volumes of Eisenhower-era educational reports. It was boring, sure, but it was also great to dive in that way. If you aren’t wired that way (and it’s healthier if you aren’t), that might not make sense to you, but for someone like me it’s a joy.

When I started out, I wasn’t sure that this was a book-type project. The hyper-localism of my original research seemed to limit the audience. But I’ve been encouraged by faculty, who feel strongly that the text has great relevance to other institutions, and the way that the conflict between the Mitch Daniels administration and the faculty over the test played out makes it a very useful lens through which to consider neoliberal higher education reform. As a book, I’d love to make the text more political, to pull in more questions about the future of higher ed in general, and also to make a more direct, critical response to Academically Adrift, which used the CLA as its empirical mechanism and which is rife with methodological and theoretical flaws. Ideally I’d like to make the book an academic-popular hybrid. Once I’m done with revisions and have deposited the dissertation with my institution, I’ll start talking to faculty about potential publishers and writing up a proposal.

Below you’ll find a little information about my project. I don’t blame you if you’re not interested! I defend today at noon.

*****

My dissertation is titled The CLA+ and the Two Cultures: Writing Assessment and Educational Testing. It concerns a standardized test of college learning, the Collegiate Learning Assessment+, and its proposed implementation here at Purdue University. I locate the current higher education assessment movement in a historical context of a perpetual crisis narrative concerning our colleges and universities, demonstrating that the notion of a crisis is adapted to fit contemporary national concerns. So in the Truman era, it’s educating millions of soldiers returning home from war; in the Eisenhower admin, it’s the Red Scare; then the space race; economic competition from Japan and West Germany; in the Reagan era, perceived moral decline; and so on. Crisis becomes the justification for enacting controversial changes. The higher education assessment movement has culminated in the Obama administration’s proposal to generate college rankings based on “value” and to tie those rankings to availability of federal aid, a threat that even the most deep-pocketed colleges can’t afford to ignore.

The Collegiate Learning Assessment+ is one of several tests that are currently competing to become the primary instrument in the creation of such rankings. The CLA+ is a product of the Council for Aid to Education, a New York-based nonprofit that has traditionally researched philanthropic giving to higher education. I will say up front that, though I am skeptical of this type of instrument for reasons I will discuss, I believe that the CAE are the good guys in the educational testing industry. People like Richard Shavelson, Steve Klein, and Roger Benjamin are genuinely committed to improving college education, and while I think they are sometimes misguided in that pursuit, I think their dedication is genuine. That is not always true of people within the educational testing industry, which is big business.

The CLA+ is made up of two major parts: the Performance Task, which accounts for the larger portion of the test and of student scores, and the Selected Response section. The Performance Task is a written response by students that places them in a scenario, provides them with several different types of information, and asks them to reach and explain a decision using several different types of evidence. The task is rated by CAE personnel who use a rubric covering Analysis and Problem Solving, Writing Mechanics, and Writing Effectiveness. The Selected Response, meanwhile, is a fairly conventional set of multiple choice questions concerning critical reading and argument critique. A model Performance Task prompt is below; you can peruse an entire practice test here (PDF).

[Sample Performance Task prompt]

Like all tests, the CLA+ has its strengths and weaknesses. I find the Performance Task to be a novel and intelligent means of testing how different student abilities work together in concert. The CLA+ uses a criterion sampling approach to do so, as the CAE argues that student abilities cannot be meaningfully disaggregated into constituent elements. (How that comports with the various divisions in their rubrics is unclear.) I am also pleased that the test evaluates student writing itself. But there are major challenges to this type of test. Typical concerns of scaling, ceiling effects, attrition, natural maturation, and so on apply. No challenge is deeper than the issue of student motivation.

Consider a test like the SAT. Whatever criticisms of the test we might have, we can say with great confidence that most students who take it apply their best effort. They do because they have a powerful personal stake in the outcome; they want to get into the best possible college. The problem with a great deal of educational testing, and particularly collegiate testing, is that students have no such stake in doing their best work. You can (and have to) provide incentives for students to show up and take the test, typically things like low-value gift cards or discounts on graduation regalia, but this does not ensure that students will work hard. Empirical research demonstrates the threat that this poses to the validity of these tests. A major 2012 study found that students who were told their test results were high-stakes and would follow them in their later lives significantly and consistently outperformed students who were not. Differences in motivation therefore present a serious confound for interpreting these results. Other research (PDF) indicates that time on task, a crude but effective proxy for motivation, has a significant impact on student scores, with many students using far less than the maximum time allotted. That is particularly concerning given that the CLA+ is intended to demonstrate “value added” through a test-retest mechanism, and we can conjecture that graduating seniors will be even less motivated to apply their best effort than incoming freshmen. Even Benjamin, the president of the CAE, has admitted that this motivation issue is a major challenge to the validity of these instruments.
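To make that confound concrete, here is a minimal, purely illustrative simulation. This is my own sketch, not the CAE’s methodology and not data from my research; the score scale, the size of the “true” gain, and the effort penalties are all invented for the example. It simply shows how, in a test-retest design, lower senior effort can swallow a genuine learning gain.

```python
# Hypothetical illustration of the motivation confound in a test-retest
# "value added" design. All numbers below are invented for the example.
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Invented score scale, loosely in the spirit of a 400-1600 style range.
freshman_ability = rng.normal(1050, 150, n)   # "true" skill at entry
senior_ability = freshman_ability + 100       # assume a real 100-point gain

# Assume low effort costs points: freshmen (with a bit more buy-in) lose a
# little, unmotivated seniors lose more.
freshman_effort_penalty = rng.uniform(0, 50, n)
senior_effort_penalty = rng.uniform(50, 200, n)

freshman_score = freshman_ability - freshman_effort_penalty + rng.normal(0, 30, n)
senior_score = senior_ability - senior_effort_penalty + rng.normal(0, 30, n)

true_gain = np.mean(senior_ability - freshman_ability)
measured_gain = np.mean(senior_score - freshman_score)

print(f"true mean gain:     {true_gain:.0f} points")
print(f"measured mean gain: {measured_gain:.0f} points")
# With these made-up numbers, a genuine 100-point gain shows up as roughly
# zero "value added" -- the motivation confound in miniature.
```

Under these (invented) assumptions the institution looks like it added nothing, even though every simulated student learned; the same logic, run in reverse, means that differences in motivation across campuses could masquerade as differences in teaching quality.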

These types of challenges are part of what has been at issue in a major conflict between the faculty of Purdue University, my institution, and Mitch Daniels, former Republican governor of Indiana, presidential candidate, and current university president. Daniels has been controversial from the start of his presidency, as is perhaps to be expected of a lifelong politician with essentially no academic expertise or experience. Daniels’s selection by the Board of Trustees — many of whom he nominated himself when serving as governor — was seen by many as part of an ongoing corporate takeover of the American university. Controversies like the Howard Zinn debacle haven’t helped. Still, there have been aspects of the Daniels presidency worth praising. For example, his efforts to freeze tuition are badly needed in an era of exploding student loan debt, and his work to consolidate various administrative functions, and in doing so reduce payroll, is a necessary element of stopping the rampant rise of administrative bloat in higher education. And while I disagree with Daniels about many aspects of education, I do believe that his efforts to enact assessment at the university stem from a genuine belief that doing so is in the best interest of our students.

How to assess, however, is the question, and that is the issue that has caused a long-simmering conflict between the faculty and Daniels to come to a boil. That conflict is laid out in this story from our local newspaper. At issue is not just the specifics of the assessment effort but the notion of faculty control of curriculum and, in the larger sense, of the university. The administrative wrangling over the implementation of the CLA+ or a test like it is the focus of my original research. For the past year, I have investigated what’s been going on at the university, what the fight is about, what the sides are, what the stakes are, and what it all says about the future of the American university. I’ve conducted interviews and collected texts and assembled information, working more as a journalist than as a typical academic researcher. I’ve gone through it all because I feel strongly that research on higher education is far too biased towards the bird’s-eye view, with many books and articles concerning policy decisions made at 10,000 feet but far too little examining how these changes actually occur in real, local institutions. I wanted to know: what’s the distance between the rhetoric at the top and the reality in the local world?

It’s been a remarkable investigation, and a remarkably frustrating one. That’s because I’ve been met with a great deal of avoidance and obstruction. Though I talked with dozens of people in the university community casually, only a small handful consented to a formal interview, with a couple of others agreeing to speak with me only on the condition of anonymity. (I admit that it’s something of a thrill to write a dissertation that includes phrases like “a senior administrator said under the condition of anonymity…”.) While I was able to assemble all of the information I needed, and in fact gained access to a great deal of material that was initially confidential, I was repeatedly frustrated by my inability to get people to talk to me. Those on the administrative side and the faculty side alike seemed to find little reward and potential risk in speaking to a researcher like me. For example, despite repeated requests, the only direct communication I ever received from Daniels came in the form of a brief email. This is one of the more important findings of my dissertation: that while these assessment efforts are represented as matters of accountability, they lack transparency, making those claims somewhat toothless. This isn’t just true of institutions like Purdue, but of test developers like the CAE, which jealously guard the secrets of their tests and prevent us from taking a real, close, independent look at their mechanisms. How can we trust that their tests do what they say they do when they constantly cry “test security” or “industry secrets” when we ask to look under the hood?

I am happy with the history that I’ve assembled, but in my research — much of which I have recently deleted, out of a somewhat paranoid fear that I will be compelled to reveal it — there is a much deeper, more inflammatory story. For reasons of ethics and institutional policy, I am not able to reveal some of the things that I learned in the course of my research from interested parties who were willing to share information with me but never to go on the record. For example, faculty members and administrators alike shared private memorandums and emails with me, some of which contained frank language that would be quite embarrassing to those involved if made public. I have no interest in causing such embarrassment. I do wish that more people involved had been willing to go on the record, as the distance between the carefully managed outward formality common to institutions of higher learning and the private communications behind it says a lot about institutional culture.

The ultimate decision about the assessment effort at Purdue has been delayed, but there is no doubt that assessment is coming, both to Purdue and to many other institutions of higher education. How we assess, and how assessment drives institutional and pedagogical change, is the key question. I’m often asked if I’m pro- or anti-CLA+; many people, meanwhile, assume that I am categorically opposed to such instruments, given my thoughts on ed reform writ large. That’s not the case. The real answer is: it depends. It depends on how the test is interpreted. If it is viewed with appropriate skepticism, if it guides decisions that are institutional rather than individual, and if it is taken as one piece of a much more varied set of ways to assess how well we are doing, such as the Purdue–Gallup Index, then I’m fine with the test. Whatever else is true, there is little question that it is perfectly legitimate to ask how well we are doing and how much our students are learning. The problem is that too often these conversations devolve into misleading, reductive, and politically motivated arguments like that of Academically Adrift. Compared to some other tests, I find the CLA+ a useful instrument and its developers principled. The implementation is all.

Writing studies is particularly vulnerable in this conversation. For a long time, writing studies featured a robust wing of empirical research, alongside its theoretical, political, pedagogical, and aesthetic work. My paternal grandfather was a member of the field, or of the proto-field, and he published a great deal of empirical research alongside more traditional English scholarship. But that wing of writing studies has shrunk considerably, largely due to the “social turn” of the 1990s, in which scholars like Carl Herndl, Elizabeth Flynn, and James Berlin argued against empiricism as inherently masculinist, hegemonic, or the like. Empirical research examining the contents of our biggest journals and the programs of our largest conferences has demonstrated that the field has largely abandoned empiricism, and along with it the specific focus on student writing and writing pedagogy that is our traditional purview. We instead publish a tremendous amount of work on cultural studies, pop culture, digital theories, and multimodality, leaving traditional writing pedagogy and empirical investigations of the same to marginal status. I have written a great deal about this recently.

I am part of a movement within the field to return to more empirical work (whether qualitative or quantitative) and to more writing pedagogy in the traditional sense of prose, of putting words into order in a way that satisfies one’s communicative and argumentative needs. The CLA+ and tests like it are part of the reason why. What my research has demonstrated to me is that we are totally marginal in these processes. We do not contribute meaningfully to these debates, largely because so many of us refuse to engage in the discourse of empiricism. While I lament the ways in which empiricism is defined reductively or simplistically, and I wish that there were more room for traditional humanistic ways of making meaning in policy debates, the fact is that the discourse of the social sciences is privileged in that environment. We have used that discourse in the past, and we can again. Doing so does not mean abandoning our values or our political commitments. It means instead utilizing our best rhetoric, matching our discourse to our context in order to defend our institutional autonomy and integrity. I firmly, firmly believe that we can engage in policy conversations, using the language and assumptions of the social sciences and empiricism, in a way that ultimately strengthens our discipline and protects other types of work. But we must be willing to pull our heads out of the sand and see the reality around us.

The two cultures named in my title are the cultures of writing assessment and educational testing, traditionally divided by these epistemological, political, and theoretical worldviews. I do not believe that we must surrender to the educational testing industry. Instead, I believe that we can, and must, adapt our research to make it a more effective check on that industry, to meet empiricism with better empiricism, and to refuse to cede that ground to for-profit entities. If we do that, we can operate more effectively both within our institutions and in the national policy conversation. The humanities can, in fact, be defended. The doors are not completely closed to us. But if we self-marginalize, we will certainly be silenced.

The higher education assessment movement might peter out. Such policy initiatives have a way of being enormously important and then suddenly forgotten. But the broader questions of who owns the university, how we can save it from itself, and what the future holds will not go away. This dissertation has been an attempt to write my own little chapter in that long book, and I’m so happy to have written it, and, if you’ll forgive me, to have written it well.

*****

A chapter-by-chapter breakdown and the Table of Contents are printed below.

Chapter One provides an overview of my study and establishes exigency for this project by placing it into a socioeconomic and political context. By situating my project within Purdue University, writing studies, and higher education, I argue that college educators must study tests like the CLA+ in order to respond to the unique challenges and opportunities such tests represent.

Chapter Two provides an in-depth history of the higher education assessment movement. I place the recent push for standardized assessment of higher education in a historical framework, explaining the recent and historical policy initiatives that have led us to this current moment. I describe how a crisis narrative has taken root in the public conception of higher education, and demonstrate that from era to era, the crisis narrative is perpetuated to meet particular political needs. I demonstrate how recent changes to the American economy contribute to both this narrative and the perceived need for standardized assessment of college learning.

Chapter Three considers the CLA+ in great depth, discussing its history, its assessment mechanisms, its competitors and analogs, and the extant empirical research conducted using it. I consider the test’s context among other tests of secondary and post-secondary education, consider the strengths and weaknesses of its approaches to assessment, and discuss the policies and procedures that its developer enacts around its implementation. I discuss possible challenges to the validity and reliability of the instrument and the ways in which the test attempts to measure “value added.”

Chapter Four uses the CLA+ and higher education assessment movement to consider the traditional cultural and epistemological divide between the field of writing studies and the field of educational testing. I provide a brief history of practitioner writing assessment, and describe the differences in how writing instructors and researchers have typically cast concepts such as validity and reliability when compared to the educational testing community. I investigate the traditional sources of this cultural divide, and detail some of the consequences, particularly in terms of the (in)ability of writing studies to influence policy arguments. I ultimately argue that the true conflict is within writing studies, regarding its long turmoil about the appropriate place of epistemology in the discipline.

Chapter Five develops a local history of the assessment effort at Purdue University, detailing the rise of the Mitch Daniels administration and its extensive controversies. I examine the selection of Daniels as Purdue president, his many reforms on campus, and the development of what would become the CLA+ assessment effort. I interview multiple stakeholders and detail various perspectives from faculty, administrators, and other Purdue community members. I present information about the piloting efforts undertaken by the Office of Institutional Assessment as part of the assessment effort. I discuss the conflict that arose between the faculty senate and the Daniels administration over the test, and what that conflict says about higher education assessment writ large.

Chapter Six concludes the dissertation and presents my perspective on the various issues contained within it. I discuss the dangers that the current state of higher education presents to writing studies, the humanities, and the American university system itself. I claim that the lack of transparency in the development and implementation of standardized assessments undermines claims that these are accountability systems and reduces public information about high-stakes, high-expenditure systems within education. I argue that scholars in writing studies must become more conversant in the techniques of empiricism, social science, statistics, and educational testing in order to defend our traditional values and institutional autonomy in a hostile political and policy environment.

TABLE OF CONTENTS

ABSTRACT

CHAPTER 1: INTRODUCTION
A Growing Movement for Change
The Assessment Mandate
The Role of Writing
Understanding the Present, Facing the Future
Statement of the Problem
Data & Methods
IRB
Textual/Archival
Interviews
Chapter Summaries

CHAPTER 2: THE HIGHER EDUCATION ASSESSMENT MOVEMENT

Truman, Eisenhower, Kennedy: Three Reports
A Nation at Risk
Response From Accreditation Agencies
The Spellings Commission
The Obama Administration
Conclusions 

CHAPTER 3: HISTORY AND THEORY OF THE COLLEGIATE LEARNING ASSESSMENT

Early Precursors
The Old Standards: The GRE and Similar Entrance Exams
The Council for Aid to Education
The Collegiate Learning Assessment
The Performance Task
The Analytic Writing Section
From CLA to CLA+
Validity
Reliability
Criterion Sampling and Psychometric Assessment
The CLA and the SAT: Is Another Test Necessary?
The Slippery Measurement of Value Added
Future Directions

CHAPTER 4: THE TWO CULTURES

A Brief History of Practitioner Writing Assessment
Sources of Friction
The Higher Education Assessment Movement and the Two Cultures
The Contested Role of Quantification in Writing Studies
The Road Ahead: Reasons for Optimism?

CHAPTER 5: LOCAL CONTEXT, LOCAL CONTROVERSIES

Local Contexts
Previous Assessment: Accreditation
A Controversial Catalyst: the Administration of Mitch Daniels
Perceived Needs and the Foundations of Excellence Plan
Identified Issue: Administrative Redundancy
Identified Issue: A Campus Divided
An Early Reform: the Core Curriculum
The Initial Assessment Push
The Roots of Conflict
Piloting
Initial Results
Internal Skepticism
Faculty Resistance
Was the CLA+ Preordained?
Buying Time
The Road Ahead

CHAPTER 6: CONCLUSIONS
Some Form of Assessment is Likely Inevitable
Critical Thinking Measures are Inadequate
Accountability Cuts Both Ways
Writing Studies Must Adapt to Thrive

BIBLIOGRAPHY

APPENDICES

VITA