the higher education assessment sour spot

As I’ve mentioned before, I’m a qualified supporter of the Collegiate Learning Assessment+, the Council for Aid to Education’s standardized test of college learning that is the subject of my dissertation. For a while now, I’ve been poking away at a post about why; it sometimes disturbs my fellow education reform skeptics to hear that I am supportive of the use of a standardized test (as long as that test is used with critical, careful understanding). I hope to finish that piece before the new school year starts, so you can get a better idea of my thinking. Suffice it to say for now that, like Richard Shavelson, one of the developers of the CLA, I don’t think that any one test can tell us everything we need to know about college learning, but that some tests can tell us some things of interest, and that there are reasons to believe the CLA+ is superior to some alternatives. In the meantime, I do want to mention one pitfall not just for the CLA but for any standardized test: the low student stakes/high institutional stakes trap.

One of the foremost criteria for any test instrument is that test’s validity. In simple terms, validity refers to whether a test measures that which it purports to measure. (This straightforward notion of validity is now often referred to as “face validity.”) There is a vast literature on the various kinds of validity and how to assess them, and that kind of meta-research is some of the most fascinating and complex I’ve read. But even aside from the grander questions, validity is important for everyone whose life is impacted by a test. We need to feel confident that a test measures what it is understood to measure.

One question in test validity is the question of student motivation. When we give a test, we want students to work to the best of their ability; otherwise, we introduce construct-irrelevant variance, which undermines our ability to interpret the test’s results. In many or most educational contexts, student motivation isn’t a problem: because tests help determine grades, and grades have direct stakes for students, we can generally assume that students are trying their hardest. Similarly, voluntary tests of academic or intellectual aptitude like the SAT, GRE, or LSAT generally are only taken by those who are motivated to score highly. Someone with no interest in attending graduate school would be very unlikely to take the GRE, while someone who is intent on attending graduate school would try their hardest. (Whether students are trying their hardest on the vast number of standardized tests now being implemented in our K-12 schools is a question I leave to you to ponder.)

A test like the CLA+, currently, is not like that. The CAE has talked at length about their hopes that the CLA+ will become a recognized standard for employers and graduate schools (here’s their information for employers), but at present, it’s unlikely that there is much advantage for students putting their CLA+ scores on their resumes, or much chance that a particular employer would know how to interpret those scores. A certain critical mass of participating students and institutions would have to be reached before the potential benefit to students on the job market is realized. This difficulty is compounded by the fact that there are competitors to the CLA+, other tests of college student learning that could potentially be adopted by colleges themselves. (The Spellings Commission report, though it mentions the CLA by name, only calls for “interoperable” test measures, not one universal test of college learning.) Currently, colleges often have to provide some sort of incentive for their students to take the test, such as discounts for graduation or similar. As it stands, I think most anyone would conclude that the CLA+ is a low-stakes test for students.

And yet, if the Obama administration gets its way, the test will have high stakes for colleges and universities. As has been much-discussed, the Obama White House has called for the creation of a set of national college rankings, based on which schools do the best job teaching undergraduates and which provide the most “value.” Assessments like the CLA+ are to be a key part of the creation of the rankings. Those rankings, in turn, will be tied to how much federal aid and subsidies colleges are able to access. While we can debate the wisdom or efficacy of this plan, or the values and conceptions of education that are implicit in these rankings, most anyone would say that this makes the test high-stakes for institutions.

That low-stakes/high-stakes divide represents a challenge to the fair use of the test, particularly given that student perception of the stakes involved has a direct impact on student performance. In 2010, Braden Hosch, an administrator at Central Connecticut State University (my alma mater!), published a study on the administration of the CLA at CCSU. He found that student motivation played a strong role in determining test scores, and that strong student motivation was not universal. Last year, a major study by researchers from the Educational Testing Service demonstrated that motivation made a large impact on performance on ETS’s Proficiency Profile, one of those competitors to the CLA+. The researchers told one group of students that their test results would be linked to them in the future, that their professors and college would have access to this data and use it to assess them. Those students performed consistently and significantly better than those who were not told that the test’s results would follow them. Clearly, then, a student’s perception of a test’s importance plays a strong role in their test scores.

We can therefore easily imagine a “sour spot” for this type of assessment. Students could, sensibly, continue to see the test as an unimportant task for their own lives, while institutions could face serious consequences if their students don’t perform to the peak of their ability. Since the CLA+ is a value-added metric, this problem would be particularly acute if seniors take the test less seriously than freshmen do. Given the tendency of freshmen to be so malleable and gung-ho in comparison to upperclassmen — I’ve often joked that first-semester freshmen would consent to washing my car without blinking an eye, if I put it on a syllabus — that’s a legitimate concern. This difference in the intrinsic stakes for these tests between students and institutions is one of my foremost fears. It could cause public policy to go wrong in a very serious way.
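To make the worry concrete, here is a purely illustrative sketch of how differential effort can deflate a value-added score. The numbers and the simple senior-minus-freshman difference of means are my assumptions for illustration; the CAE’s actual value-added model is more sophisticated (it adjusts for entering academic ability), but the direction of the distortion is the same:

```python
# Toy illustration (not the CAE's actual model): how a motivation gap
# between freshmen and seniors can make real learning gains disappear
# from a value-added score.

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical scale scores if every student gave full effort.
freshman_scores = [1050, 1100, 1000, 1150]
senior_true     = [1200, 1250, 1150, 1300]  # genuine gain of ~150 points

full_effort_gain = mean(senior_true) - mean(freshman_scores)

# Now suppose seniors, seeing no personal stakes, give partial effort,
# shaving (say) 100 points off each score, while gung-ho freshmen try hard.
senior_observed = [s - 100 for s in senior_true]

observed_gain = mean(senior_observed) - mean(freshman_scores)

print(full_effort_gain)  # 150.0
print(observed_gain)     # 50.0 -- two-thirds of the measured gain vanishes
```

The institution’s “value added” falls by two-thirds even though nothing about its teaching changed; only senior motivation did.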

The easiest way to ameliorate this problem is for administrators, politicians, and policy makers to maintain an appropriate skepticism towards this kind of test in general, and to see such assessments as only one part of a broad perspective on what a college does and should do. That type of probity, I’m sorry to say, can be hard to find in a politicized educational environment.


  1. I’d go a step further: it would be very weird for a student to try hard at a long, dull, standardized test that does not impact his/her individual GPA. Very weird. Like, send-’em-to-the-campus-psychologist level weird.

    And the assessment lobby’s whole premise is that EVERY student is this weird?

    1. It might, but that would entail choosing one winner of a test in the way the Spellings Commission said the federal government wouldn’t do, which means all types of procedural and infrastructural problems, in my opinion.
