Back in high school, I was a pretty classic example of a kid who teachers said was bright but didn’t apply himself. There were complex reasons for that: some owing to my home life, some to my failure to understand the stakes, and some to laziness and arrogance. Though I wasn’t under the impression that I was a genius, I did think that in the higher placement classes there were people who got by on talent and people who were striver types, the ones who gritted out high grades more through work than through being naturally bright.
This is, of course, reductive thinking, and was self-flattery on my part. (In my defense, I was a teenager.) Obviously, there’s a range of smarts and a range when it comes to perseverance and work ethic, and these qualities interact with each other in all sorts of ways. And clearly those at the very top of the academic game likely have both smarts and work ethic in spades. (And luck. And privilege.) But my old, vague sense that some people were smarties and some were grinders seems to be pervasive. Our culture is full of those archetypes. Is it really the case that intelligence and work ethic are separate, and that they’re often found in quite different amounts in individuals?
Kind of, yeah.
At least, there’s evidence for that in a recent replication study performed by Clemens Lechner, Daniel Danner, and Beatrice Rammstedt of the Leibniz Institute for the Social Sciences, which I will talk about today for the first Study of the Week, and which I’ll use to take a quick look at a few core concepts.
Construct and Operationalization
Social sciences are hard, for a lot of reasons. One is the famously large number of variables that influence human behavior, which in turn makes it difficult to identify which variables (or interactions of variables) are responsible for a given outcome. Another is the concept of the construct.
In the physical sciences, we’re generally measuring things that are straightforward facets of the physical universe, things that are to one degree or another accessible and mutually defined by different people. We might have different standards of measure, we might have different tools to measure them, and we might need a great deal of experimental sophistication to obtain these measurements, but there is usually a fundamental simplicity to what we’re attempting to measure. Take length. You might measure it in inches or in centimeters. You might measure it with a yardstick or a laser system. You might have to use complex ideas like cosmic distance ladders. But fundamentally the concept of length, or temperature, or mass, or luminosity, is pretty easy to define in a way that most every scientist will agree with.
The social sciences are mostly not that way. Instead, we often have to look at concepts like intelligence, reading ability, tolerance, anxiety…. Each of these reflects a real-world phenomenon that most humans can agree exists, but what exactly they entail and how to measure them are matters of controversy. They just aren’t available to direct measurement in the ways common to the natural and physical sciences. So we need to define how we’re going to measure them in a way that will be regarded as valid by others – and that’s often not an uncomplicated task.
Take reading. Everybody knows what reading is, right? But testing reading ability turns out to be a complex task. If we want to test reading ability, how would we go about doing that? A simple way might be to have a test subject read a book out loud. We might then decide if the subject can be put into the CAN READ or CAN’T READ pile. But of course that’s quite lacking in granularity and leaves us with a lot of questions. If a reader mispronounces a word but understands its meaning, does that mean they can’t read that word? How many words can a reader fail to read correctly in a given text before we sort them into the CAN’T READ pile? Clearly, reading isn’t really a binary activity. Some people are better or worse readers and some people can read harder or easier texts. What we need is a scale and a test to assign readers to it. What form should that scale take? How many questions is best? Should the test involve reading passages or reading sentences? Fill in the blank or multiple choice? Is the ability to spot grammatical errors in a text an aspect of reading, or is that a different construct? Is vocabulary knowledge a part of the construct of reading ability or a separate construct?
You get the idea. It’s complicated stuff. We can’t just say “reading ability” and know that everyone is going to agree with what that is or how to measure it. Instead, we recognize the social processes inherent in defining such concepts by referring to them as a construct and to the way we are measuring that construct as an operationalization. (You are invited to roll your eyes at the jargon if you’d like.) So we might have the concept “reading ability” and operationalize it with a multiple choice test. Note that the operationalization isn’t merely an instrument or a metric but the whole sense of how we take the necessarily indistinct construct and make it something measurable.
Construct and operationalization, as clunky as the terms are and as convoluted as they seem, are essential concepts for understanding the social sciences. In particular, I find the difficulty of merely defining our variables of interest, and deciding how to measure them, a key reason for epistemic humility in our research.
So back to the question of intelligence vs. work ethic. The construct “intelligence” is notoriously contested, with hundreds of books written about its definition, its measurement, and the presumed values inherent to how we talk about it. For our purposes, let’s accept merely that this is a subject of a huge body of research, and that we have the concepts of IQ and g in the public consciousness already. We’ll set aside all of the empirical and political issues with IQ for now. But what about work ethic/perseverance/“grinding”? How would we operationalize such a construct? Here we’ll have to talk about psychology’s Five Factor Model.
The “Big Five” or Five Factor Model
The Five Factor Model is a vision of human personality, particularly favored by those in behavioral genetics, that says there are essentially only five major factors in human personality: Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism, sometimes rearranged into the acronym OCEAN. To one degree or another, proponents of the Five Factor Model argue that all of our myriad terms for personality traits are really just synonyms for these five things. That’s right, that’s all there is to it – those are the traits that make up the human personality, and we’re all found along a range on those scales. I’m exaggerating, of course, but in the case of some true believers not by much. Steven Pinker, for example, flogs the concept relentlessly in his most famous book, The Blank Slate. That’s not a coincidence; behavioral genetics, as a field, loves the Five Factor Model because it fits empirically with the maximalist case for genetic determinism. (Pinker’s MO is to say that both genetics and other factors matter about equally and then to speak as if only genetics matter.) In other words, the Five Factor Model helps people make a certain kind of argument about human nature, so it gets a lot of extra attention. I sometimes call this kind of thinking the validity of convenience.
The standard defense of the Five Factor Model is, hey, it replicates – that is, its experimental reliability tends to be high, in that different researchers using somewhat different methods to measure these traits will find similar results. But this is the tail wagging the dog; that something replicates doesn’t mean it’s a valid theoretical construct, only that it tracks with some persistent real-world quality. As Louis Menand put it in a New Yorker review of The Blank Slate that is quite entertaining if you find Pinker ponderous,
When Pinker and Harris say that parents do not affect their children’s personalities, therefore, they mean that parents cannot make a fretful child into a serene adult. It’s irrelevant to them that parents can make their children into opera buffs, water-skiers, food connoisseurs, bilingual speakers, painters, trumpet players, and churchgoers—that parents have the power to introduce their children to the whole supra-biological realm—for the fundamental reason that science cannot comprehend what it cannot measure.
That results of the Five Factor Model can be replicated does not mean that the idea of dividing the human psyche into five reductive factors, and declaring that those five factors are the whole of personality, is valid. It simply means that our operationalizations of the construct are indeed measuring some consistent property of individuals. It’s like answering the question “what is a human being?” by saying “a human being is bipedal.” If you then send a team of observers out into the world to measure the number of legs that tend to be found on humans, you will no doubt find that different researchers are likely to obtain similar findings when counting the number of legs of an individual person. But this doesn’t provide evidence that bipedalism is the sum of mankind; it merely suggests that legs are a thing you can consistently measure, among many. Reliability is a necessary criterion for validity, but it isn’t sufficient. I don’t doubt that the Five Factor Model describes consistent and real aspects of human personality, but the way that Pinker and others treat that model as a more or less comprehensive catalog of what it means to be human is not justified. I’m sure that you could meet two different people who share the same outcomes on the five measured traits in the model, fall madly in love with one of them, and declare the other the biggest asshole you’ve ever met in your life. We’re a multivariate species.
That windup aside, for this particular kind of analysis, I think a construct like “conscientiousness” can be analytically useful. That is, I think that we can avoid the question of whether the Five Factors are actually a comprehensive catalog of essential personality traits while recognizing that there’s some such property of educational perseverance and that it is potentially measurable. (Angela Lee Duckworth’s “grit” concept has been a prominent rebranding of this basic human capacity, although it has begun to generate some criticism.) The question is, does this trait really exist independent of intelligence, and how effective a predictor is it compared to IQ testing?
Intelligence, Achievement, Test Scores, and Grades
In educational testing, it’s a constant debate: to what degree do various tests measure specific and independent qualities of tested subjects, and to what degree are they just rough approximations of IQ? You can find reams of studies concerning this question. The question hinges a great deal on the subject matter; obviously, a really high IQ isn’t going to mean much if you’re taking a Latin test and you’ve never studied Latin. On the other hand, tests like the SAT and its constituent sections tend to be very highly correlated with IQ tests, to the point where many argue that the test is simply a de facto test for g, the general intelligence factor that IQ tests are intended to measure. What makes these questions difficult, in part, is that we’re often going to be considering variables that are likely to be highly correlated within individuals. That is, the question of whether a given achievement test measures something other than g is harder to answer because people with a high g are also those who are likely to score highly on an achievement test even if that test effectively measures something other than g. Make sense?
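To make that last point concrete, here is a small simulation, offered as a sketch under made-up assumptions rather than anything from the study. It assumes a hypothetical general factor `g` and a hypothetical `skill` that is itself partly correlated with `g`; the achievement test scores only the skill plus noise, never `g` directly. Every coefficient is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical general-intelligence factor, standardized.
g = rng.normal(size=n)

# Hypothetical distinct skill (say, subject knowledge) that is itself
# partly correlated with g: its variance is 0.6**2 + 0.8**2 = 1.
skill = 0.6 * g + 0.8 * rng.normal(size=n)

# The achievement test scores only the skill, plus measurement noise;
# g never enters the score directly.
achievement = skill + 0.5 * rng.normal(size=n)

r_with_g = np.corrcoef(g, achievement)[0, 1]
r_with_skill = np.corrcoef(skill, achievement)[0, 1]

print(f"correlation with g:     {r_with_g:.2f}")
print(f"correlation with skill: {r_with_skill:.2f}")
```

Under these assumptions the correlations work out analytically to about 0.54 with `g` and 0.89 with the skill, and the simulated values should land close to that. So a test can correlate substantially with `g` even though, by construction, it measures something else; the correlation alone can’t tell you which story is true.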
Today’s study offers two main research questions:
first, whether achievement and intelligence tests are empirically distinct; second, how much variance in achievement measures is accounted for by intelligence vs. by personality, whereby R² increments of personality after adjusting for intelligence are the primary interest
I’m not going to wade into the broader debate about whether various achievement tests effectively measure properties distinct from IQ. I’m not qualified, statistically, to try and separate the various overlapping sums of squares in intelligence and achievement testing. And given that the g-men are known for being rather, ah, strident, I’d prefer to avoid the issue. Besides, I think the first question is of much more interest to professionals in psychometrics and assessment than the general public. (This week’s study is in fact a replication of a study that was in turn disputed by another researcher.) But the second question is interesting and relevant to everyone interested in education: how much of a given student’s outcomes are the product of intelligence and how much is the product of personality? In particular, can we see a difference in how intelligence (as measured with IQ and its proxies) influences test scores and grades and how personality (as operationalized through the Five Factor Model) influences them?
The Present Study
In the study at hand, the researchers utilized a data set of 13,648 German 9th graders. The student records included their grades; their results on academic achievement tests; their results on a commonly-used test of the Five Factors; and their performance on a test of reasoning/general intelligence (a Raven’s Standard Progressive Matrices analog) and a processing speed test, which are often used in this kind of cognitive research.
The researchers undertook a multivariate analysis called “exploratory structural equation modeling.” I would love to tell you what that is and how it works, but I have no idea. I’m not equipped, statistically, to explain the process or judge whether it was appropriate in this instance. We’re just going to have to trust the researchers and recognize that the process does what analysis of variance does generally, which is to look at the quantitative relationships between variables to explain how they predict, or fail to predict, each other. The nut of it is here:
First, we regressed each of the four cognitive skill measures on all Big Five dimensions. Second, we decomposed the variance of the achievement measures (achievement test scores and school grades) by regressing them on intelligence alone and then on personality and intelligence jointly.
(“Decomposing” variance, in statistics, is a fancy way of saying that you’re using mathematical techniques to split the variance in an outcome into portions attributable to different predictors, which is hard to do when those predictors are closely related to each other.)
What did they find? The results are pretty intuitive. There is, as is to be expected, a strong (.76) correlation between performance on the intelligence test and performance on achievement tests. There’s also a considerable but much weaker relationship between achievement tests and grades (.44) and between the general intelligence test and grades (.32). So kids who are smarter as defined by achievement and reasoning tests do get better grades, but the relationship isn’t super strong. There are other factors involved. And a big part of that unexplained variance, according to this research, is personality.
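The two-step procedure in that quote is what’s usually called hierarchical regression: fit the outcome on intelligence alone, then on intelligence plus personality, and read off the R² increment. The sketch below shows that idea with ordinary least squares on simulated data. It is not the authors’ exploratory structural equation model, and the predictors, the coefficients (0.3 and 0.4), and the helper `r_squared` are all hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """Share of variance in y explained by an OLS fit on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    return 1.0 - residuals.var() / y.var()

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical standardized predictors, simulated as independent for simplicity.
intelligence = rng.normal(size=n)
conscientiousness = rng.normal(size=n)

# Hypothetical outcome: grades depend on both predictors plus everything else (noise).
grades = 0.3 * intelligence + 0.4 * conscientiousness + rng.normal(size=n)

# Step 1: intelligence alone.
r2_intelligence = r_squared(intelligence.reshape(-1, 1), grades)
# Step 2: intelligence and personality jointly.
r2_joint = r_squared(np.column_stack([intelligence, conscientiousness]), grades)
# The R² increment: what personality adds after adjusting for intelligence.
delta_r2 = r2_joint - r2_intelligence

print(f"R² intelligence alone: {r2_intelligence:.3f}")
print(f"R² joint:              {r2_joint:.3f}")
print(f"R² increment:          {delta_r2:.3f}")
```

With these made-up numbers the two R² values should come out near 0.07 and 0.20, for an increment of roughly 0.13. Because the simulated predictors are independent, the order of entry barely matters here; when real predictors are correlated, the shared variance gets credited to whichever block enters first, which is why the increment is stated “after adjusting for intelligence.”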
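One way to put those correlations on the same footing as the “share of variance” figures that follow is to square them: for a single predictor, r² is the proportion of variance two measures have in common. The snippet below just does that arithmetic on the three correlations quoted above; nothing else is assumed.

```python
# Correlations as reported in this post; everything else is just arithmetic.
correlations = {
    "intelligence test vs. achievement tests": 0.76,
    "achievement tests vs. grades": 0.44,
    "intelligence test vs. grades": 0.32,
}

# For a single predictor, r squared is the share of variance the two measures have in common.
shared_variance = {pair: r ** 2 for pair, r in correlations.items()}

for pair, r2 in shared_variance.items():
    print(f"{pair}: r = {correlations[pair]:.2f}, r^2 = {r2:.0%}")
```

That works out to about 58%, 19%, and 10% respectively, so intelligence-test scores share only about a tenth of their variance with grades. The study’s own estimates come from latent-variable models rather than raw correlations, so this is a loose comparison, not a reproduction of its numbers.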
The Big Five explain a substantial, and almost identical, share of variance in grades and achievement tests, amounting to almost one-fifth. By comparison, they explain less than half as much—<.10—of the variance in reasoning, and almost none in processing speed (0.07%).
In other words, if you’re trying to predict how students will do on grades and achievement tests, their personalities are pretty strong predictors. But if you’re trying to predict their pure reasoning ability, personality is pretty useless. And the good-at-tests, bad-at-grades students like high school Freddie are pretty plentiful:
the predictive power of intelligence is markedly different for these two achievement measures: it is much higher than that of personality in the case of achievement—but much lower in the case of school grades, where personality alone explains almost two times more variance than intelligence alone does.
So it would seem there may be some validity to the concept of the naturally bright and the grinders after all. And what about the obverse, the less naturally bright but highly motivated grinder types?
Conscientiousness has a substantial positive relationship with grades—but negative relationships with both achievement test scores and reasoning.
In other words, the more conscientious you are, the better the grades you receive, even though you score lower on achievement and intelligence tests. Unsurprisingly, Conscientiousness (the “grit,” perseverance, stick-to-itiveness factor) correlated most highly with school grades, at .27. The ability to continue to work diligently and through adversity makes a substantial difference in getting good grades but is much less important when it comes to raw intelligence testing.
What It Means
Ultimately, this research result is intuitive and matches the personal experience of many. As someone who spent a lot of my life skating by on being bright, and only really became academically focused late in my undergraduate education, I find something selfishly comforting here. But in the broader, more socially responsible sense, I think we should take care not to perpetuate any stigmas about the grinders. On the one hand, our culture is absolutely suffused with celebrations of conscientiousness and hard work, so it’s not like I think grinders get no credit. And it is important to say that there are certain scenarios where pure reasoning ability matters; if you’re intent on being a research physicist or mathematician, for example, or if you’re bent on being a chess Grandmaster, hard work will not be sufficient, no matter what Malcolm Gladwell says. On the other hand, I am eager to contribute in whatever way I can to undermining the Cult of Smartness. We’ve perpetuated the notion that those naturally gifted with high intelligence are our natural leaders for decades, and to show for it we have immense elite failures and a sickening lack of social responsibility on Wall Street and in Silicon Valley, where the supposed geniuses roam.
What we really need, ultimately, from both our educational system and our culture, is a theme I will return to in this blog again and again: a broader, more charitable, more humanistic definition of what it means to be a worthwhile human being.
(Thanks to SlateStarCodex for bringing this study to my attention.)