“Evaluating the Comparability of Two Measures of Lexical Diversity”

I’m excited and grateful to share with you the news that my article “Evaluating the Comparability of Two Measures of Lexical Diversity” has been accepted for publication in the applied linguistics journal System. The article should appear in the journal in the next several months. I would like to take a minute and explain it in language everyone can understand. I will attempt to define any complex or unusual terms and to explain processes as simply as possible, but these topics are complicated so getting to understanding will take some time and effort. This post will probably be of interest to very few people so I don’t blame you if you skip it.

What is lexical diversity?

“Lexical diversity” is a term used in applied linguistics and related fields to refer to the displayed range of vocabulary in a given text. (Related terms include lexical density, lexical richness, etc., which differ in how they are defined and computed; for example, some systems use weightings based on the relative rarity of words.) Evaluating that range seems intuitively simple, and yet developing a valid, reliable metric for such evaluation has proven unusually tricky. A great many attempts to create such metrics have been undertaken with limited success. Some of the more exciting attempts now utilize complex algorithmic processes that would not have been practically feasible before the advent of the personal computer. My paper compares two of them and provides empirical justification for a claim about their mechanism made by other researchers.

Why do we care about it?

Lexical diversity and similar metrics have been used for a wide variety of applications. Being able to display a large vocabulary is often considered an important aspect of being a sophisticated language user. This is particularly true because we recognize a distinction between active vocabulary, or the words a language user can utilize effectively in their speech and writing, and a passive vocabulary, or the words a language user can define when challenged, such as in a test. This is an important distinction for real-world language use. For example, many tests of English as a second language involve students choosing the best definition for an English term from a list of possible definitions. But clearly, being able to choose a definition from a list and being able to effectively use a word in real-life situations are different skills. This is a particularly acute issue because of the existence of English-language “cram schools” where language learners study lists of vocabulary endlessly but get little language experience of value. Lexical diversity allows us to see how much vocabulary someone actually integrates into their production. This has been used to assess the proficiency of second language speakers; to detect learning disabilities and language impairments in children; and to assess texts for readability and grade-level appropriateness, among other things. Lexical diversity also has application to machine learning of language and natural language processing, such as is used in computerized translation services.

Why is it hard to measure?

The essential reason for the difficulty in assessing diversity in vocabulary lies in the recursive, repetitive nature of functional vocabulary. In English linguistics there is a distinction between functional and lexical vocabulary. Functional vocabulary contains grammatical information and is used to create syntactic form, and contains categories like the articles (determiners) and prepositions; lexical vocabulary delivers propositional content and contains categories like nouns and verbs. Different languages have different frequencies of functional vocabulary relative to lexical. Languages with a great deal of morphology (that is, languages where words change a great deal depending on their grammatical context) have less need for functional vocabulary, as essential grammatical information can be embedded in different word forms. Consider Latin and its notorious number of versions of every word, and then contrast with Mandarin, which has almost no similar morphological changes at all. English lies closer on the spectrum to Mandarin than to Latin; while we have both derivational morphology (that is, changes to words that change their parts of speech/syntactic category, the way -ness changes adjectives to nouns) and inflectional morphology (that is, changes to words that maintain certain grammatical functions like tense without changing parts of speech, the way -ed changes present to past tense), in comparison to a language like Latin we have a pretty morphologically inert language. To substitute for this, we have a) much stricter rules for word order than a language like Latin and b) more functional vocabulary to provide structure.

What does this have to do with assessing diversity in vocabulary? Well, first, we have a number of judgement calls to make when it comes to deciding what constitutes a word. There’s a whole vast literature about where to draw the line to determine what makes a word separate from another, utilizing terms like word, lemma, and word family. (We have a pretty good sense that dogs is the same word as dog but what about doglike or He was dogging me for days, etc.) I don’t want to get too far into that because it would take a book. It’s enough to say here that in most computerized attempts to measure lexical diversity, such as the ones I’m discussing here, all constructions that differ by even a single letter are classified as different terms. In part, this is a practical matter, as asking computers to tell the difference between inflectional grammar and derivational grammar is currently not practical. We would hope that any valid measure of lexical diversity would be sufficiently robust to account for the minor variations owing to different forms.

So: the simplest way to assess the amount of diversity would simply be to count the number of different terms in a sample. This measure has been referred to in the past as Number of Different Words (NDW) and is now conventionally referred to as Types. The problem here is obvious: you could not reliably compare a 75-word sample to a 100-word sample, let alone a 750-word sample. To account for this, researchers developed what’s called a Type-to-Token Ratio (TTR). This figure simply places the number of unique words (Types) in the numerator and the number of total words (Tokens) in the denominator, to generate a ratio that is 1 or lower. The highest possible TTR, 1, is only possible if you never repeat a term, such as if you are counting (one, two, three…) without repeating. The lowest possible TTR, 1/tokens, is only possible if you say the same word over and over again (one, one, one…). Clearly, in real-world language samples, TTR will lie somewhere in between those extremes. If half of all your terms are new words, your TTR would be .50, for example.
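To make the arithmetic concrete, here’s a quick sketch in Python (my choice of language for illustration, not anything from the paper) of the Types, Tokens, and TTR calculation, using a naive whitespace tokenizer:

```python
# A minimal sketch of the Type/Token/TTR calculation described above.
# Lowercasing and splitting on whitespace are simplifications; real tools
# make more careful decisions about punctuation and word boundaries.

def ttr(text: str) -> float:
    tokens = text.lower().split()  # Tokens: every running word
    types = set(tokens)            # Types: unique words only
    return len(types) / len(tokens)

print(ttr("one two three four"))   # no repetition: TTR = 1.0
print(ttr("one one one one"))      # maximum repetition: TTR = 0.25
print(ttr("the cat saw the dog"))  # 4 types / 5 tokens = 0.8
```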

Sounds good, right? Well, there’s a problem, at least in English. Because language is repetitive by its nature, and particularly because functional vocabulary like articles and prepositions are used constantly (think of how many times you use the words “the” and “to” in a given conversation), TTR has an inevitable downward trajectory. And this is a problem because, as TTR inevitably falls, we lose the ability to discriminate between language samples of differing lengths, which is precisely why TTR was invented in the first place. For example, a 100-word children’s story might have the same TTR as a Shakespeare play, as the constant repetition of functional vocabulary overwhelms the greater diversity in absolute terms of the latter. We can therefore say that TTR is not robust to changes in sample size, and repeated empirical investigations have demonstrated that this sensitivity can apply even when the differences in text length are quite small. TTR fails to adequately control for the confounding variable it was expressly intended to control for.
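You can actually watch this downward trajectory happen with a toy simulation. The “text” below is purely synthetic (a tiny closed set of function words alternating with content words drawn from a finite vocabulary), so take it only as an illustration, but the declining TTR pattern mirrors what happens in real English text:

```python
import random

random.seed(0)
function_words = ["the", "to", "of", "a", "and", "in"]
content_vocab = [f"word{i}" for i in range(800)]

# Alternate function words (a tiny closed set, reused constantly) with
# content words drawn from a finite vocabulary, mimicking real text.
tokens = []
for _ in range(2000):
    tokens.append(random.choice(function_words))
    tokens.append(random.choice(content_vocab))

# TTR of ever-longer prefixes of the same "text": it only goes down.
for n in (100, 500, 1000, 4000):
    sample = tokens[:n]
    print(n, round(len(set(sample)) / n, 2))
```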

A great many attempts have been made to adjust TTR mathematically (Guiraud’s Root TTR, Somer’s S), but none of them have worked.

What computational methods have been devised to measure lexical diversity?

Given the failure of straightforwardly mathematical attempts to  adjust TTR, and with the rise of increasingly powerful and accessible computer programs for processing text, researchers turned to algorithmic/computational models to solve the problem. One of the first such models was the vocd algorithm and the metric it returns, D. D stands today as one of the most popular metrics for assessing diversity in vocabulary. For clarity, I refer to D as “VOCD-D” in this research.

Developed primarily by the late David Malvern, Gerard McKee, and Brian Richards, along with others, the vocd algorithm in essence assesses the change in TTR as a function of text length and generates a measure, VOCD-D, that approximates how TTR changes as a text grows in length. Consider the image below, which I’ve photographed from Malvern et al.’s 2004 book Lexical diversity and language development: Quantification and assessment. (I apologize for the image quality.)

[Image: ideal TTR curves plotted over token count]

What you’re looking at is a series of ideal curves depicting changing TTR ratios over a given text length. As we move from the left to the right, we’re moving from a shorter to a longer text. As I said, the inevitable trajectory of these curves is downward. They all start in the same place, at 1, and fall from there. And if we extend these curves far enough, they would eventually end up in the same place, bunched together near the bottom, making it difficult to discriminate between different texts. But as these curves demonstrate, they do not fall at the same rate, and we can quantitatively assess the rate of downward movement in a TTR curve. This, in essence, is what vocd does.

The depicted curves here are ideal in the sense that they are artificial for the process of curve fitting. Curve fitting procedures are statistical methods to match real-world data, which is stochastic (that is, involves statistical distortion and noise), to approximations based on theoretical concepts. Real-world TTR curves are in fact far more jagged than this. But what we can do with software is to match real-world curves to these ideal curves to obtain a relative value, and that’s how vocd returns a VOCD-D measurement. The algorithm contains an equation for the relationship between text length, TTR, and VOCD-D, processes large collections of texts, and returns a value (typically between 40 and 120) that can be used to assess how diverse the vocabulary is in those texts. (VOCD-D values can really only be understood relative to each other.) The developers of the metric define the relationship between TTR, the number of tokens N, and D at a given point along a TTR curve as TTR = (D/N)[(1 + 2N/D)^(1/2) - 1].

Now, vocd uses a sampling procedure to obtain these figures. By default, the algorithm takes 100 random samples of 35 tokens, then 36 tokens, then 37, etc., until 50 tokens are taken in the last sample. In other words, the algorithm grabs 100 randomly-chosen samples of 35 words, then 36, etc., and returns an average figure for VOCD-D. The idea is that, because different segments of a language sample might have significantly different levels of displayed diversity in vocabulary, we should draw samples of differing sizes taken at random from throughout each text, in order to ensure that the obtained value is a valid measure. (The fact that lexical diversity is not consistent throughout a given text should give us pause, but that’s a whole other ball of wax.) Several programs that utilize the vocd algorithm also run through the whole process three times, averaging all returned results together for a figure called Doptimum.
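For the curious, here’s a rough Python sketch of the vocd-style procedure described above: average TTR over repeated random samples of 35 to 50 tokens, then find the D whose ideal curve best matches those averages. The crude grid search stands in for the real algorithm’s least-squares curve fitting, and the text is made up, so treat this as an illustration of the idea rather than a reimplementation of vocd:

```python
import random

def model_ttr(n: int, d: float) -> float:
    # Malvern et al.'s model of TTR as a function of token count N and D:
    # TTR = (D/N)[(1 + 2N/D)^(1/2) - 1]
    return (d / n) * ((1 + 2 * n / d) ** 0.5 - 1)

def estimate_d(tokens, trials=100, seed=0):
    # vocd-style sketch: average TTR over `trials` random without-replacement
    # samples at each size from 35 to 50 tokens, then pick the D whose ideal
    # curve best fits those averages (grid search instead of least squares).
    rng = random.Random(seed)
    sizes = range(35, 51)
    mean_ttr = {}
    for n in sizes:
        vals = [len(set(rng.sample(tokens, n))) / n for _ in range(trials)]
        mean_ttr[n] = sum(vals) / len(vals)
    best_d, best_err = None, float("inf")
    for d in [x / 2 for x in range(20, 401)]:  # candidate D from 10 to 200
        err = sum((mean_ttr[n] - model_ttr(n, d)) ** 2 for n in sizes)
        if err < best_err:
            best_d, best_err = d, err
    return best_d

# A synthetic 600-token "text" drawn from a 50-word vocabulary.
random.seed(1)
vocab = [f"w{i}" for i in range(50)]
text = [random.choice(vocab) for _ in range(600)]
print(estimate_d(text))
```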

VOCD-D is still affected by text length, and its developers caution that outside of an ideal range of perhaps 100-500 words, the figure is less reliable. Typical best practices involve combining VOCD-D with other measures, such as the Maas Index and MTLD (Measure of Textual Lexical Diversity), in order to make research more robust. Still, VOCD-D has shown itself to be far more robust across differing text lengths than TTR, and since the introduction of widely-available software that can measure it, notably the CLAN application from Carnegie Mellon’s CHILDES project, it has become one of the most commonly used metrics to assess lexical diversity.

So what’s the issue with vocd?

In a series of articles, Phillip McCarthy of the University of Memphis’s Institute for Intelligent Systems and Scott Jarvis of Ohio University identified a couple of issues with the vocd algorithm. They argue that the algorithm produces a metric which is in fact a complex approximation of another measure that is a) less computationally demanding and b) less variable. McCarthy and Jarvis argued that vocd’s complex curve-fitting process actually approximates another value which can be statistically derived from a language sample based on hypergeometric sampling. Hypergeometric sampling is a kind of probability sampling that occurs “without replacement.” Imagine that you have a bag filled with black and white marbles. You know the number of marbles and the number of each color. You want to know the probability that you will withdraw a marble of a particular color each time you reach in, or what number of each color you can expect in a certain number of pulls, etc. If you are placing the marbles back in the bag after checking (with replacement), you use binomial sampling. If you don’t put the marbles back (without replacement), you use hypergeometric sampling. McCarthy and Jarvis argued, in my view persuasively, that the computational procedure involved in vocd simply approximated a more direct, less variable value based on calculating the odds of any individual Type (unique word) appearing in a sample of a given length, which could be accomplished with hypergeometric sampling. VOCD-D, according to McCarthy and Jarvis, ultimately approximates the sum of the probabilities of a given type appearing in a sample of a given length. The curve-fitting process and repeated random sampling merely introduce computational complexity and statistical noise. McCarthy and Jarvis developed an alternative algorithm and metric. Though statistically complex, the operation is simple for a computer, and this metric has the additional benefit of allowing for exhaustive sampling (checking every type in every text) rather than random sampling. McCarthy and Jarvis named their metric HD-D, or Hypergeometric Distribution of Diversity.
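Here’s a small illustration of the hypergeometric idea in Python. For each type, we compute the probability that at least one of its occurrences shows up in a random, without-replacement sample of a fixed size; summing those probabilities and dividing by the sample size gives an HD-D-style, TTR-like index. The 42-token sample size and the toy text are my own choices for illustration, not constants from McCarthy and Jarvis:

```python
from math import comb
from collections import Counter

def type_probabilities(tokens, sample_size=42):
    # For each type, the hypergeometric probability that at least one of
    # its occurrences lands in a random sample of `sample_size` tokens
    # drawn without replacement.  P(type absent) = C(N - f, s) / C(N, s),
    # where N is total tokens, f the type's frequency, s the sample size.
    n = len(tokens)
    probs = {}
    for word, freq in Counter(tokens).items():
        p_absent = comb(n - freq, sample_size) / comb(n, sample_size)
        probs[word] = 1 - p_absent
    return probs

tokens = ("the cat sat on the mat and the dog sat by the door " * 8).split()
probs = type_probabilities(tokens)

# Summing the probabilities gives the expected number of types in a
# 42-token sample; dividing by 42 yields a TTR-like diversity index.
hdd = sum(probs.values()) / 42
print(round(hdd, 3))
```

Note that no random sampling actually happens here: the probabilities cover every type in the text exhaustively, which is exactly the advantage over vocd’s repeated random draws.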

(If you are interested in a deeper consideration of the exact statistical procedures involved, email me and I’ll send you some stuff.)

McCarthy and Jarvis found that HD-D functions similarly to VOCD-D, with less variability and requiring less computational effort. The latter isn’t really a big deal, as any modern laptop can easily churn through millions of words with vocd in a reasonable time frame. What’s more, McCarthy and Jarvis explicitly argued that research utilizing VOCD-D does not need to be thrown out, but rather that there is a simpler, less variable method to generate an equivalent value. But we should strive to use measures that are as direct and robust as possible, so they advocate for HD-D over VOCD-D, as well as calling for a concurrent approach utilizing other metrics.

McCarthy and Jarvis supported their theoretical claims of the comparability of VOCD-D and HD-D with a small empirical evaluation of the equivalence. They did a correlational study, demonstrating a very strong relationship between VOCD-D and HD-D, supporting their argument for the statistical comparability of the two measures. However, their data set was relatively small. In a 2012 article, Rei Koizumi and Yo In’nami argued that McCarthy and Jarvis’s data set suffered from several drawbacks:

(a) it used only spoken texts of one genre from L2 learners; (b) the number of original texts was limited (N = 38); and (c) only one segment was analyzed for 110-200 tokens, which prevented us from investigating correlations between LD measures in longer texts. Future studies should include spoken and written texts of multiple genres, employ more language samples, use longer original texts, and examine the effects of text lengths of more than 200 tokens and the relationships between LD measures of equal-sized texts of more than 100 tokens.

My article is an attempt to address each of these limitations. At heart, it is a replication study involving a vastly larger, more diverse data set.

What data and tools did you use?

I used the fantastic resource The International Corpus Network of Asian Learners of English, a very large, very focused corpus developed by Dr. Shin’ichiro Ishikawa of Kobe University. What makes the ICNALE a great resource is a) its size, b) its diversity, and c) its consistency in data collection. As the website says, “The ICNALE holds 1.3 M words of controlled essays written by 2,600 college students in 10 Asian countries and areas as well as 200 English Native Speakers.” Each writer in the ICNALE data set writes two essays, allowing for comparisons across prompts. And the standardization of the collection is almost unheard of, with each writer having the same prompts, the same time guidelines, and the same word processor. Many or most corpora have far less standardization of texts, making it much harder to draw valid inferences from the data. Significantly for lexical diversity research, the essays are spell checked, reducing the noise of misspelled words which can artificially inflate type counts.

For this research, I utilized the ICNALE’s Chinese, Korean, Japanese, and English-speaking writers, for a data set of 1,200 writers and 2,400 texts. This allowed me to compare results between first- and second-language writers, between writers of different language backgrounds, and between prompts. The texts contained a much larger range of word counts (token counts) than McCarthy and Jarvis’s original corpus.

I analyzed this data set with CLAN, in order to obtain VOCD-D values, and with McCarthy’s Gramulator software, in order to obtain HD-D values. I then used this data to generate Pearson product-moment correlation matrices comparing values for VOCD-D and HD-D across language backgrounds and prompts, utilizing the command-line statistical package SAS.
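For readers who want to see what the correlational step amounts to, here’s the Pearson product-moment calculation in miniature. The paired scores below are invented for illustration; the real values came from CLAN and the Gramulator and were analyzed in SAS:

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    # Pearson product-moment correlation: covariance of the paired scores
    # divided by the product of their standard deviations.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

# Made-up paired scores for five texts (VOCD-D and HD-D-style values).
vocd_d = [58.2, 71.5, 64.0, 88.9, 49.3]
hd_d = [0.71, 0.78, 0.74, 0.84, 0.66]
print(round(pearson_r(vocd_d, hd_d), 3))  # close to 1: near-linear relation
```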

What did you find? 

My research provided strong empirical support for McCarthy and Jarvis’s prior research. All of the correlations I obtained in my study were quite high, above .90, and they came very close to the measures obtained by McCarthy and Jarvis. Indeed, the extremely tight groupings of correlations across language backgrounds, prompts, and research projects strongly suggest that the observed comparability identified in McCarthy and Jarvis’s work is the result of the mathematical equivalence they have identified. My replication study delivered very similar results using similar tools and a greatly expanded sample size, arguably confirming the previous results.

Why is this a big deal?

Well, it isn’t, really. It’s small-bore, iterative work that is important to a small group of researchers and practitioners. It’s also a replication study, which means that it’s confirming and extending what prior researchers have already found. But that’s what a lot of necessary research is — slowly chipping away at problems and gradually generating greater confidence about our understanding of the world. I am also among a large number of researchers who believe that we desperately need to do more replication studies in language and education research, in order to confirm prior findings. I’ve always wanted to have one of my first peer-reviewed research articles be a replication study. I also think that these techniques have relevance and importance to the development of future systems of natural language processing and corpus linguistics.

As for lexical diversity, well, I still think that we’re fundamentally failing to think through this issue. As well as VOCD-D and HD-D work in comparison to a measure like TTR, they are still text-length dependent. The fact that best practices require us to use a number of metrics to validate each other suggests that we still lack a best metric of lexical diversity. My hunch (or more than a hunch, at this point) is not that we haven’t devised a metric of sufficient complexity, but that we are fundamentally undertheorizing the notion of lexical diversity. I think that we’re simply failing to adequately think through what we’re looking for and thus how to measure it. But that’s the subject of another article, so you’ll just have to stay tuned.

tl;dr version: Some researchers said that two ways to use a computer to measure diversity in vocabulary are in fact the same way, really, and provided some evidence to support that claim. Freddie threw a huge data set with a lot of diversity at the question and said “right on.”


no comments for a while

No offense to my loyal readers, but I’m going to be turning comments off by default for the time being. Just frustrated by the quality of comments lately. I’ll turn them back on at some point I’m sure.

Freddie’s zoo

My brother and his wife have moved to Guam for at least a couple years, and they asked me to look after their two cats until they either decide to stay permanently or come home. So when you add their cats, Trapito and Mia, to my cat Suavecito (who is one of their kittens) and my dog Miles, it makes for quite a crew. I was worried that it would be nonstop madness. It is pretty crazy, particularly around feeding time, but really it’s been quite nice.

There is so much going on in my life right now, so much work and so many things to worry about. It doesn’t help that there was a paperwork snafu at school and my last two paychecks have been much smaller than they’re supposed to be, so the constant money worry is a little worse than usual. But then again, I just have to feel gratitude and satisfaction. I love teaching and I love the chance to read and write all day and I’m enjoying a lovely fall with good friends and professors. Had a lovely day of working and walking around the football stadium during the Michigan State game.

Here’s me and Mia.

The Giving Tree is not rational

There were two takes on Shel Silverstein’s simultaneously beloved and derided The Giving Tree in the Times recently, one from Anna Holmes and one from Rivka Galchen. Holmes, though characteristically well-expressed, joins a recent history of “provocative” takes on the book that misunderstand not only its text but its purpose. Galchen is closer to the mark, but suffers from the same misunderstanding: The Giving Tree’s relationship can’t be explained in the language or thought processes we might deploy to debate the earned income tax credit because the story is not meant to be explained. The book is irrational, by design, and its irrationality is the best kind: the kind that challenges the human pretense of understanding.

The takes are, conventionally, that the book is actually acutely disturbing, a portrayal of a parasitic, unhealthy relationship that calls to mind abuse and codependence. In that, they aren’t exactly wrong, but they are wrong to think that this is unintentional, that it is wrong, or that we are meant to judge it as some arch critique. Holmes blames the book for its abusive relationship, Galchen praises it, but both misunderstand in the attempt to bend it into a form that pleases adult conceptions of meaning and sense. The Giving Tree is a children’s book, and its incredible power lies in its refusal to adopt the parent’s efforts to render primal and inhuman feelings into parables that can be understood with the thinking mind. Children, particularly young children, live with the intensity of their emotions in full flower, and have not yet erected the intricate structures of phony reason to render those emotions more psychically palatable. Silverstein understood what Freud understood: what we want from our parents is unreasonable; how much we’ll take from our parents is irrational; our relationship to our parents is indefensible. The tree isn’t the boy’s mother, nothing so dull. But the tree is a symbol of the fundamental irrationality of generosity. It is the giving tree; giving is what the tree is and does. You can never come up with an intellectually satisfying answer to why the tree gives as much as it does and to why we should find pleasure in such a thing because the book is targeting a part of you that is far older and wilder and more powerful than your thinking mind.

While I don’t offer it as an explicit Christian allegory or anything so crude, particularly given that Silverstein was Jewish, there’s a clear parallel between the book’s story (not its “message,” whatever that could be) and the best versions of Christian love. It’s hard to imagine a historical figure who has been more thoroughly abused by the distortions of rational minds than Jesus, whose message was resolutely, intensely, and combatively irrational. They built a vast church founded on precisely the opposite kind of rigid, antiseptic didacticism not in spite of who Jesus was but because of who he was: it took an edifice as vast and self-important as the Catholic church to squeeze that wildly unreasonable man into something that might be called a philosophy. After all, human beings cannot live with the kind of challenge that Jesus presents us with. So we create theology and philosophy and we end up with Joel Osteen. As an agnostic teenager reading the Christian bible, and as an atheist now, what I admired and admire about Christianity and Jesus was the supreme irrationality of his boundless love. Does he give again and again? Yes. Does he give to the good and the bad alike? Yes. Does he love evil people? Yes. Does Hitler get into heaven? Yes. Yes, he does. For now, that kind of love can live only in the pages of that bible, and it should come as a surprise to no one that here on earth there is no such thing as a Christian church.

The rational mind is the way we make progress as a species. But the direction we have to go, to reach the next stage of human enlightenment, is away from the rational mind, not towards it. For now the task is to insist, sometimes, on the unreasonable, the irrational, and indefensible, and this is the guiding light for me as a political creature. As a society, we should give what is asked for to whoever asks for it. Even if we can’t afford to give? Yes. Even if they don’t need it? Yes. Even if they’ve lied and cheated in asking before? Yes. Even if we know they’re lying and cheating now? Yes. And so Galchen’s take, that we are meant only to witness this relationship and not to bless or replicate it, is flawed too: the relationship portrayed in the book is indeed indefensible, but contrary to Galchen, it is precisely exemplary.

We’ve seen, in the age of the internet, a vast explosion in the analysis and examination of the art around us, and as frustrated as we might become with the opinions of others, it’s hard for me to see this expansion as anything but a massive good. I am challenged and moved by other people’s thoughts about art every day, and it’s a blessing. But analysis and examination are methods of the mind, and I fear that efforts to feel with each other are far rarer than efforts to think with each other, or at each other. These efforts to cast the brute emotional power of art into the conventions of thinking are necessary, natural, and fun. But they can result in, for example, the deep hatred for ambiguity in art, the effort to tease out of every creator what really happened. More, so many takes on art today, straining for political relevance, misunderstand that it is precisely the ability of art to express the indefensible and the disturbing that lends it enduring power. If you are yet another person online to point out that the lyrics of “Run For Your Life” off of Rubber Soul are disturbing and misogynist, you are yet another to fail to understand that John Lennon didn’t kill anybody. He wrote a song about his impulses to kill — his scary, ugly, unmentionable impulse to kill, driven by the frightening irrationality at the heart of love and desire. He put those impulses into his art because that is where they could be acknowledged without danger. His music was where the unforgivable monster of his feelings could live and do no harm.

I am thinking with you, here, not feeling. I’m just saying, in my thinking, that there are things in this book which cannot be thought through. You can get a lot from thinking about The Giving Tree, but not understanding. There are all kinds of ways of thinking about this book and all books and I appreciate them all. But there are some readings that we must reject because they are contrary to the one part of a book we all have to honor, the text itself. So the common argument that the tree is unhappy, that its stated happiness is satirical or ironic or paradoxical, cannot withstand scrutiny. What do we know for sure, at the end? We know that the tree is happy. The boy deserved nothing and took everything and left the tree bereft. And the tree was happy. Silverstein leaves us to live in that world.

Jezebel gets in on some sweet sex-shaming

Shikha Dalmia wrote some things about affirmative consent laws. I agree with some of it; some of it I am troubled by. Jezebel and Erin Gloria Ryan responded, in contrast, by sex-shaming her for the crime of having a different opinion about a controversial law. I don’t have any right to dictate what it means to be a feminist. But it’s shameful to attack another woman’s sex life thanks to a political disagreement. Ryan knows nothing about Shikha Dalmia, knows nothing about her sex life and how satisfying it is, and clearly barely read her piece before writing her response to it. I don’t care what the situation is. I don’t care who you write for. I don’t care how you identify yourself politically. It’s not alright to shame someone else about their sex life. Ever.

Can you imagine if I read a piece by Ryan and wrote an essay in which my sole “argument” was “This chick needs to get laid”? Can you imagine the response? If I said “boy, Erin Gloria Ryan writes like someone who has shitty sex!” I would rightfully be pilloried, because that behavior is not acceptable. And the thing about it is that Ryan will never, ever even think it over. She won’t for a second say to herself, “hey, maybe shaming another woman and making broad assumptions about her sex life isn’t the most feminist thing I’ve ever done in my life!” Because Jezebel’s whole deal is never, ever engaging in self-criticism.

There’s lots and lots I could say about the piece. Ryan simply makes up things that Dalmia believes out of whole cloth, inventing arguments that seem to exist only in Ryan’s imagination. She seems to think “the fuck?” is an argument. And Ryan demonstrates the incoherence of her own position. She’s advocating for a massive change to the legal definition of consent, but then mocks Dalmia for thinking that it’ll actually be enforced. She writes:

the piece’s most ridiculous aspect is the assumption that following every sex act, thanks to this law, authorities will sweep in and subject both parties (but mostly the man) to an exhaustive cross examination on consent as the pair of lovebirds towel their bodily fluids off of each other in a panic.

Hey, seriously. You guys. Admitting that a law won’t be enforced is not an argument for making it a law. In fact, it’s the opposite.

But none of that really matters. Mocking a woman’s sex life because you don’t like her politics is wrong. Shaming another woman because you don’t like her politics is not alright. It’s reactionary by its nature. Elizabeth Stoker Bruenig wrote about how women who explore political ideas outside of the mainstream feminist space are disciplined:

unorthodox views can, especially for women in left academic feminism, result in precisely that form of discipline: withdrawal of community, overwhelming assassination of character, a very sudden onslaught of negative feedback and demands for apology. It strikes me that this method of disciplining members is another symptom of the problem Amber gets at in her article: the community is not so concerned with what is true or false as with who is good and who is bad.

Congratulations, Jezebel. You’re the latest to discipline a woman for having an opinion, and you’ve used sex-shaming as a tool to do it. I bet you’re very proud.

Update: It occurs to me that Ryan repeatedly saying “the fuck?” and thinking that constitutes a rebuttal is a perfect example of We Are All Already Decided. Thinking that expressing incredulity at someone else’s opinion is enough to dismiss it can only happen when you’re so steeped in an echo chamber that you forget there’s a world of people out there who don’t agree with you.

culture eats politics, baseball edition

As I pointed out the other day, many people have reacted to the “alt-lit” rape scandal by blurring the lines between their natural disgust at those accusations and their aesthetic and stylistic annoyance with the alt-lit culture. That’s gross and misguided. Being annoyed by someone else’s style and culture should not be confused with feeling revulsion towards rape accusations. Those things are not the same, and blurring those lines just undermines the effort to seriously combat sexual assault.

Now, the same dynamic is playing out in baseball. The much-hated Saint Louis Cardinals and their much-hated fanbase are celebrating yet another trip to the National League Championship Series. As a Cubs fan, I find this deeply annoying. But I don’t find it immoral, because who wins baseball games is not a moral question. That hasn’t stopped other people who are annoyed by the Cardinals from trying, though. Because some small number of Cardinals fans acted in a very shitty manner towards Ferguson protesters, many people on social media now have ammunition to say that the team they don’t like is not only annoying, but racist and conservative. The kind of people who create the cloud of performative morality that envelops the elite internet have once again found, with typical good fortune, that what they like is indistinguishable from what is good. Me, I would say that confusing your tribal athletic passions with your distaste for racism and police violence is not a progressive or helpful way to act. But that’s just me, apparently.

(Maybe worst of all is the suggestion that there’s any fanbase alive that isn’t chock full of racist fans. I promise: the team you like is beloved by some of the worst people on earth.)

I’ve said in the past that our media elites seem to believe in a juvenile moral universe — Manichean, simplistic, and filled with perfect clarity about every moral controversy, to the point where they not only already know what the answer to every moral question is, they can’t believe that you don’t already agree with them. It’s very childish, in the literal sense of being the way that children think about the world. But it’s also a convenient moral universe. It’s one where there’s no space between their moral convictions and their aesthetic preferences, where the artists and creators whose work they enjoy are also political paragons, where they and their friends occupy a different moral stratum than the rest of us, and where they are always the righteous heroes of every drama. Nice work if you can get it.

cautionary tales: get it together, you guys


So I’m not one of those pedants who thinks a misplaced comma invalidates an argument or ruins an essay, but seriously, you guys. There is no Disney movie called The Rats of NIMH. There is a movie called The Secret of NIMH, made by Don Bluth — who had a notorious falling out with Disney and left to form his own studio — released by United Artists and based on the book Mrs. Frisby and the Rats of NIMH.

I mean… 45 seconds of research.


we’re all chumps now, pharmacist edition

I’ve been banging on against the STEM shortage myth for a long time now, but this is only part of a broader argument. We’re living in an age where a lot of chin-scratching econ types a) blame people for their own unemployment on the theory that what they studied is frivolous/impractical/whatever, and b) push them to pursue “practical” degrees in fields like computer science, despite a general lack of evidence that these fields actually ensure better employment and income outcomes. These arguments are almost exclusively delivered in an idiom of condescension and certainty, because of course there’s a shortage of computer scientists. (Actually, the latest numbers I’ve seen show an overall unemployment rate around 4% for those with bachelor’s degrees generally and up around 8% for those with computer science degrees.) And of course you can make a killing coding an app in your dorm room. (Actually, you almost certainly can’t.) And of course French poetry majors are to blame for the unemployment rate. (No, they aren’t.) But no one could have predicted. (We were seeing this dynamic before many current college students were born.)

Well: pity the pharm school grad. No, really. They deserve real human compassion, because like the American people in general, they’ve been sold a bill of goods. These kids were told again and again that pharmacy was a safe haven, that this was a growing field that could provide them with the good life for years to come. But as Katie Zavadski’s careful reporting shows, they were misled. The pharmacy labor market has been drying up, driving higher unemployment and lower wages. And really: of course, when you tell a generation of kids that a particular field is where the smart money is, you’re going to see a surplus. That’s how markets work! It’s bizarre to look at a supply-demand equation, propose to dramatically expand the supply, and expect to see the economic advantage remain. Every argument of the type “here’s the fields you should be pushing students into” is an argument to flood the market with graduates who are only going to be competing against each other for limited jobs. It’s a zero-sum vision that is endorsed as a long-term solution for societal economic health, and it makes no sense.

There is no such thing as practical knowledge, and so there is no such thing as a practical major. This country graduates 350,000 business majors a year. The employment metrics for those degrees are generally awful. But nobody ever includes them in their arguments about impractical majors, despite those bad numbers. And if you’re some 19-year-old, out to choose a career path, business sure sounds practical. So they graduate with those degrees, flood the market with identical resumes, and nobody hires them. Meanwhile, they lost the opportunity to explore fields that they might have enjoyed, that might have deepened the information acquisition and evaluation skills that would allow them to adapt to a whole host of jobs, and that might have provided a civic and moral education. All to satisfy a vision of practicality that has no connection to replicable, reliable economic advantage.

What’s most depressing of all is that these changes of fortune for particular fields are always seen as worthy of mockery and not sympathy. I searched around for that New Republic article on social media and found plenty of people laughing at these kids and calling them chumps for following an educational fad. You just can’t win: if you pursue a field you actually like, they mock you for your impracticality. If you pursue a field out of a desire to chase the money, and you get unlucky, they mock you for choosing poorly. Whatever it takes to convince you that your unemployment is your own fault and not the fault of an economic system that serves only the 1%.

Read this missive from Casey Ark. Consider what his complaint is, and what it isn’t. Kid: you’re right. You were fooled. Bamboozled. Lied to. But the lie is so much bigger than the one you think you’re complaining about. Who told you that studying programming and business is more practical than studying English? And why did you believe them?

Chasing a particular employment market, for an individual, can be a good or a bad bet. But treating skill chasing as a long-term economic solution on the societal level is insane. We’ve responded to unprecedented labor market swings, and to our incredible exposure to risk through our financial system, by dramatically narrowing our notion of what skills are valuable and who gets to be considered a practically educated person. That makes zero sense, particularly in a time when automation threatens to cut the legs out from under more and more workers as we move forward. We are manically pursuing a far narrower vision of what human beings can call a vocation, treating any endeavor that does not involve numbers or digital technology as useless and old-fashioned, with nothing resembling a sound evidentiary basis for believing that this will deliver better labor outcomes. (The numbers-based fields are the ones that computers will be best equipped to take over!) In a world where computers and robots will take over more and more work that was once performed by humans, we should be broadening our notion of what constitutes valuable work, not shrinking it. And we should use our capacity for government-directed redistribution to share the efficiency and productivity gains of those computers and robots more widely. Instead, Google gets the money, you lose your job, and Tom Friedman makes millions telling you that it’s your own fault.

Petrochemical engineering is having a moment, in large measure because of the surprise discovery of new fossil fuel reserves. So: you want to tell an army of 18-year-olds to start learning that stuff now? Even if they don’t like it? Even if they aren’t talented in that domain? You know, for that kind of job, you might need more than 4 years of school. You might need 6. You might need 12. Hope that labor market holds. Hope that bet pays off. But hey. If you’re one of the poobahs in the media telling these kids what to do, you’re not making that bet yourself.