sound research methodology can’t save you if you’re a moron

So I’m going to re-up this piece I wrote about why “correlation does not imply causation” leads to a research nihilism that is more of a problem than mistaking correlation for causation. I’m doing it because of the viral proliferation of this website from Tyler Vigen, a Harvard law student. The website does what every statistics professor since Karl Pearson has told us never, ever to do: run random correlations in great numbers and look for the ones that are large. As I wrote recently in my post about p-value weirdness, this is a disastrous practice, and a mistake that no one with an ounce of knowledge about research methods would make. And yet I have seen this website shared dozens of times now by people who treat it as some sort of silver bullet against correlation, when it neither is that nor purports to be that. It’s robbed of its context and purpose just about every time it gets shared.

To take one example, here Dylan Matthews of Vox, who appears incapable of writing a word that is not steeped in undeserved superiority and condescension, presents the website and the topic of correlational data without bothering to give his charts the information and context required to understand them intelligently. The rampant representation of this website as some sort of smoking gun against one of our most basic and important types of statistics is derp, just derp presented as profundity, that’s all.

Of course, if you go looking for correlations en masse, you’ll find spurious relationships. That’s Stats 101. Thou Shalt Not Data Snoop is as basic as it gets. That you can find high correlations between phenomena that have no causal relationship is something we’ve known for as long as we’ve been correlating things. Luckily, we enjoy the power of human reason, and no one, no matter how stupid they may be, thinks that the divorce rate is affected by the consumption of margarine. The very fact that the inherent absurdity of these connections is used as an argument against correlational data should clue us in: there’s no danger whatsoever of people mistaking these relationships for causal ones. I’m afraid if you think that the marriage rate in Mississippi is dictated by per capita consumption of milk, no amount of research methodology can save you. Are there dangers in using correlational data? Absolutely. That’s why you start from theory, set a low alpha, take care in data collection, and embed your work in caveats and provisos. And if someone else disputes your findings, that’s their right. But they face a burden of proof too, and that burden is never, never met by simply saying “correlation does not imply causation! Look ma, I’m an intellectual!”
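The data-snooping trap is easy to reproduce. What follows is a minimal Python sketch, not Vigen’s actual procedure: it assumes short series (ten points each) of pure, mutually independent Gaussian noise, correlates every pair, and counts how many clear the conventional 5% significance threshold even though no series is related to any other. The helper names `pearson` and `snoop` and the choice of 100 series are hypothetical, picked for illustration.

```python
import itertools
import random

# Assumption: each series is 10 points of independent Gaussian noise.
LENGTH = 10
# Two-sided 5% critical value of |r| for n = 10 (df = 8):
# r = t / sqrt(t^2 + df) with t = 2.306, which gives about 0.632.
R_CRIT_05 = 0.632


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def snoop(num_series=100, seed=0):
    """Correlate every pair of mutually independent noise series.

    No series is related to any other, so every "significant" r is spurious.
    Returns (pairs tested, pairs with |r| above R_CRIT_05, largest |r| found).
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(LENGTH)] for _ in range(num_series)]
    rs = [abs(pearson(a, b)) for a, b in itertools.combinations(series, 2)]
    hits = sum(r > R_CRIT_05 for r in rs)
    return len(rs), hits, max(rs)


if __name__ == "__main__":
    n_pairs, hits, best = snoop()
    print(f"{n_pairs} pairs tested, {hits} 'significant' at 5%, max |r| = {best:.3f}")
    # A Bonferroni correction divides alpha by the number of tests performed.
    print(f"Bonferroni-adjusted alpha: {0.05 / n_pairs:.2e}")
```

With 100 series there are 4,950 pairs, so on the order of 5% of them (roughly 250) should clear the threshold by chance alone, and the single largest correlation will typically look impressively strong; exact figures depend on the seed. Dividing alpha by the number of tests (Bonferroni) is just one standard way of “setting a low alpha” when you do run many comparisons; it is named here as an illustration, not as the only remedy.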

You know who gets that? Tyler Vigen. Because if you actually click over to the About section of the website, you’ll read this:

I created this website as a fun way to look at correlations and to think about data. Empirical research is interesting, and I love to wonder about how variables work together. The charts on this site aren’t meant to imply causation nor are they meant to create a distrust for research or even correlative data. Rather, I hope this project fosters interest in statistics and numerical research.

I’m not a math or statistics major (and there are better ways to calculate correlation than I do here), but I do have a love for science and discovery and that’s all anyone should need. Presently I am working on my J.D. at Harvard Law School.

I love that, particularly the bit about the love of science and discovery being enough to engage on these issues. Nobody has to be an expert to talk intelligently about research methods; if you did, I’d have to keep my mouth shut. But you do have to be willing to work, and apparently, clicking over one page to the About section is more effort than the vast throngs that are sharing this site and going “hardy har har!” are capable of mustering.

The meme-ification of “correlation does not imply causation” is a perfect example of how online culture can make us stupid. The tweetification of discussion favors cheap soundbites over real understanding, simplification over complexity, ease of sharing over the difficult work necessary for comprehension, and a dismissive nihilism over the effort required to present meaningful alternatives. It’s the copy-and-paste approach to sounding smart that’s an epidemic online, and it takes a very sound and necessary warning and turns it into a tool for creating more ignorance. Congratulations again, online culture: we can make any smart thing stupid, if we really put our minds to it.

In the meantime, the next time you decline to smoke cigarettes because they cause cancer, you can thank correlation for that knowledge. Or alternatively, you can play the part of a Big Tobacco lawyer, be simultaneously stupid and condescending, and say “correlation is not causation!” literally every time someone mentions correlation. Your call.


  1. I’ve wondered for a while now whether this early-21st-century obsession with “data” will, in 100 years, be looked on as one of those generational oddities, like phrenology or spiritualism.

  2. I enjoy your writing on statistics, maybe because you and I are trying to master some of the same concepts. Where I disagree a bit is with the tone behind comments like “…and a mistake that no one with an ounce of knowledge about research methods would make…”
    My experience is that rigorous research methodology and valid statistical analysis are extremely difficult topics to master. If I had to put numbers to it, I would guess that fewer than 1 in 100 average adults have a good grasp of the core concepts and fewer than 1 in 4 people who do research- or policy-related work are capable of independently authoring an analysis that could withstand scrutiny by first-rate methodologists and statisticians. The underlying concepts are incredibly slippery and the typical human mind (with its powerful bias toward pattern-seeking and social intuition) is simply not wired for this level of epistemological rigor.
    You have moved beyond “correlation is not causation”, so maybe it seems now like *everybody* knows this, but my sense is that the number of people who really grok this is tiny compared to the number of people who do not.

  3. Freddie,

    You’re on a roll, and your calmly crafted replies to Argle and Rob show you may be capable of continuing it.

    “Correlation is not causation” has always struck me as one of the prime products of Pedagogicism, that Doctrine which holds that it is necessary to equip teachers with lies to utter at those times when actually answering students’ questions would be sorta like work.

    “The Chinese language is made up of monosyllabic words” and “Chinese verbs have no tenses” are both stupid lies, but they have enough apparent plausibility that teachers have gotten away with them for a couple of hundred years now. They are both much easier to parrot than possibly correct answers, which could be “The Chinese formation of words is very different from that in languages descended from Proto-Indo-European, though in some cases they agglutinate in a way that resembles that of the Romance languages…” and “The tenses of Chinese verbs are very different from those of English, or even of the parsed Romance languages, but I don’t have time for all of it right now…” respectively.

    When I hear “Correlation is not causation,” I make the necessary substitutions: causation is the most frequent source of correlation; shared causality comes next; and even when causality is not the cause, there’s very likely going to be something interesting there, so it’s always worth a look…

    Pedagogicism apart, there are a million other problems in education. I don’t have answers to many of them, but I do have sound and important advice for anybody interested in the subject (as opposed to interested in arguing about the subject): Read the Coleman Reports. (Google the late and excellent “James S. Coleman.”)

    And if anybody can convince me of the genuineness of their interest, I’ll even take half an hour to tell them why I think Coleman is worth their time, so they can make up their own minds on the question.



  4. I’ve made the CDNIC complaint plenty, myself. Almost always when someone reacts to “X and Y are correlated” with “well, I better get more X, because I want more Y.” I’m all in favor of correlations opening a discussion about causality. My impression was that the complaint tends to be used to prevent a discussion from closing down, by way of premature conclusions.

    Put another way: I thought the complaint was aimed more often at poor research journalism than at research itself.

  5. Come on, you don’t think any of those graphs are funny? My favorite version of the joke is where unemployment rates in the Bush administration (or whatever) are ‘correlated’ with a certain mountain range. Get it?

    Have you tried going full Humean? ‘I know it doesn’t imply causation- because there’s no such thing! It’s only a useful myth!’

  6. In your previous post, you say “Correlation is a statistical relationship. Causation is a judgement call.” I just wanted to point out that there is a branch of statistical research called “causal inference”, which tries to formalize causation as a statistical relationship. One good resource is Judea Pearl’s overview paper:

    It’s a research area, but it’s promising.

  7. Correlation is a mathematical fact. Causality is a metaphysical concept. Mixing them together is a category mistake.

  8. Well, and the funny thing about this is that the people who use “correlation is not causation” will turn around and use correlation uncritically when they think it supports their cause. And frequently it does! Correlations between vaccine use and the decline of childhood diseases, between increasing CO2 and temperature anomalies, and between lead pollution and violent crime rates may all be indicative of important causal relationships, or of relationships that at least merit further study into causal mechanisms. But if they were to allow their opponents the same dismissive retort of “correlation is not causation!” the debate would get stuck on stupid. If people can’t specifically explain why correlations they like are more “real” than correlations they don’t like, maybe they should hold off on weighing in on correlations.
