So I’m going to re-up this piece I wrote about why “correlation does not imply causation” leads to a research nihilism that is more of a problem than mistaking correlation for causation. I’m doing it because of the viral proliferation of this website from Tyler Vigen, a Harvard law student. The website does what every statistics professor since Karl Pearson has said to never, ever do: run random correlations in great numbers and look for those that are large. Like I wrote recently in my post about p-value weirdness, this is a disastrous practice, and a mistake that no one with an ounce of knowledge about research methods would make. And yet I have seen this website shared dozens of times now by people who treat it as some sort of silver bullet against correlation, when it neither is that nor purports to be that. It’s robbed of its context and purpose just about every time it gets shared.
To take one example, here Dylan Matthews of Vox, who appears incapable of writing a word that is not steeped in undeserved superiority and condescension, presents the website and the topic of correlational data without bothering to embed his charts with the information and context that are required to understand them intelligently. The rampant representation of this website as some sort of smoking gun against one of our most basic and important types of statistics is derp, just derp presented as profundity, that’s all.
Of course, if you go looking for correlations en masse, you’ll find spurious relationships. That’s Stats 101. Thou Shalt Not Data Snoop is as basic as it gets. That you can find high correlations between phenomena that do not have a causal relationship is something we’ve known for as long as we’ve been correlating things. Luckily, we enjoy the power of human reason, and no one, no matter how stupid they may be, thinks that the divorce rate is affected by the consumption of margarine. The very fact that the inherent absurdity of these connections is used as an argument against correlational data should clue us in: there’s no danger, whatsoever, from people mistaking these relationships for causal. I’m afraid if you think that the marriage rate in Mississippi is dictated by per capita consumption of milk, no amount of research methodology can save you. Are there dangers in using correlational data? Absolutely. That’s why you start from theory, set a low alpha, use care in data collection, and embed your work in caveats and provisos. And if someone else disputes your findings, that’s their right. But they face a burden of proof too, and that is never, ever met by simply saying “correlation does not imply causation! Look ma, I’m a intellectual!”
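If you want to see the data snooping problem for yourself, here is a rough sketch in Python. It is not how Vigen's site works; it is a toy simulation under my own assumptions: 200 made-up series of 10 points each, all pure independent noise, and an arbitrary cutoff of |r| ≥ 0.8 for what counts as a "large" correlation. The function names (`pearson`, `snoop_for_correlations`) are hypothetical and exist only for this illustration.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)


def snoop_for_correlations(n_series=200, n_points=10, threshold=0.8, seed=0):
    """Generate unrelated random series, then go hunting for big correlations.

    All parameter values are illustrative assumptions, not taken from any
    real dataset. Returns (number of pairs tested, list of (i, j, r) hits).
    """
    rng = random.Random(seed)
    # Every series is independent Gaussian noise, so no pair is causally
    # (or in any other way) related to any other pair.
    series = [[rng.gauss(0, 1) for _ in range(n_points)]
              for _ in range(n_series)]

    n_pairs = 0
    hits = []
    for i, j in combinations(range(n_series), 2):
        n_pairs += 1
        r = pearson(series[i], series[j])
        if abs(r) >= threshold:
            hits.append((i, j, r))
    return n_pairs, hits


if __name__ == "__main__":
    n_pairs, hits = snoop_for_correlations()
    print(f"pairs tested: {n_pairs}")
    print(f"pairs with |r| >= 0.8: {len(hits)}")
```

With 200 series there are 19,900 pairs to test, and with only 10 points per series a handful of them will clear the bar by chance alone. The "findings" are an artifact of how many times you looked, which is exactly why you start from theory instead of from the hunt.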
You know who gets that? Tyler Vigen. Because if you actually click over to the About section of the website, you’ll read this:
I created this website as a fun way to look at correlations and to think about data. Empirical research is interesting, and I love to wonder about how variables work together. The charts on this site aren’t meant to imply causation nor are they meant to create a distrust for research or even correlative data. Rather, I hope this project fosters interest in statistics and numerical research.
I’m not a math or statistics major (and there are better ways to calculate correlation than I do here), but I do have a love for science and discovery and that’s all anyone should need. Presently I am working on my J.D. at Harvard Law School.
I love that, particularly the bit about the love of science and discovery being enough to engage on these issues. You don’t have to be an expert to talk intelligently about research methods; if you did, I’d have to keep my mouth shut. But you do have to be willing to work, and apparently, clicking over one page to the About section is more effort than the vast throngs that are sharing this site and going “hardy har har!” are capable of mustering.
The meme-ification of “correlation does not imply causation” is a perfect example of how online culture can make us stupid. The tweetification of discussion favors cheap soundbites over real understanding, simplification over complexity, ease of sharing over the difficult work necessary for comprehension, and a dismissive nihilism over the effort required to present meaningful alternatives. It’s the copy-and-paste approach to sounding smart that’s an epidemic online, and it takes a very sound and necessary warning and turns it into a tool for creating more ignorance. Congratulations again, online culture: we can make any smart thing stupid, if we really put our minds to it.
In the meantime, the next time you decline to smoke cigarettes because they cause cancer, you can thank correlation for that knowledge. Or alternatively, you can play the part of a Big Tobacco lawyer, be simultaneously stupid and condescending, and say “correlation is not causation!” literally every time someone mentions correlation. Your call.