by Judith Curry
Correlation doesn’t imply causation.
An interesting article in Slate: Correlation does not imply causation: How the internet fell in love with a stats-class cliché. Excerpts:
The correlation phrase has become so common and so irritating that a minor backlash has now ensued against the rhetoric if not the concept. No, correlation does not imply causation, but it sure as hell provides a hint. Still, if it can frame the question, then our observation sets us down the path toward thinking through the workings of reality, so we might learn new ways to tweak them. It helps us go from seeing things to changing them.
So how did a stats-class admonition become so misused and so widespread? What made this simple caveat—a warning not to fall too hard for correlation coefficients—into a coup de grace for second-rate debates? A survey shows the slogan to be a computer-age phenomenon, one that spread through print culture starting in the 1960s and then redoubled its frequency with the advent of the Internet. If now we’re quick to say that correlation is not causation, it’s because the correlations are all around us.
With the arrival of Pearson’s coefficients and the transformation of statistics, that “fallacy” became more central to debate. Should scientists even bother with a slippery concept like causation, which can’t truly be measured in the lab and doesn’t have a proper definition? Maybe not. Pearson’s work suggested that causation might be irrelevant to science and that it could in certain ways be indistinguishable from perfect correlation. “The higher the correlation, the more certainly we can predict from one member what the value of the associated member will be,” he wrote in one of his major works, The Grammar of Science. “This is the transition of correlation into causation.”
To say that correlation does not imply causation makes an important point about the limits of statistics, but there are other limits, too, and ones that scientists ignore with far more frequency. In The Cult of Statistical Significance, the economists Deirdre McCloskey and Stephen Ziliak cite one of these and make an impassioned, book-length argument against the arbitrary cutoff that decides which experimental findings count and which ones don’t. By convention, we call an effect “significant” if the chances of its deriving from a twist of fate—as opposed to some more genuine relationship—are less than 5 percent. But as McCloskey and Ziliak (and many others) point out, there’s nothing special about that number and no reason to invest it with our faith.
I wonder if it has to do with what the foible represents. When we mistake correlation for causation, we find a cause that isn’t there. Once upon a time, perhaps, these sorts of errors—false positives—were not so bad at all. If you ate a berry and got sick, you’d have been wise to imbue your data with some meaning. (Better safe than sorry.) Same goes for a red-hot coal: one touch and you’ve got all the correlations that you need. When the world is strange and scary, when nature bullies and confounds us, it’s far worse to miss a link than it is to make one up. A false negative yields the greatest risk.
Now conditions are reversed. We’re the bullies over nature and less afraid of poison berries. When we make a claim about causation, it’s not so we can hide out from the world but so we can intervene in it. A false positive means approving drugs that have no effect, or imposing regulations that make no difference, or wasting money in schemes to limit unemployment. As science grows more powerful and government more technocratic, the stakes of correlation—of counterfeit relationships and bogus findings—grow ever larger. The false positive is now more onerous than it’s ever been. And all we have to fight it is a catchphrase.
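The Pearson coefficient and the 5 percent significance convention discussed in the excerpts are easy to demonstrate concretely. The sketch below, in Python, uses two invented and causally unrelated series that both happen to trend upward over time; the numbers, names, and the permutation test are illustrative assumptions, not anything from the Slate article. The shared trend alone yields a strong correlation that clears the conventional 5 percent cutoff.

```python
import random

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def permutation_p_value(x, y, trials=2000, seed=0):
    """Two-sided p-value for r, estimated by shuffling y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(y)
        if abs(pearson_r(x, y)) >= observed:
            hits += 1
    return hits / trials

# Two hypothetical, causally unrelated series that both trend upward.
rng = random.Random(42)
series_a = [10 + 0.5 * t + rng.gauss(0, 1) for t in range(20)]
series_b = [3 + 0.2 * t + rng.gauss(0, 1) for t in range(20)]

r = pearson_r(series_a, series_b)
p = permutation_p_value(series_a, series_b)
# The common trend produces a large r and a p-value below the
# conventional 0.05 threshold, even though neither series drives
# the other: a "significant" correlation without causation.
```

By the 5 percent convention the relationship counts as significant, yet the only real link between the two series is the passage of time; this is exactly the false positive the article warns about.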
Over the past few years I have been collecting a lot of material on causation; eventually I will get to a series of blog posts on this topic, targeted at the climate change attribution problem.
I found the Slate article to be interesting and timely, for several reasons. Most immediately, there is a forthcoming guest post at CE with some very interesting correlations. There are also climate correlation studies that get prematurely dismissed for lack of a convincing physical mechanism.
I was particularly struck by the last two paragraphs excerpted from the Slate article, regarding false positive (Type I) errors. A relative lack of concern for false positive errors characterizes the precautionary principle. The Slate article intriguingly suggests that more sophisticated societies have more to lose from false positive errors. Something to ponder regarding the UNFCCC mitigation policies.