by Judith Curry
There is no cost to getting things wrong. The cost is not getting them published. – Brian Nosek, as quoted by the Economist.
The Economist has an important article entitled Unreliable research: trouble at the lab, with subtitle “Scientists like to think of science as self-correcting. To an alarming degree, it is not.”
There is also an accompanying editorial: How Science Goes Wrong, with subtitle “Problems with scientific research. Science has changed the world, now it needs to change itself.”
Excerpt from the editorial:
A SIMPLE idea underpins science: “trust, but verify”. Results should always be subject to challenge from experiment. That simple but powerful idea has generated a vast body of knowledge.
Modern scientists are doing too much trusting and not enough verifying—to the detriment of the whole of science, and of humanity.
Even when flawed research does not put people’s lives at risk—and much of it is too far from the market to do so—it squanders money and the efforts of some of the world’s best minds. The opportunity costs of stymied progress are hard to quantify, but they are likely to be vast. And they could be rising.
Excerpts from the main (longer) article:
In 2005 John Ioannidis, an epidemiologist from Stanford University, caused a stir with a paper showing why, as a matter of statistical logic, the idea that only one such paper in 20 gives a false-positive result was hugely optimistic. Instead, he argued, “most published research findings are probably false.”
Dr Ioannidis draws his stark conclusion on the basis that the customary approach to statistical significance ignores three things: the “statistical power” of the study (a measure of its ability to avoid type II errors, false negatives in which a real signal is missed in the noise); the unlikeliness of the hypothesis being tested; and the pervasive bias favouring the publication of claims to have found something new.
Unlikeliness is a measure of how surprising the result might be. By and large, scientists want surprising results, and so they test hypotheses that are normally pretty unlikely and often very unlikely.
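To make the arithmetic behind this argument concrete, here is a minimal sketch of how statistical power and the prior plausibility of a hypothesis combine to determine what fraction of "significant" findings are actually true. The specific numbers (priors of 50% and 10%, power of 0.8 and 0.4, a 5% significance threshold) are illustrative assumptions, not figures taken from the excerpts above, and the sketch ignores publication bias entirely, which would only make matters worse.

```python
def positive_predictive_value(prior, power, alpha=0.05):
    """Fraction of statistically significant results that reflect a real effect.

    prior: assumed share of tested hypotheses that are actually true
    power: probability of detecting a real effect (1 - type II error rate)
    alpha: probability of a false positive when there is no real effect
    """
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical scenarios: (prior, power)
    scenarios = [(0.5, 0.8), (0.1, 0.8), (0.1, 0.4)]
    for prior, power in scenarios:
        ppv = positive_predictive_value(prior, power)
        print(f"prior={prior:.2f} power={power:.2f} -> share of true findings={ppv:.2f}")
```

Under these assumed inputs, an unlikely hypothesis tested with low power yields positive results that are wrong more often than right, even though every individual study passes the conventional 5% threshold.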
Victoria Stodden, a statistician at Stanford, speaks for many in her trade when she says that scientists’ grasp of statistics has not kept pace with the development of complex mathematical techniques for crunching data. Some scientists use inappropriate techniques because those are the ones they feel comfortable with; others latch on to new ones without understanding their subtleties. Some just rely on the methods built into their software, even if they don’t understand them.
Not even wrong
This fits with another line of evidence suggesting that a lot of scientific research is poorly thought through, or executed, or both. The peer-reviewers at a journal like Nature provide editors with opinions on a paper’s novelty and significance as well as its shortcomings. But some new journals—PLoS One, published by the not-for-profit Public Library of Science, was the pioneer—make a point of being less picky. These “minimal-threshold” journals, which are online-only, seek to publish as much science as possible, rather than to pick out the best. They thus ask their peer reviewers only if a paper is methodologically sound. Remarkably, almost half the submissions to PLoS One are rejected for failing to clear that seemingly low bar.
JC comment: It seems to me that the editors and reviewers at Science, Nature, and PNAS rarely address methodology issues, focusing instead on novelty and relevance.
Models which can be “tuned” in many different ways give researchers more scope to perceive a pattern where none exists.
JC comment: This succinctly summarizes my concerns with the IPCC’s highly confident attribution argument.
The number of retractions has grown tenfold over the past decade. But they still make up no more than 0.2% of the 1.4m papers published annually in scholarly journals. Papers with fundamental flaws often live on. Some may develop a bad reputation among those in the know, who will warn colleagues. But to outsiders they will appear part of the scientific canon.
Blame the ref
The idea that there are a lot of uncorrected flaws in published studies may seem hard to square with the fact that almost all of them will have been through peer-review. In practice it is poor at detecting many types of error.
Fraud is very likely second to incompetence in generating erroneous results, though it is hard to tell for certain. [Surveys found] that 2% of respondents admitted falsifying or fabricating data, but 28% of respondents claimed to know of colleagues who engaged in questionable research practices.
Peer review’s multiple failings would matter less if science’s self-correction mechanism—replication—was in working order.
Harder to clone than you would wish
[R]eplication is hard and thankless. Journals, thirsty for novelty, show little interest in it; though minimum-threshold journals could change this, they have yet to do so in a big way. Most academic researchers would rather spend time on work that is more likely to enhance their careers. This is especially true of junior researchers, who are aware that overzealous replication can be seen as an implicit challenge to authority. Often, only people with an axe to grind pursue replications with vigour—a state of affairs which makes people wary of having their work replicated.
JC comment: The most succinct description of the Hockeystick saga that I’ve seen.
There are ways, too, to make replication difficult. Reproducing research done by others often requires access to their original methods and data. Journals’ growing insistence that at least some raw data be made available seems to count for little: a recent review by Dr Ioannidis showed that only 143 of 351 randomly selected papers published in the world’s 50 leading journals and covered by some data-sharing policy actually complied.
Software can also be a problem for would-be replicators. Some code used to analyse data or run models may be the result of years of work and thus precious intellectual property that gives its possessors an edge in future research. Although most scientists agree in principle that data should be openly available, there is genuine disagreement on software.
Even when the part of the paper devoted to describing the methods used is up to snuff (and often it is not), performing an experiment always entails what sociologists call “tacit knowledge”—craft skills and extemporisations that their possessors take for granted but can pass on only through example. Thus if a replication fails, it could be because the repeaters didn’t quite get these je-ne-sais-quoi bits of the protocol right.
Taken to extremes, this leads to what Dr Collins calls “the experimenter’s regress”—you can say an experiment has truly been replicated only if the replication gets the same result as the original, a conclusion which makes replication pointless. Avoiding this, and agreeing that a replication counts as “the same procedure” even when it gets a different result, requires recognising the role of tacit knowledge and judgment in experiments. Scientists are not comfortable discussing such things at the best of times; in adversarial contexts it gets yet more vexed.
JC comment: I would certainly be interested in Steve McIntyre’s comments on this subsection; it seems to describe his experiences pretty aptly.
Making the paymasters care
Conscious that it and other journals “fail to exert sufficient scrutiny over the results that they publish” in the life sciences, Nature and its sister publications introduced an 18-point checklist for authors this May. The aim is to ensure that all technical and statistical information that is crucial to an experiment’s reproducibility or that might introduce bias is published. The methods sections of papers are being expanded online to cope with the extra detail; and whereas previously only some classes of data had to be deposited online, now all must be.
People who pay for science, though, do not seem seized by a desire for improvement in this area.
In testimony before Congress on March 5th Bruce Alberts, then the editor of Science, outlined what needs to be done to bolster the credibility of the scientific enterprise. Journals must do more to enforce standards. Checklists such as the one introduced by Nature should be adopted widely, to help guard against the most common research errors. Budding scientists must be taught technical skills, including statistics, and must be imbued with scepticism towards their own results and those of others. Researchers ought to be judged on the basis of the quality, not the quantity, of their work. Funding agencies should encourage replications and lower the barriers to reporting serious efforts which failed to reproduce a published result. Information about such failures ought to be attached to the original publications.
And scientists themselves, Dr Alberts insisted, “need to develop a value system where simply moving on from one’s mistakes without publicly acknowledging them severely damages, rather than protects, a scientific reputation.” This will not be easy. But if science is to stay on its tracks, and be worthy of the trust so widely invested in it, it may be necessary.
Concluding remarks from the editorial:
Science still commands enormous—if sometimes bemused—respect. But its privileged status is founded on the capacity to be right most of the time and to correct its mistakes when it gets things wrong. And it is not as if the universe is short of genuine mysteries to keep generations of scientists hard at work. The false trails laid down by shoddy research are an unforgivable barrier to understanding.
There is a huge premium placed on papers published in Science, Nature, PNAS in academic evaluation (promotion, tenure, salary). Publication in these journals focuses on novelty and relevance. In the geosciences, there seems to be a disproportionately large number of papers from the planetary sciences (mostly discovery-based exploration) and also paleoclimate and analysis of climate model projections, particularly the impacts of projected future climate (relevance to the public debate on climate change). There are many paleoclimate papers published in these journals that include dubious statistical methods and great leaps of logic. The papers analyzing climate model projections tell us nothing about how nature works, at best only about how the models work. If the paleoclimate research involves the geochemical analysis of proxy data, well, that is hard work. However, if the papers are merely statistical analyses of proxy data sets or analyses of the outputs of climate model production runs, well, these papers can be knocked off without much effort.
My point is that ambitious young climate scientists are inadvertently being steered in the direction of analyzing climate model simulations, and particularly projections of future climate change impacts – lots of funding in this area, in addition to high likelihood of publication in a high impact journal, and a guarantee of media attention. And the true meaning of this research in terms of our actual understanding of nature rests on the adequacy and fitness for purpose of these climate models.
And why do these scientists think climate models are fit for these purposes? Why, the IPCC has told them so, with very high confidence. The manufactured consensus of the IPCC has arguably set our true understanding of the climate system back at least a decade, in my judgment.
The real hard work of fundamental climate dynamics and development and improvement of paleo proxies is being relatively shunned by climate scientists since the rewards (and certainly the funding) are much lower. The amount of time and funding that has been wasted by using climate models for purposes for which they are unfit may eventually be judged to be colossal.
And finally, getting back to the ‘verify’ and replication issue, the blogosphere is already playing a hugely important role here, with McIntyre as the original auditor, longstanding contributions from Lucia, and a host of competent new blogospheric auditors that are emerging.
And journals such as PLoS are steering us in the right direction. I hope that the emergence of such journals will diminish the impact of Nature, Science, PNAS, or at least will torque those journals into a direction that is more fundamentally useful for the true advancement of science.