by Judith Curry
False positives and exaggerated results in peer-reviewed scientific studies have reached epidemic proportions in recent years. – John Ioannidis
Triggered by a comment from Bill regarding the work of John Ioannidis, I dug through my file of draft posts and found this article by Ioannidis, published in Scientific American about a year ago, entitled “An Epidemic of False Claims.” Subtitle: “Competition and conflicts of interest distort too many medical findings.”
False positives and exaggerated results in peer-reviewed scientific studies have reached epidemic proportions in recent years. The problem is rampant in economics, the social sciences and even the natural sciences, but it is particularly egregious in biomedicine. Many studies that claim some drug or treatment is beneficial have turned out not to be true. Even when effects are genuine, their true magnitude is often smaller than originally claimed.
The problem begins with the public’s rising expectations of science. Being human, scientists are tempted to show that they know more than they do. Research is fragmented, competition is fierce and emphasis is often given to single studies instead of the big picture.
Much research is conducted for reasons other than the pursuit of truth. Conflicts of interest abound, and they influence outcomes. Even for academics, success often hinges on publishing positive findings. The oligopoly of high-impact journals also has a distorting effect on funding, academic careers and market shares. Industry tailors research agendas to suit its needs, which also shapes academic priorities, journal revenue and even public funding.
The crisis should not shake confidence in the scientific method. But scientists need to improve the way they do their research and how they disseminate evidence.
The best way to ensure that test results are verified would be for scientists to register their detailed experimental protocols before starting their research and disclose full results and data when the research is done. At the moment, results are often selectively reported, emphasizing the most exciting among them, and outsiders frequently do not have access to what they need to replicate studies. Journals and funding agencies should strongly encourage full public availability of all data and analytical methods for each published paper. It would help, too, if scientists stated up front the limitations of their data or inherent flaws in their study designs. Likewise, scientists and sponsors should be thorough in disclosing all potential conflicts of interest.
Eventually findings that bear on treatment decisions and policies should come with a disclosure of any uncertainty that surrounds them. It is fully acceptable for patients and physicians to follow a treatment based on information that has, say, only a 1 percent chance of being correct. But we must be realistic about the odds.
Some excerpts from the comments at the Sci Am website:
The second real problem here is the almost complete breakdown in the peer review system. Papers with unfounded claims and unsubstantiated positive results are being passed routinely for publication. The bar has been lowered continuously over the last 25 years and the criteria for publication have been diluted.
Ultimately the peer review system as it stands is on its last legs anyway. The future of published science has to be on the web, in a fully open and democratic forum. Critical review has to be opened up to the public and all other scientists, with full data disclosure, instead of the present system where only the well off can afford to buy scientific publications. Publishing was supposed to be the making available of Scientific work to the wider public! Instead it has become a closed, narrow dissemination of Science to a tiny group of people.
Quality work will stand up to criticism. Shoddy work will quickly flounder. The elitist attitude of the establishment toward such democratising of Science must be jettisoned.
It would also help if reputable non-specialist science publications took a more skeptical attitude towards research. There have been numerous instances in recent years in which the substance of the research did not match the headline statement.
This is barely forgiveable in mass media. In “scientific” publications it should simply not occur. “Some slight evidence that X influences Y” may not grab the attention of “X causes Y” but standards demand truth in headlining!
This is why a Med School professor told my freshman class 50 some years ago: Remember, for every pearl on the ocean floor, there is one ton of whale manure. Way too much of the latter makes it into print.
This issue is also discussed in an earlier article by Ioannidis (h/t Bill): “Why most published research findings are false.” While the focus is on medical science, there are some deep insights for any ‘hot topic’ field of science:
[C]orollaries about the probability that a research finding is indeed true.
Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true. Small sample size means smaller power and, for all functions above, the positive predictive value (PPV) for a true research finding decreases as power decreases towards 1 − β = 0.05 (where β is the type II error rate). Thus, other factors being equal, research findings are more likely true in scientific fields that undertake large studies, such as randomized controlled trials in cardiology than in scientific fields with small studies, such as most research of molecular predictors.
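As an illustration (this is my sketch, not part of the quoted text), the no-bias formula from the Ioannidis paper, PPV = (1 − β)R / (R − βR + α), can be evaluated for a range of power values. The significance level α = 0.05 and the pre-study odds R = 0.25 are assumed values chosen only for the example. Note that when power falls all the way to 1 − β = α, the PPV reduces to R / (R + 1), which is just the pre-study probability: the study has added no information.

```python
def ppv(power, prestudy_odds, alpha=0.05):
    """Positive predictive value of a claimed finding, assuming no bias.

    power          -- 1 - beta, the chance of detecting a true relationship
    prestudy_odds  -- R, ratio of true to null relationships among those tested
    alpha          -- type I error rate (0.05 is an assumed default)
    """
    true_positives = power * prestudy_odds   # (1 - beta) * R
    false_positives = alpha                  # alpha * 1 (null relationships)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    R = 0.25  # assumed pre-study odds (1 true relationship per 4 null ones)
    for power in (0.95, 0.80, 0.50, 0.20, 0.05):
        print(f"power = {power:.2f}  ->  PPV = {ppv(power, R):.3f}")
```

With everything else held fixed, the printed PPV falls steadily as power falls, which is the content of Corollary 1.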
Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. Power is also related to the effect size. Thus research findings are more likely true in scientific fields with large effects, such as the impact of smoking on cancer or cardiovascular disease (relative risks 3–20), than in scientific fields where postulated effects are small, such as genetic risk factors for multigenetic diseases (relative risks 1.1–1.5). Modern epidemiology is increasingly obliged to target smaller effect sizes. Consequently, the proportion of true research findings is expected to decrease. In the same line of thinking, if the true effect sizes are very small in a scientific field, this field is likely to be plagued by almost ubiquitous false positive claims.
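To make the link between effect size and power concrete, here is a rough sketch (mine, not from the paper). It assumes a two-sided, two-sample z-test on a standardized mean difference, which is a simplification: the relative risks quoted above would need a different test. The sample size of 50 per group, α = 0.05, and pre-study odds R = 0.25 are all hypothetical values picked for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample_z(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is a standardized mean difference (an assumption of this
    sketch; it is not the relative risk used in the text).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)  # expected z under the effect
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def ppv(power, prestudy_odds, alpha=0.05):
    """No-bias PPV from the Ioannidis paper: (1 - beta)R / (R - beta*R + alpha)."""
    return power * prestudy_odds / (power * prestudy_odds + alpha)


if __name__ == "__main__":
    R = 0.25  # assumed pre-study odds
    for d in (0.1, 0.2, 0.5, 0.8):
        pw = power_two_sample_z(d, n_per_group=50)
        print(f"effect size {d:.1f}: power = {pw:.2f}, PPV = {ppv(pw, R):.2f}")
```

At a fixed sample size, smaller effects give lower power and therefore lower PPV, so Corollary 2 follows from the same mechanism as Corollary 1.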
Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. As shown above, the post-study probability that a finding is true (PPV) depends a lot on the pre-study odds (R). Thus, research findings are more likely true in confirmatory designs, such as large phase III randomized controlled trials, or meta-analyses thereof, than in hypothesis-generating experiments.
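A small sketch (my own illustration, not from the paper's text) of how strongly the pre-study odds R drive the result. R is the ratio of true relationships to null relationships among those tested. The paper notes that, absent bias, a finding is more likely true than false only when (1 − β)R > α. The candidate counts and power values below are hypothetical.

```python
def prestudy_odds(n_true, n_tested):
    """R: true relationships divided by null relationships among those tested."""
    if not 0 <= n_true < n_tested:
        raise ValueError("need 0 <= n_true < n_tested")
    return n_true / (n_tested - n_true)


def ppv(power, odds, alpha=0.05):
    """No-bias PPV: (1 - beta)R / (R - beta*R + alpha)."""
    return power * odds / (power * odds + alpha)


def min_odds_for_likely_true(power, alpha=0.05):
    """Smallest R for which PPV reaches 0.5, i.e. where (1 - beta) * R = alpha."""
    return alpha / power


if __name__ == "__main__":
    # Hypothetical confirmatory trial: even odds that the hypothesis is true.
    confirmatory = prestudy_odds(n_true=1, n_tested=2)
    # Hypothetical exploratory screen: 10 real signals among 10,010 candidates.
    exploratory = prestudy_odds(n_true=10, n_tested=10_010)
    print("confirmatory PPV:", round(ppv(0.80, confirmatory), 3))
    print("exploratory PPV: ", round(ppv(0.20, exploratory), 4))
    print("R needed for PPV >= 0.5 at power 0.20:", min_odds_for_likely_true(0.20))
```

The gap between the two printed PPVs comes almost entirely from R, which is why hypothesis-generating screens need replication before their hits are believed.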
Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. Flexibility increases the potential for transforming what would be “negative” results into “positive” results, i.e., bias, u. For several research designs, e.g., randomized controlled trials or meta-analyses, there have been efforts to standardize their conduct and reporting. Adherence to common standards is likely to increase the proportion of true findings. The same applies to outcomes. True findings may be more common when outcomes are unequivocal and universally agreed (e.g., death) rather than when multifarious outcomes are devised (e.g., scales for schizophrenia outcomes). Similarly, fields that use commonly agreed, stereotyped analytical methods (e.g., Kaplan-Meier plots and the log-rank test) may yield a larger proportion of true findings than fields where analytical methods are still under experimentation (e.g., artificial intelligence methods) and only “best” results are reported. Regardless, even in the most stringent research designs, bias seems to be a major problem. For example, there is strong evidence that selective outcome reporting, with manipulation of the outcomes and analyses reported, is a common problem even for randomized trials. Simply abolishing selective publication would not make this problem go away.
Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. Conflicts of interest and prejudice may increase bias, u. Conflicts of interest are very common in biomedical research, and typically they are inadequately and sparsely reported. Prejudice may not necessarily have financial roots. Scientists in a given field may be prejudiced purely because of their belief in a scientific theory or commitment to their own findings. Many otherwise seemingly independent, university-based studies may be conducted for no other reason than to give physicians and researchers qualifications for promotion or tenure. Such nonfinancial conflicts may also lead to distorted reported results and interpretations. Prestigious investigators may suppress via the peer review process the appearance and dissemination of findings that refute their findings, thus condemning their field to perpetuate false dogma. Empirical evidence on expert opinion shows that it is extremely unreliable.
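The bias term u in Corollaries 4 and 5 can be made concrete with a short sketch (my illustration, not part of the quoted text). In the Ioannidis paper, u is the proportion of analyses that would not have been “findings” but end up reported as such, and the PPV becomes ([1 − β]R + uβR) / (R + α − βR + u − uα + uβR). The values of power, R, and u below are assumptions chosen for the example.

```python
def ppv_with_bias(power, prestudy_odds, bias, alpha=0.05):
    """PPV when a fraction `bias` (u) of would-be negatives is reported as positive.

    Equivalent to ([1 - beta]R + u*beta*R) / (R + alpha - beta*R + u - u*alpha + u*beta*R).
    """
    beta = 1 - power
    # True relationships reported positive: detected, or missed but "rescued" by bias.
    true_reported = prestudy_odds * (power + bias * beta)
    # Null relationships reported positive: type I errors, or negatives flipped by bias.
    null_reported = alpha + bias * (1 - alpha)
    return true_reported / (true_reported + null_reported)


if __name__ == "__main__":
    power, R = 0.80, 1.0  # assumed: well-powered study, even pre-study odds
    for u in (0.0, 0.1, 0.3, 0.5, 0.8):
        print(f"u = {u:.1f}  ->  PPV = {ppv_with_bias(power, R, u):.3f}")
```

Even a well-powered study with even pre-study odds loses predictive value as u grows, because bias inflates the pool of false positives much faster than it adds true ones.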
Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. This seemingly paradoxical corollary follows because, as stated above, the PPV of isolated findings decreases when many teams of investigators are involved in the same field. This may explain why we occasionally see major excitement followed rapidly by severe disappointments in fields that draw wide attention. With many teams working on the same field and with massive experimental data being produced, timing is of the essence in beating competition. Thus, each team may prioritize on pursuing and disseminating its most impressive “positive” results. “Negative” results may become attractive for dissemination only if some other team has found a “positive” association on the same question. In that case, it may be attractive to refute a claim made in some prestigious journal. The term Proteus phenomenon has been coined to describe this phenomenon of rapidly alternating extreme research claims and extremely opposite refutations.
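The “hot field” effect can also be sketched numerically (again my illustration, not part of the quoted text). For n independent teams of equal power testing the same question, with no bias, the paper gives the PPV of an isolated positive claim as R(1 − βⁿ) / (R + 1 − [1 − α]ⁿ − Rβⁿ). The power and R used below are assumed values.

```python
def ppv_multiple_teams(power, prestudy_odds, n_teams, alpha=0.05):
    """PPV of a positive claim when n_teams independent studies probe one question.

    Assumes equal power across teams and no bias, as in the paper's formula
    R(1 - beta^n) / (R + 1 - (1 - alpha)^n - R*beta^n).
    """
    if n_teams < 1:
        raise ValueError("n_teams must be at least 1")
    beta = 1 - power
    # Chance that at least one team reports a positive for a true relationship.
    true_claim = prestudy_odds * (1 - beta ** n_teams)
    # Chance that at least one team reports a positive for a null relationship.
    false_claim = 1 - (1 - alpha) ** n_teams
    return true_claim / (true_claim + false_claim)


if __name__ == "__main__":
    power, R = 0.80, 1.0  # assumed values for illustration
    for n in (1, 2, 5, 10, 20):
        print(f"{n:2d} teams  ->  PPV = {ppv_multiple_teams(power, R, n):.3f}")
```

With one team the expression reduces to the single-study PPV. As more teams chase the same question, the chance that somebody gets a false positive grows faster than the chance of a true detection, so the PPV of any one isolated claim drops.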
These corollaries consider each factor separately, but these factors often influence each other. [...] Or prejudice may prevail in a hot scientific field, further undermining the predictive value of its research findings. Highly prejudiced stakeholders may even create a barrier that aborts efforts at obtaining and disseminating opposing results. Conversely, the fact that a field is hot or has strong invested interests may sometimes promote larger studies and improved standards of research, enhancing the predictive value of its research findings. Or massive discovery-oriented testing may result in such a large yield of significant relationships that investigators have enough to report and search further and thus refrain from data dredging and manipulation.
JC comment: I was struck by a number of things here.
First, the Proteus effect (new terminology for me). We see this a lot in climate science, and I personally experienced this ca. 2005-2007 in the hurricane wars. It seemed like every two weeks a new paper came out, refuting the previous paper, with a back and forth effect that Revkin termed the windshield wiper effect. Each investigator was personally motivated to tout their own latest findings, and each scientist and journalist provided little context relating the specific finding to previous findings. Journalists attempted to supply that context by interviewing people who were likely to be critical of the paper, giving rise to charges of ‘false balance.’
Assessments like the IPCC would seem to be the answer to sorting this all out. But depending on which group of scientists does the assessment, you can get a different result. I discussed this previously in my original hurricane post, in the context of the different assessments produced by the IPCC, the WMO group of experts, and the U.S. CCSP, which drew different conclusions regarding the past and future impact of global warming on tropical cyclones.
Further, groups such as the IPCC that are conducting assessments are subject to the same issues raised by the 6 corollaries.
So what to do about this situation? There are no simple solutions, but the recommendations made by Ioannidis in the Scientific American article are a good start.