by Judith Curry
although very few researchers will go as far as to make up their own data, many will “torture the data until they confess”, and forget to mention that the results were obtained by torture….
The social psychology community is reeling from this news, discussed at the blog Ignorance and Uncertainty:
Tilburg University sacked high-profile social psychologist Diederik Stapel, after he was outed as having faked data in his research. Stapel was director of the Tilburg Institute for Behavioral Economics Research, a successful researcher and fundraiser, and as a colleague expressed it, “the poster boy of Dutch social psychology.” He had more than 100 papers published, some in the flagship journals not just of psychology but of science generally (e.g., Science), and won prestigious awards for his research on social cognition and stereotyping.
Tilburg University Rector Philip Eijlander said that Stapel had admitted to using faked data, apparently after Eijlander confronted him with allegations by graduate student research assistants that his research conduct was fraudulent. The story goes that the assistants had identified evidence of data entry by Stapel via copy-and-paste.
Robert Smithson raises some interesting issues and questions. Regarding means and opportunity:
Let me speak to means and opportunity first. Attempts to more strictly regulate the conduct of scientific research are very unlikely to prevent data fakery, for the simple reason that it’s extremely easy to do in a manner that is extraordinarily difficult to detect. Many of us “fake data” on a regular basis when we run simulations. Indeed, simulating from the posterior distribution is part and parcel of Bayesian statistical inference. It would be (and probably has been) child’s play to add fake cases to one’s data by simulating from the posterior and then jittering them randomly to ensure that the false cases look like real data. Or, if you want to fake data from scratch, there is plenty of freely available code for randomly generating multivariate data with user-chosen probability distributions, means, standard deviations, and correlational structure. So, the means and opportunities are on hand for virtually all of us. They are the very same means that underpin a great deal of (honest) research. It is impossible to prevent data fraud by these means through conventional regulatory mechanisms.
Cognitive psychologist E,J, Wagenmakers (as quoted in Andrew Gelman’s thoughtful recent post) is among the few thus far who have addressed possible motivating factors inherent in the present-day research climate. He points out that social psychology has become very competitive, and
“high-impact publications are only possible for results that are really surprising. Unfortunately, most surprising hypotheses are wrong. That is, unless you test them against data you’ve created yourself. There is a slippery slope here though; although very few researchers will go as far as to make up their own data, many will “torture the data until they confess”, and forget to mention that the results were obtained by torture….”
I would add to E.J.’s observations the following points.
First, social psychology journals exhibit a strong bias towards publishing only studies that have achieved a statistically significant result. This bias is widely believed in by researchers and their students. The obvious temptation arising from this is to ease an inconclusive finding into being conclusive by adding more “favorable” cases or making some of the unfavorable ones more favorable.
Second, the addiction in psychology to hypothesis-testing over parameter estimation amounts to an insistence that every study yield a conclusion or decision: Did the null hypothesis get rejected? The obvious remedy for this is to develop a publication climate that does not insist that each and every study be “conclusive,” but instead emphasizes the importance of a cumulative science built on multiple independent studies, careful parameter estimates and multiple tests of candidate theories. This adds an ethical and motivational rationale to calls for a shift to Bayesian methods in psychology.
Third, journal editors and reviewers routinely insist on more than one study to an article. On the surface, this looks like what I’ve just asked for, a healthy insistence on independent replication. It isn’t, for two reasons. First, even if the multiple studies are replications, they aren’t independent because they come from the same authors and/or lab. Genuinely independent replicated studies would be published in separate papers written by non-overlapping sets of authors from separate labs. However, genuinely independent replication earns no kudos and therefore is uncommon.
The second reason is that journal editors don’t merely insist on study replications, they also favor studies that come up with “consistent” rather than “inconsistent” findings (i.e., privileging “successful” replications over “failed” replications). By insisting on multiple studies that reproduce the original findings, journal editors are tempting researchers into corner-cutting or outright fraud in the name of ensuring that that first study’s findings actually get replicated. E.J.’s observation that surprising hypotheses are unlikely to be supported by data goes double (squared, actually) when it comes to replication—Support for a surprising hypothesis may occur once in a while, but it is unlikely to occur twice in a row. Again, remedies are obvious: Develop a publication climate which encourages or even insists on independent replication, that treats well-conducted “failed” replications identically to well-conducted “successful” ones, and which does not privilege “replications” from the same authors or lab of the original study.
Most researchers face the pressures and motivations described above, but few cheat. So personality factors may also exert an influence, along with circumstances specific to those of us who give in to the temptations of cheating. Nevertheless if we want to prevent more Stapels, we’ll get farther by changing the research culture and its motivational effects than we will by exhorting researchers to be good or lecturing them about ethical principles of which they’re already well aware. And we’ll get much farther than we would in a futile attempt to place the collection and entry of every single datum under surveillance by some Stasi-for-scientists.
JC comments: I find this case and Smithson’s comments interesting and of relevance to climate science for several reasons. The research culture and motivational factors in the field of social psychology have arguably contributed to rewarding behaviors that are not in the best interests of scientific progress, in the same way that I have argued that the IPCC and the culture of funding, journal publication, and recognition by professional societies have not always acted in the best interests of scientific progress in climate field.
I was particularly struck by the “data torturing” concept. Consider a chemistry experiment conducted in a controlled laboratory environment, whereby the raw data is used for the analysis, with fairly clear procedures for determining the uncertainty of the measurement. Testing hypotheses using climate data is much more challenging from the perspective of the actual data. In climate science, uncertainty associated with observations can arise from systematic and random instrumental errors, inherent randomness, and errors in analysis of the space-time variations associated with inadequate sampling. Applications of climate data in hypothesis testing may require either the elimination of a trend or high frequency “noise.” Hence for any substantive application, climate data needs to be “tortured” some way, in the sense of applying some sort of manipulations to the data and making some assumptions. The problem occurs when the data is “tortured” to produce a desired “confession.” Seemingly objective manipulations of the data can inadvertently produce “confessions” beyond what is objectively obtained in the original data set.
Documented manipulations of the data can be reproduced if the data and metadata are available, and sufficient information (preferably code) is provided so that independent groups can evaluate the objectivity and technical implementation of the method used in the analysis. It is because of complexity of the climate system and the inherent inadequacy of any measurement system that complex data manipulation methods are used. It is essential that we better understand the limitations of the methods and how to assess the uncertainty that they introduce into the analysis.