by Judith Curry
A behavioral view of overconfidence.
A very interesting paper was just published in Significance, a magazine of the Royal Statistical Society and the American Statistical Association [link to abstract]:
I know I’m right! A behavioral view of overconfidence
Albert Mannes and Don Moore
Abstract. Statistics is all about uncertainty. Why, then, are so few of us uncertain enough? Being far too certain is a near-universal trait: the consequences have sometimes been catastrophic. Albert Mannes and Don Moore outline the ways humans are overconfident in their judgements – and why so many of us think that we can finish decking the patio on time.
Good judgement is surely good for society; and persistent overconfidence surely indicates poor judgement. Behavioural research tends to focus on three forms of overconfidence that occur with some frequency in modern life: overestimation, overplacement, and overprecision.
A hallmark of good judgement is that the assessments a person makes about probabilistic events are well calibrated to the long-run frequency of those events. For example, a meteorologist who claims that there is an 80% chance of rain today is well calibrated if on average it rains on 8 of the 10 days he or she makes this pronouncement.
Being well calibrated – that is, having judgement that on average is neither underconfident nor overconfident – means that the confidence expressed corresponds closely with the frequency with which the respondent is actually correct.
The typical research finding, however, is that people’s confidence exceeds their actual performance. For example, the actual number of correct answers for judgements expressed with 70% confidence is less than 7 out of 10. This form of overconfidence is naturally called overestimation.
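The calibration check described above can be sketched in a few lines of Python. This is only an illustration, not the authors' method: the function name `calibration_table` and the forecast data are hypothetical, chosen to mirror the 80% rain example and the 70% confidence example in the text.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Group judgements by stated probability.

    Returns {stated_probability: (observed_frequency, count)}.
    A well-calibrated judge has observed_frequency close to
    stated_probability in every group.
    """
    buckets = defaultdict(list)
    for stated, happened in zip(forecasts, outcomes):
        buckets[stated].append(1 if happened else 0)
    return {
        stated: (sum(hits) / len(hits), len(hits))
        for stated, hits in sorted(buckets.items())
    }

# Hypothetical data (an assumption for illustration only):
# ten judgements made at 80% confidence, ten at 70% confidence.
forecasts = [0.8] * 10 + [0.7] * 10
outcomes = [True] * 8 + [False] * 2 + [True] * 5 + [False] * 5

table = calibration_table(forecasts, outcomes)
for stated, (observed, n) in table.items():
    # A positive gap means stated confidence exceeds actual performance.
    gap = stated - observed
    print(f"stated {stated:.0%}  observed {observed:.0%}  n={n}  gap {gap:+.0%}")
```

In this made-up data set the 80% judgements are well calibrated, while the 70% judgements come true only half the time, which is the pattern the paper calls overestimation.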
Overprecision refers to “our excessive confidence in what we believe we know, and our apparent inability to acknowledge the full extent of our ignorance and the uncertainty of the world we live in”. To be overprecise is to underestimate the degree to which one’s judgement may err. Subjective beliefs about accuracy are too sharp relative to true accuracy. We believe we are close or spot on far more often than we actually are.
JC comment: ‘overprecision’ is the main criticism that I have had of the IPCC’s particular brand of overconfidence; overprecision also leads us to consider the white area of the Italian Flag.
The costs of being wrong are often asymmetric. Showing up early for a flight is less costly than showing up late and missing it entirely, so uncertainty about the travel time should lead people to depart earlier for the airport. We used this principle to demonstrate overprecision in a laboratory setting. We asked the participants in our studies to estimate the temperatures in the city where they live over several historical dates under three pay-off conditions. (Each person made guesses under all three conditions.) In the first condition, participants were paid a fixed amount (in the form of lottery tickets towards a prize) if their guesses were within a specified margin of error, either over or under the correct answer.
Their answers here, in which the costs of being wrong were symmetric, allowed them (and us) to gauge their knowledge in this domain. In the second condition, they were rewarded only for correctly guessing the temperature or overestimating it within a margin of error. And in the third condition, they were rewarded only for correctly guessing the temperature or underestimating it within a margin of error. For all estimates, participants received trial-by-trial feedback on their errors.
As expected, people biased their estimates in the appropriate direction given their payoffs – when rewarded for overestimation, they adjusted their estimates upward, and when rewarded for underestimation, adjusted them downward. But their adjustments in these conditions were systematically insufficient: given their actual knowledge, participants would have earned significantly more had they made larger adjustments, which would have reduced the frequency of over- or underestimation when those errors had no pay-off.
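The airport example that motivates this experiment can be made concrete with a small sketch. It rests on assumptions that are not in the paper: travel-time error is taken to be normally distributed, and both being early and being late are given a linear cost per minute (in reality a missed flight is closer to a fixed cost). Under those assumptions the cost-minimizing buffer is a quantile of the error distribution, so it grows in proportion to the uncertainty. The function name `optimal_buffer` and all the numbers are hypothetical.

```python
from statistics import NormalDist

def optimal_buffer(sigma_minutes, cost_early_per_min, cost_late_per_min):
    """Extra minutes to allow beyond the expected travel time.

    Assumes (for illustration) that travel-time error is
    Normal(0, sigma_minutes) and that costs are linear in minutes.
    With linear asymmetric costs, expected cost is minimized at the
    quantile cost_late / (cost_late + cost_early) of the error
    distribution.
    """
    critical = cost_late_per_min / (cost_late_per_min + cost_early_per_min)
    return NormalDist(mu=0.0, sigma=sigma_minutes).inv_cdf(critical)

# Hypothetical traveller: being late is 9 times as costly per minute
# as being early.
believed = optimal_buffer(sigma_minutes=5, cost_early_per_min=1, cost_late_per_min=9)
actual = optimal_buffer(sigma_minutes=15, cost_early_per_min=1, cost_late_per_min=9)

print(f"buffer if the uncertainty really were 5 min:  {believed:.1f} min")
print(f"buffer needed when it is actually 15 min:     {actual:.1f} min")
```

A traveller who believes the uncertainty is a third of its true size leaves a buffer a third as large as needed. That is the sense in which insufficient adjustment reveals overprecision: the direction of the adjustment is right, but its size reflects a belief that is too sharp.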
JC comment: hmmmm . . . does a framework of the precautionary principle bias estimates?
They acted (in their adjustments) as if their knowledge was more precise than it actually was; metaphorically speaking, they consistently missed their flights. Note that their insufficient adjustments could not simply be explained as being anchored on their best guess. Instead, the levels of overprecision for this task were positively correlated with participants’ overprecision using traditional 90% confidence intervals and with their expressed confidence in their knowledge of the domain.
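The "traditional 90% confidence intervals" measure mentioned here is simple to compute: ask for a range the respondent is 90% sure contains the truth, then count how often it does. The sketch below uses hypothetical temperature intervals and true values (they are not data from the study), and the function name `interval_hit_rate` is likewise an assumption.

```python
def interval_hit_rate(intervals, truths):
    """Fraction of (low, high) intervals that contain the true value."""
    hits = sum(low <= truth <= high for (low, high), truth in zip(intervals, truths))
    return hits / len(truths)

# Hypothetical 90% intervals for ten historical temperatures,
# and the hypothetical true values.
intervals = [(60, 70), (50, 55), (72, 80), (40, 48), (65, 75),
             (30, 35), (55, 62), (68, 74), (45, 50), (58, 66)]
truths = [66, 58, 75, 39, 71, 36, 60, 79, 47, 57]

rate = interval_hit_rate(intervals, truths)
# A well-calibrated respondent would score close to 0.90; a much lower
# hit rate means the intervals were too narrow, i.e. overprecise.
print(f"stated confidence 90%, actual hit rate {rate:.0%}")
```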
In social situations, there may be a market for overconfidence. Experimental participants in one study were more likely to purchase advice when sellers expressed more confidence in their judgement, holding accuracy constant. As a result, as the sellers of advice competed with each other, their judgements were expressed with increasing confidence over time without becoming more accurate – they were rewarded for being overconfident.
JC comment: there is an implicit expectation that the IPCC’s confidence level will increase with each assessment report. Hence the ‘leaked’ 95% confidence level for attribution from the forthcoming AR5 report, in spite of reduced accuracy of the climate models relative to the last 15+ years of observations and apparent lowering of the climate sensitivity bound to 1.5 °C.
Overconfidence has proven remarkably resistant to debiasing – perhaps because it does have value for people in certain situations. Nevertheless, the factors that contribute to overconfidence suggest some ways to become better-calibrated judges of our knowledge.
First, accurate and timely feedback improves calibration. Despite reputations to the contrary, meteorologists are quite well calibrated, no doubt in part because they receive regular information about the quality of their forecasts.
JC comment: unfortunately the same is not true of climate modelers. They only receive useful feedback on decadal time scales, but even with this feedback, they don’t seem to see the need for recalibration. The following seems to explain why.
In our research on overprecision, we manipulated the feedback we gave to participants to examine its impact on overconfidence. As expected, the set of participants receiving no feedback about their knowledge of local temperatures expressed the most confidence in their expertise and were the most overprecise in the main [...] still displayed the bias. Only when we provided a third set of participants with feedback that exaggerated their errors by 2.5 times were we able to eliminate overprecision.
Another technique for reducing overprecision is to call a person’s attention to outcomes or hypotheses they would otherwise neglect. That is, asking people to focus not only on their best guess of the likely outcome but also on neighbouring outcomes reduces the precision of people’s subjective beliefs. This can be done, for instance, by providing people with a liberal number of discrete outcomes (or ranges for continuous outcomes) and asking them to assign probabilities to each. (Had the designer of the Titanic been asked to quantify the probability of striking an iceberg, might he have added more watertight bulkheads?) A related technique calls for “unpacking” a distant future into a set of more proximal futures. For example, in a recent study experimental participants were asked to provide 90% confidence intervals for the closing price of three financial instruments either (a) 3 months into the future, or (b) 1, 2, and 3 months into the future. On average, the widths of the confidence intervals were 33% greater for the 3-month prices in the latter condition, in which the future was unpacked into intermediate periods.
JC comment: This is important; it speaks to broadening the range of possible scenarios and a scenario falsification approach. I’ve made the point numerous times that our current approaches are neglecting the possibility of abrupt climate change (whether caused naturally, anthropogenically, or as a combination). It further emphasizes the need to understand shorter time scales before having confidence in predictions on longer time scales.
At the very least it is important for decision-makers to be aware that people are prone to overconfidence, and that to assume one is not is to unwittingly fall prey to the bias. Most of us can improve the calibration of our judgements by simply considering the question “How could I be wrong?”
JC comment: Asking “How could I be wrong?” sums up perfectly the remedy for overconfidence.
Matt Briggs has a related post entitled Too Damn Sure: The Epidemiologist Fallacy. Punchline:
The epidemiologist fallacy occurs when an epidemiologist says or implies X causes Y, but when the epidemiologist never actually meets, measures, or monitors X, though everybody pretends he has.
Perspective of a weather forecaster
Weather forecasters are forced by virtue of their job to continually calibrate their forecasts. While my operational experience as a forecaster is not extensive, by virtue of owning a weather forecasting company (CFAN), I am extremely experienced in evaluating and calibrating weather forecasts. I would argue this exercise in forecast calibration improves calibration across other judgment domains.
What weather forecasting calibration does NOT help with is the ‘overprecision’ issue, where you need to consider what is truly unknown and outside of your experience. This is really much more difficult. The IPCC pretty much guarantees overprecision by its consensus judgment approach that considers evidence for and against, without explicitly considering the space for which you have no evidence, i.e. the known and unknown unknowns.
I agree that the best cure for overprecision is continual challenges of “How could I be wrong?” Anyone who presents such a challenge is quickly labeled as a denier. The IPCC needs to figure out some way to cure its overprecision disease.