by Judith Curry
“You can say I don’t believe in gravity. But if you step off the cliff you are going down. So we can say I don’t believe climate is changing, but it is based on science.” – Katharine Hayhoe, co-author of the 4th National Climate Assessment Report.
So, should we have the same confidence in the findings of the recently published 4th (U.S.) National Climate Assessment (NCA4) as we do in gravity? How convincing is the NCA4?
The 4th National Climate Assessment (NCA4) is published in two volumes:
- Vol I: Climate Science Special Report
- Vol II: Impacts, Risks, and Adaptation in the United States
I’ve just completed rereading Vol I of the NCA4. There is so much here of concern that it is difficult to know where to start. I have been very critical of the IPCC in the past (but I will certainly admit that the AR5 was a substantial improvement over the AR4). While the NCA4 shares some common problems with the IPCC AR5, the NCA4 makes the IPCC AR5 look like a relative paragon of rationality.
Since the NCA4 is guiding the U.S. federal government in its decision making, not to mention local/state governments and businesses, it is important to point out the problems in the NCA4 Reports and the assessment process, with two objectives:
- provide a more rational assessment of the confidence that should be placed in these findings
- provide motivation and a framework for doing a better job on the next assessment report.
I’m envisioning a number of blog posts on aspects of the NCA4 over the course of the next few months (here’s to hoping that my day job allows for sufficient time to devote to this). A blog post last year, Reviewing the Climate Science Special Report, crowdsourced error detection on Vol. I, with many of the comments making good points. What I plan for this series of blog posts is something different from error detection: a focus on framing and on fundamental epistemic errors in the approach used in the Report.
This first post addresses the issue of overconfidence in the NCA4. I have previously argued that overconfidence is a problem with the IPCC report (see examples from Overconfidence) and the consensus seeking process; however, the overconfidence problem with the NCA4 is much worse.
Example: overconfidence in NCA4
To illustrate the overconfidence problem with the NCA4 Report, consider the following Key Conclusion from Chapter 1 Our Globally Changing Climate:
“Longer-term climate records over past centuries and millennia indicate that average temperatures in recent decades over much of the world have been much higher, and have risen faster during this time period, than at any time in the past 1,700 years or more, the time period for which the global distribution of surface temperatures can be reconstructed. (High confidence)”
This statement really struck me, since it is at odds with the conclusion from the IPCC AR5 WG1 Chapter 5 on paleoclimate:
“For average annual NH temperatures, the period 1983–2012 was very likely the warmest 30-year period of the last 800 years (high confidence) and likely the warmest 30-year period of the last 1400 years (medium confidence).”
While my knowledge of paleoclimate is relatively limited, I don’t find the AR5 conclusion to be unreasonable, but it seems rather overconfident with the conclusion regarding the last 1400 years. The NCA4 conclusion, which is stronger than the AR5 conclusion and with greater confidence, made me wonder whether there was some new research that I was unaware of, and whether the authors included young scientists with a new perspective.
Fortunately, the NCA includes a section at the end of each Chapter that provides a traceability analysis for each of the key conclusions:
“Traceable Accounts for each Key Finding: 1) document the process and rationale the authors used in reaching the conclusions in their Key Finding, 2) provide additional information to readers about the quality of the information used, 3) allow traceability to resources and data, and 4) describe the level of likelihood and confidence in the Key Finding. Thus, the Traceable Accounts represent a synthesis of the chapter author team’s judgment of the validity of findings, as determined through evaluation of evidence and agreement in the scientific literature.”
Here is text from the traceability account for the paleoclimate conclusion:
“Description of evidence base. The Key Finding and supporting text summarizes extensive evidence documented in the climate science literature and are similar to statements made in previous national (NCA3) and international assessments. There are many recent studies of the paleoclimate leading to this conclusion including those cited in the report (e.g., Mann et al. 2008; PAGES 2k Consortium 2013).”
“Major uncertainties: Despite the extensive increase in knowledge in the last few decades, there are still many uncertainties in understanding the hemispheric and global changes in climate over Earth’s history, including that of the last few millennia. Additional research efforts in this direction can help reduce those uncertainties.”
“Assessment of confidence based on evidence and agreement, including short description of nature of evidence and level of agreement: There is high confidence for current temperatures to be higher than they have been in at least 1,700 years and perhaps much longer.”
I read all this with acute cognitive dissonance. Apart from Steve McIntyre’s takedown of Mann et al. 2008 and PAGES 2K Consortium (for the latest, see PAGES2K: North American Tree Ring Proxies), how can you ‘square’ high confidence with “there are still many uncertainties in understanding the hemispheric and global changes in climate over Earth’s history, including that of the last few millennia”?
Further, Chapter 5 of the AR5 includes more than a page on uncertainties in temperature reconstructions for the past 2000 years (section 5.3.5.2). A few choice quotes:
“Reconstructing NH, SH or global-mean temperature variations over the last 2000 years remains a challenge due to limitations of spatial sampling, uncertainties in individual proxy records and challenges associated with the statistical methods used to calibrate and integrate multi-proxy information”
“A key finding is that the methods used for many published reconstructions can underestimate the amplitude of the low-frequency variability”
“data are still sparse in the tropics, SH and over the oceans”
“Limitations in proxy data and reconstruction methods suggest that published uncertainties will underestimate the full range of uncertainties of large-scale temperature reconstructions.”
Heck, does all this even justify the AR5’s ‘medium’ confidence level?
I checked the relevant references in the NCA4 Chapter 1; only two (Mann et al., 2008; PAGES 2013), both of which were referenced by the AR5. The one figure from this section was from — you guessed it — Mann et al. (2008).
I next wondered: exactly who were the paleoclimate experts that came up with this stuff? Here is the author list for Chapter 1:
Wuebbles, D.J., D.R. Easterling, K. Hayhoe, T. Knutson, R.E. Kopp, J.P. Kossin, K.E. Kunkel, A.N. LeGrande, C. Mears, W.V. Sweet, P.C. Taylor, R.S. Vose, and M.F. Wehner
I am fairly familiar with half of these scientists (a few of them I have a great deal of respect for), somewhat familiar with another 25%, and unfamiliar with the rest. I looked these up to see which of them were the paleoclimate experts. There are only two authors (Kopp and LeGrande) that appear to have any expertise in paleoclimate, albeit on topics that don’t directly relate to the Key Finding. This is in contrast to an entire chapter in the IPCC AR5 being devoted to paleoclimate, with substantial expertise among the authors.
A pretty big lapse, not having an expert on your author team related to one of 6 key findings. This isn’t to say that a non-expert can’t do a good job of assessing this topic with a sufficient level of effort. However, the level of effort here didn’t seem to extend to reading the IPCC AR5 Chapter 5, particularly section 5.3.5.2.
Why wasn’t this caught by the reviewers? The NCA4 advertises an extensive in house and external review process, including the National Academies.
I took some heat for my Report On Sea Level Rise and Climate Change, since it had only a single author and wasn’t peer reviewed. Well, the NCA provides a good example of how multiple authors and peer review are no panacea for providing a useful assessment report.
And finally, does the question of whether current temperatures are warmer than those of the Medieval Warm Period really matter? Well, yes: it is very important in the context of detection and attribution arguments (which will be the subject of forthcoming posts).
This is but one example of overconfidence in the NCA4. What is going on here?
Confidence guidance in the NCA4
Exactly what does the NCA4 mean by ‘high confidence’? The confidence assessment used in the NCA4 is essentially the same as that used in the IPCC AR5. From the NCA4:
“Confidence in the validity of a finding based on the type, amount, quality, strength, and consistency of evidence (such as mechanistic understanding, theory, data, models, and expert judgment); the skill, range, and consistency of model projections; and the degree of agreement within the body of literature.”
“Assessments of confidence in the Key Findings are based on the expert judgment of the author team. Confidence should not be interpreted probabilistically, as it is distinct from statistical likelihood.”
These descriptions for each confidence category don’t make sense to me; the words ‘low’, ‘medium’ etc. seem at odds with the descriptions of the categories. Also, I thought I recalled a ‘very low’ confidence category from the IPCC AR5 (which is correct). The AR5 uncertainty guidance doesn’t give verbal descriptions of the confidence categories, although it does include the following figure:
The concept of ‘robust evidence’ will be considered in a subsequent post; this is not at all straightforward to assess.
The uncertainty guidance for the AR4 provides some insight into what is actually meant by these different confidence categories, although this quantitative specification was dropped for the AR5:
Well this table is certainly counterintuitive to my understanding of confidence. If someone told me that their conclusion had 1 or 2 chances out of 10 of being correct, I would have no confidence in that conclusion, and wonder why we are even talking about ‘confidence’ in this situation. ‘Medium confidence’ implies a conclusion that is ‘as likely as not;’ why have any confidence in this category of conclusions, when an opposing conclusion is equally likely to be correct?
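To make the arithmetic behind this objection concrete, here is a minimal Python sketch. The probabilities are the approximate ‘chances out of 10 of being correct’ that the AR4 guidance attached to each confidence term, as best recalled here, so treat them as assumptions to check against the original guidance note. The names `AR4_CONFIDENCE`, `chance_of_being_wrong` and `odds_in_favor` are purely illustrative.

```python
# Approximate chance of a conclusion being correct for each AR4 confidence
# term. ASSUMPTION: values recalled from the AR4 uncertainty guidance
# ("about N out of 10"); verify against the original table before relying on them.
AR4_CONFIDENCE = {
    "very high": 0.9,  # at least 9 out of 10
    "high": 0.8,       # about 8 out of 10
    "medium": 0.5,     # about 5 out of 10
    "low": 0.2,        # about 2 out of 10
    "very low": 0.1,   # less than 1 out of 10
}


def chance_of_being_wrong(term):
    """Probability that a conclusion carrying this confidence term is incorrect."""
    return 1.0 - AR4_CONFIDENCE[term]


def odds_in_favor(term):
    """Odds (correct : incorrect) implied by the confidence term."""
    p = AR4_CONFIDENCE[term]
    return p / (1.0 - p)


if __name__ == "__main__":
    for term in AR4_CONFIDENCE:
        print(f"{term:>9}: wrong with p={chance_of_being_wrong(term):.1f}, "
              f"odds in favor {odds_in_favor(term):.2f}")
```

On these numbers, a ‘medium confidence’ conclusion has even odds, and a ‘low confidence’ conclusion is four times more likely to be wrong than right, which is the sense in which the labels read as counterintuitive.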
Given the somewhat flaky guidance from the IPCC regarding confidence, the NCA4 confidence descriptions are a step in the right direction regarding clarity, but the categories defy the words used to describe them. For example:
- ‘High confidence’ is described as ‘Moderate evidence, medium consensus.’ The words ‘moderate’ and ‘medium’ sound like ‘medium confidence’ to me.
- ‘Medium confidence’ is described as ‘Suggestive evidence (a few sources, limited consistency, models incomplete, methods emerging); competing schools of thought.’ Sounds like ‘low confidence’ to me.
- ‘Low confidence’ is described as inconclusive evidence, disagreement or lack of opinions among experts. Sounds like ‘no confidence’ to me.
- ‘Very high confidence’ should be reserved for evidence where there is very little chance of the conclusion being reversed or whittled down by future research; findings that have stood the test of time and a number of different challenges.
As pointed out by Risbey and Kandlikar (2007), it is very difficult (and perhaps not very meaningful) to disentangle confidence from likelihood when the confidence level is medium or low.
Who exactly is the audience for these confidence levels? Well, other scientists, policy makers and the public. Such misleading terminology contributes to misleading overconfidence in the conclusions — apart from the issue of the actual judgments that go into assigning a confidence level to one of these categories.
Analyses of the overconfidence problem
While I have written previously on the topic of overconfidence, it is good to be reminded and there are some insightful new articles to consider.
Cassam (2017) Overconfidence is an epistemic vice. Excerpts (rearranged and edited without quote marks):
‘Overconfidence’ can be used to refer to positive illusions or to excessive certainty. The former is the tendency to have positive illusions about our merits relative to others. The latter describes the tendency we have to believe that our knowledge is more certain than it really is. Overconfidence can cause arrogance, and the reverse may also be true. Overconfidence and arrogance are in a symbiotic relationship even if they are distinct mental properties.
Cassam distinguishes four types of explanation for the errors that overconfidence produces:
- Personal explanations attribute error to the personal qualities of individuals or groups of individuals. Carelessness, gullibility, closed-mindedness, dogmatism, and prejudice and wishful thinking are examples of such qualities. These qualities are epistemic vices.
- Sub-personal explanations attribute error to the automatic, involuntary, and non-conscious operation of hard-wired cognitive mechanisms. These explanations are mechanistic in a way that personal explanations are not, and the mechanisms are universal rather than person-specific.
- Situational explanations attribute error to contingent situational factors such as time pressure, distraction, overwork or fatigue.
- Systemic explanations attribute error to organizational or systemic factors such as lack of resources, poor training, or professional culture.
To the extent that overconfidence is an epistemic vice that is encouraged by the professional culture, it might be described as a ‘professional vice’.
Apart from the epistemic vices of individual climate scientists (activism seems to be the best predictor of such vices), my main concern is the systematic biases introduced by the IPCC and NCA assessment processes – systemic ‘professional vice’.
Thomas Kelly explains how such a systematic vice can work, which was summarized in my 2011 paper Reasoning about Climate Uncertainty:
Kelly (2008) argues that “a belief held at earlier times can skew the total evidence that is available at later times, via characteristic biasing mechanisms, in a direction that is favorable to itself.” Kelly (2008) also finds that “All else being equal, individuals tend to be significantly better at detecting fallacies when the fallacy occurs in an argument for a conclusion which they disbelieve, than when the same fallacy occurs in an argument for a conclusion which they believe.” Kelly (2005) provides insights into the consensus building process: “As more and more peers weigh in on a given issue, the proportion of the total evidence which consists of higher order psychological evidence [of what other people believe] increases, and the proportion of the total evidence which consists of first order evidence decreases . . . At some point, when the number of peers grows large enough, the higher order psychological evidence will swamp the first order evidence into virtual insignificance.” Kelly (2005) concludes: “Over time, this invisible hand process tends to bestow a certain competitive advantage to our prior beliefs with respect to confirmation and disconfirmation. . . In deciding what level of confidence is appropriate, we should take into account the tendency of beliefs to serve as agents in their own confirmation.” Kelly refers to this phenomenon as ‘upward epistemic push.’
The Key Finding regarding paleo temperatures described above is an example of upward epistemic push: the existence of a ‘consensus’ on this issue resulted in ignoring most of the relevant first order evidence (i.e. publications), combined with an apparent systemic desire to increase confidence relative to the NCA3 conclusion.
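Kelly’s ‘swamping’ argument can be illustrated with a toy calculation. The sketch below is not Kelly’s model: it simply assumes, for illustration, that the first order evidence carries a fixed weight and that each additional peer opinion adds a fixed increment of higher order evidence. The function name and both weights are hypothetical.

```python
def first_order_share(n_peers, first_order_weight=1.0, weight_per_peer=0.1):
    """Fraction of total evidence that is first order (data, publications).

    ASSUMPTION: a toy additive model in which every peer who weighs in
    contributes the same amount of higher order (what-others-believe) evidence.
    """
    higher_order = n_peers * weight_per_peer
    return first_order_weight / (first_order_weight + higher_order)


if __name__ == "__main__":
    for n in (0, 10, 100, 1000):
        print(f"{n:>5} peers -> first order share {first_order_share(n):.3f}")
```

Under these assumed weights the first order share falls from 1 with no peers to one half with ten peers, and keeps shrinking toward zero as more peers weigh in, even though the underlying data have not changed at all.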
Walters et al. (2016) argue that overconfidence is driven by the neglect of unknowns. Overconfidence is also driven by biased processing of known evidence in favor of a focal hypothesis (similar to Kelly’s argument). Overconfidence is also attributed to motivated reasoning and protecting one’s self image from failure and regret (political agenda and careerism).
Kahneman (2011) refers to this as the ‘What You See is All There Is’ (WYSIATI) principle: a focus on known information at the expense of unknown information.
I would say that all of the above are major contributors to systemic overconfidence related to climate change.
Solutions to overconfidence
I have written multiple blog posts previously on strategies for addressing overconfidence, including:
- Red Teams
- The method of multiple working hypotheses
- Cognitive bias – how petroleum scientists deal with it
- I know I’m right(?) – the best cure for overprecision is continual challenges of “How could I be wrong?”
- Certainly not! – on cultivating doubt and finding pleasure in mystery.
- Italian Flag – three-valued logic that explicitly includes unknowns.
From Kelly (2005):
“It is sometimes suggested that how confident a scientist is justified in being that a given hypothesis is true depends, not only on the character of relevant data to which she has been exposed, but also on the space of alternative hypotheses of which she is aware. According to this line of thought, how strongly a given collection of data supports a hypothesis is not wholly determined by the content of the data and the hypothesis. Rather, it also depends upon whether there are other plausible competing hypotheses in the field. It is because of this that the mere articulation of a plausible alternative hypothesis can dramatically reduce how likely the original hypothesis is on the available data.”
From Walters et al. (2016):
“Overconfidence can be reduced by prompting people to ‘consider the alternative’ or by designating a member of a decision-making team to advocate for the alternative (‘devil’s advocate technique’).”
“Our studies show that the evaluation of what evidence is unknown or missing is an important determinant of judged confidence. However, people tend to underappreciate what they don’t know. Thus, overconfidence is driven in part by insufficient consideration of unknown evidence.”
“We conceptualize known unknowns as evidence relevant to a probability assessment that a judge is aware that he or she is missing while making the assessment. We distinguish this from unknown unknowns, evidence that a judge is not aware he or she is missing. It is useful at this point to further distinguish two varieties of unknown unknowns. In some cases a judge may be unaware that he or she is missing evidence but could potentially recognize that this evidence is missing if prompted. We refer to these as retrievable unknowns. In other cases, a judge is unaware that he or she is missing evidence and furthermore would need to be educated about the relevance of that evidence in order to recognize it as missing. We refer to these as unretrievable unknowns.”
“Considering the unknowns may also be more effective than considering the alternative in judgment tasks where no obvious alternative exists. A hybrid strategy of considering both the unknowns and the alternative may be more effective than either strategy alone.”
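The taxonomy of unknowns in these excerpts amounts to a small decision procedure, sketched below in Python. The category names follow the quoted text; the `Evidence` enum, the `classify` function and its parameter names are illustrative assumptions, not anything defined by Walters et al.

```python
from enum import Enum


class Evidence(Enum):
    KNOWN = "known"
    KNOWN_UNKNOWN = "known unknown"
    RETRIEVABLE_UNKNOWN = "retrievable unknown"
    UNRETRIEVABLE_UNKNOWN = "unretrievable unknown"


def classify(in_hand, aware_missing=False, recognized_if_prompted=False):
    """Classify a piece of relevant evidence from the judge's point of view.

    in_hand: the judge actually has the evidence.
    aware_missing: the judge knows the evidence is missing.
    recognized_if_prompted: the judge would recognize the gap if prompted.
    """
    if in_hand:
        return Evidence.KNOWN
    if aware_missing:
        return Evidence.KNOWN_UNKNOWN
    if recognized_if_prompted:
        return Evidence.RETRIEVABLE_UNKNOWN
    # The judge would need to be educated about the evidence's relevance.
    return Evidence.UNRETRIEVABLE_UNKNOWN


if __name__ == "__main__":
    print(classify(in_hand=False, recognized_if_prompted=True).value)
```

Seen this way, prompting (‘what are we missing?’) can only convert retrievable unknowns into known unknowns; the unretrievable kind needs outside expertise, which is one reason the composition of an author team matters.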
Nearly everyone is overconfident. See these previous articles:
- The lure of incredible certitude
- We are all confident idiots.
- It pays to be overconfident, even if you have no idea what you are doing
The issue here is overconfidence of scientists and ‘systemic vice’ about policy-relevant science, where the overconfidence harms both the scientific and decision making processes.
I don’t regard myself as overconfident with regard to climate science; in fact some have accused me of being underconfident. My experience in owning a company that makes weather and climate predictions (whose skill is regularly evaluated) has been extremely humbling in this regard. Further, I study and read the literature from philosophy of science, risk management, social psychology and law regarding uncertainty, evidence, judgement, confidence and argumentation.
The most disturbing point here is that overconfidence seems to ‘pay’ in terms of the influence of an individual in political debates about science. There doesn’t seem to be much downside for individuals or groups in eventually being proven wrong. So scientific overconfidence seems to be a victimless crime, with the only ‘victims’ being science itself and then the public, who have to live with inappropriate decisions based on this overconfident information.
So what are the implications of all this for understanding overconfidence in the IPCC and particularly the NCA? Cognitive biases in the context of an institutionalized consensus building process have arguably resulted in the consensus becoming increasingly confirmed in a self-reinforcing way, with ever growing confidence. The ‘merchants of doubt’ meme has motivated activist scientists (as well as the institutions that support and assess climate science) to downplay uncertainty and overhype confidence in the interests of motivating action on mitigation.
There are numerous strategies that have been studied and employed to help avoid overconfidence in scientific judgments. However, the IPCC and particularly the NCA introduce systemic bias through the assessment process, including consensus seeking.
As a community, we need to do better — a LOT better. The IPCC actually reflects on these issues in terms of carefully considering uncertainty guidance and selection of a relatively diverse group of authors, although the core problems still remain. The NCA appears not to reflect on any of this, resulting in a document with poorly justified and overconfident conclusions.
Climate change is a very serious issue: depending on your perspective, there will be much future loss and damage from either climate change itself or from the policies designed to prevent climate change. Not only do we need to think harder and more carefully about this, but we need to think better, with better ways of justifying our arguments and assessing uncertainty, confidence and ignorance.
Sub-personal biases are unavoidable, although as scientists we should work hard to be aware and try to overcome these biases. Multiple scientists with different perspectives can be a big help, but it doesn’t help if you assign a group of ‘pals’ to do the assessment. The issue of systemic bias introduced by institutional constraints and guidelines is of greatest concern.
The task of synthesis and assessment is an important one, and it requires somewhat different skills from those of a researcher pursuing a narrow research problem. First and foremost, the assessors need to do their homework and read tons of papers, consider multiple perspectives, understand sources of and reasons for disagreement, play ‘devil’s advocate’, and ask ‘how could we be wrong?’
Instead, what we see in at least some of the sections of the NCA4 is bootstrapping on previous assessments and then inflating the confidence without justification.
More to come, stay tuned.
Moderation note: this is a technical thread, and I am requesting that comments focus on
- the general overconfidence issue
- additional examples (with documentation) of unjustified, overconfident conclusions (e.g. relative to the AR5)
I am focusing on Vol 1 here, since Vol 2 is contingent on the conclusions from Vol 1. General comments about the NCA4 can be made on the week in review or new year thread. Thanks in advance for your comments.