by Judith Curry
The focus of this series on detection and attribution is the following statement in the IPCC AR4:
“Most of the observed increase in global average temperatures since the mid-20th century is very likely due to the observed increase in anthropogenic greenhouse gas concentrations.”
Part I addressed the IPCC’s detection strategy and raised issues regarding the IPCC’s inferences about the relative importance of the multi-decadal modes of natural internal variability (e.g. AMO, PDO). Part II addressed uncertainties in external forcing data sets used in the attribution studies and the relevant climate model structural uncertainties. Part III addresses deficiencies in the overall logic of the IPCC’s attribution argument.
Summary. Consilience of evidence, degree of consistency, and expert judgment, spiced with some Bayesian reasoning, are at the heart of the IPCC’s judgment regarding the confidence level of its detection and attribution statement. In an attempt to find a justification for this confidence level, I formulate the apparent underlying argument for the detection and attribution statement. The uncertainty of each of the premises in the argument is assessed. Different logics for drawing conclusions from these premises are considered. Expert judgment based on a consensus approach seems to be the only way to arrive at such a high confidence level. It is concluded that the IPCC needs logic police, in addition to uncertainty cops and statistics sentries. Some grounds for cautious optimism regarding the AR5’s detection and attribution assessment are presented.
Reasoning about causality
At the heart of the IPCC’s attribution argument is causality, which is the relationship between an event (the cause) and a second event (the effect), whereby the second event is a consequence of the first. Interpreting causation as a deterministic relation means that if A causes B, then A must always be followed by B. Probabilistic causation means that A probabilistically causes B if A’s occurrence increases the probability of B. This can reflect imperfect knowledge of a deterministic system, or it can mean that the causal system under study has an inherently chancy nature. Causal calculus infers causal probabilities from conditional probabilities in causal Bayesian networks with unmeasured variables, whereby a Bayesian network represents a set of variables and their conditional dependencies. Causal calculus enables characterization of confounding variables, which must be adjusted for to recover the correct causal effect between the variables of interest.
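As a minimal illustration of the probability-raising criterion just described (all numbers are hypothetical, nothing here is climate data):

```python
# Toy illustration of probabilistic causation: A probabilistically
# causes B if A's occurrence raises the probability of B, i.e.
# P(B | A) > P(B). All numbers are hypothetical.

joint = {  # joint distribution P(A=a, B=b)
    (True, True): 0.30,
    (True, False): 0.10,
    (False, True): 0.15,
    (False, False): 0.45,
}

p_a = sum(p for (a, b), p in joint.items() if a)    # P(A) = 0.40
p_b = sum(p for (a, b), p in joint.items() if b)    # P(B) = 0.45
p_b_given_a = joint[(True, True)] / p_a             # P(B | A) = 0.75

print(f"P(B) = {p_b:.2f}, P(B | A) = {p_b_given_a:.2f}")
print("A probabilistically causes B" if p_b_given_a > p_b
      else "No probability raising")
```

Note that probability raising alone cannot distinguish a genuine cause from a common confounder that raises the probability of both A and B; identifying and adjusting for such confounders is precisely what causal calculus is for.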
I am no expert in logic. My only formal exposure was a course in freshman logic nearly 40 years ago. In the past few years, I’ve wandered randomly through Wikipedia and the Stanford Encyclopedia of Philosophy, and I’ve read a few journal articles on the philosophy of science. I can understand most Bayesian arguments that I’ve encountered, although I’ve never attempted to make one on my own (I am a coauthor on a paper that does). My point is that I think there are some glaring logical errors in the IPCC’s detection and attribution argument that it doesn’t take an expert in logic to identify. I look forward to input from logicians, Bayesians, and lawyers in terms of assessing the IPCC’s argument.
The IPCC’s strategy for assigning a confidence level
The IPCC’s conclusion on detection and attribution is reached using probabilistic causation and counterfactual reasoning, whereby agreement with observations is evaluated for ensembles of simulations conducted with and without anthropogenic forcing.
Formal Bayesian reasoning is used to some extent by the IPCC in analyzing detection and attribution. The reasoning process used by the IPCC in assessing confidence in its attribution statement is described by this statement from the AR4:
“The approaches used in detection and attribution research described above cannot fully account for all uncertainties, and thus ultimately expert judgement is required to give a calibrated assessment of whether a specific cause is responsible for a given climate change. The assessment approach used in this chapter is to consider results from multiple studies using a variety of observational data sets, models, forcings and analysis techniques. The assessment based on these results typically takes into account the number of studies, the extent to which there is consensus among studies on the significance of detection results, the extent to which there is consensus on the consistency between the observed change and the change expected from forcing, the degree of consistency with other types of evidence, the extent to which known uncertainties are accounted for in and between studies, and whether there might be other physically plausible explanations for the given climate change. Having determined a particular likelihood assessment, this was then further downweighted to take into account any remaining uncertainties, such as, for example, structural uncertainties or a limited exploration of possible forcing histories of uncertain forcings. The overall assessment also considers whether several independent lines of evidence strengthen a result.” (IPCC AR4)
From this statement, I infer that their objective analysis produced a very high level of confidence based upon multiple lines of evidence (which is referred to as a consilience of evidence), which they then downweighted to account for remaining uncertainties.
The underlying detection and attribution argument
As far as I can tell there has been relatively little discussion of the logic underlying the IPCC’s detection and attribution. Richard Lindzen has stated that he is not impressed by the IPCC’s logic:
“However, with global warming the line of argument is even sillier. It generally amounts to something like if A kicked up some dirt, leaving an indentation in the ground into which a rock fell and B tripped on this rock and bumped into C who was carrying a carton of eggs which fell and broke, then if some broken eggs were found it showed that A had kicked up some dirt.”
Consider the following argument, which I think must underlie the IPCC’s assessment of attribution and their high confidence in this assessment. Uncertainty in each of the premises is characterized qualitatively by the Italian flag analysis described in Doubt, whereby evidence for a hypothesis is represented as green, evidence against as red, and uncommitted belief, associated with uncertainty in evidence or unknowns, as white.
Here is the argument:
1. Historical surface temperature observations over the 20th century show a clear signal of increasing surface temperatures. Italian flag: Green 70%, White 30%, Red 0%. (Note: nobody is claiming that the temperatures have NOT increased.)
2. Climate models are fit for the purpose of accurately simulating forced and internal climate variability on the time scale of a century. This implies an accurate sensitivity to external forcing and an accurate simulation of the statistics of natural internal modes of variability on multi-decadal time scales. Italian flag: Green 40%, White 50%, Red 10%. (Note: the biggest issue is climate sensitivity, with a secondary issue being the magnitude of modes of natural internal variability on multi-decadal time scales, and tertiary issues associated with model inadequacies in dealing with aerosol-cloud processes and solar indirect effects.)
3. Time series data to force climate models are available and adequate for the required forcing input: long-lived greenhouse gases, solar fluxes, volcanic aerosols, anthropogenic aerosols, etc. Italian flag: Green 30%, White 60%, Red 10%. (Note: the uncertainties with the greatest impact are solar forcing and aerosol forcing, with the red associated with alternate interpretations of solar forcing.)
4. Natural internal variability is small relative to forced variability, and nonlinear interactions between forcings and responses are small; hence 20th century climate variability is explained by external forcing. Italian flag: Green 25%, White 50%, Red 25%. (Note: the combination of natural internal variability plus solar variability remains a plausible explanation for most of the 20th century variability; the issue of nonlinear interactions between forcings and responses has not been convincingly explored.)
5. Global climate model simulations that include anthropogenic forcing (greenhouse gases and pollution aerosol) provide better agreement with historical observations in the second half of the 20th century than do simulations with only natural forcing (solar and volcanoes). Italian flag: Green 30%, White 50%, Red 20%. (Note: all climate models produce this result in spite of having different sensitivities and using different forcing data sets; the models do not agree on the causes of the early 20th century warming or the mid-century cooling, and they do not reproduce the mid-century cooling.)
6. Confidence in premises 1-5 is enhanced by the agreement between the simulations and observations of the 20th century surface temperature.
7. Thus: Most of the observed increase in global average temperatures since the mid-20th century is very likely due to the observed increase in anthropogenic greenhouse gas concentrations.
Question A: Is my characterization of the argument correct?
Question B: Is my assignment of the Italian flag % values correct? Assignment of % values in the Italian flag analysis is necessarily subjective, since the size of the “white” area is by definition unknown. Other assignments would be plausible, but mine is not inconsistent with uncertainties stated by the IPCC itself, or with my analysis on previous threads.
Question C: Assuming that the answers to A and B are “yes”, how should we assess confidence in the conclusion (#7) based upon the 6 premises?
It seems that different logics could be applied here to assess the level of confidence in the conclusion. I will provide a few simple examples of reasoning to compare (I know I will need help here from the logicians, Bayesians, and lawyers in the group, but this is a start).
I. Reasoning from contingency.
Assume that the conclusion (#7) is contingent on each of premises #2-#5. The confidence in the conclusion should then not exceed the green % of any of these premises. Therefore the confidence in #7 should not exceed 40%, and arguably should not exceed 25%, the green fraction of the premise with the lowest confidence level. With a confidence level in the range 25-40%, the IPCC’s conclusion would be merely plausible (not “very likely”).
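As a trivial sketch of this contingency bound, using the green fractions assigned above:

```python
# Green fractions from the Italian flag assignments for premises 2-5.
green = {2: 0.40, 3: 0.30, 4: 0.25, 5: 0.30}

# Reasoning from contingency: if the conclusion depends on every
# premise, confidence in the conclusion cannot exceed the confidence
# in the weakest premise.
bound = min(green.values())
print(f"Confidence in the conclusion should not exceed {bound:.0%}")  # 25%
```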
II. Reasoning from consensus
Decide what is really important here: ignore the white part of the Italian flag and just consider the green and the red; on this basis each of the premises has green at 50% or higher. Focus on premise #5, which is the punchline, and focus only on the period since 1970, where the models agree (ignoring the model disagreement over attribution causes in the earlier part of the 20th century). This puts you in 90% “very likely” confidence territory.
III. Reasoning from a consilience of evidence
This is a case based on circumstantial evidence that is often bolstered by Bayesian reasoning. A large number of independent lines of evidence increases the confidence. Problems with this kind of reasoning are highlighted by Nullius in Verba. Justifying the argument would require conducting the same analysis for the opposing argument (i.e. natural variability).
IV. I’m sure there are others. Can anyone do a formal Bayesian analysis of this?
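As a starting point, here is a deliberately toy Bayesian update, not a formal analysis; the prior and likelihoods are hypothetical round numbers chosen only to show the mechanics:

```python
# Toy Bayesian update. H = "most post-1950 warming is anthropogenic".
# All numbers are hypothetical, for illustration only.
prior_h = 0.5

# E = tuned simulations with anthropogenic forcing match observations.
# If forcings and parameters are adjusted toward the observations,
# E is fairly likely whether or not H is true (weak Bayes factor).
p_e_given_h = 0.9
p_e_given_not_h = 0.6

posterior_h = (p_e_given_h * prior_h) / (
    p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h))
print(f"P(H | E) = {posterior_h:.2f}")  # -> 0.60
```

The point of the toy: starting from an even prior, reaching the “very likely” level (0.90) requires a Bayes factor of about 9, whereas evidence that is fairly likely under either hypothesis (here a factor of 1.5) moves the posterior only modestly.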
My own confidence assessment (provided on the Doubt thread in the context of the Italian flag) was derived from contingent reasoning.
Circularity in the argument
Apart from the issue of the actual logic used for reasoning, there is circularity in the argument that is endemic to whatever reasoning logic is used. Circular reasoning is a logical fallacy whereby the proposition to be proved is assumed in one of the premises.
The most serious circularity enters into the determination of the forcing data. Given the large uncertainties in forcings and model inadequacies (including a factor of 2 difference in CO2 sensitivity), how is it that each model does a credible job of tracking the 20th century global surface temperature anomalies (AR4 Figure 9.5)? This agreement is accomplished through each modeling group selecting the forcing data set that produces the best agreement with observations, along with model kludges that include adjusting the aerosol forcing to produce good agreement with the surface temperature observations. If a model’s sensitivity is high, it is likely to require greater aerosol forcing to counter the greenhouse warming, and vice versa for a low model sensitivity. The proposition to be proved (#7) is assumed in premise #3 by virtue of kludging the model parameters and the aerosol forcing to agree with the 20th century observations of surface temperature. Any climate model that uses inverse modeling to determine any aspect of the forcing substantially weakens the attribution argument owing to the introduction of circular reasoning.
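A toy linear energy-balance calculation (all numbers hypothetical) illustrates how per-model aerosol tuning can mask even a factor-of-2 difference in sensitivity:

```python
# Toy illustration (hypothetical numbers) of aerosol tuning masking
# sensitivity differences. Warming ~ sensitivity * (GHG + aerosol).
ghg_forcing = 2.5  # W/m^2 of 20th-century GHG forcing (illustrative)

models = {
    "high-sensitivity model": {"sens": 0.8, "aerosol": -1.5},  # K/(W/m^2), W/m^2
    "low-sensitivity model":  {"sens": 0.4, "aerosol": -0.5},
}

for name, m in models.items():
    warming = m["sens"] * (ghg_forcing + m["aerosol"])
    print(f"{name}: {warming:.2f} K of simulated warming")
# Both print 0.80 K.
```

Both configurations simulate the same net warming, so agreement with the observed record cannot discriminate between the two sensitivities once the aerosol forcing has been tuned.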
Now consider premise #6. The striking consistency between the time series of observed global average temperature and the simulated values with both natural and anthropogenic forcing (Figure 9.5) was instrumental in convincing me (and presumably others) of the IPCC’s attribution argument. The high confidence level ascribed by the IPCC provides bootstrapped plausibility to the uncertain temperature observations, uncertain forcing, and uncertain model sensitivity, each of which was shown in the previous sections to have large uncertainties that were not accounted for in the conclusion. I first encountered the marvelous phrase “bootstrapped plausibility” in an essay by Jerome Ravetz, which motivated me to learn more about this. Bootstrapped plausibility (Agassi 1974) occurs when a proposition that has been rendered plausible in turn lends plausibility to some of its more uncertain premises (e.g. #1-5), introducing further circularity into the argument.
From this analysis, it seems that the AR4’s assessment of confidence at the very likely (90-99%) level cannot be objectively justified, even if the word “most” is interpreted to imply a number that is only slightly greater than 50%. A heavy dose of expert judgment is required to come up with “very likely” confidence.
The ambiguity of “most”
The word “most” introduces some interesting conundrums into the problem. The meaning of most is ambiguous; the dictionary provides a meaning of “a great majority of; nearly all.” For some reason, I have been assuming that the IPCC’s use of the word “most” implied greater than 50%, but now I don’t know where that came from (I would appreciate some help with this). For the sake of argument, let’s assume that the word most is associated with a range (maybe as large as 51-95%). It seems that it would be more useful to provide probabilities for a range of states within the state space (help, I might not be using the appropriate terminology).
Consider the following collection of states: 0%, 10%, 20% . . . 100%, where the percentage denotes the amount of warming that is attributed to anthropogenic causes. In terms of confidence levels, there would be virtually zero confidence that the correct state is either 0% or 100%. There would probably be higher confidence across the range 30-70%.
Presenting confidence in the attribution as a function of different possible states would eliminate the ambiguity associated with the word “most” and would give a more realistic view of the confidence and uncertainties associated with the detection and attribution argument.
An alternative would be to present a confidence level for each state whereby the % denotes that anthropogenic causes account for at least that fraction of the warming. The 0% state would then have a 100% confidence level and the 100% state would have a zero confidence level.
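As a minimal sketch of how such a presentation might look (the distribution below is made up purely for illustration):

```python
# Hypothetical confidence distribution over attribution states, where
# each state is the % of observed warming attributed to anthropogenic
# causes. Illustrative numbers only; they sum to 1.
states = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
confidence = [0.00, 0.02, 0.05, 0.10, 0.15, 0.20,
              0.18, 0.15, 0.10, 0.05, 0.00]

# "At least" (exceedance) form: confidence that anthropogenic causes
# account for at least each state's fraction of the warming.
for i, s in enumerate(states):
    print(f"P(anthropogenic fraction >= {s:3d}%) = {sum(confidence[i:]):.2f}")
```

Reading off P(fraction >= 50%) from such a table would then give a direct, unambiguous counterpart to the IPCC’s “most . . . very likely” statement.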
Based upon my analysis, the IPCC has presented a weak case for its high level of confidence in its detection and attribution statement. With the materials it has in hand (data, models, theory), it seems that it could make a much more robust case for a statement on detection and attribution with better logic and a better experimental design for its model simulations (see Part II).
I am cautiously hopeful that the AR5 might do a better job. The IPCC Expert Meeting on Detection and Attribution Related to Anthropogenic Climate Change (2009) states the following:
“Where models are used in attribution, a model’s ability to properly represent the relevant causal link should be assessed. This should include an assessment of model biases and the model’s ability to capture the relevant processes and scales of interest. Confidence in attribution will also be influenced by the extent to which the study considers . . . confounding factors and also observational data limitations.”
“Confounding factors may lead to false conclusions within attribution studies if not properly considered or controlled for. Examples of possible confounding factors for attribution studies include pervasive biases and errors in instrumental records; model errors and uncertainties; improper or missing representation of forcings in climate and impact models; structural differences in methodological techniques; uncertain or unaccounted for internal variability; and nonlinear interactions between forcings and responses.”
“Confounding factors (or influences) should be explicitly identified and evaluated where possible. Such influences, when left unexamined, could undermine conclusions of climate and impact studies, particularly for factors that may have a large influence on the outcome.”
“For transparency and reproducibility it is essential that all steps taken in attribution approaches are documented. This includes full information on sources of data, steps and methods of data processing, and sources and processing of model results.”
“Estimates of the variability internally generated within the climate system or climate impact system are needed to establish if observed changes are detectable. It is ideal if the observational record is of sufficient length to estimate internal variability of the system that is being considered (note, however, that in most cases observations will contain both response to forcing/drivers and variability). Further estimates of internal variability can be produced from long control simulations with climate models . . . Expert judgments or multi-model techniques may be used to incorporate as far as possible the range of variability in climate models and to assign uncertainty levels, confidence in which will need to be assessed.”
These recommendations reflect a much greater awareness of the challenges and uncertainties associated with attribution. Dare we hope for improvements in the AR5 assessment?