by Judith Curry
Should probabilistic qualities be assigned to climate model projections?
Are the approaches used by the IPCC for assessing climate model projection quality – confidence building, subjective Bayesian, and likelihood – appropriate for climate models?
What are some other approaches that could be used?
The following paper addresses the above questions:
Assessing climate model projections: state of the art and philosophical reflections
Joel Katzav, Henk Dijkstra, Jos de Laat
Abstract. The present paper draws on climate science and the philosophy of science in order to evaluate climate-model-based approaches to assessing climate projections. We analyze the difficulties that arise in such assessment and outline criteria of adequacy for approaches to it. In addition, we offer a critical overview of the approaches used in the IPCC working group one fourth report, including the confidence building, Bayesian and likelihood approaches. Finally, we consider approaches that do not feature in the IPCC reports, including three approaches drawn from the philosophy of science. We find that all available approaches face substantial challenges, with IPCC approaches having as a primary source of difficulty their goal of providing probabilistic assessments.
forthcoming in Studies in History and Philosophy of Modern Physics [link to paper]
This paper covers a lot of the same territory that was covered in my paper Climate Science and the Uncertainty Monster with regard to uncertainties in climate models. The part of the paper that I focus on here is the articulation of different approaches for assessing climate model projection quality.
Sections 4, 5, and 6 of the paper describe the main methods that have been used by the IPCC: the confidence building, subjective Bayesian, and likelihood approaches. These are summarized in section 7 as follows:
As we have noted, WG1 AR4 often uses expert judgment that takes the results of the approaches we have been discussing, as well as partly model-independent approaches, into consideration in assigning final projection qualities. Insofar as final assignments are model based, however, the shared limitations of the approaches we have been discussing remain untouched. In particular, insofar as final assessments are model based, they face serious challenges when it comes to assessing projection quality in light of structural inadequacy, tuning and initial condition inaccuracy. Moreover, they continue to be challenged by the task of assigning probabilities and informative probability ranges to projections.
The main focus of this post is Section 8:
Assessing projections: what else can be done?
We now examine approaches that differ from those that play center stage in WG1 AR4. The first approach, the possibilist approach, is described in the climate science literature but is primarily non-probabilistic. The remaining approaches are philosophy-of-science-based approaches. There are currently four main, but not necessarily mutually exclusive, philosophical approaches to assessing scientific claims. One of these is the already discussed subjective Bayesian approach. The other three are those that are discussed below.
JC note: I focus here on possibilistic and severe testing approaches, which I think are the most promising for the climate problem.
The possibilist approach
On the possibilist approach, we should present the range of alternative projections provided by models as is, insisting that they are no more than possibilities to be taken into account by researchers and decision makers and that they provide only a lower bound to the maximal range of uncertainty. Climate model results should, accordingly, be presented using plots of the actual frequencies with which models have produced specific projections. At the same time, one can supplement projected ranges with informal, though sometimes probabilistic, assessments of confidence in projections that appeal, as the confidence building approach appeals, to inter-model agreement and agreement with physical theory.
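JC note: A minimal sketch of what a possibilist presentation might look like, assuming a purely synthetic multi-model ensemble (the ensemble size and numbers below are invented for illustration and are not from the paper). The point is that the plot reports only the raw frequencies with which models produce particular projections, not a probability density.

```python
# Sketch of a possibilist presentation: a histogram of the raw frequencies
# with which a hypothetical, synthetic multi-model ensemble produces specific
# end-of-century warming projections. No probability density is implied; the
# spread is read only as a lower bound on the range of possibilities.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical end-of-century GMST anomalies (K) from 30 ensemble members.
projections = rng.normal(loc=3.0, scale=0.8, size=30)

bins = np.arange(1.0, 6.0, 0.5)
counts, _ = np.histogram(projections, bins=bins)

plt.bar(bins[:-1], counts, width=0.45, align="edge")
plt.xlabel("Projected GMST anomaly (K)")
plt.ylabel("Number of models producing this projection")
plt.title("Ensemble projections shown as raw frequencies (possibilities only)")
plt.show()
```

Read this way, the spread is a lower bound on the range of possibilities rather than a forecast distribution.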
Informal approaches to assessing projection quality must address the same central challenges that quantitative approaches must address. So, insofar as the possibilist position allows informal probabilistic assessments of projection quality, it must address the difficulties that all probabilistic approaches face. However, one could easily purge the possibilist approach of all probabilistic elements and assess projections solely in terms of their being possibilities. Moreover, there are obvious ways to develop purely possibilistic assessment further. Purely possibilistic assessment can, in particular, be used to rank projections. Possibilities can, for example, be ranked in terms of how remote they are.
The purged possibilist approach would still face challenges. Presenting CMPs (climate model projections) as possibilities worthy of consideration involves taking a stance on how CMPs relate to reality. For example, if we are presented with an extreme climate sensitivity range of 2 to 11 K and are told that these are possibilities that should not have been neglected by AR3 WG1’s headline uncertainty ranges, a claim is implicitly being made about which climate behavior is a real possibility. It is implied that these possibilities are unlike, for example, the possibility that the United States will more than halve its budget deficit by 2015. Thus a possibilist assessment of projection quality needs to be accompanied by an examination of whether the projections are indeed real possibilities. The same considerations apply to ‘worst case scenarios’ when these are put forward as worthy of discussion in policy settings or research. The threat that arises when we do not make sure that possibilities being considered are real possibilities is that, just as we sometimes underestimate our certainty, we will sometimes exaggerate our uncertainty.
Nevertheless, the challenges facing purely possibilistic assessment are substantially more manageable than those facing probabilistic assessment. To say that something is a real possibility at some time t is, roughly, to say that it is consistent with the overall way things have been up until t and that nothing known excludes it. A case for a projection’s being a real possibility can, accordingly, be made just by arguing that we have an understanding of the overall way relevant aspects of the climate system are, showing that the projection’s correctness is consistent with this understanding and showing that we do not know that there is something that ensures that the projection is wrong. There is, as observed in discussing probabilistic representations of ignorance, no need to specify a full range of alternatives to the projection here. Further, state-of-the-art GCMs can sometimes play an important role in establishing that their projections are real possibilities. State-of-the-art GCMs’ projections of GMST are, for example and given the extent to which GCMs capture our knowledge of the climate system, real possibilities.
Severe testing, climate models and climate model projections
The remaining approach to assessing scientific claims that we will discuss is the severe testing approach. The idea behind the severe testing approach is that the deliberate search for error is the way to get to the truth. Thus, on this approach, we should assess scientific claims on the basis of how well they have withstood severe testing or probing of their weaknesses.
According to Popper, an empirical test of a theory or model is severe to the extent that background knowledge tells us that it is improbable that the theory or model will pass the test. Background knowledge consists in established theories or models other than those being tested.
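JC note: Popper's criterion is often formalized along the following lines (the notation h for the tested claim, e for the test outcome, and b for background knowledge is mine, not the paper's): a passing outcome counts as a severe test to the extent that it is expected if the claim is true but improbable on background knowledge alone.

```latex
% Illustrative formalization of Popperian severity (notation not from the paper):
\mathrm{Severity}(e, h, b) \;\propto\; p(e \mid h, b) \;-\; p(e \mid b)
```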
A crucial difference between the severe testing approach and the approaches pursued by WG1 AR4 is that the severe testing approach never allows mere agreement, or increased agreement, with observations to count in favor of a claim. That simulation of observed phenomena has been successful does not tell us how unexpected the data are and thus how severely the data have tested our claims. If, for example, the successful simulation is the result of tuning, then the success is not improbable, no severe test has been carried out and no increased confidence in model fitness for purpose is warranted. Notice, however, that the fact that claims are tested against in-sample data is not itself supposed to be problematic as long as the data does severely test the claims. Another crucial difference between the severe testing approach and those pursued by WG1 AR4 is that the severe testing approach is not probabilistic. The degree to which a set of claims have withstood severe tests, what Popper calls their degree of corroboration, is not a probability.
How might one apply a (Popperian) severe testing approach to assessing projection quality? What we need, from a severe testing perspective, is a framework that assigns a degree of corroboration to a CMP, p, as a function of how well the model (or ensemble of models), m, which generated p has withstood severe tests of its fitness for the purpose of doing so. Such severe tests would consist in examining the performance of some of those of m’s predictions the successes of which would be both relevant to assessing m’s fitness for the purpose of generating p and improbable in light of background knowledge. Assessing, for example, a GCM’s projection of 21st century GMST would involve assessing how well the GCM performs at severe tests of relevant predictions of 20th century climate and/or paleoclimate. That is, it would involve assessing how well the GCM performs at simulating relevant features of the climate system that we expect will seriously challenge its abilities. A relevant prediction will be one the accuracy of which is indicative of the accuracy of the projection of 21st century GMST. Examples of relevant features of the climate the accurate simulation of which will be a challenge to IPCC-AR5 models are the effects of strong ENSO events on the GMST, effects of Atlantic sea surface temperature variations (associated with the MOC) on the GMST and special aspects of the GMST such as its late-1930s and early-1940s positive trends. That these data will challenge IPCC-AR5 models is suggested by the difficulty CMIP3 models have in adequately simulating them.
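JC note: As a toy illustration of how such an assessment might be organized, the sketch below scores a model against a few severe tests. The test names follow the examples in the paragraph above; the severity weights, the pass/fail outcomes, and the scoring rule are all invented for illustration and are not prescribed by the paper.

```python
# Illustrative sketch: score a model's degree of corroboration from a handful
# of severe tests. Severity weights, pass/fail results, and the aggregation
# rule are hypothetical stand-ins for Popper's qualitative notion; the result
# is a comparative score, not a probability.
from dataclasses import dataclass

@dataclass
class SevereTest:
    name: str
    severity: float   # how improbable success is under background knowledge (0-1)
    passed: bool      # whether the model's hindcast/simulation passed the test

def degree_of_corroboration(tests):
    """Aggregate passed severe tests, weighted by severity; failures count against."""
    return sum(t.severity if t.passed else -t.severity for t in tests)

tests = [
    SevereTest("ENSO effect on GMST", severity=0.8, passed=True),
    SevereTest("Atlantic SST (MOC) effect on GMST", severity=0.7, passed=False),
    SevereTest("Late-1930s/early-1940s GMST trend", severity=0.9, passed=True),
]

print(f"Degree of corroboration: {degree_of_corroboration(tests):+.2f}")
```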
How might the severe testing approach help us with the difficulties involved in assessing projection quality? The severe testing approach allows us to bypass any worries we might have about tuning since it only counts success that does not result from tuning, success that surely does exist, in favor of CMPs. The severe testing approach can thus, at least, be used as a check on the results of approaches that do not take tuning into account. If, for example, the subjective Bayesian approach assigns a high probability to a projection and the severe testing approach gives the projection a high degree of corroboration, we can at least have some assurance that the probabilistic result is not undermined by tuning.
Underdetermination in choice between parameters/available parameterization schemes might also be addressed by the severe testing approach. Substituting different parameterization schemes into a model might result in varying degrees of corroboration, as might perturbing the model’s parameter settings. Where such variations exist, they allow ranking model fitness for purpose as a function of parameter settings/parameterization schemes. Similarly, degrees of corroboration can be used to rank fitness for purpose of models with different structures. The resulting assessment has, like assessment in terms of real possibilities, the advantage that it is less demanding than probabilistic assessment or assessment that is in terms of truth or approximate truth. Ranking two CMPs as to their degrees of corroboration, for example, only requires comparing the two CMPs. It does not require specifying the full range of alternatives to the CMPs. Nor does it require that we take some stand on how close the CMPs are to the truth, and thus that we take a stand on the effects of unknown structural inadequacy on CMP accuracy. Popper’s view is that a ranking in terms of degrees of corroboration only provides us with a ranking of our conjectures about the truth. The most highly corroborated claim would thus, on this suggestion, be our best conjecture about the truth. Being our best conjecture about the truth is, in principle, compatible with being far from the truth.
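JC note: Continuing the toy illustration, degrees of corroboration computed this way could be used to rank model variants that differ in parameterization scheme or parameter settings. The variant names and scores below are hypothetical; only the comparative ranking matters, and, as the paper stresses, it says nothing about how close any variant is to the truth.

```python
# Hypothetical ranking of model variants (differing in parameterization scheme
# or parameter settings) by corroboration score. The ranking is comparative
# only: the most highly corroborated variant is the best conjecture, which is
# compatible with it being far from the truth.
variant_scores = {
    "convection scheme A": 1.4,
    "convection scheme B": 0.2,
    "scheme A, perturbed entrainment": -0.5,
}

for name, score in sorted(variant_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:35s} corroboration {score:+.2f}")
```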
JC comments: While many of these topics have been covered previously at Climate Etc.,
- Probabilistic(?) estimates of climate sensitivity
- Should we assess climate models in light of severe tests?
- Scenario falsification
- The culture of building confidence in climate models
I like this paper because it provides an integrated framework for assessing climate model projection quality. The IPCC has been mostly relying on confidence building, subjective Bayesian and likelihood methods. Of these, confidence building is a keeper (especially if it includes formal verification and validation), and it should be supplemented by possibilist and severe testing approaches. The combination of possibilist and severe testing approaches provides the basis for scenario falsification, which is an approach that I have been arguing for.
Moderation note: This is a technical thread and comment will be moderated for relevance.
