by Judith Curry
This question is posed and addressed in a recent article by Joel Katzav in EOS.
For some background information on how climate models are evaluated, see these previous posts:
Joel Katzav is a Professor in the Department of Philosophy and Ethics at the Eindhoven University of Technology, the Netherlands. His primary areas of research are metaphysics, the philosophy of science, the philosophy of technology, the methodology of philosophy and practical reasoning. His work is currently focused on existence, laws of nature and the epistemology of climate models.
Katzav, J. (2011) ‘Should we assess climate model predictions in light of severe tests?’, Eos, Transactions, American Geophysical Union, 92(23), p. 195.
According to Austro-British philosopher Karl Popper, a system of theoretical claims is scientific only if it is methodologically falsifiable, i.e., only if systematic attempts to falsify or severely test the system are being carried out. He holds that a test of a theoretical system is severe if and only if it is a test of the applicability of the system to a case in which the system’s failure is likely in light of background knowledge, i.e., in light of scientific assumptions other than those of the system being tested.
An implication of Popper’s above condition for being a scientific theoretical system is the injunction to assess theoretical systems in light of how well they have withstood severe testing. Applying this injunction to assessing the quality of climate model predictions (CMPs), including climate model projections, would involve assigning a quality to each CMP as a function of how well it has withstood severe tests allowed by its implications for past, present, and near-future climate or, alternatively, as a function of how well the models that generated the CMP have withstood severe tests of their suitability for generating the CMP.
For example, a severe testing assessment of a CMP generated by a member of the ensemble of global climate models that will be relied on in the fifth assessment report of the Intergovernmental Panel on Climate Change (IPCC) might involve assessing how well the member has done at simulating data that are both relevant to determining its suitability for generating the CMP and unlikely in light of the ensemble of global climate models relied on in the IPCC fourth assessment report. Data capturing global mean surface temperature trends during the second half of the twentieth century are relatively well simulated by, and thus not unlikely in light of, the ensemble of global climate models on which the IPCC fourth assessment report relied. These data would, accordingly, not be expected to challenge global climate models developed since the fourth report and are thus unsuitable for severely testing models that will be relied on in the fifth IPCC report. Data capturing the positive global mean surface temperature trend during the late 1930s and early 1940s are not well simulated by the ensemble relied on in the fourth IPCC report. These data will thus better serve to test severely models used in the fifth IPCC report.
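The selection criterion in this example can be made concrete with a small sketch. It is not Katzav's procedure, only one way to read "unlikely in light of the previous ensemble": count how many members of the earlier ensemble reproduce an observed trend within some tolerance, and treat the data as candidates for a severe test when that fraction is low. The function names, the tolerance, the threshold and all trend values are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: all trend values below are made up, not real model output.

def agreement_fraction(ensemble_trends, observed_trend, tolerance):
    """Fraction of ensemble members whose trend is within `tolerance` of the observed trend."""
    hits = sum(1 for trend in ensemble_trends if abs(trend - observed_trend) <= tolerance)
    return hits / len(ensemble_trends)


def is_severe_test_candidate(ensemble_trends, observed_trend, tolerance, threshold=0.2):
    """Data are candidates for a severe test when few prior-ensemble members reproduce them."""
    return agreement_fraction(ensemble_trends, observed_trend, tolerance) < threshold


# Hypothetical trends in degrees C per decade from an earlier ensemble.
late_century_ensemble = [0.15, 0.17, 0.12, 0.18, 0.14]
late_century_observed = 0.16
early_century_ensemble = [0.02, 0.05, 0.00, 0.04, 0.03]
early_century_observed = 0.15

# Well-simulated data: every member agrees, so the data would not challenge newer models.
print(is_severe_test_candidate(late_century_ensemble, late_century_observed, tolerance=0.05))
# Poorly simulated data: no member agrees, so the data are better suited to severe testing.
print(is_severe_test_candidate(early_century_ensemble, early_century_observed, tolerance=0.05))
```

A real assessment would need a defensible measure of agreement and would have to account for observational uncertainty and internal variability; the hard threshold here only stands in for that judgment.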
An important question is whether Popper’s injunction should be applied in assessing CMP quality. As we will see, performance at severe tests currently plays a limited role in such assessment. I argue that this should change.
Severe Testing Assessment of CMPs: Current Situation
The scientific community has placed little emphasis on providing assessments of CMP quality in light of performance at severe tests. Consider, by way of illustration, the influential approach adopted by Randall et al. in chapter 8 of their contribution to the fourth IPCC report. This chapter explains why there is confidence in climate models thus: “Confidence in models comes from their physical basis, and their skill in representing observed climate and past climate changes”.
The focus in this quote, and elsewhere in the chapter, is on what model agreement with physical theory as well as model simulation accuracy confirm. Supposedly, better grounding in physical theory or increased accuracy in simulation of observed and past climate means increased confirmation of CMPs.
CMP quality is thus supposed to depend on simulation accuracy. However, simulation accuracy is not a measure of test severity. If, for example, a simulation’s agreement with data results from accommodation of the data, the agreement will not be unlikely, and therefore the data will not severely test the suitability of the model that generated the simulation for making any predictions.
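The point about accommodation can be illustrated with a toy calculation. In the sketch below, a single offset is tuned so that a simulation matches a calibration data set. Agreement with those data is then guaranteed by construction and so tests nothing, while data held out of the tuning can still expose a failure. The offset "model", the function names and all values are hypothetical, and no real climate model is tuned this simply.

```python
import math

# Illustrative sketch only: the series below are invented, not observations or model output.

def tune_offset(simulated, observed):
    """Choose the constant offset that minimizes squared error: the mean residual."""
    residuals = [obs - sim for sim, obs in zip(simulated, observed)]
    return sum(residuals) / len(residuals)


def rmse(simulated, observed, offset=0.0):
    """Root-mean-square error of the offset-adjusted simulation against observations."""
    squared = [(sim + offset - obs) ** 2 for sim, obs in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Calibration data: the offset is tuned to accommodate these values.
calibration_simulated = [0.0, 0.1, 0.2, 0.3]
calibration_observed = [0.5, 0.6, 0.7, 0.8]
offset = tune_offset(calibration_simulated, calibration_observed)

# Held-out data: not used in tuning, so agreement here is not guaranteed.
held_out_simulated = [0.4, 0.5]
held_out_observed = [0.6, 0.6]

in_sample_error = rmse(calibration_simulated, calibration_observed, offset)
out_of_sample_error = rmse(held_out_simulated, held_out_observed, offset)

print(f"in-sample RMSE:     {in_sample_error:.3f}")
print(f"out-of-sample RMSE: {out_of_sample_error:.3f}")
```

The near-zero in-sample error says nothing about the model's suitability for prediction, because the tuning step made that agreement all but certain. Only the held-out comparison could have gone badly, which is what gives it any severity as a test.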
Severe Testing Assessment of CMPs: Why Do It?
It appears, then, that a severe testing approach to assessing CMP quality would be novel. Should we, however, develop such an approach? Arguably, yes (see also comment 3 in the online supplement). First, as we have seen, a severe testing assessment of CMP quality does not count simulation successes that result from the accommodation of data in favor of CMPs. Thus, a severe testing assessment of CMP quality can help to address worries about relying on such successes, worries such as that these successes are not reliable guides to out-of-sample accuracy, and will provide important policy-relevant information as a result (see comment 4 in the online supplement).
Second, assessing CMP quality using a severe testing approach would assist in assessing the maturity of the science underlying CMPs. This is because the more mature a body of knowledge is, the easier it is to specify severe tests for its claims. Assume that we want to test a prediction severely. The prediction will have testable implications only when conjoined with a set of additional assumptions, including basic theory and quasi-empirical generalizations. So if we are severely to test the prediction, and not just the conjunction of the prediction and the additional assumptions, then the additional assumptions will have to be established independently of the prediction. Only then might the potential falsity of an implication of the conjunction of the prediction and the additional assumptions constitute a real potential challenge to assuming the truth of the prediction, as opposed merely to a challenge to the conjunction of the prediction and the additional assumptions. The more mature a science is, the more such independently established claims tend to be in place and the easier it is to specify severe tests (for an illustration, see comment 5 in the online supplement).
Although severe testing is not typically used in existing assessments of CMP quality, some severe testing of models and CMPs may already occur. Still, and this brings us to a third reason for using a severe testing approach to assessing CMP quality, applying such an approach would increase the extent to which severe testing is used, which, in turn, might help us to develop better CMPs. According to Popper, severe testing is the way in which science progresses and thus the way in which to uncover better predictions. Even if we don’t accept that a methodology based on severe testing is the only way in which we learn about the world, it is clearly one important way of doing so.
JC comment: Joel just emailed me this paper; somehow I missed it when it was first published. I really like this paper, and am excited to see a philosopher with Joel’s range of interests taking on the climate modeling problem.