by Judith Curry
On the thread building confidence in climate models, a small amount of text was devoted to verification and validation (V&V). In raising the level of the game, I included the following bullet:
• Fully documented verification and validation of climate models
Steve Easterbrook objects to this statement; Dan Hughes objects to Steve Easterbrook’s objection. On this thread, I further explore this issue and describe how climate models are actually evaluated. And I ponder the issue of what actually makes sense in terms of climate model V&V, given the purposes for which they are used.
For a relatively short, general reference on V&V, see Sargent.
One issue of particular concern in the context of climate models is the existence of adequate documentation for the model’s conceptual basis, assumptions, solution methods, sources of significant uncertainty, input data, and model history. This documentation is needed for the models to be useful and credible for scientists outside the core modeling group, and for public accountability when these models are used for decision making. Further, the validation of climate models and their evaluation against observation is inadequate (and much less extensive than it could/should be), particularly in the context of fitness for many of the purposes for which they are being used.
Model “comfort”
On the building confidence thread, I introduced the concept of “comfort” with regard to climate models. Comfort as used here is related to the sense that the model developers themselves have about their model, which includes the history of model development and the individuals that contributed to its development, the reputations of the various modeling groups, and consistency of the simulated responses among different models and different model versions.
Scientists that evaluate climate models, develop physical process parameterizations, and utilize climate model results become convinced that the models are useful by the model’s relation to theory and physical understanding of the processes involved, consistency of the simulated responses among different models, and the ability of the model and model components to simulate historical observations. Particularly for scientists that use the models rather than participate in model development/evaluation, the reputation of the modeling centers at leading government labs is also an important factor. Another factor is the “sanctioning” of these models by the IPCC (with its statements about model results having a high confidence level).
Knutti states: “So the best we can hope for is to demonstrate that the model does not violate our theoretical understanding of the system and that it is consistent with the available data within the observational uncertainty.” Expert judgment plays a large role in the assessment of confidence in climate models.
Rising discomfort
My perception of climate models and “comfort” comes from reading papers about the models and their applications over the past two decades, as a participant in some of this research, and as a member of several national and international committees and working groups that have dealt with the improvement and evaluation of climate models. My own engagement with the climate model development community has been in the area of parameterization development and model component evaluation. In the 1990s I served on the Executive Committee of the DOE ARM program, and from 1998-2003 I served on the Steering Committee for the WCRP GCSS Programme and as Chair of the Working Group on Polar Clouds (note these are two of the programs mentioned in IPCC Chapter 8; I am a coauthor on Randall et al. 2003). My particular involvement focused on improving parameterizations of radiation, clouds and sea ice in the Arctic and evaluating these components in climate models. My recommendations on this topic remain on the website of NASA’s Modeling and Analysis Program (MAP).
Circa 2005, my frustrations with the climate modeling community were with the slow diffusion of new research and parameterization development into climate models, the relatively few modeling groups actively engaging in these model evaluation programs, and the proliferation of the model intercomparison projects (MIPs) instead of a serious program to evaluate the climate models using observations. In spite of this, my confidence in climate models was bolstered by the strong agreement between the 20th century simulations and the time series of global surface temperature anomalies (e.g. Fig 2 Meehl et al.).
A seminal event in my thinking on this subject was the climateaudit thread, which evolved into a discussion of the V&V of weather and climate models. I defended the lack of formal V&V for climate models. I described extensively the documentation for the ECMWF weather forecast models and the NCAR climate model, the models that I am most familiar with and arguably have the most extensive documentation. On this thread, I first encountered Lucia Liljegren, Steve Mosher, and Dan Hughes, and I learned much about the process and standards of V&V in various contexts (you have to get deep into the comments before encountering this discussion).
I continued an email conversation on this topic with Dan Hughes, and he recommended that I read Roache’s book, which I did. The other thing that I learned from the climateaudit thread is that the “comfort” that I had with climate models did not at all translate into confidence for the broader technical community that was interested in climate models. The comfort that I had developed was viewed as a sort of “truthiness” that relied on appealing to the authority of the climate modelers.
With the increasing use of climate models for policy (e.g. the UNFCCC CO2 stabilization targets and the U.S. EPA endangerment finding), I came to appreciate the need for better documentation of climate models and public availability of the codes. My recent investigation into the climate model simulations used in the IPCC’s detection and attribution analysis has led to a reduction in my confidence in climate models owing to the bootstrapped plausibility of the attribution analysis.
And finally, I acknowledge my personal frustration in several attempts to find out specific details of several models, and having been misled in my understanding of a nontrivial aspect of climate model structural form. As an outgrowth of my work on parameterization of cloud microphysical processes, I have started wondering whether the structural form of the atmospheric core is consistent with the types of cloud parameterizations that are increasingly being incorporated into climate models. I had been reading a paper by Bannon on multicomponent fluids and multiphase flows. One of the particular concerns that I had was the lack of account for condensed water phases in the mass continuity equation, which is described by Thuburn (2008):
Moist processes are strongly nonlinear and are likely to be particularly sensitive to imperfections in conservation of water. Thus there is a very strong argument for requiring a dynamical core to conserve mass of air, water, and long-lived tracers, particularly for climate simulation. Currently most if not all atmospheric models fail to make proper allowance for the change in mass of an air parcel when water vapour condenses and precipitates out. A typical formulation in terms of virtual temperature implicitly replaces the condensed water vapor by an equal volume of dry air. This approximation can lead to noticeable forecast errors in surface pressure during heavy precipitation, for example. However, the approximation will not lead to a systematic long term drift in the atmospheric mass in climate simulations provided there is no long term drift in the mean water content of the atmosphere.
Anastassia Makarieva’s research on water vapor resonated with me in terms of this specific issue. At the Air Vent, Kerry Emanuel pointed out the experimental inclusion of this effect in a mesoscale model that simulated a hurricane. The climate model thread and the thread on Makarieva’s latest paper prompted emails from GFDL that said:
I just want to point out that this effect was correct implemented in GFDL’s CM2.1 (which was used in IPCC AR4) and all models developed at GFDL since. In fact, the NASA GEOS-4 and GEOS-5 GCMs also shared the same attribute (because we used the same FV dynamical core; this core is also being used at NCAR for AR5 experiments). . . The “vertically Lagrangian control-volume discretization” allows us to have a local mass sink/source due to moisture changes. All other climate models that I know can’t accomplish this because the vertical coordinate is tied directly to surface pressure.
Well, the good news is that the GFDL models are correctly handling the condensed water in the mass continuity equation (I can’t tell from the email whether or not the NASA and NCAR models are actually including this). The bad news is that Thuburn, Emanuel, Makarieva, and I did not know about this in spite of attempts to investigate this issue.
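For a rough sense of the scale of the surface pressure effect mentioned in the Thuburn passage quoted above, here is a back-of-envelope sketch in Python. It is not taken from any of the models discussed; it simply assumes hydrostatic balance and assumes that the precipitated water leaves the column with nothing flowing in to replace it (in a real storm, moisture convergence offsets much of this). The function name and the sample rainfall amounts are my own illustrative choices.

```python
# Idealized estimate: hydrostatic surface pressure change when liquid water
# leaves an atmospheric column. Assumes no lateral inflow replaces the mass.

G = 9.81            # gravitational acceleration, m/s^2
RHO_WATER = 1000.0  # density of liquid water, kg/m^3


def surface_pressure_drop_hpa(precip_mm: float) -> float:
    """Pressure change (hPa) from removing precip_mm of liquid water."""
    # 1 mm of rain over 1 m^2 is 1 kg of water.
    mass_per_area = RHO_WATER * precip_mm / 1000.0  # kg/m^2
    pressure_pa = G * mass_per_area                 # weight per unit area, Pa
    return pressure_pa / 100.0                      # Pa -> hPa


if __name__ == "__main__":
    for precip in (10.0, 50.0, 100.0):
        drop = surface_pressure_drop_hpa(precip)
        print(f"{precip:6.1f} mm of rain -> about {drop:5.2f} hPa")
```

Under these assumptions, each 10 mm of rain corresponds to roughly 1 hPa of column weight, which is why the omission matters most during heavy precipitation and much less for the long-term mean.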
Climate model validation
Chapter 8 of the IPCC AR4 summarizes climate model verification efforts circa 2005. These include component level evaluation (such as GCSS and ARM, but relatively few models do this) and evaluation of the full outputs of the model. Each of the modeling groups does some sort of model evaluation against climatology, but the extent of this evaluation varies widely among the different modeling centers. At the time of the AR4, the evaluation of the full models focused on model intercomparison projects (MIPs), which are described in Chapter 8 as follows:
The global model intercomparison activities that began in the late 1980s (e.g., Cess et al., 1989), and continued with the Atmospheric Model Intercomparison Project (AMIP), have now proliferated to include several dozen model intercomparison projects covering virtually all climate model components and various coupled model configurations (see http://www.clivar.org/science/mips.php for a summary).
Only a few of the MIPs have any significant component related to detailed comparison with actual global observations. Such a comparison does not seem to have been the main purpose of the MIPs:
Overall, the vigorous, ongoing intercomparison activities have increased communication among modelling groups, allowed rapid identification and correction of modelling errors and encouraged the creation of standardised benchmark calculations, as well as a more complete and systematic record of modelling progress.
While such MIPs can identify outlier models that may motivate investigation of a possible model problem, the issue of confirmation holism (discussed on the building confidence thread) precludes the identification of the source of the model problem from such activities. At this point, given the relative convergence of the different models, the MIPs are mainly a source of “comfort” and do not make a serious contribution to model validation, IMO.
Recent developments
A more comprehensive framework for climate model validation is presented by Pope and Davies, which includes simplified tests, testing of parameterizations in single-column models, dynamical core tests, simulations of the idealized aquaplanet, climate model intercomparisons, double call tests, spin up tendencies, and evaluation in numerical weather prediction mode. Several of these methods are used by all climate modeling centers; others are merely proposed strategies for climate model evaluation that have not been implemented in a formal way by any of the climate modeling centers.
There is a new paper by Neelin et al. that is hot off the press: Considerations for parameter optimization and sensitivity in climate models. This paper is of particular importance in the context of parameter optimization and validation for climate sensitivity applications and simulation of regional climate.
In the past few years, there have been some very encouraging developments regarding climate model validation using global observational data sets from satellites and reanalyses from numerical weather prediction models. Gleckler et al. evaluate all of the CMIP3 models in terms of the climatology of the atmospheric fields. (Note: this paper is behind a paywall; for a PDF presentation, google “Gleckler performance metrics climate models” and a PDF presentation on the NCAR website pops up.) Santer et al. evaluate the mean state, the annual cycle, and the variability associated with El Niño.
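To make the idea of a climatology-based performance metric concrete, here is a minimal Python sketch. It is not claimed to reproduce the method of any of the papers cited; it assumes a regular latitude-longitude grid, uses cosine-of-latitude weights as a stand-in for grid cell area, and ranks models by their error relative to the multi-model median. The model names and the synthetic “observed” field are invented purely for illustration.

```python
import math
from statistics import median


def area_weighted_rmse(model, obs, lats_deg):
    """RMSE between two [lat][lon] fields, weighted by cos(latitude)."""
    num = 0.0
    den = 0.0
    for row_m, row_o, lat in zip(model, obs, lats_deg):
        w = math.cos(math.radians(lat))  # proxy for grid cell area
        for m, o in zip(row_m, row_o):
            num += w * (m - o) ** 2
            den += w
    return math.sqrt(num / den)


def relative_errors(rmse_by_model):
    """Express each model's RMSE relative to the multi-model median."""
    med = median(rmse_by_model.values())
    return {name: (err - med) / med for name, err in rmse_by_model.items()}


# Synthetic example: an invented "observed" temperature climatology (K)
# and three hypothetical models that differ from it by a uniform bias.
lats = [-60.0, -30.0, 0.0, 30.0, 60.0]
obs = [[250.0 + 40.0 * math.cos(math.radians(lat))] * 4 for lat in lats]
models = {
    "model_a": [[v + 1.0 for v in row] for row in obs],
    "model_b": [[v - 2.0 for v in row] for row in obs],
    "model_c": [[v + 3.0 for v in row] for row in obs],
}

rmse = {name: area_weighted_rmse(field, obs, lats) for name, field in models.items()}
rel = relative_errors(rmse)

if __name__ == "__main__":
    for name in sorted(models):
        print(f"{name}: RMSE = {rmse[name]:.2f} K, relative error = {rel[name]:+.2f}")
```

A negative relative error means the model does better than the typical model for that field. Note that such a ranking says nothing about whether the typical model is itself adequate for a given purpose, which is the question taken up below.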
Under the auspices of the WCRP WGNE, there is a recent report from the Climate Model Metrics panel (led by Gleckler), which reflects a major leap forward. For the CMIP5 simulations, this group is conducting the metrics analysis and including the results in the model documentation, with codes and observations to be made publicly available.
Finally! Some serious evaluation of climate models is occurring. But are we evaluating the models in the “right” way? How should we interpret the results of such evaluations?
Philosophy of climate model validation
The main references of relevance are:
- Randall and Wielicki
- Naomi Oreskes
- Elisabeth Lloyd (behind paywall)
- Wendy Parker (behind paywall)
I originally intended to write more on this, but this post is getting too long. The references are here for those that are interested.
Whither climate model V&V?
Climate modeling centers are already doing some level of model verification and validation, but the documentation (if it exists) is scattered. However, the overall treatment of the V&V problem by the climate modeling centers is in the context of the models being used by the research community. As I’ve stated previously, I do not regard the current level of documentation of these models to be adequate to conduct and assess the research applications.
Even for the research applications, improved documentation (easily accessible in one place and more comprehensive than what is written in journal articles) is needed that:
1. Describes the fundamental continuous equations and the assumptions that are made, and the numerical solution methods for the code.
2. Demonstrates that the code is correctly solving the equation set of the models and that the solution methods are stable and convergent.
3. Describes in detail the model calibration/tuning process and the data that were used in the calibration.
4. Describes the code structure, logic, and overall execution procedures.
5. States the model strengths, weaknesses and limitations explicitly and unequivocally.
6. Describes in detail how to use the code.
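As an illustration of what item 2 asks for, here is a toy Python sketch of a grid refinement test of the kind used in code verification. It has nothing to do with any actual climate model code: it assumes a simple decay equation, dy/dt = -y, with a known exact solution, integrates it with forward Euler, and checks that the error shrinks at the rate the method is supposed to deliver (first order). The function names and step counts are my own choices.

```python
import math


def euler_decay(n_steps, t_end=1.0):
    """Integrate dy/dt = -y, y(0) = 1, to t_end with forward Euler."""
    dt = t_end / n_steps
    y = 1.0
    for _ in range(n_steps):
        y += dt * (-y)
    return y


def observed_order(n_coarse, t_end=1.0):
    """Estimate the order of accuracy by halving the time step once."""
    exact = math.exp(-t_end)
    err_coarse = abs(euler_decay(n_coarse, t_end) - exact)
    err_fine = abs(euler_decay(2 * n_coarse, t_end) - exact)
    # If error ~ C * dt**p, halving dt divides the error by 2**p.
    return math.log2(err_coarse / err_fine)


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(f"n = {n:4d}: observed order = {observed_order(n):.3f}")
```

If the observed order comes out close to the theoretical order of the scheme, that is evidence the code is solving the equations it claims to solve; a mismatch points to a coding error or an unstable step size. A full climate model has no exact solution to compare against, so this kind of test would have to be applied to components or to idealized configurations.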
Decision makers and the public may want more in the way of verification and a formal V&V (e.g. Software Quality Assurance), as a way of promoting model robustness or for political reasons. The increasing application of climate models to policy making drives a need for V&V documentation; this need is political, and such documentation is a key element in enabling users of the models and third parties (including skeptics) to assess the material.
The U.S. National Research Council (NRC) report entitled “Models in Environmental Regulatory Decision Making” addresses requirements for models of particular relevance to the U.S. Environmental Protection Agency (EPA). In light of EPA’s policy on greenhouse gas emissions, there is little question that climate models should be included under this rubric. The main issue addressed in the NRC report is summarized in this statement:
“Evaluation of regulatory models also must address a more complex set of trade-offs than evaluation of research models for the same class of models. Regulatory model evaluation must consider how accurately a particular model application represents the system of interest while being reproducible, transparent, and useful for the regulatory decision at hand. It also implies that regulatory models should be managed in a way to enhance models in a timely manner and assist users and others to understand a model’s conceptual basis, assumptions, input data requirements, and life history. . . EPA should continue to develop initiatives to ensure that its regulatory models are as accessible as possible to the broader public and stakeholder community. . . It is most important to highlight the critical model assumptions, particularly the conceptual basis for a model and the sources of significant uncertainty.”
The traditional standards and methods of model V&V for the computer software industry and engineering and regulatory applications are designed for specific types of applications of software and models. To make sense for climate models, a sensible and useful V&V protocol needs to consider the nature of the models and their applications. Particular considerations for climate models include their continual and ongoing development, and the need to manage and document the inherent uncertainty.
A tension exists between spending time and resources on V&V on the current model, versus improving the model. While continued model development will further scientific research, it does not seem likely in the short term that model developments will substantially change the models in terms of climate sensitivity or their fidelity in modeling regional climate change. Hence, in terms of policy applications, V&V is justified, even if it means slowing down progress in model development. Depending on the actual formulation of the V&V process, the cost in money and time doesn’t need to be onerous; if the reasons for V&V are political, then presumably the funds will be found.
And finally, there is the issue of semantics when discussing V&V. Oreskes says “evaluation (not validation),” interpreting validation to be something different from that interpreted by engineers and software specialists. By “formal,” I mean something that is well documented (which is presumably something different from what Steve Easterbrook infers from the word). By “independent,” I do not mean the full industrial version of independent V&V, but rather enabling anyone to independently assess and use the model (including in open source/crowd source environments).
So, what shall it be?
- v&v (lowercase) business as usual?
- V&V lite with better documentation and model evaluation?
- fully implemented V&V into the model development process?
