by Judith Curry
As climate models become increasingly relevant to policy makers, they are being criticized for not undergoing a formal verification and validation (V&V) process analogous to that used in engineering and regulatory applications. Further, claims are being made that climate models have been falsified by failing to predict specific future events.
To date, establishing confidence in climate models has targeted the scientific community that develops and uses the models. As the climate models become increasingly policy relevant, it is critically important to address the public need for high-quality models for decision making and to establish public confidence in these models. An important element in establishing such confidence is to make the models as accessible as possible to the broader public and stakeholder community.
An overview of uncertainties associated with climate models was provided in last week’s post. Why do climate scientists have confidence in climate models? Is their confidence justified? With climate models being increasingly used to provide policy-relevant information, how should we proceed in building public confidence in them?
All models are imperfect; we don’t need a perfect model, just one that serves its purpose. Airplanes are designed using models that are inadequate in their ability to simulate turbulent flow. Financial models based upon crude assumptions about human behavior have been used for decades to manage risk. In the decision making process, models are used more or less depending on a variety of factors, one of which is the credibility of the model. Climate model simulations are being used as the basis for international climate and energy policy, so it is important to assess the adequacy of climate models for this purpose.
Confidence in weather prediction models
Some issues surrounding the culture of establishing confidence in climate models can be illuminated first by considering numerical weather prediction models. To my knowledge, nobody is clamoring for V&V of weather models; why not? Roger Pielke Jr. provides an interesting perspective on this in The Climate Fix:
Decision makers, including most of us as individuals, have enough experience with weather forecasts to be able to reliably characterize their uncertainty and make decisions in the context of that uncertainty. In the U.S., the National Weather Service issues millions of forecasts every year. This provides an extremely valuable body of information experience for calibrating forecasts in the context of decisions that depend on them. The remarkable reduction in loss of life from weather events over the past century is due in part to improved predictive capabilities, but just as important has been our ability to use predictions effectively despite their uncertainties.
The beginnings of numerical weather forecasting involved some of the giants of 20th century science, including John von Neumann. Numerical weather predictions began in 1955 in both the U.S. and Europe, based upon recent theoretical advances in dynamical meteorology, computational science, and computing capabilities. At this point, the public is interested in better weather forecast models, and weather forecast modeling centers are actively engaged in continuing model development. Model evaluation against observations and comparison of the skill scores of different model versions is a key element of such development.
This strategy for developing confidence is being extended to seasonal climate prediction models, which are based on coupled atmosphere/ocean models. A unified treatment of weather and climate models (i.e. the same dynamical cores for the atmosphere and ocean are used for models across the range of time scales) transfers confidence from the weather and seasonal climate forecast models to the climate models used in century-scale simulations. Confidence established in the atmospheric core as a result of the extensive cycles of evaluation and improvement of weather forecast models is important; however, caution is needed, since factors that have less import in weather models become significant in climate models, such as mass conservation and cloud and water vapor feedback processes.
Owing to their imperfections and the need for better models, models that predict weather, seasonal climate, and longer climate change are under continued development and evaluation.
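To make the idea of comparing skill scores between model versions concrete, here is a minimal sketch of one common choice, the mean-squared-error skill score measured against a reference forecast. The function names, the use of a climatological mean as the reference, and all of the numbers are my own assumptions for illustration; they are not taken from any forecasting center's practice.

```python
def mean_squared_error(forecast, observed):
    """Average squared difference between paired forecast and observed values."""
    if len(forecast) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)


def mse_skill_score(forecast, reference, observed):
    """1 is a perfect forecast, 0 matches the reference, negative is worse than it."""
    mse_ref = mean_squared_error(reference, observed)
    if mse_ref == 0:
        raise ValueError("reference forecast is perfect; skill score is undefined")
    return 1.0 - mean_squared_error(forecast, observed) / mse_ref


# Hypothetical numbers, invented for illustration (e.g. temperatures in deg C).
observed = [12.0, 14.0, 13.0, 15.0]
reference = [13.5, 13.5, 13.5, 13.5]   # climatological mean used as the baseline
model_v1 = [11.0, 15.0, 12.0, 16.0]    # older model version
model_v2 = [12.5, 14.5, 13.0, 14.5]    # newer model version

skill_v1 = mse_skill_score(model_v1, reference, observed)
skill_v2 = mse_skill_score(model_v2, reference, observed)
print(f"v1 skill: {skill_v1:.2f}, v2 skill: {skill_v2:.2f}")
```

With these made-up numbers the newer version scores higher than the older one. Operational centers use a much wider range of scores and variables; this only shows the shape of the comparison.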
Climate scientists’ perspectives on confidence in climate models
Before discussing confidence, I first bring up the issue of “comfort.” Comfort as used here is related to the sense that the model developers themselves have about their model, which includes the history of model development and the individuals that contributed to its development, the reputations of the various modeling groups, and the location of the model simulations in the spectrum of simulations made by competing models. In this context, comfort is a form of “truthiness” that does not translate into user confidence in the model, other than via an appeal to the authority of the modelers.
Scientists that evaluate climate models, develop physical process parameterizations, and utilize climate model results are convinced (at least to some degree) of the usefulness of climate models for their research. They are convinced by a “consilience of evidence” (Oreskes’ phrase) that includes the model’s relation to theory and physical understanding of the processes involved, consistency of the simulated responses among different models and different model versions, and the ability of the model and model components to simulate historical observations. Particularly for scientists that use the models rather than participate in model development/evaluation, the reputation of the modeling centers at leading government labs is also an important factor. Another factor is the “sanctioning” of these models by the IPCC (with its statements about model results having a high confidence level), which can lead to building confidence in climate models through circular reasoning.
Knutti 2008 describes the reasons for having confidence in climate models as follows:
- Models are based on physical principles such as conservation of energy, mass and angular momentum.
- Model results are consistent with our understanding of processes based on simpler models, conceptual or theoretical frameworks.
- Models reproduce the mean state and variability in many variables reasonably well, and continue to improve in simulating smaller-scale features.
- Models reproduce observed global trends and patterns in many variables.
- Models are tested on case studies such as volcanic eruptions and more distant past climate states.
- Multiple models agree on large scales, which is implicitly or explicitly interpreted as increasing our confidence.
- Projections from newer models are consistent with older ones (e.g. for temperature patterns and trends), indicating a certain robustness.
Paul Barton Levenson (h/t to Bart Verheggen) provides a useful summary with journal citations of climate model verification.
Challenges to building confidence in climate models
Model validation strategies depend on the intended application of the model. The application of greatest public relevance is projection of 21st century climate variability and change and the role of anthropogenic greenhouse gases. So how can we assess whether climate models are useful for this application? The discussion below is based upon Smith (2002) and Knutti (2008).
User confidence in a forecast model depends critically on the confirmation of forecasts, using both historical data (hindcasts, in-sample) and out-of-sample observations (forecasts). Confirmation with out-of-sample observations is possible for forecasts that have a short time horizon and can therefore be compared with subsequent observations (e.g. weather forecasts). Unless the model can capture or bound a phenomenon in hindcasts and previous forecasts, there is no expectation that the model can quantify the same phenomenon in subsequent forecasts. Capturing the phenomenon in hindcasts and previous forecasts is thus a necessary condition, but it does not in any way guarantee the ability of the model to capture the phenomenon in the future. If the distance of future simulations from the established range of model validity is small, it is reasonable to extend established confidence in the model to the perturbed future state. Extending such confidence requires that no crucial feedback mechanisms are missing from the model.
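The difference between in-sample and out-of-sample confirmation can be illustrated with a toy example. The sketch below calibrates a straight-line trend on an early period and then scores it both on that same period and on later data it has never seen. The linear trend is only a stand-in for a model, and the series, the variable names and the split point are all hypothetical.

```python
import math


def fit_linear_trend(times, values):
    """Ordinary least-squares fit of values = intercept + slope * time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return mean_v - slope * mean_t, slope


def rmse(times, values, intercept, slope):
    """Root-mean-square error of the fitted trend against the given data."""
    squared = [(intercept + slope * t - v) ** 2 for t, v in zip(times, values)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical series, invented for illustration; not real climate data.
calib_t = [0, 1, 2, 3, 4, 5]
calib_v = [0.0, 1.1, 1.9, 3.0, 4.1, 4.9]
later_t = [6, 7, 8, 9]
later_v = [6.5, 8.0, 9.5, 11.0]   # the system drifts away from the fitted trend

intercept, slope = fit_linear_trend(calib_t, calib_v)
in_sample_rmse = rmse(calib_t, calib_v, intercept, slope)
out_of_sample_rmse = rmse(later_t, later_v, intercept, slope)

print(f"in-sample RMSE:     {in_sample_rmse:.2f}")
print(f"out-of-sample RMSE: {out_of_sample_rmse:.2f}")
```

In this contrived case the fit looks excellent on the calibration period and degrades on the later period, which is the sense in which good hindcasts are necessary but not sufficient.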
Leonard Smith has stated that “There are many more ways to be wrong in a 10^6 dimensional space than there are ways to be right.” However, since they make millions of predictions, models will invariably get something right. Confirmation of climate models using historical data is relative. Different climate models (and different parameter choices within a climate model) simulate some aspects of the climate system well and others not so well. Some models are arguably better than others in some sort of overall sense, but falsification of such complex models is not really a meaningful concept.
Further, agreement between model and data does not imply that the model gets the correct answer for the right reasons; rather, such agreement merely indicates that the model is empirically adequate. For example, all of the coupled climate models used in the IPCC AR4 reproduce the time series for the 20th century of globally averaged surface temperature anomalies; yet they have different feedbacks and sensitivities and produce markedly different simulations of the 21st century climate.
Even for in-sample validation, there is no straightforward definition of model performance for complex non-deterministic models having millions of degrees of freedom. Because the models are not deterministic, multiple simulations are needed to compare with observations, and the number of simulations conducted by modeling centers is insufficient to create a pdf with a robust mean; hence bounding box approaches (assessing whether the range of the ensembles bounds the observations) are arguably a better way to establish empirical adequacy.
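A bounding box check of the kind described above can be sketched in a few lines. This is a minimal illustration under my own assumptions: the ensemble is a list of equal-length time series, the observations carry a single symmetric error bar (the hypothetical `obs_error` parameter), and all of the numbers are invented.

```python
def bounding_box_coverage(ensemble, observations, obs_error=0.0):
    """Fraction of time steps where the observation, widened by +/- obs_error,
    overlaps the range spanned by the ensemble members."""
    if not ensemble or any(len(m) != len(observations) for m in ensemble):
        raise ValueError("every ensemble member must match the observations in length")
    inside = 0
    for t, obs in enumerate(observations):
        values = [member[t] for member in ensemble]
        low, high = min(values), max(values)
        # Overlap test between [obs - err, obs + err] and [low, high].
        if obs + obs_error >= low and obs - obs_error <= high:
            inside += 1
    return inside / len(observations)


# Hypothetical three-member ensemble and observations (e.g. anomalies in deg C).
ensemble = [
    [0.10, 0.20, 0.35, 0.50],
    [0.00, 0.15, 0.30, 0.40],
    [0.05, 0.25, 0.45, 0.60],
]
observations = [0.02, 0.30, 0.40, 0.75]

coverage_exact = bounding_box_coverage(ensemble, observations)
coverage_with_error = bounding_box_coverage(ensemble, observations, obs_error=0.1)
print(coverage_exact, coverage_with_error)
```

The check avoids estimating a mean or pdf from a handful of runs, but it is only as informative as the ensemble is diverse: a very wide ensemble bounds almost anything.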
The climate community has been lax in grappling with the issue of establishing formal metrics of model performance and reference datasets to use in the evaluation. In comparing climate models with observations, issues to address include space/time averaging, which variables to use, and which statistics to compare (mean, pdf, correlations). A critical issue in such comparisons is having reliable global observational datasets with well characterized error statistics, which is a separate challenge in itself (which will be the subject of a series of posts after the new year). A further complication arises if datasets used in the model evaluation process are the same as those used for calibration, which gives rise to circular reasoning (confirming the antecedent) in the evaluation process.
An important element in evaluating a model is the performance of its component parts. Lenhard and Winsberg (2008, draft) argue that climate models suffer from a particularly severe form of confirmation holism. Confirmation holism in the context of a complex model implies that a single element of the model cannot be tested in isolation since each element depends on the other elements, and hence it is impossible to determine if the underlying theories are false by reference to the evidence. Owing to inadequacies in the observational data and confirmation holism, assessing empirical adequacy should not be the only method for judging a model. Winsberg points out that models should be justified internally, based on their own internal form, and not solely on the basis of what they produce. Each element of the model that is not properly understood and managed represents a potential threat to the simulation results.
Each climate modeling group evaluates its own model against certain observations. All of the global climate modeling groups participate in climate model intercomparison projects (MIPs). Knutti states: “So the best we can hope for is to demonstrate that the model does not violate our theoretical understanding of the system and that it is consistent with the available data within the observational uncertainty.” Expert judgment plays a large role in the assessment of confidence in climate models.
Standards for establishing confidence in simulation models
For models that are used in engineering and regulatory applications, standards have been developed for establishing confidence in simulation models. Models used to design an engineered system have different requirements and challenges from predictive models used in environmental regulation and resource management.
Verification and validation
In high-consequence decision making, including regulatory compliance, credibility in numerical simulation models is established by undergoing a verification and validation (V&V) process. Formal V&V procedures have been articulated by several government agencies and engineering professional societies. A lucid description of model V&V is given in this presentation by Charles Macall.
The goal of model verification is to make the model useful in the sense that the model addresses the right problem and provides accurate information about the system being modeled. Verification addresses issues such as implementation of the model in the computer, selection of input parameters, and logical structure of the model. Verification activities include code checking, flow diagrams of model logic, code documentation, and checking of each component.
Validation is concerned with whether the model is an accurate representation of the real system. Both model assumptions and model output undergo validation. Validation is usually achieved through model calibration, an iterative process of comparing the model to the actual system. Model validation depends on the purpose of the model and its intended use. Pathways to validation include model evaluation of case studies against observations, comparing multiple models, utilization of maximally diverse model ensembles, and assessment by subject matter experts. Challenges to model validation occur if controlled experiments cannot be performed on the system (e.g. there is one historic time series) or the model is not deterministic; each of these conditions implies that the model cannot be falsified.
Oreskes et al. (1994) claim that model V&V is impossible and logically precluded for open-ended systems such as natural systems. Oreskes (1998) argues for model evaluation (not validation), whereby model quality can be evaluated on the basis of the underlying scientific principles, the quantity and quality of input parameters, and the ability of a model to reproduce independent empirical data. Much of the debate between validation/verification versus evaluation seems to me to be semantic (and Oreskes does not use the V&V terms in the practical sense employed by engineers).
Steve Easterbrook has a superb post on what a V&V process might look like for climate models, and includes reference to an excellent paper by Pope and Davies. Easterbrook states “Verification and Validation for [Earth System Models] is hard because running the models is an expensive proposition (a fully coupled simulation run can take weeks to complete), and because there is rarely a “correct” result – expert judgment is needed to assess the model outputs.” But he rises to the challenge and makes some very interesting and valuable suggestions (I won’t attempt to summarize them here, read Easterbrook’s post, in fact read it twice).
Building confidence in environmental models
The U.S. National Research Council (NRC) published a report in 2007 entitled “Models in Environmental Regulatory Decision Making,” which addresses models of particular relevance to the U.S. Environmental Protection Agency (EPA). In light of EPA’s policy on greenhouse gas emissions, there is little question that climate models should be included under this rubric. The main issue addressed in the NRC report is summarized in this statement:
“Evaluation of regulatory models also must address a more complex set of trade-offs than evaluation of research models for the same class of models. Regulatory model evaluation must consider how accurately a particular model application represents the system of interest while being reproducible, transparent, and useful for the regulatory decision at hand. Meeting these needs may require different forms of peer review, uncertainty analysis, and extrapolation methods. It also implies that regulatory models should be managed in a way to enhance models in a timely manner and assist users and others to understand a model’s conceptual basis, assumptions, input data requirements, and life history.”
The report describes standards for model evaluation; principles for model development, selection, and application; and model management. And finally the report states:
EPA should continue to develop initiatives to ensure that its regulatory models are as accessible as possible to the broader public and stakeholder community. . . The level of effort should be commensurate with the impact of the model use. It is most important to highlight the critical model assumptions, particularly the conceptual basis for a model and the sources of significant uncertainty. . . The committee anticipates that its recommendations will be met with some resistance because of the potentially substantial resources needed for implementing life-cycle model evaluation. However, given the critical importance of having high-quality models for decision making, such investments are essential if environmental regulatory modeling is to meet challenges now and in the future.
These are standards that should apply to climate models that are being used in a predictive sense.
It’s time for a culture change
Climate model development has followed a pathway mostly driven by scientific curiosity and computational limitations. As they have matured, climate models are being increasingly used to provide decision-relevant information to end users and policy makers, whose needs are helping define the focus of model development in terms of increasing prediction skill on regional and decadal time scales.
It seems that now is the time for climate models to begin transitioning to a formal evaluation process that is suited to the nature of climate models and the applications for which they are increasingly being designed. Ideas for validation strategies suitable for climate models should be sought from the computer science and engineering fields. The climate community needs to grapple with the issue of model validation in a much more serious way, including the establishment of global satellite climate data records. Knutti states that “Only a handful of people think about how to make the best use of the enormous amounts of data, how to synthesize data for the non-expert, how to effectively communicate the results and how to characterize uncertainty.”
V&V is needed not just for public relations and public credibility, but also to quantify progress and uncertainty and so to guide the allocation of resources in continued model improvement.
So, where do we go from here? Expecting internal V&V from the climate modeling centers would require convincing resource managers that this is needed. Not easy, but in the U.S., the EPA could become a driver for this. Many of the climate modeling centers are making their code publicly accessible, which is a step in the right direction. Independent V&V in an open environment (enabled by the internet and the blogosphere) may be the way to go (I know there are some efforts along these lines that are out there, please provide links). I look forward to your ideas and suggestions.