by Judith Curry
On the thread building confidence in climate models, a small amount of text was devoted to verification and validation (V&V). In raising the level of the game, I included the following bullet:
• Fully documented verification and validation of climate models
Steve Easterbrook objects to this statement; Dan Hughes objects to Steve Easterbrook's objection. On this thread, I further explore this issue and describe how climate models are actually evaluated. And I ponder the issue of what actually makes sense in terms of climate model V&V, given the purposes for which they are used.
For a relatively short, general reference on V&V, see Sargent.
One issue of particular concern in the context of climate models is the existence of adequate documentation for the model’s conceptual basis, assumptions, solution methods, sources of significant uncertainty, input data, and model history. This documentation is needed for the models to be useful and credible for scientists outside the core modeling group, and for public accountability when these models are used for decision making. Further, the validation of climate models and their evaluation against observation is inadequate (and much less extensive than it could/should be), particularly in the context of fitness for many of the purposes for which they are being used.
On the building confidence thread, I introduced the concept of “comfort” with regard to climate models. Comfort as used here is related to the sense that the model developers themselves have about their model, which includes the history of model development and the individuals that contributed to its development, the reputations of the various modeling groups, and consistency of the simulated responses among different models and different model versions.
Scientists that evaluate climate models, develop physical process parameterizations, and utilize climate model results become convinced that the models are useful by the model’s relation to theory and physical understanding of the processes involved, consistency of the simulated responses among different models, and the ability of the model and model components to simulate historical observations. Particularly for scientists that use the models rather than participate in model development/evaluation, the reputation of the modeling centers at leading government labs is also an important factor. Another factor is the “sanctioning” of these models by the IPCC (with its statements about model results having a high confidence level).
Knutti states: “So the best we can hope for is to demonstrate that the model does not violate our theoretical understanding of the system and that it is consistent with the available data within the observational uncertainty.” Expert judgment plays a large role in the assessment of confidence in climate models.
My perception of climate models and “comfort” comes from reading papers about the models and their applications over the past two decades, as a participant in some of this research, and as a member of several national and international committees and working groups that have dealt with the improvement and evaluation of climate models. My own engagement with the climate model development community has been in the area of parameterization development and model component evaluation. In the 1990s I served on the Executive Committee of the DOE ARM program, and from 1998-2003 I served on the Steering Committee for the WCRP GCSS Programme and as Chair of the Working Group on Polar Clouds (note these are two of the programs mentioned in IPCC Chapter 8; I am a coauthor on Randall et al. 2003). My particular involvement focused on improving parameterizations of radiation, clouds, and sea ice in the Arctic and on evaluating these components in climate models. My recommendations on this topic remain on the website of NASA’s Modeling and Analysis Program (MAP).
Circa 2005, my frustrations with the climate modeling community were with the slow diffusion of new research and parameterization development into climate models, the relatively few modeling groups actively engaging in these model evaluation programs, and the proliferation of model intercomparison projects (MIPs) instead of a serious program to evaluate the climate models using observations. In spite of this, my confidence in climate models was bolstered by the strong agreement between the 20th century simulations and the time series of global surface temperature anomalies (e.g. Fig 2, Meehl et al.).
A seminal event in my thinking on this subject was the climateaudit thread, which evolved into a discussion of the V&V of weather and climate models. I defended the lack of formal V&V for climate models. I described extensively the documentation for the ECMWF weather forecast models and the NCAR climate model, the models that I am most familiar with and arguably have the most extensive documentation. On this thread, I first encountered Lucia Liljegren, Steve Mosher, and Dan Hughes, and I learned much about the process and standards of V&V in various contexts (you have to get deep into the comments before encountering this discussion).
I continued an email conversation on this topic with Dan Hughes, and he recommended that I read Roache’s book, which I did. The other thing that I learned from the climateaudit thread is that the “comfort” that I had with climate models did not at all translate into confidence for the broader technical community that was interested in climate models. The comfort that I had developed was viewed as a sort of “truthiness” that relied on appealing to the authority of the climate modelers.
With the increasing use of climate models for policy (e.g. the UNFCCC CO2 stabilization targets and the U.S. EPA endangerment finding), I came to appreciate the need for better documentation of climate models and public availability of the codes. My recent investigation into the climate model simulations used in the IPCC’s detection and attribution analysis has led to a reduction in my confidence in climate models owing to the bootstrapped plausibility of the attribution analysis.
And finally, I acknowledge my personal frustration in several attempts to find out specific details of several models, and having been misled in my understanding of a nontrivial aspect of climate model structural form. As an outgrowth of my work on parameterization of cloud microphysical processes, I have started wondering whether the structural form of the atmospheric core is consistent with the types of cloud parameterizations that are increasingly being incorporated into climate models. I had been reading a paper by Bannon on multicomponent fluids and multiphase flows. One of the particular concerns that I had was the lack of account for condensed water phases in the mass continuity equation, which is described by Thuburn (2008):
Moist processes are strongly nonlinear and are likely to be particularly sensitive to imperfections in conservation of water. Thus there is a very strong argument for requiring a dynamical core to conserve mass of air, water, and long-lived tracers, particularly for climate simulation. Currently most if not all atmospheric models fail to make proper allowance for the change in mass of an air parcel when water vapour condenses and precipitates out. A typical formulation in terms of virtual temperature implicitly replaces the condensed water vapor by an equal volume of dry air. This approximation can lead to noticeable forecast errors in surface pressure during heavy precipitation, for example. However, the approximation will not lead to a systematic long term drift in the atmospheric mass in climate simulations provided there is no long term drift in the mean water content of the atmosphere.
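The approximation Thuburn describes can be stated compactly. In a moist atmosphere, the continuity equation for total air mass should carry a sink when water mass leaves the parcel as precipitation; a minimal sketch, in my own notation rather than Thuburn's:

```latex
\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u})
  = -\,S_{\mathrm{precip}},
\qquad
\rho = \rho_d + \rho_v + \rho_c ,
```

where $\rho_d$, $\rho_v$, and $\rho_c$ are the densities of dry air, water vapor, and condensed water, $\mathbf{u}$ is the velocity, and $S_{\mathrm{precip}}$ is the rate at which water mass precipitates out of the parcel. The typical virtual-temperature formulation effectively sets the right-hand side to zero, which is why heavy precipitation can produce the surface pressure errors Thuburn mentions.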
Anastassia Makarieva’s research on water vapor resonated with me in terms of this specific issue. At the Air Vent, Kerry Emanuel pointed out the experimental inclusion of this effect in a mesoscale model that simulated a hurricane. The climate model thread and the thread on Makarieva’s latest paper prompted emails from GFDL, which said:
I just want to point out that this effect was correctly implemented in GFDL’s CM2.1 (which was used in IPCC AR4) and all models developed at GFDL since. In fact, the NASA GEOS-4 and GEOS-5 GCMs also shared the same attribute (because we used the same FV dynamical core; this core is also being used at NCAR for AR5 experiments). . . The “vertically Lagrangian control-volume discretization” allows us to have a local mass sink/source due to moisture changes. All other climate models that I know of can’t accomplish this because the vertical coordinate is tied directly to surface pressure.
Well, the good news is that the GFDL models are correctly handling the condensed water in the mass continuity equation (I can’t tell from the email whether or not the NASA and NCAR models are actually including this). The bad news is that Thuburn, Emanuel, Makarieva, and I did not know about this in spite of attempts to investigate this issue.
Climate model validation
Chapter 8 of the IPCC AR4 summarizes climate model verification efforts circa 2005. These include component-level evaluation (such as GCSS and ARM, though relatively few models do this) and evaluation of the full outputs of the model. Each of the modeling groups does some sort of model evaluation against climatology, but the extent of this evaluation varies widely among the different modeling centers. At the time of the AR4, the evaluation of the full models focused on model intercomparison projects (MIPs), which are described in Chapter 8 as follows:
The global model intercomparison activities that began in the late 1980s (e.g., Cess et al., 1989), and continued with the Atmospheric Model Intercomparison Project (AMIP), have now proliferated to include several dozen model intercomparison projects covering virtually all climate model components and various coupled model configurations (see http://www.clivar.org/science/mips.php for a summary).
Only a few of the MIPs have any significant component related to detailed comparison with actual global observations. Such a comparison does not seem to have been the main purpose of the MIPs:
Overall, the vigorous, ongoing intercomparison activities have increased communication among modelling groups, allowed rapid identification and correction of modelling errors and encouraged the creation of standardised benchmark calculations, as well as a more complete and systematic record of modelling progress.
While such MIPs can identify outlier models that may motivate investigation of a possible model problem, the issue of confirmation holism (discussed on the building confidence thread) precludes the identification of the source of the model problem from such activities. At this point, given the relative convergence of the different models, the MIPs are mainly a source of “comfort” and do not make a serious contribution to model validation, IMO.
A more comprehensive framework for climate model validation is presented by Pope and Davies, which includes simplified tests, testing of parameterizations in single-column models, dynamical core tests, simulations of the idealized aquaplanet, climate model intercomparisons, double-call tests, spin-up tendencies, and evaluation in numerical weather prediction mode. Several of these methods are used by all climate modeling centers; others are merely proposed strategies for climate model evaluation that have not been implemented in a formal way by any of the climate modeling centers.
There is a new paper by Neelin et al. that is hot off the press: Considerations for parameter optimization and sensitivity in climate models. This paper is of particular importance in the context of parameter optimization and validation for climate sensitivity applications and simulation of regional climate.
In the past few years, there have been some very encouraging developments regarding climate model validation using global observational data sets from satellites and reanalyses from numerical weather prediction models. Gleckler et al. evaluates all of the CMIP3 models in terms of the climatology of the atmospheric fields. (Note: this paper is behind a paywall; for a pdf presentation, google Gleckler performance metrics climate models; a pdf presentation on the NCAR website pops up.) Santer et al. evaluates the mean state, the annual cycle, and the variability associated with El Nino.
Under the auspices of the WCRP WGNE, there is a recent report from the Climate Model Metrics panel (led by Gleckler), that reflects a major leap forward. For the CMIP5 simulations, this group is conducting the metrics analysis and including the results in the model documentation, with codes and observations to be made publicly available.
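To make concrete what a "performance metrics" evaluation involves, here is a minimal sketch of an area-weighted RMSE metric and a relative-error ranking built from it. This is my own illustration, not the Gleckler et al. code; the function and variable names are hypothetical.

```python
import numpy as np

def rmse_metric(model_clim, obs_clim, lat):
    """Area-weighted RMSE of a model climatology against observations.

    model_clim, obs_clim: 2-D arrays on a (lat, lon) grid.
    lat: 1-D array of grid latitudes in degrees.
    """
    # Weight each grid cell by cos(latitude) so polar cells don't dominate.
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(obs_clim)
    err2 = (model_clim - obs_clim) ** 2
    return np.sqrt(np.average(err2, weights=w))

def relative_error(models, obs_clim, lat):
    """Relative error in the spirit of a multi-model 'portrait' diagram:
    each model's RMSE expressed as a fractional departure from the
    median RMSE across all models (negative = better than the median)."""
    rmses = {name: rmse_metric(m, obs_clim, lat) for name, m in models.items()}
    median = np.median(list(rmses.values()))
    return {name: (e - median) / median for name, e in rmses.items()}
```

In an actual metrics package this would be repeated over many variables (temperature, precipitation, radiative fluxes, ...) and seasons, producing the grid of colored scores that these reports display.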
Finally! Some serious evaluation of climate models is occurring. But are we evaluating the models in the “right” way? How should we interpret the results of such evaluations?
Philosophy of climate model validation
The main references of relevance are:
- Randall and Wielicki
- Naomi Oreskes
- Elisabeth Lloyd (behind paywall)
- Wendy Parker (behind paywall)
I originally intended to write more on this, but this post is getting too long. The references are here for those that are interested.
Whither climate model V&V?
Climate modeling centers are already doing some level of model verification and validation, but the documentation (if it exists) is scattered. However, the overall treatment of the V&V problem by the climate modeling centers is in the context of the models being used by the research community. As I’ve stated previously, I do not regard the current level of documentation of these models to be adequate to conduct and assess the research applications.
Even for the research applications, improved documentation (easily accessible in one place and more comprehensive than what is written in journal articles) is needed that:
1. Describes the fundamental continuous equations, the assumptions that are made, and the numerical solution methods for the code.
2. Demonstrates that the code is correctly solving the equation set of the models and that the solution methods are stable and convergent.
3. Describes in detail the model calibration/tuning process and the data that were used in the calibration.
4. Describes the code structure, logic, and overall execution procedures.
5. States model strengths, weaknesses, and limitations explicitly and unequivocally.
6. Describes in detail how to use the code.
Decision makers and the public may want more in the way of verification and a formal V&V (e.g. Software Quality Assurance), as a way of promoting model robustness or for political reasons. The increasing application of climate models to policy drives a need for V&V documentation that is publicly accessible and a key element in enabling users of the models and third parties (including skeptics) to assess the material.
The U.S. National Research Council (NRC) report entitled “Models in Environmental Regulatory Decision Making” addresses requirements for models of particular relevance to the U.S. Environmental Protection Agency (EPA). In light of EPA’s policy on greenhouse gas emissions, there is little question that climate models should be included under this rubric. The main issue addressed in the NRC report is summarized in this statement:
“Evaluation of regulatory models also must address a more complex set of trade-offs than evaluation of research models for the same class of models. Regulatory model evaluation must consider how accurately a particular model application represents the system of interest while being reproducible, transparent, and useful for the regulatory decision at hand. It also implies that regulatory models should be managed in a way to enhance models in a timely manner and assist users and others to understand a model’s conceptual basis, assumptions, input data requirements, and life history. . . EPA should continue to develop initiatives to ensure that its regulatory models are as accessible as possible to the broader public and stakeholder community. . . It is most important to highlight the critical model assumptions, particularly the conceptual basis for a model and the sources of significant uncertainty.”
The traditional standards and methods of model V&V for the computer software industry and engineering and regulatory applications are designed for specific types of applications of software and models. To make sense for climate models, a sensible and useful V&V protocol needs to consider the nature of the models and their applications. Particular considerations for climate models include their continual and ongoing development, and the need to manage and document the inherent uncertainty.
A tension exists between spending time and resources on V&V on the current model, versus improving the model. While continued model development will further scientific research, it does not seem likely in the short term that model developments will substantially change the models in terms of climate sensitivity or their fidelity in modeling regional climate change. Hence, in terms of policy applications, V&V is justified, even if it means slowing down progress in model development. Depending on the actual formulation of the V&V process, the cost in money and time doesn’t need to be onerous; if the reasons for V&V are political, then presumably the funds will be found.
And finally, there is the issue of semantics when discussing V&V. Oreskes says “evaluation (not validation),” interpreting validation to be something different from that interpreted by engineers and software specialists. By “formal,” I mean something that is well documented (which is presumably something different from what Steve Easterbrook infers from the word.) By “independent,” I do not mean the full industrial version of independent V&V, but rather to enable anyone to independently assess and use the model (including open source/crowd source environments).
So, what shall it be?
- v&v (lowercase) business as usual?
- V&V lite with better documentation and model evaluation?
- fully implemented V&V into the model development process?
The whole post is quite a read. I would suggest, though, that verification of climate models is over a very short term and therefore ‘linear’; longer term variation is likely highly non-linear and may not be captured at all in today’s parameterizations.
Perhaps that comment goes in the several posts ago skeptics arguments thread (I am completely out of time for that kind of fun) but that is my big problem with the models. There is too much self confidence in the parameterization of such a narrow range of temperatures/humidities/ocean currents/climates to even slightly consider that they might be projectable to more extreme ranges.
I must admit, I’ve only spent a few days inside the math of the CAM model, but it is hopelessly unable to predict extreme environments in its current state. There is no possibility for it to predict temperatures on a planet with a dry atmosphere or even a 20C warmer one. The thing is far too linearly parametrized according to far too many assumptions. There are literally dozens of loopholes where feedbacks could be missed. The spacing of Hadley cells is a good example. Where does the confidence come from that today’s models can predict even a few C of difference? I sure don’t have any.
Anyway, with all that, our Antarctic paper was finally accepted for publication. It turns out that Physicists, Engineers and Mathematicians can read thermometers too.
Jeff congrats on the paper, nice job!
Our deeply flawed anonymous review system still works on occasion!
With kind regards
I am sorry, but this leaves me quite cold. The only way to validate any model is to have it accurately predict what happens in the future on enough occasions that the results could not have occurred by chance alone. Anything else is not very useful.
If a model is not a useful predictive tool it is as much use as a chocolate teapot.
Gazing at the internal workings and at least making sure it actually does the job that its designers think it does, plus all the stuff with documentation and open source and all that is important. As an IT Manager I expect no less from any bit of software that I would choose to use for anything other than the most trivial applications.
But doing all that stuff (and it is a scandalous indictment of the modelling ‘professionals’ that it hasn’t been as much a part of their culture as drinking coffee or backing up their data :-) ) just gets your model to first base. It is the minimum entry fee.
Only when you have done all that can your model strut its stuff in the real game, which is to predict the future…and to do so in a way that its accuracy can be assessed. There is no point in a model asserting that we’re all going to hell in a handcart 10 years hence. Nobody who reads that prediction today will be alive to remember it; those who are will be entirely indifferent to it; and any policy maker who takes such a prediction at face value needs their head examining even more than usual.
Models can only be useful and credible if they build up a consistent track record of short-term successes. Of actual real quantified predictions that can be evaluated against real outside world measurements and observations. If a model cannot do this…and do it well, then it is junk..however much it agrees with another model (which is likely also junk).
The analogy here is with racing tipsters. There are a zillion people in the world who think that they can forecast the future to their profit. We call this behaviour gambling. And all gamblers will tell you that they are pretty successful, they break even, they do OK, beat the bookies etc..never that they are completely useless at it and have lost their shirts, shorts or worse. One learns to take their self-assessment with great pinches of sceptical salt.
But the guys who really put their butts on the line are the tipsters in the daily newspapers, on TV or by private arrangement. They make their living by making forecasts and being seen to be right consistently. No editor employs a tipster for long who never gets a winner in a week, or who can’t distinguish between a flat race and a steeplechase. It is no good him saying to the editor of the Daily Gee Gee that the tipster at Good Morning Punters said much the same as him and so they, despite the fact that their tipped horse came a bad last in a two horse race, are really both doing a good job.
The editor will rightly say that they got it wrong, show the door to Mr Bleeding Dreadful Predictor and welcome with open arms Ms Pretty Often Right.
So with climate models. Agreement with each other proves nothing (how did anybody who calls themself a scientist ever think that it did). The only useful output is accurate predictions…that must be reasonably short term to demonstrate competence and build confidence.
All the rest – all the theories that say ‘models predict’, or ‘we put factor x in our models and out pops the answer we want, so factor x is important’ or even worse..’we don’t need to do experiments because we have models’ – is total garbage if they can’t do this.
And since climate science in general, and AGW theory in particular, rely so heavily on models and few if any of them have ever passed this simple basic test of utility, it casts another huge shadow over the whole field – which is getting darker by the minute.
Aargh..early morning dreadful typo….many apologies,,,bring on more coffee.
‘There is no point in a model asserting that we’re all going to hell in a handcart 10 years hence’. should of course read
‘There is no point in a model asserting that we’re all going to hell in a handcart 100 years hence’.
I’d be delighted to see some 10 year predictions that could be checked by observation.
I abase myself for my error (any chance of an ‘edit’ facility like at the Torygraph?)
I enjoyed this Latimer. And don’t worry, when I saw the 10yr to hell in a handcart I assumed it was a typo, but it works. If we do go to hell in a handcart, we’ll be dead, we won’t remember, and those who aren’t dead will have plenty to worry about other than who predicted what 10yrs ago. :)
Surely you mean “100 years hence”! I, oldster tho’ I am, have definite hopes of hanging around for the next 10 years. Don’t you?
A Miracle Has Occurred! (2)
You guys will, I’m sure, be very interested in a recently published peer-reviewed article that gives a cogent explanation of exactly why nobody has been able to answer my Joe Sixpack question about the ability of climatology to forecast the future.
Because we can now move from the argument ‘there is no evidence, so I do not believe that they are any use’ to ‘we now have positive evidence that they cannot do the job’.
Please see here
I look forward to your considered observations on their conclusions that
‘It is claimed that GCMs provide credible quantitative estimates of future climate change, particularly at continental scales and above. Examining the local performance of the models at 55 points, we found that local projections do not correlate well with observed measurements. Furthermore, we found that the correlation at a large spatial scale, i.e. the contiguous USA, is worse than at the local scale’
which appears to me to be scientific speak for
‘these models are total crap’.
Snow in Copenhagen, Climategate, Confirmation that models are crap…you’d almost conclude that the Gods are trying to tell us something. That their chosen vessels this time come from near Mount Olympus may be no coincidence. ;-)
“If a model is not a useful predictive tool it is as much use as a chocolate teapot.”
I was unaware that climate models were so tasty. :)
Now that I know the models are such delectable morsels, I agree completely. Further, the models on which governments intend to base policy must at least be validated to some reasonable standard of reality.
If climate is chaotic, or spatiotemporally chaotic, then a model will never be predictive. An exception might be if there are chaotic variations superimposed on a trend that isn’t chaotic. In that case long term predictions could be made, but verification of the model would probably take decades of observation. However, even if the climate is absolutely chaotic, a proper model would still accurately reproduce its statistical behavior.
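The "chaotic variations superimposed on a non-chaotic trend" case is easy to illustrate with a toy example. This is my own sketch, using the logistic map as a stand-in for internal variability, nothing like a real GCM: individual years are unpredictable, yet a least-squares fit still recovers the underlying trend.

```python
import numpy as np

def logistic_anomaly(n, x0=0.4, r=3.99):
    """Chaotic 'internal variability': iterates of the logistic map,
    centred so the anomalies average roughly to zero."""
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1.0 - x[i - 1])
    return x - x.mean()

def simulated_record(years, trend_per_year, x0):
    """A linear trend plus scaled chaotic variability."""
    t = np.arange(years, dtype=float)
    return t * trend_per_year + 0.5 * logistic_anomaly(years, x0)

# Two records differing only in a 1e-6 perturbation of the chaotic state
# disagree year-by-year, yet a least-squares fit to either one recovers
# nearly the same underlying trend.
```

The verification point stands, though: distinguishing the fitted trend from the chaotic noise requires a long enough record, which for real climate means decades of observations.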
I’ll repeat my question ( sheesh I was a brat)
Easy question Dr. Curry. If GCMs are used to assess, base, craft, create, change, construe, delimit, determine, establish, finesse, form, gauge, influence, invalidate, inform, judge, juggle, limit, maintain, manage…… public policy, you will agree that they should pass a standard test and actually graduate.
No GCM left behind!
I might have evolved in some of my views since then…I’ll read what I wrote and see.
Easy question Dr. Curry. If GCMs are used to assess, base, craft, create, [etc.]…… public policy, you will agree that they should pass a standard test and actually graduate.
Yeah, I don’t like them either. But that’s just because I’m jealous as to how much insight they can give compared to the naive way we theorists like to approach these problems.
So what exactly do you have against GCMs besides possession of a diploma? Huh?
Easy question for you?
You miss my point. I have no issue with GCMs. I have an issue with a democracy of models. So, what I proposed was fewer models, and some sort of standard test to sort the wheat from the chaff. Simply, I would rather have 10 realizations from NCAR’s model than 5 realizations from them and 2 realizations from Bozo U, etc. I’m not alone in this position, as many (even in the climategate mails) have argued against the democracy of models, and AR4 itself suggested some sort of benchmarking to select models. Next.
I think you are correct, Steve. I would be for nations pooling resources to build a truly super computing platform shared by researchers.
Basically the way we would work with a validated model is this: all researchers get a copy of the validated model. You basically have to learn OPC (other people’s code). If you want to improve the model, of course you can, but your improvements have to be vetted by the model validation team and then shared back with everybody else.
Looking at the large collection of GCMs and even the differences in inputs does not engender much confidence in the process.
of course a lot of sharing is going on now, but the huge investment people have in legacy stuff would mean that people would come up with all sorts of irrational arguments against best practices here.
This is basically open source with centralised vetting, which seems to work well enough in the real world (e.g. Linux)
steven mosher | December 2, 2010 at 12:51 pm
“Basically the way we would work with a validated model is this: all researchers get a copy of the validated model.”
I disagree. I’ve had enough of “one true theory one true model one true policy” nonsense to last me a lifetime.
Bring on teams with solar based models, co2 based models, cloud based models and let’s have ’em all make predictions. Then see who cuts the mustard.
How else are you going to ‘validate’ them?
I don’t recall saying anything about ruling out anything you suggest.
“solar based model” should be a hoot
Sorry, I did indeed miss your point. Yes, Bozo U. needs to be discounted somehow, either by giving them less than a whole vote in the democracy of models or none at all.
@Alston This is basically open source with centralised vetting, which seems to work well enough in the real world (e.g. Linux)
Agreed. Usually one can come up with some objection or improvement to almost any suggestion, but what’s wrong with the Linux model? I can’t see any problem with it for climate modeling, in fact the Linux model is an even better model for climate modeling than for operating systems because of the huge revenue stream the latter can generate, witness Microsoft whose enterprise sales alone can easily support the labor of OS development and maintenance. One neither can nor should try to make a significant amount of money with climate modeling, which is much more of a pro bono academic enterprise than operating systems.
Strongly Agree: The CRITICAL issue is to avoid the “Woops factor” – to identify errors BEFORE they become VERY VERY EXPENSIVE!
See: ‘Programming error’ caused Russian rocket failure
Note how easily I can miss an error after “proofing”!
From my view (not a climate scientist, a mechanical/aero engineer), my thoughts are that better validation should lead to a better product . . . at least for ‘short term’ global circulation/weather models . . but . . ( . . I knew it, there’s always a ‘but’ . . . :-) ):
If solar activity matters, whether through solar wind/cosmic ray effects that can influence cloud development, or through the current low solar activity creating an effect high in the atmosphere that alters wind currents/circulation, then even the best validation of current models (which, as I understand it, don’t include these factors) won’t lead to accurate long term forecasting. And even if these factors were somehow included in the models, being variable and somewhat chaotic in nature, they still would not likely be captured accurately enough to yield accurate long term projections.
Indeed, the solar experts had a rough go of it http://www.swpc.noaa.gov/SolarCycle/SC24/index.html trying to predict the level for cycle 24. If solar activity is found to be fundamental to weather/climate, the predictability of the sun will need to precede accurate climate modeling.
Indeed, and a model containing even one tiny error, but integrated over a long period of time, could end up producing a nonsensical projection. How is that prevented in current models?
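The classic illustration of why a tiny error can swamp a long integration is the Lorenz (1963) system, a three-variable toy model rather than a GCM; a minimal sketch:

```python
import numpy as np

def lorenz_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz (1963) equations."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return np.array([x + dt * dx, y + dt * dy, z + dt * dz])

def run(state, steps):
    """Integrate from an initial state, returning the full trajectory."""
    traj = [np.asarray(state, dtype=float)]
    for _ in range(steps):
        traj.append(lorenz_step(traj[-1]))
    return np.array(traj)

# A 1e-9 'rounding error' in the initial state grows exponentially until
# the two trajectories are as different as two unrelated states on the
# attractor.
```

This growth cannot be prevented in a chaotic system; it is the reason climate projections are framed as the statistics of ensembles of runs rather than as single long trajectories.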
I do try not to post much, since I’m here in hard-nosed learning mode, but I really do appreciate your genuine efforts to manoeuvre around the paywall issue
Being too ignorant about parametrization to ask a sophisticated question, I’ll ask a naive one. To what extent are increasing computer power (and thus reduced grid sizes) reducing the need for and impact of parametrizations?
to some extent this helps, especially for cloud and convection parameterizations
But only some. Basically it’s a lost cause. Your faster CPU may promise to save the day, but at the end of that day the butterfly effect is even faster.
Vaughan – Aren’t you conflating two issues? The first is the uncertainty associated with parametrizations, which can be reduced if the areas being parametrized shrink relative to the entire system. The second is the uncertainty associated with chaotic elements in the climate system; although these elements appear to exist, climate as a whole does not appear to behave in a highly chaotic manner on a global scale averaged over many decades. Indeed, it is rather predictable: I can state with confidence that it will be colder in January in Fairbanks, Alaska than in Miami, Florida, and this prediction is equally certain to be realized whether I make it on a day that is unseasonably warm or unseasonably cold, or whether it is raining or the sun is shining.
This comment also addresses the question raised by Brian H. Climate is weather averaged over multiple areas and/or long intervals. When the averaging can be made more accurate by more precise delineation of “weather” in each grid, the long term and/or global averaging (climate) becomes more accurate.
climate as a whole does not appear to behave in a highly chaotic manner on a global scale averaged over many decades.
Indeed, and I make this point often myself. But you’re overlooking the context, namely the question of whether more powerful CPUs can help reduce the grid size, which is what Judith was responding to. I don’t see your prediction of a cold January in Fairbanks being improved one iota with a finer grid size.
Some elements of long term global warming are quite predictable, such as that we’re probably going to fry. Others, such as how probable and when, are much harder to compute, although there the problem seems to be less a matter of grid size than of other factors: understanding transport of heat in the ocean and underground, the choice of definition of climate sensitivity, the bugs that inevitably creep into every complex climate model, the different emphases and hand-tuned parameters in those models, and so on.
Any grids in those projections are well within the capabilities of a contemporary GPU and are unlikely to see much benefit from advances in computing technology. It’s all in the definitions, physics, oceanography, software, and so on, not in the grid size (as I see it anyway, which may be wrong since I don’t write climate models myself).
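The point made above, that chaotic trajectories diverge pointwise while their averaged behaviour stays predictable, can also be sketched with the toy logistic map (illustrative only, not a claim about any real model): two runs from initial states differing by one part in a billion decorrelate completely as “weather”, yet their long-run “climate” averages barely differ.

```python
# Two runs of a chaotic toy map from initial states 1e-9 apart:
# pointwise ("weather") they decorrelate completely, but their long-run
# averages ("climate") agree closely.
def run(x0, steps, r=3.9):
    xs, x = [], x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs

a = run(0.4, 100_000)
b = run(0.4 + 1e-9, 100_000)

worst_pointwise_gap = max(abs(u - v) for u, v in zip(a, b))
average_gap = abs(sum(a) / len(a) - sum(b) / len(b))

# worst_pointwise_gap is order one; average_gap is orders of magnitude
# smaller, because both runs sample the same long-run statistics.
```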
The fundamental defense of climate predictors against the “14-day max weather predictability” observation is that the BIG factors rule when you get out into decades and centuries.
Pushing for smaller grid size etc. gives the lie to this argument. Smaller grids are weather, not climate. The BIG factors operate on BIG grids, don’t they? If not, aren’t they really SMALL factors with pretensions?
Climate is not defined by the size of your grid. Madison, Wisconsin has a climate. New York City has a climate.
In discussions of models of climate with climatologists, several have said that the ability of the models to “hindcast” i.e., duplicate past climate temperatures, indicates that they can forecast future temperatures.
My view is that success at “hindcasting” is no guarantee of future predictive accuracy. Do you have a comment?
Hindcasting is necessary, but not sufficient.
Models could use correlation without causation.
These could appear accurate in the short term while giving very wrong long term projections.
It’s not a guarantee. But it’s better than not being able to do any hindcasting at all.
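The “necessary but not sufficient” point can be made concrete with a toy experiment (invented data, nothing to do with any actual GCM): a maximally flexible curve-fitter hindcasts a noisy trend perfectly and then extrapolates wildly, while a simple trend model hindcasts imperfectly yet projects far better.

```python
# Toy illustration: perfect hindcast skill is no guarantee of forecast
# skill. Train two models on the first half of a noisy linear trend,
# then test on the held-out second half.

def lagrange_fit(xs, ys):
    """Interpolating polynomial through all points: the ultimate
    curve-fitter, with zero hindcast error by construction."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def linear_fit(xs, ys):
    """Ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
            / sum((x - mx) ** 2 for x in xs)
    b = my - slope * mx
    return lambda x: b + slope * x

def rmse(model, xs, ys):
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.15, 0.05, -0.1]  # made-up noise
xs = list(range(8))
ys = [0.5 * x + e for x, e in zip(xs, noise)]   # training ("hindcast") data
xt = list(range(8, 16))
yt = [0.5 * x for x in xt]                      # held-out "future"

wiggly, straight = lagrange_fit(xs, ys), linear_fit(xs, ys)
# wiggly hindcasts perfectly yet extrapolates wildly off the rails;
# straight hindcasts imperfectly yet projects far more accurately.
```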
“Give me five free parameters, and I’ll make the elephant flap his ears!” Always remember that the ideal number of parameters is zero. Every one added narrows the scope and predictive power of the model — because the parameter will vary in real life given half a chance, and time to make its move.
At the risk of seeming rude, we have a challenger for the most easily contradicted statement made on Judith Curry’s blog to date.
If the CO2 in the atmosphere were 1%, Earth would be far hotter than at its present level of .039%. Any projection of future temperature on Earth that fails to take that parameter into account will give wildly incorrect results.
Atomic physics currently has north of 23 independent parameters modeling the particle zoo. Granted it would be nice to reduce it to zero, which was basically the hope of people like Eddington and others, but if physics can’t get fewer than 23 parameters for a couple of hundred things that are way smaller than fleas, how can you expect to model Earth, which is far more complex, with zero parameters? It utterly defies logic.
And physicists hate and want desperately to eliminate all 23. That their values cannot be derived is an ongoing “travesty” for the Standard Model.
There’s a problem with this that can be illustrated with a different example. On any planet the potential energy of a body at height h above the surface is mgh. There is one parameter here, namely g, which varies from planet to planet; in fact it even varies as a function of altitude on the same planet. (m and h mean the same everywhere.)
What exactly would it mean to have a model of physics that permitted this parameter to be eliminated? It doesn’t even make sense: you have to know the value of g, and it’s different at different places.
Now you’ve been assuming the 23 parameters of physics have the same value throughout the universe. If they vary, even slightly, from galaxy to galaxy, it doesn’t even make sense to “want desperately to eliminate all 23.”
Of course, g is a parameter that can be determined by empirical measurement, although it can also be calculated. Can all the GCM parameters be measured? Any of them?
You’re asking the wrong person. I’m just as happy to support logical attacks on GCMs as I am to defend them against illogical attacks like the theory that models should contain no parameters.
GCMs are full of what someone aptly described as “odd arbitrary parameterizations”. Plugs & WAGs. Tunings.
Speaking of parameters, you can adjust the Plug for Global Average Temperature down by about 1.6°C from 1990 onwards:
Nice little step function there, huh?
Thanks for laying out issues.
With trillions of dollars of our tax dollars at stake, I strongly vote for:
3) Fully implemented V&V into the model development process?
The Computational Fluid Dynamics (CFD) community worked through validation with NIST, ASHRAE, ASME, etc. Since this was commercial software, it used public experimental data and expertise in modeling prescribed configurations.
See CFD-Online for CFD Validation
e.g. ASHRAE® Technical Committee TC 4.10 Indoor Environmental Modeling
With CGWM models, the highest public transparency should be required.
This detail of quantitative public formal validation should be applied to each and every portion of the climate models, not just overall “fit”.
I strongly recommend the Climate Science community implement the strictest standards used in chemical reactor modeling and CFD modeling of aircraft and gas turbines. The software used for these applications is used to make commercial decisions where errors cost hundreds of millions to billions of dollars.
ALL GWP models should at least be subjected to this level of testing.
Much more important is to validate each equation and parametric model.
LBL Radiation Models
The Line by Line radiation modeling community demonstrated professional software validation. e.g.
Miskolczi published the details of HARTCODE with full details.
F.M. Miskolczi et al.: High-resolution atmospheric radiance-transmittance code (HARTCODE). In: Meteorology and Environmental Sciences Proc. of the Course on Physical Climatology and Meteorology for Environmental Application. World Scientific Publishing Co. Inc., Singapore, 1990.
Miskolczi quantitatively tested the software against data. e.g.
Rizzi-Matricardi-Miskolczi: Simulation of uplooking and downlooking high- resolution radiance spectra with two different radiative transfer models. Applied Optics, Vol. 41. No. 6, 2002.
All major Line by Line models participated in a public intercomparison:
Kratz-Mlynczak-Mertens-Brindley-Gordley-Torres-Miskolczi-Turner: An inter-comparison of far-infrared line-by-line radiative transfer models. Journal of Quantitative Spectroscopy & Radiative Transfer No. 90, 2005.
Public forecasting standards
The GWM models are used for major public policy. I support the arguments that they should meet the most stringent standards for public forecasting. e.g. the research and modeling standards used for medical research should be applied to climate models.
See: Global Warming: Forecasts by Scientists versus Scientific Forecasts, by Kesten C. Green and J. Scott Armstrong
Testing GWM model results
The CSIRO used its climate model to predict increasing drought.
David Stockwell of Niche Modeling fit the model to half the historical data and tested the projections against the rest of the historical data. He showed the CSIRO models had it backwards: they predicted drought where the held-out data showed increasing precipitation.
Stockwell reviews this and similar validation testing of climate models.
Errors of Global Warming Effects Modeling
If one skeptic can so easily shred climate model results, then I have little confidence in published results, compared to the detailed validation required and performed in the engineering industries.
Keep up your efforts to move the climate science to the highest standards of professional scientific modeling and forecasting.
Yep, me too. If only to see people who believe nature has to conform to the currently formulated laws of physics squirm.
The following is from Feynman’s report on the Shuttle disaster (thanks to a previous poster on another thread for mentioning it):
“The software is checked very carefully in a bottom-up fashion. First, each new line of code is checked, then sections of code or modules with special functions are verified. The scope is increased step by step until the new changes are incorporated into a complete system and checked. This complete output is considered the final product, newly released. But completely independently there is an independent verification group, that takes an adversary attitude to the software development group, and tests and verifies the software as if it were a customer of the delivered product.”
I would draw specific attention to the independent verification group, purposely taking an “adversary attitude”.
When I was in commercial (not scientific) programming, our training was all about taking an adversary attitude, even to our own code and our interpretation of the specifications for it. The idea wasn’t to confirm code worked or the interpretation was correct, so much as to do one’s damnedest to smash the code and show the interpretation was pants.
GCMs need to be tested by those with this hostile attitude. That’s why you need sceptics on board, and their being a recognised part of the team. Engineers and software people know this. V&V would take care of itself were suitably-qualified sceptics given a recognised role as devil’s advocates.
GCMs need to be tested by those with this hostile attitude.
Why would anyone hostile to GCMs even bother to test them? Why not just declare them the devil’s spawn and have nothing to do with them?
I didn’t say: “hostile to GCM’s”. I referred to a “hostile attitude”. I don’t know if you’ve been on the receiving end of a software audit, but the guys who do that for a living can be extremely hostile. Their doing that helps produce a rock-solid product.
He’s referring to the Q/A step.
Some of us made a life out of doing hostile testing. It’s called a red team review or tiger team or murder board. These folks specialize in poking holes in other people’s work, finding little errors. Essential. In commercial organizations, some turn testing over to the support organization; the people who have to take the problem calls are then motivated to find problems. “Hostile” refers to the attitude you take in your testing. Your goal is to find the mistake, the problem, the hole, the place where the model breaks down. Very often you find nothing. I think I found two errors in 5 years. That was success. It’s a mind set you are not used to. It’s usually cultivated in organizations that do life and death work.
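A toy sketch of that mindset (the routine and the “hostile” inputs are invented purely for illustration): instead of confirming the code works on friendly inputs, the harness hammers it with edge cases and reports every one that breaks an invariant.

```python
import math

def energy_balance_step(T, forcing, dt=1.0, heat_cap=10.0, feedback=1.2):
    # Toy zero-dimensional energy-balance update (illustrative only):
    #   dT/dt = (forcing - feedback * T) / heat_cap
    return T + dt * (forcing - feedback * T) / heat_cap

def find_holes(step, initial=288.0):
    """Adversarial harness: throw hostile inputs at the routine and
    report every one that violates the 'output stays finite' invariant."""
    hostile = [0.0, -1e6, 1e6, 1e-300,
               float("inf"), float("-inf"), float("nan")]
    return [f for f in hostile if not math.isfinite(step(initial, f))]

holes = find_holes(energy_balance_step)
# The harness flags the inf/-inf/nan forcings: the routine silently
# propagates garbage instead of rejecting bad input. That is a "hole"
# a friendly test would never have exposed.
```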
Financial auditors should do this too. When they forget – or have their attention deliberately distracted – you can easily end up with accounting disasters like Enron.
Professional people in any discipline should recognise that (in most cases *) external audit/review/assessment is part of the system that exists to ensure quality and honesty in the work of a whole area. It is not a personal insult to have one’s work checked; if it has been done well it will pass with credit, and your reputation and credibility will be enhanced.
Of course there is always a tension between the checker and the checkee…that is how the system is supposed to work. I’ve done both, and each side can occasionally be an uncomfortable place to be.
But it is very juvenile, kindergarten-like behaviour to refuse to show one’s work and very naive to try to actively subvert the checkers. If nothing else it just raises suspicions that all is not well.
* This can of course be taken too far by centralising bureaucracies. We had the misfortune in UK to suffer under one of the more egregious regimes for the last ten years. But taking anything to excess can be bad for you..doesn’t mean a small dose can’t be beneficial.
I did IV&V work for NASA on the HST and Shuttle systems for several years. Managed to find more than a few “holes” in the systems. The problem wasn’t so much the “holes” as getting management to believe them and pay attention. Had the same problem on the Terra spacecraft – a command that shuts off not only the telemetry transmitter but also the command receiver is definitely a “hole”. But getting it fixed was a major fight with the system engineering manager.
Power engineers use the “belt and suspenders” approach.
You make sure it absolutely works all the time, no matter what, no excuses. Your neck is on the line all the time.
Chernobyl was caused by the opposite attitude.
Present climate models are too close to Chernobyl practices, yet we are asked to risk far larger sums from “We the People”.
Return from alarmism to sanity.
That reminds me of the now infamous (and I paraphrase somewhat)
“Why should I give you my code when all you want to do is pick holes in it.”
Wow. My (admittedly frivolous) question seems to have generated more unity than typical for this blog. I’m glad I asked it. ;)
Adversarial checking need not be confined to humans. The model checking approach to software verification pioneered by Clarke, Emerson, and Sifakis, for which they received the Turing award in 2007, can be considered an automated exhaustive form of adversarial testing, in which the model checker considers every possible state to ensure that no bad scenario goes unnoticed. It might be interesting to bring model checking technology to bear on climate modeling software. (My own research on program verification started in 1976 when I introduced dynamic logic, a modal logic of programs, the year before Pnueli introduced temporal logic, another modal logic of programs. Dynamic logic was the first such system to base its semantics on Kripke structures, which model checking also relies on if not always under that name.)
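For the curious, the core idea of explicit-state model checking can be sketched in a few lines (a toy, nowhere near the scale of real checkers such as SPIN or NuSMV): enumerate every reachable state of a transition system and report the first invariant violation as a counterexample.

```python
from collections import deque

def check_invariant(initial, successors, invariant):
    """Minimal explicit-state model checker: breadth-first search over
    every reachable state; returns the first state violating the
    invariant (a counterexample), or None if the invariant always holds."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        if not invariant(s):
            return s  # counterexample found
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return None  # invariant holds in every reachable state

# Tiny example system: a counter stepping mod 8.
succ = lambda s: [(s + 1) % 8]
bad = check_invariant(0, succ, lambda s: s != 5)      # fails: 5 is reachable
ok = check_invariant(0, succ, lambda s: 0 <= s < 8)   # holds everywhere
```

Real model checkers add symbolic state representations and temporal-logic properties on top of this exhaustive search, but the adversarial character is the same: every possible state is examined, so no bad scenario goes unnoticed.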
Sounds great in theory, and probably brilliant for writing a learned paper about, maybe even getting a prize from fellow academics.
But does your wonderful intellectual edifice ever get down to actual practical stuff that I can give to Joanne the programmer to use and help her to write good code when she’s doing invoices? Or programming climate models?
So gentle a question, one can’t but provide starting points:
My question was not about V&V in general, with which I am tolerably familiar after 30 years in IT, but with Mr Pratt’s ‘dynamic logic’ in particular.
Sorry if I wasn’t clear enough. As a commercial/industrial IT specialist, I am not so much interested in formal academic methods, but in useful tools that Joe Sixpack could use to help get effective code out there to my users better and more quickly.
Sorry for the late reply. I just noticed yours.
I’m not sure if now you’re asking for the same thing as you previously did, but here is an example of a model checker that can be extended to include dynamic logic:
Alloy is said to be a subset of Z, which is a formal specification language that has some industrial application.
I concede readily that it might not answer the needs of Joe Sixpack. If you want to get code out fast, throw in Ruby coders and agility specialists. If you wish to build mission-critical stuff, you might want to verify your program with a formal specification. Heck, you might even want to specify your program directly into modal logic.
Somehow, I get the feeling that climate models are not invoices.
You’re right on the last point.
Invoices get paid no attention (in any sense) if they are not accurate. No such ‘usefulness test’ seems to be applied in climatology.
There is a rumour that with the next OSX release there will be a screensaver that, when invoked, will show verification computations of the climate models of your choice. It’s only a rumour, though:
> How quick come the reasons for approving what we like!
Vaughan, this sounds interesting to me, do you have any web links that explain this in more detail?
Even more interesting is that you can get model checkers to work in a 3-valued logic (like your Italian flag), and Easterbrook was one of the co-authors on the papers. E.g.
But I’m sure he’ll show up here with some argument from authority about why this doesn’t apply to climate models.
Hi, Judy. The Wikipedia article on model checking has many references.
I should perhaps clarify my position on climate models. I think they’re useful as a tool for understanding a good many of the great many mechanisms that contribute to climate. For example modeling the operation of the Hadley cells would be an excellent case study for climate modeling if it hasn’t already been done. (Has it? If not then it’s an outstanding project for an enterprising grad student.) One question I’d love to see answered is how sensitive is the number of cells (three per hemisphere on Earth) to the relevant parameters such as temperature, pressure, and rate of rotation of the planet.
But I also think our understanding of those mechanisms is so incomplete today that they can’t be relied on to make projections that are as good as the projections made simply by extrapolating the observed changes in temperature, CO2, ocean pH, etc. Models go to a huge amount of effort to derive wrong answers when more accurate answers can be had with far less effort using simple extrapolation.
One might object that extrapolation must become unreliable by 2060. Does anyone have or know of a model that promises more accurate results half a century hence than naive extrapolation based on simple sensible principles? If so I would really love to know how the model is able to beat naive extrapolation.
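For concreteness, the “naive extrapolation” baseline mentioned above can be as simple as a least-squares trend projected forward. A minimal sketch with made-up numbers (a hypothetical anomaly series, not real observations):

```python
# Naive extrapolation baseline: fit a linear trend to an observed
# series and project it to a target year. Data below are invented.
def linear_extrapolate(years, values, target_year):
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(years, values)) \
            / sum((x - mean_x) ** 2 for x in years)
    intercept = mean_y - slope * mean_x
    return intercept + slope * target_year

# Hypothetical anomaly series warming at 0.02 C per year since 1960:
years = list(range(1960, 2011))
anoms = [0.02 * (y - 1960) for y in years]

proj_2060 = linear_extrapolate(years, anoms, 2060)
# For this invented linear series, the 2060 projection is 2.0 C.
```

Any model claiming skill at 2060 has to beat this kind of baseline out of sample, which is precisely the comparison that rarely gets published.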
For example modeling the operation of the Hadley cells would be an excellent case study for climate modeling if it hasn’t already been done.
There are one or two odd things there which an academic climatologist may think irrelevant.
Thanks, I’d noticed that early while browsing around on your site. However what I had in mind was a naive Navier-Stokes model that accurately predicted the velocity of the jet stream. Seems to me this ought to be quite feasible, compared to some climate modeling questions.
“Does anyone have or know of a model that promises more accurate results half a century hence than naive extrapolation based on simple sensible principles?”
Would your “simple sensible principles” include oceanic oscillations?
You read my mind. Or noticed me talking about AMO recently. (But not ENSO, which is too short term to be relevant to half-century projections.)
Absolutely! The first step after a new version of a software product is built is to hand it to the testers and tell them to BREAK IT!
With ‘reward’ for each break they can make. The more ingenious the better.
This does require that somebody knows and defines what “break it” means.
If the tester makes the model spew out a minus 7C change in global average temperature over 10 years – did he break it?
If she’d set the initial conditions to those prevailing in calendar year 2000 (using raw data, not ‘adjusted’), and the model came up with a -7C change predicted for 2010, then we can go and measure (using raw data not ‘adjusted’) whether that prediction is correct.
Funnily enough, In UK for November/December it might just be right, as global warming has brought us the coldest autumn for a generation, but we are told repeatedly that it is ‘the warmest year ever’ (‘adjusted’ data).
So we could reasonably conclude that the model was no frigging use at all. ‘Broken’ if you will.
If however it were to have accurately predicted the observed (raw data, not ‘adjusted’) change, then it would get another chance at the table. This game is vicious. It is not a simple one of ‘everybody gets a gold star for trying nicely – or is awarded Mrs Joyfull’s prize for Raffia Work’. Fail badly – and -7C is very badly – and you’re out.
Funnily enough, In UK for November/December it might just be right, as global warming has brought us the coldest autumn for a generation, but we are told repeatedly that it is ‘the warmest year ever’ (‘adjusted’ data).
Sorry, but can you point me to where anyone has claimed this is the warmest year ever in the UK?
Nobody ever has AFAIK.
But it would be an extremely brave or extremely foolhardy politician who stood on College Green tomorrow and announced yet another tranche of scarce spare cash from UK taxpayers to ‘combat global warming’.
And the next idiot who tells us all that it was the warmest year ever will soon be told to put his head where the sun don’t shine.
The nice lady from the Met Office who assured us that the cold spell was all due to global warming is unlikely to be asked back to Breakfast Time on TV – unless they want a successor to Roland Rat to give everyone a good belly laugh in the mornings.
And very few apart from committed warmists now believe anything the CRU tell us about anything. If they told me their postcode, I’d make sure to double check it with the Post Office.
Well it would be nice if we could all just refuse to believe things which were inconvenient and they would just go away. But never mind, we can go with GISS instead – they are likely to show this year the hottest on record rather than level with 1998 as CRU is likely to show.
To argue that people should not point it out because the public might believe it or it might be unpopular to tell them is just silly, and frankly irrelevant to the point.
But practical politics won’t go away either.
Personally I am completely indifferent whether it is the warmest average year ever anyway. All I was pointing out is that to the average Brit such an argument has about as much relevance at the moment as whether the Moon is or is not made of Green Cheese.
You can stand on a soapbox and holler at us much as you like. It won’t make the slightest difference.
You may regret this fact. You may think people are short-sighted, you may believe them to be muddle-headed or plain wrong. You may believe that I am the Devil’s spawn for saying so. Your beliefs on this matter are no more or less important than mine.
But we’re not going to deliver up any more resources to satisfy your consciences unless you can do something more appealing than hollering and hectoring, and telling us all what bad people we have been – according to your judgmental standards.
PS: I first saw this 30 years ago, but it’s still as true as ever:
Elfn’Safety would ban all of the designs..and chop down the tree in case a leaf fell on a passing toddler’s head.
Even if this was avoided, there would be a three day compulsory training course with annual re-evaluation before any kiddie was allowed near the plaything – and then only supervised with a ratio of two adults per child, plus a third to make sure that the two weren’t conspiring to harm another wee bairn.
Judith: You mangled the link to my alleged objection in your first paragraph. Which is a pity, because if anyone were able to follow the link to my blog post they would see I don’t object to the statement “[We need] Fully documented verification and validation of climate models”. Indeed better documentation of the V&V practices is exactly what my current research project aims to do.
What I object to is the idea that industrial processes that are used as risk reduction strategies for safety-critical systems should be brought in, because these would be a waste of time (for a community that is already far too small for all the demands that are made on it).
I also object to calling what climate modelling centres do “V&V-lite” or “v&v lowercase”, whatever the hell these are supposed to mean (you don’t define them), when you cannot clearly articulate what V&V strategies the various climate modelling centres currently use on a day-to-day basis.
GCMs may not be safety-critical, but they sure as heck are critical to the case for AGW and to policy issues. Their developers need to get it right, and they can learn a great deal from engineers and software professionals. Just as the paleo folk could learn a lot from dedicated statisticians.
Trillions are at stake. It’s in GCM developers’ interests to make sure their product is fit for purpose, and seen to be so. Like it or not, there is considerable scepticism about that.
Academia has traditionally been insulated against what it might see as interference; and I’m sure there’s some merit in arguing that money is an issue in what may be perceived as unnecessary luxuries. Maybe funding needs to factor in appropriate extra costs. So it could well be the funders who need a kick up the backside, who knows. Most universities have computer science departments, and there may be some scope for collaboration there, I can’t really judge – it would depend to what extent those departments focus on commercial/industrial quality control issues.
Maybe you can’t see how amateur GCM development, lacking proper documentation, rigorous and truly independent testing, version control, data archiving, etc. appears to be, but any software developer can. It is extraordinarily damaging to the credibility of climate science.
If the modellers want to do their damnedest to ensure the best chance of scepticism winning the day, they should carry on with business as usual. They shouldn’t look for ways of making things more professional. They should ignore the advice of professionals in relevant areas. They should remain convinced that these kinds of issues are irrelevant, impractical, and too expensive.
The sooner that climatologists get off Spaceship Gaia and rejoin the rest of us here on Planet Earth, the better.
If the climate modeling is not done professionally, with the usual – or equivalent – checks that guarantee that the result doesn’t suffer from easy-to-find errors, then the results of this climate modeling are really worthless and it is a waste of money to invest a single penny to climate modeling. One can’t trust such results.
On the other hand, the results of such modeling are used to defend trillions of dollars of investments into new technology, and bans of the old and well-established ones. And the research into climate change is getting lots of money, too. There are billions of dollars going into climate change science every year.
I can assure you that for much less than a billion of dollars, you may get professional teams that actually do the work correctly. It is totally preposterous to claim that the documentation for a couple of climate models cannot be paid from billions of dollars per year. Every other serious discipline is doing the same work for much less money – take e.g. the analyses of trillions of collisions at the LHC. The construction costs were $10 billion but the annual spending on the LHC is much less than for the climate science. Still, the people can do it right and write lots of papers that satisfy all the required standards.
The money is being wasted in climate science simply because most of the people – well over 90% – employed in climate science are simply not up to their job. Mankind would save immense resources if they were simply fired.
The entire enterprise suffers fatally from the DIY exclusivity of the core “researchers”. There exist professional statisticians, modelers, forecasters, etc. aplenty in the real world.
You will look for them in vain in the “Team”.
One could speculate about the reasons for that.
One reason might be that no self-respecting professional would want to join such an amateur organisation, lest their credibility with their peers sink like a stone. I certainly wouldn’t want to join a crew who think validation and verification of models is a waste of time. The largest possible distance between me and them, please.
Another might be that they are not invited to join. We know that climatologists are completely paranoid about ‘outside’ scrutiny of their work. Perhaps letting experienced professionals into the tribe would be like letting a fox loose in a hen house.
Or maybe it’s just one of those strange coincidences.
“GCMs may not be safety-critical, but they sure as heck are critical to the case for AGW and to policy issues.”
I don’t find this assertion to be obvious.
While I wouldn’t want to say that GCMs are irrelevant, the ways in which GCMs are actually used in policy debates, and the ways in which they rationally ought to feed into policy debates, are not obvious and are very much worth considering.
I agree with this statement; this was going to be part of my decision making under climate uncertainty part II, which I hope to get to next week.
The IPCC is not coy, or circumspect, about the use models for their projections, nor about the goal to influence policy.
Steve, a critical element of V&V is the documentation. Show me the document. And explain to me why it is OK for me and other scientists not to be able to find documentation about a key element of the atmospheric core model (e.g. my point on the continuity equation). And also please explain to me why climate models are fit for purposes like making policy related to CO2 endangerment findings based on climate model results, and regional planning based on climate model results, and where such documents exist discussing the tests and assessments of fitness for purpose (uncertainties, limitations, etc).
Agreed 200%. This post on Steve’s site is particularly troublesome. This statement illustrates a huge problem:
We don’t know what’s computationally feasible (until we try it). We don’t know what will be scientifically useful. So we can’t write a specification, nor explain the requirements to someone who doesn’t have a high level of domain expertise.
Constructing a specification is more than a communications device for the software developers, it also serves to structure and validate the requirements for those providing them. An idea in your head almost always tends to be more complete and coherent than the same idea committed to paper (in the beginning).
Allowing for the fact that this code is investigational in nature and is developed iteratively, the requirements process at least helps put a box around what is firm and what requirements, assumptions, etc. are still “under construction”.
Start with conservation of mass, and conservation of energy.
Those should be “Useful”.
Rigorous modeling of clouds would then “help”, considering that clouds alone can make the climate warm or cool – and current satellite measurements are insufficiently accurate to quantitatively track cloud changes to that degree!
For something “useful”, recommend using Essenhigh’s quantitative thermodynamic formulation for Standard Atmosphere Profiles. It is almost completely “closed”.
Compare the average standard atmosphere profiles of climate models and look at first and second differences to see how to improve both.
Robert H. Essenhigh, “Prediction of the Standard Atmosphere Profiles of Temperature, Pressure, and Density with Height for the Lower Atmosphere by Solution of the (S−S) Integral Equations of Transfer and Evaluation of the Potential for Profile Perturbation by Combustion Emissions,” Energy & Fuels, 2006, 20 (3), 1057–1067. DOI: 10.1021/ef050276y
Then combine Essenhigh with Line By Line LBL radiation codes.
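For anyone who wants a quick baseline to compare model profiles against, the lapse-rate portion of the Standard Atmosphere has a closed form. A minimal sketch (this is the conventional US Standard Atmosphere troposphere formula, not Essenhigh’s integral-equation solution):

```python
# US Standard Atmosphere constants for the troposphere (0-11 km)
T0 = 288.15      # sea-level temperature, K
P0 = 101325.0    # sea-level pressure, Pa
LAPSE = 0.0065   # temperature lapse rate, K/m
G = 9.80665      # gravitational acceleration, m/s^2
M = 0.0289644    # molar mass of dry air, kg/mol
R = 8.31446      # universal gas constant, J/(mol K)

def standard_atmosphere(h):
    """Return (temperature K, pressure Pa, density kg/m^3) at height h metres."""
    T = T0 - LAPSE * h
    P = P0 * (T / T0) ** (G * M / (R * LAPSE))  # barometric formula
    rho = P * M / (R * T)                       # ideal gas law
    return T, P, rho

T, P, rho = standard_atmosphere(11000.0)  # the tropopause
print(round(T, 2), round(P / 1000, 1))    # 216.65 K and 22.6 kPa
```

Comparing a model’s mean profile against this kind of reference is exactly the sort of “first and second differences” check suggested above.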
“object to is the idea that industrial processes that are used as risk reduction strategies for safety-critical systems should be brought in,”
Time to get out of the sandbox into the real world.
If your models cannot stand up to the most rigorous commercial standards and the most exacting professional science standards, they should not be used for ANY policy decisions where we are being asked to spend $65 trillion etc.
At least 10% of the global “climate mitigation budget” and 50-70% of the programming budget should be put into “kicking the tires”, verification, validation, and making it “bullet proof” from the ground up. Doctoral students just learning how to program do NOT cut it.
It must be made not just “idiot proof” but “phd proof”.
See Willis Eschenbach, “Testing … testing … is this model powered up?”
It appears that NONE of the existing climate models are up to even this 95% accuracy test.
From comments on this thread, it appears that NONE of the models have the detailed verification and validation documentation posted.
Conclusion – Climate models are NOT anywhere close to commercial standards, let alone the prime-time public-policy quality that much higher expenditures would require.
Get serious – read the HarryReadme file. That reveals fraudulent data files with endemic corruption. Do that in the financial world and you get a ticket straight to jail.
We require reliable results not fiddling, fudging and fraud!!!
I really don’t give a monkeys whether it wastes the time of a community that is already far too small for all the demands that are made on it. They’ll just have to get used to it, poor dears.
But it might save the rest of us – who are paying for the bloody things in the first place – from making some very expensive mistakes. On the precautionary principle, so beloved of Alarmists, if there is the slightest chance of a bad thing happening we have to take all possible precautions. Auditing and metaphorically ‘destroying’ the existing climate models so that they can be done properly is entirely in keeping.
It is not as though climatology as a whole already has a spotless record of adopting professional best practice in all the fields that it tries to work in. Quite the reverse, so this could be a most revealing pioneering step.
Next week….How CRU could back up their valuable data without using just an old broom cupboard and Dr Jones’ memory.
Panel and Multivariate Methods for Tests of Trend Equivalence in Climate Data Series
McKitrick, McIntyre & Herman, 2010, In press, Atmospheric Science Letters
For models to diverge that seriously from the data gives us little confidence in them.
Time to start from the ground up and focus on getting a few models right – with complete public documentation, verification, validation and professional kicking the tires.
Since Steve was the lead scientist for NASA’s IV&V facility and has spent considerable time in the recent past actually observing what goes on at climate modeling centers, this increasingly resembles one of the Wiki farces where the person who is the world’s expert on something gets trashed by the amateurs for not knowing what she is doing.
And exactly what does Steve’s experience have to do with the need for improved climate model documentation that is normally associated with V&V? My post articulated this need by climate researchers and the broader political need/interest in such documentation. “Trust us, we’re the experts and we’re doing a good day-to-day job with the model” does not meet this need.
Not sure if Eli is referring to you Judith, or to the howling hordes your blog seems to attract, whose self-righteous confidence in their own ability seems to be inversely proportional to their actual knowledge.
From Steve Easterbrook’s own blog today
‘part of the problem is that climate models are now so complex that nobody really understands the entire model’
(Full article here: http://www.easterbrook.ca/steve/?p=2062)
from which I, as a slightly better than layman in these matters, do not conclude that the right path forward is to forge ahead with no documentation at all.
Anybody who worked on Y2K problems will remember the difficulties of tinkering with old undocumented COBOL and things. But they just did banking and stuff…nothing important compared with predicting the entire future of humanity.
Of course it is the right thing to use a set of diverse models that nobody understands any longer, are deliberately not documented and have never been proved to forecast anything correctly!
You guys really really must think that the world outside climatology is populated by idiots. Logs and splinters mon brave.
But those who live in greenhouses shouldn’t throw stones.
So, who understands current operating systems or other complex software, but we use them (and curse at them on occasion).
Well for one thing, Steve is asserting that much of the documentation you are asking for for current models is available and being done. You appear to be arguing about the situation in the past. A place to start is IN13: software engineering for climate modeling, at AGU.
Assertions may or not be fine, but let’s not forget the past.
The process by which software was developed has an effect on its current quality. This is a consensus SQA assumption. So issues about past climate model development processes are a legitimate quality issue today.
The usual remedy (barring a do-over) is ad hoc SQA remediation. This is not easy. Of course, timely documentation that should have been produced, but wasn’t, can’t be directly “remediated.” That would be forgery. Other reasonable and independently justifiable objective evidence of a quality process having been performed must be generated.
I think Steve has a difficult task in front of him. But the climate models are crucial and I wish him success.
Well then, Eli, he should be uniquely positioned to comment on that paper and what it says about the validity of those climate models for making projections – particularly since he asserts here: Another form of model intercomparison is the use of model ensembles (Collins, 2007), which increasingly provide a more robust prediction system than single model runs, but which also play an important role in model validation
Steve has made some very interesting statements that I’d love to see defended. One regarding V&V tools is that those “that focus on specifications don’t make much sense, because there are no detailed specifications of climate models (nor can there be – they’re built by iterative refinement like agile software development). “ I’ve done a fair bit of iterative development and the idea that it precludes the development of specifications is pretty incredible.
In this article, he states “There does not need to be any clear relationship between the calculational system and the observational system – I didn’t include such a relationship in my diagram. For example, climate models can be run in configurations that don’t match the real world at all: e.g. a waterworld with no landmasses, or a world in which interesting things are varied: the tilt of the pole, the composition of the atmosphere, etc.” which is not quite right – the ultimate configuration of that model run may not match the real world, but the physics of the model components must.
I do agree with his assertion that independent V&V is probably overkill (provided that the documentation of the internal V&V is complete and available), but much more definition of what’s done and how the standards were determined is needed.
And as far as amateurs go, I think you’ll find that there is a fair amount of professional development experience on here, even if it isn’t all in the scientific arena – some aspects of the discipline are universal.
Appeal to authority does not justify models being off by 2X!
With respect to what metric precisely?
The Eli, Professor Easterbrook’s discussions so far have been nothing more than arguments from authority, with his self-references as a basis. Those of us who have direct experience and expertise in the V&V and SQA aspects of engineering and scientific software, almost none of it being related to commercial software, know that Professor Easterbrook has so far neglected to cite a single source from the peer-reviewed literature. A history of 25+ years of peer-reviewed literature seems to have been swept down a rabbit hole.
His citation of Oreskes is especially egregious. That paper has been refuted in the peer-reviewed literature by almost everyone who has experience in engineering and scientific software.
Now we have The Eli hopping along to offer up a secondary, higher-order insistence that a position of authority is all that’s needed to ensure that The Truth has been revealed. Given the extremely lowly regard of arguments from position of authority, a higher-order version of such an argument is basically worthless.
Professor Easterbrook has launched two strawmen; commercial software and real-time flight-safety critical software. Those strawmen are solely the invention of Professor Easterbrook. No one else has mentioned them.
Let’s go to the peer-reviewed literature, ( where have I heard that before ) and get back to the proven and successful fundamentals of V&V and SQA for engineering and scientific software.
I am not seeing much mention of programming source code and version control. That is always considered a critical component in commercial software development. In a field such as climate modeling with continuous changes occurring, keeping track of that material should be a very high priority. There are many very good and inexpensive software tools for performing that job. Those tools have the advantage that source code for any desired version can be retrieved nearly instantaneously and provided for review by outside groups.
Go look at ModelE, or the code at MIT.
While the modeling centers were surprisingly slow to take up version control, that particular criticism is quite stale.
In particular I can state that NCAR’s CCSM is maintained and distributed in svn.
“While the modeling centers were surprisingly slow to take up version control, that particular criticism is quite stale.”
I’m very glad to hear the ‘complaint’ is stale. It is, however, something that should never be left out of any description of V&V. The original comment was triggered simply because code and version control is the first thing set up in commercial software development, not simply an afterthought. … Just thought it would be nice for Dr. Curry to slip a mention of it into her discussion.
GaryW: Unfortunately, Judith has explicitly stated that she doesn’t think the day-to-day practices of the modellers is relevant to V&V. Which is a pity, because they really are at the heart of good software V&V. All the modeling labs I’ve visited have:
– all, or almost all the code under version control using SVN. One lab I visited is using GIT.
– bug tracking systems and electronic discussion boards for reporting and managing software defects (although they are not always used as systematically as I would like)
– automated continuous integration tools, which run standard tests on the trunk either every night, or every few days (the frequency varies from lab to lab).
– a controlled release process with major and minor releases (although the numbering used for releases isn’t always clear enough on this)
– a regression test process that is used systematically to examine the effect of every code modification (no matter how minor) on the entire model.
– a code review process (although I think it’s generally too informal – some labs do this much better than others)
– clear coding style guidelines, and in some cases, automated tools to test conformance to the style guidelines
– and some labs are experimenting with additional tools such as Doxygen, Redmine, etc.
There’s lots of scope for further improvement, but all the labs have an excellent basis for good code management practices.
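For readers unfamiliar with the regression-test item in the list above, the core idea can be sketched in a few lines (the model routine here is a toy diffusion step and the tolerance is illustrative — neither is taken from any centre’s actual suite): after every code change, re-run a fixed case and compare against a stored baseline, flagging any drift beyond round-off.

```python
import numpy as np

def model_step(field, diffusivity=0.1):
    """Toy stand-in for a model routine: one explicit diffusion step."""
    return field + diffusivity * (np.roll(field, 1) + np.roll(field, -1) - 2 * field)

def regression_check(new, baseline, rtol=1e-12):
    """True if the new output matches the stored baseline to round-off."""
    return np.allclose(new, baseline, rtol=rtol, atol=0.0)

rng = np.random.default_rng(0)
ic = rng.standard_normal(64)     # fixed initial condition for the test case
baseline = model_step(ic)        # output saved from the trusted revision
candidate = model_step(ic)       # output from the (here unchanged) modified code
print(regression_check(candidate, baseline))  # True: change is answer-preserving
```

An answer-changing modification (say, a retuned parameter) fails the check and forces the developer to either fix the bug or deliberately re-baseline — which is the point of the process.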
But of course Judith doesn’t think this is V&V, and can’t be bothered to find out what actually happens at modelling labs before criticizing them. Which then leads to many people in this comment thread saying “I can’t believe they don’t do X”, based on Judith’s vague criticisms, when of course they are doing X. This is how the echo chamber works to spread disinformation, and I believe Judith is being very irresponsible in encouraging it.
How does she “encourage” it? Somebody who hasnt looked asks a question about version control and those of us who have either looked at the code ( for about 3 years) or worked with the code, chime in to answer the question.
If you want to do a service, then pick an example (I would suggest MIT or NCAR) and explain the process they use. In depth. Those of us with some experience in working with code will second the good points you make (use you as a reference as well) and engage in areas of disagreement if there are any. This ain’t about Judith, no matter how much your nose is out of joint. It’s about software engineering.
Great. Fantastic. Super. Some good processes involved in the creation of the models. Others (eg Judith Curry) differ in their interpretation, but lets leave that aside for a moment.
Now what about the important bit – not the focus on how to manufacture the tool, but how to use it to forecast the future and to demonstrate that those forecasts have some validity.
I’m more interested in practical real world predictions and their agreement with observations than I am in academic discussion about ‘what do we mean by verify?’.
So where are the results that show that these models have some value. Actual predictions that were made and verified by observation. That are done consistently by the same model?
Sorry – forgot to add.. and all the predictions that the models made that weren’t verified by observation as well.
Silence? No reply? N/A?
Or do you just need more time to sift through the extensive pile of success stories to find the one most exactly suited to answer my question?
Wow. Another couple of days have gone by.
That must be quite a pile of success stories that you have that it’s taking you so long to select the best to tell us about.
Or did somebody from CRU take them all home so that they are now (if not conveniently and irrevocably lost), covered by non-existent confidentiality agreements?
Perhaps I should try an FOI request….the official answer ‘we cannot show you any modelling success stories because they don’t exist’ would be a very interesting story for Watts, Bishop Hill, CiF etc…………
If you’re really interested, visit PCMDI, get the model output from the CMIP3 archive, and figure out yourself if climate models are any good. You have about 30 TB of data to choose from. Do some learning. As “a slightly better than layman in these matters”, it ought to be trivial for you.
“But of course Judith doesn’t think this is V&V”
And for good reason, only 3 of your 8 points fall into your Venn diagram detailing the difference between Validation and Verification and none fall in the pure validation space. The other 5 points may be good practice and should enhance V&V efforts, but they aren’t V&V in and of themselves.
What happens on a day to day basis is what happens. I am talking about the overall strategy and the documentation.
“Unfortunately, Judith has explicitly stated that she doesn’t think the day-to-day practices of the modellers is relevant to V&V. ”
Yes, it is hard to get a concept across sometimes without being able to wave my hands. The actual point is that source code and version control is an integral component of V&V. It is pretty hard to justify saying you have verified a piece of software if you cannot definitively say what code you are actually running.
Having good source control does, without a doubt, enhance a shop’s V&V practices. It is not, however, sufficient to point to source control as evidence of V&V. You can check in bad code just as easily as good.
Hmm… Are you trying to say that source code control and version control is useless? Or are you simply saying that version management software does not qualify as the sum total of V&V? I certainly cannot imagine anyone who would disagree that there is very much more to V&V than version management and source code control.
Certainly full NASA shuttle computer V&V would be inappropriate for the vast majority of software development projects. As anyone who has been in manufacturing can tell you, you cannot inspect quality into a product. V&V cannot assure a usable software product.
V&V represents a set of methods that are used to help good developers produce reliable and maintainable software products. Source code and version control is as integral to V&V of a software product as temperature is to climate studies. If you are not controlling change access to your source code and keeping track of which versions were produced, you cannot claim V&V of your product has been performed, no matter how many meetings, documents, or tests you have logged.
Now then, the point I was trying to make for Dr. Curry is that source code and version control is not a difficult thing to achieve. Good coding shops are already using the tools necessary to perform that function. However, because it is commonly used, mostly from simple common sense, does not mean it is merely one way of doing business and may be ignored in discussion of V&V. I should always be mentioned. It is well enough known that a single bullet item is usually all that is necessary. But it should never be overlooked. That would be an amateurish mistake.
“I should always be mentioned.”
“It should always be mentioned.”
You can mention me if you want but I doubt that would count for anything significant.
I think if you look back at my comment, you’ll see that the word “useless” does not appear. My point was that absence of source control would be significant, but presence of it says nothing meaningful about the V&V process of a particular development shop. Even Easterbrook doesn’t list it as a V&V activity in his post on the subject.
For public perception of version control in climate science, see the HarryReadme File
You now have to bend over backwards and do triple flips to prove that your models are anywhere close to professional quality – let alone to the highest commercial duty sufficient to justify spending tens of trillions of dollars out of the pockets of We the People.
To date, the discrepancy between data and models is not encouraging, to say the least!
The Harry Readme has absolutely nothing to do with GCMs.
As to what the right proportion is between modeling and public risk (and there are risks either way, after all, or we wouldn’t be talking about this) I am inclined to agree that the effort to date has been inadequate. Of course, those most loudly claiming the field’s inadequacies seem to be the quickest to want to defund it. It’s peculiar, almost as if some people wanted to be unreasonable.
In the UK, most people working in the public sector are either in the process of re-applying for their own jobs, or have already lost them.
When the climate modelers are asked why we should carry on spending large amounts of public wonga on their models, do you think they’d be well advised to have some solid documentation to show, or do you think they’ll get away with some arm waving about what they do informally on a day to day basis?
Extend that to ‘climatologists’ in general, and we might have a quorum.
What did climatologists ever do for us?
(For those without a sense of irony, this is a reference to a famous scene from The Life of Brian: ‘What did the Romans Ever Do for Us’. You can watch it on YouTube.)
‘The Harry Readme has absolutely nothing to do with GCMs.’
Well, this Joe Sixpack might take quite a bit of persuading that that is the case. HRM graphically lays bare the poor standard of work that seems to be common in climatology. Without further evidence – and this thread has not managed to build any confidence at all that the GCM modellers aspire to any greater professional standards than Hapless Hopeless Harry – then I’m inclined to believe that it has everything to do with GCMs as well.
If you have any evidence to support this other than your inclination, feel free to provide it. Pending a satisfactory response I am “inclined” to consider your claim as empty bluster.
There have been 126 entries on this blog. None of them, as yet, contain any evidence of the high professional standards that GCM modellers aspire to.
Indeed, the only guy who is actually explicitly professionally involved (Steve Easterbrook) spent his first few contributions arguing that worrying about such things would be unnecessary and a waste of time.
We *do* have explicit written evidence from HRM that shows how poor the standards in a very closely related lab are….the GC modellers must use Harry’s ‘adjusted’ data as input to their models FFS.
So I think I’ll be more inclined to believe my earlier remarks than your bald assertion, with no evidence whatsoever, that the one has nothing to do with the other.
Quite easy for you to demonstrate that I am completely wrong…just publish the standards manual for professional GCMers, and the compliance statements and audits that go with them…just like the rest of us in important IT work sometimes have to do.
Dr Tobis said two things:
“The Harry Readme has absolutely nothing to do with GCMs:” this is demonstrably true, “Harry” was a database programmer dealing with the CRU time series, not a global climate model.
As regards the quality of the models, he said he is “inclined to agree that the effort to date has been inadequate,” which seems to support what you are saying. Note the use of the key word “agree.”
1. Re Harry: see my remarks above and below this. Especially about the ability of a badly organised IT shop to produce high-quality product.
2. I don’t think that Michael Tobis actually wrote anything about the quality of the models. Had he done so, he and I would most likely be in some form of agreement. Perhaps with a major difference in emphasis, but at least on the same side. Namely, that the quality has never been shown to be anywhere near fit for the purposes for which they are used.
What he actually said was
‘As to what the right proportion is between modeling and public risk (and there are risks either way, after all, or we wouldn’t be talking about this) I am inclined to agree that the effort to date has been inadequate’.
H’mmm. I’m not sure I understand that at all. Maybe my error, but I’d welcome a clarification.
He’s conceding that the Team has been playing with global survival without a safety net for anyone concerned.
I take it from that you have not deemed it necessary to read “Harryreadme.txt” – or if you have, which datasets was he referring to?
Latimer, my reply was to Tobis, not you.
Apologies John – my bad.
But I hope my subsequent remarks served to emphasise the point we are both making about sow’s ears and silk purses.
Thanks for your remark.
As far as I am aware I read everything in HarryReadMe as soon as it was published.
After I had picked myself up from the floor, open-mouthed in astonishment that anything so amateur should have been done within about a zillion miles from the ‘Climatic Research Unit’, supposed guardian of the most important data in the history of humanity – or some other such drivel, I read the rest of the Climategate stuff with its sorry tale of mislaying/deletion of data, lack of any archiving, shoddy work and complete disorganisation as far as IT is concerned.
And as an IT manager, used to sorting out failing installations, I know from experience that you don’t get world-class IT quality from a third-class IT organisation. Quality is built in to everything you do… if area A is working badly it is unlikely that B, C or D will be Gold medal winners – an observation that seems to have passed many academics by.
Unless you have real evidence that Phil and Keef have produced world class stuff from among Harry’s dross (unlike Mick and Keef who can), then I take leave to doubt the credibility of anything that comes from CRU.
re “The Harry Readme has absolutely nothing to do with GCMs.”
Have you any concrete evidence that any of the GCMs did not use (or preferably that none of the GCMs ever used) any of CRU’s temperature data to “fit” or “tune” their numerous parameters?
Until then, we must presume that GCMs are contaminated by the fraud and the extensive application of Murphy’s law documented in HarryReadMe. The burden is on the climate science community to prove otherwise.
To date, there has been no provision of evidence to Curry’s request for the detailed Verification & Validation documentation.
Burnt once, twice shy.
You know me, I’m a reformed cowboy. I start at the bozo easy stuff and work up from there.
What I have noticed in my investigation of the outputs of GCMs is that they are generally not “lifelike”.
What do I mean by “lifelike”?
1. The model gets the averages right.
2. The model gets the first derivatives (actually “first differences” for digital data) right.
3. The model gets the second differences right.
Regarding the averages, the results from the CMIP are instructive in this regard. These are control runs, where the forcings don’t change at all. I dunno if this website will allow images, but I’ll try. If not, the image is in the CMIP link just above.
The difference in the radiation temperatures from the highest to the lowest GCMs’ global control runs (from 16.5°C to 11.5°C) is 27 watts per square meter … clearly, some of them are very, very wrong. The CERFAC model result changes by 1°C, that’s 5.5 W/m2, when the forcings haven’t changed at all. Can this GCM tell us anything about a 1 W/m2 change over 50 years? I don’t think so …
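(The arithmetic behind those two figures is just the Stefan–Boltzmann law applied to the quoted temperatures — a back-of-envelope check, assuming simple blackbody emission:)

```python
SIGMA = 5.670374e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)

def blackbody_flux(t_celsius):
    """Blackbody emission (W/m^2) at the given surface temperature."""
    return SIGMA * (t_celsius + 273.15) ** 4

# spread between the warmest and coolest control-run averages
print(blackbody_flux(16.5) - blackbody_flux(11.5))  # close to 27 W/m^2

# a 1 degree C drift at these temperatures
print(blackbody_flux(16.0) - blackbody_flux(15.0))  # close to 5.5 W/m^2
```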
So they can’t even get the averages right for a stable system … doesn’t bode well for the averages once things start moving.
Let me see if the images show up before I move on to looking at some actual climate model runs.
Well . . . Mr. Eschenbach misses the whole point of the 2005 Santer et al. Science paper. He states that models are generally not “lifelike”.

In fact, the 2005 Santer et al. Science paper revealed that models can be more “lifelike” than observations. A satellite-based estimate of changes in lower tropospheric temperature, developed by scientists at the University of Alabama at Huntsville (UAH), implied that the tropical lower troposphere had COOLED over the satellite era. This cooling was inconsistent with model results, with observational estimates of changes in tropical sea-surface temperatures, and with basic moist adiabatic lapse rate theory. But the cooling estimated by the UAH group was also erroneous. The problem was the result of a sign error (this error was in the adjustment for the effects of satellite orbital drift on sampling of the diurnal temperature cycle).

Santer et al. showed that, despite model errors in the mean and temporal variability of SSTs (the errors alluded to by Mr. Eschenbach), model temperature changes consistently behaved according to a moist adiabatic lapse rate across a range of timescales – while UAH data did not. As heretical as it might sound to Mr. Eschenbach, model data were actually useful in revealing this observational error.

The bottom line here is that if your intent is to assess how “lifelike” models are, it sure helps if you are accounting for (the currently large) structural uncertainty in observations. Most scientists routinely
sorry meant for this to follow Mr. Eschenbach’s next post ….
OK, no images, so I’ll put them in as links instead. Onwards. I continue my analysis of the models used in “Amplification of Surface Temperature Trends and Variability in the Tropical Atmosphere”, B. D. Santer et al. (including Gavin Schmidt), Science, 2005.
To be valid on my planet, a model needs to get the averages right on an hourly, daily, monthly, and annual basis. It needs to get not only the averages that represent the cyclical swings right; the absolute values need to be right as well.
After we look at the values of the variable of interest (e.g. temperature, pressure), we should next look at the first differences. This is how much the climate variables change from hour to hour, or from day to day, from month to month, or year to year.
And then we need to look at the second differences. These measure how fast a system variable responds to a change in the forcings.
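In code, the three checks are just successive differences — a sketch with a toy monthly series (not actual model output):

```python
import numpy as np

def lifelike_stats(series):
    """Return the mean, first differences, and second differences of a series."""
    d1 = np.diff(series)   # month-to-month change: "delta T"
    d2 = np.diff(d1)       # change in the rate of change: "delta delta T"
    return series.mean(), d1, d2

months = np.arange(120)
sst = 15.0 + 2.0 * np.sin(2 * np.pi * months / 12)  # toy seasonal SST, deg C
mean, d1, d2 = lifelike_stats(sst)

# A model is "lifelike" only if all three match observations: compare the
# mean, and the distributions (e.g. the IQR) of d1 and d2, model vs. observed.
print(round(mean, 2), len(d1), len(d2))  # 15.0 119 118
```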
I had naively assumed that these were being checked. But about four years ago, when I looked at the models used in Santer et al., I changed my mind. Let me take these one at a time. First we’ll look at the actual values. Here is the data from the Santer study for the sea surface temperature anomaly:
Santer Sea Surface Temperature (T)
LEGEND: The colored boxes show the range from the first (lower) quartile to the third (upper) quartile. NOAA and HadCRUT (red and orange) are observational data, the rest are model hindcasts. Notches show 95% confidence interval for the median. “Whiskers” (dotted lines going up and down from colored boxes) show the range of data out to the size of the Inter Quartile Range (IQR, shown by box height). Circles show “outliers”, points which are further from the quartile than the size of the IQR (length of the whiskers). Gray rectangles at top and bottom of colored boxes show 95% confidence intervals for quartiles. Hatched horizontal strips show 95% confidence intervals for quartiles and median of HadCRUT observational data.
My graph uses what is called a “notched” boxplot. The heavy dark horizontal line shows the median of each dataset. The notches on each side of each median show a 95% confidence interval for the median. If the notches of two datasets do not overlap vertically, we can say with 95% confidence that the two medians are significantly different. The same is true of the gray rectangles at the top and bottom of each colored box. These are 95% confidence intervals on the quartiles. If these do not overlap, once again we have 95% confidence that the quartile is significantly different. The three confidence ranges of the HadCRUT data are shown as hatched bands behind the boxplots, so we can compare models to the 95% confidence level of the data.
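The notch criterion is easy to compute directly. A sketch of the convention (R’s boxplot.stats uses a half-width of 1.58·IQR/√n for the 95% median interval; matplotlib’s notch option uses a very similar constant):

```python
import numpy as np

def notch_interval(data):
    """Approximate 95% confidence interval for the median, as drawn by the
    notches of a notched boxplot (McGill convention: median +/- 1.58*IQR/sqrt(n),
    the formula used by R's boxplot.stats)."""
    x = np.asarray(data)
    med = np.median(x)
    q1, q3 = np.percentile(x, [25, 75])
    half = 1.58 * (q3 - q1) / np.sqrt(len(x))
    return med - half, med + half

def medians_differ(a, b):
    """True if the two notches fail to overlap, i.e. the medians differ
    at roughly the 95% confidence level."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return hi_a < lo_b or hi_b < lo_a

a = np.random.default_rng(1).standard_normal(200)
b = a + 1.0
print(medians_differ(a, b))  # True: the notches are clearly separated
```

The same non-overlap test applies to the quartile confidence rectangles described above.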
Now, without even considering the numbers and confidence levels, which of these model hindcasts look “lifelike” and which don’t? It’s like one of those tests we used to hate to take in high school, “which of the boxplots on the right belong to the group on the left?” To me, the only one that matches for quartiles, outliers, and median sea surface temperature is the UKMO model.
Next, we can take a look at the “first difference” of the data. This is usually represented by the Greek letter delta “∆”. Since this is monthly data, the first difference of temperature (called “∆T”) represents how much the sea surface temperature changes from month to month. Here is that analysis:
Santer First Difference of Sea Surface Temperature (∆T)
Again, few of the models are lifelike. Some have huge month-to-month swings never seen in the observational record. Some barely change temperature from month to month. The way they move is not realistic.
Finally, here’s the second difference of the temperature (∆∆T). This represents how fast the oceanic temperature can change direction.
Santer Second Difference of Sea Surface Temperature (∆∆T)
Same thing. Some models can switch from rapidly rising temperatures to rapidly falling temperatures in a single month. Again, nothing like this appears in either set of observations. Other models take a long time to start rising or start falling.
So before we get into intricate questions of V&V, and of whether the models converge and the like, how about we do the dumb stuff first, like check them against the value and the first and second differences of the observed temperature and other climate variables? Because as I show above, lots of them don’t even pass that most basic test of a model: is it “lifelike”? That is to say, can it reproduce the basic averages and statistics of the planet?
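The “dumb tests” above (value, first difference, second difference) are trivial to compute. Here is a minimal sketch using synthetic stand-in series rather than real SST data:

```python
import numpy as np

def lifelike_stats(t):
    """Spread of the value, first difference (dT, month-to-month change)
    and second difference (ddT, change of the change) of a monthly series."""
    t = np.asarray(t, dtype=float)
    return (t.std(), np.diff(t).std(), np.diff(t, n=2).std())

# Synthetic stand-ins (not real SST data): a smooth-ish "observed" series
# and a hypothetical model with unrealistically wild monthly swings
rng = np.random.default_rng(1)
obs = np.cumsum(rng.normal(0.0, 0.03, 600))
wild = rng.normal(0.0, 0.5, 600)

print(lifelike_stats(obs))
print(lifelike_stats(wild))
# A model whose dT or ddT spread falls far outside the observed spread
# fails the "lifelike" screen before any formal V&V even starts.
```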
This is of more than bozo interest to a reformed cowboy. In a widely-quoted paper explaining why climate models work (Thorpe, Alan J. “Climate Change Prediction — A challenging scientific problem”, Institute of Physics, 76 Portland Place, London W1B 1NT), the author states (emphasis mine):
Unfortunately, as my analysis above shows, the climate models do not correctly describe “the average and statistics of the weather states”, many of them are way out of line. This makes it quite unlikely that the models, or an “ensemble” of the models, can describe the future evolution of the climate.
Anyhow, Judith, that’s my recommendation. Dumb tests first, smart tests later.
These ‘dumb’ tests would be part of a validation. In mosher’s school some of these models would flunk. The focus of scientists would be on improving the best of breed rather than making yet another GCM. You take the worst models and put a bullet in their brain pan.
We need 3 programs that work very well, not 30 programs that perform poorly to very poorly.
Time to focus on funding the two best and defund the rest.
Then take the best of each of the areas, and build a third program from the ground up with very stringent verification and validation controls at all levels.
To add competition, have one North American program, one from the EU, and one from Asia. Then set the most stringent criteria of public openness, verification and validation, and let them compete. And may the best program win the public’s trust.
Let the universities compete on improving specific sections of the models, which would then graduate into the main codes once they pass all testing, verification and validation steps.
We have too many preening peacocks and no elephants in harness to do the heavy lifting.
You rightly say:
“Unfortunately, as my analysis above shows, the climate models do not correctly describe “the average and statistics of the weather states”, many of them are way out of line. This makes it quite unlikely that the models, or an “ensemble” of the models, can describe the future evolution of the climate.”
And, in addition to their emulating climate systems which are known not to exist, the existing “ensemble” of models does not emulate the range of known possible climate systems, one of which does exist.
As I wrote in another thread:
Thank you for your response.
To be clear, I agree with your statement that “In modal logic we would say they [i.e. the climate models] express possibility”. Indeed, that is what I meant when I said, “at most only one of the models is emulating the climate system of the real Earth”. Simply, the climate models are each a possible description of the real climate system and nothing more.
However, it is important to note that the models do not encompass the total range of possible descriptions.
Assuming for the moment that the models are correct in their basic assumption that climate change is driven by change to radiative forcing, then none of the models represents the possibilities provided by the low values of climate sensitivity obtained empirically by Idso, Lindzen & Choi, etc.
But the range of the models’ outputs is presented (e.g. in the AR4) as being the true range of possibilities. It is not the true range and, therefore, the presentation is – in fact – a misrepresentation.
And this misrepresentation is why I said it is a “falsehood” to assert that an ability to tune each of the models with assumed aerosol forcing makes any of the models a “more useful scientific tool”. Such tuning merely makes each model capable of representing the possibility which that model is showing.
Furthermore, none of the models attempts to emulate other possible causes of climate change than the assumption that climate change is defined by change to radiative forcing. One such alternative possibility is that the climate system is constantly seeking chaotic attractors while being continuously forced from equilibrium by the Earth’s orbit varying the Earth’s distance from the Sun (n.b. I explained this possibility in a response to another of your comments on another thread of this blog). That possibility reduces the possible climate sensitivity to zero.
Please note that I am stating more than there is uncertainty about the models’ results. I am talking about a clear misrepresentation of what the models do and what their outputs represent.
It is bad science to report a range of possibilities derived from assumed climate sensitivities without clearly explaining that other possibilities derived from lower values of climate sensitivity (which are empirically obtained) are equally possible.
And if that reported range is taken as being sufficient to justify claims that actions to change the future are required, then that claimed justification is pseudo-science of precisely the same type as astrology.
One thing I’ve always wondered about climate models is how much calibration/tuning goes on. What concerns me about having too much calibration is that it might overshadow the parameterization, leading to inductions that have little scientific backing. In other words, fancy curve fitting.
It seems that there should be some form of testing that could be done that could show the extent that a model will tune to questionable data sets. Feed it ten sets of “historical” data and see how well the model can tell you which data set is a viable historical data set. If the model handles all ten data sets without issue (even though some are physically impossible), or if the model only handles data that exactly matches the last 100 years of data, then it would seem that there are issues with the models.
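One way to sketch such a screen: use a deliberately over-flexible stand-in model (here a high-degree polynomial, purely illustrative and nothing like a real GCM) and ask whether it fits a physically absurd series as happily as a plausible one:

```python
import numpy as np

def fit_quality(y, degree=8):
    """R^2 of a high-degree polynomial fit to the series -- a deliberately
    over-flexible stand-in for a heavily tunable model (illustrative only,
    nothing like a real GCM)."""
    y = np.asarray(y, dtype=float)
    x = np.linspace(0.0, 1.0, len(y))
    resid = y - np.polyval(np.polyfit(x, y, degree), x)
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(2)
plausible = np.cumsum(rng.normal(0.0, 0.1, 100))   # random-walk-like "history"
impossible = rng.choice([-5.0, 5.0], 100)          # physically absurd swings

print(fit_quality(plausible), fit_quality(impossible))
# If a tunable model fit the absurd series about as well as the plausible
# one, its "skill" would be curve fitting, not physics.
```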
In my mind I keep going back to Kepler and his model of planetary motion. It worked great for a planet traveling around the sun. You could reasonably predict a planet’s position far into the future. But once you start tossing in other bodies, things start to break down. Then along comes Newton with a new model (a single equation regarding gravity) and now we can handle n-body problems in a somewhat gross iterative manner. Now we come to climate science models with (I would imagine) tons of non-linear equations causing all sorts of complex feedback issues (due to the iterative nature). I still have a hard time understanding why we can have much faith in any “predictions” the models make beyond what we get with weather forecasting. (Climate isn’t weather, yes I know. I’m talking about initial condition issues leading to quick divergence in non-linear systems of equations.)
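That quick divergence is easy to demonstrate with the classic Lorenz toy system (an illustrative sketch of initial-condition sensitivity, not a claim about any particular GCM):

```python
import numpy as np

def lorenz_step(state, dt=0.01, s=10.0, r=28.0, b=8.0 / 3.0):
    """One explicit-Euler step of the Lorenz system, the classic toy for
    initial-condition sensitivity in non-linear equations."""
    x, y, z = state
    return np.array([x + dt * s * (y - x),
                     y + dt * (x * (r - z) - y),
                     z + dt * (x * y - b * z)])

a = np.array([1.0, 1.0, 1.0])
b2 = a + np.array([1e-9, 0.0, 0.0])   # a one-part-in-a-billion perturbation
for _ in range(3000):                 # ~30 model time units
    a, b2 = lorenz_step(a), lorenz_step(b2)

print(np.abs(a - b2).max())  # the tiny initial error has grown enormously
```

After a few dozen model time units the two trajectories bear no resemblance to each other, which is why weather forecasts lose skill and why climate modelers argue about statistics of the attractor rather than specific trajectories.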
@Tim Smith | December 2, 2010 at 2:21 am
“In my mind I keep going back to Kepler….”
I would stick there a little longer if I were you.
Demonstrating the cause of natural variation is the very foundation of climate modeling. The point is, how did he know the 1594/5 winter would be harsh?
I have a blog that takes a different angle: Fractals will end the climate debate.
Wow. I normally breeze through your posts, Dr Curry (with the occasional side quest to look up a new term/reference), but I found that one really hard going, though probably more to do with myself than the way it was written.
I have been closely involved in the installation, validation (IQ, OQ, PQ) and initial operation of a cutting-edge piece of kit (a modified version of existing equipment for a different use). From this background I’d have to say that option 3 at the bottom of your post is the ONLY way to go.
It’s no good trying to ‘back-validate’ an operational model or piece of equipment, because then all you can do is tinker. I.e. you are not really able to significantly alter the fundamental aspects of the model/equipment.
In-process, continual validation, correction and testing is literally the ONLY way you can go. Well, the only way you can go if you care about accurate, reproducible results.
It has the side-benefit of quickly highlighting deficiencies whilst allowing you the time and the scope to correct said deficiencies. It also allows for a degree of experimentation within the ‘building’ stage- again, against the validation and testing background.
This sort of approach is typical in industry. I’ve said this before, but I think it bears repeating: academia and the ‘pure’ research branches of science could REALLY do with taking some of these procedural methodologies on board. They’re used for a very simple reason: they work.
Wow, excellent post! I would be extremely interested to read a validation report, even if only for point 1 (description of the continuous PDE and numerical methods) and point 2 (stability and convergence of the code). Not a full validation (experimental validation would be needed for that), but at least I would know what is solved and how it is done, a HUGE improvement compared to what I gathered reading the IPCC report. As a guy who has programmed and used FE models all my professional life, this lack of basic information is the main thing making me distrust GCMs… Is there a link where such info is available for a currently used GCM?
Good reading but I disagree with the notion, written in between the lines, that the models have to be a key part of climatology. I also find the diversity of tests that the models are exposed to hugely insufficient. And of course, the idea that the “sanctioning” of models by the IPCC should be considered relevant by sensible people is just totally unreasonable. See
The “Recent Developments” section should include the recent paper by McKitrick, McIntyre and Herman, which shows that the models have overestimated the observed trends by a significant margin. It also makes the point that Santer et al (2008) only used data up to 1999.
The final published version is on the website of Atmospheric Science Letters but it is only slightly different from the freely available version.
From my experience in the nuclear industry, three things were important with computer models.
1 Good and comprehensive documentation.
2 Thorough validation, from which limits of applicability were specified.
3 Independent verification.
External QA auditing and complete openness were essential.
Money spent on verification was money well spent. Finding and correcting errors before production use was infinitely better than finding an error 2, 5 or 10 years down the line. 10 – 20% of budget on verification was acceptable.
In contrast with the industry practice, academics using similar models were far more interested in running the models and publishing results. Anyone working on verification was not going anywhere career-wise.
Rather than having 20+ GCMs, it would be infinitely preferable to get rid of 80% of them and use the resources saved on documentation and V&V of a handful of models (IMHO, but what do I know?)
Having read Easterbrook, I immediately get the message that he is speaking from authority and not being objective when he calls anyone disagreeing with his pronouncements a “contrarian”. That to me is not a scientist talking. I cannot therefore take anything else he says seriously.
Strongly endorse your perspective and recommendations.
Hi Philip. I agree with your remarks in general and I really like your idea of reducing the number of GCMs and using the money saved to fund IV&V of the rest. I strongly believe it needs further consideration. I wonder what other GCM modelers think of the idea? I have left a comment on Steve Easterbrook’s blog asking what he thought about your idea here. I hope he responds.
“In contrast with the industry practice, academics using similar models were far more interested in running the models and publishing results. Anyone working on verification was not going anywhere career-wise.”
I would not be so sure about that. I did my PhD in numerical simulation (not climate related), and it is true that relatively few papers were published on model validation/review/comparison. However, those papers, when done right, were cited extensively. If you know what you do, publishing a review paper on comparison/validation of state-of-the-art simulation techniques (including models developed by others) is a sure way to boost your citation index. Not sure about climatology, but for CFD or solid mechanics, those kinds of papers are highly appreciated and academically rewarded. They are not easy to produce (you need either to be expert in all the simulation techniques to avoid bias, or to have a large team with multiple specialists and good coordination) and quite time consuming, though…
My comment concerns the requirement that
As pointed out in the post, Thuburn (2008) mentioned that the neglect of precipitation mass sink
However, the problem of accounting for a mass sink cannot be reduced to the constancy of the mean water content of the atmosphere. The neglect of a stationary mass sink (e.g., the annually averaged positive precipitation-evaporation difference in the near-equatorial part of the Hadley cell) will necessarily result in an underestimate of the stationary pressure gradients (pressure differences between the areas where vapor predominantly condenses and areas where it predominantly evaporates) and, by consequence, in an underestimate of the overall circulation intensity. To cope with this underestimate but retain a good match of the model output with observations, one will need to inflate parameterization coefficients for other circulation drivers (differential heating in the case of Hadley cell). Such a procedure will necessarily have a negative effect on the predictive power of the model — this power will be reduced proportionally to the magnitude of the effect of the precipitation mass sink on pressure gradients.
Neither can the physical problem be reduced to the ability of a model to account for a mass sink. It should be clearly stated how this mass sink is formulated and how it enters the system of the equations of hydrodynamics. The GFDL comment quotes the paper by Lin (2004). This paper is cited 167 times in Scopus. I scanned these papers (certainly, I could have overlooked something) but I could not find any published analyses of the models’ sensitivity to the presence/absence of a mass sink, with a theoretical justification of what this mass sink should look like. (By the way, in 2008 Thuburn should have been aware of the study of Lin (2004), because he cited Lin’s 2004 work in an earlier publication, Thuburn and Woolings 2005 J. Comp. Phys. 203: 386.).
In our paper we investigate the effect of a mass sink on horizontal pressure gradients. We believed we expressed the underlying physics behind our formulation of condensation rate very clearly. However, the recent intense blog discussions of our propositions have shown that significantly more detailed explanations and justifications are desirable, and we are working to present such in the open discussion. (It would be really great if a representative of the modelling community would take the time to post a comment there and contrast our approach to the formulation of a mass sink with the one adopted in the GCMs.)
In comparison, in the absence of any published sensitivity studies explicitly describing HOW the precipitation mass sink is accounted for in GCMs, how can one judge the correctness of such an account? Moreover, how can a physical effect be included in, or excluded from, a GCM without any theoretical (not modelling) research quantifying its magnitude?
As Philbert so eloquently expressed, the fear of climatologists is that validation may suddenly and viciously turn into INvalidation. Which he and his pals, by the content and tone of the Climategate emails, were desperate to prevent.
Models are tools to increase our understanding. However, science needs to be based on experiment and observation; if not, it is science fiction or religion. It is not possible to validate models with expert opinion.
My perspective on modelling is that of an aerospace engineer, but I do have a reasonably extensive background in managing software development. The term V&V threw me initially, as I was not sure what was meant by the difference between the two milestones. In aerospace applications we generally call it FQT (final qualification testing), and may have one event for initial acceptance at the component level and another for acceptance at the system level.
I absolutely agree that models need to be fully documented, but only so that those who bought the model can effectively have someone fix/upgrade the model/code later as necessary at an affordable cost. If I had written a climate model that actually happened to accurately work for say 25 years into the future, I certainly would not be showing anyone else the details of that program, as it would give up my proprietary data. Now if I was paying someone to develop a model of this type, then obviously the requirement for documentation would be part of the statement of work that also describes what the model is supposed to accomplish, and the criteria for that testing. To do otherwise is simply bad management.
It seems that one of the larger problems with climate models is that they are difficult to actually verify except over a long period of time, and we wish for more expedient results. Writing the verification criteria for climate modelling seems pretty straightforward. Due to the enormous number of unknown variables potentially affecting the model’s effectiveness, it seems like time is required for validation.
“Models are tools to increase our understanding.”
If only they remained so.
What else can models hindcast apart from the surface temperature?
Outgoing longwave radiation?
Ocean heat content?
Any plots anywhere?
I think there is a danger that if we force industrial strength V&V onto the climate modelling community, we may disrupt their creative talent.
Seriously though, where will we be if they simply take the huff at being audited and head off for pastures new? How will we run their models if they are not properly documented? Did the funding bodies not make full V&V and full documentation a pre-requisite?
Perhaps the funding and oversight bodies are incompetent.
I think that humanity would somehow manage to struggle on through such an unfortunate episode. :-)
Think how much fossil-fuel-generated electricity would be saved as we didn’t have to run their undocumented models on the world’s most powerful supercomputers.
And the hubris-laden programmers could make a good living as technology-driven racing tipsters. There’s a real opportunity to put their predictive skills to the test.
But many MSM journos would be unemployed and unemployable. Without a regular stream of doom-laden model outputs and associated press releases, they would have nothing to write about. After all, they don’t do any actual journalism.
V&V processes should be fit for purpose.
There is an implicit but undemonstrated assumption in much of the above discussion that GCMs are suitable for forecasting future climate, and therefore should be subject to rigorous industrial-strength V&V because of the risks involved.
Until it is demonstrated that GCMs are suitable for forecasting future climate (suitable both absolutely and relatively, i.e. compared with other approaches), asking for industrial-strength V&V is overkill, and the effort should be on research-grade V&V for the various subsystems and the interactions between them, as they are being developed and demonstrated.
Agree that “V&V should be fit for the purpose.” To justify $65 trillion, they should meet the most rigorous V&V and documentation standards!
Re “implicit but undemonstrated assumption in much of the above discussion that GCMs are suitable for forecasting future climate, ”
On the contrary, the very lack of detailed documentation, lack of V&V, and lack of adherence to published principles of scientific forecasting leads us to the inescapable conclusion that GCMs are NOT yet suitable for forecasting future climate.
Furthermore, hindcasts and forecasts show divergence between models & data. See:
Panel and Multivariate Methods for Tests of Trend Equivalence in Climate Data Series McKitrick, McIntyre & Herman, 2010, In press, Atmospheric Science Letters, etc.
My point too, actually. It’s an assumption that GCMs are suitable for forecasting future climate, and an undemonstrated one.
I was just quietly reminding those who are calling for industrial strength V&V on this thread that this is only worth doing once the models move from being research tools and become suitable for wider use in forecasting climate (and I should add that there are some reasons why that may never happen – in the end other types of models may prove to be more useful).
By calling for industrial-strength V&V, people are both distracting attention from getting appropriate V&V in place and implicitly crediting GCMs with a use they shouldn’t have at their current state of development.
Firstly, I look at this from a different perspective. I come from a groundwater flow modeling background and now deal with petroleum reservoir simulation, both of which we have been doing for many more years than climate modeling. The discussions I see here concern how robust the code is, how well various models compare, how well they match known physical properties, etc. While these are extremely important, they are the first and initial stage of V&V. Once you get past this stage, the absolutely most important V&V becomes how well the model can hindcast. It is absolutely essential that the theoretical inputs and code are able to reproduce known history accurately before you ever begin to attempt to forecast with a model. From experience, I would say we spend 90 to 95% of our time matching our hindcast before we ever attempt a forecast. My impression is that GCMs and the modelers spend a minimal amount of time hindcasting and a majority of the time making sure that the forecast results match what they expect the model to produce. A self-propagating result?
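A minimal sketch of the kind of hindcast scoring reservoir modelers live by, relative to a climatology baseline (synthetic data here, not a real model comparison):

```python
import numpy as np

def hindcast_skill(model, obs):
    """Skill relative to a climatology baseline: 1 is a perfect hindcast,
    0 is no better than always predicting the observed mean, negative is worse."""
    model, obs = np.asarray(model, float), np.asarray(obs, float)
    mse = np.mean((model - obs) ** 2)
    baseline = np.mean((obs - obs.mean()) ** 2)
    return 1.0 - mse / baseline

# Synthetic illustration (not real climate data)
rng = np.random.default_rng(3)
obs = np.sin(np.linspace(0, 20, 200)) + rng.normal(0, 0.1, 200)
good = obs + rng.normal(0, 0.05, 200)   # hindcast close to history
poor = rng.normal(0, 1.0, 200)          # hindcast unrelated to history

print(hindcast_skill(good, obs), hindcast_skill(poor, obs))
```

A model that cannot beat the climatological mean on history has no business forecasting.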
Not quite sure why you would think that. GCMs are checked for hindcasting all the time. One important application of this has been to understand the Northern Hemisphere cooling that occurred between 1940 and 1970.
Hindcast to what?
Short term, there’s the instrumental record, which gets very sparse prior to the late nineteenth century, and is largely restricted to areas settled by Europeans. Almost nothing prior to 1850.
Medium term, there are the tree-ring and multiproxy reconstructions that go back a couple of centuries, a few extending to almost 2,000 years before 1850.
Long term, there are the ice cores and marine records that go back tens and hundreds of thousands of years, but have very low temporal resolution.
In my opinion, the climate community greatly overestimates the accuracy and precision of the second category of reconstruction (those covering many centuries at a seemingly high resolution). As far as model robustness, hindcasting to an unreliable or faulty record is worse than no hindcasting at all, it seems to me.
PAGES-CLIVAR is one ongoing project to validate and improve models via hindcasting to pre-instrumental records.
Hi Judith. Your blog’s handy search facility seems enabled only for the postings starting each thread and not for comments. Is there some way of enabling it for comments as well, or alternatively separate search boxes for each of postings and comments?
let me check.
Here is how I think it works. I just searched for “colose” and two threads popped up. I then opened the thread and searched for “colose” from the web browser. But I will try to see if this can be improved.
When you are in the thread you want to search, use the Web browser “Find” function. In Safari it is in the folder symbol to the right of the Google window. I use this all the time.
Climate change, based on GCMs, is being billed as a major threat to human existence and well-being. Billions of people are “very likely” threatened. The whole world economy is to be redesigned, with colossal shifts in wealth and resources.
In what way do you see this situation as not being “risk reduction strategies for safety-critical systems”? Unless the models are seriously wrong this is global safety-critical, or so we’re told.
While climatology was some academic backwater this was not an important issue, but now that GCMs are being pushed as the sole predictor of a catastrophic future that requires “urgent” reorganisation of the WHOLE WORLD, you’d better get behind some rock-solid V&V procedures.
But of course you will not. You will resist that to the very end.
I’d say its more important to deal directly with the inappropriate use of GCMs than spend too much time on imposing industrial strength V&V. It’s much easier to show the GCMs do not perform well enough to be useful, than to argue that they shouldn’t be used because of inadequate V&V.
That last quote was attributed to Steve Easterbrook (post above) but wordpress fails to use the blockquote cite=”” that it proposes below the comment box.
at minute 30-31 it gets interesting
The complete programme is interesting http://www.newton.ac.uk/programmes/CLP/clpw01p.html
What is missing are the requirements. In brief, the objective of V&V (Verification and Validation) may be summed up as “Do the right thing in the right way”.
Doing the right thing in the wrong way is an implementation error.
Doing the wrong thing in the right way is, at best, useless; normally, perfectly wrong; at worst, fatal.
“The purpose of Validation is to demonstrate that a product or product component fulfills its intended use when placed in its intended environment.” http://www.sei.cmu.edu/cmmi/tools/cmmiv1-3/upload/DEV-VAL-compare.pdf
“The purpose of Verification is to assure that selected work products meet their specified requirements.” http://www.sei.cmu.edu/cmmi/tools/cmmiv1-3/upload/DEV-VER-compare.pdf
CMMI Product Development Team. CMMI(SM) for Systems Engineering/Software Engineering, Version 1.02 (CMMI-SE/SW, V1.02), November 2000, SEI Joint Program Office
Sargent does refer tangentially to requirements (as “purpose”):
Sargent, R.G. “Verification and validation of simulation models.” In Proceedings of the 1998 Winter Simulation Conference (Cat. No.98CH36274), 1:121-130. Washington, DC, USA, 1998.
This discussion should have taken place long ago. The present model output has been sold to politicians and policy makers as if carved in stone; they own it and are using it every day to sell their policies. Little or no wiggle room is now available to scientists; it is now too late to add uncertainty to the output. Global warming must happen; no ifs or buts will be acceptable to the politicians who have bought it. Unlike most previous doomsday predictions, too many reputations and too much money have now been invested in this. This will not be allowed to fizzle out.
Putting your house in order now will not help.
Then the solution is: Start Over.
It’s not too late to do proper validation, but modellers are going to have to move fast on the excuses. You can already see in the Met. Office’s public presentation on their web site that they are hedging their bets, being more cautious and starting to talk of uncertainty and other climate effects.
My guess is they will “discover” that the sun “has more effect than previously thought” and slowly reduce the amount of positive feedback currently used to make models roughly follow the last 50 years of warming.
They will never lose face (and funding) by admitting they got it wrong, just slowly shift position and modify the models.
There are, sadly, very few, it seems, with the integrity and open-mindedness of Dr. Curry.
There is a consequence of missing or incomplete Requirements. One cannot do Verification against Requirements. One cannot manage Requirements over time. But one can tinker with parameters used to tune the model(s) during Validation in order to get (expected) results during production. Such practices may well affect confidence.
Some references follow:
IPCC Task Group on Data and Scenario Support for Impact and Climate Assessment (TGICA). General Guidelines On The Use Of Scenario Data For Climate Impact And Adaptation Assessment. IPCC, June 2007.
Lindzen, Richard S. “Taking GreenHouse Warming Seriously.” Energy & Environment 18, no. 7 (12, 2007): 937-950.
Wigley, T. M. L., and S. C. B. Raper. “Reasons for Larger Warming Projections in the IPCC Third Assessment Report.” Journal of Climate 15, no. 20 (October 15, 2002): 2945-2952. http://journals.ametsoc.org/doi/abs/10.1175/1520-0442%282002%29015%3C2945%3ARFLWPI%3E2.0.CO%3B2
Johnston, Jason Scott, and Robert G. Fuller, Jr. Global Warming Advocacy Science: a Cross Examination. Research Paper. University Of Pennsylvania: University Of Pennsylvania Law School, May 2010. http://www.probeinternational.org/UPennCross.pdf
(Page 29, 30 Global Circulation Model Parameters conflict)
There’s empirical evidence that aerosol forcing, despite Lindzen’s conjecture, is not “much smaller than assumed”, and that his implication that it is “assumed” misrepresents the evidence. Anthropogenic GHGs have been estimated to have contributed about 3 W/m^2 positive forcing in recent decades, and black carbon (BC) in aerosols about 1 W/m^2 positive forcing (Ramanathan). Aerosol BC is found mainly in “brown clouds” that are abundant in parts of Asia from biomass burning and fossil fuel burning, but these brown clouds contain a mixture of warming aerosols (primarily BC) and cooling aerosols (organic carbon plus sulfates and other products of fossil fuel combustion). The negative forcing from the cooling aerosols tends to cancel out the 1 W/m^2 BC effect. Elsewhere, where fossil fuels from modern energy societies predominate over biomass energy sources, the effluents contain mainly the sulfates and other cooling components, and so the global cooling effect will be more than 1 W/m^2. Whether the total negative forcing amounts to as much as 2 W/m^2 is something I’m unaware of; my point is simply that negative aerosol forcing is a non-imaginary and substantial climate influence.
For one discussion of brown clouds, with references, a link is
Black Carbon and Brown Clouds
It’s probably just a quibble over the terminology, but I have to disagree with a couple of those statements. Incomplete requirements do not hinder verification; they merely provide looser tolerances to the “pass” bucket (e.g. “bring me a sandwich” can be satisfied with pretty much anything jammed between bread, while “bring me a hamburger” is more restrictive). Additionally, requirements management most definitely includes refinement of requirements.
On the other hand, the completeness of the requirements absolutely impacts the confidence placed in the validation effort.
Actually, Gene, you are right on; your comment is not a quibble! Your observation is the key:
Bad or incomplete requirements can indeed be verified by the stakeholders; in my experience, they often are.
It all depends upon the inclusion of all stakeholders during requirements definition. To extend your example:
Philipona, Rolf, Klaus Behrens, and Christian Ruckstuhl. “How declining aerosols and rising greenhouse gases forced rapid warming in Europe since the 1980s.” Geophysical Research Letters 36 (January 20, 2009): 5 pp.
The math presentation linked above (also linked here: http://www.newton.ac.uk/programmes/CLP/seminars/082310001.html ) makes the point that it is possible to build in stochastic methods that get equal or better results with far lower resources, and permit far lower dependence on “plugs” (parameters) derived from “bulk” systems derived from current meteorological models. It can also be “patched in” to improve the existing kluges, but he appeals for “ab initio” derivation of GCMs.
His validation suggestion is to work with upgrading seasonal projections/forecasts, since waiting till 2100 to approve the changes is impractical.
The whole seminar, btw, is an implicit suggestion (demand) that competent mathematicians get involved at the design level. He paints a picture of numerous independent centers trying to get hundreds of specialized areas right with about 20 personnel each. Virtually all of those people are necessarily working outside their levels and areas of competence, if any, the vast majority of the time. DIY math, stats, physics, forecasting, coding etc. is clearly unfit for purpose.
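The stochastic-methods point can be caricatured in a few lines. This is a toy sketch only: a fixed “bulk” relaxation parameter versus one drawn at random each step, with everything (the scalar model, the parameter range, the ensemble size) invented for illustration and bearing no resemblance to an actual GCM:

```python
import random

def step(x, k):
    """One relaxation step of a toy scalar model."""
    return x - k * x

def run(x0, steps, k_fixed=None, rng=None):
    """Run the toy model with a fixed parameter, or draw it stochastically."""
    x = x0
    for _ in range(steps):
        k = k_fixed if k_fixed is not None else rng.uniform(0.05, 0.15)
        x = step(x, k)
    return x

rng = random.Random(0)
deterministic = run(1.0, 50, k_fixed=0.10)
ensemble = [run(1.0, 50, rng=rng) for _ in range(100)]
ensemble_mean = sum(ensemble) / len(ensemble)
```

The ensemble mean generally differs from the single run with the averaged parameter (Jensen’s inequality at work), which is the flavor of argument made for stochastic parameterizations over fixed bulk “plugs”.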
Here’s a math question for you. Let’s suppose your Ph.D. boss measures the speed of a particle (in 10^3 miles/sec) as the following values:
180, 181, 182, 183, 184, 185, 186.
What’s the mean? What’s the 2σ range?
What’s the mean? What’s the 2σ range?
What’s your point? Granted 183 is a tad hard to achieve, but that sequence does have a mean, a standard deviation (sqrt(14/3)), and a median absolute deviation (2). (MAD was invented by Gauss in 1816 as a more robust alternative to standard deviation.)
I thought your later “‘F=ma’ is a model” was more on point. It’s a closed-form formula. If you’re advocating that the climate be modeled with a closed-form formula, I’m behind you 100%. We’d then have climate science. Mysterious software written by someone else, running on someone else’s computer, that we have to take its word for is not my idea of science.
(Believe it or not I actually have a simple closed form formula for climate, only slightly more complicated than F=ma. I only claim validity for 1850-now, but it’s interesting to speculate on how far it can be extrapolated. If the methane escapes it will fail very badly. Ditto if carbon mitigation ever occurs.)
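For what it’s worth, the figures quoted in the reply above check out, and can be reproduced with the standard library (the MAD computed here is the raw median absolute deviation, with no consistency scaling applied):

```python
import statistics

# Speeds in 10^3 miles/sec, as listed in the question above
v = [180, 181, 182, 183, 184, 185, 186]

mean = statistics.mean(v)            # 183
sd = statistics.stdev(v)             # sample std dev = sqrt(14/3) ~ 2.16
two_sigma = (mean - 2 * sd, mean + 2 * sd)

# Median absolute deviation: median of |x - median(v)|
med = statistics.median(v)
mad = statistics.median(abs(x - med) for x in v)   # 2
```

So the sequence has a perfectly well-defined mean and 2σ range; whether a physicist should report it that way is a separate argument.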
I wonder how many of the ardent IV&V folks here are willing to take a job helping to develop, say, CESM? Are they willing to walk the walk?
When comparisons of climate model outputs with data are attempted at various scales and the models are found wanting, the immediate response is that the large-scale and long-term average is still correct. But this papers over a lot of disagreements with data and is a defense without proof.
Let’s turn the question back on you, Craig. What would constitute “valid” output from a climate model? What are your metrics?
Nothing can be proved regarding model outputs, they are just model outputs.
Be more explicit. I didn’t ask for proof, I asked for what would be considered “valid”. In other words, what should a climate model *do*?
You validate the model in that it is representative of what it is simulating, but in a logical sense you cannot say the output is valid or attribute validity to it. Have a look on wiki; it will be explained better than I can. I suppose the main point is that the logic the output is derived from cannot be proved to be true.
Semantics, then. Choose a different word than “valid”.
Perhaps you can answer the question I asked.
“What would constitute “valid” output from a climate model?”
Which bit have I not covered?
The output is not real; you must interpret it as to its value/use.
What about my other question – “What should a climate model *do*?”
Anything, in your view?
PS – “F=ma” is a model.
I really have no idea; “climate model” is a generic term.
You are correct in one sense: Define the requirements for a climate model.
In another sense, however, for a “wicked problem” such as climate, one view may contribute but cannot suffice. Even if Craig is a stakeholder, there are many others. (I do not think Dr. Jerome Ravetz is one.)
For starters, let me suggest a quick review of the scientific method. It might shed light on the framework for developing a climate model.
Various. “Scientific methods.” Wikipedia, the free encyclopedia, September 30, 2010.
Feynman, Richard. “Feynman Chaser – The Key to Science.” YouTube. YouTube, n.d. http://www.youtube.com/watch?v=b240PGCMwV0
Its just a simulated really number in your case a temperature.
Hunh? That barely parses as English.
Sorry, been up the last couple of nights listening to the cricket.
Same here. Finally found it behind a chair and turfed it outside. Darn things’ll keep anyone awake.
Here we are after a few days of discussions and arguments and where are we?
(Hypothetically) I’m the CEO of a major corp, sitting at the head of the conference table.
You lot are the tech heads and scientists etc with models sitting on the conf table in front of you.
After a couple of hours of arguments (that’s how long it took to read the whole thread carefully, follow some links, etc.) about whose model was validated or verified or not, what standards were used/applied, whether those standards were of sufficient quality, complaints about V&V lite or small-cap v&v, and on and on and on.
I finally get the jack of it, take out my large (very very large) cheque and yell out (over your din)
“HEY BOOFHEADS, I DON’T GIVE A CHIT ABOUT YOUR TECH DETAILS. DO YOU WANT THIS CHEQUE?? HA???
Then SHOW ME THE RESULTS. SHOW ME YOUR TRACK RECORD
if I’m satisfied, you’ll get the cheque. If not….. find another job.
Anyhoooow, that’s what popped into my sore head.
A welcome dose of outside reality in all this navel-gazing autocongratulation fest. The models do not do what they claim to do.
Required reading for all climate modellers – ‘The Emperor’s New Clothes’ by HC Andersen.
Many things affect climate, but on this Earth, there is nothing more stabilizing than ICE and WATER! Look at the history of the temperature of the earth. Especially look at the ice core data for the past half million years. It has gotten warm, time and time again. Every time that it got warm, it then got cold. When it got very warm it then got very cold. When it got a little warm it then got a little cold. Come on, people! Look at the data. Warm melts Arctic ice. Exposed Arctic water causes Ocean Effect Snow and that makes it cold. It truly is that simple! Ice and albedo changes resulting from the waxing and waning of the Arctic ice are the major thermostat of the earth. This pattern emerged as the continents drifted into the current configuration. Review the theory of Maurice Ewing and William Donn. They had it right back in the 1950s.
Many factors do go into determining the temperature of the earth. A trace amount of CO2 likely does have a trace effect. The 1 molecule of manmade CO2 per 10,000 molecules of other stuff may cause a tiny bit of warming. Past temperature changes have been rapid. When the Climate Scientists tried to drive these rapid changes with CO2, it did not work. They were sure that CO2 was the cause so they made up carbon feedback terms. They tweaked the parameters until the models matched temperatures. They said since they matched the temperature, the models were proven to be correct. That is not any kind of a proof! That matches the description of curve fits and not models. Their climate theory is wrong and their climate models are wrong and they will not even discuss these topics with me. Once they determined that I did disagree, they quit communicating with me. Their climate theory and climate models must be reviewed by people who are outside the consensus group.
Past temperature changes have been rapid. Albedo of the earth is the parameter that is powerful and that can change temperatures rapidly. They tell us that CO2 is causing warming and that is causing ice to melt. Actually, ice is melting and that is changing albedo and that is causing the warming. When the Arctic Ocean and other Northern waters are frozen, there is little source for moisture and it does not snow enough to replace the ice that melts and the earth gets warmer. Eventually, enough ice melts and enough water exposed to provide enough moisture to more than replace the ice that melts each year. In the past, major periods of warming melted most or all the ice in the Arctic Ocean and what followed was major periods of cooling caused by massive amounts of Ocean Effect Snow. In the more recent ten thousand years the melts were not as severe and resulting cooling was not as severe. Albedo, not CO2, is fine tuning the temperature of the earth.
They tell us the ice extent in the Arctic Ocean is at record lows and then we have winters with record snows. Yet they don’t suspect that something other than CO2 is the cause. Their climate theory is flawed and that caused their climate models to be flawed. Maurice Ewing and William Donn had a valid climate theory back in the 1950s, but someone invented computers. A high-level manager at the Johnson Space Center has said, more than once, that when you give some people computers, you ruin them. They believe the numbers that come out of the computers and they stop thinking. They observed that CO2 went up and down with temperature and decided it must be the driver. Simple PHYSICS tells us that cold water will absorb CO2 and warm water kicks it out. Open a cold and a hot carbonated drink. Which one spewed the most? They make no mention of this property of CO2 and water. The oceans are huge and they hold much more CO2 than is in the atmosphere. Of course CO2 goes up and down, driven by temperature.
Manmade CO2 is less than 1 Molecule of CO2 for every 10,000 Molecules of other stuff. That cannot drive rapid temperature shifts. Albedo can produce the large and rapid shifts.
Steve Easterbrook has a presentation at AGU on this topic, the abstract seems at odds with what he has been saying here:
What’s “at odds”?
My reading of these posts at Professor Easterbrook’s site is that all is sweetness and light within the climate science software development community:
Verification and Validation of Earth System Models
Do Climate Models need Independent Verification and Validation?
Validating Climate Models
Should science models be separate from production models?
The AGU abstract, on the other hand, seems to indicate that there are significant problems within that community:
My direct experience with the application of recent engineering and scientific software development methodologies has shown that many of the problems noted above can easily be avoided. Documentation of the specifications, for one example, provides an excellent starting point for avoiding these problems. Coding guidelines, including careful specification of interfaces, are another example.
This last sentence in the above quote:
is a strawman in that ‘generic software engineering processes’ are not the only processes employed for engineering and scientific software.
The following statement from this post The difference between Verification and Validation:
is especially troubling.
Professor Easterbrook has also cited the infamous paper by Oreskes, Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences. That paper has been refuted several times within the engineering and scientific software development community; it says, in effect, that we might as well not even think about starting development of software for complex natural physical phenomena and processes. And yet, at the same time, some such software (climate science software, for example) is readily accepted, while other software is simply rejected based solely on that paper.
A short and interesting introduction to application of the scientific method to engineering and scientific software is given in this paper: CFD: A Castle in the Sand?
Indeed. I’m curious to see if Steve returns here to address some of the questions regarding his pronouncements about the special case of climate modeling.
By the way, very interesting links on your site regarding V&V at Los Alamos.
Dan, provide us a one-line spec for a GCM.
What in the world does that mean?
Yet another straw man; short, thin, single-straw version.
So use a few lines. What would the spec for a GCM look like?
A: Produce a (revised and improved) model of the climate
Test it against reality
When it fails to produce good results GOTO A:
That should keep the few good modellers out of mischief for a few years.
(Yes I know, I know…GOTO used for illustrative purposes only)
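Joking aside, the GOTO version above is easy to render as an ordinary loop. A sketch only; `build`, `evaluate`, and the tolerance are placeholders for illustration, not anyone’s actual workflow:

```python
def improve_until_fit(build, evaluate, tolerance):
    """Rebuild the model until it tests acceptably against reality."""
    while True:
        model = build()          # A: produce a (revised and improved) model
        error = evaluate(model)  # test it against reality
        if error <= tolerance:   # good results: done; otherwise GOTO A
            return model
```

Of course, everything hard about the spec hides inside `evaluate` and the choice of tolerance, which is rather the point of this thread.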
The point is not whether Dan or anyone else on this site can dash off a specification for a GCM (a GCM would actually require several given the number of moving parts). Something ad hoc would only serve to divert attention from the real point – those building these models appear to be doing so without specifying what they’re planning to build and Steve is defending this as normal.
The issue isn’t that there isn’t “the spec” for GCMs – there is. We all know what it is. The claim that the scientists and programmers building GCMs have no clue as to what they’re doing or why is bogus.
How do you reconcile your assertion that ‘We all know what the specification is’ with Easterbrook’s statement
‘there are no detailed specifications of climate models (nor can there be – they’re built by iterative refinement like agile software development)’
You can’t both be right.
We know what the purposes of GCMs are – a *detailed* spec is virtually impossible, but that doesn’t mean that we cannot tell when a GCM is working correctly or not.
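One concrete (if partial) way to “tell”: pick an observable, hindcast it, and score the mismatch against observations. A minimal sketch with invented numbers; both the observable and the tolerance here are arbitrary choices, not any modeling center’s actual acceptance test:

```python
import math

# Hypothetical hindcast check: simulated vs. observed global-mean
# temperature anomalies (K). All numbers are made up for illustration.
obs   = [0.10, 0.12, 0.18, 0.25, 0.31]
model = [0.08, 0.15, 0.16, 0.27, 0.35]

rmse = math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))
passes = rmse < 0.05   # the tolerance itself is a judgment call
```

A real validation exercise would repeat this over many observables, regions, and timescales; disagreement over which metrics and tolerances count is much of the argument in this thread.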
‘that doesn’t mean that we cannot tell when a GCM is working correctly or not.’
How do you tell?
Yet another strawman. The specifications for models/codes/application procedures on the scale of GCMs will run to several hundreds of pages, very likely on the order of thousands.
I suggest you re-read the first three sentences that I quoted above.
As for ‘the spec’ and we all know what that is, I suggest you aren’t aware of what we’re talking about.
Specifications are not one-sentence mission statements.
The claim that the scientists and programmers building GCMs have no clue as to what they’re doing or why is bogus.
Do we get flying monkeys with our straw men? I made no such claims. I stated that they’re working without specifications. One can do so when building something by and for oneself. For larger systems built by teams for wider audiences, problems and proposed solutions need to be documented. This allows for a common understanding within the team and for proper validation that the proposed solution is fit for purpose.
I might hack together a tool shed without a plan, but I certainly wouldn’t live in a building constructed in the same manner.
‘The testing processes are effective at removing software errors prior to release, but the code is hard to understand and hard to change. Software errors and model configuration problems are common during model development, and appear to have a serious impact on scientific productivity. These problems have grown dramatically in recent years with the growth in size and complexity of earth system models’
If you change the subject from ‘scientific models’ to ‘large software systems’, that was the sort of stuff being written about many commercial systems when I joined the IT industry in the late 1970s.
The ‘build a bit here, patch a bit there, modify that part to take account of a special case and avoid a bug on Tuesdays in Leap Years’ type of programming, which had stood the nascent Data Processing industry in good stead in Finance and Banking, Insurance, Distribution and Manufacturing, and also Air Traffic Control, was coming to the end of its useful life.
As a ‘model’ of DP it had worked well, but was no longer adequate as the uses for which it was needed expanded and the ‘reach’ of such systems increased.
Instead of a multitude of ‘home grown’ software…where each installation had a different software base, there was a gradual move to standardisation and configuration of widely available packages. Hence the rise of companies like SAP AG and Oracle. Who may not be as obvious as Microsoft, but have a greater influence in the commercial transaction (buying and selling stuff) market.
Seems to me like the climate modelling industry is at much the same stage as the banks were in the early 1980s: faced with a generational change in their software needs, but not quite sure of how to get there.
Mr Easterbrook and his colleagues would do themselves a favour if they were to spend a little time researching the history of this period. There are many parallels that could make their lives a lot easier.
And now a plea. As a 30 year IT veteran, I have been horrified and saddened by nearly everything I have discovered about IT practices in Climatology. It is all so bloody primitive and slipshod. And it needn’t be.
For a field that relies so heavily on IT in many of its forms, it is shocking to see how little of today’s best practice is adopted, and how many of the problems they still struggle with are those that were understood and solutions found in the commercial world twenty or thirty or forty years ago.
The rapid technological changes since then have made it more, not less, essential for an IT installation to be well-managed, well-controlled and equipped with all the safeguards of verification, version control, auditability and all the good ‘systems management’ practices that I was learning in 1980, plus some more recent innovations.
For me it is like a surgeon from today looking at some others who claim the title ‘surgeon’ and who claim for themselves great expertise and infallibility. But when I examine their theatre technique, I hear them arguing that these new-fangled anaesthetic things would only slow them down and anyway the pain that the patient endures is probably sent by God as a punishment for Sin, so who are we mere mortals to interfere with his Divine Will? The argument about validation and verification seems to be exactly analogous. Better not have any- it might get in the way of producing results.
All these problems, even the deluge of prior incompetence that was heaped upon poor Harry’s head, are not unique and new to Climatology. Solutions and best practices are known. The BCS, for example, has an active membership composed of cutting-edge professionals in their field.
But why does the climate science community not take advantage of this knowledge? Is it that they couldn’t get a paper from it? Is it that by asking an IT guy from a bank something they feel that they would be tainted by commercial corruption? Or are they just too convinced of their own ‘super specialness’ that they don’t believe that anyone who is not a climate scientist and/or doesn’t have a PhD in Radiative Physics could possibly have anything to contribute. To stoop so low would be to admit that they are not super-intellects in all fields?
I have never met anybody from the Amish community, so I apologise in advance if I do them a disservice, but they seem to bear the same relationship to the more mainstream US society as climate scientists do to the big world of IT.
They see it, they may even occasionally visit it, but they know in their hearts that it has nothing to offer them and they’d prefer to live in splendid isolation. While proclaiming that they have found the One True Way. Go figure.
Would you be willing to jump feet-first into decades’ worth of legacy Fortran code, at a pay rate (for senior staff) which would elicit giggles from newly minted CS graduates heading for the business and/or financial sectors?
Whether I would be prepared to do so personally or not is irrelevant.
But similar exercises are undertaken very successfully by programmers in India where there is a huge pool of very talented, very bright and very enthusiastic software guys and gals. Whose pay rates are not high by ‘Western’ standards, but whose technical expertise and software quality are very very good. I have commissioned work there and been very favourably impressed by the results.
I can think of no particular reason why climate models need to be programmed in Colorado or California when the technical abilities are available cheaper elsewhere. The money could be better spent in those countries and we could terminate most of the time-serving hangers-on in the US. Sounds like a great idea to me. More bangs per buck. The new Congress will love it.
We know how well outsourcing works with the military – why not science too?
I got better quality code quicker and cheaper than doing it in Europe. Seemed like a great deal to me.
And anyway, as nobody apart from the programmers believes in climate models, nobody would care if it was all a disaster. Apart from the programmers here, who would probably be unemployable outside academe because of their shoddy and unprofessional practices.
No idea what you mean when you discuss military outsourcing. Are you going to explain or just leave it as one of your frequent cryptic utterances?
Privatizing the military has worked wonders in the Middle East and Central Asia, wouldn’t you agree? Profits have been in the billions!
Then again, your view of the entire community is so obscured by obstinate ignorance that your opinions need to take that into account.
Please be more specific. Whose military has been privatised? What part(s)? What have been the consequences?
AFAIK UK military has not been privatised. And until a few weeks ago I worked there so I think I would have noticed.
Ever hear of Blackwater or Halliburton?
Blackwater is a river in NE Hampshire. I lived in the associated village for a while. Who is Halliburton? And what did he do?
Now you’re just being dumb.
You should not assume that your audience on this blog are US citizens. Many here are from UK. We do not follow the intimate details of your politics. Do not assume that we do.
After all you did call your problem *global* warming.
I make no assumptions at all about the level of your knowledge. That’s one thing I’ve learned about you.
Try Google. Works really neat.
Sorry guv. I’m not playing your game any more.
You brought up the subject of military privatisation. It is your job to explain it to your audience. If you fail to (or can’t), the deficiency lies with you. There is no obligation on them to follow up anything at all… or even to read your posts.
In sales theory there is a little thing called ‘buying the right’, which means that you have to demonstrate some expertise or ability to help somebody solve their particular problem before you will even get a hearing, let alone get people to act on your advice/recommendation.
Too many climatologists are ignorant of this essential step in the sales process, and assume that ‘Trust me I’m an important scientist’ is good enough. It ain’t.
I can offer one-to-one tuition for you at very reasonable rates. Please contact my usual professional agency to arrange a schedule. And make sure that they know you will need to start at a very basic level.
Why should anyone bother to explain anything to you when your standard response is to put the tips of your index fingers into your ear canals and make bleating noises?
I can lead you to water, but I can’t make you think. Obviously.
Or else you’re just too lazy.
Many things affect climate, but on this Earth, there is nothing more stabilizing than ICE and WATER! When the Earth is cold and the Arctic Ocean is frozen, there is no good source for snow, so ice on the Earth retreats and that lowers albedo. When the Earth is warm and the Arctic Ocean is thawed, there is a really good source for moisture, so it snows and the ice advances and that raises albedo.
Look at the history of the temperature of the earth. Especially look at the ice core data for the past half million years. It has gotten warm, time and time again. Every time that it got warm, it then got cold. When it got very warm it then got very cold. When it got a little warm it then got a little cold. Come on, people! Look at the data. Warm melts Arctic ice. Exposed Arctic water causes Ocean Effect Snow and that makes it cold. It truly is that simple! Ice and albedo changes resulting from the waxing and waning of the Arctic ice are the major thermostat of the earth. Can no one else see this? In this whole thread, no one besides me even mentions albedo. No climate model is correct if it does not correctly deal with albedo.
Dr. Curry, or anyone, please tell why albedo is not a consideration in all these climate discussions.
Climate science leaves out albedo.
I have searched site after site and they all discuss theory, models and equations that have to do with the greenhouse effects and completely do not even mention albedo. I have used the find feature in my browser to search for the word albedo. It is seldom mentioned and never talked about as a significant term in any theory or model.
It is most unfortunate that both sides of this mainstream debate ignore albedo and the snow-on, snow-off influence of the Arctic Ocean. I have a huge amount of confidence that they have no chance of being right until they correctly include albedo in their models and theory. When the Arctic is frozen it doesn’t snow much. When the Arctic ice is melted and the water exposed, it snows like crazy. That raises and lowers the albedo and changes the Earth’s equilibrium temperature. You cannot melt all the ice in the Arctic without getting massive amounts of Arctic Ocean Effect Snow!
Willis Eschenbach has an interesting piece on NASA’s GISS ModelE over at WUWT entitled “Zero Point Three Times the Forcing” that I commend to your attention.