Verification, validation, and uncertainty quantification in scientific computing

by Judith Curry

I think I am gaining some insight into the debate between scientists and engineers regarding climate model verification and validation.

For background, the previous relevant posts at Climate Etc. are:

Of particular relevance is the post Climate model verification and validation, where the starkly different perspectives of Steve Easterbrook and Dan Hughes are contrasted.

Steve Easterbrook’s new post

Steve Easterbrook has a new post entitled “Formal verification for climate models?”  His post starts out with these questions:

Valdivino, who is working on a PhD in Brazil, on formal software verification techniques, is inspired by my suggestion to find ways to apply our current software research skills to climate science. But he asks some hard questions:

1.) If I want to Validate and Verify climate models should I forget all the things that I have learned so far in the V&V discipline? (e.g. Model-Based Testing (Finite State Machine, Statecharts, Z, B), structural testing, code inspection, static analysis, model checking)
2.) Among all V&V techniques, what can really be reused / adapted for climate models?

Excerpts from Easterbrook’s response:

Climate models are built through a long, slow process of trial and error, continually seeking to improve the quality of the simulations (see here for an overview of how they’re tested). As this is scientific research, it’s unknown, a priori, what will work, what’s computationally feasible, etc. Worse still, the complexity of the earth systems being studied means it’s often hard to know which processes in the model most need work, because the relationship between particular earth system processes and the overall behaviour of the climate system is exactly what the researchers are working to understand.

Which means that model development looks most like an agile software development process, where both the requirements and the set of techniques needed to implement them are unknown (and unknowable) up-front. So they build a little, and then explore how well it works.

Now, as we know, agile software practices aren’t really amenable to any kind of formal verification technique. If you don’t know what’s possible before you write the code, then you can’t write down a formal specification (the ‘target skill levels’ in the chart above don’t count – these are aspirational goals rather than specifications). And if you can’t write down a formal specification for the expected software behaviour, then you can’t apply formal reasoning techniques to determine if the specification was met.

Dan Hughes responds

Dan Hughes posted a comment at Easterbrook’s blog, which was censored.  Hughes posted his entire comment on his own blog under the title “Censored at Serendipity”; some excerpts:

Steve Easterbrook at Serendipity has censored a comment that I made.

The following was cut by Steve Easterbrook

Instead of bold, unsupported mischaracterizations of the material, delivered from a position of authority, would you point to a single aspect of either the material that I have posted, or of the books by Roache et al., that is not appropriate to scientific and engineering software? Or have I misread the titles of the books?

I find your assertions completely at odds with all the material in those books, and in the hundreds of other peer-reviewed papers and reports that have been written on V&V of engineering and scientific software over the past two and a half decades.

You have once again clearly illustrated the lack of knowledge, pervasive in the Climate Science Community, of the state of the art of independent V&V and SQA that has been successfully applied to a wide range of engineering and scientific software.

Thank you for your very professional and deeply considered response. Over the period of time that several of us have attempted constructive discussion here, you have yet to provide a single example of the lack of appropriateness of our peer-reviewed sources. Not a single example. You provide only meaningless arm waving from a position of authority.

Dan Hughes pointed me to the following paper:

A comprehensive framework for verification, validation, and uncertainty quantification in scientific computing

Christopher J. Roy and William L. Oberkampf

Abstract.   An overview of a comprehensive framework is given for estimating the predictive uncertainty of scientific computing applications. The framework is comprehensive in the sense that it treats both types of uncertainty (aleatory and epistemic), incorporates uncertainty due to the mathematical form of the model, and it provides a procedure for including estimates of numerical error in the predictive uncertainty. Aleatory (random) uncertainties in model inputs are treated as random variables, while epistemic (lack of knowledge) uncertainties are treated as intervals with no assumed probability distributions. Approaches for propagating both types of uncertainties through the model to the system response quantities of interest are briefly discussed. Numerical approximation errors (due to discretization, iteration, and computer round-off) are estimated using verification techniques, and the conversion of these errors into epistemic uncertainties is discussed. Model form uncertainty is quantified using (a) model validation procedures, i.e., statistical comparisons of model predictions to available experimental data, and (b) extrapolation of this uncertainty structure to points in the application domain where experimental data do not exist. Finally, methods for conveying the total predictive uncertainty to decision makers are presented. The different steps in the predictive uncertainty framework are illustrated using a simple example in computational fluid dynamics applied to a hypersonic wind tunnel.

Comput. Methods Appl. Mech. Engrg. 200 (2011) 2131–2144

Link to entire paper [here].

This is a very good paper, which is broadly consistent with what I have been proposing regarding uncertainty determination and V&V.  One note re vocabulary: aleatory uncertainty is the same as ontic uncertainty discussed in my uncertainty lexicon.
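
To make the paper’s central distinction concrete, here is a minimal sketch of segregated uncertainty propagation in the spirit of the Roy and Oberkampf framework. This is not code from the paper: the stand-in model, the distributions, and the interval bounds are all invented for illustration. Aleatory inputs are sampled as random variables, epistemic inputs are scanned over an interval with no assumed distribution, and the output is a family of distributions (a “p-box”) rather than a single CDF.

```python
# Minimal sketch of segregated aleatory/epistemic propagation (illustration
# only; the model, distributions, and interval are invented, not taken from
# Roy & Oberkampf).
import numpy as np

def model(velocity, drag_coeff):
    # Stand-in response function; a real application would run the simulation.
    return 0.5 * 1.2 * drag_coeff * velocity**2

rng = np.random.default_rng(42)
n_samples = 10_000

# Aleatory input: velocity varies randomly with a known distribution.
velocity = rng.normal(100.0, 5.0, n_samples)

# Epistemic input: drag coefficient known only to lie in [0.8, 1.2];
# scan the interval instead of assuming a probability distribution.
cdfs = []
for cd in np.linspace(0.8, 1.2, 9):
    cdfs.append(np.sort(model(velocity, cd)))

# Bounding CDFs of the response: min/max across the interval scan at each
# probability level. The gap between the bounds is the epistemic part.
stacked = np.vstack(cdfs)
lower, upper = stacked.min(axis=0), stacked.max(axis=0)
i95 = int(0.95 * n_samples)
print(f"95th percentile of the response lies between "
      f"{lower[i95]:.0f} and {upper[i95]:.0f}")
```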

The paper’s conclusions highlight the new elements of their approach, which I think are particularly relevant for climate models:

The framework for verification, validation, and uncertainty quantification (VV&UQ) in scientific computing presented here represents a conceptual shift in the way that scientific and engineering predictions are performed and presented to decision makers. The philosophy of the present approach is to rigorously segregate aleatory and epistemic uncertainties in input quantities, and to explicitly account for numerical solution errors and model form uncertainty directly in terms of the predicted system response quantities of interest. In this way the decision maker is clearly and unambiguously shown the uncertainty in the predicted quantities of interest. For example, if the model has been found to be inaccurate in comparisons with relevant experimental data, the decision maker will starkly see this in any new predictions; as opposed to the approach of immediately incorporating newly obtained experimental data for system responses into the model by way of re-calibration of model parameters. We believe our proposed approach to presenting predictive uncertainty to decision makers is needed to reduce the tendency of underestimating predictive uncertainty, especially when large extrapolation of models is required. We believe that with this clearer picture of the uncertainties, the decision maker is better served. This approach is particularly important for predictions of high-consequence systems, such as those where human life, the public safety, national security, or the future of a company is at stake.

JC comments:  If climate models were built only for science qua science, to support the academic research programs of the people building the models, then Steve Easterbrook would have a valid point.  However, climate models (particularly the NCAR Community Climate Models) are built as tools for community research, and these models are used for a wide variety of scientific research problems.  Further, these climate models are used in the IPCC assessment process, which reports to the UNFCCC, and in national-level policy making as well.  Greater accountability is needed not only because of the policy applications of these models, but also to support the range of scientific applications by people who were not involved in building the models.

So count me in the camp of Dan Hughes, Joshua Stults and William Oberkampf on this one.  I have been trying to make some headway in raising the standards of climate model V&V through the NASA and DOE committees that I serve on.  I am cautiously optimistic that the relevant people (i.e. the ones in charge of the climate $$) might be prepared to listen to these arguments.

337 responses to “Verification, validation, and uncertainty quantification in scientific computing”

  1. “So count me in the camp of Dan Hughes, Joshua Stults and William Oberkampf on this one. ” Me too.

    We have ample evidence of unwanted consequences when politicians design policies from the output of climate models.

    • There is grave danger to society as a whole when members of the public lose confidence in world leaders who, perhaps innocently, accepted the output of computer models as scientific fact.

      Václav Klaus, President of the Czech Republic, is one of few world leaders who recognized this danger.

      He wrote the foreword to Professor Ian Plimer’s new book making fun of consensus climate science, “How to Get Expelled from School.”

      http://www.connorcourt.com/Plimerflyer.pdf

    • There seems to be a misunderstanding of software engineering. V&V does not in any way prove the software will “give the right answer”.

      For example, it is completely possible to V&V, with a 100% pass, software that predicts all the lottery winners next year. And right up until next year, no one will be able to show that the software is faulty in any regard.

      Equally, it is much simpler to write a climate model that will predict what climate scientist will accept as a valid forecast for next year than it is to write a climate model that will deliver a reliable forecast for next year.

      What climate scientists have so far invented are climate models that do a very good job of predicting what climate scientists will accept as being a valid forecast.

      Those models that do a poor job of predicting what climate scientists will accept end up on the rubbish heap, no matter how valid their forecasts might actually have been.

      A climate model that predicts the climate with 100% accuracy 50 years into the future has signed its own death warrant if the climate scientists that control its programming do not agree with the forecast.

      It will be scrapped as being “wrong”, even though it was right. No one will know until 50 years from now, and by then the records will have long since been forgotten.

  2. Now, as we know, agile software practices aren’t really amenable to any kind of formal verification technique. If you don’t know what’s possible before you write the code, then you can’t write down a formal specification (the ‘target skill levels’ in the chart above don’t count – these are aspirational goals rather than specifications). And if you can’t write down a formal specification for the expected software behaviour, then you can’t apply formal reasoning techniques to determine if the specification was met.

    All well and good, but if you’re going to let yourself off the hook like that, how much confidence can anyone have in the results? There’s a time for dinking around, and then there’s a time for getting it right. The dinking around approach is fine while you’re discovering and learning, but that’s a pretty lame excuse for publishing a paper without any QC.

  3. Willis Eschenbach

    Judith, thank you for this excellent post. While the methods and concepts used to test, examine, and validate iterative climate models are different from those appropriate in other scientific disciplines and with other model types, it is extremely important that we do not respond by throwing up our hands and quitting, as Oreskes, Easterbrook and others cleverly and tactically propose. Yeah, they’d like that all right …

    w,

    • As P.E. notes, immediately above your comment, the purely scientific/academic concerns and processes (aka ‘dinking around’) are for the pre-application stages. As soon as you want to draw actual conclusions and act on them, the game changes and “skill levels” become the core concern.

  4. The value of any model must necessarily be limited to the rigour and extent of knowledge under which such models have been formulated.

    Easterbrook is simply saying that climate models can be neither verified nor validated, due to the complexity of the systems that underpin them.

    Hence, IMO, no current climate model in use for decision-making purposes by governments is capable of V&V, which is probably why none of these models has ever been given due scrutiny by sceptics.

    In fact, there is ample evidence that dissenting opinions from within the modelling groups are dismissed without due peer review processes being followed, which implies that the underlying science is flawed.

    • Peter
      You write –

      “The value of any model must necessarily be limited to the rigour and extent of knowledge under which such models have been formulated”

      I respectfully disagree. How a model is developed is NOT what is important to a policy maker; what is important is how accurately a model predicts future conditions. Current GCMs have no demonstrated ability to accurately predict future conditions, and therefore they have very low value for use in government policy development.

      • You two are observing obverse and reverse of the same coin.
        =============

      • The value of a model is NOT in its ability to predict the future, any more than the power of the Oracle of Delphi came from her ability to predict the future.

        The value of the model (oracle) comes from its ability to predict what people will BELIEVE is a prediction about the future. In the case of Delphi this brought great wealth.

        For example, you might work harder today because your boss predicts that this will increase your pay in the future. Whether this prediction is true or not is irrelevant. You do not work harder today because you got a raise (or got fired) next year.

    • To clarify, Rob makes a valid point that policy makers would not care how the models are formulated, but rather about their predictive ability.

      I believe that a simplistic model, for example, would only be capable of giving simplistic results, especially where serial errors due to computer rounding may give misleading results, such as the “hockey stick” graph of AGW.

      In any case, the predictive results over the past decade so far do not engender confidence in the GCMs currently in use.

    • Just a question re synonyms: are you using “flawed” to mean “shambolic junk”? If so, +1.

      If not, stop equivocating.

      • Flawed in my lexicon means something that is less than what should be expected given the resources allocated to a project. While some people on this blog choose to express themselves more forcefully, my language is deliberately moderate in tone.

        If this restraint is not to anyone’s liking then they always have the option to pass me over for the next comment. As an aside, I value your contribution to this blog, Brian, along with a few others, and while I may seem to be sitting on the fence, it’s only because I still think human activity is affecting our environment (and that includes climate), but I am feeling more and more sceptical of AGW based on CO2.

      • Sorry, I was just teasing. It wasn’t /sarc or anything down-putting.

  5. More stuff like this:
    http://www.newton.ac.uk/programmes/CLP/seminars/120815301.html

    There are many good talks in this seminar. Have a look, Dan.

    • Steve
      Let us consider a few basic points. From the perspective of someone trying to make intelligent governmental policy decisions (ha ha, yes, you can argue that is never done in the US), what should be the difference between a long-term weather model and a climate model?

      A policy maker is interested in what outputs from these models? Is there anything much more than temperature and rainfall? Probably not. There are many other things that the model will have to simulate in order to accurately forecast these two conditions, but these are probably the two most important factors impacting government policies.

      Why are climate models developed differently from weather models? The answer is that people are already working to develop better and better long-term weather models, and the complexity of the system as a whole has stymied their attempts to create models that work with reasonable accuracy for more than a short period into the future.

      This set of facts led “climate model” developers to take a very different approach: to develop a set of models put together completely differently, in the hope that these would better simulate the environment over longer time periods. Unfortunately, these models are not able to meet the desired goal of giving policy makers useful information on the key criteria that they care about.

      • Some weather models form the core of GCMs.
        Who knows what a policy maker needs to know? Regional sea level, continental temperature, rainfall.
        They may be satisfied by very SIMPLE models, rather than GCMs.

        People have thought about this problem backwards: a tool was built before the use case was defined.

      • Steve

        I do not think we have much of a disagreement here, but I am still confused about how some are thinking on this topic. Maybe you can provide insight.
        People fear that a warmer world will have negative consequences for humanity. Almost all of these fears are rooted in the IPCC’s AR4.
        I next look at how the IPCC came to its conclusions regarding “Freshwater Resources and their Management” in that report:
        “Since the TAR, over 100 studies of climate change effects on river flows have been published in scientific journals, and many more have been reported in internal reports.”
        “Virtually all studies use a hydrological model driven by scenarios based on climate model simulations, with a number of them using SRES-based scenarios (e.g., Hayhoe et al., 2004; Zierl and Bugmann, 2005; Kay et al., 2006a). A number of global-scale assessments (e.g., Manabe et al., 2004a, b; Milly et al., 2005, Nohara et al., 2006) directly use climate model simulations of river runoff, but the reliability of estimated changes is dependent on the rather poor ability of the climate model to simulate 20th century runoff reliably.”
        So there it is demonstrated that almost the entire basis of the IPCC’s conclusions on water availability is climate models forecasting conditions that the climate models have not been able to accurately predict. Somehow, although these climate models were not designed to predict rainfall accurately, they have still been relied upon to determine that there will be a problem.

        One of the things I find interesting is that the IPCC does like to appeal to authority and argue that the science is settled: “over 100 studies of climate change published”, “peer reviewed studies” (well, I guess that those who think the peer review process means much regarding climate models need to reevaluate their position).
        If someone is concerned about higher CO2 leading to negative consequences, how about developing a model that will accurately predict two key attributes—rainfall and temperature. Almost all other points are factors that might go into consideration when developing a model, but they are not really critical to policy makers.

        Conversely, if you do not have a model that can accurately predict future rainfall as a function of atmospheric CO2, then it simply is not of much value in describing future environmental conditions and is of low value to policy makers.

      • The adverse consequences are just hand-waving, albeit very necessary hand-waving. Without them, no Big Funding, no Global Oversight of CO2 emissions, etc., etc.

        And history trumps GHW (Global Hand Waving) models. Warm periods have been those of vigourous growth and expansion. Cooling the reverse.

        Once more, it’s lies all the way down, guv’nor.

  6. Dr. Curry-

    Thank you for this post (and for noting Steve’s site last week). Being a former chemist who ended up having to design a software V&V process (i.e. the protocols) that would withstand an FDA audit (for class II medical devices), and later a design control process, this topic hits home for me.

    Steve’s comments about the processes he found used for design control (think of Steve’s engineering requirements) and V&V in climate models were a bit surprising. Spending some time with the FDA (or the UK equivalent) on what are considered world-class processes might be something to consider.

  7. Judith –
    Thank you, thank you, thank you.

    This has been one of the reasons for my disbelief in the consensus science for the last 15 or more years. Having done IV&V for NASA before Easterbrook emerged, as well as knowing many of the people who worked for him at NASA, my view is that his view is inconsistent with practical, real-world V&V.

    That climate models have escaped V&V for this long is, as someone has said, a tragedy.

    • Don’t you mean a “travesty”? :)

      • OK – I’ll accept that correction, although it could be either or both.

        The “tragedy” I had in mind was the waste of time, energy and resources due to the lack of validated models and thus the lack of validated model output for the last 20 years.

        But then, if one is establishing a religion, one does NOT want the basis for that religion to be too closely examined. Ergo, validation is not a desirable process, eh?

        Thank you. :-)

    • It is clear that embracing some proper V&V would bring positive long-term benefits. The short form of Easterbrook’s answer here is that it is simply “too hard” to do. Not much fun, either.

      I can certainly attest to the no fun part, having gone through the FDA process myself several times. If I was a researcher it would be the last thing I would want to spend my time on.

      At some point the model guys are going to get tired of being the weak link, and someone is going to force them to do it. Today would be a good day to start. I suspect I would not be the only one unsurprised to find out that the error margins are enormous when all the errors are properly accounted for and propagated through a model run. Gigantic. Non-linear feedback with uncertainty will tend to explode pretty quickly.

      One senses that the modelers pretty much know that the results will be very bad PR for the cause, and that they will be misused by some people for political purposes (and they are probably right).

      The question is: are they more interested in defending their empire, or in getting the right answer?

      Not to go down a conspiracy theory path (which means of course I will), but I believe they likely know this problem is so difficult that they aren’t likely to make much useful headway for a long time. It takes decades to confirm or reject a model’s performance (30-year running averages on trends), and there just doesn’t appear to be a shortcut. Hindcasting is of limited use, as climate forcing inputs get increasingly vague the further back you go. Ensemble averaging seems futile unless the error characteristics of each model are known.

      So confirming that the models have enormous V&V problems without also offering up a solution is research funding suicide. Thus the effective “no comment” attitude.

  8. Thank you for this post. It does get to the key issue on the topic imo.

  9. Easterbrook is mixing apples and oranges — and then saying the oranges don’t exist. V&V is a risk management tool, useless unless the risks have been defined. Thus, the level, scope, and rigor of V&V must be appropriate for the software’s intended usages. And of course, scientific usage of the GCMs, say to understand phenomena, is much different than using the models to predict the climate for engineering or policy problems, say the expected extent of changes in weather patterns. Cost-effective V&V for scientific usages will be different than for engineering/policy usages.

  10. In any field of science and technology, more or less complex models are used. The state of the art for verifying and validating these models has always been the same: you run the model and compare the outputs to the results of an experiment performed with the same (initial/boundary) conditions…

    As an engineer working in the space industry, I’m quite familiar with model development, verification and validation. And I was really shocked and embittered after reading Judith’s previous post related to climate model V&V, when I understood that NONE OF THESE MODELS HAD EVER BEEN FORMALLY VALIDATED!

    According to that post, climate models are only inter-validated, i.e., compared to each other, but never formally validated…. Such an inter-validation process is of course acceptable and even used in many other fields…. provided you have a “reference” model that has successfully passed a formal V&V process, with which you can compare the new model you intend to validate…

    It thus appears that the IPCC and most climate scientists are promoting AGW theory, and subsequent policies to cut manmade GHG emissions that are likely to cost hundreds of billions of dollars, based on only a few models that incriminate human responsibility but have never been formally validated! This is indeed revolting and appears to be the biggest scientific fraud I’ve ever seen.

    For sure, the verification and validation of climate models will turn out to be one of the key challenges for climate science in the next few years. If it fails in this much-needed objective, climate science and AGW will be definitively (if not already) discredited.
    But I’m actually afraid that none of these nice models will ever be able to successfully pass a validation process, as none of them is able to correctly hindcast past temperature profiles (see IPCC AR4, WG1, FAQ 8.1).

    • Eric –
      This is indeed revolting and appears as the biggest scientific fraud I’ve ever seen.

      Yes. And this is not the only evidence for fraud.

      But I’m actually afraid that none of these nice models will ever be able to successfully pass a validation process

      Which is very likely the reason the climate community has never done V&V, much less IV&V, on their tools (climate models).

    • “NONE OF THESE MODELS HAD EVER BEEN FORMALLY VALIDATED !”

      Yep, when I had that realization a couple of years ago, I pushed my chair back from the computer and said “Woah! What?” To anyone who works in software engineering it’s then clear that the whole of CAGW “science” is a meaningless house of recursively self-referential cards. Almost all of it ties back, directly or indirectly, to these arbitrarily made-up, untested models.

      It’s actually easy to miss this point: because it seems like it just *has* to be physically based somewhere, you assume it is. All these scientists, journalists and politicians running in circles and screaming like it’s actually real. No one would be mad enough to insist we stop the economic and industrial output of our civilization based on fanciful, unverified computer guesses, right? Right???

      Welcome to “(M)Alice in Wonderland” where the Mad Hatter runs the computer models and a Cheshire kitty cat keeps the data and code safely hidden behind his smile.

      We end up having so many deep, interesting discussions about various detailed aspects of CAGW theory that we sometimes forget that the freaking models didn’t start with real-world data, never had real-world data entered and are not verified against real-world data. It’s SimScience!

      There’s no real debate or even rational discussion to be had. That one simple fact means full stop, game over. Science is the testing of risky predictions against real-world observations. Therefore, CAGW science is by the admission of its own practitioners, not a science at all. CAGW science isn’t “settled”. It’s not even wrong. It doesn’t exist at all. No prediction vs real-world observation = no science.

      In fact, I’d argue that Lucia may be the FIRST actual CAGW scientist because of her running comparisons of IPCC GCM model predictions against actual observations.

  11. Which means that model development looks most like an agile software development process, where both the requirements and the set of techniques needed to implement them are unknown (and unknowable) up-front. So they build a little, and then explore how well it works.

    I’d love to see Easterbrook’s references for the above statement. In my entire career, I’ve never seen any of the big names in the Agile movement describe requirements as being unknown and unknowable up front. Emergent, yes, but that’s hardly the same thing.

    • The simple translation of this BS is that they are working with this thing like a toy. They play around with it, find something interesting, work with it for a while, get bored, work on something else, etc. A deadline approaches, they declare success, and then they work on it some more or discard it.

      Not to demean the method totally: installing formal procedures on software too early in a prototyping process just slows things down and can be counter-productive. Fine. Just don’t expect us to take the result seriously until you have done the seriously hard work.

      Academic research tends not to produce the best code, to say the least, in my experience. Quite frankly, it is extremely rare that academics would be called upon to produce anything remotely ready for industrial use. I imagine it can and does happen, but they don’t have the culture for this type of work, IMO.

      These models aren’t ready for prime time; they know it, we know it, and we like to jab them with a stick over it every now and then. They flinch, respond with some plausible deniability, and try not to admit anything too damaging. And hey, if someone else overstates the efficacy of their models, well, it is not their problem to police that…

      • Yup.

      • That sounds about right. Can you imagine 20 years’ worth of code modifications? It took Hansen months just to get a copy of the GISTEMP code out that kinda worked. Before he got it out, another programmer (several, really) had better programs online. Now let’s hand them a state-of-the-art supercomputer so they can load FORTRAN subroutines.

      • “Toy” might be a little too harsh, but they definitely have contradictory objectives (which Easterbrook has remarked on, although he dismisses it as a problem). The models are seen by the modellers as an investigative tool, but they’re also being used for production purposes (CMIP runs, etc.). The processes and product may be satisfactory for the former, but hardly the latter.

    • “Which means that model development looks most like an agile software development process, where both the requirements and the set of techniques needed to implement them are unknown (and unknowable) up-front. So they build a little, and then explore how well it works.”

      That is not Agile development. Not by a long shot. What you are describing is a corruption of the term Agile to excuse a lack of methodology.

      The correct name for what you are describing is trial and error. Given enough time, if we keep looking we will eventually discover what it was that we were looking for.

      Agile development starts by defining what you are looking for and a formal test of how you will recognize it when you see it. Until this is done, how do you know when development is finished?

      In the case of climate models, what are the formal criteria that determine that you have a working model? For example, if you today wrote a model that was 100% reliable in predicting the climate 100 years in the future, how would you know?

      What would prevent you from in fact breaking the working model by making changes that would make it unreliable, in the mistaken belief that you are improving the model?

    • I think I agree with the objections and caveats (in general): see Curry, Judith A. 2010. Climate model verification and validation. Climate Etc. December 1. http://judithcurry.com/2010/12/01/climate-model-verification-and-validation/#comment-18044

      However, I suggest that a requirements document precedes and is addressed to a customer audience. The specification is discussed with the customers but is addressed to the implementers.

      Further, I suggest that a predictive model is not “validated” by comparing its results to requirements, specifications or test scenarios. Its measure of success is future observations. Estimating the predictive uncertainty is useful, but the actual uncertainty appears in production (prediction).

      It has been a long time since I studied statistics. At that time, correlation implied goodness of fit (how well the factors “explained” the observations). Perhaps the definition has changed. If not, the Taylor diagrams referenced by reanalysis.org suggest that either some factor is missing, there are problems with the data, or both. So fire away and bring me up to date. (I don’t have problems with reanalyses per se; they appear to be a good idea. My problem is with placing too much confidence in the reliability of the result.)

      Taylor diagram of Global Annual precipitation skill of existing reanalyses in 12.3a_Bosilovich.pdf (February 2011)
      Precipitation Skill Correlation (p 18) ranges from (68 ERA40?) to (92 CMAP?)
      http://www.wmo.int/pages/prog/gcos/aopcXVI/Presentations/12.3a_Bosilovich.pdf

      A Taylor diagram also appears in MERRA Brochure.pdf (June 2010)
      http://gmao.gsfc.nasa.gov/pubs/brochures/MERRA%20Brochure.pdf

      As to “agile” development, it appealed to cost and schedule objectives. I used it, but called and labeled it “prototyping”. The corollary rule was: “Never implement a prototype.”

  12. Agile software development? Good god! Agile is good for ongoing development of commercial and consumer software, where the deadlines are tight, the specs float, new features are being added, and the worst that happens is that the user reboots or loses some work. Agile is also useful if a team is essentially hacking around trying to get their arms around a problem.

    But for serious high grade tools with high impact like GCMs, at some point you must get off the agile merry-go-round and institute some serious V&V.

    I’m frankly horrified to learn this. It means that the orthodox modelers are still treating climate science as their own private sandbox in which to play and no adults are allowed.

    Furthermore it means that they have not really thought through how to check their work because they are so busy tweaking the nextest, bestest iteration — which, in my cynicism, I suspect is the iteration that gets them closer to the results they’ve already decided they want.

    All modeling efforts will inevitably converge on the result most likely to lead to further funding.

    — Charlie Martin, Reasons To Be A Global Warming Skeptic

  13. Norm Kalmanovitch

    I am a Professional Geophysicist, and my professional code of practice demands that I work only from hard data; I am not afforded the luxury of dabbling in unfounded theory as academics are allowed to do.
    Reservoir models are far more complex than climate models in the modelling of each cell, but orders of magnitude smaller in size. Reservoir models serve an engineering purpose of optimizing recovery at the lowest cost, and if a parameter is not fully evaluated the model is less than perfect (as is usually the case) and the economics suffer to some degree, but not to the degree that developing the reservoir strictly on guesses and assumptions would produce.
    I also use models to test assumptions about the seismic response of various reservoirs. If the model confirms my interpretation of the seismic data, it doesn’t prove me correct but merely increases my confidence. If the model refutes my interpretation, I know that I am wrong.
    If the climate models were used properly, they would first have been used to test the CO2 forcing parameter for validity by doubling and quadrupling the CO2, as well as cutting it to half and to a quarter of its value, and seeing the effect on global temperature. This would have shown the parameter to be faulty, and an investigation into the genesis of this parameter would have revealed that there was no scientific basis for it.
    If applied scientists instead of academics were in charge, this would have put an end to the use of models to predict global temperature changes from changes in CO2, but apparently when data disagrees with theory, academics tend to discard the data and keep the theory.
    If an engineer built a bridge based on theory as flawed as AGW, the bridge would collapse long before the construction was even close to complete.
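
For what it is worth, the halving/doubling sweep Norm describes can be sketched in a few lines. The snippet below is only an illustration, not a GCM experiment, and it takes no side on whether the parameter is “faulty”: it uses the common first-order logarithmic forcing approximation and an assumed sensitivity parameter, both of which are stand-ins.

```python
# Sketch of a CO2 halving/doubling sensitivity sweep. Uses the first-order
# forcing approximation dF = 5.35 * ln(C/C0) W/m^2 (Myhre et al., 1998) and
# an ASSUMED sensitivity parameter; illustrative only, not a GCM experiment.
import math

LAMBDA = 0.8  # assumed climate sensitivity parameter, K per (W/m^2)

for factor in (0.25, 0.5, 1.0, 2.0, 4.0):
    forcing = 5.35 * math.log(factor)  # W/m^2 relative to the baseline CO2
    print(f"CO2 x{factor:<4}: dF = {forcing:+6.2f} W/m^2, "
          f"dT = {LAMBDA * forcing:+5.2f} K")
```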

    • I am a Professional Geophysicist and my professional code of practice demands that I work only from hard data and I am not afforded the luxury of dabbling in unfounded theory as academics are allowed to do.
      Reservoir models are far more complex than climate models in the modelling of each cell, but orders of magnitude smaller in size.

      From my experience in studying oil depletion, reservoir engineers show little interest in the current state of oil reserves in the USA and, for that matter, the world. All they are interested in is optimizing the recovery of oil from a reservoir that they were told to look into. Nothing else matters but the bottom line.

      Serious questions: Do you have an idea of what the aleatory uncertainty in the size of a random oil reservoir is? Do you think this is important information for the public to know, and for the people’s government to make policy decisions on? Why do we not verify and validate estimates of future oil production? Surely this is important information for our country to have so that we can remain competitive with the rest of the world. Wouldn’t you say this information is important, regardless of the impact that consuming this oil has on the climate?

      If the climate models were used properly, they would first have been used to test the CO2 forcing parameter for validity by doubling and quadrupling the CO2, as well as cutting it to half and to a quarter of its value, and seeing the effect on global temperature. This would have shown the parameter to be faulty, and an investigation into the genesis of this parameter would have revealed that there was no scientific basis for it.

      And how do you propose to do this experiment on global temperature?

      • Norm Kalmanovitch

        All that I can say is that you have very little experience and understanding about oil. The Bakken of the Williston Basin has TOC (total organic carbon) ranging up to 36%, creating potential light oil reserves that are second only to Saudi Arabia’s, and the Colorado oil shales contain several times the potential reserves of the Bakken, so in terms of oil reserves the US has by far the greatest in the world.
        As for how I would propose to do this experiment on global temperature: the simple answer is that I wouldn’t, because physical data demonstrate that if we removed all the CO2 from the atmosphere, the greenhouse effect would still be the same 33°C.
        This is because clouds and water vapour alone provide the entire insulation for the 33°C greenhouse effect; but because CO2 has such a powerful effect on the radiation from the Earth’s surface at a wavelength band close to the peak thermal radiation, the current 390 ppmv CO2 concentration ends up providing about 10% of the insulation that would otherwise be provided by clouds.
        Since the greenhouse effect is essentially stable, but the Earth’s temperature changes over time, and the greenhouse effect is nothing more than the difference between the theoretical blackbody temperature of the Earth and its actual temperature, it would be the theoretical blackbody temperature that is changing, and this is caused by changes to incoming energy. The only sources of incoming energy are the sun and geothermal heat transfer, so this is where I would investigate how global temperature changes.

      • All that I can say is that you have very little experience and understanding about oil.

        I certainly have more knowledge and understanding than you do, Norman.
        I wrote a book called The Oil Conundrum. What is the lifetime of a Bakken rig?
        What is the EROEI of Colorado oil shale? Oil shale is not anywhere close to the same thing as shale oil; that is what I am talking about. You people in the oil industry knew about resource limitations all along, yet could not lift a finger to give the public at large a better understanding of what was in store for us. What is in store are meager returns on investment.

        That other stuff regarding your GHG theory is nothing but rhetorical spew as far as I can tell. Do you have a reference for any of this?

      • Norm Kalmanovitch

        As you apparently have no knowledge of geology or of what is involved in the Bakken play, let me enlighten you.
        The Bakken reservoir is a dolomitic siltstone encased in high-TOC shale source rock. The kerogen is converted to oil at a thermal threshold, and the expansion that takes place causes horizontal fractures in the Bakken reservoir. Horizontal drilling with mile-long extensions and multistage fracs allows production from these horizontal fractures at economic rates exceeding 1000 bbls/day.
        If you do the math, 2000 such wells would replace all the oil imported by the US from Saudi Arabia.
        When people use the term “you people in the oil industry” they immediately expose the fact that they know nothing about oil, geology, geophysics, engineering or science in general, but in spite of their lack of first-hand knowledge write books based on misconceptions.
        For your information, there is no GHG theory, because there is no such scientifically defined term as GHG, a term that was introduced by Hansen in his 1988 paper, which excluded water vapour and ozone, the only two gases in the atmosphere other than CO2 that have any sort of measurable effect on the Earth’s radiative spectrum.
        I welcome honest scientific criticism based on fact, and this does not fall into either category.

        Horizontal drilling with mile-long extensions and multistage fracs allows production from these horizontal fractures at economic rates exceeding 1000 bbls/day.

        You should be ashamed of yourself, Norman. How long will that rate last? According to the North Dakota Dept. of Mineral Resources, https://www.dmr.nd.gov/oilgas/presentations/ActivityandProjectionsWilliston2010-08-03.pdf, slide 17, a typical rig will drop from 1000 barrels per day down to 200 barrels per day in 2 years. That is the nature of these reservoirs, in comparison to a longer-lived traditional reservoir.

        If you do the math 2000 such wells would replace all the oil imported by the US from Saudi Arabia.

        OK, it will for one freaking year.

        The USA consumes something like 20,000,000 barrels per day. So we need 20,000 of these wells, which might average 1000 barrels per day, drilled to match our current consumption. And then 20,000 new ones will have to be drilled again every 2 years, and so on, ad nauseam.
        But the actual plan is to create only 20,000 new wells in the next 10-20 years, so at best it will cover only 5-10% of our oil consumption at any one time, and 2000 new ones will have to be drilled every two years to meet demand. That is not considering how much liquid fuel and other energy will be required during the extraction.

        When people use the term “you people in the oil industry” they immediately expose the fact that they know nothing about oil, geology, geophysics, engineering or science in general but in spite of their lack of first hand knowledge write books based on misconceptions.

        We do it because you oily types won’t. You can’t stand to see the truth spoken. You just keep on deceiving us with marketing blurbs.

        For your information, there is no GHG theory, because there is no such scientifically defined term as GHG, a term that was introduced by Hansen in his 1988 paper, which excluded water vapour and ozone, the only two gases in the atmosphere other than CO2 that have any sort of measurable effect on the Earth’s radiative spectrum.

        Based on your deception in revealing the actual oil outlook, how can we trust anything you say about climate science?

      • WHT, you killjoy, you :) It would be more expensive to use these reserves; refracking may be cost-effective once or twice, plus new holes. Not like the good old days.

      • I am not a killjoy to the people that will make money off of the Bakken.

      • Norm Kalmanovitch

        WHT
        You seem to have missed the class in arithmetic on multiplication and division.
        1000 bbls/d for two years is 730,000 bbls; at today’s price of $78/bbl this is a cash flow of $56,940,000.
        The well cost is about $10 million and the lifting cost is under $10/bbl, or $7.3 million, which leaves $39.64 million for taxes and royalties, which benefit North Dakota, and for profits, which allow further drilling, which generates more taxes and royalties that further benefit North Dakota.
        At 200 bbls/d, which is typically sustainable for at least five more years, the net cash flow at today’s oil price is $15,600 minus $2,000 lifting cost, which equals $13,600 each day for the next five years, once again generating income for North Dakota out of taxes and royalties.
        The year-to-date import of oil from Saudi Arabia averaged 1,129 thousand bbls/day (June 2011), so in fact it would take only about 1200 wells to replace Saudi imports, and as production drops off to 200 bbls/day the continued drilling more than replaces the lost production.
        My recent work has been on developing a simple seismic data processing technique to help delineate “sweet spots” for Bakken drilling which will yield the 1000+ bbls/d production; apparently you are not familiar with the factors involved or with the positive economic impact of Bakken production:
        http://online.wsj.com/article/SB10001424052970204226204576602524023932438.html
        Perhaps in light of your complete ignorance about Bakken oil production you might consider retracting your comment about my knowledge of both oil production and climate.
        Science is about facts, and when there are no facts for support, non-scientists simply use ad hominem attack for their arguments, and you are clearly in this category.
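
For readers who want to check the arithmetic in the comment above, here is a quick worked version. The prices, costs, and rates are the commenter’s own figures, not independent data.

```python
# Check of the cash-flow arithmetic above (all inputs are the commenter's
# figures: $78/bbl, $10M well cost, $10/bbl lifting cost, 1000 bbl/d).
RATE_BBL_D = 1000
PRICE = 78              # $/bbl
WELL_COST = 10_000_000  # $
LIFT_COST = 10          # $/bbl

revenue_2yr = RATE_BBL_D * 730 * PRICE          # 730,000 bbl over two years
lifting_2yr = RATE_BBL_D * 730 * LIFT_COST
net_2yr = revenue_2yr - WELL_COST - lifting_2yr
print(f"two-year revenue:         ${revenue_2yr:,}")  # $56,940,000
print(f"net after well + lifting: ${net_2yr:,}")      # $39,640,000

tail_daily = 200 * (PRICE - LIFT_COST)          # 200 bbl/d tail rate
print(f"daily net at 200 bbl/d:   ${tail_daily:,}")   # $13,600
```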

      • Like I said, no doubt someone can make money off the Bakken oil, but look at the data that is coming out. Take the Parshall Field wells’ production: fully developed, with a few locations to be drilled around the edge, at 1 well per section. Since 2010, Parshall has shown a 30% decline, even with the addition of 40 new wells. That looks about as fast a decline rate as the data point from the typical well.

        Month     Oil (bbl)   Wells
        Jul-11      843,130     209
        Jun-11      838,090     211
        May-11      909,915     209
        Apr-11      881,032     211
        Mar-11      991,687     210
        Feb-11      918,401     209
        Jan-11    1,026,912     209
        Dec-10    1,035,831     206
        Nov-10    1,036,619     204
        Oct-10    1,149,233     202
        Sep-10    1,225,116     195
        Aug-10    1,277,485     195
        Jul-10    1,344,916     195
        Jun-10    1,343,254     188
        May-10    1,445,796     179
        Apr-10    1,382,433     172
        Mar-10    1,451,893     169
        Feb-10    1,262,549     168
        Jan-10    1,337,524     168
        Explain this to me, Norm. Good for you that you can make money off of this.

  14. ” Greater accountability is needed not only because of the policy applications of these models …”

    I’m a minor practitioner in a field of applied science, but whatever I model as a predictive tool is subject to professional and marketplace scrutiny, with real consequences and accountability for stuff-ups. And yes, whatever models I produce are used to try to quantify that which was previously unknown.

    Obviously, I agree with Judith C on the urgent need for greater accountability in GCMs and the sequential scenarios that blithely cost much national treasure. Accountability is a much-feared concept, it seems from Easterbrook’s disingenuous dissembling.

    • I sure wouldn’t want to get a CAT scan from a machine running the latest build from agile software development.

  15. Nobody is being specific about what V&V they need to see. Normally, for models, V&V involves checking their predictions and hindcasting. The former, of course, is not so easy without waiting for the future to unfold, but if they can’t account for the current climate, they are non-starters. What else is needed, specifically, in the V&V?

    • JimD –
      What else is needed, specifically, in the V&V?

      Nobody knows, because nobody has ever written a spec to define what the GCMs are supposed to do and how. That’s precisely what Easterbrook is saying. Only he’s saying that’s SOP – and it’s NOT.

      Without that spec, nobody, including the modeler can tell you whether the model is performing as it’s supposed to.

      Specs are a PITA – but without them, you got nothing. And your output is worthless. Regardless of what ANYONE says.

      • GCMs are to study and understand the general circulation of the earth’s atmosphere. They have been developed and verified for that purpose, and do this job very well in terms of determining the transports of energy by convection and weather systems, and correctly locating the jet streams and tropopause. They give confidence that the major processes are all understood, because if anything important was missing, the bias errors would accumulate when you run them for centuries, which is how they are tested. We have enough observational information about the real general circulation to validate them very well for that purpose.

      • Go watch the video I linked to above. You’ll find an example where the errors introduced by the time-stepping algorithm equalled the physical process being modelled. Search around some more and you’ll see that models don’t get close to representing the real temperature.
        Without a spec there can be no independent validation.

        The models are fit for research. They are not fit for policy. The way they become fit for policy is for policy makers to specify what information they need and the accuracy to which they need it.

        That spec needs to be formalized. Then models are tested by independent teams to see if they conform with the spec.

        “reproducing reality” is not a spec.
        “understanding the planet” is also not a spec.

      • How do the policymakers become convinced that a model fits the spec for a prediction if the future hasn’t happened yet? This is the dilemma in a nutshell. Maybe a couple more decades of warming will bear the models out, but I don’t see any shortcuts.

      • Jim D.

        You, like many others, do not understand what validation is. Validation is not conformity to reality; validation is conformity to a specification.

        wiki is your friend

        Verification: The process of evaluating software to determine whether the products of a given development phase satisfy the conditions imposed at the start of that phase. [IEEE-STD-610].
        Validation: The process of evaluating software during or at the end of the development process to determine whether it satisfies specified requirements. [IEEE-STD-610]

        In the simulation world, the notion of conformity to “reality” is defined and determined by the intended use.

        Spec: The GCM shall simulate the global mean sea level over the period of 1850 to 2000 within the following tolerance: xyz

        There are several important things here.

        1. The spec is written by someone other than the developer. It is typically in a customer requirements document. The customer drives the spec. As I’ve said, policy makers need to decide what kind of accuracy they need to make decisions. The spec drives everything.

        2. The developers have to meet the spec or demonstrate why the spec is unmeetable.

        3. The IV&V team determines whether the product meets the spec.

        With a GCM, the problem is that they are developed without a spec. No customer is driving development. The scientist-developers are thus doomed to failure in the eyes of some customer who comes in after the fact and demands a different product.

        If you understand the problem you will see that it all devolves into a spec war. That fight is the fight that needs to happen. The end users, the policy makers, own the spec. They can make it easy or they can make it hard. What most people don’t get is that IV&V is actually a good thing for scientists. As it stands now, they write the spec, they write the code, and they test their own code. That’s actually bad for them. They can’t win.

        If a policy maker says:

        1. The GCM shall be able to simulate every molecule in the atmosphere and hindcast sea level to .0001 mm.

        Then the scientist can merely say: sorry, if that is your requirement of accuracy to make policy, then we have nothing for you. Goodbye; we are going to go build research models.

        What develops from this is a give and take between the user who needs a tool and the developer who faces the limits of computing.

        What also happens is you tend to get one or two validated models rather than 22.
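
As a concrete illustration of validation as spec conformance, here is a hypothetical sketch of the sea-level spec above turned into a pass/fail test. The tolerance, the data, and the function are all invented; the point is only that “validated” means “meets a written requirement”, not “matches reality”.

```python
# Hypothetical spec-conformance test (all values invented for illustration).
# Spec: hindcast global mean sea level, 1850-2000, within a stated tolerance.
import numpy as np

SPEC_TOLERANCE_MM = 20.0  # the "xyz" in the spec; set by the customer

def validate_sea_level(model_mm, observed_mm, tol=SPEC_TOLERANCE_MM):
    """Return (passed, worst_error): does the hindcast meet the spec?"""
    error = np.abs(np.asarray(model_mm) - np.asarray(observed_mm))
    return bool(np.all(error <= tol)), float(error.max())

# Toy usage with made-up series for 1850..2000 (mm relative to 1850):
obs = np.linspace(0.0, 180.0, 151)
sim = obs + np.random.default_rng(1).normal(0.0, 5.0, obs.size)
passed, worst = validate_sea_level(sim, obs)
print(f"validation {'PASSED' if passed else 'FAILED'} (max error {worst:.1f} mm)")
```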

      • “Spec: The GCM shall simulate the global mean sea level over the period of 1850 to 2000 within the following tolerance: xyz”

        Unfortunately that doesn’t tell us anything about the predictive ability of the model. Curve fitting will do this, with zero predictive skill.

        However, there is a solution. If the models do have predictive ability then they need to be isolated from the full dataset before training begins. Train the models on the data from 1850 – 1950. Then see how well they do predicting 1950-2000.

        Now restore the models to their state before training and repeat the exercise using 1900-2000, and see how well the model predicts 1850-1900. Keep repeating this, always returning to the untrained model. This won’t actually tell you which models have predictive ability (after all, a model can get lucky), but it will eliminate those that do not.

        As soon as you use the full dataset for training, you cannot evaluate a model’s predictive ability, because it already knows the answer. You would need to wait 50 more years to accumulate data to check the results.
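
A minimal sketch of the hold-out protocol described above, using a trivial polynomial stand-in for a model and synthetic data. Real GCMs are physics codes, not curve fits, so this illustrates only the evaluation procedure, not the models themselves.

```python
# Hold-out evaluation: "train" only on one window, score on the other,
# and reset before each split. Data and the stand-in model are invented.
import numpy as np

years = np.arange(1850, 2001)
rng = np.random.default_rng(0)
temps = (0.005 * (years - 1850)                 # synthetic trend
         + 0.2 * np.sin((years - 1850) / 11.0)  # synthetic variability
         + rng.normal(0.0, 0.1, years.size))    # synthetic noise

def out_of_sample_rmse(train_mask):
    test_mask = ~train_mask
    x = years - 1925.0  # centered to keep the fit well-conditioned
    # Fit from scratch on the training window only (the "reset" step).
    coeffs = np.polyfit(x[train_mask], temps[train_mask], deg=2)
    pred = np.polyval(coeffs, x[test_mask])
    return np.sqrt(np.mean((pred - temps[test_mask]) ** 2))

print("train 1850-1950, predict 1950-2000:", out_of_sample_rmse(years <= 1950))
print("train 1900-2000, predict 1850-1900:", out_of_sample_rmse(years >= 1900))
```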

      • JimD –

        We have enough observational information about the real general circulation to validate them very well for that purpose.

        Really? Then why do you believe it (validation) hasn’t happened?

        You might want to watch this –
        http://www.crassh.cam.ac.uk/events/1449/

        It IS worth the time.

      • Verification has been done. The GCMs simulate a verifiable general circulation. Validation, by its strict definition, requires the future to become the past to check if the model was able to predict it. How else would you validate a model for predicting the future? Or maybe you can use the past 100 years to check if there is a difference between a CO2 increase and not, which has been done. Is that validation or not for a future prediction? If not, what would be?

      • Jim D.

        you are confused. Validation is meeting the spec.

        Requirement: it shall be purple
        Spec: it will be purple
        Test: Purpleness will be tested by measuring the wavelength of light.. blah blah blah.

        Validation occurs when the product meets the spec.

        You are thinking that “valid” means “matches reality”.
        It doesn’t, unless the spec demands that.

      • “you are confused. Validation is meeting the spec.”

        Verification and validation are often used interchangeably.

        In formal QA, verification checks to see if the spec is satisfied: was it “built right”? Validation checks to see if the product does what the customer wanted: did we build the right product?

    • Staying well within the confidence interval would somewhat validate the models. If a neutral or cooling period of ten years is not unexpected, I would expect that period to push the limits of the confidence interval. Then if the confidence interval is realistic enough to allow for the natural variability, how useful is the model?

  16. Models based upon trends (climate history, economics, etc.) are without scientific value. Both the “prediction” and the “data” have to be replicable.

  17. See: S. Fred Singer Uncertainty in Climate Modeling, SEPP Science Editorial #2011-1, (Jan 1, 2011)

    1) Uncertainties of the scenarios that determine the emission of greenhouse gases, principally economic growth, which is closely tied to the use of energy. . . . The IPCC . . .calculates global temperatures for the year 2100 with an uncertainty spread of an order of magnitude [IPCC 2007, Fig. SPM.5, p.14].
    2) Structural uncertainties. . . .uncertainties in climate forcing, both anthropogenic and natural; in climate feedbacks; and in the hundred or so parameters that go into constructing a model, mainly concerned with clouds. . . .
    The uncertainties listed for aerosols are quite large, particularly for the indirect effects of aerosols in providing condensation centers for cloud formation. [IPCC-AR4 2007, Fig. TS-5, p.32]. . . .
    Parameterization is a vexing issue for climate modelers. James Murphy [Nature 2004] lists some 100 or more parameters that must be chosen, using the modelers’ “best judgment.” Varying just six of these parameters related to clouds can change the climate sensitivity from 1.5 up to 11.5 degC [Stainforth et al 2005]. . . .
    the feedbacks (from WV and from clouds) may actually be negative rather than positive (as assumed in all climate models). This possibility follows from the analyses of satellite data [by Lindzen and Choi 2010 and by Spencer and Braswell 2010].

    3) Chaotic Uncertainty. . . . One can show [Singer and Monckton 2011] that taking the mean of an ensemble of more than 10 runs leads to an asymptotic value for the trend. . . . of the 22 models in the IPCC compilation of “20 CEN” [an IPCC term for a group of climate models] there are 5 single run models, 5 two-run models, and only 7 models with four or more runs.

    Testing requirements:
    List ALL 100 parameters for every run.
    Report ALL runs. Document the runs for every result.
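
    Singer’s ensemble-size point above is easy to demonstrate with a toy Monte Carlo experiment (the noise model is an illustrative assumption, not an emulation of any GCM): the spread of the ensemble-mean trend shrinks roughly as 1/sqrt(N), so single-run and two-run “ensembles” are far noisier than means over ten or more runs.

    import numpy as np

    rng = np.random.default_rng(0)
    true_trend = 0.2  # deg C per decade, an arbitrary illustrative value
    run_noise = 0.15  # assumed run-to-run spread from chaotic internal variability

    for n_runs in (1, 2, 4, 10, 40):
        # 10,000 synthetic "models", each reporting the mean of n_runs noisy trends
        means = true_trend + run_noise * rng.standard_normal((10_000, n_runs)).mean(axis=1)
        print(f"{n_runs:3d} runs: spread of ensemble means = {means.std():.4f} "
              f"(1/sqrt(N) prediction: {run_noise / np.sqrt(n_runs):.4f})")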

  18. V&V is not the same thing as an “engineering quality” exposition of how doubled CO2 gets to 3 deg C or so and how that causes problems.

    • According to the strict engineering definition, for a simulation product used to make policy decisions you can’t validate a model until the actual product it is meant to simulate has been built. The problem is that we are in the middle of a grand experiment to figure out what global warming will do. Therefore it is not possible to validate anything.
      I have a feeling that you don’t have any engineering experience or haven’t read the policy directives.

      • “what global warming will do”

        WHT, do you mean AGW? Mankind has already been living with Global Warming and Cooling since I surely don’t know when. You need to be more specific or quantitative or something. It seems like you are presenting sloganeering that’s in search of an issue.

        Andrew

      • It’s not sloganeering if the government gets involved in making policy decisions.

        Validation involves the comparison of the M&S behavior and results to the data obtained from another credible domain. The credible domain is either believed to be the real-world, has been proven to closely approximate the real world, or is from a source that is recognized as expert on the relevant characteristics of the real world. The standard of quality that the M&S is expected to meet is a part of this identification process.

        This kind of definition is easy to find, and you can consider it either gobbledygook or it says that we need to have a proxy that closely approximates the entire earth to be able to validate the model. Or it says we need an expert to validate a model, so if the expert wants to step up, please do.

        We certainly can’t use the earth itself because the future data has not played out yet. If you think the future is what happened in the past, then I suggest you can volunteer this as a model that the government can use.

      • WHT,
        Governments are the bestest at sloganeering.
        Just like your problem with peak oil, you seem to miss out on a lot of reality.

      • Just like your problem with peak oil, you seem to miss out on a lot of reality.

        Yea, just like that beat-down I gave Norman upthread. He shouldn’t have let on that he was a geophysicist :) :)

      • WEb –
        I have a feeling that you don’t have any engineering experience or haven’t read the policy directives.

        Well, I do, Web.

        And validation of the models is absolutely possible. It just hasn’t been done – for reasons I won’t expound on right now.

        But, as Andrew said – the climate system was built a long, long time ago and it’s just waiting for us to figure out how it works. GCM’s “might” be a useful tool in that process if they were properly constructed – including V&V. But without V&V all you’ve got is a defective Tinkertoy.

        Remember – we’re not talking about validating the climate – only the GCM’s that simulate the climate. You’ve got the shoe on the wrong end. It doesn’t work well if you put it on your head. :-)

      • Well, I do, Web.

        And validation of the models is absolutely possible. It just hasn’t been done – for reasons I won’t expound on right now.

        A space shuttle simulator was validated once the real thing was built, launched, and data was collected, right?

        Was the space shuttle simulator validated before that time? If the answer is yes, then what happened to that validation after the shuttle was launched. Was it then super-duper-validated?

        Remember – we’re not talking about validating the climate – only the GCM’s that simulate the climate. You’ve got the shoe on the wrong end. It doesn’t work well if you put it on your head.

        You have yourself in a trick box. We are talking models of future climate here. This future climate involves parameter spaces (CO2 levels, rate of CO2 increase, etc) that haven’t happened yet and can’t be duplicated in a laboratory experiment, which is what some customers require for validation. This is the problem of not being able to state that whatever happens will be a stationary ergodic process. Sure, you can assert that something like this happened in some historical paleoclimate data, but I don’t think this will help with validation, because then you run into a problem of substantiating the paleoclimate data.

        But, as Andrew said – the climate system was built a long, long time ago and it’s just waiting for us to figure out how it works. GCM’s “might” be a useful tool in that process if they were properly constructed – including V&V. But without V&V all you’ve got is a defective Tinkertoy.

        Ok, then figure it out.

      • Web –
        I was finding errors in that “validated” Space Shuttle software long after the Shuttles were operational. If you want to talk about that, we’re gonna have to do it someplace else – this isn’t the place.

        But, as Andrew said – the climate system was built a long, long time ago and it’s just waiting for us to figure out how it works. GCM’s “might” be a useful tool in that process if they were properly constructed – including V&V. But without V&V all you’ve got is a defective Tinkertoy.

        Ok, then figure it out.

        That’s the purpose of the exercise. And the fact that you think future climate performance is unpredictable “may” be correct. BUT – the system is what it is – and if it’s not totally chaotic (or perhaps even if it is) we WILL at some point in the future figure it out. Your future comment is irrelevant. ALL models predict future performance of the system they model IF they’re properly constructed. GCM’s do NOT yet do that.

        In the meantime, the models cannot be believed without V&V. So the questions are – a) how to do that and b) how to get the modelers out of the way so it CAN be done.

      • That’s why I brought it up Jim. And I kind of figured that you would say that this is not the place to discuss the problems that you encountered. It looks like you won’t admit that you can’t validate without a “built product” in place. Sounds like your excuse is the typical management run-around. The problem is that I don’t work for you, and you can’t tell me that “this isn’t the place” for the discussion. Yet, of course it is, because NASA and agencies like that are where the definitions of V&V are institutionalized.

        That’s the purpose of the exercise. And the fact that you think future climate performance is unpredictable “may” be correct. BUT – the system is what it is – and if it’s not totally chaotic (or perhaps even if it is) we WILL at some point in the future figure it out. Your future comment is irrelevant. ALL models predict future performance of the system they model IF they’re properly constructed. GCM’s do NOT yet do that.

        I didn’t say anything about unpredictability, I was talking about not being able to compare against real-world data, where no data actually exists.

        Look, I have a model of CO2 rise given a fossil fuel forcing function that I have worked at for a little over a year now. It works very well for the data up to the current time, and I think it will continue to work in the future. It is also completely verified in the software sense because it takes about a dozen lines of code to do the convolution, and I can check that against analytical expressions perfectly well. Yet, I harbor no illusions that it will ever be validated, other than somebody deems it “valid” (the other definition of validation, which means that a customer or whoever I made it for — hypothetically speaking since this is a hobby for me — is willing to buy into it).

        It either works or it doesn’t, just like any theory needs experimental support. And until that experimental support keeps playing out, the validation will always have to wait.
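
        The software-sense verification described here, a numerical convolution checked against a closed-form answer, can be sketched as follows. The exponential forcing and impulse response are illustrative assumptions chosen because their convolution integral has an exact expression; this is not WHT’s actual model:

        import numpy as np

        a, b = 0.02, 0.1  # assumed emissions growth rate and sink decay rate (illustrative)
        dt = 0.01
        t = np.arange(0.0, 100.0, dt)

        forcing = np.exp(a * t)    # toy fossil-fuel forcing function
        response = np.exp(-b * t)  # toy impulse response of the carbon sink

        numeric = np.convolve(forcing, response)[: t.size] * dt  # discrete convolution
        analytic = (np.exp(a * t) - np.exp(-b * t)) / (a + b)    # exact convolution integral

        # Verification in the software sense: the code reproduces the known answer,
        # and the residual shrinks as dt is refined.
        print(np.max(np.abs(numeric - analytic)))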

        Yet, I harbor no illusions that it will ever be validated, other than somebody deems it “valid” (the other definition of validation, which means that a customer or whoever I made it for — hypothetically speaking since this is a hobby for me — is willing to buy into it).

        Indeed a constraint on various confidence assumptions, and why vigilance is necessary. In his prescient paper, Nalimov (1976) summarises succinctly.

        A few words must be said about another approach to simulating complex systems by computers, which does not refer to “mathematical spectroscopy.” I have in mind the grandiose program of simulating five ecosystems: the desert, coniferous and foliage forests, the tundra, and the prairie, carried out in the United States in 1969-1974. Expenses for the research of only the three latter systems exceeded 22 million dollars, 8.6 million of which were allotted directly for the simulation, synthesis, and control of the whole project; 700 researchers and postgraduates from 600 U.S. scientific institutions participated in the project; 500 papers were published by 1974, though the final report is not yet ready. Mathematical language was used in the project to give an immediate (not reduced) description of the observed phenomena. A lot of different models were used which were divided into blocks with an extremely great number of parameters (their total number reaching 1,000), and yet it was emphasized that the models described the system under study in an approximate and simplified manner. The researchers had to give up any experimental verification (or falsification) of models; instead they used “validification,” which means that a model is accepted if it satisfies the customer and is particularly favorably evaluated if bought by a firm. At present, all these activities are being evaluated. Mitchell et al. (1976) conducted a thorough analysis of the material and evaluated the simulation of the three ecosystems in an extremely unfavorable way. From a general methodological standpoint, the following feature is important to emphasize: the language of mathematics, for the first time, is allowed to unify different biological trends, and this has happened without a generally novel or profound understanding of ecology. Mathematics was used not to reduce complexity but to give a detailed, immediate description. This is a new tendency in science. But where will it lead? The time is not yet ripe for a final conclusion, but scepticism is quite in order.

      • WebHubTelescope
        See: Steve McIntyre Some Thoughts on Disclosure and Due Diligence in Climate Science Feb. 14, 2005 ClimateAudit.org

        I spent about 35 years in the mining and mineral exploration business. During the last 20 years of this, I worked in the micro-cap exploration business and have a great deal of practical experience in dealing with prospectus and securities issues from the company point of view. Concepts like audit trails, due diligence and full, true and plain disclosure become second nature when you work in such an environment.

        McIntyre’s observations highlight the severe lack of due diligence in climate science:

        An audit trail in this case is easily defined: the data in the form used by the authors and the computer scripts used to generate the results. . . .None of the major multiproxy studies have anything remotely like a complete due diligence packages and most have none at all. The author of one of the most quoted studies [Crowley and Lowery, 2000] told me that he has “mis-placed” his data. . . .
        In the case of the Mann et al [1998,1999] study, used for the IPCC’s “hockey stick” graph, Mann was initially unable to remember where the data was located, then provided inaccurate data, then provided a new version of the data which was inconsistent with previously published material, etc. . . .
        authors typically refuse to make their source code and data available for verification, even with a specific request. Even after inaccuracies in a major study had been proven, when we sought source code, the original journal (Nature) and the original funding agency (the U.S. National Science Foundation) refused to intervene. . . .
        In 2004, I was asked by a journal (Climatic Change) to peer review an article. I asked to see the source code and supporting calculations. The editor said that no one had ever asked for such things in 28 years of his editing the journal. He refused to ask for source code; the author refused to provide supporting calculations. . . .
        Ross McKitrick and I have demonstrated that there were serious calculation errors in the most famous IPCC graph–the 1000 year climate hockey stick. In this case, the methodology had been incorrectly described in the journal publication. . . .
        it was nobody’s job to check if the IPCC’s main piece of evidence was right. The inattentiveness of IPCC to verification is exacerbated by the lack of independence between authors with strong vested interests in previously published intellectual positions and IPCC section authors. . . .
        For someone used to processes where prospectuses require qualifying reports from independent geologists, the lack of independence is simply breathtaking and a recipe for problems, regardless of the reasons initially prompting this strange arrangement. It seems to me that prospectus-like disclosure must become the standard in climate science, certainly for documents like IPCC reports . . .
        In business, “full, true and plain disclosure” is a control on stock promoters.. . .There is no such standard in climate science. . . .
        now that huge public policy decisions are based, at least in part, on such studies, sophisticated procedural controls need to be developed and imposed. Climate scientists cannot expect to be the beneficiaries of public money and to influence public policy without also accepting the responsibility of providing much more adequate disclosure and due diligence.

        When will Climate Scientists implement professional due diligence, validation and verification corresponding to their trillion dollar policy recommendations?
        Should not at least 30% of budgets be applied to such efforts?

      • When it comes to McIntyre, give me a break. The mining and mineral exploration business has about the same level of validation procedures that a venture capitalist would demand. Kriging as used in exploration is the application of a stationary ergodic process to map out what might be underground.

        If McIntyre is such an expert at exploration issues, why does he not discuss issues that IMO are arguably much more important than climate science, that of non-renewable energy resource limitations? I am sorry to change the subject, but this has always bothered me about McIntyre. The amount of audit trails in the fossil fuel industry is abysmal. Sure you might be able to get some data, but it costs thousands and thousands of dollars in yearly subscription fees to get anything useful, and even this is probably corrupted by corporate book-keeping. Yet McIntyre has the gall to call for audit trails on some academics? What a phony.

        I know all this because McIntyre always reprimands me when I try to contest him on his blog. He doesn’t want to talk physics and he doesn’t want to discuss forcing functions, as those he deems as inappropriate. Fortunately, ClimateEtc has the “Etc” which allows us to make some real progress in our understanding.

      • WebHub’s in a large crowd who all wish Steve were doing something else.
        ==================

      • If Mohammad’s such a mighty warrior, why isn’t he fighting for the Pope? Non-sequitur, much?

      • No problem, I do both oil depletion modeling and climate science modeling.
        Do you lift a finger much?

      • Web –
        If McIntyre is such an expert at exploration issues, why does he not discuss issues that IMO are arguably much more important than climate science, that of non-renewable energy resource limitations? I am sorry to change the subject, but this has always bothered me about McIntyre.

        1) Because he does what HE wants, not what would make you happy. And he doesn’t need your permission.

        2) “non-renewable energy resource limitations” are your own personal crusade. And eventually they may become important. But you and I are likely to be long gone by the time that happens. I suspect he knows that far better than you.

        you might be able to get some data, but it costs thousands and thousands of dollars in yearly subscription fees to get anything useful

        And this is different from climate science HOW?

        Yet McIntyre has the gall to call for audit trails on some academics

        He’s retired – this is his hobby. Why do you have a problem with HIS hobby? Would you want me to dictate what you do with your time after you retire? The only one who can do that to me without losing body parts is my wife.

        I know all this because McIntyre always reprimands me when I try to contest him on his blog. He doesn’t want to talk physics and he doesn’t want to discuss forcing functions, as those he deems as inappropriate.

        1) It’s HIS blog. He doesn’t have to put up with your whining. Nor does he have to allow comments that are not appropriate to the purpose of his blog. Why do you think you should be “special”?

        2) He’s not a physicist – why should he discuss physics just because YOU want to do so?

      • “I know all this because McIntyre always reprimands me when I try to contest him on his blog. He doesn’t want to talk physics and he doesn’t want to discuss forcing functions, as those he deems as inappropriate. Fortunately, ClimateEtc has the “Etc” which allows us to make some real progress in our understanding.”
        ——————
        He does this because he doesn’t pretend to be an expert in every area like most commenters do, only areas where he has working expertise. He is narrowly focused on the statistical methods used by authors to support their conclusions, period. Refreshing.

      • “non-renewable energy resource limitations” are your own personal crusade. An eventually they may become important. But you and I are likely to be long gone by the time that happens.

        IOW, Jim Owen’s motto is to get it while the getting is good, and let future generations sort it out.

    • Steve –
      Exactly. But if the doubled CO2 -> 3 degC process is uncertain or unknown or badly handled, then V&V isn’t the major problem – and, in fact, is useful only to corroborate that the GCM construction is problematic. Which looks a lot to me like one of the present problems with the GCM’s. But YMMV.

  19. Thoughts on Verification & Validation

    Implement standards for scientific forecasting
    GLOBAL WARMING: FORECASTS BY SCIENTISTS VERSUS SCIENTIFIC FORECASTS Kesten C. Green and J. Scott Armstrong, ENERGY & ENVIRONMENT VOLUME 18 No. 7+8 2007

    Validate water transport
    Trenberth, Kevin E., John T. Fasullo, Jessica Mackaro, 2011: Atmospheric Moisture Transports from Ocean to Land and Global Energy Flows in Reanalyses. J. Climate, 24, 4907–4924.
    doi: 10.1175/2011JCLI4171.1

    Using the model-based P and E, the time- and area-average E–P for the oceans, P–E for land, and the moisture transport from ocean to land should all be identical but are not close in most reanalyses, and often differ significantly from observational estimates of the surface return flow based on net river discharge into the oceans. . . . Precipitation from reanalyses that assimilate moisture from satellite observations exhibits large changes identified with the changes in the observing system, as new and improved temperature and water vapor channels are assimilated and, while P improves after about 2002, E–P does not. . . . Results are consistent with the view that recycling of moisture is too large in most models and the lifetime of moisture is too short.

    Report ALL the results from EVERY run.
    S. Fred Singer Uncertainty in Climate Modeling, SEPP Science Editorial #2011-1, (Jan 1, 2011). e.g. see Spencer’s 140 runs: http://www.drroyspencer.com/2011/09/the-rest-of-the-cherries-140-decades-of-climate-models-vs-observations/

    Verify atmospheric mass closure (a sketch of such a check follows this list). See:
    The Mass of the Atmosphere: A Constraint on Global Analyses, J. Climate, Vol. 18, 864-875

    The dry air mass should be virtually constant to the order of 0.01-hPa surface pressure and is estimated to be 5.1352 ± 0.0003 × 10^18 kg, corresponding to a surface pressure of 983.05 hPa.

    Check precipitable water – Miskolczi finds 2.61 prcm in TIGR versus 1.26 prcm in USST-76.
    See: presentation at the European Geosciences Union General Assembly, Vienna, 7 April 2011 Slide 3/28

    Verify Energy Closure
    Especially check latent heat flows and water flows:
    Trenberth, Kevin E., John T. Fasullo, Jessica Mackaro, 2011: Atmospheric Moisture Transports from Ocean to Land and Global Energy Flows in Reanalyses. J. Climate, 24, 4907–4924. doi: 10.1175/2011JCLI4171.1 (excerpt quoted above under water transport).

    Validate atmospheric lapse rate against data, thermodynamics
    K. Trenberth et al. Accuracy of Atmospheric Energy Budgets from Analyses

    The standard 17-level reanalysis pressure level archive does not adequately resolve the atmosphere, and we propose a new set of 30 pressure levels that has 25-mb vertical resolution below 700 mb and 50-mb vertical resolution in the rest of the troposphere. The diagnostics reveal major problems in the NCEP reanalyses in the stratosphere that are inherent in the model formulation, making them unsuitable for quantitative use for energetics in anything other than model coordinates.

    Robert H. Essenhigh, “Prediction of the Standard Atmosphere Profiles . . .,” Energy & Fuels 2006, 20, 1057-1067; and Sreekanth Kolan, “Study of energy balance between lower and upper atmosphere.”

    What is the status of implementing these issues?
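
    For the mass-closure item above, a verification test could be as simple as the following sketch (the array layout, the field names, and the use of Trenberth’s 0.01 hPa figure as a tolerance are assumptions about how a model exposes its state):

    import numpy as np

    G = 9.80665  # standard gravity, m/s^2

    def total_dry_air_mass(ps_dry, cell_area):
        """Global dry-air mass implied by surface pressure: M = sum(p_s * A) / g."""
        return np.sum(ps_dry * cell_area) / G

    def check_mass_closure(ps_dry_series, cell_area, tol_hpa=0.01):
        """Fail if dry-air mass drifts by more than tol_hpa of equivalent surface pressure."""
        masses = np.array([total_dry_air_mass(ps, cell_area) for ps in ps_dry_series])
        drift_hpa = (masses.max() - masses.min()) * G / np.sum(cell_area) / 100.0  # Pa -> hPa
        assert drift_hpa <= tol_hpa, f"dry-air mass drifts by {drift_hpa:.4f} hPa"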

    • The models can internally account for all these quantities perfectly, but unfortunately observations can’t, so which to believe? This does open up the problem of the data with which the V&V is to be done, which is far from perfect in its budget closure as these papers demonstrate well.

      • Jim D
        “The models can . . ”
        But do they? Are any of these checks actually being done?
        Furthermore, see Green and Armstrong for systemic failures on scientific forecasting in climate science.

      • A model would have to conserve mass, water, energy by its construction, and the equations it uses can account for all the flows of these quantities. Nothing is lost, missing or hidden like in the observations.

      • A model would have to conserve mass, water, energy by its construction,

        Are you gonna tell me that you KNOW ALL the details of the carbon cycle? Or of the ocean circulation?

        and the equations it uses can account for all the flows of these quantities.

        I started life as an aero engineer. Aero/hydrodynamics was part of that – and I was damn good with it. And I don’t believe that at all.

        Nothing is lost, missing or hidden like in the observations.

        Pull the other one, Jim – it’s got bells on it. Where computers are concerned, I don’t believe in fairy tales. And that’s what you’re trying to tell me.

        If the model doesn’t fit the observations, then it doesn’t fit the system – regardless of any other consideration. And in this case, the system is NEVER wrong.

      • Some models have a carbon cycle model, so they can represent where the carbon goes and comes from. I don’t understand why you think things in models can’t be accounted for numerically to machine precision when all the equations are given. Observations, on the other hand, are a different story. We don’t have a full 4d picture of the atmosphere, the way the model gives. We only have samples. It is unfortunate, but that is one of the limitations in verifying models, and in trying to improve them.

      • We don’t have a full 4d picture of the atmosphere, the way the model gives.

        That’s part of your answer.

        1. You CANNOT model something for which you cannot describe the process.
        2. I don’t believe you have ALL the N-S equation solutions. So you CAN’T model them.
        3. The BEST you can do is to homogenize the processes – and that’s not reality. Are you really gonna tell me that the processes in the Arctic are the same as those in the Tropics? That those in the desert are the same as those in the mountains? I’ve spent too much time in those places living with the reality to believe that.

        Someday – it may well happen. I believe that. But WE are not there yet.

        And the recent cloud papers are something you should pay more attention to.

      • Jim D
        Even if the physics is known, that does NOT mean that the models have accurately incorporated energy and mass conservation. In the teething stage of CFD model development, lack of mass or energy conservation was a key discovery in correcting some of the models. I have not heard if such basic tests have been run – especially in light of the systemic errors in the data.
        Then add Jim Owen’s observations.
        Till proven otherwise, engineers must assume the models have NOT been verified or validated, and can NOT be relied on.

      • The models can internally account for all these quantities perfectly, but unfortunately observations can’t, so which to believe?

        If you have to ask that question, then your scientific method is defective.
        If you seriously believe that models can internally account for all these quantities perfectly then your knowledge of computers is defective.

      • Are you talking about numerical round-off error, which is orders of magnitude better closure than the observations currently have? The observation error is in about the 3rd or 4th significant digit, while round-off is in about the 7th or 8th on 32-bit machines.

      • Jim D –
        Are you talking about numerical round-off error

        LOL!!!

        Hell no. You’ve obviously never dealt with massive software systems – or with machine language systems – or even minor software driven science instruments.

        Example 1 – In 1985 I was doing IV&V on the Shuttle/HST Interface software and found errors in the Shuttle software. You DO realize, of course, that the Shuttles had been flying for 4 years at that time? And that the Shuttle software had been Validated, Verified and Certified?

        Example 2 – In 1991 the UARS launch (release from the Shuttle) should have been completed by 2100 EDT. The spacecraft software had been Validated, Verified and Certified. Launch was NOT completed until 0430 EDT the next morning because of “glitches” in the spacecraft command system software – that were only discovered after 2 years of spacecraft testing and were only critical under launch conditions.

        Both of those examples were on Validated, Verified, Certified and thoroughly tested software. And I have several dozen other similar stories, ALL of which occurred on software/hardware systems that were far more reliable and better tested than ANY of the GCM’s you think operate perfectly. And I flat out don’t believe that.

        So anyone who tells me that GCM’s don’t need IV&V is just saying that they don’t know what they’re talking about or that they’re hiding something or that they’re just a damn fool.

        Numerical round off error, BTW, is larger than you seem to realize. And cumulative – which observation error is not.
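
        The round-off disagreement here is easy to probe. Single precision carries roughly 7 significant decimal digits, and a naive running sum lets rounding errors accumulate; the sketch below is a generic Python illustration, not a claim about any particular GCM’s numerics:

        import numpy as np

        n = 1_000_000
        increments = np.full(n, np.float32(0.1))  # a million identical 0.1 steps

        naive = np.float32(0.0)
        for v in increments:                 # naive single-precision accumulation
            naive = np.float32(naive + v)

        exact = np.sum(increments.astype(np.float64))  # double-precision reference

        print(f"naive float32 running sum: {naive:.2f}")  # drifts visibly away from 100000
        print(f"float64 reference:         {exact:.2f}")  # ~100000.0

        Compensated (Kahan) summation or a double-precision accumulator is the usual fix; the point is that the error is cumulative and scheme-dependent, not a fixed 7th-digit fuzz.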

      • Hear ye! Hear ye!
        Well spoken from a world class expert.

        Now how do we get the politicians representing “We the People” to require / mandate / insist on / and fund the IV&V essential to give objective reliable information for impartial policy evaluation.

      • You DO realize, of course, that the Shuttles had been flying for 4 years at that time? And that the Shuttle software had been Validated, Verified and Certified?

        Just like I predicted upthread, Jim. You didn’t have to tell anyone this. A valid model on an unfinished system will become invalid the minute the final product is let loose in the wild.

        As I said, a space shuttle simulation and other products like that can’t be validated until the actual systems are produced. So what’s the point of validation when you realize that it is just a bureaucratic step in the cycle?

        The climate right now is not the final climate and so definitions of validation are out the window, unless you can find a stand-in for the earth and its atmosphere, which I kind of doubt exists.

      • Web –
        Just like I predicted upthread, Jim. You didn’t have to tell anyone this.

        You REALLY don’t understand models or modeling, do you?

        How many major systems have you worked with, Web? How many models? Have you EVER done IV&V?

        Those examples are NOT reason to NOT do IV&V. They’re the reason you do BETTER IV&V. They’re the reason you don’t allow your spec writers and coders to do your V&V. As is done in climate science.

        A valid model on an unfinished system will become invalid the minute the final product is let loose in the wild.

        No – a valid model on an unfinished system allows prediction of what the system will do when it is, as you say, “let loose in the wild.” I used that kind of model numerous times. I used them to run large scale system tests on spacecraft like UARS, Landsat and – wait for it – Hubble Space Telescope months or even years before those spacecraft were launched. And I used those same models to investigate anomalies and equipment failures after the spacecraft were launched and operational.

        In climate science the purpose is to predict future climate – and, in time, that “may” become possible. I believe it will. If you are right, then we’ve wasted $100+ Bn on the GCM’s. But I don’t believe you’re right, in large part because you apparently know little about the subject. Not being insulting – just telling it like I see it.

      • Jim Owen, your description of validation is not yet clear. For a climate model to be validated, does it have to prove it predicts future climate? If so, how would you go about that task? What spec defines success? How about if we wait 20 years and find out the current generation of models were right, would they be validated? The climate system is not self-contained like an engineering system, so does that make a difference to how you would define specs and validation, or would it be the same anyway? Lots of talk about specs and validation, but still nothing specific is being said here.

      • Also, Jim Owen, I was not saying GCMs operate perfectly, but that their budgets of mass, water and energy for sure are better closed than the budget we have for the real observed system. That is, it can all be accounted for within the model equations. This is not saying that models are perfect representations of reality, and they never can be, having far fewer degrees of freedom. Perhaps in engineering you can match the available degrees of freedom on a small computer, but for the climate you can’t by a long shot.

      • The space shuttle is a sensitive topic for obvious reasons.

        One thought at the time was that the launching bending moment was beyond spec, and the o-rings were a red-herring.

        You can see it right in the official presidential commission report from 1986:
        http://history.nasa.gov/rogersrep/v5p1351b.htm

        BASE MOMENT: DESIGN VALUE 347,000,000 IN-LBS. 51-L RIGHT SRB 291,000,000 IN-LBS.

        SRB=solid rocket booster, of which there are TWO, a right and a left.

        Assuming identical torques for left and right:
        291,000,000 * 2 = 582,000,000 in-lbs >> 347,000,000 in-lbs

        Not within margin.

        It was just weird speculation that was going on at the time, could be a typo.

        Stuff fails. Shuttle goes into dynamic overshoot. Climate goes into dynamic overshoot. God loves you, deal with it.

      • Jim D –
        Jim Owen, your description of validation is not yet clear. For a climate model to be validated, does it have to prove it predicts future climate?

        No, Jim. It has to be proved that it will meet the spec AS WRITTEN. That’s not possible if there is no written spec.

        If so, how would you go about that task? What spec defines success?

        Validation is testing the system (GCM in this case) against the spec. One common error in the spec is non-specificity. In which case validation means testing the system to and beyond the limits the spec writer/modeler/whoever EVER imagined or intended. I’ve broken more than one system because the spec failed to define the limits that the system was designed for.

        Example – the HST science data interface was designed for the full realtime spacecraft data rate – 1 Mb. But the transfer rate between the Control Center and the Science Institute was not sufficient to handle that data rate. Bad interface specification – so – a major problem. And this wasn’t discovered until very close to the planned spacecraft launch date. Bad dog – no cookie.

        How about if we wait 20 years and find out the current generation of models were right, would they be validated?

        No, they’d be “verified”. See above for “validation”.

        The climate system is not self-contained like an engineering system, so does that make a difference to how you would define specs and validation, or would it be the same anyway?

        The spec is NOT on the climate – it’s on the GCM. Specifically WHAT do you want the GCM to do? Do you want to predict temp – or precipitation – or….whatever? Or everything? To what accuracy, with what uncertainty? Gotta make choices – and nail those choices to the wall (or at least on paper). I could do that for you – but it takes time – LOTS of time. And therefore – money. It took us 6 months to determine the spec for the ground C3 system for Landsat. GCM’s not so simple.

        The purpose of the spec is to define what results you want from the system, how you intend to build the system to achieve those results, what data input will be used, what data output is required, coding language, computer operating characteristics, interfaces, data processing choices/options – ALL the physical and processing details necessary to build, operate and evaluate the system and its outputs.

        And you CANNOT write a spec on any process that you CANNOT specifically define.

        Which is one place the present generation fails from the beginning. Example – AFAIK, nobody has specifically defined the effect of clouds. Or GCR’s. For example – have you ever hiked through desert? I have – and I’ll tell you that clouds have a HUGE effect on local temps on the ground. Local in this case meaning anything from a single cloud shadow to a 1000 square mile area (or more). HOW? WHY? And how do you define that time-varying effect so it can be included in your GCM?

        The climate itself is involved in the spec ONLY in terms of what data and processes are to be used, how (whether) they can be coded and what data is available/required.

        Lots of talk about specs and validation, but still nothing specific is being said here.

        There’s been LOTS of specific information here, Jim. For those who understand it. See ANYTHING written by Steve Mosher for example.

        One of the (your) problems is that you’re trying to relate the information to the climate. And that’s NOT where the problem is – it’s in your understanding of HOW to build a GCM to produce an accurate, reliable and believable predictive output. And that can only be accomplished by using ALL the necessary data, ALL the necessary processes AND an accountable, properly documented construction process. Do you REALLY believe that’s been done?

      • Jim Owen,
        The believers are truly painting themselves into a corner. But they do not seem to have a problem with tracking paint all over the place.

  20. I like Easterbrook’s parallel to agile software development. While the development process may be one of groping along towards a goal, in the end, you still need to demonstrate that the software can do what you want it to do. In model development the goal is also known. But for all of the current models, none are able to demonstrate that they can do what they were built to do. No one knows if they are 10% along the development cycle or 80% along the development cycle. And yet we expect governments to make trillion dollar decisions based on them. And we expect that the “climate emergency” debate is over based on them.

  21. We cannot scientifically model everything. We cannot model the future of evolution, scientific breakthroughs, economics, sociological trends, fashion or the climate.

    The universe is governed by causal behavior, but it is not deterministic. There are things that we can never know.

  22. What climate model development principally lacks to be Agile is well-explained in the first line of the wiki page, “collaboration between self-organizing, cross-functional teams.”

    As Dr. Curry points out,

    “However, climate models (particularly the NCAR Community Climate Models) are built as tools for community research, and these models are used for a wide variety of scientific research problems. Further, these climate models are used in the IPCC assessment process, which reports to the UNFCCC, and in national-level policy making as well. Greater accountability is needed not only because of the policy applications of these models, but also to support the range of scientific applications of these models by people that were not involved in building the models.”

    You can’t be Agile if you don’t include all the players.

    All you can be is sloppy, old-school hackers.

  23. I notice that V&V enthusiasts are keen to tell others how they should do it.

    I think leading by example would be more effective. Write a GCM, verify and validate it. Then we can compare it with the others and see which approach works best.

    • How would we tell which works best?

    • You are not big on the principle of independent oversight and adhering to professional standards are you?

      Defendant to judge in Stokesworld:

      ‘Yerhonner, the plaintiff claims that I was negligent in treating his child who subsequently died. But I am a very clever doctor and lots of my mates agree with me. The plaintiff has not even attended medical school, let alone nearly finish it like I did. So she is not qualified to make any judgement on this case. And neither are you’

      Judge:

      ‘Good point Mr Corpsemaker. Case dismissed’

      Happy?

      • No, the point in that analogy is that the child received the only treatment available. You can jail the defendant, but that’s not much use in the absence of an actually working alternative.

      • Great. You propose that we should give carte blanche to a few zealots who believe that there is some dire and dreadful disease afoot. And that killing the patient is OK because ‘there is no alternative’.

        The alternative is easy. Keep the zealots away from the patients. They do not have a serious disease, they do not need zealotry and they will not die. Simples.

    • That is not how IV&V works.

      The idea is precisely that you do not write, verify and validate by yourself.
      I = independent.
      That means, yes, somebody else gets to tell you that you failed. IV&V guys don’t write code. They test it. Two different creatures, and two different chains of command.

      • Fine. Create two teams.

      • You actually need three teams. The critical team is the requirements team. That is the team that actually works with the customer (policy makers) to define the use cases. I don’t think you understand what is involved in validation. Validation is conformity to a spec. It is not something that you can just “do” yourself. Part of the issue is that the process of development has not been driven by requirements. That is why you have 22 different models. A huge waste. So to do IV&V you need a buy-in from policy makers who have to explain what they intend to use the model for.

        In terms of actual models, there are a few that could be put through IV&V, so the notion that somehow one has to rewrite from scratch is also a gross misunderstanding of the process and its benefits. The chief beneficiaries are actually those who are writing GCM code today.

      • Steven stated that V and V needs to be done independently of the original code writers. Whilst I accept that he would know from his experience, I can’t see how this splitting of functions can work effectively in practice.

        Don’t climate scientists ever work in multidisciplinary environments? I ask this because setting up a separate team to handle V and V seems wasteful.

        I have worked on university IT projects during my student days and while the systems design (based on the functional specs provided) and coding is usually done by IT professionals in the team, the V and V process is usually carried out by the “subject matter experts” in the team, who drew up the functional specs in the first place.

      • V&V from anything other than an independent team is open to mischief – like only reporting certain runs, etc. Mosher is completely correct. Independent verification is the only acceptable type.

      • Peter
        The challenge is dealing with deception and wickedness.
        Medical research has found it essential to do double or even triple blind clinical experimentation to get objective results without researcher bias.

        How much more independent rigorous V&V is essential in climate science where we have very strong political advocacy with $1900 trillion in public funding being advocated by the EPA.

        The whole “enterprise” needs to be reworked from beginning to end with the public policy requirements put front and center and rigorous methods established to obtain OBJECTIVE IMPARTIAL results.
        Currently we have the fox guarding the hen house.

      • You just put your finger on a big problem. IV&V is inherently an adversarial system. It doesn’t work in a consensus environment. To do it correctly, you have to have someone developing the requirements who’s not a member of the team. The “team” doesn’t do business like that.

      • Peter Davies –
        I cant see how this splitting of functions can work effectively in practice.

        I worked for NASA – I wrote specs – for spacecraft, for control centers, for software – and other things better not mentioned. I did not code the software. I DID do V&V – on the systems that I wrote the specs for, which then also underwent IV&V from different and separate organizations. I also did IV&V on systems for which I neither wrote the specs nor coded.

        Let’s take one example that comes to mind – UARS (yeah – the one that just splashed down in Never-never Land). I wrote several of the instrument specs (but not all – there were 10 of them), I wrote the Control Center spec, I wrote the TDRSS Interface spec – and I did IV&V on the spacecraft planning and scheduling software. Note – I did NOT do IV&V on anything for which I’d written the spec. Why? Because, as someone here said, IV&V is an adversarial process. The purpose of IV&V is to break software and kill systems, to find the limits of viability, to determine the robustness of the system. One does NOT do a thorough job of “breaking and killing” on systems that one has created.

        Note – for ALL the systems for which I wrote specs I had the terrifying experience of having to operate those systems. And live with (or correct) any problems that I’d written into the system. So I learned very early to write the specs properly and thoroughly. Most spec writers don’t have that experience – and so are not as thorough or diligent as they should be. Which is why many specs are so downright sloppy.

        Steven is exactly right – IV&V has to be a tri-partite system. If you write the spec, you cannot code or test the system properly. If you code the system, you may operate it, but you CANNOT test it properly. And if you’re to test it, you CANNOT have been involved in the spec writing – or the coding.

        All of this has been known for nearly 100 years – it’s one of the basic tenets of Human Engineering – and any good secretary or Admin Asst learns it on their first job. The axiom is that no matter how intelligent, smart, good, clever, whatever you are – you CANNOT find all of your own errors. A simple example is – the number of typos that you’ll see in comments on this blog. Probably including this one. :-)

      • Thank you Jim

        Spec writers have all the power. They are the only ones who make mistakes. hehe.

      • Thanks Steven, Ron, David, P.E. and Jim. I understand where you are coming from now and agree that human nature being what it is, there is a clear need to have an adversarial processes in place, especially in a commercial environment.

        University and academia are not real life.

      • The customer requirements are to be found here:
        UNFCCC. 1992. United Nations Framework Convention On Climate Change. United Nations. http://unfccc.int/resource/docs/convkp/conveng.pdf

        Article 2: OBJECTIVE
        “The ultimate objective of this Convention and any related legal instruments that the Conference of the Parties may adopt is to achieve, in accordance with the relevant provisions of the Convention, stabilization of greenhouse gas concentrations in the atmosphere at a level that would prevent dangerous anthropogenic interference with the climate system. Such a level should be achieved within a time frame sufficient to allow ecosystems to adapt naturally to climate change, to ensure that food production is not threatened and to enable economic development to proceed in a sustainable manner.” (emphasis added)

      • Steven and Dixie.

        Thanks for the link to the 1992 United Nations Framework Convention On Climate Change (http://unfccc.int/resource/docs/convkp/conveng.pdf) and the objective statement.

        Earlier in the document this was stated:

        “….Recognizing that steps required to understand and address climate change will be environmentally, socially and economically most effective if they are based on relevant scientific, technical and economic considerations and continually re-evaluated in the light of new findings in these areas……………”

        So who and what is involved in the design review (“re-evaluated”)?

    • Nick: You fail to realize that the onus is on the side of the climate orthodox to write a GCM, verify and validate it, if the rest of us are to pay vast sums of money and change our lives to mitigate the bad news from such a GCM.

      Easterbrook concedes that the “the requirements and the set of techniques needed to implement [GCMs] are unknown (and unknowable) up-front.”

      That ought to shock anyone paying attention down to their toes.

      I appreciate Easterbrook’s honesty, but if climate science is in such a primitive state that modelers are basically hacking around for good models without being able to verify and validate them, then the rest of us can say, “Keep up the good work and get back to us when you really know what you are modeling.”

      • Huxley, the difference is that many people see that there may be a problem for them (which GCMs didn’t create) and they would actually like to know more about it. And to say, go away and don’t come back until you have written a code that jumps through a set of V&V hoops, is not helpful. Offering to write one, or fund the writing of one, might be.

      • Again, there are 22 GCMs that get used by the IPCC. As you know there is a wide disparity in their accuracy. Writing yet another GCM from scratch is a huge waste of effort; however, one could reprogram the funds from the worst 11 to have them build a best of breed. So the money is there. No need for new funds. Last time I did the analysis it was quite feasible and actually pretty darn cheap. The process is straightforward, so there is no need to ask for more money or get new people. It’s simply a change of priorities.
        The new priority for this team would be.

        1. specifications up front
        2. use best of breed components from other models ( that is already being suggested by others and actually being implemented)
        3. Use NASA IV & V facility for example.

        When you look at the problem like a software manager instead of a debater and actually try to solve the problem, it turns out to be not that hard. When you try to protect your turf, then everything is hard.
        In the end since the software may impact our lives we do get to voice an opinion and offer suggestions on how to improve the process. When my car mechanic fixes my brakes, I do get to ask him to give me the old parts after he replaces them. He may reply “fix your own damn brakes” and I would have to remind him of the regulations. When my congressman writes a bill taxing me, I do get to explain my displeasure with him. He can say “write your own damn bill” but since he needs my vote I don’t think he will respond that way. When my doctor leaves a scalpel in my gut after surgery, I do get to complain and have him fix the problem. He might respond “fix your own damn cancer” but I don’t think so.

      • Offering to write one, or fund the writing of one, might be.

        The level of disingenuous-ness whenever this subject is brought up always reaches new levels, previously thought to be unattainable.

      • Nick
        See Spencer’s graph of 14 models with 10 runs each.
        Their mean of 0.8 is only 28% of the data of 2.8 at 4 months lag. The deviations run from -1.6 to +2.5. Yet that provides > 90% likelihood?
        Let’s get serious.

      • huxley

        you too do not understand IV&V. It is not up to climate science to do the validation. That is not how it works. It is also not up to them to decide how good a model has to be to be of use to a policy maker.

      • I agree. However there surely has to be some step between a climate modeler presenting a model and the policy makers using it to make decisions.

        Once the models are being used outside of pure research (i.e. for policy decisions), then surely IV&V is required.

      • Steven M: I said the “climate orthodox” was responsible for validation and verification. Ideally that would be done by an independent body, as you say. However, the responsibility for ensuring the GCMs are verified and validated lies with the climate orthodox who have the big budgets and motivation, not skeptics and certainly not those posting at Climate Etc, and that was my point to Nick S.

        I’d also hope that climate modelers would design and code with some consideration of how their work could be verified and validated, but by Easterbrook’s testimony I now realize that they are far away from that.

      • I agree. But it is up to climate science to demand a spec be produced if they really desired to take on IV&V. There is a decided lack of enthusiasm on their part for wanting to take on this process. Status quo is just fine with them.

        The only way it will be forced on them is if their model results are not taken seriously until they do the hard work. With the IPCC process this is never likely to happen. The climate modeler’s customer, the IPCC, is quite happy with the results. That is what needs to change.

        We are effectively asking them to prove their models aren’t working very well, at least that is the result I would anticipate. Of course the first step to solving a problem is admitting you have one.

    • Nick Stokes

      With respect, I’m not a V&V enthusiast. I’m a former V&V professional. I recognize in the mix of comments that some are chock full o’ nonsense about V&V, however some seem on point and ought be recognized for their value.

      V&V professionals generally have absolutely zero interest in telling people how to write their software, how to do their jobs, or what the output software under V&V means to the experts.

      IV&V is a distinct discipline that generally ends up not only boosting the credibility and reliability of the software under V&V, but also reducing iterations in development, detecting methodological weaknesses early to reduce propagation of poorer methods, reducing cost and finding efficiencies.

      Also, IV&V provides one thing for software that climatology certainly could use more of. By certifying that the software performs within the bounds of its design purpose, it clearly draws a border around the software that tells everyone what expectations are out of bounds.

      For instance, the argument that because the models don’t predict weather therefore the AGW hypothesis is wrong would have fallen apart if there had been a sufficient IV&V in place. At least, it would have had that effect on real V&V enthusiasts, some of whom are strangely susceptible to faulty logic as in the case of this spurious argument.

From my point of view, the current and past models have nothing particularly wrong with them for the narrow purposes their authors appear to have originally intended. However, the world then and the world now are changed; it’s good to recognize the change, and it would be a sign of institutional maturity for climate modellers to employ methods adequate to the uses to which their models will be put by interested stakeholders, which is a considerable expansion from the original parameters.

      • Bart,
I’m a former CFD professional, and yes, I’m not an enthusiast for V&V in that context, mainly because I don’t see codes written that way actually working (maybe I’m out of date), and I have seen at least one ambitious effort fail. The reason is that, for the purpose, it’s a misallocation of effort.

        Now maybe someone has a purpose in mind for which it is essential. Fine. Let them implement it. And leave the current modellers to do what has worked for them.

      • Surely you mean ‘not worked’.
        =============

• Nick, you are very much behind on the present state of IV&V for CFD software. Professional societies require demonstration of the verification of CFD software that is the basis of a submitted paper. See the following books:

        Pat Roache, a pioneer and leading authority, has written and revised a book on the general concept of V&V for scientific and engineering software. The First Edition, Verification and Validation in Computational Science and Engineering, is available for about half the cost of the Second Edition, Fundamentals of Verification and Validation. And recently Oberkampf and Roy, also among the pioneers, have written Verification and Validation in Scientific Computing.

        All viable commercial CFD codes have been subjected to IV&V. Check the Web site of your favorite.

        Efforts to move to Certification of CFD, and software in other fields, have been started. Certification in the sense that many engineered equipment and systems are certified.

      • Dan,
That’s not my belief. I’ve published reasonably recently, and I’ve never been asked by a journal to produce such a demonstration. Nor am I aware of colleagues currently publishing having that experience. Can you point to a journal stating such a requirement?

        I checked ANSYS Fluent and did not see any IV&V statement.

      • Nick,

        “leave the current modellers to do what has worked for them.”

        We know that the status quo is working for the current modellers. They get funded, their output is being used by politicians to demand massive changes in society, and they are loving it.

        We’re concerned about how what they are doing has NOT worked for us — the people who bear the burden of their mistakes.

      • It’s worked for the modelers alright; it’s worked to get them into a Hell of a mess.
        =============

      • Three years ago @ DotEarth I said that the modelers were working to keep their toys on circular tracks on the ceiling, but I was just guessing then.
        ==========

• Not only are CFD codes put through IV&V, the codes can be accredited or certified for particular uses.

        First product I checked

        “Many applications require the use of software products that have traceable and well-established quality assurance processes. Because of roots in the nuclear industry, ANSYS has a deep and longstanding commitment to quality. The organization was the first engineering simulation software to become ISO 9001 certified. For over 40 years, the company has supported and invested in processes encompassing all aspects of software development including developing requirements, writing software, testing, documentation and more. The most recent evidence of this is that ANSYS fluid dynamics (and structural mechanics) software meets the requirements of the international Quality System Standard ISO 9001:2009 and the applicable requirements of the United States Nuclear Regulatory Commission (NRC) as adopted by American Society of Mechanical Engineers (ASME) and Nuclear Quality Assurance (NQA) NQA-1 standards. Because of the commitment to quality and related quality assurance services, customers use the ANSYS fluid dynamics suite with full confidence.”

      • I don’t hear them saying IV&V was performed on the code. ANSYS didn’t write Fluent. And the people who did sure weren’t writing to specs.

I’ll vote for ISO 14064 certification for GCMs.

      • steven mosher

Nick. You misunderstand the process of IV&V, especially when it comes to legacy code. Legacy code such as Fluent may not have been written with specs. Such is often the case with research code. But to bring that code base into compliance with standards, the whole documentation package is created. You might want to examine the procedures required to get ISO certs. It’s those principles we espouse.

ISO 14064 is not even applicable to GCMs. Unless you paid the money and downloaded the standard, I’ll suggest there is some arm-waving going on.

    • Nick,
That argument is totally lame.
What climatologists should have done years and years ago is dip into some of the largesse poured out on their projects and form coalitions with good V&V and stats people to provide hard-nosed reviews of their work.
Instead, we are where we are, with the AGW community fighting a strange rear-guard fight to avoid accountability at nearly every turn.
It will not succeed, long term.
Why not embrace it and do the disclosures expected of any group demanding huge sums of money and big changes in policy?

• …and here Hunter makes the one salient point. Discussion of the details, methods, and merits of V&V is noise.

When teams were put in place to create GCMs, more attention should have been paid to how their quality would be measured. Period.

      • Eric,
        Thanks.
        I am glad at least you get it.
Unless and until Climate Science can make its major tools pass V&V, it is worthless for policy making. Frankly, Climate Science should be reviewed with an eye towards justification of the levels of funding it receives until the question can be resolved.
We need V&V of the programs, and full disclosure of the agendas and relationships between the NGOs, CS opinion makers, government agencies and funding sources.
We need the data reviewed carefully.
And, at long last, we need an actual critical review of climategate and the obvious whitewashes regarding climategate.
Until these issues are resolved, Climate Science is not playing a role very different from that of Harold Hill in The Music Man.

      • Hunter,
        You don’t know the history. The best known GCMs were developed in the 70’s and 80’s. There was no largesse, no IV&V, no big teams.

        They did it in the regular scientific way, with validation by replication. The programs worked. They agreed with each other, mostly, and the disagreements could be traced to different scientific models. They weren’t all making the same computing errors.

        So what to do? Say, now we believe in IV&V. Chuck all that and start again?

I’m all in favour of someone trying it. In the meantime, those who actually want to know the answers will stick with what currently works.

      • Nick,
        If it was working, I would be all for it.
        But as Pielke, Sr. and others point out, the models have zero predictive power for anything that matters in the real world.
        And since those who wrote them and have built lucrative careers off of them are obviously playing games and making wild claims using those models, I seriously doubt that your appeal to authority to let those wise ones who actually work stick with it is valid.

      • “They agreed with each other, mostly, and the disagreements could be traced to different scientific models. They weren’t all making the same computing errors.”

        The models could agree and still be in error if they all made the same mistake in their assumptions.

        For example, 100 modellers could make 100 models of the solar system, all using the same wrong formula for gravity.

        The models would all agree, they might all appear to work for a period of time, but they would still be wrong. Over time they would have no predictive skill.

Like a watch that runs slow or fast. It looks OK for a while, but over time it is useless. Worse, it is misleading. If you use it for deciding policy, it can be disastrous; for example, when trying to catch a plane, attend a critical business meeting, or meet your wife for dinner in a restaurant.

        The same for climate models. They could all appear to be right when written because they assumed temperatures were going to keep going up with CO2, even accelerating as CO2 levels accelerated.

However, when temperatures leveled off even though CO2 kept going up, this pointed to the possibility that all the models had a common assumption that was in error. That their apparent skill was not skill at all, but coincidence. A stopped watch is exactly right twice a day. That doesn’t mean it has skill.

• IV&V won’t test the science of the GCMs. That’s done in the published literature.

24. Recent advances in technology allow the meaning of the Easterbrook code to become apparent. Translated from the original climatologish, the English version is:

    ‘We are all having far too much fun playing with the big big computer and writing code to worry about trivial stuff like documentation and verification and other long words we don’t understand’

which – by some quirk of history – corresponds almost exactly to the plaintive cry of the amateur and incompetent programmer since Turing and Colossus.

    Still further detailed examination reveals the subtext. Work still continues, but an early attempt at translation is:

    ‘B….r off you grubby little professional engineering and programming oiks. We are the Masters now and unless you have a PhD in Radiative Physics you worthless scum are not qualified to have an opinion about any aspect of our Great Works. You just keep on paying the money and we keep on playing in our nice sandpit. Capiche?’

Other workers claim that they can detect the words ‘Trust us – we are Climate Scientists’ hidden there also, but this is so evidently a contradiction in terms that it seems unlikely to this observer that anyone with a modicum of ‘nous’ would have included such a preposterous statement.

    With grateful thanks to Dan Brown and J-F Champollion

25. Easterbrook is right in one sense: a programme written without a spec is impossible to validate. When it’s in use, it becomes the spec, even if it’s not much use.
    In the commercial field, if we get it wrong, we lose money or our jobs. In climate science there seems to be no penalty or comeback. If Easterbrook agreed to take 1000 volts through the temples should one of his models’ predictions fail, then I might start to buy his agile position.

    EO

  26. Not sure why people are getting angry with Easterbrook. Seems to me he has done us all a favour by demonstrating that the GCMs are not and cannot be fit for any purpose involving the determination of public policy.

    • Where are the verified and validated codes that tell us that it is safe to double the amount of CO2 in the air? Something a policy-maker should ask.

      • cart – horse.

      • OMG!

      • Biosphere brimming with life, temperate zone expanded considerably, sea level up a foot, climate change occurring gradually enough for adaptation even during the 21st Century’s grand solar minimum, the earth supporting 11 billion human souls, temperature up a degree. Your worst nightmare, Nick. Code and V&V available when I get around to it.
        ==================

      • There it is in a nutshell.
Most people want laws that tell them what not to do. The Green Jihadis want laws that tell people what they CAN do.

• No, such a law would say: don’t double (and redouble) the amount of CO2 in the air until someone has shown (with V&V’d code) that it is safe. Nothing unusual about such laws – they are rather common.

      • Nick

        The suggestion is absurd until there is reliable evidence that there is harm.
        Perhaps you would limit breathing.

Perhaps you could identify what you fear will happen in a warmer world, one that is not based upon the results of a model not shown to be able to accurately predict future temperature or rainfall.

      • Nick, you first have to show that human industrial activity can increase the amount of CO2 in the air. It seems that it can’t, at least not significantly.

      • What do you think of a law that would say, don’t double (and redouble) the amount of H2O in the air. I think you would call that law stupid.

      • Do I have to explain that SO2 is completely different? First, it’s a (local) pollutant, unlike CO2/H2O. Second, it’s easily controlled/reduced, unlike CO2/H2O. CO2 concentration in industrial exhaust flue gases is ~ 10 – 15 %. How do you reduce that? SO2 on the other hand is in ppm and I repeat EASILY controlled, either by using low sulfur fuels or by emission control equipment (or both).

      • Nick,
        You demonstrate the problem of the believer rather well.

• Maybe you don’t understand the process. So let me explain.

        It starts with a specification. The spec writer has all the power. You’ve asked for a validated model, so I will take the role of the spec writer. That gives me the power to write any spec I damn well please. Here is a spec.

1. The software shall return a value of safe/not safe.
2. There shall be one input: CO2 concentration in parts per million.
3. The software shall use a logarithmic response curve for the temperature response to CO2.
4. Temperatures less than 20C shall be returned as safe.

        Now, the developer writes to that spec.

        The test guy checks that the software meets the spec.

So, it’s a simple matter to write this software and to VALIDATE that this software meets its spec. IV&V guys don’t get to argue with the spec.
The developer can only argue with the spec if it is logically incoherent or physically impossible to meet. The developer can’t say “but 19C is unsafe!”. Well, he can, but he is ignored.

Validation does not mean that the spec is complete, or correct, or useful. Validation means that it does what you agreed it would do.
Building a validated model to say that doubling CO2 is safe is easy. The trick, you should be able to see, is in the SPEC. The spec defines what it MEANS to be safe. That is why the real fight is over the spec and always over the spec.

        You don’t want to turn the spec writing over to a skeptic for this problem.

Now, you may not like my spec above. Tough. You have no voice in the matter; you gave me the job of writing the spec. Jim D can now, in a couple of lines of code, write a model that meets that spec (see the sketch below). Dan Hughes can then validate that the code meets the spec. And there you have a validated model. Easy. Do you see how the real problem is actually buried in the spec?

The reason why people wrongly criticize GCMs is that they have a spec in their head: “shall be perfect”. And the climate scientist has a spec in his head: “shall be what I just finished writing”. What’s missing, what IV&V would foreground, is not the testing process. It’s not about the testing. That part is utterly trivial. It would foreground and raise to the surface the following question:

what information do policy makers need? When do they need it? And how much uncertainty can they live with?

        That is, it would raise the question of the spec.

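        A minimal sketch, in Python, of the point above: code written to the toy spec, plus a trivial validation check against that spec and nothing more. All numbers (the 3C-per-doubling sensitivity, the 14C baseline, the 280 ppm reference) are hypothetical illustrations, not anyone’s published figures.

        import math

        BASELINE_PPM = 280.0      # hypothetical pre-industrial reference concentration
        SENSITIVITY_C = 3.0       # hypothetical warming per doubling of CO2, deg C
        BASELINE_TEMP_C = 14.0    # hypothetical global mean surface temperature, deg C

        def temperature_response(co2_ppm):
            # Spec item 3: logarithmic temperature response to CO2.
            return BASELINE_TEMP_C + SENSITIVITY_C * math.log2(co2_ppm / BASELINE_PPM)

        def is_safe(co2_ppm):
            # Spec items 1, 2 and 4: one input, one safe/not-safe output,
            # with temperatures below 20C deemed safe.
            return "safe" if temperature_response(co2_ppm) < 20.0 else "not safe"

        def validate():
            # The test guy's job: check the code against the spec, not against reality.
            assert is_safe(280.0) == "safe"      # baseline: 14C < 20C
            assert is_safe(560.0) == "safe"      # one doubling: 17C < 20C
            # Validation deliberately does not ask whether 20C is a sane
            # definition of "safe"; that argument lives in the spec.

        validate()
        print(is_safe(560.0))   # "safe" -- a validated model says doubling is safe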

      • Nick, this is the exact same absurd argument that Kevin “reverse the null” Trenberth made:

        My paraphrase:
“Until you can prove that my prediction of the future can’t/won’t occur, you must agree to my strategy for avoiding it.”

        Sorry – your prediction, your burden of proof.

      • Nick and Terry,

I came across a graph, noted at Climate Progress, showing temperature increase by fossil fuel usage – “Dr. Hansen’s 1981 Projections Compared to Observations” – a graph showing us how bad things will be if we don’t stop using fossil fuels. http://images.sodahead.com/profiles/0/0/2/1/6/9/7/0/5/1981cfobs-46485264070.jpeg

I came across the graph in a post by Icarus at 11:26 am – http://thinkprogress.org/romm/2011/09/10/316193/if-you-could-ask-a-climate-science-one-question/

      • John Carpenter

        Hmmm.. I can hear the policymakers now…

        “What are all these damn people doing around here and what are we going to do with them all!”

      • “Where are the verified and validated codes that tell us that it is safe to double the amount of CO2 in the air? Something a policy-maker should ask.”

They are in the paleo records: the history of the most accurate model we have for the earth, the earth itself. You can burn all the known and suspected fossil fuel on earth and average temperatures will not go above 22C, and not for many centuries at that.

Unprotected human beings cannot survive temperatures below 28C for any period of time, they die of exposure. So even if the earth does warm to its historical maximum, human beings will still find most of the planet too cold to survive without fire. Without fire none of us would be here. Humans would be extinct, or scraping out a meager existence at the edge of the tropical jungles. Even the food we eat requires fire.

      • Unprotected human beings cannot survive temperatures below 28C for any period of time, they die of exposure.

        Do you run anything you write through a sanity check?

      • Mistyping C for F is hardly a gauge of sanity. Reload.
        ==========================

      • Mistyping C for F is hardly a gauge of sanity. Reload.

        Do you even know what a sanity check is?

        1. The act of checking a piece of code (or anything else, e.g., a Usenet posting) for completely stupid mistakes.
        ….
        3. Conversationally, saying “sanity check” means you are requesting a check of your assumptions. “Wait a minute, sanity check, are we talking about the same Kevin here?”

        I think this is very relevant to the topic at hand, which apparently is about V&V of assertions and theories. Take a look at the little sanity check I applied to the Challenger shuttle data upthread. Oh, there are two boosters?

      • Ah, jargon, thanks. Shoulda known.
        =========

      • I’ll sacrifice to Hygeia in penance.
        ===========

  27. Agile software development is a poor analogy for climate modelling.

    If you simply tweak a model until its results look good, you will almost always get good results and they will almost always be wrong.

    Computational models are built on some mathematical foundation. We have known (or theorised) inputs and known laws of physics and we use them to simulate reality. If the results are poor, it probably means that we have missed some input(s) and/or we have an error in our physics. Instead of tweaking the model, we should return to first principles and look for problems with our physics. If you tweak an algorithm, it must be based on sound physical reasons, not “try this and see if it works”.

Agile software development is used where the nature of the problem is poorly understood: not the answer, but the question itself is poorly defined. This is not, I believe, the case with climate modelling. We know the questions and need to justify the answers from a theoretical basis.

  28. Tomas Milanovic

To comment on the “do it yourself” line: indeed, the primary importance is the specs.
Some 6 years ago I wrote, just for fun, a set of specs which I would demand from a model in order to accept it as a useful tool for planning purposes.

Let Ti be the time period for which we have data for a parameter I. Ti is not constant but varies with the parameter.
Any specified parameter is of the form I(x,y,z,t) where x, y, z are spatial coordinates and t is time.
For each parameter, running 5- and 10-year averages IT will be computed.

The metric to be used to evaluate the distance of the predicted parameter field ITp(x,y,z,t) from the observed parameter field ITo(x,y,z,t) is:
DI(t) = ∫ (ITp − ITo)² dV
The distance between the fields may be numerically evaluated on the smallest grid consistent with the data density, but not larger than 500x500x1 km for the atmosphere and 200x200x0.1 km for the oceans.
Data reconstructions (interpolation) are allowed but subject to Specification 2.

The specification, for each of the following parameters and for 99% of their running averages (both 5-year and 10-year), is that DI(t)/AVI(t) < 5%, where DI(t) is the already defined distance and AVI(t) the global average of the field:
    – velocity field
    – pressure field
    – temperature field
    – cloudiness fraction field
    – ice density field
    – relative humidity field
    – precipitation field

If the proposed model doesn't meet the specification, return to the labs and come back only when you have one that does. I wonder how many of the 22 would pass. (A sketch of the metric in code follows below.)

Two remarks:
– Even if a model met the specification, it would not be a guarantee of skilled forecasts over time scales significantly exceeding the validation period. But at least it would have been reasonably validated by the data.
– The spec implicitly contains the requirement that the model be able to accurately reproduce ENSO and other oceanic oscillations.
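    A minimal sketch, in Python, of the distance metric above, assuming the predicted and observed fields have already been regridded onto a common grid with known cell volumes. The spec’s ratio DI(t)/AVI(t) is taken literally as written; all array names and values are illustrative.

    import numpy as np

    def field_distance(pred, obs, cell_vol):
        # DI(t) = integral of (ITp - ITo)^2 dV, approximated as a
        # volume-weighted sum over the grid cells at one time t.
        return float(np.sum((pred - obs) ** 2 * cell_vol))

    def global_average(obs, cell_vol):
        # AVI(t): volume-weighted global average of the observed field.
        return float(np.sum(obs * cell_vol) / np.sum(cell_vol))

    def passes_spec(pred, obs, cell_vol, tol=0.05):
        # Spec: DI(t) / AVI(t) < 5% for this field at this time step.
        return field_distance(pred, obs, cell_vol) / abs(global_average(obs, cell_vol)) < tol

    # Toy demonstration with fake fields on a 36x18x10 grid:
    rng = np.random.default_rng(0)
    obs = 288.0 + rng.normal(0.0, 5.0, size=(36, 18, 10))    # fake temperature field, K
    pred = obs + rng.normal(0.0, 0.5, size=obs.shape)        # fake "prediction"
    vol = np.full(obs.shape, 1.0 / obs.size)                 # cell volumes, normalized to total 1
    print(passes_spec(pred, obs, vol))                       # True for this close prediction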

    • . . . but not larger than 500x500x1 km for the atmosphere and 200x200x0.1 km for the oceans


Tomas, how do these compare with the resolutions presently in use?

      Thanks.

      • Tomas Milanovic

Dan, this is in the ballpark of what they can do today. It obviously depends on the model, and I don’t know what each of the 22 is using.
Clearly going down a factor of 10 for each dimension couldn’t be done; that’s why I was being rather friendly with my specs.

Btw, you have surely noticed that with this kind of resolution we are very far from the “models are solving equations of natural laws” (in this case Navier-Stokes) paradigm.

  29. The critical issue is the sign and magnitude of water vapor feedback.
IPCC models show strong positive water vapor feedback of about 3. However, a growing number of researchers are reporting negative water vapor feedback. E.g., see:
George A. Ban-Weiss, Govindasamy Bala, Long Cao, Julia Pongratz and Ken Caldeira, “Climate forcing and response to idealized changes in surface latent heat and sensible heat”, Environ. Res. Lett. 6 (2011) 034032 (8pp):

    We find that globally adding a uniform 1 W m−2 source of latent heat flux along with a uniform 1 W m−2 sink of sensible heat leads to a decrease in global mean surface air temperature of 0.54 +/- 0.04 K.

    See discussion on Water vapor feedback: evaporation

Model validation would go a long way if all global climate models were applied to reproduce or correct the results of Ban-Weiss et al.:
1) Extract all 100+ input parameters so they are accessible in each model.
2) Use the same input data.
3) Use the same input parameters.
4) Input the +1 W/m2 latent heat and −1 W/m2 sensible heat “forcing” applied by Ban-Weiss et al.
5) Run each model for 10 runs to get sufficient statistics (per Singer & Monckton, 2011, in press, noted by S. Fred Singer, SEPP Science Editorial #2011-1).
6) Report and compare all 10 runs for all models.
7) Examine and identify the cause of all discrepancies.

We need to validate/clarify this critically important water vapor feedback issue before spending billions of dollars on “runs” where we have little understanding or confidence in the outcome. (A sketch of steps 5–7 in code follows below.)
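    A minimal sketch, in Python, of steps 5–7 above: summarize each model’s 10-run ensemble under the common forcing, then flag pairwise discrepancies worth digging into. The run values below are placeholders, not real model output.

    import statistics
    from itertools import combinations

    # Hypothetical ensembles: 10 runs per model, global-mean temperature
    # response in K to the common +1/-1 W/m2 forcing (placeholder data).
    runs = {
        "model_A": [-0.51, -0.55, -0.49, -0.57, -0.53, -0.52, -0.56, -0.50, -0.54, -0.53],
        "model_B": [-0.18, -0.22, -0.25, -0.20, -0.19, -0.23, -0.21, -0.24, -0.20, -0.22],
        "model_C": [0.45, 0.52, 0.48, 0.50, 0.47, 0.51, 0.49, 0.46, 0.53, 0.50],
    }

    # Step 6: report each model's ensemble mean and spread.
    summary = {m: (statistics.mean(v), statistics.stdev(v)) for m, v in runs.items()}
    for m, (mu, sd) in summary.items():
        print(f"{m}: mean = {mu:+.3f} K, stdev = {sd:.3f} K")

    # Step 7: flag model pairs whose means differ by more than the combined
    # spread -- those discrepancies are where to look for the cause.
    for a, b in combinations(summary, 2):
        (mu_a, sd_a), (mu_b, sd_b) = summary[a], summary[b]
        if abs(mu_a - mu_b) > 2.0 * (sd_a + sd_b):
            print(f"discrepancy: {a} vs {b} ({mu_a:+.2f} vs {mu_b:+.2f} K)")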

    • David

IMO you are not seeing the big picture. If models are developed differently, you cannot readily compare the outputs of the models to find where they differ.

      • Rob
In computational fluid dynamics (CFD) modeling, engineers assume that given the same physics (equations), the same input parameters and the same starting conditions, the models should produce the same output if run for the same length of time, with sufficient runs to give statistical significance.

What I hear you saying is that:
1) The science is so poorly known that we cannot even agree on the same physics, or even input the same parameters and data for an agreed-on set of physics;
and/or
2) The implementation and/or V&V is so bad that models cannot be quantitatively compared to each other.

        While gravity is reasonably well known, my understanding is that clouds are so poorly known that we do not yet know the physics sufficient to determine the sign let alone the magnitude of the feedback. e.g.

See Nigel Fox, “Accurate radiometry from space: An essential tool for climate studies”, 25 Jan 2011:
http://www.bipm.org/utils/common/pdf/RoySoc/Nigel_Fox.pdf

With the video of the talk, “Seeking the TRUTHS about climate change”:
http://www.youtube.com/watch?v=BalCag7fQdE&feature=player_detailpage

Note particularly slide 13 of 55, on feedback factor uncertainty:
Total uncertainty: 0.26
Cloud uncertainty: 0.24
i.e., cloud uncertainty is over 90% of the total.

We at least need to get models to reliably reproduce each other given the same assumptions on physics.
Then we need to get ALL the physics sufficient to reduce the massive uncertainty on both the sign and magnitude of the feedback, especially water evaporation and clouds.

        Any claims to the contrary will be stamped “NOT PROVEN” without quantitative evidence.

      • David

Eternal’s comment highlights some of the relevant points. Looked at another simple way: consider four automobile engines. All are supposed to produce 300 HP.
        One is an internal combustion V-8
        One is a rotary engine
        One is an all electric

      • My comment got cut off somehow- my error I’d guess, but now I have to catch a plane

• That’s right. Some areas will be abstracted, often for very good reasons, sometimes mistakenly. Other areas are dealt with in atomic detail. Another model might combine areas and THEN abstract them.
Another model might have an atomic-level model as its input as an abstraction.
The chances of comparing the models must be slim. Testing their predictive power is the only useful method, imo.

      • We still have input boundary conditions of geography, atmospheric mass & composition, heat capacities, solar radiance, etc.
        On clouds, one model might say 30% clouds, one three types of clouds with parameters abc, and the third 9 types of clouds with parameters c1 to c20.
        Then test against common data and Ban-Weiss et al. 1 W/m2 latent and sensible heat.
        If we get 2 deg warming, 0.2 deg cooling and 0.5 deg cooling, then we are making progress in identifying errors.
        Now use model 3 for all three programs.
        If we get 0.45, 0.50 and 0.53 cooling we have some basis for advancing our understanding. Otherwise we are just tilting at windmills.

• David, you are deriving a spec that may be totally useless to a policy maker.
Next you are defining test cases. You need to decide whether you are a spec writer or a test case writer; you don’t get to do both.

      • Steven
        Ah well. Wishful thinking out of frustration.
        Thanks for the clarity.
What do you propose to “drain the swamp” and deal with the “alligators” as we grope through the current London fog aka “climate change”?

      • hehe.

Well, I can only point out the error because I spent a bunch of time trying to do the same thing, and then I realized that I was violating every principle I was promoting. So that hit the bit bucket.

But let’s get down and dirty. Let’s say I’m in state government in California.
As a policy maker, what do I need to know about the future climate, and how well do I need to know it? I may want a model that predicts:

        sea level in 30 years
        average precipitation for the next 30
        changes in growing seasons.

for the state of California. I couldn’t care less if it gets the Amazon correct.
And I don’t need to know that information in super-fine detail to make the kind of local decisions I should be taking.

• You mean that the fact that the Sahara Desert was once lush and populated (after being transformed from desert to paradise) might be of concern to California? And that there could be a major increase in agricultural productivity from increasing CO2? But that might mean much faster growth in Texas than California?

      • Steve,

I was at a California Energy Commission (CEC) Advanced Generation meeting a couple of years back, and I got my first exposure to the variability in the output of the climate models for the expected temperatures in CA. The expected rainfall models were discussed but not presented. The output of these models interests me the most personally, as my water is provided by a 260-foot-deep well with a flow rate (as measured a few times, n=4 that I have data for, over the last 50 years) of 20 to 35 gallons per minute, depending on the time of the year AND the snowpack and its runoff characteristics. The shallow well (about 60 feet deep) put in place around 1880 right next to our old farm house still has water in it, by the way; unfortunately I don’t have any records/data of the actual flow rates for this well.

As California is larger than many countries, with very distinct climate zones (coastal, semi-desert, desert, mountains, etc.), I was concerned that an average expected value for the state as a whole would not be sufficient for me to advise my elected officials (for my specific climate zone) what actions we should take that would be EFFECTIVE in minimizing the risks of climate change, as the error bars on the different models’ output were so large that I couldn’t say if we were better off planning for more rain/snow, less rain/snow, more heating/cooling degree days, etc. For me, I would also like to know if my minimum temperature in the winter is expected to stay the same as the historical range (I have over 70 years of reliable data from about 200 yards from my place) or not. I will have to modify my water infrastructure if the minimum temperatures (and the duration of the time below 27F or so) are expected to decrease.

I was aware that there was some disagreement between the outputs of the climate models; I was not aware of the magnitude of the disagreement. Being a process and product development guy, I found that the models were not able to help me plan very well. What I formally recommended at the CEC meeting was that funds be provided to improve the resolution of the models for the DISTINCT climate zones in CA. I would highly recommend that we don’t ask the politicians what they want the models to predict (the design requirements) unless we have a list of outputs that we have ALREADY discussed with the infrastructure experts/organizations (water, power, transportation, agriculture, etc.) on what THEY would like to know.

  30. Which standard applies: engineering or science?

    The American Engineers’ Council for Professional Development (ECPD, the predecessor of ABET)[1] has defined “engineering” as:

    [T]he creative application of scientific principles to design or develop structures, machines, apparatus, or manufacturing processes, or works utilizing them singly or in combination; or to construct or operate the same with full cognizance of their design; or to forecast their behavior under specific operating conditions; all as respects an intended function, economics of operation and safety to life and property.

    http://en.wikipedia.org/wiki/Engineering

Two issues decide whether something belongs in the domain of engineering or in the domain of science. One issue is public safety. The other issue is design. Clearly, the structural design of a bridge belongs to engineering. Clearly, scientific investigation of interest only to other scientists belongs to science.

    Once you say, “The science is settled” and begin to issue policy advice of grave consequence, you have IMHO strayed into engineering. In that light, it is illegal in most jurisdictions for anyone but an engineer to design and implement climate models whose output will be used by anyone other than scientists.

    Climate models done by scientists should come with a disclaimer:

    This work is for academic interest only. It is highly speculative. No one should rely on this model to provide accurate, reliable predictions.

    Every time someone trots out the results of a climate model, the above disclaimer should be prominently displayed.

    • Add: “no policies should be based on these results until they have been independently verified and validated”.

      • Well, now you are trying to tell policy makers what evidence they can consider. Policy makers may well decide that they will act on the best available information they have even if that information is not what they would wish.

As a policy maker I would gladly make policy based on the simple model of the log response to CO2.

• Steven
I heartily agree on the log response to CO2.
However, we have to keep all the lawyers well fed!
Besides:

        It’s not just some lawyer’s disclaimer that you can ignore,”Past performance is no indication of future returns” is a fundamental principle of investment. In fact, good past performance is more often associated with poor future returns, for reasons that go to the very heart of the theory of investment.

Correspondingly, in “climate science”, when a climate model shows “exponential acceleration”, my hunch is that the “acceleration” is but the rising portion of an oscillating natural cycle and that deceleration will shortly begin. E.g.,
“Is Sea-Level Rise Accelerating?”, with the red rapidly rising sea level graph.
Contrast David Stockwell’s finding of decelerating sea level with previous IPCC hype of accelerating sea level.

So this warning is to wake up politicians, just as SEC-required warnings are to wake up investors, on the potential risks of declining trends. A better caution would probably be:
These results have not been validated. Past trends are not a reliable indicator of future trends.

        How would you state such a warning?

    • Once you say, “The science is settled” and begin to issue policy advice of grave consequence, you have IMHO strayed into engineering.

The claims about scientists saying “The science is settled” are useful as a meme for advancing partisan goals – but they aren’t terribly useful for advancing reasoned debate. First, contrary to the ubiquitous claims of “skeptics,” if that expression has been used by climate scientists with reference to AGW, it has been by a tiny minority. Second, it might be argued that something essentially very similar has been said by a larger number of “pro-AGW climate scientists,” but when weighed against the IPCC statement that GW is 90% likely to be more than 50% A, we see that in fact there is broad-scale agreement on the “pro-AGW side” that the science is not “settled.”

      Every time someone trots out the results of a climate model, the above disclaimer should be prominently displayed.

If you look through threads at places like Climate Etc. and WUWT, you will see many, many comments criticizing qualified statements of probability in the conclusions of research that examines the likelihood of AGW. Qualifiers such as “may” or “might” are routinely criticized as being evidence that research findings should be disregarded (because, the argument seems to go, if we aren’t absolutely certain about the impact of AGW then we shouldn’t look towards implementing mitigation policies).

      Now I’ve never seen you make such a criticism of qualifications of probability in “pro-AGW” research – but surely you have seen the types of comments I described above?

Criticisms of “pro-AGW” research as being either too uncertain or not sufficiently sensitive to uncertainties seem valid; all science should be subjected to scrutiny along those lines of inquiry. But a “skeptic” misrepresenting the extent to which “pro-AGW” science does recognize uncertainty suggests an agenda-driven perspective, particularly if it isn’t accompanied with criticism against those in the “skeptical” community who point to appropriate qualification of probability as being a reason to dismiss research findings.

      • A friend told me that if I glued feathers to your arms, Joshua, you could fly.
        ===============

      • kim,
        What does a flying troll look like?
        ;^)

      • Joshua –
        FAIL.

        One of the criticisms – by skeptics – of the recent cloud papers related to uncertainty. But then your own bias would have precluded your seeing those criticisms.

• “If you look through threads at places like Climate Etc. and WUWT, you will see many, many comments criticizing qualified statements of probability in the conclusions of research that examines the likelihood of AGW. Qualifiers such as “may” or “might” are routinely criticized as being evidence that research findings should be disregarded (because, the argument seems to go, if we aren’t absolutely certain about the impact of AGW then we shouldn’t look towards implementing mitigation policies).”

Let’s test this:

Go to WUWT and find many, many comments that routinely express this view. What you may find is one or two individuals who make this argument. Do better research, Joshua, and get back to us.

      • Joshua:

        ‘But a “skeptic” misrepresenting the extent to which “pro-AGW” science does recognize uncertainty suggests an agenda-driven perspective, particularly if it isn’t accompanied with criticism against those in the “skeptical” community who point to appropriate qualification of probability as being a reason to dismiss research findings.”

Let’s find an example of this to discuss. You seem to have this weird notion of what constitutes an agenda-driven perspective, and an even stranger untested notion of how you detect it.

You also need to distinguish between the IPCC presentation of uncertainty and the actual science. Ignore all the BS that skeptics spout. Most of them have never even looked at GCM code or output.

Start here. Some interesting insight on climategate, climate models, uncertainty, policy and politics. Watch the whole thing and learn something:

        http://www.newton.ac.uk/programmes/CLP/seminars/120617001.html

  31. –> “As this is scientific research, it’s unknown, a priori, what will work, what’s computationally feasible, etc. Worse still, the complexity of the earth systems being studied means its often hard to know which processes in the model most need work…”

    A good question to ask at that point is, when and why during the process did climatology transform itself from basic research to a government agency dedicated to facilitating social engineering?

32. When scientists forsake honor, exposing a hidden agenda is very difficult unless you discover, e.g., a “HARRY_READ_ME” file.

    “In an age of spreading pseudoscience and anti-rationalism, it behooves those of us who believe in the good of science and engineering to be above reproach whenever possible. Public confidence is further eroded with every error we make. Although many of society’s problems can be solved with a simple change of values, major issues such as radioactive waste disposal and environmental modeling require technological solutions that necessarily involve computational physics. As Robert Laughlin noted in this magazine, ‘there is a serious danger of this power being misused, either by accident or through deliberate deception.’ Our intellectual and moral traditions will be served well by conscientious attention to verification of codes, verification of calculations, and validation, including the attention given to building new codes or modifying existing codes with specific features that enable these activities.”

~ Roache, P. J., “Building PDE Codes to be Verifiable and Validatable,” Computing in Science and Engineering, Vol. 6, No. 5, September 2004.

  33. Side observation: note the inverse correlation between a) whether one believes that models should be validated, and b) whether one is in favor of policy action now.

    Discuss.

  34. Richard Saumarez

    When I was a young sprog, I was taught a very simple rule:

    “A trivial computer program is one without mistakes!”

    Surely we have a very fine verification process already. There are umpteen computer models of climate, which have been “rigorously validated” by hind-casting, yet they appear to give remarkably different outputs. This is a validation process that appears to have failed. However, since their predictions cannot be tested in the short-term, they are not verifiable.

    Another poster mentioned medical devices, my field. A medical device has to do something and produce a result that can be verified, while climate models merely produce an intangible result that is very difficult to verify. Having made devices that record from, and stimulate the (human) heart in real time, I am aware of the difficulties in defining a model that is safe, a hopefully foolproof algorithm and demonstrating that the system performs safely, a problem well known to medical device manufacturers, aerospace engineers and anyone else making safety-critical digital systems. The verification of these devices is extremely time-consuming, incredibly boring and requires meticulous attention to detail.

    In view of the far reaching consequences of climate model predictions, I think that their authors should conform to the same regulatory standards.

A starting point might be a generic climate model that is potentially parameterisable to incorporate features that might be important in the future, so that it can be modified. Given a specification of how the model is to be constructed, with attention to the structure of the program, two independent groups should produce versions of the same model. Agreement on benchmarks could then be tested. Agreement in predictions, using the same standardised inputs, could then be tested, and at least one might have confidence that the model works numerically.

This approach is an analogue of the “double-fault” system widely used in medical devices, whereby two independent computational processes have to be adjudicated and agree before an action can be taken. (Or more complex variations in flight-control systems, etc.) A sketch of the idea in code follows below.

If the code were open-source, various different modelling assumptions could be explored by a greater number of users, preferably with two groups addressing the same problem.
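    A minimal sketch, in Python, of the double-fault idea above, applied to a toy benchmark rather than a real GCM. Both model functions are hypothetical stand-ins for independently built implementations of the same written spec.

    def model_team_a(forcing):
        # Team A's implementation of the agreed benchmark response.
        return 0.8 * forcing

    def model_team_b(forcing):
        # Team B's independently coded implementation of the same spec.
        return 0.8000001 * forcing

    def adjudicate(forcing, rel_tol=1e-4):
        # Run both implementations; act only if they agree within tolerance.
        a, b = model_team_a(forcing), model_team_b(forcing)
        if abs(a - b) > rel_tol * max(abs(a), abs(b), 1e-12):
            raise RuntimeError(f"disagreement: {a} vs {b} -- no action taken")
        return 0.5 * (a + b)

    print(adjudicate(3.7))   # agreement within tolerance -> result is usable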

  35. Steve Easterbrook’s statement (regarding V&V) that
    “It turns out almost none of it seems to apply, at least in any obvious way” is particularly troubling. This seems to be another example of the idea that normal scientific procedures somehow don’t apply in the climate field.

    Regarding blog censorship, there is another recent example at Zone 5. A guest post and all the comments were deleted.

• I wonder if the Warmers know that there are millions and millions of unconvinced skeptics out there who don’t care to read climate blogs at all. I don’t know what purpose the censorship serves other than maybe intimidating the faithful and keeping them “on message”. Otherwise, it just makes Warmers look like closed-minded fanatics to people who actually read their stuff. That, of course, is what they actually are anyway, so I guess it makes sense that way.

      Andrew

  36. Richard Saumarez

    PS: I have just looked at the seminar by P Williams, referred to by Mosher. It is a valuable cautionary tale and shows that the “science” of numerical analysis is less settled than it might appear.

I should have said, in my previous post, that the numerics need to be scrutinised, with different integration schemes tested against each other.

I was involved in modelling cardiac impulse propagation, which is dependent on highly non-linear, impulsive processes. I couldn’t understand how a colleague got results which led to substantially different physiological conclusions, although we both thought we were modelling the same thing. It turned out that he was using Euler’s method to integrate the ODEs in the model while I was using 4th-order Runge-Kutta! This made me reflect on whether any integration scheme can be absolutely trusted in complex systems when there is no simple, analytical result as a benchmark. Having talked to professional numericists, while they agreed that you can analyse the errors to some extent, they recommended that I run the simulation using increasingly shorter time steps and see if the solution converged as the time step was decreased. (A sketch of that convergence check in code follows below.)
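    A minimal sketch, in Python, of the convergence check the numericists recommended: integrate the same ODE with Euler and 4th-order Runge-Kutta at successively halved time steps and watch whether each solution converges. The ODE here (dy/dt = -2y, y(0) = 1, exact solution exp(-2t)) is illustrative only.

    import math

    def euler(f, y0, t_end, n):
        # First-order explicit Euler with n fixed steps.
        h, y, t = t_end / n, y0, 0.0
        for _ in range(n):
            y += h * f(t, y)
            t += h
        return y

    def rk4(f, y0, t_end, n):
        # Classical fourth-order Runge-Kutta with n fixed steps.
        h, y, t = t_end / n, y0, 0.0
        for _ in range(n):
            k1 = f(t, y)
            k2 = f(t + h / 2, y + h / 2 * k1)
            k3 = f(t + h / 2, y + h / 2 * k2)
            k4 = f(t + h, y + h * k3)
            y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
            t += h
        return y

    f = lambda t, y: -2.0 * y
    exact = math.exp(-2.0)
    for n in (10, 20, 40, 80):
        print(f"n={n:3d}  euler err={abs(euler(f, 1.0, 1.0, n) - exact):.2e}  "
              f"rk4 err={abs(rk4(f, 1.0, 1.0, n) - exact):.2e}")
    # Euler's error roughly halves each time the step halves (1st order);
    # RK4's drops by roughly 16x (4th order). If the observed rates don't
    # match the expected order, suspect the scheme or the problem's stiffness.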

• Richard Saumarez: It turned out that he was using Euler’s method to integrate the ODEs in the model while I was using 4th-order Runge-Kutta! This made me reflect on whether any integration scheme can be absolutely trusted in complex systems when there is no simple, analytical result as a benchmark.

      Welcome to the club.

    • Yes,

It amazes me how few people will actually sit through an actual presentation about issues with a GCM, so thank you for watching.

There are better arguments about the limitations of a GCM WITHIN the science than there are from outside the science.

  37. “If climate models were built only for science qua science, to support the academic research programs of the people building the models, then Steve Easterbrook would have a valid point…Further, these climate models are used in the IPCC assessment process, which reports to the UNFCCC, and in national-level policy making as well.”

    I don’t see how the purpose of GCMs can determine whether or not they are suitable for verification and validation. Either they are or they aren’t.

The fact seems to be that skeptics and CAGWers alike are fairly certain that the models cannot pass V & V. So, not surprisingly, the CAGWers claim it is not necessary while the skeptics claim it is essential.

    The interesting part of this debate for me is those poor hopeful souls in the muddled middle, the luke warmers. They as usual want to have things both ways. There is too much uncertainty in the models, because the climate is too complex and chaotic to adequately predict future outcomes (so no CAGW). But the issues are too important, so we shouldn’t give up hope that they can ultimately pass V & V (so AGW will still allow us to engage in some central planning, just not as much as the CAGWers want).

    As a lay skeptic I say we just accept Easterbrook’s apparent own goal, and agree with his suggestion that GCMs may not ever be able to pass V & V. It is irrelevant in the political debate whether this is because GCMs are such that proper V & V cannot be established, or because once an appropriate V & V process were established, they would fail. The political point is that the consensus is beginning to admit that it is urging drastic economic and political change based on computer models that cannot be validated or verified. (Rather than claiming V & V is simply not necessary.)

    Easterbrook doesn’t quite admit that GCMs will never be able to pass V & V. He writes “So does this really mean, as Valdivino suggests, that we can’t apply any of our toolbox of formal verification methods? I think attempting to answer this would make a great research project.” But he has stepped on the slippery slope of admitting that it may not be possible to ever verify or validate the models.

    I suspect skeptics like Willis Eschenbach would love nothing more than to force the modelers to divert some of their budget toward developing V & V processes that the models would then fail. There would be great satisfaction in forcing the consensus to forge the very sword on which they would then be required to fall. But in what alternative universe is that ever going to happen?

    It seems to me that GCMs are the heart and soul of the CAGW political movement. Easterbrook’s flirting with a concession that they perhaps cannot be validated or verified because of the nature of the models is an attempt to spin the issue. But I think it is just the latest shot fired by the gang that couldn’t shoot straight, and is headed directly for their own collective foot.

    • I would argue that any model is suitable for V&V, suitably configured for the particular model and its applications. The question is whether V&V is needed or not, and I am arguing that public accountability provides the need.

    • The issue is not pass or fail. Never has been.

      The issue is that in all other activities in which computer software provides information relative to public-policy decisions, that software has always been subjected to Independent V&V, is maintained under an approved SQA process, in many cases users have been required to have shown qualifications for application of the software, and the application procedure itself has been Independently judged and approved.

As the issue has been discussed, it has come to light that the Climate Science Community seems to be the only arena in all of science and engineering whose software either (1) is impossible to apply IV&V processes to, or (2) is claimed not to need IV&V processes.

Many who have first-hand experience and expertise in scientific and engineering software find those positions to be very, very strange: singularly unique, in fact. Additionally, when we attempt to discuss these issues with members of The Climate Science Community, citing peer-reviewed reports and papers, somehow the focus can never be brought to the issues. It seems to just never work out.

• Speaking of first-hand experience, anyone questioning the necessity of IV&V should read: Data horribilia: the HARRY_READ_ME.txt file.
This “climate science” culture is diametrically opposite to professional software development with full IV&V.

        I affirm Judith’s support for IV&V as essential because of the massive public implications.

        Perhaps we need to insist on NO funding until a formal rigorous collective approach with IV&V is developed. Funding 22 lousy programs with unknown performance is a waste. Better to put all funds into 3 major programs with one third of funds put into proper documentation, planning, and IV&V.

• I don’t know enough about modelling or V&V to comment technically, but if these models are going to be held up as pointers to the policy response then it is essential that there is confidence in what they are saying. And given the highly politicized nature of the debate, assertions of quality by those using the models are not enough.

    • GaryM: As a lay skeptic I say we just accept Easterbrook’s apparent own goal, and agree with his suggestion that GCMs may not ever be able to pass V & V. It is irrelevant in the political debate whether this is because GCMs are such that proper V & V cannot be established, or because once an appropriate V & V process were established, they would fail. The political point is that the consensus is beginning to admit that it is urging drastic economic and political change based on computer models that cannot be validated or verified. (Rather than claiming V & V is simply not necessary.)

      A related possibility is to match the current GCM forecasts to the actually obtained climate 2 – 3 decades from now, and let the public decide post hoc whether the forecasts are accurate enough to depend on for the subsequent 2-3 decades. Paraphrasing your penultimate sentence (one doesn’t often have the opportunity to write “penultimate” so the opportunity should not go to waste), it looks like that is what the American polity has decided in effect to do. On the record of the last few decades, the longer rigorous V&V on the GCM models is postponed, the less political impact those models will have. That’s my political guess.

    • GaryM: I don’t see how the purpose of GCMs can determine whether or not they are suitable for verification and validation. Either they are or they aren’t.

      The purpose of software IV&V is to independently verify and validate that the program is qualified for its intended use. For example, I worked at a nuclear facility where we used two versions of the same electrical design program. Both versions had almost exactly the same code! But the “nuclear” version had undergone more intense IV&V that qualified it for safety critical usage, whereas the non-nuclear version had not.

      GaryM: The fact seems to be that skeptics and CAGW alike are fairly certain that the models cannot be pass V & V. So not surprisingly the CAGWers claim it is not necessary while the skeptics claim it is essential.

      Both sides agree that IV&V are essential. The difference is in the level of effort, scope, and rigor that the various stakeholders believe are justifiable. (Sure, there are people like Easterbrook who, IMHO, are unwilling to engage in civil, honest debate. But I don’t think they are typical of most on either side of the debate.)

      I believe consensus IV&V can be reached. I think IV&V experts like Dan Hughes believe so too. The reason why is that it so happens that I have done a lot of “remediation” IV&V for legacy scientific and engineering codes for nuclear facility design, a very contentious area, and was uniformly successful. I’m not that smart. So I don’t see why the same can’t be done for the GCMs. Sure, some effort will be required. But the GCMs are important and the IV&V experts know how to get it done.

  38. Dr. Curry,

    Correct me if I am wrong, but my understanding of your position has been that the combination of known unknowns, unknown unknowns, and uncertainties inherent in the climate system mean that we are not capable of predicting climate with sufficient precision to justify the types of policies advocated by the consensus.

If GCMs are compilations of the known physical processes and current theories of climate, how then could a model designed to provide such predictions be capable of passing V & V, when all the processes are not known?

    I guess my confusion is caused by what I see as your rejection of the claims the consensus make based on their models, contrasted to what I took to be your belief that the models can ultimately pass some form of V & V.

When you say “suitable for V & V,” that seems to me to beg the question of whether the GCMs could actually survive V & V. If you think the modelers should be forced to submit their GCMs to V & V for the sole purpose of seeing them fail the process, then I understand it. You are on the same page as Willis Eschenbach, on this issue at least. He seems to want the models to be subjected to V & V for the express purpose of seeing them fail. Is that your goal as well?

• If they fail you can always rewrite the spec. It depends upon the failure.

The issue is the spec. The issue will always be the spec.

    • Gary, V&V does not guarantee any kind of a “perfect” model or perfect forecast. The Roy and Oberkampf paper gets it right. They discuss careful documentation of the uncertainties, and documentation and testing of model calibration as a response to fixing empirical inadequacies. I suggest people read this paper, it is quite readable. From the introduction:

Scientific computing plays an ever-growing role in predicting the behavior of natural and engineered systems. In many cases, scientific computing is based on mathematical models that take the form of coupled systems of nonlinear partial differential equations. We will refer to the application of a model to produce a result, often including associated numerical approximation errors, as a simulation. While scientific computing has undergone extraordinary increases in sophistication over the years, a fundamental disconnect often exists between simulations and practical applications. Whereas most simulations are deterministic in nature, engineering applications are steeped in uncertainty arising from a number of sources such as those due to manufacturing processes, natural material variability, initial conditions, wear or damaged condition of the system, and the system surroundings. Furthermore, the modeling process itself can introduce large uncertainties due to the assumptions in the model as well as the numerical approximations employed in the simulations. The former is commonly addressed through model validation, while the latter is addressed by code and solution verification. Each of these different sources of uncertainty must be estimated and included in order to estimate the total uncertainty in a simulation. In addition, an understanding of the sources of the uncertainty can provide guidance on how to reduce or manage uncertainty in the simulation in the most efficient and cost-effective manner. Information on the magnitude, composition, and sources of uncertainty in simulations is critical in the decision-making process for natural and engineered systems. Without forthrightly estimating and clearly presenting the total uncertainty in a prediction, decision makers will be ill advised, possibly resulting in inadequate safety, reliability, or performance of the system. Consequently, decision makers could unknowingly put at risk their customers, the public, or the environment.

      • Uncertainties, uncertainty quantification, and the lack of full-scale testing have recently been getting a lot of press.

        A report from Los Alamos on validation when full-scale system results are not available. It is clearly not related to climate science (it concerns an engineered hardware system), but you never know where you can learn something.

        Evolving desiderata for validating engineered-physics systems without full-scale testing See also: LA-UR-10-01494; LA-UR-10-1494; LA-UR-09-01969

        The Abstract

        Theory and principles of engineered-physics designs do not change over time, but the actual engineered product does evolve. Engineered components are prescient to the physics and change with time. Parts are never produced exactly as designed, assembled as designed, or remain unperturbed over time. For this reason, validation of performance may be regarded as evolving over time. Desired use of products evolves with time. These pragmatic realities require flexibility, understanding, and robustness-to-ignorance. Validation without full-scale testing involves engineering, small-scale experiments, physics theory and full-scale computer-simulation validation. We have previously published an approach to validation without full-scale testing using information integration, small-scale tests, theory and full-scale simulations [Langenbrunner et al. 2008]. This approach adds value, but also adds complexity and uncertainty due to inference. We illustrate a validation example that manages evolving desiderata without full-scale testing.

        I like this part:

        We conclude that the following operational definition of validation is a reasonable response to evolving desiderata:

        Validation- The continuing process of establishing robustness, goodness-of-fit, and predictive accuracy based on information that becomes available, even if the information is outside the original domain of applicability.

        I had to look up desiderata: something that is needed or wanted.

        Validation can be viewed as building a wall. Testing of the individual building elements provides information to help ensure that the integrated wall will meet its application requirements. Of course, full-scale information for each application area provides the better information, but sometimes we don’t have that luxury.

        Langenbrunner et al. 2008 is this, and URL’d above:
        Langenbrunner, J.R., Booker, J.M., Hemez, F.M., Ross, T.J. (2008), “Inference Uncertainty Quantification Instead of Full-scale Testing,” American Institute of Aeronautics and Astronautics, 11th Non-Deterministic Approaches Conference, Schaumburg, Illinois, April 7-11, 2008, LA-UR-08-1669.

      • Thanks for this link. This is similar to what Vicky Pope proposed for climate models, towards the end of this post
        http://judithcurry.com/2010/12/01/climate-model-verification-and-validation/

        A more comprehensive framework for climate model validation is presented by Pope and Davies, that includes simplified tests, testing of parameterizations in single-column models, dynamical core tests, simulations of the idealized aquaplanet, climate model intercomparisons, double call tests, spin up tendencies, and evaluation in numerical weather prediction mode. Several of these methods are used by all climate modeling centers; others are merely proposed strategies for climate model evaluation that have not been implemented in a formal way by any of the climate modeling centers.

      • Dr. Curry,

        “V&V does not guarantee any kind of a ‘perfect’ model or perfect forecast.”

        That was not the issue as I saw it at all. I began my first comment quoting this from your post: “…these climate models are used in the IPCC assessment process, which reports to the UNFCCC, and in national-level policy making as well.”

        It is in this context, not an abstract academic or scientific sense, that I framed my comments. It is because of the CAGW movement’s reliance on GCMs to influence policy on such a massive scale that skeptics demand that the models be verified for that purpose.

        Rewriting the spec, as Steven Mosher writes, might well be an interesting exercise, and might result in demonstrating that GCMs are indeed suited for purposes other than driving global energy policy. But it is the massive economic and political change that the consensus is pushing that raised validation and verification (or more accurately the lack thereof) to the level of such importance.

        Are climate models predictive with sufficient certainty to justify using their projections as the basis for massive economic and policy initiatives? Any V & V process designed so that it did not answer that question would seem to me to be irrelevant in the political context, while one that did meet that requirement would seem to be one that the models are certain to fail. Which is why I believe that skeptics are so eager for the process to be undertaken, and the consensus will fight it at every turn.

        I guess I just don’t see any middle ground on that issue, in that context.

      • The appropriate question should be to what extent climate models are useful in policy deliberations regarding climate change, in view of their uncertainties. Uncertainty is key information in the decision making process.

      • One thing that V&V can do is to provide a measure of Risk.

    • GaryM: He seems to want the models to be subjected to V & V for the express purpose of seeing them fail. Is that your goal as well?

      No. The idea, common in science, is to subject the programs to a test that they will pass if they are good enough, but fail if they are not good enough.

      As it stands now, an agreed-upon standard for “good enough” has not been arrived at. By itself, such a lack of an agreed-upon standard is taken by some (or so it seems to me) as a sufficient argument against any testing. I’d prefer to see at least a list of possible standards addressed, derived perhaps from some of the lists earlier in the thread.

      • MattStat,

        I don’t doubt that pass/fail testing of programs is common in science. But we are talking about climate science. And we are talking about massive governmental, NGO, research and “green” economy budgets. At least hundreds of billions of dollars, and political control over the global energy economy, are at stake.

        There is no agreement on what constitutes “good enough” because the cost of failure to meet an adequate, objective standard would likely be the death knell of the CAGW movement.

        This issue of V & V is like a proxy for the greater CAGW debate. GCMs are not like paleo reconstructions, disputes about the rise in temperature in the Antarctic, or arguments about the forcing/feedback effects of various GHGs. The existence of reliable climate models is the sine qua non of the case for decarbonization. “If we don’t eliminate fossil fuels, this is what will happen.”

        Take that argument away and the whole larger political debate goes with it.

    • The possible answer that the current implementation of the model cannot predict the climate to a “useful” degree of accuracy is a very tangible and useful result for policy makers. It may not be the answer you like, but that is not of concern to what needs to be known.

      What is the weather going to be 30 days from now? The daily weather models are not useful for that purpose. That is a perfectly valid and useful answer. I will use the historical averages for that particular day as my best input.

  39. As a software developer, I get to work in the Agile arena, and I get to work in the formal arena. So I know what I am talking about.

    But I do have one grey area – maybe youse guys can help me out.
    What is the most complex to model?

    Climate
    Stock market
    Warfare

    EO

    • Good question! It depends on how detailed you want the model to be. Now you’ve got me thinking. All three can be very complex!

    • EO –
      Nobody has ever successfully modeled warfare except under strict – and unrealistic – constraints. Every war – including the present wars – is initially fought based on the last war, and is won by those who most rapidly adapt to the new paradigm.

      Then there were those who recently thought they had modeled the stock market successfully. Need I mention how that worked out for them – and for the rest of us?

      Climate? After 40 years and reportedly $100 Bn or so, there are those who claim to have modeled climate successfully. There are also those who believe them. And there are those who see more clearly that those models fail to meet the specs that were never written. Of course, you DO understand that if you don’t write the specs then you can’t be held to those standards? Should I mention climate model performance? Naaah – it’s not necessary here.

      • New paradigms are a bummer, eh?

        just like humans stopping use of CFCs or coal.
        halfway through a model run, those humans change, adapt or innovate

        what a bummer. sort of invalidates the model really

      • EO –
        what a bummer. sort of invalidates the model really

        Yup – they’ll do it every time. :-)

      • Jim,

        The recent economic models worked perfectly. The spec said housing prices would never all fall at the same time. Do you have a problem with that?

    • What you want is a crystal ball.

    • a warfare model could be built to call a climate model as a subroutine.
      usually we use a “standard” atmosphere and standard winds. But hey, just add a call to “weather” and the warfare model suddenly incorporates the climate model.
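
      A minimal sketch of that subroutine call (all names and the toy effectiveness rule below are invented for illustration):

      ```python
      import random

      def standard_atmosphere():
          """The usual fixed assumption: standard winds and visibility."""
          return {"wind_kts": 10.0, "visibility_km": 20.0}

      def weather(day):
          """Stand-in for a call into a weather/climate model subroutine."""
          rng = random.Random(day)
          return {"wind_kts": abs(rng.gauss(12.0, 8.0)),
                  "visibility_km": max(0.5, rng.gauss(15.0, 10.0))}

      def sortie_succeeds(day, use_climate_model=False):
          env = weather(day) if use_climate_model else standard_atmosphere()
          # Toy rule: high wind or low visibility scrubs the sortie.
          return env["visibility_km"] > 5.0 and env["wind_kts"] < 25.0

      flown = sum(sortie_succeeds(d, use_climate_model=True) for d in range(100))
      print(f"{flown} of 100 sorties fly once weather enters the war game")
      ```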

      Biggest issue with warfare is unknown unknowns.

      • And there are probably more of them in climate than most people want to believe, too.

      • the crazy thing about unknown unknowns is that it’s unknown how many exist. hehe.

      • The biggest issue with modelling warfare is that, as well as the counterfactual and the conditional, you have to deal with the human reactive element. It’s the human response that is the spanner in the works, and that is my point:
        climate models ignore this.

      • Well I did read recently that climate change causes civil wars, so that would be appropriate. We also need to incorporate the mental illness caused by global warming, which would affect the military’s decision making.

    • My 1st question is: what are the criteria to determine the success or failure of each of the models? How will you decide if the model met your expectations? Often the difference between success and failure in model development is actually one of overselling the capabilities of what we can accurately model. For each of the three examples, step one is to determine what it is you wish to accurately predict.

      Generally speaking, a warfare model may be the easiest because the timeframes are the shortest, but it really is dependent on how you declare success or failure.

    • Warfare without a doubt! Closely followed by the Stock Market. Climate would be the easiest simply because of a much reduced involvement of humans!

  40. My understanding of the applications of V&V to climate models of the GCM type is inadequate to judge the debate in this thread. Clearly, GCM reliability is currently too uncertain to serve as a precise guide to future climate trajectories, or to make policy decisions that require highly accurate forecasting decades into the future.

    For me, this reality serves as a reminder of what I consider the remarkable nature of a recent thread in this blog – Probabilistic Estimates of Transient Climate Sensitivity. The posting was motivated by a paper by Padilla et al 2011, and the thread also addressed a related paper by Gregory and Forster 2008 (GF08).

    What I found remarkable was the implication that a simple energy balance model, based on geophysical principles that are largely unarguable (at least in a general sense), could permit us to anticipate global temperature changes over the multiple decades remaining in this century mainly as a function of the level of climate forcings from greenhouse gases, anthropogenic and volcanic aerosols, and changes in solar irradiance, independent of assumptions inherent in GCMs that are difficult or impossible to verify, such as the rate of deep ocean heat uptake. Anyone interested should revisit the thread, but in essence the two papers concluded that it would be possible to make these projections with reasonable accuracy simply from an equation relating temperature change to forcing. An example from GF08 is the estimate of 1.3 to 2.3 °C of warming over 70 years for a hypothetical CO2 increase compounded at 1% per year, a rate that doubles CO2 in those 70 years. This could of course be scaled to account for different rates and/or forcings from other climate variables.
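
    For concreteness, the GF08-style arithmetic can be reproduced in a few lines (a sketch: the logarithmic forcing expression is the standard simplified formula, and the response range is just the 1.3 to 2.3 figure quoted above):

    ```python
    import math

    F2X = 5.35 * math.log(2.0)     # ~3.7 W/m^2, forcing for doubled CO2
    TCR_RANGE = (1.3, 2.3)         # K per doubling, the GF08 range quoted above

    def transient_warming(ratio):
        """Scale the transient response linearly with the CO2 forcing."""
        forcing = 5.35 * math.log(ratio)          # simplified CO2 forcing, W/m^2
        return tuple(tcr * forcing / F2X for tcr in TCR_RANGE)

    print(f"CO2 ratio after 70 years at 1%/yr: {1.01 ** 70:.2f}")  # ~2, a doubling
    print("warming range at doubling: %.1f to %.1f K" % transient_warming(2.0))
    print("scaled to a 40%% CO2 increase: %.1f to %.1f K" % transient_warming(1.4))
    ```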

    The described approaches are not without limitations. These include questions as to the appropriate assignment of pdfs as Bayesian priors in Padilla et al, or uncertainties regarding volcanic forcing in GF08. The greatest limitation, in my opinion, however, is simply that the methods require that our climate for the next century not differ dramatically from that of the last. For example, the relative role of natural vs forced variation from the last century is used in the Padilla estimate, and a substantial deviation from this proportionality would significantly affect the estimates. The papers do try to assess the effect of uncertainties on the temperature ranges they project, but one can never be certain that every possibility has been considered.

    These transient projections – transient climate sensitivity (TCS) in Padilla, transient climate response (TCR) in GF08 – do not, of course, attempt to estimate equilibrium climate sensitivity, as do GCMs, but projections 70 years from now are of more practical import than those 700 years into the future. Unlike GCMs, they provide no simulations of humidity, clouds, wind, ENSO, precipitation, regional differences, and many other variables of interest. Their main thrust is to say, “If we put this much CO2 and methane into the atmosphere, emit these quantities of aerosols, and if the sun changes by such and such an extent, here is the range of temperature change we can expect within 90 or 95 percent confidence limits, based on how anthropogenic and natural influences have operated in past decades.”

    As long as we are interested in climate change within the current century, I believe the greater confidence that appears to be justified in these TCS or TCR approaches will be useful for future planning, and should, in my view, circumvent many of the concerns raised by doubts about the use of GCMs for planning purposes. Of course, it remains to be seen whether that confidence will be validated by the observations that emerge in coming decades. At a very minimum, TCS and TCR represent additional ways of looking at future climate change in a manner independent of GCMs.

    I’ve discussed some of this in more detail in one of my Comments on TCS and TCR, but for an adequate sense of the topic, including the advantages and potential limitations of TCS/TCR, readers should probably look through the entire thread.

    • Fred

      I am stuck at another airport.

      Based on what you have written above, will you finally acknowledge that the IPCC AR4 has numerous poorly supported conclusions and should not be used as a basis for policy implementation?

      • Rob – My perspective is more complex. I don’t think AR4 should be used for any policy requiring highly accurate forecasts. The question as to which policies would be justified by more general predictions with a fairly large uncertainty range (e.g., CO2 will cause non-trivial warming but we don’t know exactly how much) involves a combination of scientific, political, philosophical and economic considerations that are more than I can handle in this comment. This is one of the reasons for what Dr. Curry calls “no regrets” policies that might be reasonable in the face of uncertainty.

      • Bingo

      • Other than “no regrets” policies such as the construction of nuclear power plants and good infrastructure, there are not many other steps to take, correct? It would not make sense to shut down a coal-fired power plant due to CO2 emissions based upon a no-regrets policy.

    • I personally think policy can be made with a much cruder form of model than a GCM as you suggest.

      yes, a very simple model can predict the warming we will see in a few decades. even with wide error bars, it’s clear that a pro-nuclear policy derives from this.

      Then again we knew decades ago that CO2 was a potential danger and that nuclear was a path forward. But the anti-science left had its way with nukes.

      • Steven Mosher writes, “Then again we knew decades ago that CO2 was a potential danger.”

        I am not sure who the “we” are that knew this. It certainly does not include many eminent scientists. There is no proper physics to support the idea that more CO2 does anything that is detrimental. In fact, the more CO2 we put into the air, up to, say, 1500 ppmv, the better things become.

      • There is a fairly large “we” that feel that too much CO2 too fast can have some detrimental effect. The “we” mainly disagree on the most reasonable course to reduce the potential of the detrimental effect. Personally, I think from a US perspective that land use and secondary pollutants (black carbon, methane, NOx) provide the best immediate bang for the buck for hedging our climate change bet, stimulating the economy and improving the general overall quality of life. An agnostic may not believe in hell, but why tempt fate?

        As an engineer, I see the value of properly selected nuclear energy. Even light water reactors have excellent potential with some easing of enrichment limits and reduction in scale; big is not really better in LWR design, and medium scale with redundancy makes more sense. Not reprocessing spent fuel is insane. Restricting underground long-term storage, on top of that insanity, to 15 millirem per year because of linear no-threshold pseudoscience modeling is insanity squared.

        Wind energy is warm and fuzzy but has real limits, not to mention butt-ugly aesthetics. Community-scale solar, based on the affluence and desires of the community, makes sense. Biomass produced on arable land is great for landowners, but only warm and fuzzy in positive impact. Trash, being our new major resource, has excellent potential with realistic regulation. And we still have plenty of room to improve process efficiency in most energy-intensive areas.

        Personally, I believe CO2 impact is limited to about 1 to 1.5 C, because we have a positively elegant but complex planetary environmental system, with two preferred temperature set points, that produces non-linear overall radiative/convective/latent feedbacks. So I guess I am “we”.

      • Steven Mosher writes, “Then again we knew decades ago that CO2 was a potential danger.”

        I’m not aware of anything of benefit that doesn’t have a potential danger. For example Penicillin. Swimming. Sex.

        What we do know is that CO2 is very strongly correlated with economic well-being. As Chinese CO2 levels increase, so does their wealth, and with this economic power comes political power.

        At the same time CO2 levels have stagnated in the West, as has economic power and political power. In the process of trying to “save the planet” the uncertainty over energy policy has destroyed economic prosperity.

        Maurice Strong and his invention, the IPCC, may yet have the last laugh.

      • Alexander Harvey

        Steven:

        “we knew decades ago that C02 …”

        This is the earliest reference I have come across:

        Question to Gordon Macdonald at a hearing of the Interior Insular Affairs Committee chaired by Clinton Anderson from New Mexico (~1966):

        “If your suspicions about climate change were correct, isn’t that going to be a great benefit to, or a great reason for, pursuing nuclear power?”

        As recalled by Macdonald (AIP Oral Histories):

        “[Clinton Anderson] was at that time chairman of the Joint Atomic Energy Committee and having Los Alamos in his constituency he was a great proponent of nuclear, and he immediately grasped on climate change as a reason to promote nuclear energy.”

        That was five decades ago, so this knowledge is on its sixth decade at least.

        Alex

  41. Actually, many engineers agree with the current approach of climate scientists — but you wouldn’t know it from looking at this thread.

    I’m curious about the almost total absence of a discussion of model pluralism (ensemble of models) on this thread. That is the approach that is making the best sense to many of the most experienced scientists and engineers, along with philosophers of science, both in epistemic and practical terms. E.g. Betz, the brilliant Eric Winsberg, W.S. Parker – from all, persuasive arguments against a concept of ‘the best’ model, and for a pluralistic modeling approach that uses the divergence, rather than convergence, of model assumptions and results to more satisfactorily address uncertainty, and future forecasts of the climate system. The size of the dispersion in results informs the measurement of uncertainty.
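
    Quantitatively, the dispersion idea is simple to state (a sketch; the projection numbers below are invented stand-ins for an ensemble of model results):

    ```python
    import numpy as np

    # Hypothetical end-of-century warming projections (K) from eight models.
    ensemble = np.array([2.1, 2.8, 3.4, 1.9, 2.6, 3.0, 2.3, 3.7])

    print(f"ensemble mean:    {ensemble.mean():.1f} K")
    print(f"spread (1 sigma): {ensemble.std(ddof=1):.1f} K")  # dispersion as an uncertainty proxy
    print(f"full range:       {ensemble.min():.1f} to {ensemble.max():.1f} K")
    ```

    The caveat, raised in the replies below, is that inter-model spread measures disagreement among the models, not a calibrated probability for the real system.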

    Along with an emphasis on falsifications and modular structure in model construction, this is the most current modeling and, together with methodological pluralism, it forms the approach from the climate science community for AR5. So, why aren’t you discussing this?

    By the way, you are missing out on the relevance and expertise of an outstanding Georgia Tech student if you are not reading his blog:
    http://rockydunlap.wordpress.com/category/verification-validation/

    • …emphasis on a modular structure… this is the most current… approach.

      Wow. Knock me down with a feather! What strides the modellers have made to think up a ‘modular structure’! That they have arrived at such a highly technical approach is a cause for wonder!

      When I learnt commercial programming (PL/1, COBOL, RPG, etc.) in the mid-70s, great emphasis was placed upon the concept of Structured Programming – which basically meant using a modular structure, for lots of good reasons. Only 40 years later, this simple and useful concept has finally penetrated climatology. Let the bells ring out, let the rejoicing begin.

      So, if our modellers continue their progress at the current rate, we should expect them to cave in and adopt a V&V approach, as is current everywhere else, in about 2045-2050. Only 10 years after the last Himalayan glacier has melted.

      I am not confident that the modellers’ high regard for their abilities and their work would in actual fact survive a rigorous external scrutiny. But if they ask nicely I still have my basic textbooks on Structured Programming somewhere in the loft. Perhaps I should lend them to Harry of Read_Me fame…

    • martha:

      did you read any of the crap you linked to?

    • Martha
      The approach you suggest (model pluralism) will not result in a good model except by accident. Averaging the results of models will only disguise a good model and continue the use of a failed model. From an engineering perspective it is a terrible approach to model development.

      The effort should begin with the definition of the key criteria that are of importance to policy makers. IMO these are very simple. If I am the leader of a specific country I would want to know what the impact of climate change will be to my country. This probably means that a climate (or weather) model needs to only provide accurate information regarding future temperature and annual rainfall in my country. Others have suggested criteria such as sea level, but we already have very good measures of sea level changes, so that does not seem necessary.

      How can any reasonable leader support their populace incurring costs based on someone’s philosophical argument? That just seems unreasonable to the point of insanity.

    • Martha, go back and read my uncertainty paper, which provides a perspective on this

    • Thanks for the link to Rocky Dunlap’s blog. This is pretty much the same thing that I have written in numerous previous posts, with the same references to Winsberg, Parker, etc.

    • Martha
      To give some idea of the “benefits” of an “ensemble” see
      Roy Spencer’s 140 runs (10 runs each of 14 models).
      The results show very wide variation, with the mean at 4 months lag being much lower than the data, and all the model runs were below the data. When ALL the individual runs are outside the data in this example, that does not give me great confidence in the future performance of global warming models. Thus the need to check out each sub-model against data with thorough IV&V.

    • Ensembles aren’t talked about much because they don’t provide much help. Averaging outputs, each of which may or may not have a great amount of uncertainty, does not necessarily improve the result.

      Model 1: Toss coin 1000 times, determine chance of getting heads.
      Model 2: Toss coin 3 times, determine chance of getting heads.

      Average model 1 and model 2. Do you like the result?

      If you don’t understand the error characteristics of the model, mindlessly lumping them together just invites more complexity and chaos.

      The model of averaging random noise to improve SNR doesn’t work here.
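
      A quick simulation of that coin-toss illustration (a sketch) makes the point numerically:

      ```python
      import numpy as np

      rng = np.random.default_rng(42)
      TRIALS, P_TRUE = 20_000, 0.5

      m1 = rng.binomial(1000, P_TRUE, TRIALS) / 1000  # Model 1: 1000 tosses per trial
      m2 = rng.binomial(3, P_TRUE, TRIALS) / 3        # Model 2: 3 tosses per trial
      avg = (m1 + m2) / 2                             # naive two-model "ensemble"

      def rmse(x):
          return np.sqrt(np.mean((x - P_TRUE) ** 2))

      print(f"model 1 RMSE: {rmse(m1):.3f}")   # ~0.016
      print(f"model 2 RMSE: {rmse(m2):.3f}")   # ~0.289
      print(f"average RMSE: {rmse(avg):.3f}")  # ~0.145: averaging made model 1 worse
      ```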

  42. A time-honored verification methodology is solving the same problem with an alternative algorithm. One of the more intriguing aspects of climate sensitivity theory is the lack of mention of the thermodynamic underpinnings of this flux-to-temperature ratio. In high-school physics one learned watts = volts * amperes. This general thermodynamic expression describes the rate of energy dissipation for a nonlinear, non-equilibrium, steady-state system. It holds no matter whether the internal flux be due to electrons, holes, anions, cations or any combination thereof. Sensitivity directly follows from differentiation of dissipation wrt potential. Now consider a simple thermal conductivity experiment in which an energy flux of 1W transits a temperature difference of 1K. What is the rate of energy dissipation? How does this rate change when we double the temperature difference if our sample’s thermal conductivity is a linear property? … a nonlinear property? There is an equally general thermodynamic expression that answers these questions which a physical chemist should be able to derive in a few strokes following Onsager’s phenomenological description of entropy production (1931).

    That the troposphere is a thermodynamic dissipator of solar energy is a fundamentally different concept from what climatologists have been teaching for nigh a generation. In the steady-state, entropy increases at a constant rate and free energy decreases at a constant rate. The latter is the rate of energy dissipation and equivalent to the work required to maintain the steady-state. The distinction between fluxes of free energy and energy may be difficult to comprehend, hence the thermal conductivity experiment mentioned above. As with electric dissipation, the expression for thermal dissipation holds whether internal transport is conductive, radiative, convective or a blend thereof and leads to sensitivity expressions independent of such assumptions. In the limit of large negative feedbacks, results are equivalent to textbook AGW theory while, with large positive feedback, thermodynamic considerations strongly attenuate sensitivity and excessive temperatures.

    The ‘all talk, no chalk’ format of this forum is not really suitable for a physical science but those thermodynamically inclined should be able to work through the appropriate phenomenological expressions.
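
    For those who do want some chalk: one way to set up the expressions the comment alludes to, following the standard Onsager formalism (a sketch, not anyone’s endorsed derivation):

    ```latex
    % Entropy production for a steady heat flux J_q across T_h -> T_c
    % (Onsager: production = flux x thermodynamic force):
    \dot{S} = J_q\left(\frac{1}{T_c} - \frac{1}{T_h}\right)
            \approx \frac{J_q\,\Delta T}{T^2},
    \qquad \Delta T = T_h - T_c \ll T .
    % Rate of free-energy dissipation (lost work), with ambient T_0 \approx T:
    \dot{W}_{\mathrm{lost}} = T_0\,\dot{S} \approx \frac{J_q\,\Delta T}{T}.
    % For the experiment above: J_q = 1\,\mathrm{W}, \Delta T = 1\,\mathrm{K},
    % T \approx 300\,\mathrm{K} gives \dot{S} \approx 1.1\times10^{-5}\,\mathrm{W/K}
    % and \dot{W}_{\mathrm{lost}} \approx 3.3\,\mathrm{mW}.
    % If conductivity is linear, J_q = k\,\Delta T, the dissipation scales as
    % \Delta T^2: doubling the temperature difference quadruples it. A nonlinear
    % k(T) changes that exponent, which is the distinction the comment raises.
    ```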

  43. Martha,
    Wow, thanks for providing that link with this article title:
    “Why coupling should help with climate model verification but may not in reality”
    And this conclusion:
    “But, here’s the rub. Coupling interfaces themselves are very complex. Often, it is hard to maintain a completely clean separation between modules. For this reason, the coupling interface is often handled by an entire software component (the coupler) which is itself highly complex (i.e., there is a lot going on inside—data transfer, parallel communications, interpolation, etc.). Furthermore, in any given model there are multiple coupling interfaces, and not all coupling interfaces are the same. They have different properties based on the two modules that are interacting. For example, the interface between the atmosphere and the ocean is one in which each component provides a boundary condition for the other. On the other hand, within the atmospheric component, the interface between the “physics” and “dynamics” components exhibits a different kind of complexity—the dynamics component is primarily concerned with horizontal motions while the physics component deals with parameterizations in the vertical. Furthermore, coupling protocols are often multi-phased even within a single timestep. This is due to complex data dependencies between the coupled components.

    So, the challenge is the following. Verification and comprehension of climate models is intimately tied to our ability to represent complex coupling interactions in a manner that can be analyzed for correctness and comprehended readily by model developers. At the same time, performance cannot be sacrificed or codes will be too slow to be useful. We want abstraction and efficiency. We want our cake and we want to eat it, too. Am I asking too much?”

    Did you even read what Rocky wrote?

    • No, she didn’t. She is trying to drag a student into this fight, a Georgia Tech student.

      • lol.
        I make plenty of mistakes, but I try to at least read stuff before I post to it.
        The student sounds like he is trying to be very reasonable in his approach.

      • Steven, students (especially doctoral candidates) are research professionals who are contributing to the development of climate modeling knowledge. The linked blog (not just the one post) is more relevant to the content of this post and a number of questions raised by it than ninety-five percent of the comments on this thread – including yours.

        I often read and appreciate student contributions and I have been reading Rocky’s blog for awhile.

        Since Judith Curry teaches at GT and often works with database group students, she probably already knows Rocky’s work. Regardless, even if the link is more interesting to her and to others than to you, it is a blog that is relevant to anyone interested in the topic.

        Rocky’s blog has the ear of some relevant professionals. You do not.

        Get a grip on your angry and incredibly juvenile male behaviour.

      • Wrong again.

    • Yes, I read (and understood) it. I don’t think you understood his answer to the question posed. All you seem to know how to do is scan text for something you think says what you want, when it doesn’t. Who taught you to do that? You need to unlearn that, because it doesn’t help you read for comprehension.

      cheers

      • Martha,
        You may have actually read it. You may even think you understand it. But you would be wrong.
        You must be an academic, to be able to parse out what you desperately need to cling to from things that do not support you at all.

      • Of all the corruptions, that of the word ‘cheers’ makes me the most cheerless.
        ==========

      • That’s fine, hunter. I don’t feel obligated to try to discuss his observations and reasoning, with you. Rocky has the ear of Steve Easterbrook. You do not. My, what a surprise.

      • …Rocky has the ear of Steve Easterbrook…

        Is that meant to be some kind of official seal of approval or recommendation?

      • We need Carrick in Easterbrook’s ear.
        ==========

      • Do you mean Rocky has done something terrible and has dismembered Easterbrook?
        I hope he took Easterbrook’s ear very carefully. What about the rest of Easterbrook?
        Or do you think that someone like Steve Easterbrook, waving his arms as to why a basic tool of climate science is a failure, is an important source of approval?

  44. “Verification and Validation for ESMs is hard because running the models is an expensive proposition (a fully coupled simulation run can take weeks to complete), and because there is rarely a “correct” result – expert judgment is needed to assess the model outputs.”

    Dr. Curry:
    If the above is correct, then what are we talking about? There must be something grossly wrong with the scientific models to the point that they are worthless.

    • That is why climatologists are considered by those outside Western civilization to be little more than witchdoctors.

  45. If you don’t know what’s possible before you write the code, then you can’t write down a formal specification. And if you can’t write down a formal specification for the expected software behaviour, then you can’t apply formal reasoning techniques to determine if the specification was met.

    Now I am a total layman here, but my mother wit tells me that code written without knowing how it is supposed to behave will not be able to “predict” anything! That is nothing more than guessing.

    As this is scientific research, it’s unknown, a priori, what will work, what’s computationally feasible, etc. Worse still, the complexity of the earth systems being studied means its often hard to know which processes in the model most need work, because the relationship between particular earth system processes and the overall behaviour of the climate system is exactly what the researchers are working to understand.

    Modelling something, without knowing how to model it, is useless, but then doing more modelling to figure out where the model went wrong and how to model the model better next time is just a model for disaster…
    First go out there and find (empirically) what the rules are!

    • Wijnand: Now I am a total layman here, but my mother wit tells me that code written without knowing how it is supposed to behave will not be able to “predict” anything!

      The software will be able to predict when you are satisfied with what it predicts, because you will stop changing it at that point. Of course it isn’t predicting the future, only what you believe the future will be.

  46. GCMs are ‘tuned’ through the use of parameters to mimic observations, after the fact, and have no demonstrated ability to forecast the future because GCMs fail at hind- or back-casting. Moreover, the statistical significance of GCMs—which are reductionist models—can never be known, as their method of construction makes the degrees of freedom unknowable.

    A problem in making decisions based on unverifiable models is that those who should be interested in the outcome of such decisions must take the validity of the underlying data on faith. That would be a big mistake.

    For example, McShane and Wyner demonstrated in their 2010 paper that the data upon which Mann’s ‘hockey stick’ is founded contain absolutely no global warming ‘signal’ whatsoever, and the foi2009.pdf disclosures of CRUgate paint a picture of fraud, collusion and corruption in the hiding of previous interglacial warming and the dismissal of previous declines.

  47. Now how can we get climate modelers and politicians to listen to such common sense?

  48. To dispel some myths before they get further out of hand.

    1) V&V is not “industry standard”; in most industries, V&V gets little more than lip service.
    2) V&V is not magic; without V&V, the opposite of what the software says is not automatically true.
    3) There is no one single source of truth on V&V.
    4) IV&V is not made more costly, difficult or unobtainable for software depending on how difficult the computations are.
    5) Just because you can’t meet the ultimate goal of ‘complete’ V&V isn’t a very good reason to do no V&V at all.
    6) You can slap V&V on after the software has been developed? Not usually. That’s called a “post mortem” or “forensic audit.”
    7) V&V drives up costs and delays development, but only in some narrow circumstances.

    And on a completely different note, much of what is said about the dynamics of policy decision making is very worrying, in that it implies what is called in the trade “fingers of glue, feet of clay”: kleptocratic decision makers who don’t actually want good information to tie their hands and force them to take bold and unpopular steps.

    I’d like to hear more of Dr. Curry’s experiences with this process.

    • Bart R,
      Without V&V, there is no way to show that the software being correct is more than a roll of the dice.
      As to decision making in the real world, show me the lifelong politicians who show up poor and leave poor.

  49. Somewhere up-thread Nick Stokes inquired about professional societies’ activities in IV&V. The string had reached max-indent, and potentially MAXENT, so I’m leaving this here. I hope all the URLs are correct.

    The Preface and Chapter 1 of Pat Roache, Fundamentals of Verification and Validation

    The ASME here, here, here, here.

    ASCE activities are noted by Roache.

    A related paper presented at the Second International Conference on CFD in the Minerals and Process Industries, CSIRO, Melbourne, Australia, 6-8 December 1999:

    The AIAA Computational Solid Mechanics

    NASA

    An application to a specific industry.

    • It looked fine in the BBEdit Preview. This one does, too.

      The AIAA Computational Solid Mechanics

    • Dan,
      I didn’t enquire about professional societies’ activities. I asked for evidence to back this statement:
      “Professional societies require demonstration of the Verification of CFD software that is the basis of a submitted paper.”
      I don’t see any.

      The section you cite from Num Eng is simply setting the requirements for order of numerical approximation. Nothing to do with IV&V.

      Yes, I’m sure there are committees of the ASME producing statements of the kind you linked to. And I’m sure there are enthusiasts like Mr Roache, and people who do find IV&V useful. But there are others who don’t. There’s no rule that works for everyone.

      The NASA doc did not prescribe any IV&V processes. It only required (redundantly) that if any are used, they should be documented.

      • If a requirement is set, do you find it useful to test whether it is met?

      • Here’s the ASME. You will find like statements in other cites above:

        The Journal of Fluids Engineering will not consider any paper reporting the numerical solution of a fluids engineering problem that fails to address the task of systematic truncation error testing and accuracy estimation. Authors should address the following criteria for assessing numerical uncertainty.

        This statement and the 10 criteria are effectively a working definition of Verification.
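
        As a concrete instance of “systematic truncation error testing and accuracy estimation,” here is the standard grid-convergence calculation from three systematically refined grids (a sketch; the solution values and refinement ratio are invented):

        ```python
        import math

        # Output quantity on three refined grids, coarse -> fine, refinement ratio r.
        f_coarse, f_medium, f_fine = 0.9712, 0.9821, 0.9850
        r = 2.0

        # Observed order of accuracy from the three solutions:
        p = math.log((f_medium - f_coarse) / (f_fine - f_medium)) / math.log(r)

        # Richardson extrapolation toward the zero-mesh value:
        f_exact = f_fine + (f_fine - f_medium) / (r ** p - 1)

        # Relative discretization-error estimate on the fine grid:
        err = abs((f_fine - f_medium) / f_fine) / (r ** p - 1)

        print(f"observed order p = {p:.2f}")
        print(f"extrapolated value = {f_exact:.4f}, fine-grid error ~ {100 * err:.2f}%")
        ```

        If the observed order p comes out near the formal order of the scheme, the code is converging as designed; that check is the core of solution verification in Roache’s sense.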

      • There’s nothing new about that. Truncation error testing and accuracy estimation have always been part of CFD practice. You can’t avoid doing it with an iterative solution, which proceeds until you have achieved a stated accuracy.

        None of this has anything to do with IV&V.

  50. One area where the climate models would likely fail V&V is over the use of assumptions. This would be classified as high risk.

    None of the IPCC climate models, for example, considers what might happen to future climate if the cloud feedback is negative. This is a gaping hole when it comes to deciding public policy, because no one has yet proved that the feedback must be positive.

    On a question such as feedback, until it is proven that feedback is positive (or negative) the models cannot make any claim of accuracy if they assume it is positive. Thus the IPCC now calls the model forecasts “projections” rather than “predictions”.

    The models are based on assumptions that have not yet been proven to be true. One of the largest of these is the question of whether cloud feedback is negative or positive. As we would see on courtroom TV: objection, your Honor, assumes facts that are not in evidence.

    • Fred
      One of the most important points that you (and the “climate” model developers) seem to miss is the customer.

      Why is the model being developed, and who is the customer? If the model is being developed to assist in formulating sensible governmental policies then the output of the models should be the ability to predict criteria that the government policy maker considers important. The current model developers ignored this basic point, but then continued to claim the models should be used.

      A number of the comments here are written around what should be included in the models, but we should start with “what do we NEED them to accurately predict,” then go to how to achieve that goal.

      The key criterion for a policy maker is the potential impact of climate change to THEIR environment (individual nation or state). It is not necessary to have a global model to accomplish that goal, but it is required to have accurate predictions of what additional atmospheric CO2 will do to regional rainfall and temperatures.

      Today, we do not have models that will perform that task for policy makers with a degree of accuracy that is tight enough to be useful.

      If a specific location has been experiencing an average of 30 inches of rainfall annually over the last 20 years, and a model predicts that due to a 40% increase in CO2 the rainfall after 30 years will be 27 inches +/- 40%, it is not really of high value, but it would be better than what we have today.

      • Rob – The sentiment you express here, illustrated in your words “it is not necessary” regarding global change predictions, is one you’ve stated on many previous occasions. It is that we need be concerned only with climate change that affects us where we live and work – our nation, our state, perhaps our city or street or house (forgive the exaggeration). The truth is that I agree with the entire sentiment with the exception of the word “only”. Without question, we do need better regional forecasts than climate models can now provide. But as part of a global community, we must also concern ourselves, I believe, with what will happen elsewhere. It’s partly a moral issue, but also a matter of enlightened long term national self interest, because ultimately disruptions of civilizations far from our own nation will harm us.

        Now this thread is much too small a venue to argue the intricacies of these conflicting views, and so I won’t pursue it; I just wished here to point out that the conflict exists, so that others can think about how they would resolve it in their own thinking.

      • Adaptation will always best be regional and local, as will be the politics. Mitigation of anthro CO2, if its net value is negative, may have to be transnational in some way. Please, the guilt will only worsen matters.
        ==================

      • Fred
        I tend to be very practical and view the situation as I believe a potential policy maker should and try to rapidly get to the bottom line.

        If I was a leader or citizen of a country that would get substantial benefits from a somewhat warmer world I would view the situation differently than I would if I had data that showed my country would be substantially harmed. If I was a citizen of a country that would substantially benefit from a warmer world, and some other countries did not benefit (or were harmed), I would expect my leaders to 1st look after the interests of my own country and not to the problems or benefits of other individual nations.

        I do not think I wrote that it is the ONLY consideration; I did write that it is the KEY criterion. The difference is important. If I wrote “only” somewhere that was an error in blogging. (Which I do frequently when typing without proofing what I am writing.)

        As of now, we really have no good information to determine the potential net impact to any nation, but we still get an awful lot of people supporting economically unsound actions in response to a problem they do not even know is real.

      • The BRIC wall.
        =========

  51. Wouldn’t IV&V be a “no regrets” precautionary policy? Where’s the precautionary principle when it actually makes sense to use it?

  52. Steve Milesworthy

    The methods described by Easterbrook are not the only methods used to improve a model, so one cannot draw conclusions about models from just the view in Easterbrook’s post and say that Hughes’s view is better. The two “camps” are looking at two different things here. One is aiming to get the best physical representation of the current climate; the other is using the model to assess the risks of increasing CO2.

    Regarding the Oberkampf paper, quantifying uncertainty is a big research area in climate modelling. It’s relatively new (a few years old) because it is only recently that the compute power existed to run sufficient ensemble members.

  53. Martin Schmidt

    One note about Agile Software Development:
    Perhaps the single most important aspect of going Agile is to do automated unit testing to hold things together. Yes, requirements might be changing, but you test against them, automated and repeatedly. That’s key.

    There is even the related idea of TDD, test driven development, where you are supposed to write the test (against the specification) first, then create the implementation that is checked against it.

    Automated unit tests can do wonders to code quality, and allow refactoring the existing code, because there is a safety net in form of the tests. But admittedly it takes experience to write good tests.
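
    A toy illustration of the test-first discipline (a sketch; the function and its spec are invented):

    ```python
    import unittest

    def running_mean(xs, window):
        """Running mean over a list; written only after the tests below existed."""
        if window < 1 or window > len(xs):
            raise ValueError("bad window")
        return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]

    class TestRunningMean(unittest.TestCase):
        # In TDD these tests are written first, fail, and then drive the code.
        def test_simple(self):
            self.assertEqual(running_mean([1, 2, 3, 4], 2), [1.5, 2.5, 3.5])

        def test_bad_window(self):
            with self.assertRaises(ValueError):
                running_mean([1, 2], 5)

    if __name__ == "__main__":
        unittest.main()
    ```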

    • Steve Milesworthy

      The model I’m familiar with has a daily and weekly set of tests done, with a limited number of changes per day allowed. Developers are required to run the tests prior to submitting their changes. So I think this is dealt with. In part, though, this repeated testing of runs is and was already done by the teams of scientist-developers long before the idea of agile software came about. They are, after all, one of the major users of the model.

      This is one of the reasons that has led Easterbrook to think of the model development process as being most similar to the “agile” methods.

      • Martin Schmidt

        Sounds good!
        With some luck the tests could be extended to a procedure like this then:

        Feed data for 1900-1998 into model, run model till 2010, compare simulated results against actual data for 1999-2010, raise an error if difference > 10%.

        Done you are, with the V&V.
        We might want to encourage Easterbrook to follow his Agile Software Development analogy; it could lead him straight down to V&V.
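
        In pseudo-test form (a sketch with hypothetical run_model and observations interfaces, and taking the 10% threshold at face value for the moment; see the reply below for why that threshold is debatable):

        ```python
        def test_hindcast(run_model, observations, tolerance=0.10):
            """Hindcast check: drive the model with data through 1998, then
            compare its 1999-2010 output against what was actually observed.
            run_model and observations are hypothetical interfaces."""
            simulated = run_model(spinup=(1900, 1998), forecast=(1999, 2010))
            observed = observations(years=(1999, 2010))
            for year, (sim, obs) in enumerate(zip(simulated, observed), start=1999):
                rel_diff = abs(sim - obs) / abs(obs)
                assert rel_diff <= tolerance, f"{year}: {100 * rel_diff:.0f}% off"
        ```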

      • Steve Milesworthy

        If you had two identical copies of Earth, do you think you would get results that agree to within “10%” (at which point I must ask: 10% of what?)? I think the two Earths would diverge in lots of different ways. So the test is poorly specified.

        The specific post of Easterbrook was looking at comparing models with climatology – essentially the test of models during their development is to compare against climatology. Models are not tested against how much warming they produce or how many catastrophes occur for 2xCO2. So the two points Judith is comparing are looking at different things.

      • Martin Schmidt

        Test spec draft: Say 10-30% difference (apply common sense and reasonable scales/granularities) for temperature, cloud albedo, precipitation, local averages, global averages, running means, variance, maybe even for multiple runs with slightly different start conditions, … you name it.
        To define good test criteria should be no problem.
        Beware: they might reveal how good the model in fact is.

        A testing framework could be used not only to check that the runs don’t crash and that the masses are preserved, but also, with some extensions, hopefully for whole model runs, testing their predictive power against the data already at hand.

        I agree, testing has many aspects, can/should be/is applied at many different levels and form factors. I see it in the broadest sense, leading ultimately to V&V.

      • Steve Milesworthy

        So Easterbrook’s post includes a diagram listing diagnostics to enable a model to be compared to the previous version in a visually intuitive way. The “Agile”-ness, I think, refers to this comparison and iterative improvement – like writing a list of tests and developing the code till, one by one, each of the tests has passed.

        (Again, though, this type of testing is less relevant to Judith’s concerns. Looking at the use of perturbed-physics ensembles and the like would seem to be more appropriate for that type of testing which is to see how sensitive a projection of “CAGW” is to the range of realistic model assumptions).

  54. I missed the part where one climate model is proven correct. Can someone point out the correct one?

  55. “Automated unit tests can do wonders to code quality, and allow refactoring the existing code, because there is a safety net in form of the tests.”

    Absolutely, because done correctly, unit regression testing will spot bugs where you haven’t even thought to create a test case. It is magic from a developer’s point of view – if you do it.

    To explain: If you run a program and get an output, save it. Then make a code change and rerun the program and compare this output to the previous output.

    For any line of output that has changed, if your test cases did not previously flag this as an error, you have just discovered a bug for which there is no test case.

    Either the corresponding row in the original output is in error, or the row in the current output is in error, or both. The fact that you didn’t catch this error shows that you need to add a test case to determine which is in error and which is not.
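
    A minimal golden-file version of this idea (a sketch): record a known-good output once, then diff every later run against it, line by line.

    ```python
    from pathlib import Path

    GOLDEN = Path("golden_output.txt")

    def diff_against_golden(current_output: str):
        """Return (line number, old, new) for every line that changed since the
        recorded known-good run; the first run just records the baseline."""
        if not GOLDEN.exists():
            GOLDEN.write_text(current_output)
            return []
        old_lines = GOLDEN.read_text().splitlines()
        new_lines = current_output.splitlines()
        return [(i + 1, old, new)
                for i, (old, new) in enumerate(zip(old_lines, new_lines))
                if old != new]

    for lineno, old, new in diff_against_golden("temperature 288.15\nmass 5.1e18\n"):
        print(f"line {lineno} changed: {old!r} -> {new!r}")  # each change needs a test case
    ```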

    When you use agile methodology (which is not trial and error as some would believe), and you apply regression testing religiously, then you will catch tons of bugs you didn’t even think to test for.

    The sad reality is that very few development efforts historically have used this approach, even though it was suggested in the scientific literature in the ’60s and ’70s. Instead, most software development is poorly tested, with testing applied only as an afterthought. The only reason it delivers reasonable results is because a human being double-checks that the results are reasonable.

    However, the reasonableness check cannot be used on any program that is intended to forecast the future, because no one can accurately say what future is reasonable.

    As such, the traditional software testing techniques are next to useless for V&V of any program intended to predict the future. The only way I know this could be done would be the “hidden future/hidden past” approach, where you purposely withhold information from the program under test, under double-blind conditions, to see if the program can predict what is being withheld with greater confidence than chance.

    If the program can predict the withheld information, and the experiment is truly double-blind (hard to do), then you have the basis for a validation.
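
    In outline, the withheld-data protocol might look like this (a sketch; the hard part, making sure the modelers never see the held-out values, is blinding logistics that no code can show):

    ```python
    import numpy as np

    def blind_skill_test(predict, series, split, trials=1000, seed=0):
        """Calibrate on series[:split], predict the withheld tail, and compare
        the error against chance baselines (random walks from the last seen
        value). Returns the fraction of chance baselines the model beats."""
        rng = np.random.default_rng(seed)
        train, hidden = series[:split], series[split:]
        model_err = np.mean((predict(train, len(hidden)) - hidden) ** 2)
        steps = rng.normal(0.0, np.std(np.diff(train)), (trials, len(hidden)))
        walks = train[-1] + np.cumsum(steps, axis=1)
        chance_err = np.mean((walks - hidden) ** 2, axis=1)
        return np.mean(model_err < chance_err)  # ~0.5 means no better than chance

    # Hypothetical usage: a persistence "model" against a noisy trend.
    series = np.linspace(0, 2, 120) + np.random.default_rng(1).normal(0, 0.3, 120)
    persistence = lambda train, horizon: np.full(horizon, train[-1])
    print(f"beats chance in {100 * blind_skill_test(persistence, series, 100):.0f}% of trials")
    ```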

    • I used to work on a large compiler, and we had an enormous regression suite of programs which had to both compile and run, or report deliberate faults in the test programs. As the years went on and computers got faster, this test suite grew and grew – roughly keeping step with Moore’s law!

      Without such a test suite, development would have been utterly haphazard.

      I have great doubts however about the value of formal specifications. A formal specification would bear little resemblance to the real specification, and would itself contain bugs!

      If there were a way to write perfect software, Windows would contain no loopholes to be exploited by hackers etc. and no bank would ever find its online banking had been compromised.

      From reading the HARRY_README file of programmers’ notes, I would say that the work at the CRU is typical of software written by new graduates in a subject other than computer science – they learn on the job and produce a fairly awful result. Clearly if there had been a regression suite at the CRU, ‘Harry’ would not have had such trouble reproducing past results already in the literature!

      Rather than impose V&V on climatology, I think it would be best to recruit a number of experienced programmers and statisticians to examine the code, with authority to call for the withdrawal of past research results that seem to be based on excessively shaky evidence.

  56. Engineers are irrelevant to climate science.

    • I guess Cosmos thinks scientists are irrelevant to climate engineering?

    • Fortunately, engineers can’t help but debunk politicized science. It’s a compulsion. It’s a state of mind, AND compelled by their public standards.
      ==========

    • So we who develop and use computer models on a daily basis are irrelevant?

      Is it irrelevant that the people who know the most about computer models publicly state that the process by which the current models were developed, and the conclusions reached with these models, are invalid based upon the fact that the models cannot be validated?

    • Well yeah, cause the science is settled

  57. This just in: they are still at it.

    Russell, George. 2011. “U.N. Seeks to Raise ‘Level of Ambition’ in World Climate Regulations.” FOXNews.com. September 30. http://www.foxnews.com/world/2011/09/30/un-seeks-to-raise-stakes-in-world-climate-regulations/?test=latestnews

  58. Just a technical comment, for interested readers who want to look further.

    Around the Y2K period I worked with a tool called CADNA.
    It is a tool that implements a form of interval arithmetic based on a stochastic model.
    The idea is to compute different possible roundings for every operation, but making a random choice each time rather than tracking a worst-case interval.
    So it is more realistic (looser) than strict interval arithmetic, which ignores error compensation.
    It can estimate the impact of roundoff, and help you detect the “stochastic zero”: a result that cannot be said to be positive or negative, and has no remaining “precision”.
    There are equivalent concepts for comparisons, such as “definitely strictly positive/negative”. You can use them to drop a meaningless result, or to refuse to take a decision when the data cannot decide.

    AFAIK it was (and maybe still is) used in the Météo-France weather forecast model to estimate the reliability of results (besides Monte Carlo methods).

    We used it in modeling to detect when a computation (typically a subtraction) removes all meaning from the result.

    ref: http://www-pequan.lip6.fr/cadna/
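
    The underlying idea (the CESTAC method behind CADNA) can be caricatured in a few lines. This is a toy, not CADNA’s actual interface: run the same computation several times under random last-bit perturbations and read off how many digits the samples still agree on.

    ```python
    import math
    import random

    def perturb(x, rel=2 ** -52):
        """Randomly nudge the last bit: a crude stand-in for random rounding."""
        return x * (1 + rel * random.choice((-1, 0, 1)))

    def significant_digits(f, n=20):
        """CESTAC-style estimate: decimal digits common to n perturbed runs of f."""
        samples = [f(perturb) for _ in range(n)]
        mean = sum(samples) / n
        std = (sum((s - mean) ** 2 for s in samples) / (n - 1)) ** 0.5
        if std == 0.0:
            return 15, mean                 # all runs agree to full double precision
        if abs(mean) <= std:
            return 0, mean                  # a "stochastic zero": no digits survive
        return min(15, int(math.log10(abs(mean) / std))), mean

    # Catastrophic cancellation: subtracting two nearly equal numbers.
    cancel = lambda p: p(1.0 + 1e-13) - p(1.0)
    digits, value = significant_digits(cancel)
    print(f"result ~ {value:.3e} with about {digits} significant digits")
    ```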

    By the way, using formal checking for models like these is meaningless. It is realistic only for very small, hard real-time, life-or-death systems, or for cryptosystems.

    Anyway, my experience in modeling, applied to what I understand of climate, is that the problem lies in the “fudge factors” and “confirmation bias”, in a context of under-known data and over-tweakable models.
    Phenomenological models should be tried before you attempt supposedly full, up-front physics (which is NEVER full, and never really up-front; “full physics” is always a phenomenological model, just one that looks like the basic laws, and it must always be checked against reality).

    As always, the problem with software is not even the software engineering, but the management who write the specifications.

    Also note that the problem with lean programming, agile methods, and so on, is that they require very (unrealistically) good developer competence in software architecture and software patterns, and a very strict test method (test-oriented programming)…

    I suppose that what they call an agile method is, in fact, an iterative cowboy-programming method done by non-professional software engineers.
    Classic in research, AFAIK. Hopefully, domain experts and peers can detect when the model goes crazy (except when it goes crazy in the expected direction).

  59. I’m not sure if the following info made it into the threads here, so here it is.

    Josh Stults posted an announcement for a V&V workshop at Notre Dame.

    This list of Abstracts now has URL links to some presentation slides.

    This presentation is related to models and methods for one aspect of the Earth’s climate systems: Validation and verification in global atmospheric chemistry models.

    The presentations address various aspects of V&V, UQ, and SQA as applied to wicked problems in science and engineering. Applications to real-time, safety-critical software systems are not addressed.

  60. Alain;
    just a little English hint: it’s “physics”, not “physic”. The word “physic” is an archaic term for a purgative dose to cleanse the bowels. A strong laxative.

