How should we interpret an ensemble of models? Part I: Weather models

by Judith Curry

Over the last two weeks, there have been some interesting exchanges in the blogosphere on the topic of interpreting an ensemble of models.

rgbatduke kicked off the exchange with a comment over at WUWT, which was elevated to a main post entitled “The ‘ensemble’ of models is completely meaningless, statistically”; this elicited an additional comment from rgbatduke. Matt Briggs responded with a post entitled “An Ensemble of Models is Completely Meaningful, Statistically.”

Who is correct, rgbatduke or Matt Briggs? Well, each made valid points, and each made statements that don’t seem quite right to me. Rather than digging into the statements by rgbatduke and Matt Briggs, I decided to do a series of two posts on ensemble interpretation. Part I is on weather models, including seasonal forecast models.

ECMWF ensemble forecast system 

An overview of ensemble weather forecast models is given by Wikipedia. See also this excellent presentation by Malaquias Pena.

The European Centre for Medium-Range Weather Forecasts (ECMWF) arguably produces the world’s best weather forecasting system. The ECMWF ensemble weather forecast system includes the following products:

  • High resolution atmospheric model: 1-10 days at 0.125° x 0.125° horizontal resolution, available at 3-hour intervals to 144 hours and at 6-hour intervals beyond 144 hours. Output variables include 10 m and 100 m wind velocities and maximum 10 m wind gusts. Base times for forecasts: 00 and 12 UTC daily.
  • Atmospheric Ensemble Prediction System: 51 ensemble members, 1-15 days, at 0.25° x 0.25° resolution to 10 days and 0.5° x 0.5° resolution beyond 10 days. Available at 6-hour intervals.
  • Monthly forecasting system: 51 ensemble members, 1-32 days at 0.5° x 0.5° resolution. Output variables include wind velocities at 10 m, 1000 hPa and 925 hPa, available at 6-hour intervals. Base time for forecasts: 00 UTC, twice per week. Calibration forecasts (reforecasts) are provided once per week.
  • Seasonal forecasting system: 41 ensemble members, 1-7 months at 1.5° x 1.5° resolution. Output variables include 10 m wind velocities, available at 6-hour intervals, once per month. Historical (hindcast) simulations are provided back to 1980.

The ECMWF ensemble members are generated using a singular vector approach that perturbs both model parameters and initial conditions.

A few weeks ago, I attended the annual users meeting at ECMWF [link]. Background on the ECMWF weather forecast system, including some verification statistics, is provided in the following presentations:

For my research and for the operational forecasts provided by my company, Climate Forecast Applications Network (CFAN), we use ECMWF products. I gave the keynote address at the recent ECMWF workshop; my presentation can be found at: Applications of ECMWF forecast products for the energy sector.

Ensemble interpretation

With regard to ensemble interpretation specifically, my presentation focuses on the following techniques:

1.  Statistical postprocessing using reforecasts (retrospective forecasts of past dates made with the current model version) and recent model performance, relative to observations, using Bayesian bias correction, quantile-to-quantile distribution calibration, and model output statistics.

2.  Provision of probabilistic forecasts of surface weather, and applications of extreme value theory to probabilistic forecasts of extreme weather events.

3.  Expansion of ensemble size through use of lagged forecasts and Monte Carlo resampling techniques.

4.  Ensemble clustering techniques.
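Item 1 mentions quantile-to-quantile distribution calibration. The sketch below shows plain empirical quantile mapping in Python, assuming you already have a reforecast climatology and the matching observations for the same location and season; the function names are hypothetical, and this is not the operational CFAN procedure. Each raw ensemble member is located in the model's own climatological distribution and replaced by the observed value at the same quantile, which removes mean and spread biases that are stable over the reforecast period.

```python
from bisect import bisect_right


def empirical_cdf(sorted_values, x):
    """Plotting-position CDF: ((count of values <= x) - 0.5) / n, clipped to [0, 1]."""
    n = len(sorted_values)
    p = (bisect_right(sorted_values, x) - 0.5) / n
    return min(max(p, 0.0), 1.0)


def empirical_quantile(sorted_values, p):
    """Quantile of a sorted sample, linearly interpolated, for p in [0, 1]."""
    n = len(sorted_values)
    pos = p * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def quantile_map(members, reforecast_climate, observed_climate):
    """Map each raw member to the observed value at the same climatological quantile."""
    model_sorted = sorted(reforecast_climate)
    obs_sorted = sorted(observed_climate)
    return [empirical_quantile(obs_sorted, empirical_cdf(model_sorted, x)) for x in members]


if __name__ == "__main__":
    # Toy case: the model climatology runs 2 degrees warm relative to observations.
    reforecasts = [t + 2.0 for t in range(21)]    # 2.0 ... 22.0
    observations = [float(t) for t in range(21)]  # 0.0 ... 20.0
    print(quantile_map([12.0], reforecasts, observations))  # [10.0]
```

Because the mapping is monotonic, it preserves the ordering of members; in practice the climatologies would come from a window of reforecast dates around the forecast date.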

The techniques used by my team rank among the most sophisticated currently being used in an operational environment, although there are some more sophisticated techniques being used in research mode, e.g. ensemble dressing.

Averaging the ensemble members to produce a mean is often done, effectively providing a deterministic forecast, but this does not take advantage of a primary rationale for the ensemble approach: characterizing uncertainty.

If you make a deterministic forecast, then verification is simply done against observations using mean absolute error, correlation statistics, etc.
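To make the contrast concrete, here is a minimal Python sketch, using made-up numbers and hypothetical helper names, that reduces two ensembles to their means and verifies them deterministically with mean absolute error. Both toy ensembles have the same mean, so the deterministic verification cannot distinguish a confident forecast from a highly uncertain one; that spread information is what averaging the members discards.

```python
import math
from statistics import mean


def ensemble_means(cases):
    """Collapse each ensemble (a list of member values) to its mean."""
    return [mean(members) for members in cases]


def mean_absolute_error(forecasts, observations):
    """Average absolute difference between forecasts and observations."""
    return mean(abs(f - o) for f, o in zip(forecasts, observations))


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Two toy ensembles with identical means but very different spreads.
    cases = [[18.0, 20.0, 22.0], [10.0, 20.0, 30.0]]
    observed = [21.0, 29.0]
    means = ensemble_means(cases)
    print(means, mean_absolute_error(means, observed))
```

Correlation is only meaningful over a longer series of cases whose forecast means actually vary; with the two identical means above it would be undefined.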

For an ensemble forecast, the following are some commonly used verification statistics (from the Malaquias Pena presentation linked above):

Comparison of a distribution of forecasts to a distribution of observations:

– Reliability: How well does the a priori predicted probability of an event coincide with the a posteriori observed frequency of the event?

– Resolution: How much do the forecasts differ from the climatological mean probabilities of the event, while the system still gets it right?

– Sharpness: How much do the forecasts differ from the climatological mean probabilities of the event?

– Skill: How much better are the forecasts compared to a reference prediction system (chance, climatology, persistence,…)?
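For a binary event (say, 10 m wind above a threshold), reliability, resolution and skill can be computed from the Brier score and its standard Murphy decomposition, BS = reliability − resolution + uncertainty. The Python sketch below assumes forecast probabilities are the fraction of ensemble members exceeding the threshold, groups cases by distinct forecast probability (so the decomposition is exact), and uses the sample base rate as the climatological reference for the skill score; the helper names are hypothetical.

```python
from collections import defaultdict


def event_probability(members, threshold):
    """Fraction of ensemble members exceeding the threshold."""
    return sum(1 for x in members if x > threshold) / len(members)


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def murphy_decomposition(probs, outcomes):
    """Return (reliability, resolution, uncertainty); BS = REL - RES + UNC."""
    n = len(probs)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)
    reliability = sum(len(obs) * (p - sum(obs) / len(obs)) ** 2
                      for p, obs in groups.items()) / n
    resolution = sum(len(obs) * (sum(obs) / len(obs) - base_rate) ** 2
                     for obs in groups.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty


def brier_skill_score(probs, outcomes):
    """BSS against a constant forecast of the sample base rate (climatology)."""
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1 - brier_score(probs, outcomes) / reference


if __name__ == "__main__":
    probs = [0.1, 0.1, 0.5, 0.5, 0.9, 0.9]
    outcomes = [0, 0, 1, 0, 1, 1]
    rel, res, unc = murphy_decomposition(probs, outcomes)
    print(brier_score(probs, outcomes), rel - res + unc, brier_skill_score(probs, outcomes))
```

Low reliability and high resolution are both good; sharpness can be inspected separately as the spread of the forecast probabilities away from the base rate.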

Performance measures of probabilistic forecasts:

  • Brier Skill Score (BSS)
  • Reliability Diagrams
  • Relative Operating Characteristics (ROC)
  • Ranked Probability Score (RPS)
  • Continuous Ranked Probability Score (CRPS)
  • CRPS Skill Score (CRPSS)
  • Rank histogram (Talagrand diagram)
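Two of these measures are easy to compute directly from raw ensemble members. The Python sketch below uses the kernel form of the CRPS for an ensemble treated as an empirical distribution, CRPS = mean|x_i − y| − (1/2)·mean|x_i − x_j|, and builds a rank (Talagrand) histogram by counting how many members fall below each observation. It is a minimal illustration with hypothetical function names; ties between the observation and members are not randomized, and the "fair" finite-ensemble CRPS variant is not shown.

```python
def crps_ensemble(members, obs):
    """CRPS of the empirical distribution of `members` against a scalar observation."""
    m = len(members)
    spread_to_obs = sum(abs(x - obs) for x in members) / m
    spread_within = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return spread_to_obs - spread_within


def crps_skill_score(crps_forecast, crps_reference):
    """CRPSS: 1 is perfect, 0 matches the reference, negative is worse."""
    return 1 - crps_forecast / crps_reference


def rank_histogram(cases):
    """Counts of observation ranks among the members; cases are (members, obs) pairs."""
    m = len(cases[0][0])
    counts = [0] * (m + 1)
    for members, obs in cases:
        rank = sum(1 for x in members if x < obs)  # rank runs from 0 to m
        counts[rank] += 1
    return counts


if __name__ == "__main__":
    cases = [([1.0, 2.0, 3.0], 0.0), ([1.0, 2.0, 3.0], 2.5), ([1.0, 2.0, 3.0], 5.0)]
    print(rank_histogram(cases))            # [1, 0, 1, 1]
    print(crps_ensemble([0.0, 2.0], 1.0))   # 0.5
```

Over many cases, a flat rank histogram is consistent with a well-calibrated ensemble, while a U shape indicates too little spread. With a single member the CRPS reduces to the absolute error.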

Multi-model ensembles

An ensemble size of 51 members works pretty well for many weather situations, although we noticed last winter that the ensemble size was definitely too small owing to highly variable and unpredictable conditions. For longer time scales (e.g. seasonal forecasts), an ensemble size of 40 is generally regarded as too small.
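One rough way to see why ensemble size matters, especially for rare events, is to treat an ensemble-estimated probability as a binomial proportion. This assumes members behave like independent draws from the forecast distribution, which is an idealization; lagged members in particular are correlated, so enlarging the ensemble with lagged forecasts buys less than the formula suggests. The Python sketch below prints the standard error sqrt(p(1 − p)/n) for a 10% event at several ensemble sizes; the 102-member case is a hypothetical expanded ensemble, not an ECMWF product.

```python
import math


def probability_standard_error(p, n_members):
    """Binomial standard error of an event probability estimated from n members."""
    return math.sqrt(p * (1 - p) / n_members)


if __name__ == "__main__":
    p = 0.10  # a 10% event
    for n in (41, 51, 102):
        # With n members, probabilities can only be resolved in steps of 1/n.
        print(f"n={n:3d}  smallest nonzero probability={1 / n:.3f}  "
              f"standard error={probability_standard_error(p, n):.3f}")
```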

EUROSIP is a multimodel ensemble for seasonal forecasts including ECMWF, the UK Met Office, and Météo-France; recently the U.S. model was added. From the linked presentation by David Stockdale:

What would an ‘ideal’ multi-model system look like? Assume a fairly large number of models (10 or more).

  1. Assume models have roughly equal levels of forecast error 
  2. Assume that model forecast errors are uncorrelated 
  3. Assume that each model has its own mean bias removed 
  4. A priori, for each forecast, we consider each of the models’ forecasts equally likely [in a Bayesian sense – in reality, all the model pdfs will be wrong] 
  5. A posteriori, this is no longer the case: forecasts near the centre of the multi-model distribution have higher likelihood 
  6. This is different from a single-model ensemble with perturbed initial conditions (ICs).
  7. The multi-model ensemble distribution is NOT a pdf.

Non-ideal case 

Model forecast errors are not independent. Dependence reduces the degrees of freedom, and hence the effective n, which increases uncertainty.

In some cases, the reduction in n could be drastic.

Initial condition error can be important. The foregoing analysis applies to the ‘model error’ contribution to error variance.

Initial condition error and irreducible error growth terms follow usual ensemble behaviour, and must be accounted for separately.

What weight should be given to outliers?
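The effect of dependence on the effective n can be quantified in an idealized case. If n model errors share a common variance σ² and every pair has correlation ρ, the variance of their mean is σ²(1 + (n − 1)ρ)/n, which equals the variance of the mean of n_eff = n/(1 + (n − 1)ρ) independent errors. As n grows, n_eff approaches 1/ρ, so with ρ = 0.5 no number of models is worth more than about two independent ones. The Python sketch below evaluates the formula and checks it by Monte Carlo; the equal-correlation structure and the function names are assumptions chosen for illustration, not properties of EUROSIP.

```python
import math
import random
from statistics import pvariance


def effective_n(n, rho):
    """Effective number of independent models for n equicorrelated errors."""
    return n / (1 + (n - 1) * rho)


def simulated_variance_of_mean(n, rho, trials=20000, seed=1):
    """Monte Carlo variance of the mean of n unit-variance errors with pairwise correlation rho."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)  # error component common to every model
        errors = [math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
                  for _ in range(n)]
        means.append(sum(errors) / n)
    return pvariance(means)


if __name__ == "__main__":
    n = 10
    for rho in (0.0, 0.3, 0.5):
        # For unit-variance errors, 1 / Var(mean) should be close to n_eff.
        print(rho, round(effective_n(n, rho), 2),
              round(1 / simulated_variance_of_mean(n, rho), 2))
```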

Method for p.d.f. estimation (1)

Assume underlying normality.

Calculate a robust skill-weighted ensemble mean. Do not try a multivariate fit (very small number of data points).

Weights are estimated as ~1/(error variance). This would be optimal for independent errors, i.e., it is conservative.

Then use 50% uniform weighting and 50% skill-dependent weighting.

Comments: Rank weighting was also tried, but didn’t help.

A QC term was tried, using likelihood to downplay the impact of outliers, but again it didn’t help. Outliers are usually wrong, but not always.

Models usually agree reasonably well, and tweaks to weights have very little impact anyway.
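The weighting recipe above can be sketched in Python under stated assumptions: per-model error variances are taken as given (e.g. from cross-validated hindcasts), the 50/50 blend is applied to normalized weights, and the "robust" part of the mean is omitted. The function names are hypothetical, and this illustrates the idea rather than reproducing the EUROSIP code.

```python
def blended_weights(error_variances, uniform_fraction=0.5):
    """Mix uniform weights with inverse-error-variance weights; the result sums to 1."""
    n = len(error_variances)
    inverse = [1.0 / v for v in error_variances]
    total = sum(inverse)
    return [uniform_fraction / n + (1 - uniform_fraction) * w / total for w in inverse]


def weighted_mean(values, weights):
    """Weighted average of model forecasts."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical hindcast error variances: the third model's is twice the others'.
    weights = blended_weights([1.0, 1.0, 2.0])
    print([round(w, 4) for w in weights])                     # [0.3667, 0.3667, 0.2667]
    print(round(weighted_mean([1.0, 1.0, 4.0], weights), 4))  # 1.8
```

The uniform half keeps a poorly verified model from being discounted too aggressively, consistent with the observation that tweaks to the weights have little impact.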

Method for p.d.f. estimation (2) 

Re-centre lower-weighted models to give the correct multi-model ensemble mean. This is done so as to minimize disturbance to the multi-model spread.

Compare past ensemble and error variances:

- Use the above method (cross-validated) to generate past ensembles

- Obtain unbiased estimates of multi-model ensemble variance and observed error variance

- Scale the forecast ensemble variance

- 50% of the variance is from the scaled climatological value, 50% from the scaled forecast value

Comments: For the multi-model ensemble, use of the predicted spread gives better results. For a single model, this seems not to be so.
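A minimal Python sketch of this variance calibration follows, under assumptions not stated in the slides: the scale factor is the ratio of the observed ensemble-mean error variance to the error variance a reliable m-member ensemble would imply, (1 + 1/m) times the mean unbiased ensemble variance; the "scaled climatological value" is taken to be that factor times the mean past ensemble variance. The function names are hypothetical.

```python
from statistics import mean, variance


def spread_scale_factor(past_ensembles, past_observations):
    """Return (scale factor, mean past ensemble variance).

    Assumes a reliable m-member ensemble satisfies
    E[(ensemble mean - obs)^2] = (1 + 1/m) * E[s^2], with s^2 the unbiased member variance.
    """
    m = len(past_ensembles[0])
    mse = mean((mean(ens) - obs) ** 2 for ens, obs in zip(past_ensembles, past_observations))
    mean_spread = mean(variance(ens) for ens in past_ensembles)
    return mse / ((1 + 1 / m) * mean_spread), mean_spread


def calibrated_variance(current_ensemble, past_ensembles, past_observations, clim_fraction=0.5):
    """Blend scaled climatological spread with scaled current spread (50/50 by default)."""
    scale, mean_spread = spread_scale_factor(past_ensembles, past_observations)
    scaled_climatological = scale * mean_spread
    scaled_forecast = scale * variance(current_ensemble)
    return clim_fraction * scaled_climatological + (1 - clim_fraction) * scaled_forecast


if __name__ == "__main__":
    past = [[-1.0, 1.0], [-1.0, 1.0]]  # toy cross-validated past ensembles
    obs = [3.0, -3.0]                  # observed errors larger than the spread implies
    print(calibrated_variance([-2.0, 2.0], past, obs))  # 15.0
```

Blending in the climatological term damps case-to-case swings in spread, which matters when the forecast spread carries little information about the actual error.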

An additional example for weather models is TIGGE (THORPEX Interactive Grand Global Ensemble). A good overview of TIGGE is given in this presentation by Hamill and Hagedorn:

• One goal of TIGGE is to investigate whether multi-model predictions are an improvement over single model forecasts
• The goal of using reforecasts to calibrate single model forecasts is to provide improved predictions
• Questions:
  - What are the relative benefits (costs) of both approaches?
  - What is the mechanism behind the improvements?
  - Which is the “better” approach?


To cut to the chase, the best model (ECMWF) performs as well as the multi-model ensemble mean, and ECMWF calibrated by the reforecasts outperforms the multi-model ensemble.
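Whether a multi-model mean beats its best member depends on how correlated the members' errors are. In an idealized case with unbiased errors of standard deviation σ_i and a common pairwise correlation ρ, the error variance of the equally weighted mean is (Σσ_i² + ρ Σ_{i≠j} σ_iσ_j)/N². The Python sketch below evaluates this for one hypothetical "best" model and three models with twice its error variance; it illustrates the arithmetic only and is not an explanation of the TIGGE results.

```python
import math


def mean_error_variance(sigmas, rho):
    """Error variance of the equally weighted mean of N unbiased forecasts
    whose errors have standard deviations `sigmas` and pairwise correlation rho."""
    n = len(sigmas)
    own = sum(s * s for s in sigmas)
    cross = sum(sigmas) ** 2 - own  # sum over i != j of sigma_i * sigma_j
    return (own + rho * cross) / n ** 2


if __name__ == "__main__":
    sigmas = [1.0] + [math.sqrt(2.0)] * 3  # best model first
    best = sigmas[0] ** 2
    for rho in (0.0, 0.3, 0.6):
        print(rho, round(mean_error_variance(sigmas, rho), 3), "vs best", best)
```

With these numbers, the mean beats the best model when ρ is below roughly 0.44 and loses above it, so a mean that merely matches the best model is consistent with substantially correlated model errors.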

Here is how my team has approached the issue. We use multiple models in our hurricane track forecasts and in our seasonal forecasts. However, we do not combine the simulations from the model ensembles into a grand ensemble; rather, we consider each ensemble separately, and the forecaster weights each ensemble based on recent model performance or uses the additional models to characterize forecast uncertainty.

JC conclusion: The weather modelling and forecast communities have developed sophisticated techniques for the interpretation of ensemble simulations. The extent to which we can usefully apply these techniques to climate models will be discussed in Part II, along with alternative strategies for ensemble interpretation.

Moderation note:  This is a technical thread, please keep your comments relevant.

 

130 responses to “How should we interpret an ensemble of models? Part I: Weather models”

  1. Take a 40 m thermally isolated tank, with a 4 degree heat-sink at the base, and slowly fill with sea water at the same temperature and salinity as the ocean you are modeling.
    Then shine light onto the surface, at different frequencies, and monitor as the temperature comes to steady state. Find out if blue light and red light give rise to different temperature gradients. Monitor also the rate at which energy leaves the system in the form of water vapour.

    Nice simple model

    • David Springer

      Rob Brown had a cow over ensemble models because you don’t do crap like that with physical systems that are understood. Imagine chemists making decisions about refinery operations based on model ensembles or space shuttle operations or any number of other applications. You just don’t do it.

      The reason for having model ensembles is that you don’t have a single reliable model. You have a bunch of unreliable models and you hope that they all fail at different times under different conditions. So instead of one model that’s hideously wrong once in a while you trade it for an ensemble which is a little wrong all the time.

      This tradeoff is wonderfully demonstrated now in the CMIP5 model ensemble which generated the sensitivity PDF that made its way into the mainstream media recently. The models in the ensemble were trained on 1970-1990 data then run forward forecasting GAT (global average temperature). Actual GAT plotted against the model forecast had GAT slowly but surely drifting away from the ensemble mean to the low side until, after 22 years (1990-2012), GAT drifted below the 95% confidence bound of the ensemble. Classic.

      This, however, is useful. It demonstrates that the models are flawed and the flaw is making them run hot. There are at least a few good candidates for the flaw, or some combination of the candidates, ranging from instrumentation error, cloud feedbacks, ocean overturning, aerosols, galactic cosmic rays and solar magnetic field strength, to CFCs. Undoubtedly I missed some, and the possibility exists for unknown unknowns too.

      • michael hart

        Yup.
        When my chemical-synthesis experiments failed, I had to think of a different synthetic route to actually make the product.

        Reaching for the statistics, in order to prove that the failed experiment was actually a “success”, was not an option.

        ——-
        If I have a disease and go to a doctor, I don’t want to see “projections”, I want real treatment. I don’t want to be shown a computer model that helps the doctor simulate sick patients and healthy patients, I want to be cured.

        There is no partial-credit for wrong answers in medicine. Some climate-changeologists think there are no wrong answers in climate.

  2. There are two means to control the Energy of Earth. Radiate the Energy or Reflect it before it becomes Heat. Moving Energy around is very interesting and important but only what gets out counts.

    The Radiation always works and does respond to changes in various things. I think a lot of the responses are in the right direction. I agree with John Kerr about this.

    It has no set point. It is a control but it is not an on and off thermostat with tight bounds.

    Ice and Water is an on and off Thermostat with a Set point that is determined by the temperature that sea ice melts and freezes.

    When the sea ice is melted you have a huge amount of warm wet water to produce moisture for snow. This snow falls for years and causes ice advances later.

    The ice advances increase Albedo and cools the earth. The water gets cold and freezes and stops the snowfall. The ice continues to advance and cool the earth but ice volume is now decreasing. At some point, the ice quits advancing because the heads of the glaciers and ice packs can’t push ice fast enough to replace what melts at the tails of the glaciers and at the edges of the ice packs. Ice retreats, albedo decreases, and Earth warms again.

    What part of this is different from your understanding of basic physics?

    Does anyone else have an opinion about this? Tell me what you think. Use general distribution or private email. We are trying to solve a serious problem. The serious problem is not Climate; the serious problem is how much damage we are going to do to ourselves to solve this thing that has no real data to prove it is a problem.

    • Herman,
      I think of open Arctic water as the open thermostat in an auto cooling system. Warm water exposed to space radiates far more heat than ice-covered water does (plus the heat removed by evaporation). I think the albedo change from ice becoming water has much less of an effect than most assume. First, the sq meter exposed to space is far smaller than a sq meter as exposed to incoming solar, and at higher angles more of that light is reflected. In fact, I think it radiates more heat than it absorbs.

      http://www.iwu.edu/~gpouch/Climate/RawData/WaterAlbedo001.pdf

    • Curious George

      Herman – what does it mean “moving energy around”? Do you really believe that depositing coal does not count?

  3. Guess the ensemble of Newtonian and Einsteinian mechanics would explain more. Poor Einstein was just not climatically intelligent.

  4. Why are the first two comments completely off-topic? Jeez.

    That is a pretty chewy, jargony post. I assume under Ensemble Interpretation point one that “reforests” is a typo for “reforecasts.”

    Intuitively it seems that reforecast ensembles using the same model are easier to understand in some quantitative, parametric way because the variation is generated from the “same” basic data generating process. Multi-model ensembles seem very hard to handle in this way because the set of models doesn’t randomly sample some well-defined “space.”

    Prof. Curry says: “We use multiple models in our hurricane track forecasts and in our seasonal forecasts. However, we do not combine the simulations from the model ensembles into a grand ensemble; rather we consider each ensemble separately and the forecaster weights the ensemble based on recent model performance or uses the additional models in characterization of forecast uncertainty.”

    This seems sensible, but will not satisfy those who want to purge the process of human judgment in favor of something more algorithmic. The question is whether the human judgments applied are IMPLICITLY forming a grand ensemble, only in an unsystematic and uncontrolled way. (This question is related to the argument made by Bayesians that everyone is really a Bayesian whether they know it or not–for any conclusion a non-Bayesian draws from data, one can go back and derive a prior for which the data would have led to that conclusion.)

    • “Why are the first two comments completely off-topic? Jeez”
      Well, Planks constant gives you an estimate for the smallest size an object can be, so we know the number of angels that can dance on the head of a pin is between one and 30 vigintillion angels.

      http://en.wikipedia.org/wiki/Saturday_Morning_Breakfast_Cereal

      However, experimental science gives us different models and different types of answers to computer generated very pretty rubbish.
      This type of analysis is so far removed from what was generally considered science; generation and testing of an hypothesis, that the questions and answers are more like sympathetic magic than science.

      Models that fail to match reality are failed models.

      • Is the unit for Planks constant board-feet?

      • Sorry Frank wrong link; here is the calculation

      • Steven Mosher

        Matching reality is an ill-defined metric, unless you can define in advance what you mean by matching and what you mean by reality.

        Lets suppose you are building a CCRP display for a fighter bomber. The CCIP display will continuously compute the impact point (CCIP) for the munition and display it to the pilot. To build this display you run a bomb model: basically a flight model for the bomb as it falls from the plane to the earth. I can well assure you that the model used to compute the flight of the bomb to earth does not “match” reality. As a physical object the bomb has 6 degrees of freedom in its flight: x, y, z, psi, theta, phi. We disregard 3 of these (roll, pitch and yaw) and model it as a point mass.
        We disregard some of the flight dynamics of the early separation. While the drag may change throughout the flight based on orientation, we disregard that and use one Cd. The Coriolis effect has to be modelled. Then there is the wind. While we know the wind at our altitude, we don’t know the wind at the surface, so the reality of how the wind changes is ignored.
        The model of the bomb doesn’t match reality at all. In fact it’s utterly non-physical and non-realistic. We don’t judge the success of the model by its ability to match reality; we judge its success by hitting the target. In other words, is it accurate enough to get the job done? To be sure, following the laws of physics makes this easier, but there are also times when you have to ignore the laws of physics to get the job done. You don’t have time to compute an exact result and you might lack data (like, are there wind gusts at the surface? updrafts? etc.).

        Before folks apply a standard of “matching” “reality”, they need to understand that models will never and can never match reality unless you define ‘match’ to include a precision term and unless you specify the exact subset of ‘reality’ that you want to ‘match’.

      • Steven Mosher, “Lets suppose you are building a CCRP display for a fighter bomber. trying to estimate climate sensitivity. Knowing that climate is not weather, you take weather models and replace accurate calculations with gross approximations and assumptions then let the models that would diverge in days do the diverging over decades without including realistic boundary conditions. Then you average the outputs of all the cluster f&%$s. After 20 years of repeated cluster F&%$ averaging, your confidence intervals in the ensemble mean becomes more uncertain.

        When is it time to punt?

      • “We don’t judge the success of the model by its ability to match reality; we judge its success by hitting the target.”

        Actually, hitting the target is as real as it gets, especially if you’re the target.

        In the case of climate models, the “target” is the mythical “global average temperature” that’s on all the famous scary charts.

        When climate models, even one of them, start hitting their target, get back to us.

      • “models will never and can never match reality…”

        This is potentially a very big issue.

        Andrew

      • Steven Mosher

        Gary

        “Actually, hitting the target is as real as it gets, especially if you’re the target.

        In the case of climate models, the “target” is the mythical “global average temperature” that’s on all the famous scary charts.”

        You actually don’t have to hit the target. You just have to be close enough.

        That’s the point.

        If a climate model says ECS is 2 or 3 or 4, really, what’s the difference?

      • Steven Mosher, “If a climate model says ECS is 2 or 3 or 4, really, what’s the difference?”

        A few standard deviations and a few trillion dollars.

      • Steven Mosher
        “You actually don’t have to hit the target. You just have to be close enough.

        That’s the point. ”

        Matching a “Global” temp isn’t good enough when you know that there are points on Earth that have modeled temperature that are physically impossible, and the only reason the global temp is close is that there’s a second location that is equally as wrong but in the other direction.

        This is not close enough!

  5. Lance Wallace

    “1. Statistical postprocessing using reforests ”
    reforests? perhaps a misprint for reforecasts? But wait, what is a reforecast?

  6. BOTTOM LINE: “predicting the weather past next Monday” is a waste of time.

    predicting 6 months in advance is a tarot-card job

    predicting 50-100y in advance is a Nostradamus job; OR opportunistic Warmist crap, rip-off and entertainment for the “bingo players/sandpit activists”.

  7. “Models usually agree reasonably well, and tweaks to weights have very little impact anyway.”

    Could that have something to do with having realistic boundary conditions instead of a “range of comfort”?

  8. maksimovich

    The European Centre for Medium Range Weather Forecasting (ECMWF) arguably produces the world’s best weather forecasting system.

    Lorenz 1982 used the ECMWF to ascertain the doubling rate of the initial error, which was around 1.85 days. Presently the error has increased to 1.25 days, e.g.

    http://onlinelibrary.wiley.com/doi/10.1111/j.2153-3490.1982.tb01839.x/pdf

    The ensemble becomes very unstable in phase space.

    • “Lorenz 1982 used the ECMWF to ascertain the doubling rate of the initial error, which was around 1.85 days. Presently the error has increased to 1.25 days, e.g.”

      Yeah, but if you halve your error without doubling the doubling rate, that is progress.

      • Actually no, the doubling rate has shortened to 1.25 days. More “progress” will approach the limit, the Lyapunov boundary of 0.138 time units.

      • Wouldn’t you also consider the model precision from the start?

      • Do not mistake increased observation for increased model precision. Say, if aeronautical observation had been present in the 19th century, Custer might have concluded it was a bad day for a picnic at the forks of the Little Bighorn.

    • Max, do you have a reference for this? If true, it’s devastating for any numerically consistent tracking of the “real” dynamics.

  9. “Who is correct, rgbatduke or Matt Briggs? Well each made valid points, and each make statements that don’t seem quite right to me. ”

    That is also where things were left with me. I suspect a part of the problem is in the language and detail. It seems that there are very, very few safe unspoken assumptions or conditions when a discussion of statistics is taken up. [I know every time I’ve rushed, I find a cow paddy.] Blogs in particular are not conducive to that sort of thing, but they have other advantages. Thanks for the detailed posting. I look forward to following the discussion.

    Of particular interest are the threads of independence and representativeness of observations as treated in textbook discussions. Another theme in my mind is possible blurriness [to non-experts] when a Bayesian talker informally interacts with a classic talker. For example, Briggs early on states that Brown ‘takes too literally the language of classical, frequentist statistics, and this leads him astray.’ So is he (Briggs) rejecting a frequentist approach a priori, out of hand? Who knows, because Briggs moves on. Yet to me this seems to be a point where more complexity and doubt (on fundamental approach) is introduced into the discussion. A little closure on that aspect would’ve helped.

    Brown’s comment really went off track (IMO) with the detailed discussion of many-body methods. MBT is pretty complicated in its own right and he should’ve stayed focused on climate models, or at least he should’ve condensed the MBT models discussion. Again, more discussion on independence and some discussion of representativeness of the ‘observations’ would have been better. In general practice, how much do we follow and how much do we wink at those concepts? Establish that, then move on to a discussion of these concepts in the context of climate ensembles. That would’ve been helpful.

    I hope an interesting discussion unfolds here.

  10. How should we interpret an ensemble of models?

    A. With a large dose of salt.

  11. Both are right.

    rgbatduke is absolutely correct. It is utterly meaningless statistically to interpret, frame, use or abuse ensembles of models as WUWT seeks to.

    Matt Briggs is absolutely right, too, and more so. It is quite possible to obtain substantial, useful, correct, meaningful statistics from ensembles of models if you do it right, with goodwill toward your readers and a rational approach to your methods.

    Which I said long ago in my criticisms here of some treatments of ensembles by representatives of the IPCC, in both senses.

    But I’m not here to cover myself in toldyaso glorification. What I say about statistics ought not matter to anyone in the least. Doing the statistics ought matter. So when I say Matt Briggs chose perhaps the weakest and least useful possible still true argument to address R. G. Brown’s premises, you don’t need to pay any attention to me.

    Predictive power of weather events is not a valid test of ensemble climate models.

    And another thing. While Von Storch argues that less than 2% of model runs would match the prolonged five-year trendology extended to the past sixteen years, he omits to consider that well over 15% of models quite exactly match the behavior of each of the two consecutive eight-year spans covering the past 16 years, and likewise of the three consecutive five-year spans making up the past 15 years, a series of events that while they do come up at under 2% or so likely, are themselves both bets that players make in Las Vegas and win quite often.

    Here: a bit of artwork showing an Italian Flag representation of the past six decades as Von Storch ought be depicting the climate to match his claims:

    http://www.woodfortrees.org/plot/esrl-co2/normalise/offset:0.48/detrend:0.25/plot/esrl-co2/normalise/offset:0.36/detrend:0.25/plot/hadcrut4gl/last:108/mean:29/mean:31/plot/hadcrut4gl/from:1994/to:2003/mean:29/mean:31/plot/gistemp/from:1984/to:1993/mean:29/mean:31/plot/gistemp/from:1974/to:1983/mean:29/mean:31/plot/hadcrut4gl/from:1964/to:1973/mean:29/mean:31/plot/hadcrut4gl/from:1954/to:1963/mean:29/mean:31

    Notice how the climate is climbing the CO2 curve within bounds of uncertainty, if you will pardon the expression, unflaggingly.

  12. David Wojick

    Weather models are tested repeatedly so it is possible over time to attach probabilities to their ensembles. This cannot be done for multidecadal and century forecasts of climate models. Hindcasting does not count because the climate models are tuned to the past.

    • “…there are inevitably processes occurring in the atmosphere and oceans which are partially resolved or unresolved and must be represented by some parameterized closure approximation.” ~T. N. Palmer

  13. Willis Eschenbach

    I have a couple of problems with “ensembles”.

    First, we have to recognize that an ensemble of gypsy forecasts is no weightier than a single forecast, and that if 19 out of 20 gypsies say you should give them your money to be blessed, this does not give you a p-value of 0.5 about anything but gypsy forecasts, and certainly not about reality …

    Next, we have to define “ensemble”. The word is used for two very, very different situations. One is the ECMWF method. You vary the parameters a little in your model and in your inputs, and call a hundred or a thousand of such results an “ensemble”.

    The other use of the word is for a group of models designed and built by different people, and given vastly different forcings. Of course you can’t have a thousand or a hundred of those at will. You have exactly as many as you have.

    The use of the same word for both situations, which Judith has continued in this post, is a source of huge confusion. The two “ensembles” are not alike either conceptually or theoretically.

    Finally, I object strongly to the use of the model spread to indicate the uncertainty in the range of real-world outcomes. This bizarre bit of illogic leads to a contradiction, which is that the addition of less accurate models, to a tightly focused and similar ensemble that is clearly unrealistic because it is far from observations, can allow you to claim that your model ensemble results include the real-world situation, merely by increasing the standard deviation …

    Overall? I was fascinated by this comment:

    To cut to the chase, the best model (ECMWF) performs as well as the multi-model ensemble mean …

    I don’t understand this. If the average, the “multi-model mean” includes the best model as well as less accurate models, how can the best model not outperform the mean? What am I missing here?

    w.

    • Steven Mosher

      Willis

      “The other use of the word is for a group of models designed and built by different people, and given vastly different forcings. Of course you can’t have a thousand or a hundred of those at will. You have exactly as many as you have.”

      1. All models, even ECMWF, are written by different people.
      2. They do not have ‘vastly different’ forcings. For CMIP 5 for example the forcing data is very tightly constrained. To be sure you can find some differences, 2 or 3 models for example don’t use volcanic forcings, but “vastly different” is a dubious claim. I would ask for example how many of the CMIP models have you examined the forcings for in order to make this testable claim of “vastly different”

      • Willis Eschenbach

        Thanks, Steven. You say:

        1. All models, even ECMWF, are written by different people.

        You misunderstand my point, likely my fault.

        ECMWF uses what is essentially a “Monte Carlo” technique. They take one model written by one group, and they vary both the inputs and the model parameters slightly. In this way, they can generate a hundred or a thousand different outputs. The ECMWF folks call this an ensemble, and they average the results.

        The IPCC, on the other hand, uses a group of individual models. Each model was developed by a different group of people. Each model is unique, consisting of thousands of lines of individual code. There are at any instant only a fixed number of them; you can’t just generate more the way ECMWF does. The IPCC calls this an ensemble, and they average the results.

        My point was that the use of the same term, “ensemble”, for a Monte Carlo analysis on the one hand, and for an actual group of distinct individual models on the other hand, is leading to confusion. Statistically these two situations are very different, yet they are treated as being the same.

        2. They do not have ‘vastly different’ forcings. For CMIP5, for example, the forcing data is very tightly constrained. To be sure, you can find some differences (2 or 3 models, for example, don’t use volcanic forcings), but “vastly different” is a dubious claim. I would ask, for example, how many of the CMIP models you have examined the forcings for in order to make this testable claim of “vastly different”.

        As you point out, some don’t use volcanic forcings at all. This is not a small difference. The exclusion of the cooling volcanoes from the GISS forcing increases the average forcing by about 40% … To me, that alone makes those models “vastly different” from the ones that include volcanoes, although YMMV.
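The 40% figure is a relative change in a mean. Writing F for the mean net forcing including volcanoes and F_v (negative) for the mean volcanic contribution, symbols introduced here only for illustration, dropping the volcanic term changes the mean by:

```latex
\frac{\left(\bar{F} - \bar{F}_v\right) - \bar{F}}{\bar{F}} = -\frac{\bar{F}_v}{\bar{F}},
\qquad \bar{F}_v < 0 .
```

So an increase of about 40% corresponds to a mean volcanic forcing of roughly -0.4 times the mean net forcing. The relative size of the effect therefore depends on how large the net forcing is over the averaging period, not only on the volcanic term itself.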

        However, it’s much worse than that. Your comment reflects a common misunderstanding. In fact, the CMIP5 folks do not specify the forcings as you claim. You can check this by downloading the CMIP5 historical global radiative forcing dataset here. You’ll find the following statement:

        NOTE: THIS FORCING DATASET IS NOT CMIP5 RECOMMENDATION, AS CONCENTRATIONS, NOT FORCING, SHALL BE PRESCRIBED IN MAIN CMIP5 RUNS.

        Note also from that webpage the following note indicating that the forcings are not recommended, only the concentrations:

        CMIP5 Recommendation: Only the mixing ratios for long-lived GHG emissions are CMIP5 recommendation.

        Since there is absolutely no prescription of the forcings, it’s clear that they are not “tightly constrained” as you say. Let me give you a quick tour of the actual reality. Here’s what the CMIP5 folks recommend regarding the forcing:

        SOLAR: Some models use spectrally resolved solar data, some don’t. It is recommended that the SPARC/SOLARIS data be used … but as with all of the CMIP5 forcing-related issues, it’s a recommendation, not a requirement.

        Greenhouse Gas Concentration Data: No forcing data, just concentrations. Some models also use aerosol concentration data to calculate the actual forcing, and some don’t. In addition, there are no fewer than 31 GHGs. Not all models include all GHGs; they are free to pick and choose which ones to use.

        CO2 Emissions Data: Some models include CO2 from land use change, and some don’t. Emissions are specified for each, but the forcings are calculated by each individual model.

        Aerosols and minor GHG emissions data: Modelers get their choice of which species to include, and they have two datasets to choose from. Again, either the concentrations or the emissions are specified by CMIP5, not the forcings.

        Volcanic Aerosols: From the CMIP5 folks: “There are no emissions data currently available for natural aerosols (e.g., volcanic, sea salts, dust).” And you think the forcings are “tightly controlled”?

        Land Use Change: the changes of land use are specified as areas converted from one use to another. The models use different subsets of this data. Some models include the conversion to/from urban land. They get a different dataset from those that don’t. In addition, you get the choice of two alternative datasets. Also, some models use historical biomass burning emissions, and some don’t. Lots of choices.

        Ozone: some models calculate ozone, some have it specified. Once again, there’s a recommended database for those that have it specified, and once again, it specifies concentrations, not forcing.

        Sea surface temperature and sea ice: Some models calculate these. For those that don’t there are two datasets, with one being recommended.

        So no, Steven, the forcing in CMIP5 is not controlled at all. At the most basic level, each modeling group gets to decide which of the many forcings it wants to include. This already makes for large variations in input data.

        Then, for the included forcings, each modeling group gets to decide whether to have the model calculate them, or instead to use the specified values. Again, more variation.

        So before we even get to the recommended concentrations, there is already a large difference in the forcings for the various models.

        Once we get to the CMIP5 recommended concentrations or emissions, first, they are merely recommendations, and not all models follow every one of them. Next, they are not forcings, so each model needs to actually calculate the forcings from the concentrations or emissions. Also, there are often alternate datasets for the concentrations, depending on the assumptions of the individual models.
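To illustrate the step Willis describes, where each model converts prescribed concentrations into forcing itself: a widely used simplified fit for CO2 (Myhre et al., 1998) is dF = 5.35 ln(C/C0) W/m^2. Full GCMs do not use this fit; they compute forcing with their own radiative transfer schemes, which is one reason the same concentrations can yield somewhat different forcings in different models. The sketch only shows the concentration-to-forcing mapping, and the 278 ppm pre-industrial baseline is an assumed reference value.

```python
import math


def co2_forcing(c_ppm, c0_ppm=278.0):
    """Simplified CO2 forcing fit (Myhre et al. 1998): dF = 5.35 ln(C/C0), in W/m^2.

    c0_ppm = 278 is an assumed pre-industrial reference concentration.
    """
    return 5.35 * math.log(c_ppm / c0_ppm)


for c in (278.0, 400.0, 556.0):
    print(f"{c:6.1f} ppm -> {co2_forcing(c):5.2f} W/m^2")   # 556 ppm (doubled) gives about 3.71
```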

        Net result? Lots of variation in the CMIP forcings, they’re not “tightly controlled” in any sense.

        Best regards,

        w.

      • David Springer

        @mosher

        You wrote that even ECMWF models are written by different people. It would seem not. Which part of the following did you not understand:

        “The ECMWF ensemble members are generated using a singular vector approach that perturbs both model parameters and initial conditions.”

        Eschenbach is about twice as bright as you are but that’s still an insult to Eschenbach.

      • Steven Mosher

        willis

        “The IPCC, on the other hand, uses a group of individual models. Each model was developed by a different group of people. Each model is unique, consisting of thousands of lines of individual code. There are at any instant only a fixed number of them, you can’t just generate more the way the ECMWF does. The IPCC calls this an ensemble, and they average the results.

        ######################

        Several inaccuracies.

        1. Some of the models are developed by the same people but run at different labs.
        2. You can just generate more. ECMWF is one model with many runs. You are aware that you can do the same thing with a GCM, and that, in fact, there are multiple runs for individual models.
        #########################

        My point was that the use of the same term, “ensemble”, for a Monte Carlo analysis on the one hand, and for an actual group of distinct individual models on the other hand, is leading to confusion. Statistically these two situations are very different, yet they are treated as being the same.
        ####
        It’s entirely unclear whether these situations are different in any substantive or practical way. That’s a claim which would need proof.

        #############

        “As you point out, some don’t use volcanic forcings at all. This is not a small difference. The exclusion of the cooling volcanoes from the GISS forcing increases the average forcing by about 40% … To me, this is already “vastly different” from those that don’t, although YMMV.”

        1. That’s not accurate. You should actually look at results with and without volcanic forcing. It’s part of the experimental design.

        Willis, I’m well aware that the instructions permit some latitude in forcings.

        Your claim was that the forcings were vastly different.

        That is a TESTABLE claim.

        You do not establish your claim by pointing to a document that ALLOWS differences. You establish that claim by actually getting the forcing files from the modelers.

        You need to check out the metadata. Then do the comparisons.

        When you have done that, you can come back and explain, in W/m², how vastly different the forcings are.

        Hint: your ability to model the models as a simple function should tell you something about how similar the underlying forcings are.
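Mosher's hint refers to emulating a GCM's global temperature with a simple function of its forcing. A minimal sketch, assuming a one-box lagged linear form: the sensitivity `lam`, the time constant `tau` and the step forcing are hypothetical values, and this is not necessarily the exact form either commenter used.

```python
import math


def one_box_emulator(forcing, lam=0.5, tau=3.0):
    """Lagged linear response: T relaxes toward lam * F with e-folding time tau (in steps).

    lam (K per W/m^2) and tau are hypothetical parameters chosen for illustration.
    """
    a = 1.0 - math.exp(-1.0 / tau)   # fraction of the gap closed each step
    temps, t = [], 0.0
    for f in forcing:
        t += a * (lam * f - t)
        temps.append(t)
    return temps


# Hypothetical step forcing: 0 W/m^2 for 5 steps, then 2 W/m^2.
forcing = [0.0] * 5 + [2.0] * 50
response = one_box_emulator(forcing)
print(round(response[-1], 3))   # approaches lam * 2.0 = 1.0 K
```

Fitting such a form to each model separately is one way the inputs and responses of different models could be compared side by side.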

      • Steven Mosher

        david

        ‘David Springer | June 25, 2013 at 7:55 am |
        @mosher

        You wrote that even ECMWF models are written by different people. It would seem not. ”

        You think it was written by one person?

        A) ECMWF was written by many people.

        In an ECMWF experiment, one model written by many people is run multiple times.

        B) The models used by IPCC (including ECMWF) are written by many people.

        In an IPCC experiment, many models written by many people are run multiple times.

        Question:

        Is the average of many models run multiple times somehow different from a single model run multiple times?

        I have yet to see a convincing argument that the average of multiple models run multiple times is somehow different from a single model run multiple times. Put another way, the average of multiple models run multiple times is itself a model run multiple times.

        Now, to be sure, one can construct all sorts of differences, but I have yet to see a difference that makes a difference.

        So, I see a bunch of arm waving about how these two are different.

        I see a bunch of people claiming that running one model multiple times is fundamentally different from averaging many models run multiple times, but I’m not seeing any practical proof of that.
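One way to make Mosher's question concrete is to split the spread of a multi-model, multi-run ensemble into a within-model part (run-to-run, initial-condition spread) and a between-model part (differences among model means). A single-model ensemble samples only the first. The numbers below are hypothetical; whether the second part makes a practical difference is exactly what is being argued here.

```python
import statistics

# Hypothetical warming (K) from three models, each run three times.
runs = {
    "model_a": [0.61, 0.66, 0.58],
    "model_b": [0.92, 0.88, 0.95],
    "model_c": [0.40, 0.44, 0.37],
}

model_means = [statistics.mean(r) for r in runs.values()]
grand_mean = statistics.mean(model_means)
# Within-model: average run-to-run variance inside each model.
within = statistics.mean(statistics.variance(r) for r in runs.values())
# Between-model: variance of the per-model means.
between = statistics.variance(model_means)

print(f"grand mean {grand_mean:.3f} K")
print(f"within-model variance  {within:.4f} K^2")
print(f"between-model variance {between:.4f} K^2")
```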

      • Steven Mosher

        Willis, the easier way to see that the forcings are not “vastly different” is to understand what is meant by RCP.

        Take the experiments for RCP6, for example. What does the 6 stand for?

        http://www.pik-potsdam.de/~mmalte/rcps/

        Next would be to read this:

        http://cmip-pcmdi.llnl.gov/cmip5/docs/Taylor_CMIP5_design.pdf

        Next would be to read this document and go read the netCDF files:

        http://cmip-pcmdi.llnl.gov/cmip5/docs/CMIP5_output_metadata_requirements.pdf

        So you cannot look at the guidelines to determine what was actually done. A speed limit sign that tells you you can go 70 mph doesn’t tell you how fast you were actually going. You actually have to look at the files, and then understand what RCP means.
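For reference, the number in each RCP name is the approximate radiative forcing, in W/m² relative to pre-industrial, reached around 2100. Inverting the simplified CO2 fit dF = 5.35 ln(C/C0) (Myhre et al., 1998) gives a rough CO2-equivalent concentration for each pathway. This is a sketch only: the 278 ppm baseline is an assumed reference value, and the real pathways combine many gases and aerosols.

```python
import math


def co2_equivalent(forcing_wm2, c0_ppm=278.0):
    """Invert dF = 5.35 ln(C/C0) to get a CO2-equivalent concentration in ppm.

    c0_ppm = 278 is an assumed pre-industrial reference concentration.
    """
    return c0_ppm * math.exp(forcing_wm2 / 5.35)


for rcp in (2.6, 4.5, 6.0, 8.5):
    print(f"RCP{rcp}: ~{co2_equivalent(rcp):.0f} ppm CO2-eq")
```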

    • David Wojick

      Agreed, Willis: an ensemble of runs by a single model is very different from an ensemble of single runs by many models. The former is a model ensemble while the latter is an ensemble of models. The former might be argued to produce a subjective probability distribution for that model because it is a form of repetitive sampling. But the latter merely shows how the models differ. I see no logical way to derive a probability distribution from these model differences.

  14. If you could give a gift to climate modelers it would be 10,000 years of detailed observations and forcing measurements.

    I am sympathetic to their problem of lack of data to validate their models and having to wait such long periods to iterate their models. It is an extremely serious limitation, and I would be surprised if anyone disagrees with this.

    And there is almost nothing that can be done about it…but to…wait…

    wait…

    wait…

    wait…

    • Tom Scharf | June 24, 2013 at 10:39 pm |

      You may be onto something. And the wait may be shorter than you think.

      The Marcott infographic is not detailed instrumental temperature records.

      However, it exceeds 10K years, and it allows the opposite of what you propose, that is the generation of 10K years of models based on supposed Holocene conditions that can be dithered to simulate the various paleo components of Marcott — and of ‘new’ paleo as yet undiscovered — as a form of reliable validation.

      That would be quite a thing.

      A few here have considered this proposal. I think it has legs, far more than BEST did when it started, IMO. Of course, BEST got enthusiasm and funding and credible volunteers, which Climate Etc. may not be the best place to campaign for.

      • To the first order, the Holocene arc seen by Marcott is predicted by the precession cycle (Milankovitch). We are in a cool Milankovitch phase, and it should have cooled more, not warmed.

      • “The Marcott infographic is not detailed instrumental temperature records.
        However, it exceeds 10K years, and it allows the opposite of what you propose, that is the generation of 10K years of models based on supposed Holocene conditions that can be dithered to simulate the various paleo components of Marcott — and of ‘new’ paleo as yet undiscovered — as a form of reliable validation.”

        Yes, climate scientists can’t even model today’s climate with any accuracy, but by all means, let’s use the vastly inferior paleo data to “generate 10K years of models.”

        What? No single model will have any chance of being in the least bit accurate?

        No problem, just “generate” enough models and do the necessary statistics. Just be sure not to use those moldy old “classical” statistics.

  15. Weather prediction differs from climate projection because weather prediction is an initial value problem. You don’t get a good forecast unless your initial condition is right, or unless your ensemble captures the uncertainties in both the initial condition and the forecast model. Even with a perfect model, imperfect data limit the forecast, because small errors grow and eventually spread to larger scales (the Lorenz problem). Forecasts only have skill out to a week or so. Seasonal forecasts also rely on getting the ocean forecast right, which is even harder given the lack of 3-D ocean data. If you can’t predict the next El Nino you are toast, and forecasters can’t really do so until it has almost started.

    So when we compare a spaghetti plot of climate models with the observations over the last ten years, the problem of getting the anomalies right is an initial value one. Climate model ensembles just use random initial conditions because they are not trying to get the short range right, only the possible variability, so it is too much to expect them to capture ‘the pause’ unless you think their ocean forecast will be good for ten years, which it isn’t. The ocean models are also very limited by resolution, because important upwelling parts of the circulation are poorly resolved; this does not help with getting a good PDO, for example, so capturing some of the natural variability may require better computers.

    I would argue, however, that natural variability on decadal-average scales is about a tenth of a degree (from what we see in the 20th-century data), so not getting it right is not a critical issue when climate change is several degrees. It is a small detail on the long-term trend, but it matters for short-term projections, because the climate trend of 0.1-0.2 degrees per decade is comparable to the natural variations about the trend.
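The error growth described above (the Lorenz problem) can be illustrated with a twin experiment on the Lorenz-63 system, a standard toy model rather than a weather model: two runs start 1e-6 apart in x, and the gap grows until it is as large as the attractor itself. The step size, run length and initial state are arbitrary choices for illustration.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 equations (classic parameter values)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def twin_errors(eps=1e-6, dt=0.01, n_steps=3000, every=100):
    """Distance between a 'truth' run and a run whose initial x is off by eps."""
    truth = (1.0, 1.0, 1.0)
    forecast = (1.0 + eps, 1.0, 1.0)
    errors = []
    for step in range(1, n_steps + 1):
        truth, forecast = rk4_step(truth, dt), rk4_step(forecast, dt)
        if step % every == 0:
            gap = sum((a - b) ** 2 for a, b in zip(truth, forecast)) ** 0.5
            errors.append((step * dt, gap))
    return errors


for t, err in twin_errors()[::5]:
    print(f"t = {t:5.1f}   error = {err:.2e}")
```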

    • Yes, Jim, you are giving the standard gloss on the truth. Climate models are also initial value problems and have all the same problems as weather models, but the errors are much larger because of the coarse spatial grids.

      There are two possibilities:
      1. The dogma of the attractor: You always get sucked into the attractor in the long term, so short-term errors don’t matter. This is, so far as I can tell, just wishful thinking. There is no mathematical or theoretical justification, merely the observation that climate models don’t blow up. This is the official ex cathedra dogma of the team and the IPCC. As I say, there is really no support for it.

      2. Climate models have so much spurious dissipation that all error modes (and also the signal you want to track) are dissipated away, leaving a simulation with forced and wrong artificial stability.

      I actually favor the second theory because it is supported by 50 years of experience in numerical PDE simulations, and we know that the models have a lot of spatial dissipation added explicitly. I just finished verifying this for the latest NCAR model, which, by the way, still uses the leapfrog time-marching scheme with the filter that Paul Williams showed to be far too dissipative.

      You need something better to have a chance with people with an actual mathematical background in PDEs.
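For readers unfamiliar with the filter mentioned above: leapfrog time stepping is usually paired with a Robert-Asselin (RA) filter to damp its spurious computational mode, and Paul Williams proposed a modified version (the RAW filter) that greatly reduces the damping of the physical solution. A minimal sketch on the oscillation equation dz/dt = iωz, whose exact solution keeps |z| = 1. The filter strength nu = 0.2 and alpha = 0.53 are values often quoted for the RAW filter; nothing here is taken from the NCAR model itself.

```python
import cmath


def leapfrog_amplitude(alpha, nu=0.2, omega_dt=0.2, n_steps=1000):
    """Integrate dz/dt = i*omega*z with leapfrog plus a Robert-Asselin-type filter.

    alpha = 1.0 gives the classic Robert-Asselin (RA) filter; alpha = 0.53 gives
    Williams's RAW variant. Returns |z| after n_steps; the exact solution keeps |z| = 1.
    """
    prev = 1.0 + 0j                     # filtered value at step n-1
    curr = cmath.exp(1j * omega_dt)     # value at step n (first step taken exactly)
    for _ in range(n_steps):
        nxt = prev + 2j * omega_dt * curr           # leapfrog: z[n+1] = z[n-1] + 2*dt*F(z[n])
        d = 0.5 * nu * (prev - 2.0 * curr + nxt)    # filter displacement
        prev = curr + alpha * d                     # filtered z[n]
        curr = nxt + (alpha - 1.0) * d              # RAW also adjusts z[n+1]; no change when alpha = 1
    return abs(curr)


print(f"RA  filter (alpha=1.00): |z| = {leapfrog_amplitude(1.0):.3f}")
print(f"RAW filter (alpha=0.53): |z| = {leapfrog_amplitude(0.53):.3f}")
```

With these settings the RA run should lose most of its amplitude over 1000 steps, while the RAW run keeps most of it, which is the kind of excess dissipation the comment refers to.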

      • The reason that climate is not an initial value problem, but a boundary value problem, is that climate depends on the forcing, not the initial state. This is the fundamental difference. In a climate context, the boundaries are the forcing, which includes continental configuration, atmospheric composition, solar radiation, orbital characteristics, etc. Keep these the same and climate has a limited range of variation, apart from some sensitivity through the ice-albedo feedback when it gets cold enough (i.e. Ice Ages) or warm enough (loss of sea ice and major continental glaciers like Antarctica and Greenland), where tipping points are seen in paleoclimate. However, you could argue the initial state comes in via a hysteresis effect. That is a characteristic of catastrophe theory, for example, where hysteresis and tipping points are seen. In certain cases, the climate may do one thing in a cooling trend and another in a warming trend, such that for a given forcing there may be two quasi-stable states, depending on the direction from which it got there, but these collapse to one state at a tipping point if the trend continues. This is usually associated with the extent of high-albedo ice regions. The ongoing loss of Arctic sea ice is one possible such tipping point: the ice won’t easily come back even if we somehow reduce CO2 to current levels in the future, because the albedo is that much lower, leading to a warmer state.
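The idea of two quasi-stable states under the same forcing can be illustrated with a zero-dimensional energy-balance model with an ice-albedo feedback, in the spirit of the classic Budyko-Sellers models. Everything here is a hypothetical toy: the albedo ramp, the effective emissivity (tuned so that about 288 K is an equilibrium) and the pseudo-time relaxation are assumptions for illustration. With identical insolation, the final state depends on where the run starts.

```python
import math

SIGMA = 5.67e-8      # Stefan-Boltzmann constant, W m^-2 K^-4
S_OVER_4 = 342.0     # global-mean insolation, W m^-2
EPSILON = 0.61       # hypothetical effective emissivity, tuned so ~288 K is an equilibrium


def albedo(t):
    """Hypothetical smooth ice-albedo ramp: about 0.7 when cold, about 0.3 when warm."""
    return 0.5 - 0.2 * math.tanh((t - 265.0) / 10.0)


def net_flux(t, s=S_OVER_4):
    """Absorbed solar minus emitted longwave, W m^-2."""
    return s * (1.0 - albedo(t)) - EPSILON * SIGMA * t ** 4


def equilibrate(t0, s=S_OVER_4, rate=0.01, n_steps=5000):
    """Relax T toward a stable equilibrium: T += rate * net_flux (pseudo time steps)."""
    t = t0
    for _ in range(n_steps):
        t += rate * net_flux(t, s)
    return t


warm = equilibrate(300.0)
cold = equilibrate(220.0)
print(f"start at 300 K -> {warm:.1f} K;  start at 220 K -> {cold:.1f} K")
```

A full hysteresis loop would also sweep the insolation up and down; this sketch only shows the bistability at one fixed forcing.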

      • Chief Hydrologist

        Jim,

        I can assure you that models are chaotic.

        ‘Sensitive dependence and structural instability are humbling twin properties for chaotic dynamical systems, indicating limits about which kinds of questions are theoretically answerable. They echo other famous limitations on scientist’s expectations, namely the undecidability of some propositions within axiomatic mathematical systems (Gödel’s theorem) and the uncomputability of some algorithms due to excessive size of the calculation.’ http://www.pnas.org/content/104/21/8709.long

        And an open arctic leads to more snow. – http://www.gtresearchnews.gatech.edu/arctic-ice-decline/

        And God only knows what drives Arctic cloud cover. – http://www.arctic.noaa.gov/detect/climate-clouds.shtml

        It’s a complex system – Jim – and chaotic as well.

      • The reason that climate is not an initial value problem, but a boundary value problem, is that climate depends on the forcing, not the initial state. This is the fundamental difference. In a climate context, the boundaries are the forcing, which includes continental configuration, atmospheric composition, solar radiation, orbital characteristics, etc. Keep these the same and climate has a limited range of variation, [...] [my bold]

        I doubt you could call “continental configuration” a “forcing”, but either way, the actual boundary condition is not any sort of “average” of the actual shapes of the land (and of ocean edges and bottoms), but the actual shapes themselves. If a butterfly flapping its wings differently can make the difference in where a thunderstorm develops a week later, the precise location of a hill, or even of a tree (or building) on that hill, can make a difference to how weather patterns tend to develop in a particular area, which arguably can affect how the climate responds to other “forcings” such as changes in pCO2 or solar radiation.

        The point I’m trying to make is that we don’t know just how much variation is possible to any characteristic (e.g. Climate “sensitivity”) depending on geological differences, and we don’t know at what scale, if any, such differences cease to have an effect.

        It’s easy to assume that such effects cancel out at, say, the scale of individual mountains (as opposed to mountain ranges, which probably do have an effect). Indeed it seems intuitively obvious that the small size of mountains, compared to that of important drivers of the weather, means that their individual locations and shapes would “cancel out” in their effect on the larger scale.

        The problem is that the intuition involved is shaped by our great familiarity with linear processes, and could easily be wrong when applied to non-linear systems like the weather, and the climate. Very, very easily.

      • David Young

        Jim, Jim, a boundary value problem is defined very precisely. It’s a STEADY STATE problem where the solution is a function of the boundary values. An initial value problem is a TIME DEPENDENT problem where the solution is a function of the initial conditions.

        Climate is an initial value problem with time-varying forcings. A climate model is just a weather model run for a long period of time on a coarse spatial grid. This whole boundary value problem thing is just wrong and deceptive. The idea behind this dogma, I think, is deliberate deception. The climate communicators know that those who are “somewhat” mathematically literate will know that “boundary value problems” are well posed for elliptic problems. Of course Navier-Stokes is NOT elliptic, but the patina of well-posedness might wear off on climate models. They probably chose this set of words to try to counter the well-known “butterfly effect”, which even the mathematically illiterate have heard about.

        This is a shameless and despicable tactic to “sell” something that is just plain false.

      • David Young, that is what I was trying to convey. Climate is an equilibrium state even if it changes slowly with the forcing. For a given forcing you get a steady state, but in reality the climate goes slowly from one equilibrium to another because the forcing is always changing. The current state with a rapidly changing forcing is leading to a transient climate, but it is on its way to a new equilibrium state once the forcing stops changing so quickly, and meanwhile that climate change is in one well defined direction.

      • Jim, I read what you said and it sounds rational and reasonable, but it is wrong. Climate is never steady state; it’s just the average of weather, which is clearly time dependent.

        Look at Palmer and Slingo to see how different but close initial distributions for the Lorenz attractor sometimes lead to diverging final states (time-dependent states). Paul Williams’ video at the Newton Institute shows something similar, but there it’s the time-stepping method that leads to different states.

        I challenge you to show me how climate can be defined as a boundary value problem in a rigorous sense. If I’m wrong on this, I want to know.

      • Chief Hydrologist

        Climate is a chaotic system composed of control variables (your so-called forcing) and multiple feedbacks. The feedbacks create the non-linear response, and there are a number of feedbacks that exceed the potential forcing from greenhouse gases.

        ‘The top-of-atmosphere (TOA) Earth radiation budget (ERB) is determined from the difference between how much energy is absorbed and emitted by the planet. Climate forcing results in an imbalance in the TOA radiation budget that has direct implications for global climate, but the large natural variability in the Earth’s radiation budget due to fluctuations in atmospheric and ocean dynamics complicates this picture.’ http://meteora.ucsd.edu/~jnorris/reprints/Loeb_et_al_ISSI_Surv_Geophys_2012.pdf

        The ‘large natural variability in the Earth’s radiation budget due to fluctuations in atmospheric and ocean dynamics’ occurs on scales from years to eons. It is abundantly clear that these have been the dominant influences over recent decades and – as the greenhouse forcings have been of such small effect – will be in future.

      • David Young, what you are asserting is that a change in forcing like a 1% solar increase will not be seen as a climate change, because the background is too chaotic, when even more subtle things like the sunspot cycle and volcanoes have measurable effects. The background climate is very steady in the absence of forcing changes.

      • Chief Hydrologist

        Have trouble reading Jim?

        ‘The top-of-atmosphere (TOA) Earth radiation budget (ERB) is determined from the difference between how much energy is absorbed and emitted by the planet. Climate forcing results in an imbalance in the TOA radiation budget that has direct implications for global climate, but the large natural variability in the Earth’s radiation budget due to fluctuations in atmospheric and ocean dynamics complicates this picture.’ http://meteora.ucsd.edu/~jnorris/reprints/Loeb_et_al_ISSI_Surv_Geophys_2012.pdf

      • Here is an example of how direct an effect global warming has on the weather. Weather Channel’s Stu Ostro shows this chart, which demonstrates how the geopotential height of a given pressure surface has risen by 25 meters over the years:

        This has the effect in Ostro’s words of thickening the atmospheric layer leading to changes in pressure formations.

        I checked the height change, and the slope matches the global warming curve exactly:
        dT/dh = mg/R
        where m is the average molar mass of the atmosphere, dT is the change in GAT (global average temperature), dh is the change in geopotential height, and R and g are the universal gas constant and the gravitational acceleration, respectively.

        This kind of thermodynamic characteristic has NOTHING to do with any chaotic models and is simply the result of bulk effects and the forcing function applied.

        I marvel at the people stuck in the complexity and chaos trickbox, forever unable to extricate themselves, while offering no fundamental advances to the knowledgebase.
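For reference, the textbook link between temperature and the height of a pressure surface is the hypsometric equation: the thickness of the layer between pressures p1 and p2 is dZ = (R T / (M g)) ln(p1/p2), where T is the mean (virtual) temperature of the layer, M the molar mass of air and g the gravitational acceleration. Its sensitivity to T therefore carries a factor ln(p1/p2) in addition to the R/(M g) scale, which is the inverse of the mg/R factor in the comment above. The sketch only evaluates that relation for an assumed 1000-500 hPa layer; it does not reproduce the chart.

```python
import math

R = 8.314462618      # molar gas constant, J mol^-1 K^-1
M_AIR = 0.028964     # molar mass of dry air, kg mol^-1
G0 = 9.80665         # standard gravity, m s^-2


def thickness_per_kelvin(p_bottom_hpa, p_top_hpa):
    """d(thickness)/d(layer-mean T) from the hypsometric equation, in m per K."""
    return R / (M_AIR * G0) * math.log(p_bottom_hpa / p_top_hpa)


scale = R / (M_AIR * G0)   # about 29.3 m per K; its inverse is the m g / R factor
print(f"R/(M g)                  = {scale:.2f} m/K")
print(f"1000-500 hPa layer dZ/dT = {thickness_per_kelvin(1000, 500):.2f} m/K")
```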

      • Chief Hydrologist

        Prattling and preening again I see.

        Geopotential is a function of air temperature – which is largely due to natural factors. So what you are saying is that temperature increased. Duh.

        In the short term, most global temperature change is due to ENSO – which is chaotic in itself – and on decadal scales (and much longer) it is changes in ocean and atmospheric circulation, which are most certainly chaotic, with multi-decadal changes obvious in the behavior of these systems. To understand the Earth system you need to understand this. But you can’t, for some weirdness of your own. I suppose it is tantamount to admitting that you have wasted your entire freaking life.

        The graph is utter nonsense – doesn’t even have the units right – temperature did not increase in the last decade – the geopotential height along with cloud height decreased because of cooling sea surface temps. The latter is certainly chaotic – but you remain utterly clueless about even the basics of oceanography.

        And what the hell is this?

        dT(k)/dh(m) = 29.4 (g/mol)/ 8.3144621(J/K.mol)

        Somehow the surface temperature is at all meaningful in the upper atmosphere, and the change in temp over the change in geopotential equals a constant that is mysteriously derived from the average molar mass of air and the gas constant? Notwithstanding that the units are nonsensical.

        You are a freaking idiot who is destined to remain utterly clueless.

      • The little larrikin Luddite-in-Chief again lashes out against anything resembling thought.

        It is called independent verification of global temperature increase and no, this is not caused by ENSO.

        Totally predictable to watch him struggle with the simplest equation as well.

      • Did you get the mathematics right yet, Chief?
        Never seen the barometric formula?

      • JC SNIP

        These are real macro effects which impact the mean value of observable weather and climate characteristics.

      • Chief Hydrologist

        It is that you are not interested in observation. Merely strange manifestations of simplistic ideas.

        http://www.auckland.ac.nz/uoa/home/news/template/news_item.jsp?cid=466683

        http://www.nasa.gov/topics/earth/features/misrb20120221.html

      • CH, you are also asserting that a 1% sustained or growing increase in forcing won’t be detectable against chaotic internal variations in global temperature, and the evidence does not back up such an assertion because even small regular changes like sunspot cycles are detectable and those are only about 0.06%.

      • Jim D, I am NOT saying that changes in forcings cannot be detected in the climate. What I am saying is that the attractor can be so complicated, and have so many bifurcation points, that models, even ensembles of models, can be useless. I think that’s pretty clear from the Lorenz system. In any case, I think it’s clear that both the initial conditions and the forcings can change the outcome, which might not be a recognizable stable climate “state” anyway.

        Thus, “climate is a boundary value problem” is false in the technical mathematical sense, and I believe it is a deceptive way to say it. The further problem, of course, is that models do not track the actual attractor but have large temporal and spatial dissipation that is not physical.

      • Chief Hydrologist

        No Jim – science is saying that you have fundamentally underestimated natural variability in radiant flux at TOA.

      • David Young, yes, see what I said about tipping points. Forcing can lead into those (like the rapid loss of ice areas). There are areas of increased forcing that have two solutions, both warmer to different degrees. If we don’t move along the forcing line, we keep the climate much more stable than if we push along with the changing forcing (changing the climate boundary condition).

      • Chief Hydrologist

        http://www.gtresearchnews.gatech.edu/arctic-ice-decline/

        Loss of ice in the Arctic seas can lead to either a slowdown in THC or expanding ice sheets, both of which can radically cool the climate, with feedbacks of a different sign and much larger than the original forcing.

    • Jim, I’m curious, if model runs start with random initial conditions, and some models produce large unforced oscillations approximately equal to the last 100 year warming trend, how do they manage to not produce runs that are all over the map since it would be reliant upon where in that unforced oscillation the model run began?

      • The energy balance constrains how big these oscillations can be and how long they can last. Larger oscillations like El Nino are short as the hot anomaly is radiated to space once it gets to the surface and atmosphere from the deep ocean. It is like a restoring force towards an equilibrium surface temperature.
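The restoring-force argument can be sketched with a first-order autoregressive (AR(1)) process, a discrete analogue of a linearly damped variable driven by random noise. Its variance stays bounded at sigma^2 / (1 - phi^2), and a stronger restoring force (smaller phi) means smaller excursions. The values of phi and the noise level are hypothetical, and the sketch says nothing about whether any particular model's oscillations behave this way.

```python
import random
import statistics


def ar1_series(phi, noise_sd, n, seed=1):
    """x[t+1] = phi * x[t] + noise: a damped variable kicked by random noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, noise_sd)
        out.append(x)
    return out


phi, sd = 0.9, 0.1
series = ar1_series(phi, sd, 200_000)
theory = sd ** 2 / (1 - phi ** 2)      # stationary variance of an AR(1) process
print(f"sample variance {statistics.pvariance(series):.4f}, theory {theory:.4f}")
```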

      • Jim, if they are real or not doesn’t matter for this point. The point is the models make them. When they do a model run how do they manage to not start the run at different parts of the oscillation if they use random initial conditions?

      • steven, these are not global average surface temperature variations of the kind we see over the last century. They are changes in gradients that cancel globally. Your better bet is a PDO-like thing that can vary global decadal-average temperatures by as much as 0.2 degrees around the long-term mean, but there might be some dispute as to whether even that is partly solar (1910-1940) and aerosols (1950-1980), which are of course forcings. Hard to separate a natural part out.

      • Jim, where did you come up with the idea that they cancel globally? The authors state that their findings make it tenuous that any of the warming of the last 150 years was from radiative forcing. Perhaps you see something the authors missed? Can you point it out?

      • steven, do you think a trend in the zonal temperature gradient is the same as a trend in the global temperature?

      • Jim, I have no idea what you are talking about. I look at the pretty pictures where some pictures have a lot of red and other pictures have a lot of blue and I read where the authors say the internal variability of the models could cause as much warming as we experienced in the last 150 years. I assume the reviewers looked at the same pretty pictures and read the same words I did. The only thing I don’t know is which pretty pictures and which words you are looking at to think there is only a redistribution of temperature. Can you point them out or can’t you?

      • steven, well, you can read the abstract where it only talks about zonal gradients, but other than that I can’t help you.

      • Well, that doesn’t say anything about global temperatures balancing out. They only studied one zone in depth, so they describe it as zonal, but they commented on other areas and implications. Skip to the conclusion, since the entire paper is available.

    • I think your assertion that “climate is not an initial value problem, but a boundary value problem, that is, climate depends on the forcing, not the initial state” is an assumption, and personally I think the assumption is false.

      I don’t think there is any real meaning in the separation of weather and climate, they are the same underlying system, our terms relate to scale, both temporal and spatial. I think the climate is as chaotic as the weather. To be clear this wouldn’t mean that forcings don’t impact, just that internal processes also impact significantly, are more than noise, and can’t be filtered out.

      • Agree, it’s an arbitrary assumption. The variations seem to be similar at all timescales.

      • Edim, “Agree, it’s an arbitrary assumption. The variations seem to be similar at all timescales.”

        Ah, yep. Almost like the theory is missing a big piece.

      • It is like the question of whether the earth would warm if you increased the solar radiation by 1% (equivalent to doubling CO2 in its effect on the energy balance). I think it would, but you say no? This type of warming is defined as a climate change, not weather. The sun is a “boundary forcing” being external to and unaffected by the earth system.

      • Chief Hydrologist

        ‘‘The top-of-atmosphere (TOA) Earth radiation budget (ERB) is determined from the difference between how much energy is absorbed and emitted by the planet. Climate forcing results in an imbalance in the TOA radiation budget that has direct implications for global climate, but the large natural variability in the Earth’s radiation budget due to fluctuations in atmospheric and ocean dynamics complicates this picture.’ http://meteora.ucsd.edu/~jnorris/reprints/Loeb_et_al_ISSI_Surv_Geophys_2012.pdf

        The energy balance changes a lot more through ‘the large natural variability in the Earth’s radiation budget due to fluctuations in atmospheric and ocean dynamics’ over years to eons. Climate is never stable and lurches unpredictably from one attractor to another. There are many factors in ocean and atmospheric dynamics.

        ‘The global climate system is composed of a number of subsystems — atmosphere, biosphere, cryosphere, hydrosphere and lithosphere — each of which has distinct characteristic times, from days and weeks to centuries and millennia. Each subsystem, moreover, has its own internal variability, all other things being constant, over a fairly broad range of time scales. These ranges overlap between one subsystem and another. The interactions between the subsystems thus give rise to climate variability on all time scales.’ http://academia.edu/3226175/Mathematical_Theory_of_Climate_Sensitivity

      • Chief Hydrologist | June 26, 2013 at 1:58 am repeated again: “‘The top-of-atmosphere (TOA) Earth radiation budget (ERB) is determined from the difference between how much energy is absorbed and emitted by the planet”

        Chief has become a scratched vinyl record: he kept repeating what the leading conmen used to say, and now he is repeating himself… is it dementia, or just too much Fourex beer?

      • JimD said

        Absolutely not!

        Weather and climate are both boundary value problems, patently!

        The disagreement between us is over your assertion that climate is only a boundary value problem, not an initial value problem, and that over time the noise will even out and what will be left is the effect of the forcing. This is where I disagree: I believe climate is also sensitive to initial conditions, that internal variability is a manifestation of that, and that while forcings of course play their part, they are not the only story.

    • Chief Hydrologist

      http://www.gtresearchnews.gatech.edu/arctic-ice-decline/

      Loss of ice in the Arctic seas can lead to either a slowdown in the THC (thermohaline circulation) or expanding ice sheets, both of which can radically cool the climate, with feedbacks of a different sign from, and much larger than, the original forcing.

  16. Tom Scharf said: “I am sympathetic to their problem of lack of data to validate their models and having to wait such long periods to iterate their models”

    Sympathetic… sympathetic…?! If one hasn’t enough data, one shouldn’t “predict” with confidence and squander billions on shonky predictions.

    Meteorologists are good at predicting weather / climate 3-4 days in advance. Anything more than that is suspicious. Instead of “predicting” climate 2-3-4-5 years in advance, PREDICT the winning numbers in next week’s lotto: it’s only a week in advance, AND there are fewer numbers in the whole lottery than there are temperature variations across the whole planet, where the temperature changes every 10 minutes of the year: http://globalwarmingdenier.wordpress.com/climate/

    • I’m sympathetic in the academic / engineering / numerical sense. It’s an incredibly difficult problem and they don’t have nearly enough information to solve the problem. It’s questionable how well you could do it even if you had all the info you wanted.

      Trying to predict the climate is not a bad thing.

      And for the most part, the modelers aren’t running around making grand statements on the efficacy of their models. They let others do that and they turn their heads. It’s in their interests to do so. They should not be getting a free pass on this.

      I’ve yet to see any journalist interview actual modelers and ask them the hard questions. Do you trust your own models? How skillful are their forecasts? How do you know? How would you know they aren’t working well? etc.

  17. When taking an average of all deserts in the Victory Cookbook, if I understand correctly, you get baked camel.

  18. 1) Gather all the models plus measured reality.
    2) Throw out the high and the low, examine the data.
    3) Issue press release that everything is in very close agreement.
    4) Don’t mention your thrown out ‘low’ was measured reality.

  19. Climate models. Giggle.

  20. This discussion reminds me of what ecologists faced in the 60s. Everyone realized that an individual organism/population had to operate within a complex “ecosystem”. Very little progress was made for many years because the whole was just too complex to describe or comprehend. There were many discussions about how best to proceed. Progress began when the complexity was broken down into more manageable components, beginning with identifying trophic levels, the interaction between trophic levels, and then looking at the flow of “energy” through the system. The early “modellers” slowly graduated from descriptive to analytical. Understanding proceeded slowly, and progressive steps often had to wait for new information from specialists within the biological community. It has taken 50+ years to get where we are today, which isn’t all that far. Several of the comments above suggest that the available data, and the interaction among variables, simply don’t allow the definitive answers folks want. I enjoy following climatology, but the old adage of using Cadillac methods on horse-and-buggy data seems to apply, especially since the Cadillac is missing some pretty significant parts.

  21. Cees de Valk

    Seems that you already have another conclusion: in a case where we can actually determine the benefits of a multi-model ensemble empirically, it appears that there are none, except possibly to indicate situations with relatively large uncertainty related to sensitivity to model formulation.

  22. Chief Hydrologist

    The El Niño-Southern Oscillation (ENSO) remains neutral. There is some divergence in the outlooks from the seven international models surveyed by the Bureau of Meteorology. Both the Bureau of Meteorology and UK models suggest there is an increased chance of weak La Niña conditions forming during the winter months. However the remainder of the models indicate neutral ENSO conditions are likely to continue.

    The Indian Ocean Dipole (IOD) is currently negative. All five international models surveyed suggest this negative event is likely to persist throughout the southern winter and into spring. A negative IOD event during winter-spring increases the chances of above normal rainfall over southern Australia.

    http://www.bom.gov.au/climate/ahead/model-summary.shtml

    This is about ensembles in weather forecasting – and yet the same tired old arguments about climate models are trotted out.

    These are the models – http://www.bom.gov.au/climate/ahead/model-summary.shtml#tabs=Models

    Here is some detail.

    http://www.bom.gov.au/climate/ahead/models/model-summary-table.shtml

    Each of these GCMs is an initialized and perturbed model that is run a set number of times. The results are reported both as a spread and as a mean, and are then amalgamated with those of the other models, and the mean of the means is determined.

    It is difficult to understand what the spreads in the models represent, other than the ever-present chaotic divergence of solutions. The statistical treatment is fairly basic: simply the mean of a chaotic spread or the mean of a mean of chaotic spreads. It seems merely a mathematical convenience without much essential rationale. Nor does it seem of much utility overall to have such a broad spread.

    For seasonal forecasts – the tendency is to use simpler models of sea surface temperature associated with patterns of rainfall – such as that involving the IOD above.

    http://www.bom.gov.au/climate/ahead/rain_ahead.shtml?link=1

    It looks like reasonable rainfall ahead for most of the country.

    Or even longer term.

    http://s1114.photobucket.com/user/Chief_Hydrologist/media/USdrought_zps2629bb8c.jpg.html?sort=3&o=32

    America is in for drought for a bit yet.
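Comment 22 describes each model's perturbed runs being summarised as a mean, with those means then averaged across models. A minimal Python sketch with invented numbers (the model names and values are hypothetical) shows one consequence of that choice. The mean of means gives every model equal weight. Pooling all members instead weights each model by its ensemble size. The two agree only when every model runs the same number of members.

```python
from statistics import mean

# Invented ensemble outputs (e.g. a seasonal SST anomaly in deg C) from three
# hypothetical models with different ensemble sizes.
ensembles = {
    "model_a": [0.1, 0.3, 0.2, 0.4],
    "model_b": [-0.5, -0.3],
    "model_c": [0.0, 0.2, 0.1, -0.1, 0.3, 0.1],
}

def mean_of_means(groups):
    """Average each model's own ensemble first, then average those means.
    Every model gets equal weight regardless of how many members it ran."""
    return mean(mean(members) for members in groups.values())

def pooled_mean(groups):
    """Treat all members of all models as one sample (weights by ensemble size)."""
    return mean(x for members in groups.values() for x in members)

print(f"mean of means: {mean_of_means(ensembles):.3f}")  # -0.017
print(f"pooled mean:   {pooled_mean(ensembles):.3f}")    # 0.067
```

With these made-up numbers the two summaries even disagree in sign, because the small, cool model_b counts as a full third of the mean of means.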

  23. An ensemble seems to have few advantages. Sure, it increases the sample size, if it can be shown to have the same statistical parameters. It might have advantages in identifying some types of experimental error.

  24. I read this thread, and there is another one to come, and I wonder why our hostess has chosen to introduce them. Once again, there is an implicit assumption that CAGW is a problem; otherwise why bother about climate models at all? The current weather models give us good forecasts for a few hours, as required for flight plans, and a few days for other reasons. The thread seems to assume they are useful as an initial stage for climate models.

    So far as CAGW is concerned, IMHO, all climate models do, is to give a bunch of “scientists” an excuse to pretend that the numbers they derive, such as climate sensitivity, are something other than guesses. So I cannot see why climate models are worthwhile discussing in the context of CAGW.

    • Yes, but only the existing models, which assume some significant anthropogenic effect (GHGs, aerosols etc.) plus volcanic and some solar, and mostly ignore or downplay (as not global) any ‘natural’ variabilities and oscillations at multi-decadal and multi-centennial timescales.

      If the climate change at these timescales is mostly of solar and astronomical origin plus ‘internal’ climatic oscillations and resonances, it could be possible to model and predict these climate changes, complex or not (the exact physical mechanisms).

      • Edim, you write “it could be possible to model and predict these climate changes, complex or not (the exact physical mechanisms).”

        I agree, but the key word is “could”. Anything could happen with the models. However, nothing will change the fact that the current models have not been validated, and until a way is found to validate any new models, they will be just as useless as the current ones when it comes to estimating numeric values for anything.

  25. Camels, deserts, desserts. Do I have it right? The great attractor is catastrophic warming.

  26. Take a chaotic, complex system, for which the scientists themselves admit there are significant known unknowns, (like water vapor, clouds etc. that they can’t model), and for which there are clearly at least equally significant unknown unknowns (they don’t even really know how ice ages start and stop).

    You are responsible for deciding whether or not to disrupt the entire global economy by forcing the entire population to stop using affordable, available energy sources.

    Would you:

    1) Admit your state of knowledge is too sparse to adequately model the system and recommend no drastic action until you are able to figure out the known unknowns and at least discover the unknown unknowns.

    2) Construct a single model based on what you do know, even though you know it is woefully incomplete and cannot even hindcast with any accuracy; and recommend the world decarbonize on the basis of that single clearly false model.

    3) Construct numerous models, all of which have the flaws in 2, but claim that the mean of the ensemble of the models somehow compensates for your lack of knowledge of the underlying system, even though the ensemble mean is no better at hindcasting or forecasting than any individual model; and recommend the world decarbonize on the argument that numerous clearly false models when taken together are somehow magically more accurate.

    If you would choose option 3, youuuuu might be a warmist.
    (with apologies to Jeff Foxworthy)

    • You are responsible for predicting an election.

      Would you:

      1) Take models that have a strong track record of accuracy, and no history of partisan bias, and consider them to be viable within the stated margin of error

      2) Dream up flaws with the models, based on no actual evidence but much wishful thinking, and claim that the models with a strong track record of accuracy and no history of partisan bias are “skewed” to promote an agenda and are part of a conspiracy to rig the election.

      If you would choose option 2, youuuuu might be GaryM.

      • Blatant ad hominem.

      • No, that is an ad joshua, a put down with a typo.

      • Joshua

        If only what you have written- “Take models that have a strong track record of accuracy” – was true. In order to have a strong record of accuracy a model would need to demonstrate that it reasonably accurately matched observed conditions. Which GCMs do you believe had this prior to their being used?

    • AK and cap’n.

      The amazing thing about annoying pests is that, if you ignore them long enough, they really do go away. This gnat currently buzzing around my head will just keep buzzing if I, or others, pay it any attention at all. The parent of any 2 year old can tell you – any attention at all is better than no attention to a spoiled child who desperately craves attention, but has nothing of any importance to say.

    • Steven Mosher

      GaryM

      The mean of all models is better at hindcasting.

      • Steven –

        It struck me a couple of weeks ago – when max was constantly pushing his mean of climate sensitivity estimates by collecting and then averaging various estimates that resulted from completely different methodologies – that such a practice would be statistically invalid in that it would be creating estimates resulting from unrelated processes. That would also seem to me to be the difference between averaging multiple runs of the same model versus averaging the results of different models. Comparing one run of a given model to another run of that same model would seem to be meaningful. Looking at the run of one model compared to another model would seem to be useful in the sense of getting an idea of the range of possibilities – but not in the sense of deriving a meaningful average.

        Now I know nothing about statistics, and I am quite willing to accept that my lack of seeing the validity of averaging different types of models is due to my ignorance. But I still have a question as to whether you are saying that comparing across different models does, in fact, create a valid “mean,” or if you are saying that the models in question are similar enough in methodology that they are similar to comparing multiple runs of the same model?

      • Steven Mosher,

        The import of your comment depends in large part on what the definition of “better” is in this context.

        Are the results showing this better accuracy reported anywhere accessible (and intelligible) to the, shall we say, non-scientific?

      • Chief Hydrologist

        ‘In each of these model–ensemble comparison studies, there are important but difficult questions: How well selected are the models for their plausibility? How much of the ensemble spread is reducible by further model improvements? How well can the spread can be explained by analysis of model differences? How much is irreducible imprecision in an AOS?

        Simplistically, despite the opportunistic assemblage of the various AOS model ensembles, we can view the spreads in their results as upper bounds on their irreducible imprecision. Optimistically, we might think this upper bound is a substantial overestimate because AOS models are evolving and improving. Pessimistically, we can worry that the ensembles contain insufficient samples of possible plausible models, so the spreads may underestimate the true level of irreducible imprecision (cf., ref. 23). Realistically, we do not yet know how to make this assessment with confidence.’ http://www.pnas.org/content/104/21/8709.long

        If there is no true spread the mean is meaningless.

      • Steven Mosher

        What I’m suggesting is a thoroughgoing pragmatic approach.

        Let’s start with a parallel.

        I have a model of Apple’s performance as a company. Let’s say it considers revenues, costs and past performance.
        Let’s say you have a model of Apple’s performance as a company.
        Let’s say it looks at totally different things, like a technical analysis
        of the chart.
        Let’s suppose other people have models; say some have insider information.

        So we all have different models from highly mathematical to gut feels.

        Now we have a series of questions

        1. can we average them? Sure
        2. is averaging them ‘valid’ good question
        3. does averaging them work? easy to test.

        So, the debate is centered on #2. is averaging them valid?

        I put the question differently: if averaging them works better to forecast than not averaging them, or works better than selecting the “best” model, then of course you would average them. And what I’m suggesting by my example is that there may be something akin to the wisdom of markets at work. Of course this could be improved by punishing those who have bad forecasts; that leads to a weighting approach, as Judith outlines.
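Mosher's question 3, "does averaging them work? easy to test", plus the weighting idea, can be sketched in Python. Everything here is an illustrative assumption: the three hindcast series and the observations are invented. Inverse-MSE weighting is just one simple way to "punish" bad forecasts, not necessarily the scheme Judith outlines. The test scores each model, the equal-weight mean, and the weighted mean against the observations.

```python
from math import sqrt

def rmse(pred, obs):
    """Root-mean-square error of a forecast series against observations."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def combine(runs, weights=None):
    """Point-by-point weighted mean of several model series (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(runs)
    total = sum(weights)
    return [sum(w * run[t] for w, run in zip(weights, runs)) / total
            for t in range(len(runs[0]))]

def inverse_mse_weights(runs, obs):
    """Weight each model by 1/MSE, i.e. 'punish' models with bad past forecasts."""
    return [1.0 / rmse(run, obs) ** 2 for run in runs]

# Invented hindcasts from three hypothetical models and matching observations.
obs = [0.0, 0.1, 0.2, 0.3]
runs = [
    [0.2, 0.3, 0.4, 0.5],   # biased warm
    [-0.1, 0.0, 0.1, 0.2],  # biased cool
    [0.1, 0.0, 0.3, 0.2],   # unbiased but noisy
]

for i, run in enumerate(runs):
    print(f"model {i}: RMSE {rmse(run, obs):.3f}")
print(f"equal-weight mean: RMSE {rmse(combine(runs), obs):.3f}")
print(f"inverse-MSE weighted mean: RMSE "
      f"{rmse(combine(runs, inverse_mse_weights(runs, obs)), obs):.3f}")
```

With these invented numbers the equal-weight mean (RMSE about 0.047) beats every individual model (0.1 to 0.2), while inverse-MSE weighting (about 0.050) does not improve on it. Under squared error the equal-weight mean can never score worse than the average member, because squared error is convex. It can still score worse than the best member, though. Weights fitted on the same period they are scored on will also flatter the result, so in practice they would be estimated on one period and tested on another.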

      • “If averaging them works better to forecast than not averaging them, or works better than selecting the “best” model, then of course you would average them. And what I’m suggesting by my example is that there may be something akin to the wisdom of markets at work.”

        That is not how markets work at all. Markets work regardless of the lack of information of any individual participant, because that knowledge IS held by other individuals somewhere in the marketplace. The value of wood to a pencil maker for instance. The knowledge is disperse, but it is known. The market is just the means it is communicated, often without the participants having a clue why.

        In the case of models, particularly climate models, the knowledge necessary to come to the right (OK, sufficiently useful) result may simply be unknown. No number of models, no method of statistical analysis, can change that fundamental lack of knowledge.

        Not to mention, the stakes are somewhat different between deciding whether to invest in Apple stock and deconstructing the entire global economy. Any time the CAGWers want to drop the push for decarbonization, let us know, and you all can set all the rules for debate you want, including over the “usefulness” of climate models.

      • Chief Hydrologist

        ‘It is also the case that model-specific biases, both in the mean state and in the internal variability, lead to under-dispersion in the ensemble. This has led to the use of multi-model ensembles in which the differing model-specific biases allow the forecast phase space to be sampled more completely with therefore greater reliability in the ensemble prediction system [17]. However, it has to be recognized that, compared with the stochastic parametrization approach, the multi-model ensemble is a rather ‘ad hoc’ concept and, as discussed below, is dependent on those models that happen to be available at the time of forecasts.’

        Exploring bias in the models in an ad hoc approach?

        ‘Uncertainty in climate-change projections3 has traditionally been assessed using multi-model ensembles of the type shown in figure 9, essentially an ‘ensemble of opportunity’. The strength of this approach is that each model differs substantially in its structural assumptions and each has been extensively tested. The credibility of its projection is derived from evaluation of its simulation of the current climate against a wide range of observations. However, there are also significant limitations to this approach. The ensemble has not been designed to test the range of possible outcomes. Its size is too small (typically 10–20 members) to give robust estimates of the most likely changes and associated uncertainties and therefore it is hard to use in risk assessments.’

        http://rsta.royalsocietypublishing.org/content/369/1956/4751.full

        Nonlinearity, which is central to climate models, poses different problems. The fact remains that any member of a multi-model ensemble is subjectively chosen from a range of possible outcomes, and so we essentially get human bias that can’t be controlled for.

        ‘Of course, models can be formulated that eschew these practices. They are mathematically safer to use, but they are less plausibly similar to nature, with suppressed intrinsic variability, important missing effects, and excessive mixing and dissipation rates.

        AOS models are therefore to be judged by their degree of plausibility, not whether they are correct or best. This perspective extends to the component discrete algorithms, parameterizations, and coupling breadth: There are better or worse choices (some seemingly satisfactory for their purpose or others needing repair) but not correct or best ones. The bases for judging are a priori formulation, representing the relevant natural processes and choosing the discrete algorithms, and a posteriori solution behavior.’ James McWilliams

        There are two methods: the multi-model ensemble, which is based on choosing solutions subjectively, and the perturbed model approach, which is theoretically better at exploring the range of feasible solutions but whose model deficiencies and statistical methodologies are still in development.

        ‘Atmospheric and oceanic computational simulation models often successfully depict chaotic space–time patterns, flow phenomena, dynamical balances, and equilibrium distributions that mimic nature. This success is accomplished through necessary but nonunique choices for discrete algorithms, parameterizations, and coupled contributing processes that introduce structural instability into the model. Therefore, we should expect a degree of irreducible imprecision in quantitative correspondences with nature, even with plausibly formulated models and careful calibration (tuning) to several empirical measures. Where precision is an issue (e.g., in a climate forecast), only simulation ensembles made across systematically designed model families allow an estimate of the level of relevant irreducible imprecision.’ http://www.pnas.org/content/104/21/8709.long

  27. David L. Hagen

    curryja
    Compliments on learning via reforecasting etc. to remarkably improved forecasts.
    How can that be applied to improving climate forecasts over IPCC’s hot, hotter and very hot forecasts?

  28. Steven Mosher

    Here is an interesting parallel case that is even more extreme

    Forecasting september ice extent

    http://www.arcus.org/search/seaiceoutlook/2012/summary

    Every year multiple groups using different methodologies attempt to forecast September ice extent.

    The methods are not even comparable: statistical; physics modelling; heuristics, and a combination of these.

    Some come with uncertainty others do not.

  29. R. Gates, Skeptical Warmist, etc.

    Excellent post, Dr. Curry. I’ll look forward to Part II. Considering the very different intended functions of climate models and weather forecast models, it seems a bit unfair to even speak about ensembles for each in the same way. If an ensemble of climate models is correct over some longer period it would be out of luck, as too much natural variability should have sent them way off course of actual climate as it unfolds. Models can be wrong about specifics but still right about the underlying dynamics. If an ensemble of weather forecasts is continually off the mark over the short time frames they are usually intended for, you can look for some serious problems with the dynamics of the models involved.

  30. Willis Eschenbach

    Steven (Mosher), thanks for your comments. I’d said:

    “As you point out, some don’t use volcanic forcings at all. This is not a small difference. The exclusion of the cooling volcanoes from the GISS forcing increases the average forcing by about 40% … To me, this is already “vastly different” from those that don’t, although YMMV.”

    You replied

    1. That’s not accurate. You should actually look at results with and without volcanic forcing. It’s part of the experimental design.

    I fear that’s unresponsive, because I was talking about the difference in the forcings. For that, looking at the results as you recommend is uninformative.

    Willis, I’m well aware that the instructions permit some latitude in forcings.

    Your claim was that the forcings were vastly different.

    That is a TESTABLE claim.

    As to whether the forcings are “vastly” different, that’s not TESTABLE, and likely you’re right, it’s an inaccurate or vague overstatement. So let me put some more numbers on it, that’s testable. You agree that not all models use volcanic forcings. If you remove the volcanic forcings from the CMIP5 exemplar forcings dataset, the trend of the 20th century forcing increases by about 25%. Vast? Yeah, you’re right, likely not. But significant? Surely.

    Do bear in mind, however, that my statement was in response to your claim that

    They do not have ‘vastly different’ forcings. For CMIP 5 for example the forcing data is very tightly constrained.

    You said the forcings were “tightly constrained” by the CMIP5. They are not—in fact, forcings aren’t constrained by the CMIP5 at all. The concentrations and the emissions are constrained. The dataset I just cited says in capital letters so folks don’t miss it:

    NOTE: THIS FORCING DATASET IS NOT CMIP5 RECOMMENDATION, AS CONCENTRATIONS, NOT FORCING, SHALL BE PRESCRIBED IN MAIN CMIP5 RUNS.

    In addition, for those models that do use volcanic forcing, there is no agreed upon dataset. Modelers are free to choose at will. The exemplar CMIP5 forcing cited above gives -1.4 and -0.9 W/m2 as peak forcings for Pinatubo and El Chichon respectively. The GISS dataset gives -2.8 and -1.7 W/m2 for the same two. The Crowley dataset gives -3.7 and -3.1 W/m2 for the two eruptions. That’s about a three-to-one difference from highest to lowest … and the modelers are free to choose. Since volcanoes are the third largest of the forcings (after GHGs and aerosols), this allows wide variation in the total forcing.

    As a result, your following statement is incorrect:

    You do not establish your claim by pointing to a document that ALLOWS differences. You establish that claim by actually getting the forcing files from the modelers.

    I was not looking to establish my claim. I was simply responding to your erroneous idea that the forcings were “tightly constrained” by the CMIP. When you say the forcings are tightly constrained, a document that ALLOWS differences of all kinds falsifies your claim. And certainly the CMIP statement in capital letters falsifies your claim.

    You continue:

    you need to check out the metadata. Then do the comparisons.

    When you do that, then you can come back and explain in watts how vastly different the forcings are.

    Hint: your ability to model the models as a simple function should tell you something about the underlying forcings being very similar

    Actually, my ability to model the models as a simple function says nothing about the similarity of the underlying forcings used by the different modeling groups. To the contrary, I have pointed out that the models’ climate sensitivity “lambda” is simply a constant times the trend ratio, i.e. the trend of the particular forcings used divided by the trend of the model response. The trends of the forcings are different in various models, sometimes significantly so. My simple model doesn’t mean that the forcings are similar; that’s a misunderstanding of my results.

    Regarding the difference in watts of the forcings, see my various calculations above.

    Finally, my thanks for the detail and clarity of your comments, much appreciated.

    All the best,

    w.
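The "about three-to-one difference from highest to lowest" in Willis's comment can be checked directly from the peak forcings he quotes. Here is a trivial Python sketch. The numbers are exactly those in the comment. The only assumption is that "difference from highest to lowest" means the ratio of the largest to the smallest peak magnitude across the three datasets, taken per eruption.

```python
# Peak volcanic forcings in W/m^2, exactly as quoted in Willis's comment.
peak_forcing = {
    "CMIP5 exemplar": {"Pinatubo": -1.4, "El Chichon": -0.9},
    "GISS": {"Pinatubo": -2.8, "El Chichon": -1.7},
    "Crowley": {"Pinatubo": -3.7, "El Chichon": -3.1},
}

def spread_ratio(eruption):
    """Largest peak magnitude divided by the smallest, across the datasets."""
    magnitudes = [abs(dataset[eruption]) for dataset in peak_forcing.values()]
    return max(magnitudes) / min(magnitudes)

for eruption in ("Pinatubo", "El Chichon"):
    print(f"{eruption}: {spread_ratio(eruption):.1f} to 1")
# Pinatubo: 2.6 to 1
# El Chichon: 3.4 to 1
```

So "about three-to-one" is a fair summary of the two eruptions taken together.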

  31. Willis Eschenbach has done an interesting analysis that seems to show that the outputs of the models have gradually converged in one respect: the outputs track the inputs, lagged. What happens inside the models does not appear to be a linear transformation. Willis claims this is true of both the ensembles and the individual models.

    http://wattsupwiththat.com/2013/06/25/the-thousand-year-model/

    If the GCMs are equivalent to time-series models of the ARIMA form then one condition is stationarity. This is addressed by an econometric technique called polynomial cointegration analysis.

    An Israeli group carried out such an analysis of the inputs (GHG, temperature and solar irradiance data) and concluded,

    “We have shown that anthropogenic forcings do not polynomially cointegrate with global temperature and solar irradiance. Therefore, data for 1880–2007 do not support the anthropogenic interpretation of global warming during this period.”

    Beenstock, Reingewertz, and Paldor, Polynomial cointegration tests of anthropogenic impact on global warming, Earth Syst. Dynam. Discuss., 3, 561–596, 2012.

    URL: http://www.earth-syst-dynam-discuss.net/3/561/2012/esdd-3-561-2012.html.

  32. Sorry Willis,

    Correction: “What happens inside the models appears to be linear transformation.”

    My original said “…does not appear…”, which was unintended.
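Comment 31, as corrected in comment 32, says model output behaves like a lagged linear transformation of the input forcing. Below is a minimal Python sketch of one such transformation. It assumes a generic one-box (first-order lag) model in which temperature relaxes toward lam × F with time constant tau. The functional form, the names lam and tau, and the parameter values are illustrative assumptions, not necessarily the exact equation in the linked WUWT analysis.

```python
from math import exp

def lagged_response(forcing, lam, tau, t0=0.0):
    """First-order (one-box) lagged linear response to a forcing series.

    Each step, temperature moves a fraction (1 - exp(-1/tau)) of the way
    from its previous value toward the equilibrium value lam * F.
    lam: sensitivity (K per W/m^2); tau: time constant in steps.
    """
    a = exp(-1.0 / tau)
    temps, t = [], t0
    for f in forcing:
        t = t + (lam * f - t) * (1.0 - a)
        temps.append(t)
    return temps

step = [1.0] * 30  # hypothetical unit step forcing (W/m^2)
temps = lagged_response(step, lam=0.8, tau=3.0)
print(f"{temps[0]:.3f} {temps[2]:.3f} {temps[-1]:.3f}")  # 0.227 0.506 0.800
```

For a step forcing F the output is T[n] = lam × F × (1 − e^(−(n+1)/tau)), which lags the input and approaches lam × F. Starting from t0 = 0 the update is linear in F, so the response to a sum of forcings is the sum of the individual responses. That superposition property is what "linear transformation" means here.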

  33. Just to knock off some of the Moshpit meanderings that distract us all from reality:

    There is nothing wrong with averaging models but there is a great deal wrong with calculating standard deviations. To do that you need to randomise inputs, use a lot of runs then test that until you prove a normal distribution.

    Testing against ‘reality’ is what models are all about. If you can simplify some of the physics to get there then great but you still have to test the result against what happens in real life. If you hit the target then great. If you don’t then that needs to be reported and fixed, not hidden under the carpet or explained away by error bars. The guidance system metaphor is indeed apt; climate models are very much like the Patriot missile system which had a claimed success rate of near 100% and an actual success rate near zero. If Mosher was involved in that it would explain a lot.

    Ultimately, model ensembles work because the individual models are not that bad in the first place (ONLY for forecasts of a few days out), and they are not that bad because they are tested and fixed until they become that way. An ensemble then makes sense, not because of any statistics whatsoever but because of least-squares smoothing, which somehow just seems to give more accurate answers, and we are not yet sure why.
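One standard algebraic identity (not part of the comment) accounts for at least part of why averaging helps under squared-error scoring: at each time step, the average squared error of the members equals the squared error of the ensemble mean plus the ensemble variance. A minimal sketch with hypothetical numbers:

```python
def mse(preds, obs):
    """Mean squared error of one forecast series against observations."""
    return sum((p - o) ** 2 for p, o in zip(preds, obs)) / len(obs)

def error_decomposition(members, obs):
    """Return (MSE of ensemble mean, average member MSE, average ensemble variance).

    At every time step
        mean_i (f_i - o)^2 = (fbar - o)^2 + mean_i (f_i - fbar)^2
    holds exactly, so averaged over time:
        MSE(ensemble mean) = average member MSE - average spread.
    """
    m, n = len(members), len(obs)
    fbar = [sum(mem[t] for mem in members) / m for t in range(n)]
    avg_member_mse = sum(mse(mem, obs) for mem in members) / m
    avg_spread = sum(
        sum((mem[t] - fbar[t]) ** 2 for mem in members) / m for t in range(n)
    ) / n
    return mse(fbar, obs), avg_member_mse, avg_spread

# Hypothetical observations and three hypothetical model forecasts.
obs = [1.0, 2.0, 3.0, 2.0]
members = [
    [1.5, 2.5, 2.0, 2.5],
    [0.5, 1.0, 3.5, 1.0],
    [1.2, 2.8, 3.9, 2.4],
]
mean_mse, member_mse, spread = error_decomposition(members, obs)
print(f"ensemble-mean MSE={mean_mse:.4f}  avg member MSE={member_mse:.4f}  spread={spread:.4f}")
```

On this measure the ensemble mean can never score worse than the average member, and it gains most when members err in different directions. The identity says nothing about a bias shared by all members, which averaging cannot remove.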

    Climate model ensembles are initially weeded to remove any runs that produce cooling. If they did not do that, they might indeed get closer to reality, and farther away from alarmism. Alas, the current efforts give us only a mathematical cloak around conclusions that derive directly from the initial assumptions, which are flawed. The only models that are remotely close to reality are those that do not have positive feedback.

  34. Some useful things you can do with ensembles of calculations from a single computational model:
    1. Determine model sensitivity to initial conditions. By randomly varying initial conditions within the limits of possible error and following the model calculations, you can determine the model’s sensitivity to input uncertainty. This exercise can help determine the usefulness of calculations with a model.
    2. Determine model sensitivity to model parameters. By randomly varying model parameters within the limits of possible error and following the model calculations (for fixed initial conditions), you can determine how sensitive the model is to uncertainty in its parameters.
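Item 1 can be sketched with the Lorenz (1963) system standing in for a real model, which is an assumption for illustration. Twenty hypothetical members start from the same point plus tiny random perturbations, and their spread is tracked as they are integrated with a classical fourth-order Runge-Kutta step.

```python
import random

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivatives of the Lorenz-63 system (classic parameter values)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance one state by dt with the classical fourth-order Runge-Kutta scheme."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(k1, dt / 2))
    k3 = lorenz_rhs(shifted(k2, dt / 2))
    k4 = lorenz_rhs(shifted(k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def x_spread(members):
    """Standard deviation of the x component across ensemble members."""
    xs = [m[0] for m in members]
    mean = sum(xs) / len(xs)
    return (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5

def perturbed_ic_ensemble(n_members=20, eps=1e-6, dt=0.01, n_steps=2500, seed=1):
    """Integrate members that differ only by tiny initial perturbations;
    return the x-spread after every step (index 0 is the initial spread)."""
    rng = random.Random(seed)
    base = (1.0, 1.0, 1.0)
    members = [tuple(b + rng.gauss(0.0, eps) for b in base) for _ in range(n_members)]
    history = [x_spread(members)]
    for _ in range(n_steps):
        members = [rk4_step(m, dt) for m in members]
        history.append(x_spread(members))
    return history

history = perturbed_ic_ensemble()
for step in (0, 500, 1000, 1500, 2000, 2500):
    print(f"t={step * 0.01:5.1f}  spread={history[step]:.3e}")
```

For item 2, the same loop works with a fixed initial state and perturbed parameters (for example, drawing `rho` around 28) once the parameters are threaded through `rk4_step`.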

    By calculating these “model partial derivatives”, you might also see whether the sensitivities would allow parameter identification when fitting the model to real-world data.
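A minimal sketch of those "model partial derivatives" using central finite differences. `toy_model` is a hypothetical exponential-relaxation model, not a climate model, and the cosine check is one simple, assumed way to spot parameters whose effects on the output are nearly indistinguishable.

```python
import math

def toy_model(params, t):
    """Hypothetical model output at time t: relaxation from y0 toward y_eq at rate k."""
    y0, y_eq, k = params
    return y_eq + (y0 - y_eq) * math.exp(-k * t)

def sensitivities(model, params, t, rel_step=1e-6):
    """Central-difference estimates of d(output)/d(param_j) for each parameter."""
    grads = []
    for j, p in enumerate(params):
        h = rel_step * max(abs(p), 1.0)
        up, down = list(params), list(params)
        up[j], down[j] = p + h, p - h
        grads.append((model(up, t) - model(down, t)) / (2.0 * h))
    return grads

def column_cosine(jac, i, j):
    """Cosine similarity between two parameter columns of the sensitivity matrix.
    Values near +1 or -1 mean the two parameters have nearly the same effect on
    the output, so fitting data may not be able to tell them apart."""
    a = [row[i] for row in jac]
    b = [row[j] for row in jac]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

params = [0.0, 1.0, 0.5]  # hypothetical y0, y_eq, k
times = [0.5 * i for i in range(1, 21)]
jac = [sensitivities(toy_model, params, t) for t in times]
print("sensitivities at t=2:", sensitivities(toy_model, params, 2.0))
print("cosine(y_eq, k):", column_cosine(jac, 1, 2))
```

Each row of `jac` holds the sensitivities at one time. Columns that are nearly parallel suggest that fitting real-world data could not identify those parameters separately.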

    I’ve not seen a single attempt to perform these calculations with climate models.

  35. Judith, great post from someone whose business reputation depends on practical results. We do not live in a perfect world. Both RGB and Briggs are ‘right’ from narrow perspectives, but they talk past each other and miss your main point. RGB is rigorously correct that computing the variance of an ensemble of models, which by definition does not obey the central limit theorem and all that follows from it, is nonsense. IPCC does commit this ‘sin’. Briggs is right that you can still derive useful information in a less rigorous sense (perhaps Bayesian philosophically, but not rigorously).

    “All models are wrong. Some are useful.” You simply describe how, in practice, you make the best of what you have. That will be far less certain than the IPCC false impression of consensus certainty, but far more useful information than those who assert CO2 does absolutely nothing. Both positions are now provably wrong. The evidence seems to point to a significant model hot bias (for reasons posted elsewhere and in my book), an underappreciation of natural variability (with deliberately misleading attempts by people such as Mann to suppress past natural variability, exposed by work such as Tonyb’s historical research), and an ECS perhaps half of the IPCC consensus, which of course says CAGW actions should be halted and/or reconsidered.
    What the above comments to your post show is how difficult it is to maintain the reasoned middle ground in the midst of a polarized, politicized debate.
    Thank you for continuing to do so.

    • “IPCC does commit this ‘sin’.”
      I don’t know if it’s a sin. But I was never able to get RGB or anyone to show where they do it in AR5.

    • “Briggs is right that you can still derive useful information in a less rigorous sense (perhaps Bayesian philosophically, but not rigorously).”

      Using statistical terminology in a less rigorous context requires a concurrent, detailed explanation of the terms as used. Failure to do so can lead (and I suspect in many cases does lead) to misunderstanding and/or deception. Many people see statistical terms bandied about (e.g., standard deviation, standard error, confidence interval) and jump to inferences that may or may not be appropriate for the calculations presented.

      Given the stakes involved in the climate debate, it would seem that clarity and detail are paramount, regardless of the camp presenting the argument.
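The difference between two of the terms listed above matters in practice. A minimal sketch with hypothetical numbers, not taken from any model archive:

```python
import math
import statistics

# Hypothetical warming trends (degC per decade) from five models: illustrative only.
trends = [0.18, 0.22, 0.25, 0.30, 0.35]

n = len(trends)
mean = statistics.fmean(trends)
sd = statistics.stdev(trends)  # spread of the individual models about their mean
se = sd / math.sqrt(n)         # standard error of the mean: only meaningful if the
                               # models behave like independent random draws
print(f"n={n} mean={mean:.3f} sd={sd:.3f} se={se:.3f}")
```

The standard deviation describes how far the models disagree. The standard error shrinks as more models are added, and it carries its usual meaning only if the models are independent random draws from some distribution, which is part of what is at issue between RGB and Briggs.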

    • Rud,
      If they implement the physics identically and differ only in initialization, you can use them to explore initialization parameters, and those runs you can average. But I don’t think the physics is implemented identically; the models are not an exploration of input parameters but an exploration of the developers’ assumptions (biases, if you will). And when an assumption generates results that are known to be outside actual measurements, the model needs to be culled. If it had some redeeming feature, that assumption should be used in other models that do resemble measurements. But averaging them together does no good.

      “That will be far less certain than the IPCC false impression of consensus certainty, but far more useful information than those who assert CO2 does absolutely nothing.”
      This depends on whether measurements show it does nothing; temperature anomalies are a red herring.
      And yes, I think the difference between daytime warming and nighttime cooling does mean something more, and it isn’t saying CO2 is driving anything.

  36. Pingback: How should we interpret an ensemble of models? Part II: Climate models | Climate Etc.

  37. Pingback: Weekly Climate and Energy News Roundup | Watts Up With That?