Waving the Italian flag. Part I: uncertainty and pedigree

by Judith Curry

The Italian flag (IF) is a representation of three-valued logic in which evidence for a proposition is represented as green, evidence against is represented as red, and residual uncertainty is represented as white.  The white area reflects uncommitted belief, which can be associated with uncertainty in evidence or unknowns.

The IF was introduced and applied previously on the hurricane, doubt, and detection and attribution threads. The IF was used on these threads as a heuristic device to enable understanding of the role of uncertainty in scientific problems where there are “conflicting certainties” and expert assessments of confidence levels.

These applications of the IF have engendered much confusion in the climate blogosphere.  In the interests of further developing ideas for applying the IF to aspects of the climate problem, I am devoting a two-part series to the IF.

If you intend to follow this closely, you need to download the papers discussed below: the Tesla document, Moss and Schneider, and Refsgaard et al.

I refer to numerous figures from these papers (note: if you don’t like homework, just download these to refer to the figures).

The IPCC’s logic for evidential judgments

Before describing the IF, I provide context regarding the IPCC’s framework. The IPCC’s logic for evidential judgments is described by Moss and Schneider.  As summarized by Morgan et al.:

Guidance developed by Moss and Schneider (2000) for the IPCC on dealing with uncertainty describes two key attributes that they argue are important in any judgment about climate change: the amount of evidence available to support the judgment being made and the degree of consensus within the scientific community about that judgment.

In Fig. 4 of Moss and Schneider, the judgment has two dimensions: on the x-axis is the amount of evidence (e.g. model output, observations), and on the y-axis is the level of agreement/consensus.  On this chart, four boxes are delineated that are referred to as “state of knowledge” descriptors:

• Well-established (high amount of evidence, high consensus): models incorporate known processes; observations largely consistent with models for important variables; or multiple lines of evidence support the finding

• Established but Incomplete (low amount of evidence, high consensus): models incorporate most known processes, although some parameterizations may not be well tested; observations are somewhat consistent with theoretical or model results but incomplete; current empirical estimates are well founded, but the possibility of changes in governing processes over time is considerable; or only one or a few lines of evidence support the finding

• Competing Explanations (high amount of evidence, low consensus): different model representations account for different aspects of observations or evidence, or incorporate different aspects of key processes, leading to competing explanations

• Speculative (low amount of evidence, low consensus): conceptually plausible ideas that haven’t received much attention in the literature or that are laced with difficult-to-reduce uncertainties or have few available observational tests

The Moss and Schneider document also includes this recommendation:

6. Prepare a “traceable account” of how the [uncertainty] estimates were constructed that describes the writing team’s reasons for adopting a particular probability distribution, including important lines of evidence used, standards of evidence applied, approaches to combining/reconciling multiple lines of evidence, explicit explanations of methods for aggregation, and critical uncertainties. In constructing the composite distributions, it is important to include a “traceable account” of how the estimates were constructed.

Good recommendation.  Unfortunately the IPCC hasn’t heeded it.

Three-value logic

The following description is from the Tesla document:

Evidential judgments based on classical probability theory follow two-value logic, whereby evidence must either be in favour of a hypothesis, or against it.  [That is,] evidence for and against are treated as complementary concepts (i.e. p(A) + p(not A) = 1, where p(A) is the probability of event A occurring, or in other words the evidence supporting the occurrence of A).

Three-value logic extends this to allow for a measure of uncertainty as well, recognizing that belief in a proposition may be only partial and that some level of belief concerning the meaning of the evidence may be assigned to an uncommitted state.  Uncertainties are handled as “intervals” that enable the admission of a general level of uncertainty, providing a recognition that information may be incomplete and possibly inconsistent (i.e. evidence for + evidence against + uncertainty = 1).  This is represented visually by the “Italian flag”, in which evidence for a proposition is represented as green, evidence against as red, and residual uncertainty is white. . . As an alternative to the Italian flag representation, the values may be simply represented in the triplet form [evidence for, uncertainty, evidence against].

The overall assessment of degree of belief in the evidence, b(E), needs to take into account the net value of the evidence that exists, n(E), and the estimated uncertainty due to lack of knowledge, k(E).  Thus b(E) = n(E) × k(E).   The residual uncertainty [is] given by 1 – (b(E) + b(not E)). A parallel analysis is then conducted for b(not E).  With the three-valued formalism, evidence for and evidence against can be evaluated independently, each ranging from 0 to 1, with uncertainty taking a value from -1 to 1.  An uncertainty of 1 implies that there is no evidence at all on which to base a judgment, whereas a negative value indicates a situation in which the evidence appears to be in conflict.  [W]here evidence is in conflict (say, for example, [0.65, -0.32, 0.67]), [the white portion of the IF] is indicated by a yellow central bar.

The uncertainty due to lack of knowledge, k(E), is determined (using expert judgment) as the ratio of the information you actually have to the information you would ideally wish to have in order to be confident in the judgment.

The net value of the evidence, n(E), is determined as a function of the face value of the evidence and the confidence in the evidence (Figure 5 in the Tesla document).  Figure 5 in Tesla is similar to Figure 4 in Moss and Schneider, if we equate consensus/agreement with confidence and face value of evidence with amount of evidence; hence the IPCC judgment could be interpreted as b(E) = n(E).  Equating the labels in these two diagrams is not entirely appropriate, however, given the verbal descriptions for the four boxes in Figure 4 of Moss and Schneider. The IF expression for belief in the evidence is

b(E) = n(E) × k(E) = 1 – b(not E) – residual uncertainty

Relative to the IPCC method that considers the amount of evidence and the consensus regarding this evidence, the IF method explicitly (and traceably) includes the estimated uncertainty due to lack of knowledge, the incompleteness of the information, and belief in competing hypotheses.
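For readers who find code clearer than notation, here is a minimal sketch of the Italian flag arithmetic in Python. The function names and the input numbers are my own illustration, not part of the Tesla document; the only relationships assumed are the ones quoted above: b(E) = n(E) × k(E), evaluated in parallel for E and not E, with residual uncertainty 1 – (b(E) + b(not E)), which goes negative when evidence conflicts.

```python
def belief(net_value, knowledge):
    """b(E) = n(E) * k(E): the net value of the evidence, n(E), scaled by
    k(E), the ratio of information held to information ideally needed."""
    return net_value * knowledge

def italian_flag(b_for, b_against):
    """Return the triplet [evidence for, uncertainty, evidence against].
    A negative middle value signals conflicting evidence (the 'yellow bar')."""
    return [b_for, 1.0 - (b_for + b_against), b_against]

# Illustrative numbers only, chosen to echo the conflicting-evidence
# example [0.65, -0.32, 0.67] quoted from the Tesla document:
b_for = belief(net_value=0.93, knowledge=0.70)       # b(E) ~ 0.65
b_against = belief(net_value=0.84, knowledge=0.80)   # b(not E) ~ 0.67
print([round(x, 2) for x in italian_flag(b_for, b_against)])
# -> [0.65, -0.32, 0.67]
```

Because the two beliefs are assessed independently, the triplet is free to sum away from 1, which is exactly what distinguishes the conflicting-evidence case from the two-valued framework.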

Assessing uncertainty

With regard to the “white” portion of the Italian flag, the Tesla document describes the residual uncertainty in the following way:

There are many potential contributions to residual uncertainty in the treatment of evidence; in essence, the assignment of a level of belief to an uncommitted state (i.e. neither for nor against) ought to reflect “anything we are not sure of.”  This incorporates not only the awareness that exists in relation to uncertainties in the system under review and its behaviour, but also a measure of degree of belief in that understanding.

The assessment of uncertainty is not straightforward.  As a reminder, consider the uncertainty lexicon on the previous uncertainty monster thread and also the section on climate model imperfections on the what can we learn from climate models thread.   Figure 3 in Refsgaard et al. provides a useful summary of the uncertainty taxonomy.

Refsgaard et al. provide a framework and guidance for assessing and characterizing uncertainty in the context of environmental modeling. They review 14 different methods commonly used in uncertainty assessment and characterization, 12 of which are relevant for the types of problems considered here (IPCC WG1): data uncertainty engine (DUE), error propagation equations, expert elicitation, inverse modelling (parameter estimation), inverse modelling (predictive uncertainty), Monte Carlo analysis, multiple model simulation, NUSAP, quality assurance, scenario analysis, sensitivity analysis, and uncertainty matrix.

Table 5 of Refsgaard et al. categorizes the different methods according to their utility for the following:

  • Methods for preliminary identification and characterization of sources of uncertainty
  • Methods to assess the levels of uncertainty for the various sources of uncertainty
  • Methods to propagate uncertainty through models
  • Methods to trace and rank sources of uncertainty

The focus here is on qualitative methods for identification and characterization of uncertainty and assessment of levels of uncertainty.  Propagation of uncertainty is the topic of Part II in this series.

The uncertainty matrix (Table 1 in Refsgaard et al.) can be used to provide an overview of the various sources of uncertainty.  The vertical axis lists the locations or sources of uncertainty, while the horizontal axis covers the level and nature of uncertainty for each uncertainty location.
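As a rough sketch of how such a matrix might be laid out in code, consider the following; the row and column entries are illustrative paraphrases of the taxonomy that Refsgaard et al. build on, not a verbatim reproduction of their Table 1.

```python
# Rows: locations/sources of uncertainty; columns: the level and nature of
# the uncertainty at each location. Labels are illustrative, not Table 1 verbatim.
uncertainty_matrix = {
    "context":         {"level": "scenario uncertainty",    "nature": "epistemic"},
    "input data":      {"level": "statistical uncertainty", "nature": "stochastic"},
    "model structure": {"level": "recognised ignorance",    "nature": "epistemic"},
    "parameters":      {"level": "statistical uncertainty", "nature": "epistemic"},
}

for source, cell in uncertainty_matrix.items():
    print(f"{source:15s}  level: {cell['level']:25s}  nature: {cell['nature']}")
```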

Funtowicz and Ravetz (1990) introduced the NUSAP system for multidimensional uncertainty analysis.  The NUSAP acronym stands for numeral, unit, spread, assessment, pedigree.  NUSAP combines quantitative analysis (numeral, unit, spread) with expert judgment of reliability (assessment) and the reliability of the knowledge base (pedigree).

As described by Refsgaard et al.,

The strength of NUSAP is its integration of quantitative and qualitative uncertainty. It can be used on different levels of comprehensiveness: from a ‘back of the envelope’ sketch based on self elicitation to a comprehensive and sophisticated procedure involving structured, informed, in-depth group discussions on a parameter by parameter format. The key limitation is that the scoring of pedigree criteria is to a large extent based on subjective judgements. Therefore, outcomes may be sensitive to the selection of experts.
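To fix ideas, here is one way a single NUSAP entry might be recorded in code; the field layout follows the acronym, but the parameter chosen and all of the values are hypothetical, not taken from Funtowicz and Ravetz or Refsgaard et al.

```python
from dataclasses import dataclass, field

@dataclass
class NusapEntry:
    numeral: float          # the quantitative estimate itself
    unit: str               # the units of the estimate
    spread: float           # quantitative spread, e.g. a standard deviation
    assessment: str         # expert judgment of the estimate's reliability
    pedigree: dict = field(default_factory=dict)  # criterion -> score for the knowledge base

# Hypothetical entry for an illustrative model parameter:
entry = NusapEntry(
    numeral=3.0,
    unit="K per CO2 doubling",
    spread=1.5,
    assessment="fair",
    pedigree={"theoretical basis": 3, "validation": 1, "objectivity": 2},
)
print(entry)
```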

Pedigree

Funtowicz and Ravetz’s concept of pedigree is described in the Tesla document, whereby pedigree relates to the origin and trustworthiness of the knowledge.   Pedigree is evaluated in a chart (Figure 6 in the Tesla document), whereby the columns are quality indicators that include theoretical basis, scientific method, auditability, calibration, validation, and objectivity.  The rows describe quality scores, ranging from very low to very high.
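A minimal sketch of how such a chart might be reduced to a single number follows; the six indicator columns are the ones listed above, but the 0 to 4 scoring scale and the simple-mean aggregation are my own assumptions, not the Tesla procedure.

```python
# Quality indicators (columns of Figure 6 in the Tesla document), scored
# here on an assumed 0 (very low) to 4 (very high) scale.
INDICATORS = ["theoretical basis", "scientific method", "auditability",
              "calibration", "validation", "objectivity"]

def pedigree_strength(scores, top=4):
    """Normalise the indicator scores to [0, 1] via a simple (assumed) mean."""
    return sum(scores[name] for name in INDICATORS) / (top * len(INDICATORS))

scores = {"theoretical basis": 3, "scientific method": 4, "auditability": 2,
          "calibration": 2, "validation": 1, "objectivity": 2}
print(round(pedigree_strength(scores), 2))  # -> 0.58
```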

Objectivity is described by the Tesla document as:

Whilst the scientific method provides a logical framework for improving understanding, it does not guarantee objectivity.  The influence of entrenched values, motivational bias and peer and institutional pressures may obscure true objectivity.  In order to maintain a check on the quality and objectivity of our interpretations we rely on peer review and exposure to critique through peer-reviewed publication.  This indicator is used to give a judgment on the extent to which information can be said to be objective and free from bias.

Why use the Italian flag?

As described by the Tesla document:

Moreover, whilst there may be a large volume of information relating to the [hypothesis] at hand, it may on the whole be only of partial relevance, incomplete and/or uncertain, or even conflicting in terms of the level of support it provides for a given interpretation.  The range of available evidence may appear to give an indistinct picture, with no clear indication of how best to target resources in order to improve understanding.  There may be disputed interpretations, perhaps because some practitioners appear to be biased by excessive reliance on a particular source of evidence in the face of contradictory, or seemingly more equivocal, evidence from elsewhere.  Hence, in order to provide a justified interpretation of the available evidence, which can be audit-traced from start to finish, it is necessary to examine and make visible judgments on both the quality of the data and the quality of the interpretation and modelling process.

Returning to the issue of the IPCC’s statement in the AR4 regarding attribution of 20th century warming:

Most of the observed increase in global average temperatures since the mid-20th century is very likely due to the observed increase in anthropogenic greenhouse gas concentrations.

which was discussed previously on the detection and attribution threads.  While much evidence is presented in the AR4, there is an absence of traceability in the uncertainty analysis, which makes the statement ambiguous and leaves it defensible only by appeal to a “consensus.”

Reasoning about uncertainty in the presence of substantial amounts of often conflicting information with varying levels of quality and reliability will be the topic of Part II.

95 responses to “Waving the Italian flag. Part I: uncertainty and pedigree”

1. Kind of a narrow “indicator” (“This indicator is used”). Peer review and publication are apparently unsuited to handle contentious issues, because the reviewers and publications can be fairly readily co-opted by groupthink, not to mention money-think.

• If the system were done correctly and the science thoroughly reviewed, there would be very little left in the magazine to publish (except advertisers and sponsors).

  2. The AGW flag had no white!

3. While much evidence is presented in the AR4, there is an absence of traceability in the uncertainty analysis, which makes the statement ambiguous and leaves it defensible only by appeal to a “consensus.”

    And this is by no means the only problem. While much evidence is presented in AR4, much more evidence is not. The exclusion of evidence which goes against the ‘consensus’ POV is what makes AR4 irretrievably flawed and useless as a basis for policy decisions. It is a consensus of those who already agree. This is one of the circularities the entire document is riddled with.

    A true assessment of the state of climate science would look like a bedsheet with red and green edges.

Not that we should be dismayed by that fact. Understanding how the Earth’s climates are altered by the multitude of factors affecting them is the most interesting of puzzles, and we shouldn’t expect to be able to come up with the answers after a mere 20 years of effort. Look how long it took for the formulation of a single law of physics, gravitation, to get sorted out, from Galileo’s tower of Pisa experiment to Newton’s apple bouncing insight.

4. Dr. Curry, are you endorsing the Tesla model? It does not seem to include the existence of alternative hypotheses, which is often the major source of uncertainty in science. Based solely on your quoted material, in Tesla the uncertainty seems to be exclusively in regard to the evidence for or against the hypothesis being evaluated. Evidence for or against a competing hypothesis might be entirely different evidence.

    There is also the common case where there is just very little evidence either way. (Perhaps I need to read the document.)

• David, Tesla explicitly addresses the alternative hypothesis, even suggests that the arguments be made by two different people/groups. Re Tesla, I think steps 1 and 2 make a lot of sense for the kinds of problems involved in climate. When there is little evidence either way, you end up with a big white part of the flag. The interesting case is when you have strong evidence for both hypotheses (say, AGW and natural variability), with the sum of everything exceeding 1, which is a case of conflicting hypotheses.

      • If the hypothesis being tested is;

        Most of the observed increase in global average temperatures since the mid-20th century is very likely due to the observed increase in anthropogenic greenhouse gas concentrations.

        Then although there is lots of ‘strong evidence’ for various parts of climate science, it is the claim that these ‘sub hypotheses’ lead to a 90% certainty for the main hypothesis that really needs assessing. This is the place where a lot of people are suffering snowblindness, and getting green flashes in front of their eyes.

So many of the papers we read use the words “could”, “might” and “may”, and a good many more that apply theoretical physics to the actual climate system as if it were in equilibrium until man set fire to coal don’t use these words, but ought to.

      • Indeed. Perhaps the most interesting question is how the IF’s of detailed (or “sub”) hypotheses should combine to determine the IF of a more general hypothesis that depends upon these details (or premises). At the most detailed level, every measurement is an estimate, with its own uncertainty. For example, in climate science many issues relate to the accuracy of estimated data, especially global temperatures. How do we “flag” these nested uncertainties?

      • “How do we “flag” these nested uncertainties?”

        Thank you David, for neatly defining the question my objections circle around.

      • The challenge for me is how to fit the IF into the issue tree. We are dealing with both sub-hypotheses and counter arguments. The strongest single piece of evidence for the IPCC “mostly human warming” statement is the claim that the models can only replicate the warming using the GHG increases, which are posited to be of human cause. The three primary sub-hypotheses seem to be (1) modeling natural variability (without the GHG increases) does not work, (2) modeling with the GHG increases does work, and (3) the GHG increases are of human origin or cause. So we need an IF for each of these. So far so good.

        The leading counter argument for (1) is that we do not understand natural variability, but we know it occurs, so (1) is an unfounded argument from ignorance. This seems like an argument for a large white area in IF1, as opposed to being counter evidence.

        On the other hand a leading counter argument for (2) is that other model runs with the same parameters do not match the temperature profile. This argument seems more like counter evidence against (2) than uncertainty. There is also the argument that the temperature profile they are trying to match is incorrect, which is clearly an argument for uncertainty.

        The leading counter argument for (3) is that the CO2 increases are caused by the temperature increases, not human emissions. This looks like counter evidence against (3), not uncertainty.

        Note however that both white uncertainty and red counter evidence reduce the overall level of certainty.

      • Yes, the issue tree is key, you have to break it down into many more pieces than I did on the detection and attribution thread. Then you need to formulate the counter argument, which would be that 20th century warming can be explained by natural variability. Two candidates for this are the multidecadal ocean oscillations (e.g. AMO, PDO) and solar, there are others.

      • To which the counter argument is that while there are many candidates there are no actual explanations. It is an interesting situation, non-classical but probably not unusual. One actual hypothesis (AGW) versus a set of possible hypotheses, none of which can presently be precisely formulated. I think this clearly rules out 90% certainty but others obviously differ.

      • On the solar subject, if you work out how much the TSI trend would have to be wrong for solar plus the terrestrial amplification found by Prof. Nir Shaviv http://sciencebits.com/calorimeter to be able to account for global warming, the answer is, not very much at all, and well within the uncertainty of TSI measurement.

        I think this is a strong candidate for an alternative hypothesis.

      • In principle each candidate could be explored, including the evidence and counter evidence, arguments and counter arguments, etc. This is precisely what the IPCC does not do. There are thousands of components in this argument (1) alone.

      • The other issue here is the unknowns.

As we are still discovering factors that influence the climate, it is very difficult to attribute a ‘cause’ to the recent warming to any level of significance.

        If you cannot even define all the possible natural candidates, you certainly cannot put forward the theory that they do not matter (i.e. that co2 is overwhelmingly the cause of the recent warming).

        This is one of the major fallacies of the cAGW theory as i see it.

• As someone well versed in numerous works of Nikola Tesla, who as a student spent hours and hours in his museum, I think it is objectionable that some ‘dubious procedure’ should exploit Tesla’s name. The name of this great scientist should be respected, and those who hope to ‘cash in’ by using it even as an acronym should be ignored.

  5. Alexander Harvey

    Judith,

    I fear that you are on a long old road and I hope you get to the end of it.

    I think that there are bones that need picking over in the application of statistical methods of inference. Whether this is the right way or not I could not say.

    Anyway good luck

    Alex

  6. Dear Prof. Curry,
    All those methods to assess uncertainty might be tools that support decision making. So it might be a device for political decision makers.
    However, they are lacking scientific substance, since one cannot refute the outcome of the algorithm based on experiments.
    This needs to be clearly communicated.
    Best regards
    Günter

    • This is true of induction generally. The weight of evidence is subjective, usually determined by voting, polling or some such procedure. But these procedures can be tested for accuracy, depending on the group specified as the base of opinion. The IPCC estimates are probably very high, unless the base group is a set of AGW proponents, in which case they may be accurate.

      What needs to be understood and communicated is that the level of certainty is basically a psychological result, not a fact of logic.

    • My first thought on reading through the homework and Judith’s commentary was that it wasn’t a long way from some of the tools and techniques used routinely in commercial Management Consultancy assignments for the last twenty or thirty years.

      A bit more formalised perhaps – with impressive looking equations as befits something aimed at academia – but pretty much BAU otherwise.

      Disturbing to see that it comes as new news to some.

  7. I am reminded of Stalin’s (attributed) comment that “Those who cast the votes decide nothing. Those who count the votes decide everything.” To me, the peer review process has been called into question by climategate. The devil will be in the details of who decides what goes where, and as Tallbloke says, what doesn’t. I think it is worth asking how to best ensure objectivity in the process when motive is suspect, which might involve an adversarial process of some sort. It seems like examination of something like treaty creation processes might be useful.

    • “might involve an adversarial process of some sort.”

      The due diligence sector of the legal profession will be rubbing its collective hands together with glee right about now. On a personal note, I have been saving my flight ticket receipts for the last few years, and a spreadsheet of the ‘green fuel levy’ increases. The small claims court will be busy.

    • So those scandals involving geek heinousness just flew right by your oblivious geekiness?

      Geeky odd.

  8. OT /
    Dr. Curry
    You indicated that you may look into the ‘Arctic factor’.
    In this short note I put together two apparently unrelated events. This may be irrelevant, but I think it is still worth a look.
    http://www.vukcevic.talktalk.net/ds.htm
    Thank you.

  9. Judith,

No one has asked why the ARGO program will only give out processed data and absolutely refuses to give out raw data.
    Sea surface temperatures do not give the actual temperatures of the oceans.

    • This really belongs on the raising the game II thread.
      Not just the surface temperatures, but the temperatures at depth. Argo tells us about ocean heat content.

      Why are we not allowed to see the results we paid for?

  10. Dr. Curry, please comment on how the quality control issue is dealt with in more detail. IOW, how is it confidently claimed that something is reliably green, white or red?

    • It does require expert judgment, but it breaks the whole thing down into a number of smaller judgments that are more easily objectified. Breaking a hypothesis down into component parts is discussed in Part II. Looking at the quality/pedigree of the evidence, and then identifying the sources of uncertainty is information that is to be included.

      • The polling science people deal with this sub-issue issue a lot. For example a lot of people when polled will assent to feel-good policy questions but when asked about specific actions they disagree. The challenge is how do we “break down” the science into component parts? The logic of science may not be that well understood. Then how do we combine the components? Can a hypothesis have less uncertainty than its parts? It seems likely that premises and assumptions differ from multiple lines of evidence in this regard. That is, there are different kinds of component parts.

        In any case we may get some good polling questions out of this, which would be helpful. Most of the climate polling is awful precisely because it ignores uncertainty.

      • Yes, by breaking this down, we can see what the premises and assumptions are, including the necessity and relative importance of the components for the overall hypothesis. There is no particularly objective way to combine all this, but it is far better to have the expert judgments made at the level of relatively unambiguous premises, and then combine these in some way, rather than doing the expert judgment at the level of the hypothesis, which is subject to all sorts of crazy mental models and biases especially if the hypothesis regards a complex system.

      • Yes, in general the value of decision tools is in the exercise of systematically thinking through the basis for the decision, not in the numbers. In your original IF post you made a start on the premises, but had to defend the flag as it were. The evidence will be equally complex. Warming per se is not evidence for AGW, given the alternative of natural variability. I think the specific evidence for the IPCC statement is rather slim and mostly model based.

        But is the goal of this thread to get clear about the IFA (Italian flag analysis) method? I never had a problem with it as a general concept. Where are the critics?

      • well, see here and here.

        To me, the IF idea is pretty simple to understand. The main new aspect i introduce here is NUSAP, which is a way to categorize and assess uncertainty (including pedigree) in complicated multi premise problems and models, which has been widely applied in water resources and other environmental problems.

      • yes it has spread like wildfire, but most people just cite Tobis, Annan.

      • There’s also more here:

        http://initforthegold.blogspot.com/2010/10/willard-on-curry.html

        I know you said you have not understood any of it. But I urge you to reconsider: amid the tomfoolery (pun intended) emerges an honest discussion.

        Please call Michael ;-)

• willard, I find you incoherent too, but I thought it was deliberate, like James Joyce.

      • Thank you for your constructive comment, David Wojick. What I say there is easier to grasp than basic knowledge of epistemology or modal logic, so easy that one could very well explain your attitude as selective obliviousness.

      • I am aware of the prior criticisms, but meant where are the critics here on this blog? I don’t see a single criticism of the IF on this thread, so far anyway. Nothing to defend. Ho hum.

• Judith Curry’s aim of opening up a civilised dialogue to which all are welcome is admirable. We can only hope those currently skulking on other websites might come and contribute to the process of opening up climate science, and debate the substantive issues rather than passing comment among their own coterie on Judith’s motivation, competence, expertise and integrity.

      • Just a word quibble: “natural variability” suggests drunkard’s walk changes. I have seen no evidence or even serious effort to present evidence that current temperatures depart from the 10,000-yr. trends since the last Ice Age (discounting the Hockey Stick and all its brethren, of course). This is not “variability” in common parlance; it’s an ongoing process which should be the default assumption.

  11. Judith

    you say-

    “Returning to the issue of the IPCC’s statement in the AR4 regarding attribution of 20th century warming:

    Most of the observed increase in global average temperatures since the mid-20th century is very likely due to the observed increase in anthropogenic greenhouse gas concentrations.”

Is this intended to be a scientific statement or a political statement? As far as I have been able to understand the debate on ‘temperatures’, we are using the wrong (or a less than scientific) metric. Does ‘the observed increase in global average temperatures since the mid-20th century’ tell us anything about changes in the Earth’s heat energy budget?

    There is also the issue of the relevance of ‘global average temperatures’ – see

    http://chiefio.wordpress.com/2010/11/26/small-tyrants-large-tyrannies/

    posted by Pascvaks on the skeptics make your best case thread.

    Does the IPCC statement have any scientific value?

    Regards Gary

  12. Dr Curry:
    With due respect, I think that your Italian flag analogy has more to do with a court room than with methods of science. Your flag deals with making decisions; science deals with observations, data, analysis and conclusions. Each of those steps must be the best possible, the observations and data accurate and as objective as possible, the analysis non-biased and the conclusions falling only within those supported by the data and not beyond. In a court of law science is used with varying degrees of success, to reach a decision: e.g., is a drug responsible for one or a number of deaths? Scientists can be used as expert witnesses. Their job and the ability of a judge or jury to reach a decision are difficult. Expert witnesses can vary in their skill and honesty. Some are simply “hired guns”. They vary in their ability to assess the evidence and so on. If they cease to be as objective as possible they have failed as an expert witness and also as a scientist.
Perhaps courtrooms should have your Italian flag displayed so that expert witnesses and judges could see it!

    • Morley, there are many scientific issues that are not clear cut in a laboratory sense. For what “science” is, see Mike Zajko’s post and the ensuing discussion.

      There are definitely some analogies with a legal case (after all, we are looking at evidence). Lets clarify the evidence, its quality, the associated uncertainties, and also consider the opposing case.

      The legal analogy is judging the guilt of the defendant based on what you have read in the newspaper and your own background knowledge, without actually going through the process of weighing and debating the evidence.

      • Science may be based on observation and experiment but scientific reasoning is just as complex and difficult as legal reasoning. For example, much of the climate debate centers on the validity of observations and data. So the fact that the science is based on these does not simplify the science.

• To repeat a prior comment on the analogy with legal reasoning, there is an analog in the old three-verdict legal trial system, which had verdicts of not guilty, guilty and not proven. Not proven is a form of uncertainty.

        The central principle is that in addition to evidence for and against a hypothesis, there can be evidence for uncertainty per se. If we quantify the strength of the evidence and sum across all three kinds then we get the Italian flag.

        There are basically three epistemic propositions that we are evaluating, for a given hypothesis, call it hypothesis A:
        1. We know that A is true. There can be evidence for this proposition.
        2. We know that A is false. There can be evidence for this proposition.
        3. We do not know if A is true or false. There can be evidence for this proposition, which is the uncertainty claim.

        Note however, that we do not include any of the direct evidence for or against A as part of the evidence of uncertainty. This is very different from a 2-valued (true/false) system in which evidence both ways is the usual measure of uncertainty.

        For example, if there is an alternative hypothesis B, that is not evidence against A directly, rather it is evidence that A is uncertain. In a trial the analog is having multiple suspects, such that none can be convicted.

Likewise for, say, measurement problems with the evidence for A. These problems are not evidence for A, nor against A; rather, they increase our uncertainty regarding the truth or falsity of A. A legal analog is evidence that certain evidence has been planted.

        It seems clear that many of the issues in the climate debate, perhaps even most of them, are of this epistemic uncertainty sort. If so then the 3-valued modal logic of the Italian flag may be the correct model for capturing the dimension of uncertainty with AGW, as opposed to its simple truth and falseness.

• David- with all due respect, your comment in this case really does not make sense. In a court of law, there is guilty and not guilty. A person is either convicted (guilty) or they are not convicted (not guilty). They are not found innocent, but not guilty.

        In the case of climate change there is a broad range of potential impact of human released GHG’s. Are they causing 50%, or “Most of the observed increase in global average temperatures” or a lesser percentage. IMO- that is an interesting scientific investigation, but really almost meaningless to policy implementation at a governmental level.

        Take two examples
1. United States- What motivation is there for the US to spend wealth to reduce their GHG emissions by say 25%, when that expense cannot be demonstrated to be of any benefit to the climate. I am assuming that total global emissions will continue to rise by at least 40% over the next 25 years even with US reductions.

2. Developing nation- What is the motivation of a particular country to not provide electricity to their population, or to allow them access to personal transportation, or to even utilize a more expensive energy source for the next 25 years.

• Rob, as I indicated I was referring to another legal system, long ago in Scotland as I understand it, with 3 verdicts.

      • I’m not a native or up on the subject, but my understanding is that this is still current legal practice in Scotland, or was until very recently. Any locals able to clarify?

      • Latimer Alder

        ‘Not proven’ can still be a verdict under Scottish Law.

      • An excellent option, IMO. It would eliminate many “default” acquittals that now put malefactors back on the streets in two-valued jurisdictions.

      • That is to say, it would eliminate unwarranted acquittals which put malefactors back on the street with no chance of retrying when new evidence comes available. Much would or does (I assume) depend on the details of how such cases are administered and assessed within the legal ministries, though.

  13. Dr. Curry,
    In your response of 11:51 am, you said:
    “The legal analogy is judging the guilt of the defendant based on what you have read in the newspaper and your own background knowledge, without actually going through the process of weighing and debating the evidence.”
    I don’t understand this. Would you please explain? Who is “you”, the scientist as expert witness or the judge or the jury or someone not associated with the trial?
    Morley

14. Moss and Schneider get it wrong when they focus on amount of evidence and the level of consensus. The first criterion should be the QUALITY of the evidence. We can have one climate model or 100 climate models – but the number doesn’t make any difference when all of these models have the same weaknesses. We can have the Hockey Stick itself or a dozen reconstructions that agree with the Hockey Stick – but it doesn’t make any difference if all of the imitators rely on mostly the same proxy data and make the same mistakes the North report found in Mann’s work (focus on RE statistic and ignore CE, no objective criteria for selecting reliable proxies by a process that eliminates cherry-picking and the introduction of bias, etc.). We can have many studies that provide weak statistical support for a conclusion – but the number of such studies is irrelevant unless they are derived from independent data sets. The first criterion MUST be the quality of the evidence, not the amount.

    The focus on consensus is also misplaced. The question should not be: “Do you think there is a consensus about Result X?” Most IPCC authors want to believe in results that support the CAGW consensus, so politics is producing an artificial consensus. A more appropriate question would be: “What do you think about the reliability of the data supporting Result X or the arguments or experiments contradicting Result X?” Not “Is there a consensus”; but “Should there be a consensus”. “How sure are we that the skeptics are wrong about Result X?” (If the political implications of climate science were not distorting scientific opinion, the natural skepticism many scientists bring to their work would make the term “consensus” more meaningful.)

• I totally agree. The Moss and Schneider method also ignores the “k”, related to how much and what quality of info you have, relative to what you would ideally need.

15. Here’s a paper we could do a test run on. The Royal Society has just released a clutch of papers as Cancun opens; the lead paper says temperature could rise 4C globally by 2060.
    http://tallbloke.files.wordpress.com/2010/11/royalsoc4c-full.pdf
    The other papers are free to download until tomorrow at http://rsta.royalsocietypublishing.org/content/369/1934.toc

    The timing, giving no opportunity for assessment and response by anyone else in advance of the conference, seems like a deliberate ploy to influence policy makers. Is the timing of the release of science a factor which should be considered? NUSAP is aimed at dealing with decision making where time is short and decisions urgent. Therefore papers threatening imminent catastrophic warming timed to coincide with world climate summits are potentially influential.

    Is this sound, honest scientific practice?

    • As Monckton points out in his recent report from the scene, the same trick is used to slip resolutions and agreements through at these confabs. The summary documents are presented to the delegates on the very morning of the vote, and are dense with gobbledegook and bureaucratese — hundreds of pages. No one could parse and comprehend what is being voted on. These “agreements” are then later cited in national legislatures as the basis for enabling laws, and thus the world is shaped by obfuscators.

16. It is relevant that a child first spoke: “The Emperor has no clothes.” The child’s mind is naive and uncluttered with social and academic constraints. Therefore the child sees and speaks to what is observed. Once simplicity is revealed, we adults get “it” and give attribution and implications to the observed, layering what we want to see upon layers of uncertainty until relevance is lost. All three referenced resources to quantify uncertainty rely upon the adult perspective. Tesla requires peer-review to move closer to determining uncertainty. Climategate reveals the fallacy of such reliance. Refsgaard asks the question “who will begin the process?” because the “who” determines the uncertainty to be assessed. Moss & Schneider writes for the members of the “team” to develop a consensus on the uncertainty. Missing in all three is the naive observer who says: “you really don’t know do you?” That is in part what the blogosphere provides, citizen scientists, observing, not caught up in the bewildering myriad of science, math, and advocacy found in the climate science industry. Knowledgeable people looking at the pieces as they go by can say, “The Emperor has no clothes.” Then there is simplicity to uncertainty: “I don’t know; stay tuned.”

  17. Dr. Curry,

    Welcome to world of us folks who are interested in the climate science. I appreciate your work.

    Could you perhaps provide an executive summary of your posts for engineers?

I shake my head and can’t continue to waste my time on discussions of uncertainty where the ordinate is some ‘degree of consensus’, where most of the 2,500 scientists are of the soft sciences. They have no idea of the partial differential equations in a GCM.

The abscissa, the independent variable, is a computer model with assumed boundary conditions.

With all due respect, consensus of laymen about an unverified, unvalidated computer model with assumed boundary conditions and parameters, which ignores the sun and clouds and is currently falsified, warrants NO study of the nature you are doing here, and you cannot come to some statistical value of uncertainty analyzing these phenomena, I don’t care how many pages of stats you cite. I may be wrong, but I don’t think you can model a century.

I am starting to sound like a curmudgeon.
You are doing very good work here in my humble opinion.

  18. Where’s Tobis? Where’s Waldo? I know they’re both in here somewhere…

  19. The “Tesla document” conflates the “probability” of the event A with the evidence supporting the occurrence of A. Actually, the probability of A is the measure of A. The evidence of A is not the measure of A but rather is the frequency of A in a statistical sample plus the frequency of not A.

    From this conflation, the Tesla document derives a number of conclusions which, while erroneous, are highly favorable toward the ends of ambitious climatologists. One of these is that it is experts (e.g. climatologists) and not the frequencies of A and not A that establish the evidence supporting the occurrence of A.

• Terry, can you please explain this? It does not necessarily follow from the Tesla document, in my opinion. Are you referring to the basic IF analysis, or step 3 in the analysis of the evidence?

      • Judy:
        On page 6, the document states: “where p(A) is the probability of event A occurring, or in other words, the evidence supporting the occurrence of A.” The implication is that the probability of A is the evidence supporting the occurrence of A. This is wrong. The evidence supporting the occurrence of A is the frequency of observed events in which A is observed plus the frequency of observed events in which A is not observed.

    • Can you comment on the 3 value logic vs the 2 value logic aspect of this. If we are talking about CO2 warming the atmosphere in 100 years as A, can you please expand on your statement

      • Judy:
        The following is based upon a quick skim of the Tesla report.

        Conventionally, the phrase “two value logic” references Aristotelian logic, wherein every proposition has a variable called its “truth-value.” The truth-value takes on the two values of “true” and “false.” The three-valued “logic” augments the two valued logic to cope with situations in which the truth or falsity is not determined. The three-valued “logic” is misnamed in the sense of being devoid of principles of reasoning for its inductive branch. It is devoid of these principles because, by lumping together all inferences that are neither true nor false, it eliminates the possibility of determination of which of these inferences is the correct inference.

The folks who wrote the Tesla report seem to be confused about a number of matters. An aspect of this is that they seem to have confused the subjective Bayesian idea of probability as “degree of belief” with “evidence.” Thus, it seems, if you want to get evidence of CO2 warming in 100 years, you convene a group of climatologists and ask them about their collective degree of belief. The notion that one can create evidence by asking scientists rather than by observing nature is quite bizarre, in my opinion. However, it is consistent with the otherwise puzzling aspect of AR4 in which the IPCC asks us to believe in CAGW without providing evidence of same. Apparently, climatologists are able to generate “evidence” by articulating their subjective opinions.

      • An aspect of this is that they seem to have confused the subjective Bayesian idea of probability as “degree of belief” with “evidence.”

        I don’t think you read this correctly, but I still agree. (!?!)

As I read this, “evidence” isn’t segregated to mean physical evidence vs opinion at the input stage. What I see as the problem is that their p and q values when propagating upwards in the tree don’t seem to be weighted, implying that physical evidence at a given input stage is equal to opinion evidence in another input stage (given the fact that this is software, I’m guessing it can’t tell which evidence is factual vs opinion). As such this can give artificial weighting to opinion as it propagates up the tree, which I think is analogous to your comment.

        Or then again I could be mistaken.

    • BlueIce2HotSea

      The evidence of A is … the frequency of A … plus the frequency of not A.

      Should be:

      minus the frequency of not A.

    • Terry, the Tesla doc is talking about actual evidence, not statistics, and especially not the frequency of anything. Once again you are confusing your special language with ordinary language. Evidence is a very general concept. Moreover, they are talking about the evidence for a hypothesis, not the evidence supporting the occurrence of an event, whatever that might mean. You can’t have our words.

  20. BlueIce2HotSea

I would like to see the IPCC authors’ raw assessment data for quality of evidence (their ranking votes using a continuous scale). It might be interesting to look at the distribution of judgments (SD, skewness) and see how it changes over time.

    • BlueIce2HotSea

What I am getting at is that the SD is the measure of consensus and the mean the measure of certainty.

  21. Applying the legal analogy to climate science, as has been postulated in comments above, is really not the appropriate paradigm to follow when analyzing the causes of climate change.

    A far better paradigm (and much more appropriate to investigating climate change) is the crime scene investigation that is conducted by detectives. Clearly, the detectives are far more objective in conducting their investigation than the lawyers are in conducting their case.

    Detectives have to assess the facts of the case, identify clues and evidence, identify likely suspects, establish cause and motivation, develop the credibility of the evidence, cross-check the evidence to see which suspects can be eliminated from consideration – sometimes having to think “outside the box” to figure out what it is that has happened.

    Lawyers, on the other hand, are handed a case to either prove or defend using whatever means that they may have available to them. If the facts support their case, then by all means, use the facts. If not, then try attacking the process, vilify the witnesses, or use other courtroom tricks and antics that might help to achieve their objective.

    Unfortunately, much of the public discourse on global climate change seems to be following the legal analogy paradigm.

    There are those who in one form or another have declared global warming to be an unmitigated hoax perpetrated by climate scientists. Since they have no facts on which to base their claim, their approach is naturally to ignore the facts and physics that would be relevant. After that, their approach has been to attack the process, e.g., the process of how the IPCC AR4 report was put together, but not the climate relevant information that it might or might not contain. When this approach was not sufficiently successful, they resorted to character assassination of the scientists involved in writing the IPCC report, as in Climategate. Deliberate promotion of misinformation regarding global warming and global climate change continues non-stop with the hope of achieving an OJ verdict from a thoroughly bamboozled public.

    Fortunately, Nature never takes the time to listen to lawyers. Nor has Nature been known to listen to the pleas, opinions, or arguments that may be presented by either side. It is only the facts and physics that set the course for what is to transpire with global warming and global climate change.

    The detective paradigm is the one that we should strive to follow in order to get a more objective view as to what is happening with global climate.

  22. Is it the intention to actually start using these new (to some) tools and to draw some conclusions, or just to spend days arguing about them?

    Perhaps a better way to discover the strengths and weaknesses of the IF method would be to suck it and see with a practical example. Suspend disbelief in it for a while and try it. I’ll leave it to others to choose the example, but otherwise I think you may be going nowhere fast.

    Forgive my unusual directness, but I’m used to having deadlines and budgets to meet…not the luxury of prolonged navel-gazing.

• The intention is to use these tools to reason about uncertainty, which will be the topic of Part II. Some simple examples were given in the previous threads mentioned in the introduction.

  23. Judith,

From some brief reading of evidential reasoning, belief functions, and Dempster-Shafer theory… Relative to probability theory, the formalism of these approaches is much, much more complex and the interpretations more unclear and more disputed. Yet I have interacted with several individuals who are under the impression that your approach is ‘obvious’, ‘intuitive’, ‘a simple heuristic’, ‘a simple teaching aid’, but have been unable to provide consistent explanations of the categories used. Belief functions are not probabilities, but seem to be most frequently applied as weights in decision making. These approaches seem to be used more frequently where it is believed that ignorance prevents the formation of a probability prior… but it is disputed that, in cases where evidential reasoning is applicable, the former is also true.

    Confusion arises where you have equated probabilities with belief functions without a mathematical justification. Michael Tobis’ main criticism was of your equating the alpha probability level of a hypothesis test with the quantiles of a belief confidence interval… this post again doesn’t provide mathematical justification, nor what the transformation means in the real world… calling Tobis’ criticisms “confusion” seems unresponsive.

    Perhaps it would be helpful if you could explain what the following mean;

‘I have 90% belief that A is true, 90% belief that A is false, and −80% uncommitted belief’.

    ‘Evidence increases my belief that A is true, but it doesn’t decrease my belief that A is false’.

    I’m not saying that this approach is wrong, or that it doesn’t work, but that it is complex and hasn’t been clearly explained and justified. Given such constraints, I’m doubtful that this approach is overall ‘value added’ relative to tried and tested and relatively easily and well understood probability theory and Bayesian estimation.

    • “Degree of certainty” is nonsense. “Degree of confidence” is better; it also suggests “degree of non-confidence” as opposed to “degree of uncertainty”. At least it would force being explicit about what generates confidence, and what it means for the speaker.

  24. From the education thread:

    However, you can then not pass it off as science if you are unwilling to hold your theory, or data to the same levels of rigour or scrutiny.

    I’m sorry, but this is gobbledygook. Science is not manufacturing, any more than manufacturing is neurosurgery. Saying that your process isn’t up to the standards of an anterior temporal lobectomy would be a meaningless judgment. One assesses things as they are, not with respect to other, different things.

    If we cannot say something with any degree of certainty using science (which i don’t think we can yet), then you cannot use the science to justify preventative actions.

    This is all-or-nothingism, again. Recognizing that there is uncertainty does not lead to the conclusion that “we cannot say something with any degree of certainty.” You are trying to reduce this to a black-and-white issue, and using a very lossy compression algorithm.

    • Firstly- thanks for taking the time to do this AND to post it on another thread, it IS appreciated.

      re- your above points.

I disagree that i am ‘shooting for’ all or nothingism (nice term btw)- i am in fact trying to assert something slightly (but significantly) different. Apologies if i’ve explained poorly.

      I am NOT saying that without complete certainty a theory cannot be supported/acted upon/used- if it’s come across as that then my bad.

      What i AM saying is that i do not think that climate science has reached a level of understanding where predictions/definitive conclusions CAN be made. note- this is not the same as saying we WON’T get to that stage, just that we’re not there yet. Do you see the distinction?

My issue is, there are uncertainties in the data- temperature for instance (error limits, splicing etc etc) and although not a direct counter to the theory, the statistical error limit is larger than the trend we’re trying to observe (particularly for the proxy data). Therefore, scientifically AND statistically you cannot make ANY predictions/conclusions off this data- regardless of how good it looks.

      It’s a rigour thing- we WILL get there, believe me the rate of knowledge-gain on this subject is becoming seriously impressive, (now the debate has been re-opened) but as it stands now, we do not know ENOUGH to make any conclusions- only guesses.

      • The uncertainty, however, is heavily weighted to the high side of the sensitivity range. There’s very little evidence at all of low climate sensitivity. This doesn’t preclude a revolutionary discovery that reverses the developing understanding of cloud feedbacks as positive, not negative. But merely saying “well they could be negative” in anticipation of such a discovery isn’t really a strong argument.

        ENOUGH is a value judgment, and this is the kind of thing that I was referring to when I talked about questions that can only be answered by the policy arena. What is enough? “Error limits larger than the trend” doesn’t apply in all cases; as Jim Cripwell is fond of pointing out, you can’t directly measure climate sensitivity. We won’t know for sure until CO2 doubles. Are you saying we should wait for 630 ppmv?

      • If it’s a “rigor thing”, maybe we should wait for 630,123456789 ppmv.

In this context, “enough” is more than a value judgement, it CAN’T BE an AD HOC one. (By the way, all-or-nothingism does not forbid us from saying we’ll get there one day.)

        Here is an idea for a punk-rock band name: **Climate Disruption**.

        An idea for a song: “When is enough really enough?”

      • PDA

        You said: “The uncertainty, however, is heavily weighted to the high side of the sensitivity range”

        For my own edification, what is the evidence that the uncertainty is weighted to the high side? Thanks

      • On the contrary, papers have been published showing low (and in a few cases) negative climate sensitivity.

The ENOUGH value judgement is actually an exceptionally simple and straightforward one- it is by no means ad hoc.

        In this instance it can be classified as this:

“To allow us to attribute an anthropogenic signature to the observed trends we need to eliminate the natural forcings at work and demonstrate the degree of the anthropogenic vs the natural signal.”

        As we still do not know how the main natural forcings work, let alone the unknown or little-known ones, we therefore cannot make ANY assertions towards the attribution of ‘blame’ (as it were), towards any UnNatural forcings- i.e. the attribution of man-made co2 as the cause.

        This is VERY basic stuff chaps.

> To allow us to attribute an anthropogenic signature to the observed trends we need to eliminate the natural forcings at work and demonstrate the degree of the anthropogenic vs the natural signal.

        This criteria is not only AD HOC, but a NONSEKWITCHUR.

        The hint as to why this is an ad hoc criteria lies in the expression “demonstrate the DEGREE”.

        The hint as to why it is a NONSEKWITCHUR might be found in this post by A Lacis, earlier in this thread:

        http://judithcurry.com/2010/11/28/waving-the-italian-flag-part-i-uncertainty-and-pedigree/#comment-17062

        > As we still do not know how the main natural forcings work […]

        There is no NEED to go beyond VERY basic critical thinking stuff to recognize an APPEAL TO IGNORANCE.

        ***

        There is no NEED to know HOW the main natural forcings work (as if we did not know something about them) to observe THAT they do not suffice to explain what is happening.

        Pretty basic stuff indeed.

      • On the contrary, papers have been published showing low (and in a few cases) negative climate sensitivity.

        Those studies are few and far between. There are far more papers that show sensitivity bounded on the low side at around 1 or 1.5°C, and not bounded well at all on the high side. That’s what I mean by the uncertainty being “weighted on the high side.”

        I hope you understood Willard’s response to your “simple and straight forward” criteria. I’m hopeful it was poor wording on your part, rather than a blatant appeal to ignorance.