On the consilience of evidence argument

curryja

15 years ago

On the Uncertainty and the AR5 thread, Fred Moolten and Paul Dunmore provide starkly different arguments for reasoning about multiple lines of evidence. This issue gets to the heart of the source of much disagreement in the scientific debate about climate change.

Fred Moolten:

To more formally describe the importance of the convergence principle in support of anthropogenic causality, I would state it as follows:

A. Very few data sources on this conclusion are amenable to a probability estimate of the form: “The probability that anthropogenic causality is true is p. The probability that it is false is 1 – p.”

B. Instead, the vast majority take the form: “The probability that the data can correctly be interpreted as demonstrating anthropogenic causality is p. The probability that the data are insufficient to demonstrate this result is 1 – p.” Here, 1 – p is not the probability that the conclusion is false, but merely the uncertainty about its truth.

The distinction is critical. The confusion between A (disproof) and B (uncertainty) leads to a very substantial underestimate of a valid probability for anthropogenic causality based on the large number of data sources that fall into category B, and the few that might be assigned to category A.

I have cited examples above of category B probabilities. How often will evidence for some alternative climate mechanism constitute an example of a Category A disproof type of probability value? Such would occur only when the alternative is inconsistent with the coexistence of anthropogenic causality. This is rarely possible, because quantitative estimates are rarely precise enough for one possible mechanism to exclude an important role for all others. As an example, early twentieth century solar forcing can be assigned a greater role than anthropogenic forcing in mediating observed trends, but the participation of both in proportion to calculated potencies, in conjunction with other factors, known and unidentified, is consistent with the data. Nevertheless, Category A examples should always be evaluated seriously based on the evidence in each case.

I hope it is clear from the above that what we are discussing are probabilities, and not certainties – indeed, the operative term that defines Category B is uncertainty. With that in mind, and with uncertainty as a focus, Judy, of what you are writing about, I hope you will consider adding the convergence principle to other perspectives on probability you are already addressing in formulating conclusions. Without it, I believe an important element of how we evaluate climate data will be missing.

Paul Dunmore:

Fred, I have been hoping that someone would tackle your “convergence” argument in this thread head-on, and various people have taken on bits of it. But it turns up in various guises, and it needs to be nailed properly.

The proposition you wish to establish with high confidence is that “anthropogenic greenhouse gas emissions are a major contributor to global warming.” No evidence can establish the truth of this proposition, because the proposition is incomplete. It cannot be tested until the word “major” is given an operational definition. (As a grammatical rather than logical difficulty, your statement contains two implicit propositions, the first of which (global warming is occurring) must be established before the second has any meaning. But that is minor.) This bears on the question of whether a particular study supports there being a “major” contribution. The IPCC has defined “major”, and if you accept their definition you would have to consider whether each of the studies you have read supports a “major” contribution or just a contribution. I don’t see that you claim to have made that distinction, and I am not sure that you could have.

Next, you assert that there are large numbers of independent studies (“thousands” at one point) which provide evidence in support of this proposition. But many of the studies cannot provide such evidence: paleoclimate studies, for example, cannot directly give evidence about the effects of anthropogenic emissions, because such emissions were not happening at the time. What they can do is to allow us to test our understanding of the climate system, and that understanding is itself evidence about the causes of contemporary global warming. But you massively double-count if you present all of the studies that support our understanding and also count the models themselves (as you do, twice over, when you say that the models fail to reproduce trends without including anthropogenic forcing and also that we must accept high-end values for other drivers if we exclude anthropogenic effects; this is a single argument expressed in two different sets of words). In fact, the evidence here is the models, not the studies that support the models – the supporting studies may mean that we have high confidence in the predictions of a single model, but not that we can treat the predictions of the model in a new situation as having been confirmed thousands of times by the thousands of studies in quite different situations that we used to build up our confidence in the models. What we might have is 90% confidence in the description offered by one model, not 50% confidence in each of thousands of independent predictions of AGW.

Next, you misuse Bayes’s Theorem in the way that you combine your streams of evidence. Bayes’s Theorem allows us to update the probability that a hypothesis is true given evidence about related observations; it can be applied sequentially to each of many pieces of evidence, and the probability that the hypothesis is true builds up very quickly towards 1 in exactly the way you calculate. But what is needed in the update is not what you claim: it is NOT the probability that the hypothesis is true given the next piece of evidence, but the probability of observing the next piece of evidence given that the hypothesis is true (and also that probability given that the hypothesis is false). If our next piece of evidence, X, has a 50% chance of being observed if theory A is true, and has a 50% chance if theory B is true, then the Bayesian update does not bring the probability of A upwards towards 1, but somewhat closer to 50% (from either above or below 50% based on our previous evidence). In fact, if there is also a 30% chance of observing X if theory C is true and a 10% chance if theory D is true, the Bayesian update will pull the probability that A is true down below 50%. And the contrary theories do not need to be the same in every iteration. It is possible that in an ice-bubbles study the alternative hypothesis is gas contamination, in a tree-ring study it is physical damage, and in a historical temperature set it is incomplete corrections for UHI. In each case, if the alternative is about as consistent with the evidence as the conventional explanation, the conventional explanation will not gain ground, even if it is a contender in every single study.
That is why scientists need to design careful studies which demonstrably eliminate every alternative explanation in that particular study (that is, they must show that the probability of observing result X under contending theory B is some very small amount such as 0.01); and they have to repeat this feat again and again and again. There is no short-cut principle by which weight of numbers of unconvincing studies can eventually overwhelm all opposition: your “convergence” mantra is a fallacy, not a principle.
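
[Illustration] The role of the likelihoods in this argument can be sketched in a few lines of Python. This is a minimal sketch, not code from either commenter: the function names, the equal 50/50 prior, and the number of repeated studies are assumptions chosen for illustration. Only the likelihood values 0.5 and 0.01 come from the comment above. It contrasts evidence that is equally likely under A and B with evidence for which B has been nearly excluded.

```python
def bayes_update(prior, likelihood):
    """Return P(H | X) for each hypothesis H, given P(H) and P(X | H)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


def repeated_update(prior, likelihood, n_studies):
    """Apply the same update n_studies times (assumes independent studies)."""
    posterior = dict(prior)
    for _ in range(n_studies):
        posterior = bayes_update(posterior, likelihood)
    return posterior


# Assumed starting point: no preference between theory A and theory B.
prior = {"A": 0.5, "B": 0.5}

# 1000 studies whose result X is equally likely under A and under B.
weak = repeated_update(prior, {"A": 0.5, "B": 0.5}, 1000)

# 3 studies in which X has probability 0.01 under the contending theory B.
strong = repeated_update(prior, {"A": 0.5, "B": 0.01}, 3)

print("after 1000 unconvincing studies:", weak)
print("after 3 discriminating studies: ", strong)
```

Under these assumptions the thousand unconvincing studies leave the balance between A and B where it started, while three discriminating studies move A very close to 1. What drives the update is the ratio of the likelihoods, not the count of studies.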

Finally, your formulation of the question at issue is wrong. You present it as a true-false hypothesis, but it is actually a measurement problem. Most people whose views deserve to be taken seriously have no doubt that adding CO2 and its friends to the atmosphere will warm the planet. The $64 question is: how much? If doubling CO2 will warm the planet by 0.1C, we have nothing much to talk about – the atmospheric scientists can be left in peace to get on with their work. If doubling CO2 will raise the temperature by 20C, we have a great deal to be concerned about (and I would want to withdraw a policy comment I made above). What matters is the magnitude, and the difficult problem is to measure the magnitude, given the complexity of the feedback processes and the fact that some of them (and even some of the forcings) are incompletely understood and even more incompletely calculable. There may be debate about which magnitude is to be measured: suppose it is the equilibrium temperature sensitivity. Suppose, further, that several careful studies of different kinds have estimated this to be 1.5+/-0.7, 4.2+/-2.9, 0.6+/-0.3, where the +/- ranges are 95% confidence intervals. (For the avoidance of doubt, I just made these numbers up.) All of these studies clearly support a value greater than zero, but they are inconsistent with each other. Are they supportive of the AGW hypothesis? Yes, of course, but that was never at issue. The average estimate is 2.1, but that is completely inconsistent with two studies and only marginally consistent with the other. Assuming normally distributed errors in each study, the maximum-likelihood estimate of the sensitivity is 0.77, but this most-likely estimate is still extremely unlikely, with a probability density there of only 0.001 (reflecting the mutual inconsistency of the estimates). 
That is, this made-up evidence overwhelmingly supports both of the propositions that (1) the true sensitivity is greater than zero, AND (2) we do not at all understand what is going on. The problems might be measurement errors, inconsistent definitions of what is being measured, or processes that we do not suspect. The only way to get a credible estimate is to work out why the estimates vary and to design new measurements that correct the problems; we have not done that until all of the estimates agree within observational error.
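
[Illustration] The arithmetic behind the simple average and the maximum-likelihood estimate can be checked with a short Python sketch. The three estimates are the made-up numbers from the comment above. The names `combine` and `Z95`, and the reading of each 95% interval as plus or minus 1.96 standard deviations of a normal error, are assumptions of this sketch. For independent normal errors the maximum-likelihood estimate is the inverse-variance weighted mean, which is what the code computes.

```python
# The made-up estimates from the comment: (central value, 95% half-width).
estimates = [(1.5, 0.7), (4.2, 2.9), (0.6, 0.3)]

# Assumption: a 95% interval spans +/- 1.96 standard deviations.
Z95 = 1.96


def combine(estimates):
    """Return the simple mean, the inverse-variance weighted mean (the
    maximum-likelihood estimate under independent normal errors), and the
    chi-square statistic of the estimates about that weighted mean."""
    values = [value for value, _ in estimates]
    sigmas = [half_width / Z95 for _, half_width in estimates]
    weights = [1.0 / sigma ** 2 for sigma in sigmas]

    simple_mean = sum(values) / len(values)
    mle = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    chi2 = sum(((v - mle) / s) ** 2 for v, s in zip(values, sigmas))
    return simple_mean, mle, chi2


simple_mean, mle, chi2 = combine(estimates)
print(f"simple mean                 = {simple_mean:.2f}")
print(f"maximum-likelihood estimate = {mle:.2f}")
print(f"chi-square about the MLE    = {chi2:.1f} (2 degrees of freedom)")
```

With these inputs the simple mean is 2.1 and the weighted estimate is about 0.77, matching the figures quoted in the comment. The chi-square statistic is a separate consistency check added here, not a calculation from the comment: it comes out far above the value of roughly 2 that mutually consistent estimates would typically give, which is another way of expressing the inconsistency described above.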

So when you say that you have read many studies and they overwhelmingly support the AGW hypothesis, can you tell us how many of these studies provide (independent) estimates of the sensitivity, and have you noticed whether these estimates are mutually consistent? Some such estimates exist, but there seem to be not very many of them. If the IPCC writers have cherry-picked the studies to which they give credence (or the literature is biased against publishing studies which give estimates away from the consensus), then the persuasiveness of the claims quickly becomes much less than is thought. So such allegations of (even unconscious) bias in the process are, if at all credible, very damaging to our confidence in the consensus views – these debates matter for scientific reasons, not just as political point-scoring. BTW, suppressing the highest of my made-up estimates would be as damaging as suppressing the lowest, because each action would convey the wrong impression that our understanding is better than it really is.

None of this is to say that your belief about AGW is wrong; I share at least some of your views. But the argument that you present for it is utterly bogus.

Fred Moolten:

For readers to understand the background without revisiting the other thread, I suggested a principle of “convergence” whereby a large number of approaches to a hypothesis (in this case anthropogenic causality) could add to the probability that the hypothesis is correct to the extent that they were independent or partially independent – the degree of independence affecting their contribution to the final probability estimate.
An essential element of the principle is that none of the individual approaches (e.g. studies reported in the literature) need necessarily be conclusive, but would nevertheless contribute if they provided partial support (e.g., an ability of prehistoric CO2 to elevate temperature or of current CO2 to correlate with temperature – obviously neither of these “proves” anthropogenic causality). This involves studies of the type described in your post, where the study is of the “B” type. I suggest that adequate counter-evidence would require multiple studies of a hypothesis that excluded anthropogenic causality, and therefore the number of conflicting studies is an important variable. In other words, the number of supporting studies is an important ingredient of the probability estimate in addition to the strength of each study.

I cited a hypothetical example of the principle in which a very large number of supportive, independent studies of intermediate conclusiveness would constitute a dataset, D, that would be very improbable if the hypothesis being tested, which I labeled A, were false. This would occur if an alternative I labeled S were true. The Bayes formulation for this is given by P(A|D) = P(D|A) x P(A)/P(D). In this hypothetical case, P(D) is almost the same as P(A), because the alternative S in my example almost never yields D.

Paul Dunmore correctly states that I generalized about what constitutes “major” anthropogenic causality. I was primarily interested in bringing productive discussion away from the question as to whether anthropogenic causality was non-existent (or trivial) and instead directing it toward appropriate estimates of its strength relative to other climate drivers.

As this discussion proceeds, I expect that more of the elements that go into these estimates will emerge.
