by Sergey Kravtsov, Marcia Wyatt, Judith Curry and Anastasios Tsonis
A discussion of two recent papers: Steinman et al. (2015) and Kravtsov et al. (2015)
Last February, a paper by Mann’s research group was published in Science; it was discussed in a previous CE post [link]:
Atlantic and Pacific multidecadal oscillations and Northern Hemisphere temperatures
Byron A. Steinman, Michael E. Mann, Sonya K. Miller
Abstract. The recent slowdown in global warming has brought into question the reliability of climate model projections of future temperature change and has led to a vigorous debate over whether this slowdown is the result of naturally occurring, internal variability or forcing external to Earth’s climate system. To address these issues, we applied a semi-empirical approach that combines climate observations and model simulations to estimate Atlantic- and Pacific-based internal multidecadal variability (termed “AMO” and “PMO,” respectively). Using this method, the AMO and PMO are found to explain a large proportion of internal variability in Northern Hemisphere mean temperatures. Competition between a modest positive peak in the AMO and a substantially negative-trending PMO are seen to produce a slowdown or “false pause” in warming of the past decade.
The paper is explained by:
- Michael Mann at RealClimate: Climate Oscillations and the Climate Faux Pause.
- The press release from Penn State: Ocean Oscillations caused false pause in global warming.
Led by Sergey Kravtsov, the stadium wave team was quick to respond with a rebuttal. Many months later, our rebuttal and Steinman et al.’s surrebuttal have been published:
Comment on “Atlantic and Pacific multidecadal oscillations and Northern Hemisphere temperatures”
S. Kravtsov, M. G. Wyatt, J. A. Curry, and A. A. Tsonis
Science 11 December 2015: DOI: 10.1126/science.aab3570 [link]
Abstract. Steinman et al. argue that appropriately rescaled multimodel ensemble-mean time series provide an unbiased estimate of the forced climate response in individual model simulations. However, their procedure for demonstrating the validity of this assertion is flawed, and the residual intrinsic variability so defined is in fact dominated by the actual forced response of individual models.
Response to Comment on “Atlantic and Pacific multidecadal oscillations and Northern Hemisphere temperatures”
Science 11 December 2015: DOI: 10.1126/science.aac5208 [link]
Abstract. Kravtsov et al. claim that we incorrectly assess the statistical independence of simulated samples of internal climate variability and that we underestimate uncertainty in our calculations of observed internal variability. Their analysis is fundamentally flawed, owing to the use of model ensembles with too few realizations and the fact that no one model can adequately represent the forced signal.
Marcia Wyatt explains the Kravtsov et al. rebuttal
Climate varies on time scales from years to millennia. Recent attention has focused on decadal and multidecadal variability, notably “pauses” in warming. The latest one – a slowdown in warming since 1998 – has eluded explanation. One school of thought presumes that external forcing (natural and anthropogenic) dominates the low-frequency signal; another casts internal variability as a strong contender. Numerous studies have addressed this conundrum, attempting to decompose climate into these dueling components (Trenberth and Shea (2006); Mann and Emanuel (2006); Kravtsov and Spannagle (2008); Mann et al. (2014); Kravtsov et al. (2014); Steinman et al. (2015); Kravtsov et al. (2015)). Despite many efforts, no universal answer has emerged.
Steinman et al. (2015) claim to have made significant strides toward resolving the matter. They argue that they have identified the externally forced response and, in consequence, can estimate the observed low-frequency intrinsic signal. Using multiple climate models, they average all available climate simulations; the resulting time series – a multi-model ensemble-mean – is their defined forced signal. Steinman et al. aver that this forced signal is completely distinct from the intrinsic variability of all simulations within this suite of models. If this claim holds, then this model-estimated forced signal can justifiably be combined with observational data in a semi-empirical analysis, which Steinman et al. use to expose the estimated intrinsic component within observed climate patterns.
But questions arise – about the modeled data and about procedure.
Output of climate-model simulations can provide insights into the observed climate response to external forcing. Steinman et al. use data from the collection of models in the fifth phase of the Coupled Model Intercomparison Project (CMIP5). While a valuable resource, CMIP simulations have limitations. A major one is uncertainty surrounding model input: specifying external forcings and parameterizing unresolved physical processes is an inexact science, so, to accommodate the plausible ranges and combinations of these factors, modelers configure each model within the CMIP multi-model ensemble with a different subset of them. Hence, each model generates a statistically distinct forced response. Deciding which forced response might be the “right” one is the challenge. Steinman et al. maintain that they have overcome this hurdle with a single forced signal that they claim represents the entire collection of CMIP models. Given these model-to-model distinctions, and the observation that each individual model generates its own unique forced response, the assertion of a single forced response suitable for all CMIP models is striking.
Much of Steinman et al.’s argument rests on residual time series – the information left over from a climate simulation’s time series after the model-estimated forced response has been extracted from it. These residuals can be used to check that the forced signal is unbiased. The reasoning is this: if the model-estimated forced response is completely disentangled from each model’s intrinsic signal, and therefore unbiased, then the modeled intrinsic signals (the residuals) should be unbiased too; the residuals would be statistically independent of one another, i.e. uncorrelated. Steinman et al. claim to show that the residuals are uncorrelated – a significant result, if true.
Steinman et al. generate the residuals by removing the forced signal from the individual time series of all climate simulations, across all models, using two different methods: one involves differencing (subtraction); the other, linear removal of a rescaled forced signal via regression. Both operations yield similar results: each leaves behind numerous residual time series, and each residual is taken to represent model-estimated intrinsic variability. Steinman et al. then test for correlation among the residuals, invoking an indirect statistical test based on well-known properties of distributions of independent random numbers. The residuals appear to be uncorrelated. This result suggests that the forced signal is indeed unbiased, paving the way for Steinman et al. to use it to evaluate the internal variability of observed regional climate patterns, and from there to infer the potential relationship of the intrinsic component to the currently observed “pause” in surface warming. This all sounds promising. But is all as it seems?
The results seem counterintuitive, given fundamental differences among individual models. How could the forced signal truly be unbiased? This apparent puzzle motivated Kravtsov et al. (2015). They explore details of Steinman et al.’s methodology and design an alternate strategy to separate model-estimated signals.
In the big picture, Steinman et al. and Kravtsov et al. go about disentangling climate components similarly. Yet there are significant differences. Steinman et al. place their emphasis on the multi-model ensemble: they consider the simulated time series “in bulk”, so to speak, with no distinctions made among individual models. Kravtsov et al., instead, focus on the individual models of the multi-model ensemble; they look at simulations from one model at a time. This difference in approach produces different data sets of residuals. Specifically, Steinman et al. use the multi-model ensemble-average as their forced signal and remove it from the individual climate simulations across all models of the multi-model ensemble. The result is one data set of residuals for the whole multi-model ensemble. Kravtsov et al., on the other hand, generate two data sets of residuals within each of the 18 individual-model ensembles: the first is derived by subtracting the multi-model ensemble-mean from each simulation of a given model, repeated for all 18 models; the second is derived by subtracting the single-model ensemble-mean from each simulation of a given model, again repeated for all 18 models.
This latter approach – subtraction of the single-model ensemble-mean from a climate simulation – is the traditional method for decomposing simulated climate variability into its forced and intrinsic components, and it generates a naturally unbiased signal. The reason for this success is traceable to a fundamental assumption: the forcing subsets and physical parameterizations are identical across a given single-model’s ensemble of simulations, so any differences among the modeled realizations are due to the model’s intrinsic variability – most of it attributable to differences in the initialization of each run – and are therefore uncorrelated. Thus, when the forced signal (the single-model ensemble-mean) is subtracted from each climate-simulation time series within a single-model ensemble, the remaining residuals within that ensemble should be uncorrelated, or independent. And indeed, using a simple time-series correlation metric, Kravtsov et al. found this to be the case.
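The traditional decomposition can be sketched in a few lines of numpy. This is a hypothetical toy ensemble, not CMIP5 output; the trend, noise level, and ensemble size are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_t = 10, 165                  # runs in one model's ensemble, years

# One model's forced response (a hypothetical slow warming trend)
forced = 0.005 * np.arange(n_t)

# Each run = identical forced response + independent intrinsic noise
runs = forced + 0.1 * rng.standard_normal((n_runs, n_t))

# Traditional decomposition: the forced signal is the single-model
# ensemble mean; the residuals are model-estimated intrinsic variability
residuals = runs - runs.mean(axis=0)

# Cross-correlations between residuals of different runs are small
# (slightly negative, about -1/(n_runs - 1), the usual artifact of
# subtracting a sample mean)
corrs = np.corrcoef(residuals)
off_diag = corrs[~np.eye(n_runs, dtype=bool)]
print(off_diag.mean())
```

Because the forced component is identical across the runs, subtracting the ensemble mean removes it cleanly, and what remains in each run is, to within sampling error, that run’s own intrinsic noise.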
In contrast, when Kravtsov et al. linearly subtracted the multi-model ensemble-mean (i.e. the Steinman et al. forced signal) from the simulations within a single-model ensemble, the resulting residual time series within that model’s ensemble were significantly correlated. These residual time series are not independent; they share a common signature. They contain remnants of the multi-model ensemble-mean! The implications are significant: if the multi-model ensemble-mean – i.e. the Steinman et al. forced signal – produces a biased forced response for single CMIP5 models, how could this choice of forced signal credibly provide an unbiased estimate of the forced response in observations?
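The same kind of toy setup shows what happens when the multi-model mean is used instead. Again a hypothetical construction: 18 models are given deliberately different trends to stand in for different forcings and parameterizations:

```python
import numpy as np

rng = np.random.default_rng(1)
n_models, n_runs, n_t = 18, 5, 165
t = np.arange(n_t)

# Hypothetical models with distinct forced trends
trends = np.linspace(0.002, 0.010, n_models)
sims = (trends[:, None] * t)[:, None, :] \
    + 0.1 * rng.standard_normal((n_models, n_runs, n_t))

# Steinman-style forced signal: the multi-model ensemble mean
mm_mean = sims.mean(axis=(0, 1))

# Remove it from the runs of ONE model: every residual retains the same
# remnant, namely that model's forced response minus the multi-model mean
res = sims[0] - mm_mean
corrs = np.corrcoef(res)
off_diag = corrs[~np.eye(n_runs, dtype=bool)]
print(off_diag.mean())   # strongly positive: the residuals share a signal
```

The residuals within the model’s ensemble correlate strongly because each of them carries the same forced-signal bias.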
All this brings us to a final curiosity: Steinman et al. demonstrated an absence of bias in their forced signal. How did they do this, if the signal is in fact biased? The answer lies in their procedure for assessing the statistical independence of the model-estimated residual time series. That procedure is flawed, partly because of their choice of forced signal (a multi-model ensemble-mean) and partly because of how the forced signal is removed from the simulations. Recall that they remove the forced signal from individual simulations across all models, rather than from simulations within single-model ensembles.
Alluded to earlier in this essay was the procedure used by Steinman et al. to assess signal bias, or the lack thereof. We revisit that procedure here. This simple method determines whether or not a number of individual residual time series share a common signature. Bear in mind: if the forced response is completely separated from each simulation’s intrinsic component, the numerous resulting residual time series will be uncorrelated with one another, i.e. statistically independent. Hence, when they are averaged together into a multi-model ensemble-mean of residuals, the differences among them will largely cancel, and the dispersion (variance) of the ensemble-mean of residuals will be much smaller than the dispersion of the individual residuals. The procedure used to capture this relationship compares the actual dispersion (the variance of the ensemble-mean of residuals) to the theoretical dispersion (the average of the individual residual variances divided by the number of simulations). If the residuals are independent, the actual dispersion should be comparable to this theoretical value; residuals sharing a common signature would inflate the actual dispersion well beyond it. Steinman et al. applied this well-known method to their data set of residuals and found that their actual dispersion was no larger than the theoretical one; independence, and hence an unbiased forced signal, seemed to follow logically.
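The dispersion comparison itself is straightforward. A minimal sketch with synthetic residuals (not the authors’ data) shows how it behaves in the two cases:

```python
import numpy as np

rng = np.random.default_rng(2)
n_series, n_t = 90, 165

def dispersion_test(res):
    """Actual variance of the ensemble-mean residual vs. the theoretical
    value expected if the residuals were independent."""
    actual = res.mean(axis=0).var()
    theoretical = res.var(axis=1).mean() / len(res)
    return actual, theoretical

# Independent residuals: actual ~ theoretical (the 1/N scaling holds)
indep = rng.standard_normal((n_series, n_t))
print(dispersion_test(indep))

# Residuals sharing a common remnant: actual far exceeds theoretical
correlated = indep + rng.standard_normal(n_t)
print(dispersion_test(correlated))
```

Independence shows up as the actual and theoretical dispersions being comparable; a shared signature does not cancel in the average and inflates the actual dispersion far beyond the 1/N value.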
But it turns out there is a hidden glitch in this procedure that makes the result an illusion: defining the forced signal as a multi-model ensemble-mean and extracting it from the individual simulations in bulk, across all models, imposes an algebraic constraint on the residuals such that the ensemble-mean of the residuals is always exactly zero, by mathematical construction. Because of this constraint, the actual dispersion is always smaller than the theoretical one, and the residuals will therefore always appear to be uncorrelated, whether they are or not.
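The constraint is easy to verify numerically with any synthetic set of series whatsoever; here a common component is thrown in to emphasize that the correlation structure is irrelevant (a hypothetical construction, not model output):

```python
import numpy as np

rng = np.random.default_rng(3)
shared = rng.standard_normal(165)        # a common signature in every series
sims = shared + 0.5 * rng.standard_normal((90, 165))

# In-bulk removal of the ensemble mean across all 90 series
res = sims - sims.mean(axis=0)

# The ensemble mean of the residuals is zero to machine precision: an
# algebraic identity that holds for ANY input, so a tiny "actual
# dispersion" here carries no information about independence
print(np.abs(res.mean(axis=0)).max())
```

Whatever went into the simulations, the average of the residuals vanishes by construction, so the dispersion test applied to residuals defined this way is vacuous.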
The trick to bypassing this constraint is to restrict attention to residual time series within the ensemble of a single model. This is what Kravtsov et al. did. Had Steinman et al. done likewise – removed their multi-model ensemble-mean from simulations exclusively within a single-model ensemble and tested the resulting residual time series for statistical independence using actual versus theoretical dispersion, as per Kravtsov et al. – they, too, would have found the residuals to be correlated, not independent.
Hence, through these slightly different methodologies, Steinman et al. and Kravtsov et al. come to different conclusions. Steinman et al. argue they have identified an unbiased forced signal, and with it, have identified the component of observed climate due to intrinsic variability. In contrast, Kravtsov et al. conclude Steinman et al. have not identified a forced response that is unbiased; that procedural artifacts only gave the illusion of such; and that successful disentanglement of climate components remains elusive.
A final point: intrinsic variability may indeed damp the presumed anthropogenic signature of secular-scale warming in the currently observed “pause”. Many have suggested as much, Steinman et al. among them. Yet what seems plausible cannot be accepted without demonstration. Thus, Steinman et al.’s claim to have assessed the role of internal variability in the current “pause” is unsupported: their chosen forced signal – a multi-model ensemble-mean – fails to provide convincing evidence.
Acknowledgements: Feedback and editing suggestions from Sergey Kravtsov and Judith Curry streamlined the text of this essay, ensured its accuracy, and clarified its message. Their input is much appreciated.
Sergey Kravtsov responds to the surrebuttal
Steinman et al. claim that they avoided the algebraic constraint – the one leading to the apparent cancellation of the “intrinsic” residuals in the multi-model ensemble mean – by using N−1 models to define the forced signal of the Nth model. However, doing so is really no different from using the multi-model mean based on the entire ensemble, since excluding one model does not affect the multi-model ensemble mean in any appreciable way. In fact, it is easy to show that the ensemble-mean residual time series in this case is approximately proportional to the estimated forced signal (the multi-model ensemble-mean time series), with a scaling factor involving 1/N, which makes its standard deviation much smaller than the 1/sqrt(N) scaling expected from the cancellation of statistically independent residuals. Hence the multi-model ensemble mean of the “intrinsic” residuals so obtained has a negligible variance, but this has nothing to do with the actual independence of the residuals (see Reference/Note 5 in Kravtsov et al.). Indeed, Kravtsov et al. demonstrated that these residuals are definitely well correlated within individual model ensembles, hence not independent. Steinman et al., in their reply, acknowledge the correlation, yet still claim independence.
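The point about the leave-one-out (N−1) forced signal can be checked directly. In a toy multi-model ensemble with equal ensemble sizes (hypothetical trends and noise, as above), the leave-one-out residuals are essentially a rescaled copy of the full-ensemble-mean residuals, and their grand mean still vanishes identically:

```python
import numpy as np

rng = np.random.default_rng(4)
n_models, n_runs, n_t = 18, 5, 165
t = np.arange(n_t)
trends = np.linspace(0.002, 0.010, n_models)
sims = (trends[:, None] * t)[:, None, :] \
    + 0.1 * rng.standard_normal((n_models, n_runs, n_t))

# Residuals from the full multi-model ensemble mean
res_full = sims - sims.mean(axis=(0, 1))

# Residuals from a leave-one-out forced signal: for model i, the mean
# over the other N-1 models' simulations
res_loo = np.empty_like(sims)
for i in range(n_models):
    res_loo[i] = sims[i] - np.delete(sims, i, axis=0).mean(axis=(0, 1))

# The two residual sets are nearly identical (they differ by ~1/(N-1))...
print(np.corrcoef(res_full.ravel(), res_loo.ravel())[0, 1])
# ...and the grand mean of the leave-one-out residuals still vanishes
print(np.abs(res_loo.mean(axis=(0, 1))).max())
```

So, at least in this equal-ensemble-size sketch, excluding one model merely rescales the residuals; the cancellation of the residual ensemble-mean persists and says nothing about their independence.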
Steinman et al. build the rest of their rebuttal on the fact that individual-model ensembles have only a few realizations, so the ensemble mean over those realizations, even if smoothed, will contain a portion of the actual intrinsic variability aliased into the estimated forced signal; hence the estimated residual intrinsic variability will be weaker than in reality. This is a valid point, and it can be addressed by choosing the cutoff period of the smoothing filter used to estimate the forced signal (5 yr in Kravtsov et al.) more objectively – for example, by matching the level of the resulting residual intrinsic variability in the historical runs to that in the control runs of the CMIP5 models. However, this issue is only tangentially related to the fundamental limitations, outlined in Kravtsov et al., of using the multi-model ensemble mean to estimate the forced signal. In particular, if one were to use the multi-model ensemble mean to define the intrinsic variability of any one model, that variability would have a much larger variance than the model’s intrinsic variability in its control run, and this excess variance would be dominated by that model’s individual forced-signal bias.