Climate Etc.

A Test of the Tropical 200-300 mb Warming Rate in Climate Models

by Ross McKitrick

I sat down to write a description of my new paper with John Christy, but when I looked up a reference via Google Scholar something odd cropped up that requires a brief digression.

Google Scholar insists on providing a list of “recommended” articles whenever I sign on to it. Most turn out to be unpublished or non-peer-reviewed discussion papers. But at least they are typically current, so I was surprised to see the top rank given to “Consistency of Modelled and Observed Temperature Trends in the Tropical Troposphere,” a decade-old paper by Santer et al. Google was, however, referring to its reappearance as a chapter in a 2018 book called Climate Modelling: Philosophical and Conceptual Issues, edited by Elizabeth Lloyd and Eric Winsberg, two US-based philosophers. Lloyd specifically describes herself as “a philosopher of climate science and evolutionary biology, as well as a scientist studying women’s sexuality,” so readers should not expect specialized expertise in climate model evaluation, nor do the book’s editors exhibit any. Yet Google’s algorithm flagged it for me as the best thing out there and positioned two of its chapters as top leads in its “recommended” list.

Much of the first part of the book is an extended attack on a 2007 paper by David Douglass, John Christy, Benjamin Pearson and Fred Singer on the model/observational mismatch in the tropical troposphere. The editors add a diatribe against John Christy in particular for supposedly being impervious to empirical evidence, using flawed statistical methods and refusing to accept the validity of climate model representations of the warming of the tropical troposphere.

By way of contrast, and as an exemplar of research probity, they reproduce the decade-old Santer et al. paper and rely entirely on it for their case. If they are aware of any subsequent literature (which I doubt) they don’t mention it. They fail to mention:

McKitrick and Vogelsang (2014) provided a longer model-observational comparison using radiosonde records from 1958 to 2012 while generalizing the trend model to include a possible step change, and reaffirmed the significant discrepancy between models and observations. Similar conclusions were also reached by Fu et al. (2011), Bengtsson and Hodges (2009) and Po-Chedley and Fu (2012).

Needless to say, you learn none of this in the Lloyd and Winsberg book.

A related issue is the ratio of tropospheric to surface warming. Klotzbach et al. (2009) found that climate models predict an average amplification ratio of about 1.2 between surface and tropospheric trends, but this far exceeded the observed average, which is typically less than 1.0. Critics said they should have used a different ratio between oceans and land, so Klotzbach et al. (2010) used 1.1 over land and 1.6 over oceans, which didn’t change their conclusions.

Vogelsang and Nawaz (2017) is an important new contribution to this literature, since they provide the first formal treatment of the trend ratio problem. They note that there are several seemingly identical ways to write out the trend ratio regression, but they each imply different estimators, one of which is systematically biased. They identify a preferred method (which corresponds to the form used by Klotzbach et al.) and they provide a practical method for constructing valid confidence intervals robust to general forms of autocorrelation.
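To make the basic quantity concrete, here is a minimal sketch of an amplification-ratio estimate as the ratio of two OLS trend slopes. All data below are synthetic placeholders, and this simple ratio-of-slopes calculation is only an illustration; the Vogelsang-Nawaz estimator and its autocorrelation-robust confidence intervals are considerably more involved and are not reproduced here.

```python
# Illustrative sketch only: amplification ratio as the ratio of two OLS
# trend slopes. The series are invented; real analyses would use observed
# surface and tropospheric temperature anomalies.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1979, 2018)  # hypothetical annual time axis

# Synthetic anomalies: surface warming 0.15 C/decade, troposphere 0.18
surface = 0.15 * (t - t[0]) / 10 + rng.normal(0, 0.10, t.size)
tropo   = 0.18 * (t - t[0]) / 10 + rng.normal(0, 0.15, t.size)

def ols_trend_per_decade(y, t):
    """Slope of an OLS fit of y on time, expressed per decade."""
    X = np.column_stack([np.ones_like(t, dtype=float), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1] * 10.0  # per-year slope -> per-decade

ratio = ols_trend_per_decade(tropo, t) / ols_trend_per_decade(surface, t)
print(f"amplification ratio estimate: {ratio:.2f}")
```

In the synthetic setup the true ratio is 1.2, the kind of model-predicted amplification Klotzbach et al. discuss; noise moves the estimate around, which is exactly why valid confidence intervals matter.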

They then use the Klotzbach et al. data sets (original and updated) and test whether the typical amplification ratios in climate models are consistent with observations. In almost all global surface/troposphere data pairings, the amplification ratios in models are too large and are rejected against the observations. When the testing is done separately for land and ocean regions the rejections are unanimous.

So: whether we test the tropospheric trend magnitudes, or the ratio of tropospheric to surface trends, across all kinds of data sets, and across all major trend intervals, models have been shown to exaggerate the amplification rate and the warming rate, globally and in the tropics.

Philosophers Elizabeth Lloyd and Eric Winsberg sound very smug and confident as they disparage people like John Christy and his coauthors and colleagues. Yet they clearly don’t know the literature, and they instead reveal that they are the ones who are impervious to empirical evidence, enamoured with flawed statistical methods and uncritical in their acceptance of biased climate model outputs.

Moving on.

John and I have published a new paper in Earth and Space Science that adds to the climate model evaluation literature, using tropical mid-troposphere trend comparisons (models versus observations) as a basis to make a more general point about models. For a model to be scientific it ought to have an underlying testable hypothesis. Large, complex models like GCMs embed countless minor hypotheses that can be tested and rejected without undermining the major structure of the model. For instance, if a GCM does a lousy job of reproducing rainfall patterns over the Amazon, that component could be modified or removed without the model ceasing to be a GCM. But there must be at least one major component that, in principle, were it to be falsified, would call into question such an essential component of the model structure that you couldn’t simply remove it without changing the overall model type.

The hypothesis we are interested in testing is the representation of moist thermodynamics in the model troposphere that yields amplified warming in response to rising CO2 levels, thereby generating the results of most interest to users of GCMs, namely projections of global warming due to greenhouse gas emissions. We propose four criteria that a valid test must meet and we argue that the air temperature trend in the tropical 200-300 mb layer satisfies all four, pretty much uniquely as far as we know. That layer is where models exhibit the clearest and strongest response to greenhouse warming, on a rapid timetable, so it makes sense to focus on it as a test target. The four specific criteria are as follows.

We took the annual 1958-2017 tropical 200-300 mb layer average temperatures from three radiosonde data sets: RATPAC, RICH and RAOBCORE, and from all 102 runs in the CMIP5 archive. The model runs followed the RCP4.5 concentrations pathway, which tracks observed GHG levels and other forcings up to the early part of the last decade and projected values thereafter. We estimated linear trends using ordinary least squares and computed robust confidence intervals using the Vogelsang-Franses method. We generated results both for a simple trend and for one allowing a possible break in 1979, following the method outlined in McKitrick and Vogelsang (2014).
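The trend model with a possible break can be sketched as an OLS regression on an intercept, a linear trend, and a level-shift dummy that switches on in 1979. This is a minimal illustration on synthetic data, assuming a simple step-change specification in the spirit of McKitrick and Vogelsang (2014); the paper's actual inference uses Vogelsang-Franses robust confidence intervals, which are not reproduced here.

```python
# Minimal sketch: OLS trend with a level-shift dummy at 1979.
# Synthetic data; coefficients below are invented for demonstration.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1958, 2018)
step = (years >= 1979).astype(float)  # level-shift dummy at 1979

# Fabricated series: trend 0.01 C/yr plus a 0.15 C shift in 1979
y = -0.2 + 0.01 * (years - 1958) + 0.15 * step \
    + rng.normal(0, 0.1, years.size)

# Design matrix: intercept, linear trend, step term
X = np.column_stack([np.ones_like(years, dtype=float),
                     years - 1958,
                     step])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, trend, shift = beta
print(f"trend: {trend * 10:.3f} C/decade, step at 1979: {shift:.3f} C")
```

Including the step term matters because a one-time shift (such as the late-1970s Pacific climate shift) can masquerade as part of a linear trend if the model omits it.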

The trends (circles) and confidence intervals (whiskers) are shown here (models-red, observations-blue):

The mean restricted trend (without a break term) is °C/decade in the models and °C/decade in the observations. With a break term included they are, respectively, °C/decade (models) and °C/decade (observed).

In both cases, all 102 model trends exceed the average observed trend. In the restricted case (no break term), 62 of the discrepancies are significant, while in the general case 87 are. In both cases the model ensemble mean also rejects against observations.

We also constructed divergence terms consisting of each model run minus the average balloon record. The histograms of trends in these measures ought to be centered on zero if the model errors were mere noise. Instead the distributions are entirely positive, indicating a systematic positive bias:

Conclusion

Summarizing, all 102 CMIP5 model runs warm faster than observations, in most individual cases the discrepancy is significant, and on average the discrepancy is significant. The test of trend equivalence rejects whether or not we include a break at 1979, though the rejections are stronger when we control for its influence. Measures of series divergence are centered at a positive mean and the entire distribution is above zero. While the observed analogue exhibits a warming trend over the test interval it is significantly smaller than that shown in models, and the difference is large enough to reject the null hypothesis that models represent it correctly.

To the extent GCMs are getting some features of the surface climate correct as a result of their current tuning, they are doing so with a flawed structure. If tuning to the surface added empirical precision to a valid physical representation, we would expect to see a good fit between models and observations at the point where the models predict the clearest and strongest thermodynamic response to greenhouse gases. Instead we observe a discrepancy across all runs of all models, taking the form of a warming bias strong enough to reject the hypothesis that the models are realistic. Our interpretation of the results is that the major hypothesis in contemporary climate models, namely the theoretically-based negative lapse rate feedback response to increasing greenhouse gases in the tropical troposphere, is flawed.

Paper:

Moderation note:  As with all guest posts, please keep your comments civil and relevant.
