by Judith Curry
On the previous sea surface temperature thread, I stated “Do you for one minute believe that the uncertainty in global average sea surface temperature in the 19th century is 0.3C? I sure as heck don’t.” Sharper00 challenged me to further support this statement, which provides the motivation for this thread along with the recent release of the latest version of the Hadley Centre SST dataset (HADSST3).
The HadSST3 website went live with major updates on 5/27. The data set is described at length in the following two publications:
Reassessing biases and other uncertainties in sea-surface temperature observations measured in situ since 1850, part 1: measurement and sampling uncertainties
J. J. Kennedy, N. A. Rayner, R. O. Smith, D. E. Parker, and M. Saunby
Abstract. New estimates of measurement and sampling uncertainties of gridded in situ sea-surface temperature anomalies are calculated for 1850 to 2006. The measurement uncertainties account for correlations between errors in observations made by the same ship or buoy due, for example, to miscalibration of the thermometer. Correlations between the errors increase the estimated uncertainties on grid-box averages. In grid boxes where there are many observations from only a few ships or drifting buoys, this increase can be large. The correlations also increase uncertainties of regional, hemispheric and global averages above and beyond the increase arising solely from the inflation of the grid-box uncertainties. This is due to correlations in the errors between grid boxes visited by the same ship or drifting buoy. At times when reliable estimates can be made, the uncertainties in global-average, southern-hemisphere and tropical sea-surface temperature anomalies are between two and three times as large as when calculated assuming the errors are uncorrelated. Uncertainties of northern hemisphere averages approximately double. A new estimate is also made of sampling uncertainties. They are largest in regions of high sea surface temperature variability such as the western boundary currents and along the northern boundary of the Southern Ocean. The sampling uncertainties are generally smaller in the tropics and in the ocean gyres.
Published in the Journal of Geophysical Research [link]
Reassessing biases and other uncertainties in sea-surface temperature observations measured in situ since 1850, part 2: biases and homogenisation
J. J. Kennedy, N. A. Rayner, R. O. Smith, D. E. Parker, and M. Saunby
Abstract. Changes in instrumentation and data availability have caused time-varying biases in estimates of global- and regional-average sea-surface temperature. The size of the biases arising from these changes are estimated and their uncertainties evaluated. The estimated biases and their associated uncertainties are largest during the period immediately following the Second World War, reflecting the rapid and incompletely documented changes in shipping and data availability at the time. Adjustments have been applied to reduce these effects in gridded data sets of sea-surface temperature and the results are presented as a set of interchangeable realisations. Uncertainties of estimated trends in global- and regional-average sea-surface temperature due to bias adjustments since the Second World War are found to be larger than uncertainties arising from the choice of analysis technique, indicating that this is an important source of uncertainty in analyses of historical sea-surface temperatures. Despite this, trends over the twentieth century remain qualitatively consistent.
Published in the Journal of Geophysical Research [link]
Some excerpts are provided from both of these papers, to which I will refer subsequently in my analysis and discussion of their treatment of uncertainty.
4.4. Coverage uncertainty
When calculating area averages from a gridded data set there is an additional uncertainty that arises because there are often large areas, and consequently, many grid boxes, which contain no observations. Such uncertainties are referred to here as coverage uncertainties. In Brohan et al. coverage uncertainties were estimated by subsampling reanalysis data. A similar method is used here. SST anomalies from the globally complete HadISST1 data set were used in the place of reanalysis data. For example, to calculate the uncertainty on the March 1973 monthly average for the North Pacific a time series of North Pacific average SST anomalies was calculated using HadISST from 1870 to 2010. The coverage of HadISST at all time steps was then reduced to that of HadSST3 for March 1973. The North Pacific time series was recalculated from the sub-sampled data and the standard deviation of the difference between the complete and subsampled series was used as an estimate of the uncertainty for March 1973. Data from the ERSSTv3b and COBE data sets were also used in place of HadISST1 and gave similar results suggesting that the uncertainties do not depend strongly on the statistical assumptions made in creating HadISST1. Coverage uncertainties calculated using HadISST1 are shown in Figure 6.
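The subsampling procedure described above can be sketched in a few lines. This is a minimal illustration, not the Hadley Centre code: the array sizes, the synthetic "globally complete" field standing in for HadISST1, and the ~40% coverage mask are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a globally complete SST anomaly data set:
# 141 years (1870-2010) of monthly anomalies on a small grid of
# 100 boxes (hypothetical sizes, for illustration only).
n_months, n_boxes = 141 * 12, 100
full_field = rng.normal(0.0, 0.5, size=(n_months, n_boxes))

# Observation mask for the target month (e.g. March 1973 in HadSST3):
# True where a grid box contains observations; here roughly 40% coverage.
mask = rng.random(n_boxes) < 0.4

# Area average of the complete field at every time step.
complete_series = full_field.mean(axis=1)

# Area average after reducing every time step to the target month's coverage.
subsampled_series = full_field[:, mask].mean(axis=1)

# The standard deviation of the difference between the complete and
# subsampled series is the coverage-uncertainty estimate for that month.
coverage_uncertainty = float(np.std(complete_series - subsampled_series))
print(round(coverage_uncertainty, 4))
```

In the real analysis an equal-area weighting (and, for HadSST2/3, latitude weighting) replaces the plain mean, but the logic is the same: the spread of complete-minus-subsampled averages measures what the missing boxes could have contributed.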
It should be noted that the adjustments presented here and their uncertainties represent a first attempt to produce an SST data set that has been homogenized from 1850 to 2006. Therefore, the uncertainties ought to be considered incomplete until other independent attempts have been made to assess the biases and their uncertainties using different approaches to those described here.
5. Key results
Measurement and sampling errors (derived in part 1) are larger than in previous analyses of SST because they include the effects of correlated errors in the observations. Correlation between the measurement errors leads to an approximate two-fold increase in global- and hemispheric-average uncertainty. A time series of global-average, bias-adjusted SSTs with all uncertainty estimates combined is shown in Figure 11.
The uncertainty of global-average SST is largest in the early record and immediately following the Second World War. The reasons for the large uncertainties are in each case different. In the mid 19th century the largest components of the uncertainty at annual time scales are the measurement and sampling uncertainty and the coverage uncertainty because there were few observations made by a small global fleet. The bias uncertainties are relatively small because it was assumed that there was little variation in how measurements were made. By contrast, in the late 1940s and early 1950s, there is a good deal of uncertainty concerning how measurements were made. As a result the bias uncertainties are larger than the measurement and sampling uncertainties. After the 1960s bias uncertainties dominate the total and are by far the largest component of the uncertainty in the most recent data.
Figure 12 shows the ordinary least squares (OLS) trends in adjusted and unadjusted global-average SST for different periods all ending in 2006 and compares the trends to those in the previous Met Office Hadley Centre SST data set HadSST2 and those in the drifting buoy data. The adjustments have the effect of reducing the trend from 1940 to 2006, but generally increase the trend from 1980 to 2006. In the latter case the effects are not significant given the wide uncertainty range. The trends in the adjusted and unadjusted series are, on average, lower than those from HadSST2 but are statistically indistinguishable except for start dates 1935-1955 and from 2003 onwards. Between 1995 and 2001, the trends in the unadjusted data lie at the lower end of the distribution of the trends in the adjusted series reflecting the rapid increase in the number of relatively-cold-biased buoy observations in the record at that time. The buoy data are shown from 1991, but 1996 was the first year in which more than one third of grid boxes were filled in every month. The coverage of buoy observations continued to increase throughout the period 1996-2006. The trends in the buoy data and the HadSST3 data are very similar after 1997 suggesting that the buoy data could be used alone to monitor global-average SST in the future.
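The OLS trend comparison over different start dates is a simple calculation. A minimal sketch, using entirely synthetic series in place of the HadSST3 adjusted and unadjusted data (the 0.008°C/yr underlying trend and the toy pre-1946 bias adjustment are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical annual global-average SST anomaly series, 1940-2006,
# standing in for the unadjusted and adjusted HadSST3 series.
years = np.arange(1940, 2007)
unadjusted = 0.008 * (years - 1940) + rng.normal(0.0, 0.1, years.size)
# Toy bias adjustment: cool the early (pre-1946) part of the record.
adjusted = unadjusted - 0.05 * (years < 1946)

def ols_trend(t, y):
    """OLS slope, converted to degrees C per decade."""
    slope = np.polyfit(t, y, 1)[0]
    return slope * 10.0

# Trends for different start years, all ending in 2006 (as in Figure 12).
for start in (1940, 1980):
    sel = years >= start
    print(start,
          round(ols_trend(years[sel], unadjusted[sel]), 3),
          round(ols_trend(years[sel], adjusted[sel]), 3))
```

Cooling the early record, as in this toy adjustment, raises the computed 1940-2006 trend; whether such a difference matters depends on the width of the uncertainty range around each trend, which is the point the paper makes.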
6. Remaining issues
Finally, the estimates of biases and other uncertainties presented here should not be interpreted as providing a comprehensive estimate of uncertainty in historical sea-surface temperature measurements. They are simply a first estimate. Where multiple analyses of the biases in other climatological variables have been produced, for example tropospheric temperatures and ocean heat content, the resulting spread in the estimates of key parameters such as the long-term trend has typically been significantly larger than initial estimates of the uncertainty suggested. Until multiple, independent estimates of SST biases exist, a significant contribution to the total uncertainty will remain unexplored. This remains a key weakness of historical SST analysis.
The HadSST3 website also has a comprehensive writeup that summarizes their uncertainty analysis.
The analysis discusses the following sources and types of uncertainties:
- random observational errors
- systematic observational errors
- sampling uncertainty
- coverage uncertainty
- structural uncertainty
- unknown unknowns
Because Rayner et al. (2006) and Kennedy et al. (2011b) make no attempt to estimate temperatures in grid boxes which contain no observations, an additional uncertainty had to be computed when estimating area-averages. Rayner et al. (2006) used Optimal Averaging (OA) as described in Folland et al. (2001), which estimates the area average in a statistically optimal way and provides an estimate of the coverage uncertainty. Kennedy et al. (2011b) followed the example of Brohan et al. (2006) and subsampled globally complete fields taken from previous analyses. The uncertainties of the global average computed by Kennedy et al. (2011b) were generally larger than those estimated by Rayner et al. (2006). How comparable these two sets of numbers are is difficult to assess. Kennedy et al. (2011b) used a simple area-weighted average of the available grid boxes, with no attempt made to optimise the weights to account for the distribution of data as Rayner et al. (2006) did.
The HadSST3 coverage uncertainty is largest (with a 2-sigma uncertainty of around 0.15°C) in the 1860s when coverage is poorest. This falls to 0.03°C by 2006. That this uncertainty should be so small – particularly in the nineteenth century – is surprising. To an extent the relatively small uncertainty might simply be a reflection of the assumptions made in the analyses used by Kennedy et al. (2011b) to estimate the coverage uncertainty. Another way of assessing the coverage uncertainty is to look at the effect of reducing the coverage of well-sampled periods to that of the less well sampled nineteenth century and recomputing the global average.
This figure shows the range of global annual average SSTs obtained by reducing each year to the coverage of years in the nineteenth century. So, for example, the range indicated by the blue area in the upper panel for 2006 shows the range of global annual averages obtained by reducing the coverage of 2006 successively to be at least as bad as 1850, 1851, 1852 and so on to 1899. The red line shows the global average SST anomaly from data that has not been reduced in coverage. For most years the difference between the subsampled and more fully sampled data is smaller than 0.15°C and the largest deviations are smaller than 0.2°C. For the coverage uncertainty on the global average to be significantly larger would require the variability in the nineteenth century data gaps to be much larger than in the well observed period.
The structural uncertainties associated with estimating SSTs in data voids and at data sparse times are somewhat better explored. A variety of different statistical techniques have been applied to both the in situ and satellite data providing a range of sometimes quite different results. The problem of creating globally complete analyses is challenging because of the relative sparseness of early observations, the non-stationarity of the changing climate and the fact that more observations are missing at colder periods than at warmer ones.
One particular concern is that patterns of variability in the modern era – which are used to train the statistical models – might not faithfully represent variability at earlier times. It is not clear to what extent this question has been resolved. Smith et al. (2008) allow for a non-stationary low frequency component in their analysis which weakens the criticism as it pertains to this analysis. They used sub-sampled climate models to optimise their algorithms in periods when data are few. Ilin and Kaplan (2009) and Luttinen and Ilin (2009) used iterative algorithms that make use of data throughout the record to estimate the covariance structures and other parameters of their statistical models. However, such methods will still tend to give a greater weight to periods with more plentiful observations.
Another concern is that methods which use Empirical Orthogonal Functions to describe the variability might inadvertently impose long-range teleconnections that do not exist in the data (Dommenget 2007). Smith et al. (2008) explicitly limit the range across which teleconnections can occur, likely mitigating this problem. A number of analysis methods have a tendency to lose variance at a range of time scales, either because they do not explicitly resolve small-scale processes (Kaplan et al. 1997, Smith et al. 2008) or because in the absence of data the method tends towards the climatological average (Ishii et al. 2005). Rayner et al. (2003) used the method of Kaplan et al. (1997) but with certain changes to explicitly resolve the long-term trend and improve small-scale variability where observations were plentiful. Karspeck et al. (submitted) analyse the residual difference between the observations and the Kaplan et al. (1997) analysis using local non-stationary covariances and then draw a range of samples from the analysis posterior distribution in order to provide consistent variance at all times and locations.
Yasunaka and Hanawa (2011) examined a range of climate indices based on seven different SST products. They found that the disagreement between data sets was marked before 1880, and that the trends in large-scale averages and indices tend to diverge outside of the common climatology period. For the global average, the differences between analyses were around 0.2K before 1920 and around 0.1-0.2K in the modern period. Even for relatively well observed events such as the 1925/26 El Nino, the detailed evolution of the event varied from analysis to analysis. The reasons for the differences are not completely clear, because each data set is based on a slightly different set of observations that have been quality controlled and processed in different ways.
Currently, only a few groups provide explicit uncertainty estimates based on their analysis techniques. As noted above, the uncertainty estimates derived from a particular analysis will tend to underestimate the true uncertainty because they are conditional on the analysis method being correct.
In the 6 hours that I have been able to devote to this, here is my first take on their uncertainty analysis.
This effort represents a substantially more sophisticated assessment of uncertainty than was seen in HadSST2 (which was the basis for much of the IPCC AR4 analysis). They have clearly identified the primary sources of uncertainty, while at the same time allowing for the possibility of unknown unknowns. The statistical sophistication of the error analysis is also substantially improved. They acknowledge the incompleteness of their uncertainty quantification: some sources of uncertainty were not quantified, and other methods may produce different uncertainty values.
My main critique of their uncertainty analysis concerns structural uncertainty, and specifically their estimate of the uncertainty associated with incomplete coverage. Sophisticated statistical techniques (e.g. EOFs) are used to fill in missing data, using relationships and variability derived primarily from the period of good data coverage in the latter half of the 20th century. The farther back in time one goes before this period, the sparser the global coverage of observations becomes, with coverage very low prior to 1920 over much of the global ocean. So basically, values over much of the global ocean prior to 1950 are determined by a statistical model.
Errors associated with using this statistical model to determine the global average time series are estimated by subsampling the observations (primarily ship tracks) in the earlier period against reanalysis data for the modern period. One study has used simulation results from the GFDL model to assess the error. The coverage uncertainty identified for HadSST3 (global annual average) has a maximum value of 0.08°C (1-sigma) and 0.15°C (2-sigma) circa the 1860s. These values seem implausibly low for periods when data for a large portion of the global ocean are created by a statistical model.
A more sophisticated and systematic analysis of different assumptions about climate variability prior to the mid 20th century, and of different statistical methods, is needed to provide a more realistic estimate of the coverage uncertainty. This might be accomplished by using coupled climate model simulations, selecting models that more reliably simulate the 20th-century ocean climate and that each have multiple ensemble members, to test the various statistical assumptions and subsampling on different realizations of the ocean climate since 1950.
The other structural uncertainty issue is the actual method for combining different biases, and the uncertainty model itself. In the context of parametric uncertainty for the bias corrections, the determination of uncertainty rests on a number of underlying assumptions that are themselves uncertain.
The HadSST group are to be commended for their extensive efforts in attempting to quantify uncertainty in the HadSST3 dataset. They admit that their analysis is incomplete. I suspect that the major source of unaccounted-for uncertainty is structural error in their method for dealing with incomplete coverage. Improved methods for analyzing uncertainty in the SST data set, particularly methods that account for structural uncertainty, are needed to provide a better understanding of global SST and to produce more realistic assessments in the IPCC.
Here are two excerpts from the uncertainty web page regarding their advice on using the data set:
Finally, the work of identifying and quantifying uncertainties will be pointless if those uncertainties are not used when the SST data sets are used. Uncertainty estimates provided with data sets have sometimes been difficult to use, or easy to use inappropriately. As pointed out by Rayner et al. (2010), “more reliable and user-friendly representations of uncertainty should be provided” in order to encourage their widespread and effective use. HadSST3 has been produced as a set of interchangeable realisations that together define the bias uncertainty range of HadSST3. It is hoped that providing these 100 realisations in a form identical to the median estimate will encourage users to explore the sensitivity of their analysis to observational uncertainty with little extra effort. It is also suggested that users run their analysis on a range of different SST analyses.
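The ensemble-of-realisations approach is easy to use in practice: run the same analysis on each of the 100 realisations and report the spread. A minimal sketch with synthetic data standing in for the real realisations (the trend values, noise scale, and the placeholder `analysis` function are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in for the 100 interchangeable HadSST3 realisations:
# each row is one realisation of an annual global-average series, 1850-2006.
years = np.arange(1850, 2007)
median_estimate = 0.004 * (years - 1850) - 0.4
# Add a small correlated (random-walk) perturbation per realisation,
# mimicking bias-adjustment uncertainty that is coherent in time.
perturbations = rng.normal(0.0, 0.05, (100, years.size)).cumsum(axis=1) * 0.01
realisations = median_estimate + perturbations

def analysis(series, yrs):
    """Placeholder for whatever the user's analysis is -- here the
    1900-2006 OLS trend in degrees C per decade."""
    sel = yrs >= 1900
    return np.polyfit(yrs[sel], series[sel], 1)[0] * 10.0

# Run the identical analysis on every realisation; the spread of the
# results reflects the bias-adjustment uncertainty in the data set.
trends = np.array([analysis(r, years) for r in realisations])
print(round(float(trends.mean()), 3), "+/-", round(float(trends.std()), 4))
```

Because each realisation is supplied in a form identical to the median estimate, existing analysis code needs no changes beyond a loop over files.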
It is common in detection and attribution studies only to use data where there are observations by reducing the coverage of the models to match that of the data. This reduces the exposure of the study to structural uncertainties associated with analysis techniques.
Let’s see if the IPCC AR5 adopts this advice in their assessments of likelihoods and confidence levels associated with the surface temperature record.
And finally, getting back to Sharper00’s challenge, Figure 11 of Part 2 clearly shows uncertainties exceeding 0.3C in the 19th century. And the authors admit that their uncertainty analysis is incomplete and likely to be too low:
As noted above, the uncertainty estimates derived from a particular analysis will tend to underestimate the true uncertainty because they are conditional on the analysis method being correct.