by Steven Mosher and Zeke Hausfather
Today the Berkeley Earth Surface Temperature Project released a major update to their temperature data. The update includes:
- Global and regional land temperature estimates back to the 1750s, with estimated uncertainties.
- Temperature figures and data for every country, state, city, and individual station.
- New estimates of the effect of early volcanoes as well as CO2 on the temperature record.
- Globally gridded min, max, and mean anomalies on 1°×1° latitude-longitude cells for each month, for land areas.
The link to the new paper from the Berkeley Earth group is [here].
Figure 1: Land temperature with 10-year running averages. The shaded regions are the two-standard deviation uncertainties calculated including both statistical and spatial sampling errors. Prior land results from the other groups are also plotted. The NASA GISS record had a land mask applied; the Hadley / CRU record refers to the CRUTEM4 series. Click on image to embiggen.
Figure 1 shows the newly released temperature data from 1750 to 2012 compared to land temperature reconstructions from NASA GIStemp, NOAA’s NCDC, and Hadley/UEA’s CRUTEM over their available records. Berkeley’s record overlaps quite well with existing records from about 1875 onwards, and includes the first ever global land temperature estimates from 1753-1850.
The Berkeley Earth method differs from those of previous groups in several ways. Rather than adjusting ("homogenizing") individual records for known and presumed discontinuities (e.g. from instrument changes and station moves), it splits the records at such points, essentially creating two records from one. This procedure, referred to as the scalpel, was completely automated to reduce human bias. The 36,866 records were split, on average, 3.9 times each to create 179,928 record fragments.
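For readers who want to see the idea in code, here is a minimal sketch of the scalpel step, assuming a pandas monthly series and a pre-computed list of break dates; the function name and inputs are ours for illustration, not the project's actual code:

```python
import pandas as pd

def scalpel_split(series: pd.Series, breakpoints: list) -> list:
    """Split one station record into fragments at the given break dates.

    `series` is a monthly temperature series indexed by date; `breakpoints`
    is a list of dates where a move, instrument change, or empirical
    discontinuity was flagged.  No adjustment is applied; each fragment is
    simply treated downstream as an independent station record.
    """
    fragments = []
    start = series.index.min()
    for cut in sorted(breakpoints) + [series.index.max() + pd.offsets.MonthBegin(1)]:
        frag = series[(series.index >= start) & (series.index < cut)]
        if len(frag) > 0:
            fragments.append(frag)
        start = cut
    return fragments
```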
The Berkeley data is plotted with uncertainties estimated via randomly subdividing the 179,928 scalpeled stations into 8 smaller sets, calculating global land averages for each of those, and then comparing the results using the “jackknife” statistical method. Spatial sampling uncertainties were estimated by simulating poorly sampled periods (e.g. 1753 to 1850) with modern data (1960 to 2010) for which the Earth coverage was better than 97% complete, and measuring the departure from the full site average when using only the limited spatial regions available at early times.
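A rough sketch of the delete-one-group jackknife described above, assuming a `global_average` function (our placeholder, not a BEST routine) that turns a list of station records into a land-average anomaly series:

```python
import numpy as np

def jackknife_uncertainty(stations, global_average, n_groups=8, seed=0):
    """Delete-one-group jackknife over station subsets (a sketch, not the
    project's code).  `stations` is a list of station records; `global_average`
    is any function mapping a station list to a numpy anomaly series."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(stations))
    groups = np.array_split(order, n_groups)

    # Recompute the land average with each group of stations withheld in turn.
    estimates = []
    for g in groups:
        withheld = set(g.tolist())
        keep = [s for i, s in enumerate(stations) if i not in withheld]
        estimates.append(global_average(keep))
    estimates = np.asarray(estimates)

    # Jackknife variance across the delete-one-group estimates.
    mean_est = estimates.mean(axis=0)
    var = (n_groups - 1) / n_groups * ((estimates - mean_est) ** 2).sum(axis=0)
    return mean_est, np.sqrt(var)
```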
More details on the methods used are available in the Methods Paper.
Figure 2: Number of station records available for use for each month from 1700 to present for Berkeley (red) and GHCN-M version 3.1 (blue). Click on image to embiggen.
Figure 2 shows the number of station records available for each month in both the existing GHCN-Monthly data (used as the basis for the GISTemp, NCDC, and CRUTEM reconstructions) and the new Berkeley data. For the period 1700-1800 Berkeley uses 27 percent more station months; for 1800-1900 this rises to 50 percent more; and for the post-1900 period Berkeley averages 240 percent more station months, reaching about 700 percent more in the present period.
While this additional station data does not significantly change the global-scale results over the past century, it improves our ability to map regional temperatures and to perform analyses that require taking subsets of stations (e.g. comparing urban to less urban stations).
Figure 3: Change in diurnal temperature range from 1900 to 2012.
Figure 3 shows changes in the diurnal temperature range (DTR) over the past century. One noteworthy feature is the uptick in DTR after the 1980s, something not present in past analyses (e.g. Vose et al. 2005), which have shown a relatively flat DTR during that period.
Figure 4: The annual and decadal land surface temperature from the Berkeley Earth average, compared to a linear combination of volcanic sulfate emissions and the natural logarithm of CO2. It is observed that the large negative excursions in the early temperature records are likely explained by exceptional volcanic activity at this time.
Figure 4 shows temperatures from 1750 with a simple linear fit using records of volcanic sulfate emissions and atmospheric CO2 concentrations. The strong negative excursions in the early period closely match major volcanic events (detected via sulfate deposition in ice cores). This is the first time that major volcanic events prior to 1850 have been matched to estimates of global land temperatures, though it is worth noting that stations available prior to 1850 are primarily located in the Northern Hemisphere, which may amplify the observed response to Northern Hemisphere volcanoes.
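The fit itself amounts to an ordinary least-squares regression of land temperature on log(CO2) and a volcanic sulfate index. A minimal sketch, with hypothetical input arrays rather than the actual forcing series used in the paper:

```python
import numpy as np

def fit_co2_volcanic(temp, co2_ppm, sulfate):
    """Least-squares fit  T ~ a*log(CO2) + b*sulfate + c,  as a sketch of
    the simple attribution regression described in the text.  All three
    inputs are 1-D arrays aligned on the same years."""
    X = np.column_stack([np.log(co2_ppm), sulfate, np.ones_like(co2_ppm)])
    coef, *_ = np.linalg.lstsq(X, temp, rcond=None)
    fitted = X @ coef
    return coef, fitted
```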
In addition to global results, Berkeley has released new temperature reconstructions of every continent, country, and major city globally, as well as records for every U.S. state. These include various summary statistics, the number of stations used over time, and the data used in the temperature plots in a plain-text format. All of the country data, as well as data for all individual stations used is available for browsing here: http://berkeleyearth.org/locations/
Figure 6: Sample station temperature record along with regional and global record for a particularly iconic station.
The individual station pages (coming soon) show the raw station record (in red), the best estimate of the regional record via kriging (in blue), and the global land temperature record (in gray) for comparison.
As a way of clarifying the issues that have been raised, we identify the following categories: 1) questions of relevance; 2) questions of method; 3) questions of data; 4) political considerations. The questions of relevance (is the surface temperature the most important climate metric, and is it physically meaningful?) are not directly addressed in these papers. For example, Pielke's argument that ocean heat content is more meaningful, and arguments that the surface temperature is "meaningless," are not addressed directly. The latter argument, however, is addressed indirectly through one of the findings of the method.

One result of the Berkeley Earth kriging approach is the construction of a temperature field for the entire land surface, such that every location on land has an estimated temperature that is a function of latitude, longitude, altitude, climatology, and a residual, or weather. The field provides an estimate of the local temperature to within ±1.6 C. This is both a means of testing the method as new observations are added from historical records and a way of giving meaning to the term "global temperature average." The global average is the estimate that minimizes the error if one wants to estimate the temperature at an unknown location. If you know the location but have no measurements there, the estimate that minimizes the error is given by the field value at that location. Another way to look at this is that kriging gives us an estimate of what we can expect to find in unobserved locations.
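As a toy illustration of reading "the field value at that location" off a kriged estimate, here is a simple-kriging sketch. The exponential covariance, its length scale, and the sill are assumptions of ours for illustration only (the 0.46 C nugget figure is discussed later in the post), and this is not the BEST implementation:

```python
import numpy as np

def krige_point(obs_xy, obs_temp, target_xy,
                length_scale=1000.0, sill=1.0, nugget_sd=0.46):
    """Simple-kriging-style estimate at an unobserved point (toy sketch).
    `obs_xy` is n x 2 station positions in km, `obs_temp` their anomalies,
    `target_xy` the unobserved location."""
    d_oo = np.linalg.norm(obs_xy[:, None, :] - obs_xy[None, :, :], axis=-1)
    d_ot = np.linalg.norm(obs_xy - target_xy, axis=-1)
    # Covariance between observations, and between observations and target.
    C = sill * np.exp(-d_oo / length_scale) + nugget_sd ** 2 * np.eye(len(obs_xy))
    c = sill * np.exp(-d_ot / length_scale)
    mean = obs_temp.mean()
    w = np.linalg.solve(C, c)          # kriging weights for this point
    return mean + w @ (obs_temp - mean)
```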
There have also been several persistent questions of method in the debate over the global temperature average: questions about how the Arctic region is handled (the GISS approach versus the CRU approach); questions about how stations are selected and combined (the GISS approach of reference stations versus the CRU approach of selecting stations that persist through a common anomaly period); and finally questions about adjustments. We defer the last of these to the discussion about data. The Berkeley method relies on kriging, which is mathematically known to be an optimal solution for the problem at hand. This does not make the other methods wrong, merely suboptimal; in fact, the results shown in the results paper indicate that the other methods produce very similar answers. This is not uncommon: much of the literature examining different methods for producing area averages of temperature finds only slight advantages for one method over another. One notable consequence of achieving similar results, for example, is the realization that concerns over how the Arctic is treated in land temperature reconstructions are vastly overblown.
With regard to station combining, Berkeley has taken a diametrically different approach. The RSM method of GISS and the CAM method of CRU both use criteria of temporal overlap to select stations; the spatio-temporal problem is reduced to a spatial problem by selecting or constructing long series. In the Berkeley method all station data are used. Thus the 7,000 stations of GISS and the 4,228 stations of CRU4 are expanded to the 36,866 stations used here, and the spatio-temporal problem is solved simultaneously. In this regard the method is similar to that implemented by skeptical blogger JeffId (with RomanM) and the methods developed by Nick Stokes and Tamino.
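A very rough illustration of solving the spatio-temporal problem simultaneously, in the spirit of those least-squares methods rather than the BEST code itself: alternate between estimating a per-station offset and a per-month value, using every station regardless of its period of record. The array layout and iteration count here are our own choices:

```python
import numpy as np

def simultaneous_anomaly(temps, n_iter=50):
    """Solve T[s, m] ~ offset[s] + anomaly[m] by alternating least squares
    (a sketch of the 'solve everything at once' idea, not the BEST code).
    `temps` is a stations x months array with NaN where data are missing."""
    offset = np.zeros(temps.shape[0])
    anomaly = np.zeros(temps.shape[1])
    for _ in range(n_iter):
        offset = np.nanmean(temps - anomaly[None, :], axis=1)
        anomaly = np.nanmean(temps - offset[:, None], axis=0)
    anomaly -= np.nanmean(anomaly)     # pin down the arbitrary constant
    return anomaly, offset
```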
This approach also afforded Berkeley the opportunity to generate the result using subsets of station data, for example using only stations that CRU and GISS do not use. Those results match the results achieved with the entire dataset, showing that concerns over the "great thermometer drop out" are unfounded, as has been shown before with other analytic approaches (see Zeke, Nick Stokes and Mosher).
One unique feature of the Berkeley approach is the use of the scalpel. In the GISS and CRU approaches stations are selected for their temporal coverage, and in some cases, to achieve temporal continuity, stations are spliced or adjusted so they can be treated as homogeneous over time. In the Berkeley method there is no splicing.
Instead, station records are "scalpeled" when a break point is found. For example, in other methods, if a station moves from location X to a higher elevation, a deterministic adjustment for the change in elevation is made. In the Berkeley method the record is treated as two separate stations, which is what they are. One drawback of the deterministic homogenization approach is that the errors and uncertainties due to adjustment are not propagated into the final answer. By slicing station records, this potential error and uncertainty due to adjustment is folded into the final confidence interval.
Another benefit of the kriging approach is that the total measurement error at the station level is estimated in a top-down fashion, in sharp contrast to the CRU and GISS approaches. In the CRU approach the errors are built up in a bottom-up fashion, with separate estimates for thermometer error, recording error, and so on. This approach, it has been argued, leads to an underestimation of error; for example, siting errors are assumed to be normally distributed with a mean of zero. In the Berkeley approach no such assumptions are made, and the error at the monthly station level is estimated from the top down. The nugget effect is calculated as 0.46C, considerably higher than the CRU estimate of 0.06C. That nugget effectively represents the sum of all errors at the average station, including errors due to different instruments, errors due to different observation practices, siting errors, and so on. By looking at the correlation of all stations with all other stations, the expected difference between two co-located stations works out to 0.46C. While that looks substantially higher than the bottom-up estimates, it includes all forms of error by definition.
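One way to picture that top-down estimate: compute the semivariance of pairs of stations as a function of their separation and extrapolate to zero distance; the intercept is the nugget. The sketch below uses toy inputs and a simple linear extrapolation over nearby pairs, and is not the BEST calculation:

```python
import numpy as np

def estimate_nugget(xy_km, anomalies, close_km=200.0):
    """Estimate the nugget by extrapolating pairwise semivariance to zero
    separation (a sketch of the top-down error estimate described above).
    `xy_km` is stations x 2 positions in km; `anomalies` is stations x
    months with NaN for gaps."""
    n = len(xy_km)
    dists, semivar = [], []
    for i in range(n):
        for j in range(i + 1, n):
            diff = anomalies[i] - anomalies[j]
            diff = diff[~np.isnan(diff)]
            if diff.size == 0:
                continue
            dists.append(np.linalg.norm(xy_km[i] - xy_km[j]))
            semivar.append(0.5 * np.mean(diff ** 2))
    dists, semivar = np.asarray(dists), np.asarray(semivar)

    # Fit a line to the short-distance pairs and read off the intercept;
    # the square root of that variance is the nugget in degrees C.
    close = dists < close_km
    slope, intercept = np.polyfit(dists[close], semivar[close], 1)
    return np.sqrt(max(intercept, 0.0))
```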
The most contentious area of scientific debate, however, revolves around data and data adjustments. There are questions about the amount of data (the number of stations), questions about geographical distribution, and lastly questions about using raw versus adjusted data. The methods paper demonstrates what has been widely claimed and widely doubted: the average land surface temperature can be estimated with relatively few stations. Of course the uncertainties for small numbers of stations are higher, but the numbers used by GISS, NCDC, and CRU4 are more than adequate. This is due to the correlation length scale, which extends up to 1,000 km or farther depending on the latitude and season. As stated before, questions about the global average being a function of "dropping" stations are now definitively answered: the average is not materially impacted by any great thermometer drop out. That can be demonstrated any number of ways: by including stations that have dropped out of GHCN, by using only stations that remain in the record, and by using random subsamples of the 36,866 stations.
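A sketch of that last check, the random subsample comparison, again assuming a placeholder `global_average` function (as in the jackknife sketch above) that maps a station list to an anomaly series aligned with the full-network result; the subsample fractions are illustrative:

```python
import numpy as np

def subsample_check(stations, global_average,
                    fractions=(0.05, 0.1, 0.25, 0.5), seed=0):
    """Rebuild the land average from random subsets of stations and compare
    with the full-network answer (a sketch of the robustness checks named
    above, not the project's code)."""
    rng = np.random.default_rng(seed)
    full = global_average(stations)
    departures = {}
    for f in fractions:
        idx = rng.choice(len(stations), size=int(f * len(stations)), replace=False)
        sub = global_average([stations[i] for i in idx])
        departures[f] = np.nanmean(np.abs(sub - full))  # mean absolute departure
    return full, departures
```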
The geographical distribution question is also more fully addressed. Stations were added from times and locations where GISS and CRU had no data. If there were a spatial bias in the GISS or CRU sample, we would expect to find differences; with a few exceptions in the early part of the record, our answers match theirs. Where we differ is in our ability to push the record back to 1753. There are records that start and end before CRU's common anomaly period, which CRU are forced by their method to drop. There are stations that start after the common anomaly period; those are included in Berkeley Earth as well. The results paper contains a few paragraphs discussing this early record in relation to the volcano record, and we suspect this will occasion some heated discussion. Finally, the presence of a longer record allows us to make a preliminary and heavily caveated estimate of climate sensitivity. What is notable in that exercise is that changes in solar output had no discernible effect on the regression. In short, radiative forcing from GHGs and volcanic aerosols explains a great deal of the land record, with a residual that follows a natural cycle: the AMO.
That leaves of course the question about adjustments. CRU4 uses data that has been adjusted by National Weather Services. So for example they use the 207 homogenized station series for Canada provided by the Canadian weather service. CRU, in other words, don’t adjust data; they use data that has been adjusted. GISS, on the other hand, use adjusted data (GHCNv3) and make additional adjustments for the UHI effect.
In the Berkeley approach every attempt is made to use first reports. We avoid the term "raw" data because one can never know that data that purports to be "raw" is in fact raw. One sign that data are raw is the presence of errors, for example temperatures of 1000C or -200C. An original report, or first report, is a report that has no documentation asserting that adjustments have been made; adjusted data, on the other hand, normally comes with documentation asserting that adjustments were made, and often detailing them. First reports are thus taken as "raw" unless there is reason to believe that they have been adjusted. For GHCNv3, for example, Berkeley uses the unadjusted data, as opposed to the adjusted data that both CRU and GISS use. The largest data sources in the Berkeley Earth approach are daily data, which are typically unadjusted; there are 27,000 stations taken from GHCN-Daily, which is available in a form prior to the application of any QA procedures.
Finally, in the course of merging datasets and eliminating duplicates, unadjusted data is given priority over adjusted data, such that data from CRU4 for a given location is not included unless none of the other 15 data sources has data for that location. That said, the possibility remains that some nominally unadjusted stations were adjusted before entering the source datasets. But that mere possibility does not amount to a fact, and the ability to achieve the same result using random subsets argues that no significant amount of adjusted data is contributing to or biasing the result.
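The priority rule can be pictured as a simple "most preferred source wins" merge over an ordered list of datasets. The sketch below assumes a per-station key and an ordering of our own invention, and is only meant to illustrate the idea, not the project's de-duplication logic:

```python
def merge_sources(sources):
    """Merge station records from multiple source datasets, preferring
    unadjusted sources.  `sources` is a list of dicts mapping a station key
    (e.g. rounded lat/lon plus name) to a record, ordered from most
    preferred (unadjusted first reports) to least preferred (adjusted
    products such as CRU4)."""
    merged = {}
    for source in sources:             # earlier sources win any collision
        for key, record in source.items():
            merged.setdefault(key, record)
    return merged
```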
Berkeley results demonstrate that the adjustments made to data do not materially impact the results. The process of identifying structural breaks in the time series and “slicing stations” is fully automated. There is no effort to cool the past or warm the present. Where stations exhibit an objective change in regime they are broken into separate records and then the kriging process estimates the field accordingly.
Finally there are the political or personal issues that have been raised around the subject. We know of no way to see into the hearts of men. Science provides us with some safeguards against personal bias: sharing data and sharing code. See the new website for those resources.
Disclaimer: Both Steven Mosher and Zeke Hausfather are participants in the Berkeley Earth Surface Temperature project. However, the content of this post reflects only their personal opinions and not those of the project as a whole.
JC comment: With regards to the new paper, I strongly disagree with their interpretation of attribution (Fig 3 in the post above). Here is what I have been saying in response to media queries:
The BEST team has produced the best land surface temperature data set that we currently have. It is best in the sense of including the most data and extending further back in time. The data quality control and processing use objective, statistically robust techniques. And most importantly, the data set is online and well documented, with a friendly user interface. That said, the scientific analyses that the BEST team has done with the new data set are controversial, including the impact of station quality on interpreting temperature trends and the urban heat island effect.
Their latest paper on the 250-year record concludes that the best explanation for the observed warming is greenhouse gas emissions. In my opinion, their analysis is way over-simplistic and not at all convincing. There is broad agreement that greenhouse gas emissions have contributed to the warming in the latter half of the 20th century; the big question is how much of this warming can be attributed to greenhouse gas emissions. I don't think this question can be answered by the simple curve fitting used in this paper, and I don't see that their paper adds anything to our understanding of the causes of the recent warming. That said, there are two interesting results in this paper, regarding their analysis of 19th century volcanoes and the impact on climate, and also the changes to the diurnal temperature range.