by Judith Curry
Last week, Richard Muller testified at the U.S. House of Representatives Hearing on Climate Change: Examining the Processes Used to Create Science and Policy [see here].
Muller’s testimony has drawn numerous vociferous and contradictory responses from the blogosphere and mainstream media. The issue is the Berkeley Earth Surface Temperature Project, of which Muller is the Director [see here]. Let’s take a look at what Muller said, some of the responses, and then play Monday morning quarterback in terms of evaluating the responses and pondering if/how this might have been handled better.
Context for Muller’s testimony
The topic of the hearing is examining the processes used to create science and policy. As per the hearing charter:
This hearing will provide an overview of some of the process questions within climate change science and policy that have been raised in recent years.
The significance of and concern regarding the emails has been heightened by the fact that CRU is one of the primary institutions that provide data and information to the IPCC, raising questions regarding the integrity of the models, data and processes, and ultimately the key scientific conclusions upon which climate policies are based.
In recent years, there have been questions regarding not only the quality of the data collected but also the processes used for normalization (in order to compare “apples to apples”). The quality of data collected from instruments that have not been maintained or whose placement violates government positioning procedures has not been established. Furthermore, the process used for quality assurance has come under question as well, prompting several data quality projects across the country to test the quality of the data used in climate change science.
So Muller was asked to comment on issues related to quality assurance, specifically with regards to the surface temperature data. He was not asked to provide a definitive estimate of the temperature trend over the past century.
Muller was asked to testify by the Republicans. If the Republicans wanted a “denier” to testify, they would not have invited Muller. Note, the Republicans also invited me to testify several months ago, and I am hardly a denier or even generally regarded as a skeptic. Muller and his associates have made numerous public statements about being concerned about global warming. That said, Muller has also been harshly critical of the behavior revealed by the CRU emails, the hockey stick, and hide the decline. He has also been concerned about the surface temperature record; the statement below is from the Berkeley Earth web page:
The most important indicator of global warming, by far, is the land and sea surface temperature record. This has been criticized in several ways, including the choice of stations and the methods for correcting systematic errors. The Berkeley Earth Surface Temperature study sets out to do a new analysis of the surface temperature record in a rigorous manner that addresses this criticism. We are using over 39,000 unique stations, which is more than five times the 7,280 stations found in the Global Historical Climatology Network Monthly data set (GHCN-M) that has served as the focus of many climate studies.
Our aim is to resolve current criticism of the former temperature analyses, and to prepare an open record that will allow rapid response to further criticism or suggestions. Our results will include not only our best estimate for the global temperature change, but estimates of the uncertainties in the record.
So why ask Muller and not someone else who has a longer track record of working with surface data? Well, part of the reason is probably Muller’s extremely impressive record in science, which is summarized on this Wikipedia page.
What Muller’s testimony said
Relevant excerpts from Muller’s testimony:
The project has already merged 1.6 billion land surface temperature measurements from 16 sources, most of them publicly available, and is putting them in a simple format to allow easy use by scientists around the world. By using all the data and new statistical approaches that can handle short records, and by using novel approaches to estimation and avoidance of systematic biases, we expect to improve on the accuracy of the estimate of the Earth’s temperature change.
Prior groups (NOAA, NASA, HadCRU) selected for their analysis 12% to 22% of the roughly 39,000 available stations. (The number of stations they used varied from 4,500 to a maximum of 8,500.)
They believe their station selection was unbiased. Outside groups have questioned that, and claimed that the selection picked records with large temperature increases. Such bias could be inadvertent, for example, a result of choosing long continuous records. (A long record might mean a station that was once on the outskirts and is now within a city.)
To avoid such station selection bias, Berkeley Earth has developed techniques to work with all the available stations. This requires a technique that can include short and discontinuous records.
In an initial test, Berkeley Earth chose stations randomly from the complete set of 39,028 stations. Such a selection is free of station selection bias.
In our preliminary analysis of these stations, we found a warming trend that is shown in the figure. It is very similar to that reported by the prior groups: a rise of about 0.7 degrees C since 1957. (Please keep in mind that the Berkeley Earth curve, in black, does not include adjustments designed to eliminate systematic bias.)
The Berkeley Earth agreement with the prior analysis surprised us, since our preliminary results don’t yet address many of the known biases. When they do, it is possible that the corrections could bring our current agreement into disagreement.
Why such close agreement between our uncorrected data and their adjusted data? One possibility is that the systematic corrections applied by the other groups are small. We don’t yet know.
The main value of our preliminary result is that it demonstrates the Berkeley Earth ability to use all records, including those that are short or fragmented. When we apply our approach to the complete data collection, we will largely eliminate the station selection bias, and significantly reduce statistical uncertainties.
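The random-selection idea in the excerpt above can be illustrated with a short sketch. This is a toy illustration of the principle, not Berkeley Earth’s actual code; the station counts, warming rate, and noise level are invented. The point is that every station, however short or fragmented its record, gets the same chance of inclusion, so the sample carries no preference for long (and possibly urbanizing) records:

```python
import random

random.seed(0)

def linear_trend(years, temps):
    """Ordinary least-squares slope (deg C per year) of one station record."""
    n = len(years)
    my, mt = sum(years) / n, sum(temps) / n
    num = sum((y - my) * (t - mt) for y, t in zip(years, temps))
    den = sum((y - my) ** 2 for y in years)
    return num / den

# Synthetic inventory: stations with records of very different lengths, all
# drawn from the same underlying warming of 0.02 deg C per year plus noise.
stations = []
for _ in range(4000):
    start = random.randint(1950, 2000)
    length = random.randint(3, 2010 - start)
    years = list(range(start, start + length))
    temps = [0.02 * (y - 1950) + random.gauss(0, 0.3) for y in years]
    stations.append((years, temps))

# Random selection: short records are as likely to be chosen as long ones,
# so the subset is free of station selection bias by construction.
subset = random.sample(stations, 500)
trends = [linear_trend(y, t) for y, t in subset]
mean_trend = sum(trends) / len(trends)  # recovers roughly 0.02 deg C / year
```

The catch, as the testimony notes, is that averaging trends from short records inflates the statistical noise per station, which is why a method that can use all 39,028 stations needs care in how it combines them.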
Many temperature stations in the U.S. are located near buildings, in parking lots, or close to heat sources. Anthony Watts and his team have shown that most of the current stations in the US Historical Climatology Network would be ranked “poor” by NOAA’s own standards, with error uncertainties up to 5 degrees C.
Did such poor station quality exaggerate the estimates of global warming? We’ve studied this issue, and our preliminary answer is no.
The Berkeley Earth analysis shows that over the past 50 years the poor stations in the U.S. network do not show greater warming than do the good stations.
Thus, although poor station quality might affect absolute temperature, it does not appear to affect trends, and for global warming estimates, the trend is what is important.
Our key caveat is that our results are preliminary and have not yet been published in a peer reviewed journal. We have begun that process by submitting a paper to the Bulletin of the American Meteorological Society, and we are preparing several additional papers for publication elsewhere.
NOAA has already published a similar conclusion – that station quality bias did not affect estimates of global warming – based on a smaller set of stations, and Anthony Watts and his team have a paper submitted, which is in late-stage peer review, using over 1000 stations, but it has not yet been accepted for publication and I am not at liberty to discuss their conclusions and how they might differ. We have looked only at average temperature changes, and additional data need to be studied, to look at (for example) changes in maximum and minimum temperatures.
In fact, in our preliminary analysis the good stations report more warming in the U.S. than the poor stations by 0.009 ± 0.009 degrees per decade, opposite to what might be expected, but also consistent with zero. We are currently checking these results and performing the calculation in several different ways. But we are consistently finding that there is no enhancement of global warming trends due to the inclusion of the poorly ranked US stations.
Berkeley Earth hopes to complete its analysis including systematic bias avoidance in the next few weeks. We are now studying new approaches to reducing biases from:
1. Urban heat island effects. Some stations in cities show more rapid warming than do stations in rural areas.
2. Time of observation bias. When the time of recording temperature is changed, stations will typically show different mean temperatures than they did previously. This is sometimes corrected in the processes used by existing groups. But this cannot be done easily for remote stations or those that do not report times of observations.
3. Station moves. If a station is relocated, this can cause a “jump” in its temperatures. This is typically corrected in the adjustment process used by other groups. Is the correction introducing another bias? The corrections are sometimes done by hand, making replication difficult.
4. Change of instrumentation. When thermometer type is changed, there is often an offset introduced, which must be corrected.
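Items 3 and 4 in the list above can be illustrated with a minimal sketch of a step-change (“jump”) adjustment. This is my own toy example, not the algorithm any of the groups actually uses: given a documented break, align the mean of the segment after the break with the mean before it.

```python
def adjust_step(temps, break_index):
    """Remove the offset introduced at a documented break (station move or
    instrument change) by matching the post-break mean to the pre-break mean."""
    before = temps[:break_index]
    after = temps[break_index:]
    offset = sum(after) / len(after) - sum(before) / len(before)
    return before + [t - offset for t in after]

# A flat series with a +1.5 deg C jump halfway through, e.g. a sensor change:
raw = [10.0] * 12 + [11.5] * 12
adjusted = adjust_step(raw, 12)  # the artificial jump is removed
```

The caveat in the testimony applies directly: on a real, noisy series with a genuine climate trend, naive mean-matching like this can remove real signal along with the artifact, which is exactly why corrections done by hand are difficult to replicate and audit.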
Based on the preliminary work we have done, I believe that the systematic biases that are the cause for most concern can be adequately handled by data analysis techniques. The world temperature data has sufficient integrity to be used to determine global temperature trends.
Despite potential biases in the data, methods of analysis can be used to reduce bias effects well enough to enable us to measure long-term Earth temperature changes. Data integrity is adequate. Based on our initial work at Berkeley Earth, I believe that some of the most worrisome biases are less of a problem than I had previously thought.
WUWT has been harshly critical of Muller’s testimony, after Watts initially showed support for the study:
But here’s the thing: I have no certainty nor expectations in the results. Like them, I have no idea whether it will show more warming, about the same, no change, or cooling in the land surface temperature record they are analyzing. Neither do they, as they have not run the full data set, only small test runs on certain areas to evaluate the code. However, I can say that having examined the method, on the surface it seems to be a novel approach that handles many of the issues that have been raised.
As a reflection of my increased confidence, I have provided them with my surfacestations.org dataset to allow them to use it to run a comparison against their data. The only caveat being that they won’t release my data publicly until our upcoming paper and the supplemental info (SI) has been published. Unlike NCDC and Menne et al, they respect my right to first publication of my own data and have agreed.
And, I’m prepared to accept whatever result they produce, even if it proves my premise wrong. I’m taking this bold step because the method has promise. So let’s not pay attention to the little yippers who want to tear it down before they even see the results. I haven’t seen the global result, nobody has, not even the home team, but the method isn’t the madness that we’ve seen from NOAA, NCDC, GISS, and CRU, and there aren’t any monetary strings attached to the result that I can tell. If the project was terminated tomorrow, nobody loses jobs, no large government programs get shut down, and no dependent programs crash either. That lack of strings attached to funding, plus the broad mix of people involved, especially those who have previous experience in handling large data sets, gives me greater confidence in the result being closer to a bona fide ground truth than anything we’ve seen yet. Dr. Fred Singer also gives a tentative endorsement of the methods.
The week before Muller testified, there was discussion between Watts and Muller about what Muller could/should say about his results that involved use of Watts’ surface station classification data. Muller wanted to respect the agreement that he had made with Watts. This discussion culminated in Watts sending a letter to the House Committee just prior to the Hearing, which is described in this post. The post starts out with the following statement from Watts:
There seems a bit of a rush here, as BEST hasn’t completed all of their promised data techniques that would be able to remove the different kinds of data biases we’ve noted. That was the promise; that is why I signed on (to share my data and collaborate with them). Yet somehow, much of that has been thrown out the window, and they are presenting some results today without the full set of techniques applied. Based on my current understanding, they don’t even have some of them fully working and debugged yet. Knowing that, today’s hearing presenting preliminary results seems rather topsy-turvy. But post-normal science political theater is like that.
Watts is criticizing Muller for an analysis that Muller neither performed nor set out to perform. Muller’s presentation of preliminary results was clearly motivated by the need to support his assessment of the challenges and issues surrounding the surface temperature data record.
In Watts’ letter to the House Committee, he provides the abstract for the Fall et al paper that is in review:
While NOAA and Dr. Muller have produced analyses using our preliminary data that suggest siting has no appreciable effect, our upcoming paper reaches a different conclusion.
Our paper, Fall et al 2011 titled “Analysis of the impacts of station exposure on the U.S. Historical Climatology Network temperatures and temperature trends” has this abstract:
The recently concluded Surface Stations Project surveyed 82.5% of the U.S. Historical Climatology Network (USHCN) stations and provided a classification based on exposure conditions of each surveyed station, using a rating system employed by the National Oceanic and Atmospheric Administration (NOAA) to develop the U.S. Climate Reference Network (USCRN). The unique opportunity offered by this completed survey permits an examination of the relationship between USHCN station siting characteristics and temperature trends at national and regional scales and on differences between USHCN temperatures and North American Regional Reanalysis (NARR) temperatures. This initial study examines temperature differences among different levels of siting quality without controlling for other factors such as instrument type.
Temperature trend estimates vary according to site classification, with poor siting leading to an overestimate of minimum temperature trends and an underestimate of maximum temperature trends, resulting in particular in a substantial difference in estimates of the diurnal temperature range trends. The opposite-signed differences of maximum and minimum temperature trends are similar in magnitude, so that the overall mean temperature trends are nearly identical across site classifications. Homogeneity adjustments tend to reduce trend differences, but statistically significant differences remain for all but average temperature trends. Comparison of observed temperatures with NARR shows that the most poorly-sited stations are warmer compared to NARR than are other stations, and a major portion of this bias is associated with the siting classification rather than the geographical distribution of stations. According to the best-sited stations, the diurnal temperature range in the lower 48 states has no century-scale trend.
Watts then concludes his letter to the House Committee with this statement:
It is our contention that many fully unaccounted-for biases remain in the surface temperature record, that the resultant uncertainty is large, and that systematic biases remain. This uncertainty and these systematic biases need to be addressed not only nationally, but worldwide. Dr. Richard Muller has not yet examined these issues.
Huh? What the heck is all this about? As Mosher stated on a previous thread:
You should also note that Watts abstract CONFIRMS Muller’s preliminary finding. station quality does not impact the MEAN temperature recorded. ( it does hit diurnal temperature range which is a cool thing folks will have to look at). people miss that the mean temp is not affected.
Transparency vs right of first spin
What seems to be going on here is a struggle between the conflicting goals of transparency and the right to publish your own data first. Watts feels that he got burned when he had a preliminary version of the station classification online, and Menne et al. published a paper using this preliminary data set. Watts then pulled his data set from the web. A paper (Fall et al.) is in the review process. As I understand it, once the paper is accepted for publication, Watts will make the data set publicly available.
With regards to the Berkeley Earth Surface Temperature data set, it is my understanding that the data set will be released once papers are submitted for publication. This is presumably something that Watts can understand.
Muller did not present in testimony any of the graphs where he has analyzed Watts’ data, respecting Watts’ right to be the first to publish his own data.
Let’s face it, the timing of the testimony threw a spanner in the works of the schedule for Watts and Muller releasing information on their respective studies. Should Muller (or Watts, or anyone else for that matter) have declined to testify on this subject, having done relevant analyses but not yet having made the data/analyses public? If Watts thinks that it was premature for Muller to present his preliminary (unpublished) analyses in testimony, then it was equally premature for Watts to present to the Committee his (unpublished and otherwise unavailable) analysis. A strong argument can be made that Muller had a responsibility to provide the House Committee with his best assessment of the data quality and analysis issues, using his background knowledge and expert judgment.
Muller has been taking hits from both “sides.” On the “warm” side, Joe Romm has been his typically vicious self. Romm’s alter ego on the Republican side, Marc Morano, has a one-stop shop for Muller smears, which links to more than a dozen posts.
Monday morning quarterbacking
Four days after the testimony, in hindsight, should or could Muller have done anything differently? He could have declined to testify, but I’m not sure how that would have helped anything (other than saving Muller a lot of personal grief). He could have thrown Watts under the bus and published results using Watts’ data. Other ideas?