# On inappropriate use of least squares regression

by Greg Goodman

Inappropriate use of linear regression can produce spurious and significantly low estimations of the true slope of a linear relationship if both variables have significant measurement error or other perturbing factors. This is precisely the case when attempting to regress modelled or observed radiative flux against surface temperatures in order to estimate sensitivity of the climate system.

Figure 1. Conventional and inverse ‘ordinary least squares’ fits to some real, observed climate variables.

Ordinary least squares regression (OLS) is a very useful technique, widely used in almost all branches of science. The principle is to adjust one or more fitting parameters to attain the best fit of a model function, according to the criterion of minimising the sum of the squared deviations of the data from the model.

It is usually one of the first techniques that is taught in schools for analysing experimental data. It is also a technique that is misapplied almost as often as it is used correctly.

It can be shown that, under certain conditions, the least squares fit is the best estimation of the true relationship that can be derived from the available data. In statistics this is often called the ‘best linear unbiased estimator’ of the slope. (Those who enjoy contrived acronyms abbreviate this to “BLUE”.)

It is a fundamental assumption of this technique that the abscissa variable (x-axis) has negligible error: it is a “controlled variable”. It is the deviations of the dependent variable (y-axis) that are minimised. In the case of fitting a straight line to the data, it has been known since at least 1878 that this technique will under-estimate the slope if there are measurement or other errors in the x-variable. (R. J. Adcock) [link]

There are two main conditions for this result to be an accurate estimation of the slope. One is that the deviations of the data from the true relationship are ‘normally’ or Gaussian distributed. That is to say that they are of a random nature. This condition can be violated by significant periodic components in the data or an excessive number of out-lying data points. The latter may often occur when only a small number of data points is available and the noise, even if random in nature, is not sufficiently sampled to average out.

The other main condition is that there be negligible error ( or non-linear variability ) in the x variable. If this condition is not met, the OLS result derived from the data will almost always under-estimate the slope of the true relationship. This effect is sometimes referred to as regression dilution. The degree by which the slope is under-estimated is determined by the nature of the x and y errors but most strongly by those in x since they are required to be negligible for OLS to give the best estimation.
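The size of the effect can be written down for the simplest textbook case. The following is only a sketch under assumptions that are not taken from the article: the true x values x* have variance σ²(x*), the observed x is x* plus an independent error u of variance σ²(u), and y = β x* + e with e independent of both.

```latex
\hat{\beta}_{\mathrm{OLS}}
  = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}
  \;\longrightarrow\;
  \beta \, \frac{\sigma_{x^*}^{2}}{\sigma_{x^*}^{2} + \sigma_{u}^{2}}
  \qquad \text{(large-sample limit)}
```

In this idealised case the fitted slope is the true slope multiplied by a factor between 0 and 1, and the factor shrinks as the x errors grow. The y errors add scatter but do not appear in the factor, which is why the x errors matter most.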

In this discussion, “errors” can be understood to be both observational inaccuracies and any variability due to some factor other than the supposed linear relationship that it is sought to determine by regression of the two variables.

In certain circumstances regression dilution can be corrected for, but in order to do so, something has to be known about the nature and size of both the x and y errors. Typically this is not the case beyond knowing whether the x variable is a ‘controlled variable’ with negligible error, although several techniques have been developed to estimate the error in the estimation of the slope [link].
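As an illustration of such a correction, here is a minimal sketch using synthetic data. It assumes the simplest case, where the x errors are independent, random and of known standard deviation; the slope, noise levels and variable names are all invented for the example and do not come from any climate dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

true_slope = 2.0   # assumed "true" relationship, for illustration only
sigma_u = 1.0      # assumed KNOWN standard deviation of the x errors
n = 20000

x_true = rng.normal(0.0, 2.0, n)                # error-free x, never seen in practice
x_obs = x_true + rng.normal(0.0, sigma_u, n)    # what is actually measured
y_obs = true_slope * x_true + rng.normal(0.0, 1.0, n)

# Ordinary least squares slope of y on the observed (noisy) x
c = np.cov(x_obs, y_obs)
ols_slope = c[0, 1] / c[0, 0]

# Reliability ratio: the share of the observed x variance that is real signal
reliability = (c[0, 0] - sigma_u**2) / c[0, 0]
corrected_slope = ols_slope / reliability

print(f"OLS slope       : {ols_slope:.3f}")
print(f"corrected slope : {corrected_slope:.3f}")
```

The correction is only as good as the knowledge of the x error variance, which is exactly the information that is usually missing.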

A controlled variable can usually be attained in a controlled experiment, or when studying a time series, provided that the date and time of observations have been recorded and documented in a precise and consistent manner. It is typically not the case when both sets of data are observations of different variables, as is the case when comparing two quantities in climatology.

One way to demonstrate the problem is to invert the x and y axes and repeat the OLS fit. If the result were valid, irrespective of orientation, the first slope would be the reciprocal of the second one. However, this is only the case when there are very small errors in both variables, i.e. the data is highly correlated (grouped closely around a straight line). In the case of one controlled variable and one error-prone variable, the inverted result will be incorrect. In the case of two datasets containing observational error, both results will be wrong and the correct result will generally lie somewhere in between.

Another way to check the result is by examining the cross-correlation between the residual and the independent variable, i.e. (model – y) vs x, then repeating for incrementally larger values of the fitted ratio. Depending on the nature of the data, it will often be obvious that the OLS result does not produce the minimum residual between the ordinate and the regressor, i.e. it does not optimally account for co-variability of the two quantities.
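The axis-swapping test is easy to try on synthetic data. The sketch below assumes independent Gaussian errors on both variables; the true slope and the noise levels are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

true_slope = 2.0   # assumed for illustration
n = 20000

t = rng.normal(0.0, 1.0, n)                    # underlying error-free quantity
x = t + rng.normal(0.0, 0.5, n)                # x with measurement error
y = true_slope * t + rng.normal(0.0, 1.0, n)   # y with measurement error

slope_y_on_x = np.polyfit(x, y, 1)[0]   # conventional OLS: y regressed on x
slope_x_on_y = np.polyfit(y, x, 1)[0]   # same data with the axes swapped
inverse_slope = 1.0 / slope_x_on_y      # swapped fit expressed again as dy/dx

print(f"y on x            : {slope_y_on_x:.3f}")
print(f"1 / (x on y)      : {inverse_slope:.3f}")
print(f"true slope        : {true_slope:.3f}")
```

With errors on both variables the two fits disagree, and the slope used to generate the data falls between them.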

In the latter situation, the two regression fits can be taken as bounding the likely true value but some knowledge of the relative errors is needed to decide where in that range the best estimation lies. There are a number of techniques such as bisecting the angle, taking the geometric mean (square root of the product), or some other average, but ultimately, they are no more objective unless driven by some knowledge of the relative errors. Clearly bisection would not be correct if one variable had low error, since the true slope would then be close to the OLS fit done with that quantity on the x-axis.
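To illustrate why knowledge of the relative errors matters, the sketch below compares the geometric mean of the two OLS slopes with a Deming fit, which is a standard errors-in-variables estimator that takes the ratio of the y to x error variances as an input. It is swapped in here purely as an example and is not a method used in this article; the data and noise levels are synthetic and assumed.

```python
import numpy as np

rng = np.random.default_rng(3)

true_slope = 2.0                 # assumed for illustration
sigma_x, sigma_y = 0.3, 1.5      # assumed error standard deviations
n = 20000

t = rng.normal(0.0, 1.0, n)
x = t + rng.normal(0.0, sigma_x, n)
y = true_slope * t + rng.normal(0.0, sigma_y, n)

s = np.cov(x, y)
s_xx, s_yy, s_xy = s[0, 0], s[1, 1], s[0, 1]

ols = s_xy / s_xx                        # y regressed on x
inverse = s_yy / s_xy                    # reciprocal of x regressed on y
geometric_mean = np.sqrt(ols * inverse)  # one "average" of the two fits

# Deming regression: needs delta, the ratio of y to x error variances
delta = sigma_y**2 / sigma_x**2
d = s_yy - delta * s_xx
deming = (d + np.sqrt(d**2 + 4.0 * delta * s_xy**2)) / (2.0 * s_xy)

for name, value in [("OLS", ols), ("inverse OLS", inverse),
                    ("geometric mean", geometric_mean), ("Deming", deming)]:
    print(f"{name:15s}: {value:.3f}")
```

In this configuration the geometric mean overshoots the true slope, while the fit that is told the error ratio recovers it. Change the assumed error ratio and the best choice of average changes with it.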

Figure 2. A typical example of linear regression of two noisy variables produced from synthetic randomised data. The true known slope used in generating the data is seen in between the two regression results. ( Click to enlarge graph and access code to reproduce data and graph. )

Figure 2b. A typical example of correct application of linear regression to data with negligible x-errors. The regressed slope is very close to the true value, so close as to be indistinguishable visually. ( Click to enlarge )

The larger the x-errors, the greater the skew in the distribution and the greater the dilution effect.
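The dependence on the size of the x errors can be checked with a few lines of code. This is a sketch on synthetic data, separate from the code linked from the figures; the slope and noise levels are assumed values.

```python
import numpy as np

rng = np.random.default_rng(2)

true_slope = 2.0   # assumed for illustration
n = 20000

t = rng.normal(0.0, 1.0, n)                    # error-free x values
y = true_slope * t + rng.normal(0.0, 0.5, n)   # y with a fixed amount of noise

fitted = []
for sigma_x in (0.0, 0.25, 0.5, 1.0, 2.0):
    x = t + rng.normal(0.0, sigma_x, n)        # add progressively larger x errors
    slope = np.polyfit(x, y, 1)[0]             # conventional OLS fit
    fitted.append(slope)
    print(f"x-error sd = {sigma_x:4.2f}  ->  OLS slope = {slope:.3f}")
```

With no x error the fit recovers the true slope; as the x errors grow the fitted slope falls steadily towards zero.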

An Illustration: the Spencer simple model

The following case is used to illustrate the issue with ‘climate-like’ data. However, it should be emphasised that the problem is an objective mathematical one, the principle of which is independent of any particular test data used. Whether the following model is an accurate representation of climate ( it is not claimed to be ) has no bearing on the regression problem.

In a short article on his site Dr. Roy Spencer provided a simple, single-slab ocean, climate model with a predetermined feedback variable built into it. He observed that attempting to derive the climate sensitivity in the usual way consistently under-estimated the known feedback used to generate the data.

By specifying that sensitivity (with a total feedback parameter) in the model, one can see how an analysis of simulated satellite data will yield observations that routinely suggest a more sensitive climate system (lower feedback parameter) than was actually specified in the model run.

And if our climate system generates the illusion that it is sensitive, climate modelers will develop models that are also sensitive, and the more sensitive the climate model, the more global warming it will predict from adding greenhouse gasses to the atmosphere.

This is a very important observation. Regressing noisy radiative flux change against noisy temperature anomalies does consistently produce incorrectly high estimations of climate sensitivity. However, it is not an illusion created by the climate system, it is an illusion created by the incorrect application of OLS regression. When there are errors on both variables, the OLS slope is no longer an accurate estimation of the underlying linear relationship being sought.

Dr Spencer was kind enough to provide an implementation of the simple model in the form of a spread sheet download so that others may experiment and verify the effect.

To demonstrate this problem, the spreadsheet provided was modified to duplicate the dRad vs dTemp graph but with the axes inverted, ie. using exactly the same data for each run but additionally displaying it the other way around. Thus the ‘trend line’ provided by the spreadsheet is calculated with the variables inverted. No changes were made to the model.

Three values for the predetermined feedback variable were used in turn: 0.9 and 1.9, which Roy Spencer suggests represent the range of IPCC values, and 5.0, which he proposes as a value closer to that which he has derived from satellite observational data.

Here is a snap-shot of the spreadsheet showing a table of results from nine runs for each feedback parameter value. Both the conventional and the inverted regression slopes and their geometric mean have been tabulated.

Figure 3. Snap-shot of spreadsheet, click to enlarge.

Firstly this confirms Roy Spencer’s observation that the regression of dRad against dTemp consistently and significantly under-estimates the feedback parameter used to create the data in the first place (and hence over-estimates climate sensitivity of the model). In this limited test, error is between a third and a half of the correct value. There is only one value of the conventional least squares slope that is greater than the respective feedback parameter value.

Secondly, it is noted that the geometric mean of the two OLS regressions does provide a reasonably close estimate of the true feedback parameter, for the value derived from satellite observations. Variations are fairly evenly spread either side: the mean is only slightly higher than the true value and the standard deviation is about 9% of the mean.

However, for the two lower feedback values, representing the IPCC range of climate sensitivities, while the usual OLS regression is substantially less than the true value, the geometric mean over-estimates and does not provide a reliable correction over the range of feedbacks.

All the feedbacks represent a net negative feedback ( otherwise the climate system would be fundamentally unstable ). However, the IPCC range of values represents less negative feedbacks, thus a less stable climate. This can be seen reflected in the degree of variability in data plotted in the spreadsheet. The standard deviations of the slopes are also somewhat higher. This can be expected with less feedback controlling variations.

It can be concluded that the ratio of the proportional variability in the two quantities changes as a function of the degree of feedback in the system. The geometric mean of the two slopes does not provide a good estimation of the true feedback for the less stable configurations which have greater variability. This is in agreement with Isobe et al 1990 [link] which considers the merits of several regression methods.

The simple model helps to see how this relates to Rad / Temp scatter plots and climate sensitivity. However, the problem of regression dilution is a totally general mathematical result and can be reproduced from two series having a linear relationship with added random changes, as shown above.

What the papers say

A quick review of several recent papers on the problems of estimating climate sensitivity shows a general lack of appreciation of the regression dilution problem.

Estimates of Earth’s climate sensitivity are uncertain, largely because of uncertainty in the long-term cloud feedback.

Abstract: The sensitivity of the climate system to an imposed radiative imbalance remains the largest source of uncertainty in projections of future anthropogenic climate change.

There seems to be agreement that this is the key problem in assessing future climate trends. However, many authors seem unaware of the regression problem and much published work on this issue seems to rely heavily on the false assumption that OLS regression of dRad against dTemp can be used to correctly determine this ratio, and hence various sensitivities and feedbacks.

To assess climate sensitivity from Earth radiation observations of limited duration and observed sea surface temperatures (SSTs) requires a closed and therefore global domain, equilibrium between the fields, and robust methods of dealing with noise. Noise arises from natural variability in the atmosphere and observational noise in precessing satellite observations.

Whether or not the results provide meaningful insight depends critically on assumptions, methods and the time scales ….

Indeed so. Unfortunately, he then goes on to contradict earlier work by Lindzen and Choi, which did address the OLS problem and included a detailed statistical analysis comparing their results, by relying on inappropriate use of regression. Certainly not an example of the “robust methods” he is calling for.

Figure 4. Excerpt from Lindzen & Choi 2011, figure 7, showing consistent under-estimation of the slope by OLS regression ( black line ).

As shown by SB10, the presence of any time-varying radiative forcing decorrelates the co-variations between radiative flux and temperature. Low correlations lead to regression-diagnosed feedback parameters biased toward zero, which corresponds to a borderline unstable climate system.

This is an important paper highlighting the need to take account of the lagged response of the climate during regression to avoid the decorrelating effect of delays in the response. However, it does not deal with the further attenuation due to regression dilution. It is ultimately still based on regression of two error-laden variables and thus does not recognise the regression dilution that is also present in this situation. Thus it is likely that this paper is still over-estimating sensitivity.

Using a more realistic value of σ(dF_ocean)/σ(dR_cloud) = 20, regression of TOA flux vs. dTs yields a slope that is within 0.4% of lambda.

Then in the conclusion of the paper, emphasis added:

Rather, the evolution of the surface and atmosphere during ENSO variations are dominated by oceanic heat transport. This means in turn that regressions of TOA fluxes vs. δTs can be used to accurately estimate climate sensitivity or the magnitude of climate feedbacks.

Also from a previous paper:

The impact of a spurious long-term trend in either dRall-sky or dRclear-sky is estimated by adding in a trend of ±0.5 W/m2/decade into the CERES data. This changes the calculated feedback by ±0.18 W/m2/K. Adding these errors in quadrature yields a total uncertainty of 0.74 and 0.77 W/m2/K in the calculations, using the ECMWF and MERRA reanalyses, respectively. Other sources of uncertainty are negligible.

The author was apparently unaware that the inaccuracy of regressing two uncontrolled variables is a major source of uncertainty and error.

[Our] new method does moderately well in distinguishing positive from negative feedbacks and in quantifying negative feedbacks. In contrast, we show that simple regression methods used by several existing papers generally exaggerate positive feedbacks and even show positive feedbacks when actual feedbacks are negative.

… but we see clearly that the simple regression always under-estimates negative feedbacks and exaggerates positive feedbacks.

Here the authors have clearly noted that there is a problem with the regression based techniques and go into quite some detail in quantifying the problem, though they do not explicitly identify it as being due to the presence of uncertainty in the x-variable distorting the regression results.

The L&C papers, to their credit, recognise that regression-based methods on poorly correlated data seriously under-estimate the slope, and utilise techniques to more correctly determine the ratio. They show probability density graphs from Monte Carlo tests to compare the two methods.

It seems the latter authors are exceptional in looking at the sensitivity question without relying on inappropriate use of linear regression. It is certainly part of the reason that their results are considerably lower than almost all other authors on this subject.

For less than perfectly correlated data, OLS regression of Q-N against δTs will tend to underestimate Y values and therefore overestimate the equilibrium climate sensitivity (see Isobe et al. 1990).

Another important reason for adopting our regression model was to reinforce the main conclusion of the paper: the suggestion of a relatively small equilibrium climate sensitivity. To show the robustness of this conclusion, we deliberately adopted the regression model that gave the highest climate sensitivity (smallest Y value). It has been suggested that a technique based on total least squares regression or bisector least squares regression gives a better fit, when errors in the data are uncharacterized (Isobe et al. 1990). For example, for 1985–96 both of these methods suggest YNET of around 3.5 +/- 2.0 W m-2 K-1 (a 0.7–2.4-K equilibrium surface temperature increase for 2 × CO2), and this should be compared to our 1.0–3.6-K range quoted in the conclusions of the paper.

Here, the authors explicitly state the regression problem and its effect on the results of their study on sensitivity. However, when writing in 2005, they apparently feared that it would impede the acceptance of what was already a low value of climate sensitivity if they presented the mathematically more accurate, but lower figures.

It is interesting to note that Roy Spencer, in a non peer reviewed article, found a very similar figure of 3.66 W/m2/K by comparing ERBE data to MSU derived temperatures following Mt Pinatubo.[link]

So Forster and Gregory felt constrained to bury their best estimation of climate sensitivity, and the discussion of the regression problem in an appendix. In view of the ‘gatekeeper’ activities revealed in the Climategate emails, this may have been a wise judgement in 2005.

Now, ten years after the publication of F&G 2006, proper application of the best mathematical techniques available to correct this systematic over-estimation of climate sensitivity is long overdue.

A more recent study, Lewis & Curry 2014 [link], used a different method of identifying changes between selected periods and thus is not affected by regression issues. This method also found lower values of climate sensitivity.

Conclusion

Inappropriate use of linear regression can produce spurious and significantly low estimations of the true slope of a linear relationship if both variables have significant measurement error or other perturbing factors.

This is precisely the case when attempting to regress modelled or observed radiative flux against surface temperatures in order to estimate sensitivity of the climate system.

In the sense that this regression is conventionally done in climatology, it will under-estimate the net feedback factor (often denoted as ‘lambda’). Since climate sensitivity is defined as the reciprocal of this term, this results in an over-estimation of climate sensitivity.
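The effect of this reciprocal relationship can be shown with simple arithmetic. The sketch below assumes a forcing of 3.7 W/m2 for a doubling of CO2, a commonly quoted figure that is not given in this article, and the feedback values are simply those discussed above.

```python
F_2XCO2 = 3.7  # W/m2, ASSUMED forcing for doubled CO2 (not stated in the article)


def sensitivity(feedback):
    """Equilibrium warming in K per CO2 doubling for a feedback in W/m2/K."""
    return F_2XCO2 / feedback


# Feedback values discussed in the article: 0.9, 1.9, 3.5 and 5.0 W/m2/K
for lam in (0.9, 1.9, 3.5, 5.0):
    print(f"lambda = {lam:3.1f} W/m2/K  ->  sensitivity = {sensitivity(lam):.2f} K")
```

Any regression that returns too small a lambda therefore returns too large a sensitivity, and because of the reciprocal the distortion is most severe when lambda is small.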

This situation may account for the difference between regression-based estimations of climate sensitivity and those produced by other methods. Many techniques to reduce this effect are available in the broader scientific literature, though there is no single, generally applicable solution to the problem.

Those using linear regression to assess climate sensitivity need to account for this significant source of error when supplying uncertainty values in published estimations of climate sensitivity or take steps to address this issue.

The decorrelation due to simultaneous presence of both the in-phase and orthogonal climate reactions, as noted by Spencer et al, also needs to be accounted for to get the most accurate information from the available data. One possible approach to this is detailed here: https://judithcurry.com/2015/02/06/on-determination-of-tropical-feedbacks/

A mathematical explanation of the origin of regression dilution is provided here:
On the origins of regression dilution.

• DS. Grok and then mind following comments even if you do not grok the underlying math. This stuff is important. Science stats screw ups are unforgivable, yet common, in ‘climate science’.
And there are many examples. Mann’s centered PCA always producing a hockey stick from ARIMA red noise being Exhibit 1, thanks to Steve McIntyre and Ross McKitrick. And autocorrelated temp series of any sort ALWAYS produce red noise rather than white (random) noise. A statistical issue known in econometrics since I was a ‘grad’ student way back when.

• Greg Goodman

Well, if you are not capable of following a simple article like that you’re probably wasting your time here.

Since this problem has been known for at least 120 years, I don’t think it would get far as a submission to a statistics journal and with the current editor of Nature having declared the debate is over I don’t see much hope of getting correction to anything published by them on the subject.

However, Judith has about 2500 people ‘following’ Climate Etc., most of them presumably more technically competent than your good self.
Hopefully it will get noticed by some of those who would benefit from being aware of the issue.

• David Springer

• Greg Goodman

Well, I know enough to fit a straight line, so that puts me ahead of the pack, apparently. Since you admit you can’t even follow that much, you should probably stop wasting space and go walk the dog or something.

Haven’t got the time or the skill? You seem to have plenty of time, so I guess it’s the other quality you are critically short of. Not that anyone would have guessed if you had not told us.

• Threatened self-esteem accounts for a large portion of conflict at the individual level.

• RobP

Actually, as someone who usually refers to sadistics rather than statistics, I found this post very well explained. I am not sure why the slope would always be under-estimated if the x- axis is not linear, but the fact that an OLS cannot give a correct measure in such a case is – almost – obvious.

Thank you, Greg, for your work. It explains quite a bit of the discrepancy between the different methods of estimating climate sensitivity.

• David Springer: This should be submitted to a journal not the general public.

It is too well known for that.

2. George Devries Klein, PhD, PG, FGSA

I learned this in a year-long statistics course way back when.

• Max
Max
Does that necessarily endorse all the hundreds of papers beyond the purview of the IPCC.
______

“I learned this in a year-long statistics course way back when.”

Sadly most climate scientists seem to have little or no formal training in how to process data. They just pick it up as they go along or make up new techniques themselves that they don’t have the wherewithal to validate before using in publishing results.

Since, by definition, peer review is done by their peers, the review process is blind to the problem too.

• Greg,

“Sadly most climate scientists seem to have little or no formal training in how to process data.”
_____________

How do you know?

If poor statistical practice is pervasive in climate science why doesn’t the American Statistical Association call attention to the problem and try to do something about it? The Association recently took the bold step of criticizing the p-value, so why not take issue with the way climate scientists use statistics?

• Max
The Association recently took the bold step of criticizing the p-value, so why not take issue with the way climate scientists use statistics?

By criticizing misuse of p-values the ASA is simultaneously criticizing climate researchers (and publishers) who default to 0.05 as “proof” of significance (both statistical and scientific). Or, in the case of Karl’s “pause buster”, the less stringent 0.10 level.

Does the Cisco Kid not know American Statistical Association has an Advisory Committee on Climate Change Policy?

• opluso | March 11, 2016 at 9:20 am |

• Is opluso also unaware the ASA endorses the IPCC’s conclusions?

• JCH

• Wrong. It’s precisely about that.

P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

• opluso, I’m sorry if I didn’t make myself clear. I was referring to the following part of the ASA President’s comment in the ASA statement on p-values.

Yeah, precisely. LMAO.

That is a good question. There are a number of climate science papers that use “novel” statistical methods which should be considered by the statistical field before they are used in another field. In climate science there is a huge blend of specialties used but there doesn’t seem to be peer review by specialty.

I know of one case where a statistician that developed a method criticized a climate scientist that appeared to misapply the method, but nothing happened.

• JCH, “What it is about is multiple lines of evidence.”

Since climate science is more like a trial than “normal” science, you have to deal with the wonderful methods used by the legal system.

Marcott might be right – yep
• JCH

• JCH

• Double dippin dope…

No single mistake, or small set of mistakes, could notably change the results”

In order to validate your position you have to be your own worst critic and not let everything inconvenient slide. In post normal science that would be like concealing evidence which I believe is not Kosher.

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2631298

• Steven Mosher

Jamal .

“They provide only three values per day – the maximum temperature measured (TMAX), the
minimum temperature measured (TMIN) and an average value for all measurements made during each
day. The actual measurements from which the average was computed are not made available. From
these stations we use only the TMAX and the TMIN values because these values represent actual source”

• Steven Mosher

4. GG, a magnificent post. And you touch on only some of the OLS BLUE theorem problems I was taught back when. Heteroscedasticity, sometimes correctable by log normalization. Autocorrelation, sometimes correctible in error terms by ARIMA. Skewness or Kurtosis in error terms, a ‘certain’ diagnostic of BLUE theorem violations, hence OLS unreliability (since BLUE errors should be normally distributed).
IMO, the availability of simple stat packs doing all the calcs mindlessly now, vitiates all the painful ‘what could go wrong’ deep knowledge stuff back when we were crudely programming OLS stuff ourselves. And that lack of deep stats knowledge is the sharp point of your very pointed post.
Regards.

• Greg

Thanks Ristivan, you are correct about stats packages. All people do is look up the function, what arguments to feed it, and bang! No one bothers to find out when and where a method is applicable. They are not even aware that they should be doing this.

The prime one is spreadsheets like Excel. You click on a menu option “fit trend” and there is no warning that it may not be a valid result. ( About the only thing M$ does not ask you “are you sure” about).

• tty

I strongly agree. I originally learnt the basics of statistics as part of a two year mathematics course at university. It was largely theory and not much applications (this was back in the pre-PC era).
Later I took a company-funded course in applied statistics (mostly as related to reliability estimates) and I was very surprised at the way that problems and pitfalls were glossed over or simply ignored, and how software packages were used virtually without any understanding of the theory underpinning the methods used.

• David Springer

What is wrong with you today? Greg Goodman gave a compact presentation of a well-known problem in OLS, and supplied relevant climate-related examples. It’s well suited to a range of people interested in climate statistics.

• Fercrisakes Daniel this is a poor venue for it. The following looks lovely. Free open source state of the art software, well illustrated.

• Dummie Donnie still has reptiles on the brain. Small minds, small thoughts.

5. Thanks much for such a comprehensive post, including a history lesson and links to literature as well as examples. Sharing now ~:-)

6. JCH

So Forster and Gregory felt constrained to bury their best estimation of climate sensitivity, and the discussion of the regression problem in an appendix. In view of the ‘gatekeeper’ activities revealed in the Climategate emails, this may have been a wise judgement in 2005.

More junk speculation.

• Greg

What are you trying to say is “junk”? The gatekeeping and corruption of the peer review process was clearly exposed in the CRU emails. F&G clearly state their motivation for not clearly exposing a more rigorous result. Were they guilty of junk speculation too?

The only thing that is speculative is whether this self-censorship was “wise” at the time. The speculative nature of that comment is reflected in the wording “this may have been a wise judgement in 2005.”

So you have nothing to say about the mathematical FACT that most of the literature attempting to estimate climate sensitivity is biased by misapplication of simple, basic stats?

• Greg

Perhaps if you ( both ) developed the remarkable skill of READING and bothered to READ the F&G paper that I provided a link to, you would be able to make more intelligent and pertinent remarks rather than uninformed, post from the hip, snide ones.

Just an idea.

Thanks for the civil question Max. Necessary for what?

I could have just posted one line : OLS, please read instructions before opening the packet.

Sorry, but I’m biased.

“Here, the authors explicitly state the regression problem and its effect on the results of their study on sensitivity. However, when writing in 2005, they apparently feared that it would impede the acceptance of what was already a low value of climate sensitivity if they presented the mathematically more accurate, but lower figures.”

“So Forster and Gregory felt constrained to bury their best estimation of climate sensitivity, and the discussion of the regression problem in an appendix. In view of the ‘gatekeeper’ activities revealed in the Climategate emails, this may have been a wise judgement in 2005.”

• kenfritsch

I posed the question of using total least squares to whoever was the communicating author of Forster and Gregory- I do not recall which one it was – and showed him my results. His reply was that he did not think it was game changer or something to that effect.

As to what is included in the main paper and the SI can be very telling about the authors’ motivations and whether the author might be willing to bury a weakness in their thesis. If something is in the SI that could be a game changer, it means that the authors were aware of it but decided to put it in the SI. If a knowledgeable reader sees it that way it should raise some antennae. Similar problems are where an obvious sensitivity test was not run and the question that it would have presented was entirely ignored.

These situations have been documented in climate science papers and would appear to be more prevalent with authors who tend to be policy advocates.

• “As to what is included in the main paper and the SI can be very telling about the authors’ motivations and whether the author might be willing to bury a weakness in their thesis.”

It was Andrews with whom I corresponded about using TLS regression. What I found using TLS was that the ECS values were considerably larger than those reported by Gregory/Andrews and those I determined using LS regression. My reference to Andrews’s reply is in the second link below and he says that they did not look at using TLS even after it was reported that it might be an issue.

The secondary link in the first link below is to a table with my results using TLS and LS (and not RLS as the table label shows).

• kenfritsch

• Steven Mosher

Kneel..

Let’s be clear. I am skeptical that you can prove authorial intentions in this case.
But you seem to think that the science of reading texts is settled.

Go ahead… Prove that they felt fear as Greg claimed.

9. Yeah, and inappropriate use can lead to high estimations as well. Statistics are a blunt tool suited to real correlations in correct relationships. Anything unknown produces equivocal scatter.

It is possible to construct a dataset where OLS will overestimate the slope, but it’s pretty contrived. The term regression dilution is used because it will almost always reduce the fitted slope. In the case of CS, which is the reciprocal of the rad vs temp slope, this leads to over-estimation of CS.
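Regression dilution is easy to reproduce with synthetic data. The sketch below is only an illustration under assumed values: the true slope, the noise levels and the seed are all hypothetical, and the variables stand in for temperature (x) and radiative flux (y) without using any real observations.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the ordinary least squares fit of y on x."""
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

rng = np.random.default_rng(0)      # arbitrary fixed seed
n = 20000
true_slope = 2.0                    # hypothetical rad-vs-temp slope

x_true = rng.normal(0.0, 1.0, n)
y = true_slope * x_true + rng.normal(0.0, 0.5, n)
x_obs = x_true + rng.normal(0.0, 1.0, n)    # x now carries its own error

slope_clean = ols_slope(x_true, y)          # x error-free: OLS works well
slope_noisy = ols_slope(x_obs, y)           # x noisy: slope is diluted
slope_inverse = 1.0 / ols_slope(y, x_obs)   # regress x on y, then invert

print(f"clean x: {slope_clean:.2f}")
print(f"noisy x: {slope_noisy:.2f}")
print(f"inverse: {slope_inverse:.2f}")
print(f"implied 1/slope, noisy vs true: {1/slope_noisy:.2f} vs {1/true_slope:.2f}")
```

With error in x, the conventional fit comes out below the true slope and the inverse fit comes out above it, so the two bracket the true value. The reciprocal of the diluted slope is correspondingly too large.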

• Yes, I understand this is a legitimate technical criticism akin to “running mean smoothers” a while back. Have done triple running means ever since that one, but since I have little regard for regressions through noisy data…

Right. Relatively greater values of negative feedback suggest lower values of sensitivity and greater stability (as this post shows). But if the feedback is not instantaneous but delayed – as it is with changes in atmospheric and ocean circulation, albedo, etc. – there will be an increasing likelihood of oscillations and ringing; that is, changes to climate not related to forcings but merely to the character of a nonlinear, non-equilibrium system – like our planet’s. Kapish?

11. Herewith kinda’ deja-vu all over again apropos cli-sci-
Another opinion of Greg Goodman. About the same as mine.

• Greg Goodman

BTW, could you remind us what your opinion of my work is? Just speaking as someone who admits he does not have the skill to understand it, that could be very useful to others.

• AK

• David Springer,

Greg Goodman is correct on this technical point. Consult the appendix of Forster and Gregory: For less than perfectly correlated data, OLS regression of Q-N against δTs will tend to underestimate Y values and therefore overestimate the equilibrium climate sensitivity (see Isobe et al. 1990).
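The direction of that bias follows from the textbook errors-in-variables result. The lines below are a sketch under assumptions that are not taken from the paper: the temperature series is the true signal T plus an independent perturbation u, the flux carries an independent error e, and the symbols σ_T², σ_u² and Q_2x are introduced here only for illustration.

```latex
\begin{align}
  \delta T_s &= T + u, \qquad (Q - N) = Y\,T + e \\
  \hat{Y}_{\mathrm{OLS}}
    &= \frac{\operatorname{cov}(\delta T_s,\; Q - N)}{\operatorname{var}(\delta T_s)}
    \;\longrightarrow\;
    Y\,\frac{\sigma_T^2}{\sigma_T^2 + \sigma_u^2} \;\le\; Y \\
  \widehat{\Delta T}_{2\times}
    &= \frac{Q_{2\times}}{\hat{Y}_{\mathrm{OLS}}}
    \;\ge\; \frac{Q_{2\times}}{Y}
\end{align}
```

Under these assumptions any non-zero σ_u² shrinks the fitted Y, and because sensitivity goes as the reciprocal of Y, the sensitivity estimate is inflated by the same factor.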

Some more posits:

That makes no sense. All phenomena display random variation (variation that is non-reproducible and non-predictable; the distribution can sometimes be known with reasonable accuracy), and statistics is the study of how the random variation affects measurements, estimates, calculations, conclusions, and all inferences generally. Greg Goodman presented a short tutorial on one of the neglected consequences of random variation in the context of linear regression. You can’t increase the precision of knowledge by ignoring the random variability; all you can do that way is increase your error rates.

Back to the earlier question: Is there a reason that tamino is downplaying this problem, given that he and Gregory included it in an appendix to a published paper? I have seen papers where derivations and other details are presented in appendices or supporting online material (e.g. Romps et al), but I have not seen a paper where an important result derived in an appendix was not given prominence in the main text.

I have the document from the ASA and the comments by distinguished statisticians as well (Jim Berger, Rod Little, Greenland et al., Gelman). I even forwarded them to some colleagues who are not ASA members. Your comments are not that good.

I didn’t say Greg is wagging the dog here. I don’t know. He’s not wagging the dog if what he is saying is a game changer. But he may be if what he’s saying makes little difference.

“If I wanted to waste my days dealing with idiots like you and Tammy I would turn comments on.”

How odd – a defender of the faith complains, not with references to obvious errors of maths or logic in the post, but instead an attack on the tone and attitude of the post and the “my expert is smarter than yours” retort of a child.

Based upon your assertion in another article here that land and sea surface temperature records can’t be averaged together into anything meaningful it’s a pretty f*cking low opinion bolstered by your inability to publish any of this pap on anything more substantial than personal blogs. Point to something more substantial and I’ll apologize. Good luck.

• Springmiester, “Based upon your assertion in another article here that land and sea surface temperature records can’t be averaged together into anything meaningful it’s a pretty f*cking low….”

• JCH, Put down the crayons

• So much anger.

https://en.wikipedia.org/wiki/Argument_from_authority

• Plus lots ‘n lots Peter.

• Greg Goodman

That graph illustrates how you can get ‘accelerated warming’ out of a non trending periodic function, similar to the variations in climate.
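The effect can be sketched without the graph. The example below is an assumption-laden illustration, not a reconstruction of the original figure: it takes a pure cosine with a hypothetical period, looks only at the quarter cycle from trough to steepest rise, and fits straight lines to progressively shorter windows that all end at the same point.

```python
import numpy as np

period = 600.0                       # hypothetical cycle length, arbitrary units
t = np.arange(0, 151)                # first quarter cycle: trough to steepest rise
y = -np.cos(2.0 * np.pi * t / period)

def trend(window):
    """OLS slope of a straight line fitted to the last `window` samples."""
    return float(np.polyfit(t[-window:], y[-window:], 1)[0])

windows = [150, 100, 50, 25]         # illustrative window lengths
slopes = [trend(w) for w in windows]
for w, s in zip(windows, slopes):
    print(f"last {w:3d} samples: slope {s:.5f}")

# Over one complete cycle the same function has no linear trend at all.
t_full = np.arange(0, 601)
y_full = -np.cos(2.0 * np.pi * t_full / period)
full_cycle_trend = float(np.polyfit(t_full, y_full, 1)[0])
print(f"full cycle: slope {full_cycle_trend:.2e}")
```

Each shorter window gives a steeper line, which reads as acceleration, even though the underlying function is periodic and trendless over a full cycle.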

• Greg, that’s exactly what Jones and Trenberth did in one of their misleading graphs in IPCC AR4.

Clearly extrapolating a model tuned to fit 1960-1998 out to 2100 is absurd, even if the data was of exceptionally good rather than exceptionally bad quality.

• JCH
“Clearly extrapolating a model tuned to fit 1960-1998 out to 2100 is absurd, even if the data was of exceptionally good rather than exceptionally bad quality.”
Here is a paper that appears to be discussing some of this stuff.

So much anger.

Except possibly for the speculation (which may be correct) about why Forster and Gregory put their most important result in the appendix, where almost everyone would ignore it, the presentation by Greg Goodman is pretty good.

• JCH – the most important metric is almost certainly the rate of change in ocean heat content. The signal is believed to be on the order of 0.5 W/m2 more energy entering the system at top of atmosphere than is exiting, which, because the ocean is by far the major solar heat reservoir, is roughly equivalent to how much the ocean is warming. The margin of error in measurement is ±4.0 W/m2. Figure it out. We have to guess at the polarity using proxy measurements with similar physical inaccuracies, like ocean level and arctic ice extent. Oh it must be warming because look at the ice melting. Right.

• max1ok

15. If you want to do blind statistics to suggest stuff, which is okay, MDSCAL is probably a good thing to look into.

17. A recently released statement by the American Statistical Association seems relevant.

18. Greg Goodman

First let me admit that my statistics is very limited, but I get the idea of a tightly constrained x-axis (e.g. time of observation) vs. a y-axis where there is significant error or variability.

Both reconstructions are from the same place with different sampling times. The Mohtadi 2010 has about 400 years between samples and the Oppo 2009 about 50 years between samples. The Mohtadi data was used by Marcott et al., who tried time jiggling to reduce uncertainty. Doesn’t look like it would work very well.

It is a veritable orgy of rhetological fallacies.

22. David L. Hagen

Thanks, interesting.

23. The author says that to get random deviations from a ‘true relationship,’ the data must not embody ‘significant periodic components’ or ‘excessive number of out-lying data points.’ That must be why Michael Mann decided to begin the analysis leading to his apocryphal ‘hockey stick’ by first getting rid of the LIA and MWP, right?

• If you compare airports with non-airports the answer is the same.
• No doubt, but the point was the change in temperature over time.

• Greg Goodman

Good grief.

Greg, could you discuss a little why and how averaging to reduce error does or does not apply in this context?

Do you understand the greenhouse effect?

Allow me belatedly to commend you on a post well done.

• Greg Goodman

