Comment and Reply to GRL on evaluation of CMIP6 simulations

by Nicola Scafetta

Outcome of an exchange of Comments at Geophysical Research Letters (GRL) on my paper regarding ECS of CMIP6 climate models

Back in March 2022 Gavin Schmidt on RealClimate.org critiqued one of my papers:

  • Scafetta, N., Advanced testing of low, medium, and high ECS CMIP6 GCM simulations versus ERA5-T2m, Geophysical Research Letters, 49, e2022GL097716, 2022, https://doi.org/10.1029/2022GL097716.

My GRL paper compared the warming of the global surface temperature data from 1980–1990 to 2011–2021 against the CMIP6 GCM hindcasts and found that only the GCM macro-ensemble made of the models with an Equilibrium Climate Sensitivity (ECS) ≤ 3 °C agrees well with the global surface temperature observations. The result matters because the GCMs with a low ECS are also those that project moderate, non-alarming warming for the 21st century, in particular when the climate projections use the SSP2-4.5 scenario, the only SSP that seems realistic.

Schmidt disliked my paper and claimed that it contains “numerous conceptual and statistical errors that undermine all of the conclusions”. Together with Gareth Jones and John Kennedy, he wrote a letter to the Editorial Board of GRL asking them to retract my paper. They claimed that (1) my GRL 2022 paper overlooked the error of the mean of the temperature data from 2011 to 2021, which they claimed to be 0.10 °C, and (2) they insisted that “the full ensemble for each model must be used” to test the models.

Their retraction request was rejected. GRL decided that a Comment–Reply exchange was more appropriate to clarify the subtle statistical issues raised by their critiques and my rebuttals. Thus, Schmidt, Jones and Kennedy submitted their formal Comment, which essentially repeated the claims previously published on RealClimate. After their Comment was accepted on 28 January 2023, GRL asked me to write a formal Reply, which I submitted on 21 February 2023. My Reply was accepted on 22 July 2023 and, finally, on 21 September both papers were published by GRL:

  • Schmidt, G.A., Jones, G.S., & Kennedy, J.J. (2023). Comment on “Advanced testing of low, medium, and high ECS CMIP6 GCM simulations versus ERA5-T2m” by N. Scafetta (2022). Geophysical Research Letters, 50, e2022GL102530. https://doi.org/10.1029/2022GL102530
  • Scafetta, N. (2023). Reply to “Comment on ‘Advanced testing of low, medium, and high ECS CMIP6 GCM simulations versus ERA5-T2m’ by N. Scafetta (2022)” by Schmidt et al. (2023). Geophysical Research Letters, 50, e2023GL104960. https://doi.org/10.1029/2023GL104960

My Reply demonstrates that Schmidt et al. made gross statistical and physical errors and that, in any case, their critiques do not change the conclusions of my 2022 GRL paper.

The Plain Language Summary of my Reply reads:

Schmidt, Jones, and Kennedy’s (SJK) (2023, GRL, link) assessment of the error of the ERA5-T2m 2011–2021 mean (σμ,95% = 0.10 °C) incorrectly assumes that, during such a period, the global surface temperature was constant (T(t) = M) and that its interannual variability (ΔT_i = T_i − T(t_i) = T_i − M) was random noise. This is a nonphysical interpretation of the climate system that inflates the real error of the temperature mean by 5–10 times. In fact, the analysis of the ensemble of the global surface temperature members yields a decadal-scale error of about 0.01–0.02 °C, as reported in published records and deduced from the Gaussian error propagation formula (GEPF) of a function of several variables (such as the mean of a temperature sequence of 11 different years). Instead, SJK assessed such error using the standard deviation of the mean (SDOM), which is an equation that can only be used when there exists a distribution of repeated measurements of the same variable, which is not the present case. Furthermore, SJK misinterpreted Scafetta (2022, GRL, link) and ignored published literature such as Scafetta (2023, Climate Dynamics, link) that already contradicted their main claim about the role of the internal variability of the models and confirmed the results of Scafetta (2022, GRL, link).

Both publications are open access, so interested readers can judge the scientific merits of both points of view for themselves. See also Schmidt’s latest post at RealClimate [link].

I found the Comment by Schmidt, Jones and Kennedy to be outdated and paradoxical because their main arguments had already been fully rebutted in another, much more extensive paper of mine (Scafetta, N., CMIP6 GCM ensemble members versus global surface temperatures, Climate Dynamics 60, 3091–3120, 2023, [link]), which they did not even cite. They also ignored other works (e.g. Lewis, N., Objectively combining climate sensitivity evidence, Climate Dynamics 60, 3139–3165, 2023, [link], first published on 18 September 2022) that essentially confirmed my main result that the actual ECS must be ≤ 3 °C. The same result is now also confirmed by a third work (Spencer, R.W., Christy, J.R., Effective climate sensitivity distributions from a 1D model of global ocean and land temperature trends, 1970–2021, Theoretical and Applied Climatology, 2023, [link]). My GRL Reply performs the calculations using the same data as in my GRL 2022 study, also taking into account Schmidt et al.’s main critiques outlined above, and once again validates the original finding of my 2022 GRL paper.

Herein, I would like to address only one simple but important statistical topic discussed in my Reply that might be of general interest: how to calculate the error of the mean of a temperature record.

The issue was to determine the error of the mean of the global surface temperature record from 2011 to 2021, that is, over an 11-year period. Schmidt, Jones, and Kennedy claimed that such an error must be calculated with the equation known as the Standard Deviation of the Mean (SDOM):

$$\sigma_M = \frac{\sigma}{\sqrt{N}} = \sqrt{\frac{1}{N(N-1)}\sum_{i=1}^{N}\left(T_i - M\right)^2} \qquad \text{(1)}$$

where T_i are the N = 11 annual temperature values from 2011 to 2021 and

$$M = \frac{1}{N}\sum_{i=1}^{N} T_i \qquad \text{(2)}$$

is the mean over the 11-year period. On this basis, they stated that the mean of the global surface temperature record from 2011 to 2021 is affected by an error of 0.10 °C.

However, such a result is clearly incorrect because the decadal uncertainty associated with the global surface temperature record from 2011 to 2021 (or even since 1980) has never been calculated to be 0.10 °C in the scientific literature. Even on an annual scale, the global surface temperature data error has been reported to be much smaller than 0.10 °C, as GISTEMP (authored by Schmidt) and HadCRUT (authored by Kennedy) clearly show. For example, the Berkeley Earth global surface temperature record [link] reports a decadal-scale error of about 0.02 °C (I used the data version published in April 2023). Moreover, the claimed 0.10 °C error is arbitrarily calculated because Eq. 1 applied to the monthly temperature record (which has N = 132) yields an error of about 0.03 °C. As a result, using the SDOM makes no sense because, by simply interpolating the data and raising N, one may obtain an error as small as desired.
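
To see how strongly the SDOM depends on N, here is a minimal R sketch with synthetic annual values (illustrative numbers, not the actual ERA5 record): linearly interpolating the same 11-year series to roughly monthly resolution shrinks the SDOM by more than a factor of three, even though no information has been added.

annual <- c(0.58, 0.62, 0.65, 0.71, 0.87, 0.94, 0.84, 0.79, 0.89, 0.93, 0.81)  # synthetic anomalies
sdom <- function(x) sd(x) / sqrt(length(x))
sdom(annual)                                # SDOM from the N = 11 annual values
monthly <- approx(1:11, annual, n = 121)$y  # linear interpolation to ~monthly resolution
sdom(monthly)                               # much smaller, yet no information was added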

In fact, Schmidt, Jones, and Kennedy did not realize that, in this specific case, Eq. 2 is not the mean of a distribution of N repeated random measurements of one quantity, but a function of N different quantities. The 11 annual mean temperatures used for evaluating the 2011–2021 mean are not 11 stochastic estimates of their 11-year mean and, therefore, do not form a distribution of stochastic measurements of one quantity. When one has a function of N different quantities, its error cannot be computed with the SDOM but only with a different equation, known as the Gaussian Error Propagation Formula (GEPF) for a function of several quantities. For the function called “mean” of N different quantities z_1, …, z_N, the GEPF establishes that the error of the mean is given by

$$\sigma_{\bar z}^{\,2} = \frac{1}{N^2}\left(\sum_{i=1}^{N}\sigma_{z_i}^{2} + 2\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\sigma_{z_i,z_j}\right) \qquad \text{(3)}$$

where σ_{z_i}² is the variance of the single measurement z_i (that is, the square of its reported experimental error) and σ_{z_i,z_j} is the covariance of the individual measurement errors. When Eq. 3 is applied to the global surface temperature data from 2011 to 2021, it yields an error between 0.01 and 0.02 °C, depending on whether the covariance of the errors is included.
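
As a minimal sketch of Eq. 3 in R, assume (hypothetically) a 0.03 °C standard uncertainty for each annual value and a common correlation r between the annual errors; r = 0 gives roughly 0.01 °C and a moderate r gives roughly 0.02 °C, matching the range quoted above.

gepf_mean_error <- function(sigma, r = 0) {
  N <- length(sigma)
  V <- r * outer(sigma, sigma)   # error covariances sigma_i * sigma_j * r off the diagonal...
  diag(V) <- sigma^2             # ...and the error variances sigma_i^2 on the diagonal
  sqrt(sum(V)) / N               # Eq. 3: the error of the mean is sqrt(sum of V) / N
}
sigma <- rep(0.03, 11)           # assumed annual measurement uncertainties (illustrative)
gepf_mean_error(sigma, r = 0)    # ~0.009 °C with independent errors
gepf_mean_error(sigma, r = 0.3)  # ~0.018 °C with partially correlated errors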

The difference between the SDOM and the GEPF is covered in any introductory course on statistics and error analysis in physics and is detailed in popular textbooks (e.g., see Chapters 3 and 4 in Taylor, J.R., An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements (second edition), University Science Books, 1997; see also Chapters 4 and 5 in Evaluation of measurement data — Guide to the expression of uncertainty in measurement, JCGM 100:2008 [link]).

In a nutshell, the GEPF must be used to assess the error of the mean weight of Mary and John (two different quantities) measured with the same scale; the SDOM must be used to estimate the error of the weight of John (one quantity) measured with two separate scales. For example, every child knows that the mean of 10 and 20 (two different quantities) is 15. However, the SDOM adopted by Schmidt, Jones and Kennedy (Eq. 1) yields 15 ± 5 even when 10 and 20 indicate two different quantities and are error-free, which is clearly wrong because, for example, 11 or 17 are not the mean of 10 and 20. The SDOM can be used only if 10 and 20 are two stochastic measurements of the same quantity, of which one would like to find the best estimate.
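
The toy example is easy to verify in R: the SDOM of the two exact numbers 10 and 20 reports a spurious ±5, while the GEPF with zero measurement errors correctly reports zero uncertainty for their mean.

x <- c(10, 20)                  # two different, error-free quantities
mean(x)                         # 15
sd(x) / sqrt(length(x))         # SDOM (Eq. 1): 5, a spurious "uncertainty"
sigma <- c(0, 0)                # the reported measurement errors are zero
sqrt(sum(sigma^2)) / length(x)  # GEPF (Eq. 3, no covariance): exactly 0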

Even more paradoxically, the erroneous adoption of the SDOM physically implies that Schmidt, Jones, and Kennedy assumed that the climate temperature of the Earth from 2011 to 2021 was constant, and that natural fluctuations such as ENSO and the (natural and anthropogenic) trends are just measurement errors. To justify such a claim, Schmidt, Jones, and Kennedy even invented a new concept in climatology, that of “random nature” (perhaps derived from a parallel-universe theory?). However, their interpretation of the temperature data is clearly nonphysical. Natural variability does not contribute to the measurement error of the data, but at most to the error of a regression coefficient of a model fitted to the data. And here the issue was not to test an isothermal climate model of the type T(t) = M.

Then, Schmidt, Jones and Kennedy used their erroneous, inflated 0.10 °C error of the 2011–2021 mean to claim, qualitatively, that the conclusion of my GRL 2022 paper was wrong just because a few GCM member simulations produced by a few models with an ECS > 3 °C agree with the data within such an erroneous interval, as their Figure 1a shows. However, their own figure clearly shows that all the GCMs with ECS > 3 °C produce hindcasts that are statistically skewed toward warming values larger than those reported by the data: see the green dots indicating the GCM average simulations. Thus, statistically speaking, such models run too hot. In fact, as my Figure 2 shows, when the correct error of the mean is considered, the climate models are grouped into three macro-ensembles covering the three ECS ranges (1.5–3.0 °C, 3.0–4.5 °C, and 4.5–6.0 °C) as was done in my 2022 GRL paper, and the proper statistics are evaluated while also allowing for some statistical dispersion due to the models’ internal variability, the warm bias of the GCM groups with an ECS > 3.0 °C becomes evident. My figures are reported below.

In conclusion, the Comment by Schmidt, Jones, and Kennedy is flawed, both statistically and physically. Publishing it together with my Reply is nevertheless valuable, because pointing out such errors serves an educational purpose.

I need to add that this is not the first time Schmidt has critiqued one of my works using severely flawed mathematics and logic. Some readers may remember that in 2009 Benestad and Schmidt published a paper in JGR (Benestad, R.E., and G.A. Schmidt, Solar trends and global warming, J. Geophys. Res. 114, D14101, 2009, [link]), which was in effect a comment on some of my works. There Schmidt made severe and naïve errors in the wavelet analysis and in the multilinear regression model, as I first demonstrated here [link]. Such errors obscured the empirically evident and significant solar contribution to climate change and may have misled the scientific community on this topic. For interested readers, the detailed rebuttal of Benestad and Schmidt’s paper was later published as: Scafetta, N., Discussion on common errors in analyzing sea level accelerations, solar trends and global warming, Pattern Recognition in Physics 1, 37–57, 2013 [link]. Schmidt has recently written other flawed RealClimate articles critiquing papers that I have coauthored with Dr. Connolly, Dr. Soon, and many other colleagues, which show that the sun may contribute significantly to the climate change of the last century. The rebuttals of his critiques can be found at [link].

In conclusion, these cases clearly demonstrate the necessity of publishing formal Comments and Replies together, so that readers can properly evaluate both viewpoints. Thus, I am surprised that on RealClimate Schmidt appears to complain that his Comment was not published alone, before, or even without my Reply. It is critical that professionally written Comments and Replies be published concurrently. Furthermore, for the sake of science, any form of political manipulation of journals behind the scenes (as the ClimateGate emails revealed [link]) must be abhorred, mostly for ethical reasons, notably to avoid scientific disinformation campaigns promoted by the authors of the Comments and by various activist scientists.

People interested in knowing more about my climate-change research are invited to watch this recent YouTube presentation (15 Aug 2023, 2 hours):

Nicola Scafetta: Understanding Climate Change | Tom Nelson Pod #126

[Figure 1]

[Figure 2]

62 responses to “Comment and Reply to GRL on evaluation of CMIP6 simulations”

  1. Well said Nicola. I found this interesting:
    “I found the Comment by Schmidt, Jones and Kennedy to be outdated and paradoxical because their main arguments had already been fully rebutted in another and much more extended paper of mine (Scafetta, N., CMIP6 GCM ensemble members versus global surface temperatures, Climate Dynamics 60, 3091–3120, 2023, [link], which they did not even cite. They also ignored other works …”

    If they were acting as proper scientists, they would have considered and cited all relevant data. If their goal were merely to cause trouble and push an agenda, then they would ignore anything contrary, which is exactly what they did. The whole “comment” was purely political and childish.

    We document many examples of AR6 ignoring relevant data in our book, The Frozen Climate Views of the IPCC; this seems the same to me.

  3. “However, such a result is clearly incorrect because the decadal uncertainty associated with the global surface temperature record from 2011 to 2021 (or even since 1980) has never been calculated to be 0.10 °C in scientific literature.”

    That is a nonsensical objection. The standard error of the mean is a basic statistical calculation, and they did it. You don’t need to go looking in the scientific literature. It applies to numbers from any source. It just says that if you average numbers with this much observed variability (however caused), the mean will have this much variability.

    If you are trying to estimate something about climate based on a set of numbers, you have to allow for the variability of weather. The SJK calc does that; it is included in the variability of the annual data. What Scafetta wants to use is the estimate by BEST and others of the uncertainty of a particular year, based on the weather it actually had. Well, that is part of what goes into the observed variability, but only a small part. If you had measured the annual averages perfectly, you would still have year to year variability, and that would go into the uncertainty of the mean when you are trying to deduce something about climate.

    • Nick wrote: “If you had measured the annual averages perfectly, you would still have year to year variability, and that would go into the uncertainty of the mean when you are trying to deduce something about climate.”

      The definition of “climate” and what is central to be deduced seems to be at issue. One has a choice to focus on the energy reservoir of the surface, which includes the ocean heat content (OHC). Or one can call “climate” the surface temperature ST measured at 6′ above the land or at the ocean’s skin.

      The ST is going to fluctuate much more year to year than the OHC, yet its excursion from its mean is certainly pegged statistically to the OHC, except for decadal or multidecadal ocean current oscillations. Nick, you are not suggesting that the variation of annual ST (from HadCRUT or ERA) in the 11 data points from 2011–2021 has a 0.1 °C uncertainty, are you? Also, you are not refuting Dr. Scafetta’s claim about Schmidt’s use of the wrong equation, are you? What is it that you are defending?

      • Nick, you are not suggesting that variation of of annual ST (from HadCRUT or ERA) in the 11 data points from 2011-2021 have a 0.1C uncertainty, are you?
        The mean of those points has a 0.1 °C uncertainty. The variation of the points individually (standard deviation) would be about 0.3 °C.

        Scafetta is using the same equation as Gavin, but with a different sigma, which is based on measurement uncertainty but omits weather uncertainty.

      • Thanks for your reply, Nick. You seem to be agreeing with Gavin in assuming that all of the measurements were of the same fixed value. Scafetta clearly explained that each year was not assumed to be identical, which is the condition Gavin’s equation requires. Scafetta’s claim is that each year was different due to changes in ocean turnover, cloud cover and ice albedo, and thus they are expected to be different values, which sounds reasonable. So the data points’ excursion from the mean is not a reflection of their uncertainty of measurement.

      • “It just says that if you average numbers with this much observed variability (however caused), the mean will have this much variability.”

        The mean is a single figure. How can it have a variability?

        The whole concept of a global mean temperature is total fudge anyway because temperature is NOT an extensive quantity. Temperatures cannot be added and thus there is no physical meaning to “mean temperature”.

        You could calculate the mean July temperature in Ibiza to plan your holiday dates, but don’t think it has any scientific value when trying to estimate the effect of the changes in the earth’s energy budget.

        None of the “basic science” geniuses at the IPCC have cared to notice that their basic metric is total bollox to start with.

  4. A good piece of hard evidence for the view that the “Climate Science is Settled Science” paradigm is actively policed against dissent. The attempt by Schmidt to have his professional rival Scafetta’s paper retracted for presenting an opposing view is appalling behaviour.

    It looks like the actions of a person who is aware their position is not as solid as they want it to appear, and who knows that their “Settled Science” is vulnerable to being undermined by many tiny contributions like Scafetta’s; hence censorship must be brought in. Luckily in this case the journal editors were more professional and allowed a debate.

    The more these sorts of ideological behaviours are exposed (thanks due to JC for running this blog), the less confidence I have that orthodox climate science can stand on its own merit, let alone legitimately claim to being “settled”.

  5. Normally, when a true value is not known, one should use the term “uncertainty” rather than “error”; the latter implies an exactly known magnitude of the deviation (GUM standard).

  6. Here is another way to see the fallacy in Scafetta’s thinking. There is no real difference between the SDOM and the GEPF, except for Scafetta’s interpretation of σ. He wants it to be the measurement uncertainty, and if that was zero, there would be no uncertainty about the mean.

    But he estimates warming from 1980 to 2021 by subtracting the mean of the first 10 years from the mean of the last 10. Why 10? If you varied that, you’d get a different number. Why not 1? That gets you closest to first and last. 2021 minus 1980.

    People who think about numbers can see the problem with that. It isn’t a good measure of 40 years of warming. 2020 minus 1980 would have given a very different answer, even though you might know 2020 and 2021 very accurately.

    So he uses 10 years. That isn’t to diminish the already small measurement error. It is to reduce sampling error. 10 years is a sample. And the error he is trying to reduce is the year-to-year variation of weather. He is trying to reduce the standard error of the mean.

    • I see your point, Nick. S used an arbitrary 10 year period, one at the beginning and one at the end, of the data set in question. As you say, he is trying to minimize the effect of year-to-year variability. And, as you say, the answer you get will vary depending on exactly what years are chosen.

      That said, his approach isn’t unreasonable.

      What measure/technique would you use to put a number on the warming?

      • The best measure is the regression trend. This gives a weighted sum of all the data. But his method isn’t too bad. The issue is assigning uncertainty. He uses 10 years to reduce weather variation uncertainty. It is that uncertainty that is relevant to testing whether warming measured on a weather-varying earth and warming in a weather-varying model are statistically different.
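
        A minimal R sketch of the regression approach, using synthetic annual anomalies (illustrative numbers, not the actual record):

        years <- 1980:2021
        set.seed(1)
        anom <- 0.018 * (years - 1980) + rnorm(length(years), sd = 0.1)  # synthetic series
        fit <- lm(anom ~ years)
        coef(summary(fit))["years", ]  # OLS trend (°C/yr) and its standard error
        # Caveat: real annual anomalies are autocorrelated, so the plain OLS
        # standard error is too small; see the effective-N discussion further down.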

  7. Dr. C. My comments seem to get eaten. Any idea why?

  8. Nick says: “The standard error of the mean is a basic statistical calculation, and they did it.”

    True, but the standard error of the mean is meant to apply to different measurements of the same thing.

    I think all of us agree that climate changes. We only disagree on why. Some think it is human activity, some think it is natural, and most of us think it is both.

    To measure true climate change, we need to remove measurement errors. If we remove both the climate change and the measurement error, we’ve made a mistake. I think that is what Nick Stokes, Schmidt, Jones, and Kennedy want us to do.

    A standard deviation over 11 years includes climate change; it is not all measurement error. You are removing the very quantity you are trying to calculate.

    It is incumbent upon all researchers to consider what they are doing, not just follow a textbook calculation.

    • steveshowmethedata

      If you take the impossibly high standard you are advocating, namely that time series data must be treated as completely deterministic because of known and unknown deterministic driving processes, then you will have to throw out 80 years or more of statistical time series analysis methods and applications, e.g. throw Box and Jenkins’ 1970 classic “Time Series Analysis: Forecasting and Control” in the bin! Don’t think so!

    • “True, but the standard error of the mean is meant to apply to different measurements of the same thing.”

      That’s a common mantra among the uncertainty cranks at WUWT, who never quote any authority for it. The SEM applies to any set of numbers with apparently random variation of which you are taking the mean. They don’t even have to be measurements at all.

      Here, from NIST, is an example of how to calculate the uncertainty of the mean of daily values in a month (May). They use exactly the SEM method of Gavin et al., even though each day is only measured once. Here is the key part:

      https://s3-us-west-1.amazonaws.com/www.moyhu.org/2023/niste2.png

      ” You are removing the very quantity you are trying to calculate.”

      That makes no sense. Gavin’s uncertainty is much larger than Scafetta’s. The difference is that Gavin includes weather variability; Scafetta does not.

      What Scafetta and Gavin are both trying to do is not to measure climate change, but to test whether the observations and the models are statistically inconsistent.

      • Even though there is a tighter range obtained by S via the statistical method used, this figure shows the raw ensemble output. The results are the same without any statistical processing.

        https://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs00382-022-06493-w/MediaObjects/382_2022_6493_Fig1_HTML.png

        From the paper:

        https://link.springer.com/article/10.1007/s00382-022-06493-w

      • Steve Fitzpatrick

        Nick,
        The problem I think is that both year-on-year weather variation and any underlying trend in ‘the climate’ are adding to Gavin’s calculated uncertainty; it seems to me he really is inflating the uncertainty by mixing the two sources of change. Scafetta’s 10 year averages (beginning and end) seem a less than perfect attempt to isolate the two effects. There may be better approaches, but I doubt Gavin’s is one of them.

      • “That’s a common mantra among the uncertainty cranks at WUWT, who never quote any authority for it.”

        That is false and you know it. The Gorman brothers, among others, have provided you with not only citations but quotations, the most common being Taylor and GUMS.

      • Clyde,
        ” the most common being Taylor and GUMS”
        I have given the example above from NIST, with GUM authority, of a teaching example which averages temperatures measured daily over a month, and uses the standard SEM formula for uncertainty. Exactly as we do.

        People quote stuff, like Andy below quoting from Taylor:
        “Suppose we need to measure some quantity x, and we have identified all sources of systematic error and reduced them to a negligible level. Because all remaining sources of uncertainty are random, we should be able to detect them by repeating the measurement several times.”
        But that only describes a way in which you can use the formula. It doesn’t say that you can’t use it in other circumstances.

      • The topic of discussion in this sub-thread is not the correct way to handle data, but instead, your willful lying.

      • Nick, the NIST example is sadly rather flawed and artificial. Initially the error term is defined as just reading and calibration errors; later it includes natural variation. Nothing is said about the nature of the latter, which is certainly the largest uncertainty component, or why it was initially ignored.

        It is not clear why anyone would estimate monthly mean Tmax using 22 “non consecutive” days’ Tmax. They admit it ignores autocorrelation. (Temperature variation is not white noise.)

        The whole subject of measurement uncertainty in climate is an unqualified mess.

    • steveshowmethedata

      The thin-plate splines I suggest be fitted to the time series are a simple empirical approach to try to account for the aggregate of the known and unknown deterministic driving processes, with the residual variance used to attribute, as best we can do, the stochastic component. That’s how a lot of empirical science works: model the deterministic components as best you can and assign residual variation (with care, i.e. accounting for multiple levels of sampling, process, and measurement errors, autocorrelation, etc.) to the stochastic component. Good experimental and survey design helps greatly, making inferences more robust (e.g. via replication and randomisation); that’s how modern statistical methods were developed in the fields at Rothamsted.

  9. steveshowmethedata

    Scafetta (2023, “Reply to “Comment on…”) seems to want to have his cake and eat it too: he assumes the 2011–2021 time series is completely deterministic (“the 2011–2021 ERA5-T2m interannual variability—which represents the actual climatic chronology that occurred—cannot be replaced by random data”) so he can handwave away any stochasticity in each grid cell’s series, BUT in contrast he assumes (via the statistical method he applies, t-tests by grid cell) that the set of GCMs used is a simple random sample of some superset of GCMs. You cannot have it both ways. You must treat both as either deterministic (i.e. no stochasticity, and thus no variance estimation or hypothesis testing) or as (partly) stochastic.

    My suggestion would be to fit a thin-plate regression spline to each grid cell’s time series of observed ERA5-T2m records by fitting the appropriate linear mixed model (incorporating the thin-plate spline as linear plus random effect terms) to the ERA5-T2m records minus the corresponding prediction of each GCM, as an 11 × N concatenated vector representing the response variable, and including a random effect for each GCM, a random effect for each year in the series, and, finally, the residual error. The test of no difference would then be based on the support interval for the intercept parameter, which, under the null hypothesis of zero difference between the population-level mean of the observations and the population-level mean of the model predictions, should include zero (assuming both observations and predictions are sets of random samples within each grid cell). This support interval would be based on the random GCM variance component, the residual variance about the fitted splines for the ERA5-T2m series, and the residual representing the interaction of GCM_factor and the time series factor (adjusting for other terms). These last two variances would be greater than zero but less than those obtained by not fitting the spline (the equivalent of the SJK2023 approach in this last case).

    e.g. in R using MCMCglmm for each grid cell:

    library(MCMCglmm)
    # 'data' holds one grid cell's records: the response T2m_minus_GCM_pred
    # (ERA5-T2m minus each GCM's prediction) plus Years_centred, Year_factor, GCM_factor
    prior1 <- list(G = list(G1 = list(V = 1, nu = 0.002),
                            G2 = list(V = 1, nu = 0.002),
                            G3 = list(V = 1, nu = 0.002)),
                   R = list(V = 1, nu = 0.002))

    m5d.1 <- MCMCglmm(T2m_minus_GCM_pred ~ 1 + Years_centred,
                      random = ~ spl(Years_centred) + Year_factor + GCM_factor,
                      data = data, nitt = 130000, thin = 100, burnin = 30000,
                      prior = prior1, family = "gaussian", pr = TRUE, verbose = FALSE)

    summary(m5d.1)

  12. Schmidt, Jones, and Kennedy would argue in favor of Piltdown Man if it served their Leftist-liberal global warming alarmist agendas.

  13. Here is a good description of standard deviation and standard error, if anyone needs a refresher.

    https://statisticsbyjim.com/basics/difference-standard-deviation-vs-standard-error/

    More …
    https://statisticsbyjim.com/hypothesis-testing/standard-error-mean/

  14. From Nick above:
    “‘True, but the standard error of the mean is meant to apply to different measurements of the same thing.’

    That’s a common mantra among the uncertainty cranks at WUWT, who never quote any authority for it.”

    You have a short memory, Nick; the source is Taylor, page 98, where he writes:

    “Suppose we need to measure some quantity x, and we have identified all sources of systematic error and reduced them to a negligible level. Because all remaining sources of uncertainty are random, we should be able to detect them by repeating the measurement several times. ”

    In this case the systematic error Taylor refers to is climate change over the 11-year period, whether man-made or natural. If you assume climate change is zero, why are we having this debate?

    Taylor, 1997, An Introduction to Error Analysis, second edition

    As cited here: https://andymaypetrophysicist.com/2023/04/13/the-error-of-the-mean-a-dispute-between-gavin-schmidt-and-nicola-scafetta/

    • Andy,
      All that says is that you can repeat a measurement and get a better estimate. Of course you can. It doesn’t say that application of the standard error of the mean is restricted to those circumstances.

  15. This link will download the details of both Schmidt’s and Scafetta’s calculations of the standard error of the mean so you can compare them in detail.
    https://andymaypetrophysicist.com/wp-content/uploads/2023/04/ERA5-values.xlsx

  16. thecliffclavenoffinance

    The climate in 100 years will be warmer, unless it is colder. No human alive today has any idea what the climate will be like in 100 years. Therefore, no climate computer game created by a human can predict the climate in 100 years.

    The predictions of CAGW are data free guesses.

    There are no historical CAGW data because CAGW has never happened before.

    There are never data for the future climate.

    Therefore, predictions of CAGW, which were first widely publicized in the 1979 Charney Report, are data free predictions.

    When there are no data, there is no science.

    The wide range of guessed values for the ECS of CO2 is meaningless. No one can measure the exact effect of CO2 in the atmosphere because there are too many climate change variables to know exactly what each one is doing.

    The lab spectroscopy measurements of CO2 could be useful to guess the ECS of CO2 in the atmosphere, but a guess is not a fact.

    The obvious conclusion, which even this intelligent author does not seem to get, is that climate change is too complex to make predictions of the future climate.

    And it is very possible that even great knowledge of EVERY climate change variable will NOT lead to accurate long term climate predictions.

    The right answer to most climate science questions is “we don’t know”. We can find evidence of both natural and manmade causes of climate change, but we have no idea of the percentage split.

    We do know, from anecdotes over past centuries and from climate reconstructions, that living in a warm century during an interglacial period is very good news. Guess what? We ARE living in a warm century during an interglacial period NOW, and we should be celebrating our good luck.

    One does not need a scientist to explain why warmer winters from global warming are good news.

    Over seven billion people have lived with global warming since 1975. If there was a climate emergency, wouldn’t some of them have been harmed by the warmer winters and warmer nights? But where is the list of climate change casualties? Only in leftist imaginations.

    http://www.HonestClimateScience.Blogspot.com

    • Sounds good to me. My past experience of modelling for economic policy – a much less complex field than climate – made me very cautious of projections even ten years ahead, using them as an indicator of the difference between possible outcomes of various policies rather than as a forecast of what they would be. We might demonstrate that A was likely to outperform B, but not what the precise outcome would be.

  17. Phillip B Flexon MD MS FACS

    Great paper. The issue of variation of temperature measured at different points in time almost needs a newly devised mathematical methodology. Your point is clear to me as a modeler in population genetics with no climate background: treating the variation as signal error would be applicable only if one measured the temperature at the same time and place with identical thermometers, and that is the only way Schmidt et al. could justify their methodology. Climate and weather are inherently highly complex, which begs extra caution: get back to the real recorded data and be skeptical of all methods of “data correction”.
    P Flexon MD MS

  18. I have not looked at the method of estimating confidence limits for the paper under discussion here, but I do know that looking at the difference between individual model and observed global mean surface temperature (GMST) trends points to those models that best emulate the observed GMST trends having the lowest transient climate responses (TCR) and equilibrium climate sensitivities (ECS).

    The comparison method includes regression, using both OLS and TLS, and end-point temperature differences over the entire smoothed temperature series for the historical period, as well as, for the models, the difference between the actual trends the models produce in the historical period and those projected from the models’ sensitivities under the future-period scenarios.

    It is those models that best reproduce the observed temperature trends with the least differences in actual and sensitivity projected temperature trends that have the smallest TCR and ECS values.

    The range of the differences between the actual and the sensitivity-projected historical temperature trends of the models should warn against using a mean of model results for statistical analyses. Model results and validity should be evaluated on an individual basis.

    I judge that the models’ performance in emulating the observed temperature in the historical period is slowly being given the interest it deserves, but it will be a slow process.

    • If I remember correctly, Nic Lewis showed in his papers from 2013 to 2015 how CMIP5 ran way hot compared with the HadCRUT record. Lewis took the empirical approach of using the actual observational record of temperature and CO2 for the last 100-plus years, the only assumption being that both are largely accurate.

      I would like to see a paper looking at the CMIP3 models’ performance against Lewis’s energy balance method for the last 10 years. We’ve had at least two ENSO cycles. If the Lewis range for ECS is validated, the CMIP should curtail the inclusion of models in excess of 3 °C per doubling of CO2.

        Hi Ron, IMHO 10 years is a bit too short a period. I tried to replicate Nic’s method this way: from GISS I got the GMST difference between the averages of 1970–1980 and 2013–2022, both periods 10 years long, which gives 0.864 K. The same procedure with the latest ERF data gives a total forcing delta of 2.14 W/m². For a doubling of CO2 one presently calculates with 3.9 W/m², so the TCR comes out at 1.57 K per doubling of CO2. “The multimodel mean values of ECS and TCR in CMIP5 are 3.2° and 1.8°C, respectively (Table 1), while comparable values of those in CMIP6 are 3.7° and 2.0°C (Table 2).” (see https://doi.org/10.1126/sciadv.aba1981). This means the MMM of CMIP5 was about 15% too hot; in the case of CMIP6 it’s 27% globally. However, the warming rates (and the discrepancies between models and observations) are very different around the globe:
        https://i.imgur.com/cHXJm85.png
        (upper part: Observations (GISS) lower part: CMIP5 MMM)
        See the very small warming in great parts of the Eastern Pacific, the “pattern” which can’t be shown by CMIPs. Indeed the sensitivity is higher in many CMIPs than observations show over longer time scales.
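
        Restating the arithmetic above as a worked equation:

        $$\mathrm{TCR} \approx \frac{\Delta T}{\Delta F}\, F_{2\times\mathrm{CO_2}} = \frac{0.864\ \mathrm{K}}{2.14\ \mathrm{W\,m^{-2}}} \times 3.9\ \mathrm{W\,m^{-2}} \approx 1.57\ \mathrm{K}$$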

  19. Maybe, 10 years in, it’s not too early to get Judith to revisit how well her analysis of the models running too hot worked out?

    > “The stadium wave signal predicts that the current pause in global warming could extend into the 2030s,” Wyatt said, the paper’s lead author.

    >> Curry added, “This prediction is in contrast to the recently released IPCC AR5 Report that projects an imminent resumption of the warming, likely to be in the range of a 0.3 to 0.7 degree Celsius rise in global mean surface temperature from 2016 to 2035.” Curry is the chair of the Department of Earth and Atmospheric Sciences at the Georgia Institute of Technology.

    https://judithcurry.com/2013/10/10/the-stadium-wave/#comments

    • Joshie, you should give up on your obsession with proving Judith wrong. You don’t have the scientific chops to do it, and your truly unfocused mind that distorts past statements and lies about what people say means you will be ignored, as you should be.

      • David –

        Why would asking for an update on how her prediction turned out relative to climate models be “proving Judith wrong?”

        Are you saying that the models did better than her prediction?

        If so, would that invalidate or falsify the stadium wave theory? Has Judith published anything or otherwise updated the public on her theory and prediction? If so, could you link to it?

      • Or do you think 10 years is too soon to make a meaningful assessment?

  20. Unadjusted weather model initialization data has shown 0.013 deg/year warming since 1979. This is less than half the rate that most climate models forecast. The hot bias is obvious to any honest evaluation. Weather forecast models are routinely adjusted statistically for bias and uncertainty (MOS). This is harder to do with climate models but the implication is clear… the models will almost certainly continue to run significantly too hot because the basic climate model assumptions have not been changed.

  21. Nicola … If I’m being honest, most of what you wrote is way over my head. I just try and absorb what I can. However, what I do know and can appreciate is how you are in the front line trenches in this climate conflict. When any human endeavor (not just science) doesn’t have open debate, valuable resources are sure to be wasted. As with Judith, and many others who post on here, you risk your career and good name with your research and opinions. Thank you.

  22. great read, thanks for sharing

    minor typo: “Moreover, the claimed 0.10°C error is arbitrary calculated” should be “arbitrarily”

    of course we already have strong evidence that true ECS is very likely under 2.0 thanks to the 2000-2020 CERES radiative balance study (already posted here by our gracious host) in which clouds clearly dominate the changes in flux during the period with the highest CO2 concentrations

    nevertheless this is a good example of the general contempt for appropriate use of basic statistics among the cranks at realclimate — SDOM is clearly not the correct choice as you could arbitrarily decrease error by increasing N, which makes no sense here given that we are not even measuring the same quantity each time

    it’s especially amusing coming from Schmidt, given that his agency publishes GISS, which has claimed since 1999 to know post-1960 monthly global temperatures to within .05 degrees of the actual physical value (cf. https://data.giss.nasa.gov/gistemp/faq/ ), notwithstanding the significant proportion of pre-1999 temperatures that were later changed by more than .1 degrees. and of course GISS expects to continue changing pre-1999 temperatures by more than .1 degrees with each new GISTEMP version (cf. https://data.giss.nasa.gov/gistemp/faq/#q211 )

    so while we can’t know what July 1971 temperatures will be adjusted to in 2039 by GISTEMP v9, it’s reassuring to know that the value will certainly still be accurate to within .05 degrees, even if it’s not within .1 degrees of the July 1971 temperature reported today, or in 1999

    ironically that is the same accuracy that Gavin now claims is the limit to which we can know the decadal average… when evaluating climate models :)

    oh well, it’s not like multi-trillion-dollar global economic policies are at stake
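
    To illustrate the SDOM point above with a toy series (all numbers invented; SJK’s 0.10 °C came from the real data): the SDOM reads the real year-to-year variability as if it were measurement noise around one constant value, whereas propagating the per-year measurement errors gives a much smaller error for the 11-year mean.

    ```python
    import numpy as np

    # Toy 2011-2021 annual anomalies: a trend plus ENSO-like wiggles,
    # each year measured with a 0.03 K (1-sigma) error. Invented numbers.
    t = np.arange(11)
    annual = 0.55 + 0.015 * t + 0.15 * np.sin(2 * np.pi * t / 3.6)
    sigma_meas = 0.03

    # SDOM: treats the real interannual variability as random scatter
    # around a single constant value measured 11 times.
    sdom_95 = 2 * annual.std(ddof=1) / np.sqrt(11)

    # Error propagation for the mean of 11 *different* quantities with
    # independent per-year measurement errors: 2*sigma/sqrt(11).
    gepf_95 = 2 * sigma_meas / np.sqrt(11)

    print(f"SDOM 95%: {sdom_95:.3f} K")  # ~0.07 K, inflated by real variability
    print(f"GEPF 95%: {gepf_95:.3f} K")  # ~0.02 K, the actual measurement error
    ```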

  23. Robert Sinclair Weller

    Is it true that Spyros G. Makridakis gave up the study of forecasting because he was not happy with the accuracy obtained?

    How do you “average” temperature anomalies of land, ice and sea when the three media have different specific heat capacities (land being about half that of sea water)?

    Temperature is NOT a measure of energy and cannot be averaged across different physical media.

    You would get zero marks in a high-school physics test if you tried to do that (see the toy example below).
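
    A toy example of that point, using rough specific-heat values per kilogram: the same pair of anomalies gives the same “average temperature” but very different heat-content changes, depending on which medium warmed.

    ```python
    # Same average anomaly, different energy: rough per-kilogram numbers.
    c_water, c_land = 4186.0, 2000.0   # J/(kg*K), approximate specific heats
    dT_a, dT_b = 0.5, 1.5              # two temperature anomalies in K

    print((dT_a + dT_b) / 2)               # 1.0 K either way
    print(c_water * dT_a + c_land * dT_b)  # water warmed 0.5 K: 5093 J/kg
    print(c_water * dT_b + c_land * dT_a)  # water warmed 1.5 K: 7279 J/kg
    ```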

  25. Pingback: The origins of climate change | Science, climat et énergie

  26. For me, the depressing part of this post is that there’s no discussion of autocorrelation.

    The effect of autocorrelation is to increase the uncertainty of the data, in both the mean and the trend. And the increase is far from small.

    The proper data-based way to adjust for autocorrelation is to actually investigate the effect of autocorrelation on your particular dataset. You do this by seeing how fast the uncertainty decreases when you use larger and larger subsamples of the data.

    The usual way to do the adjustment is to calculate an “effective N”, the effective number of actually independent data points in the dataset. I describe the technique in a post entitled “A Way To Determine Effective N”.

    https://wattsupwiththat.com/2015/07/01/a-way-to-calculate-effective-n/

    And this is not a small adjustment. For example, the Berkeley Earth land temperature data covers 3,284 months (n = 3,284), but the effective N is only 45 … and this means you need to multiply the regular calculation of the standard error of the mean by 8.5 (see the sketch after this comment).

    Yikes!

    Best regards to all,

    w.
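
    For readers who want to experiment, a widely used shortcut for an effective N is the AR(1) (lag-1 autocorrelation) approximation n_eff ≈ n(1 − r1)/(1 + r1). This is not the empirical subsampling method of the linked post, but on a strongly autocorrelated toy series it gives numbers of the same order as those quoted above.

    ```python
    import numpy as np

    def effective_n_ar1(x):
        """AR(1) shortcut: n_eff = n * (1 - r1) / (1 + r1), where r1 is the
        lag-1 autocorrelation. Not the subsampling method of the linked post."""
        x = np.asarray(x, dtype=float) - np.mean(x)
        r1 = (x[:-1] @ x[1:]) / (x @ x)
        return len(x) * (1 - r1) / (1 + r1)

    # Strongly autocorrelated toy series, same length as the Berkeley data.
    rng = np.random.default_rng(42)
    y = np.zeros(3284)
    for i in range(1, len(y)):
        y[i] = 0.97 * y[i - 1] + rng.normal()

    n_eff = effective_n_ar1(y)
    print(f"n = {len(y)}, effective N ~ {n_eff:.0f}")         # a few dozen
    print(f"SEM inflation ~ {np.sqrt(len(y) / n_eff):.1f}x")  # order of 8
    ```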

    • Found this (a numerical check follows after the quote):

      When ρ=0, it’s 3, as you’d expect: the variables are uncorrelated. As ρ increases, |R| decreases, again as you’d expect: more correlation between the variables leads to a smaller effective sample size. This behaviour continues until ρ=1/2, where |R|=2.

      But then something strange happens. As ρ increases from 1/2 to √2/2, the effective sample size increases from 2 to ∞. Increasing the correlation increases the effective sample size. For instance, when ρ=0.7, we have |R|=10: the maximum-precision estimator is as precise as if we’d chosen 10 independent individuals! For that value of ρ, the maximum-precision estimator turns out to be
      (3/2)Y₁ + (3/2)Y₂ − 2Y₃.
      Go figure!

      This is very like the fact that a metric space with n points can have magnitude (“effective number of points”) greater than n, even if the associated matrix Z is positive definite.

      These examples may seem counterintuitive, but Eaton cautions us to beware of our feeble intuitions:

      These examples show that our rather vague intuitive feeling that “positive correlation tends to decrease information content in an experiment” is very far from the truth, even for rather simple normal experiments with three observations.

      https://golem.ph.utexas.edu/category/2014/12/effective_sample_size.html
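
      A quick numerical check of the quoted figures, assuming (as the form of the quoted estimator suggests) that Y₁ and Y₂ are uncorrelated and each has correlation ρ with Y₃, and that |R| is the sum of the entries of the inverse correlation matrix (the precision of the best linear unbiased estimator of the mean):

      ```python
      import numpy as np

      def magnitude(R):
          # |R| = sum of the entries of the inverse correlation matrix.
          return np.linalg.inv(R).sum()

      # Y1, Y2 uncorrelated; each correlated rho with Y3.
      for rho in (0.0, 0.5, 0.7):
          R = np.array([[1.0, 0.0, rho],
                        [0.0, 1.0, rho],
                        [rho, rho, 1.0]])
          print(f"rho = {rho}: |R| = {magnitude(R):.2f}")
      # rho = 0.0: |R| = 3   (three independent observations)
      # rho = 0.5: |R| = 2
      # rho = 0.7: |R| = 10  (and |R| -> infinity as rho -> sqrt(2)/2)
      ```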

      • Jim, first, as I mentioned, you can calculate the effect of autocorrelation on the statistics of a dataset experimentally.

        Next, I have no idea what your article has to do with autocorrelation; it’s not even mentioned once.

        Regards,

        w.

      • Right.

      • Willis – I didn’t have time last night to look into this further, but the thought was that if you shift a time series in time, the shifted copy can conceptually be seen as a second time series, so autocorrelation is the correlation between the two. I haven’t found a lot of info on negative autocorrelation, but did find this.

        Negative autocorrelation is a violation of independence but it is generally less worrisome because (a) it seems to appear less frequently than positive autocorrelation, and (b) it actually produces greater precision in the average than an independent series would. The alternating pattern in a negative autocorrelation insures that a series will be more likely to bracket the true mean. Still it represents a lost opportunity to model the correlation and get a better estimate of confidence limits.

        http://www.pmean.com/09/NegativeAutocorrelation.html

    • Nicola Scafetta

      Willis, the only autocorrelation that matters here is the one taken into account by the covariance matrix included in Eq. 3 above.

      • Thanks, Nicola. I was speaking of temporal autocorrelation. However, it seems you’re not considering the 11 annual means to be a time series, just measurements of different things with no times specified.

        Is that correct? If not, what am I missing?

        w.

  27. Nicola Scafetta

    Willis, as I said, the only autocorrelation that matters here is the one taken into account by the covariance matrix. The covariance matrix accounts for the temporal autocorrelation (in fact, in Eq. 3 you find a term that correlates the values at t = i with those at t = j, for all i and j).

    However, what you are missing is that what matters for the error of the mean is the covariance among the “errors” of the data (see the definition of covariance in Eq. 4 of my paper), not the covariance among the data themselves. The covariance of the “errors” is the stochastic component that can alter the error of the mean, while a possible covariance of the data reflects the physical relation among the data and does not contribute to the statistical error of the mean, because it is physics, not statistics.

    For example, let us calculate the average of 10 + σ₁ and 20 + σ₂, where each number can have an error σ = ±1. The error of the mean depends only on the covariance of the “errors”, not on the difference between 10 and 20 (as SJK assumed). If the errors are random, the covariance is zero and the error of the mean is √(1² + 1² + 2·0)/2 ≈ 0.7 (from Eq. 3 above). However, if every time you have 10 + 1 = 11 you also have 20 + 1 = 21, and every time you have 10 − 1 = 9 you also have 20 − 1 = 19, then the errors covary with covariance = 1 (positive correlation) and the error of the mean is √(1² + 1² + 2·1)/2 = 1 instead of 0.7. Instead, if every time you have 10 + 1 = 11 you also have 20 − 1 = 19, and every time you have 10 − 1 = 9 you also have 20 + 1 = 21, then the errors covary with covariance = −1 (negative correlation) and the error of the mean is √(1² + 1² − 2·1)/2 = 0 instead of 0.7. (A numerical check follows below.)
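
    A minimal Monte Carlo check of those three cases, using the same invented σ = 1 errors on the values 10 and 20 as in the example above:

    ```python
    import numpy as np

    # Monte Carlo check of the three cases above: average of 10+e1 and
    # 20+e2 with sigma = 1 errors and error covariance 0, +1, -1.
    rng = np.random.default_rng(0)
    N = 100_000
    e1 = rng.normal(0.0, 1.0, N)

    cases = {
        "cov =  0 (independent)": rng.normal(0.0, 1.0, N),  # fresh errors
        "cov = +1 (same error)": e1,                        # e2 = e1
        "cov = -1 (opposite error)": -e1,                   # e2 = -e1
    }
    for label, e2 in cases.items():
        mean = ((10 + e1) + (20 + e2)) / 2
        # expected: sqrt(1 + 1 + 2*cov)/2 -> 0.707, 1.000, 0.000
        print(f"{label}: std of mean = {mean.std():.3f}")
    ```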

  28. Much confusion is generated in this discussion by the failure to realize that the essential matter in rigorous climate science is not the abstract calculation of statistical measures, but the accurate determination of the actual time-histories of geophysical variables, pre-eminently temperature.

    Given that all field measurements produce only time-limited records of variables, much shorter than the longest climatic cycles seen in proxy data, it should come as no surprise that the familiar statistical measures, based on the assumption that we have multiple, independent measurements of a static quantity, simply don’t apply. The means and variances of geophysical data vary depending not only upon record length, but upon the phases of the climatic cycles that produce lagged correlations (illustrated in the sketch after this comment).

    What is essential is to ensure the limited record length doesn’t severely truncate the lagged-correlation functions that carry the information about the cycles of concern. The sample mean is always known precisely in properly performed measurements! The influence of autocorrelation upon estimates of the infinite process mean is but a secondary, practically irrelevant concern.
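
    A toy illustration of the phase point, assuming a pure 300-year cycle sampled over a 150-year record: the fitted “trend” changes sign and size depending entirely on the unknown phase.

    ```python
    import numpy as np

    # A 150-yr window of a pure 300-yr cycle: the apparent linear trend
    # depends entirely on the (unknown) phase of the cycle.
    t = np.arange(150)
    for phase in (-np.pi / 2, 0.0, np.pi / 2):
        y = np.sin(2 * np.pi * t / 300 + phase)
        slope = np.polyfit(t, y, 1)[0]
        print(f"phase {phase:+.2f}: apparent trend {slope * 100:+.2f} per century")
    ```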

  29. What is essential in rigorous climate science is to ensure that the always-limited record length doesn’t too severely truncate the lagged-correlation functions that carry the information about the climate cycles known theoretically or via proxies.

    At best, we have less than 200 years of consistently dependable station data throughout the globe. This provides no reliable information about cycles beyond the centennial scale. The sample mean is always known quite precisely in properly performed measurements and does not deteriorate with autocorrelation, whose influence upon estimates of the infinite process mean thus remains a secondary, practically irresolvable concern.