Climate Data Records: Maturity Matrix

by Judith Curry

The demand for climate information, with long observational records spanning decades to centuries and the information’s broad application for decision making across many socioeconomic sectors, requires that geophysicists adopt more rigorous processes for the sustained production of climate data records (CDRs). Such processes, methods, and standards are more typically found in the systems engineering community and have not generally been adopted in the climate science community. – John Bates and Jeffrey Privette

The latest issue of EOS has an article entitled “A maturity model for assessing the completeness of climate data records.” The paper is behind a paywall; here are some liberal excerpts:

We propose the use of a maturity matrix for climate data records that characterizes the process of moving from a basic research product (e.g., raw data and initial product) to a sustained and routinely generated product (e.g., a quality-controlled homogenized data set).

This model of increasing product and process maturity is similar to NASA’s technical readiness levels for flight hardware and instrumentation and the software industry’s capability maturity model. Over time, engineers who have worked on many projects developed a set of best practices that identified the processes required to optimize cost, schedule, and risk. In the NASA maturity model, they identified steps in technology readiness, denoted as the technology readiness level (TRL). TRL 1 occurs when basic research has taken the first steps toward application. TRL 9 is when a technology has been fully proven to work consistently for the intended purpose and is operational.

Similarly, the computer software industry has widely adopted the Capability Maturity Model Integration (CMMI) to develop software processes to improve performance, efficiency, and reproducibility. 

[T]here are numerous iterative steps involved in the creation of climate data records. These steps can be imagined as an expanding spiral, beginning with instrument testing on the ground, expanding to calibration and validation of the instrument and products to archiving and preservation of relevant data and provenance of the data flow, and finally broadening to comparisons and assessments of the products. In addition, the sustained involvement of research experts is required, as history has shown that new problems in producing homogeneous CDRs arise as different instruments are used over time to observe the climate.

The proposed CDR maturity matrix combines best practices from the scientific community, preservation description information from the archive community, and software best practices from the engineering community into six levels of completeness. These maturity levels capture the community best practices that have arisen over the past 2 decades in fielding climate observing systems, particularly satellite observing systems. Each level is defined by thematic areas: software readiness (stability of code), metadata (amount and compliance with international standards), documentation (description of the processing steps and algorithms for scientific and general communities), product validation (quality and amount in time and space), public access (availability of data and code), and utility (uses by broader community).

Maturity levels 1 and 2 are associated with the analysis of data records from new instruments or a new analysis of historic observations or proxy observations. Although products at this stage of development may be used in research, there is insufficient maturity of the product for it to be used in decision making. Initial operational capability (IOC) is achieved in maturity levels 3 and 4. At these levels, the product has achieved sufficient maturity in both the science and applications that it may tentatively be used in decision making. Finally, full operational capability (FOC), levels 5 and 6, is achieved only after the product has demonstrated that all aspects of maturity are complete. This level of maturity ensures that the CDR product can be reliably used for decision making.

Quantifiable standards should exist at each maturity level for each thematic area. For example, peer-reviewed publications are required in three separate areas to address product documentation, validation, and utility. The maturity level matrix also pays particular attention to software maturity and access. This includes requiring that the code be managed and reproducible, that metadata have provenance tracking and meet international standards, and that all code be publicly accessible. The product must be assessed by multiple teams, and positive value must be demonstrated; uncertainty must be documented. Each of these steps must be independently verifiable.

This maturity matrix model may serve in the future as a requirement for use of data sets in international assessments or in other societal and public policy applications, similar to certification programs that engineering professions conduct. The model focuses on process improvement to ensure traceability and transparency of CDRs but includes steps related to standard scientific review and assessment. Adoption of this standard by the climate community would help ensure quality long-term CDRs and facilitate their use in decision making across all natural and social science disciplines.
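The structure described in the excerpts can be sketched as a small data structure. The six thematic areas, the 1–6 level scale, and the IOC/FOC stage names come from the paper; the scoring rule below (overall maturity limited by the weakest thematic area) is my own illustrative assumption, not something the paper specifies.

```python
# Sketch of the CDR maturity matrix described above. The area and
# stage names follow the excerpts; the min-based scoring rule is a
# hypothetical illustration of "all aspects must be complete".

THEMATIC_AREAS = [
    "software readiness", "metadata", "documentation",
    "product validation", "public access", "utility",
]

def overall_maturity(scores):
    """Assumed rule: a product is only as mature as its least
    complete thematic area (each scored 1-6)."""
    missing = set(THEMATIC_AREAS) - set(scores)
    if missing:
        raise ValueError("unscored areas: %s" % sorted(missing))
    return min(scores[area] for area in THEMATIC_AREAS)

def capability(level):
    """Map a maturity level onto the capability stages in the text."""
    if level <= 2:
        return "research only"
    if level <= 4:
        return "initial operational capability (IOC)"
    return "full operational capability (FOC)"
```

Under this assumed rule, a product scoring level 5 in every area but level 2 in public access would rate level 2 overall, i.e. still research-only, which captures the paper's insistence that FOC requires completeness across all thematic areas.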

JC message to NOAA:  MUCH more of this, please.

The hodgepodge nature of climate data records was revealed by the Climategate emails.  Since then, the situation has been moving in a much better direction, largely through the efforts of John Bates, whose work I have been aware of since the early planning stages a number of years ago.

Establishment of credible Climate Data Records is essential not only for climate research, but also for ‘climate services.’  NOAA – save your money and axe the rest of ‘climate services;’  the private sector, universities and local governments can handle the rest as needed.  Spend your money on the Climate Data Records.

52 responses to “Climate Data Records: Maturity Matrix”

  1. About time; many critics have been seeking such initiatives for years.

  2. David Springer

    I’m not going to hold my breath waiting for an admission from the usual suspects that meteorological records are inadequate for the task of sifting an anthropogenic warming signal out of the natural noise.

  3. There are so few years of even relatively reliable data over extensive areas of the Earth, how can we even begin to assume we know what’s normal about climate?

    • you don’t really have to know what is ‘normal’ (which isn’t very well defined)
      to understand the issues related to dumping CO2 in the atmosphere.

      • Steven Mosher

        392 parts per million (by volume).

        Not really a big deal, Mosh.


        PS But the plants love it!

  4. And yet, at the same time, these jokers above will tell you how it was definitely warmer in the MWP.

    Go figure.

    • +1

    • As opposed to those telling me there definitely wasn’t an MWP?

    • David Springer

      Find where I ever said it was warmer in the MWP, dopey.

    • Michael

      Physical evidence (oh horrors!) tells us it was warmer in many places during the MWP, as well as during earlier periods that were warmer than today.

      Old Viking farms buried in Greenland permafrost, carbon-dated remains of trees recovered far above today’s tree line under receding glaciers, even occasional remnants of an old silver mine, which had been abandoned and covered up by advancing snow and ice.

      Then there are historical records from all over the civilized world of the time.

      Of course, there are also paleo-climate studies using different paleo methods from all over the world, which also point to a slightly warmer MWP than today, but these are less conclusive IMO than the actual physical evidence or the historical records.


  5. Just a shout out for Peter Thorne’s ISTI project. It’s at a beta stage and looking pretty good from my perspective.

    I’ll have more to say later, but for folks who don’t remember, this is the project that was launched after Climategate.

  6. If the climate scientist’s activism ethical dilemma of Stephen Schneider is the prevailing/dominant viewpoint, what do credible Climate Data Records matter? If, as we have observed in the Gergis et al. fiasco, anything and everything is bent toward a political endpoint, then science is the least of their worries or cares.

    If you seek credible Climate Data Records for fledgling climate scientists, to look critically at data, to hone their skills on a surer footing than advocacy, then credible Climate Data Records make sense.

    It all depends upon which audience you are speaking to. I have, and have always had, hope for the next generation. Those that follow bring a new set of eyes, perspective, insight, enthusiasm. “Science moves forward one funeral at a time.”

  7. A formal process for producing CDRs is long overdue. When we are dealing with average global climate, it is particularly important. For example, there are far more weather stations per square kilometre in North America or Europe than in Antarctica or the Pacific Ocean, so how do you avoid bias? And ideally measurements in different places should be performed simultaneously; only satellite observations could come close to that.

    I believe that the world would be a different place today if the IPCC had been forced to adopt such a standard. They would not be able to ‘cherry-pick’ or reject certain periods of time, as they did before and after 1940, or declare the 1940 temperature as normal, a laughable decision. See my website above.

  8. Yes to this proposal. It is way overdue – especially considering the enormously costly decisions that depend on this data. It’s time the climate data records were moved from science to engineering best practice.

    We also need best practice estimates of ACO2 damage costs and benefits.

  9. Will this effort be before or after the historical European records have been fiddled with by the likes of P. Jones and others?

  10. We need to establish reliable climate data?

    But I thought climate scientists could measure the average temperature of the entire globe, land, sea and atmosphere, to within hundredths of a degree, and trends to such precision over periods of decades, centuries and even millennia? Not to mention measuring sea level to within millimeters.

    I thought we already had such precise measurements of climate data that we should be turning over control of the global energy economy to the climate data measurers. Isn’t it a waste to spend any more money on anything as mundane as improving the measurement of climate data?

    Besides – Data? We don’t need no stinkin’ data. We’ve got climate models, and Bayesian priors, and Mannian statistics, and teleconnections, and….

  11. Halloween Chorus:

    ‘Fair is foul and foul is fair.
    Hover through the fog and filthy air.’

    I’d say more about overturnin’ the natural order but
    I’ve succumbed to a virus (so I’ll just lurk for a while).
    Me fingers are too weak to complete senten

    • Beth

      I can’t remember which it is: feed a fever and starve a cold, or vice versa? In any case, a sip to your health and speedy recovery.

  12. On WUWT there are long rumbles by Dr Jan P Perlwitz, including this:
    The original scientific reference where the statistical model is being described, on which the trend calculator is based, is this one:
    Foster and Rahmstorf, 2011: Global temperature evolution 1979–2010. Environmental Research Letters, 6, doi:10.1088/1748-9326/6/4/044022,

    My reply to Dr Jan P Perlwitz, and anyone who considers the above as proof of anything, is that Foster knows nothing about historical records. When I put this
    on RealClimate, without a label, despite it being from the world’s best-known temperature data records, he said it was a fraud and proceeded with an outburst of vulgar obscenities. He was duly told off by Gavin and his posts deleted.
    Temperature models from an ‘expert’ who knows nothing about historical records are worth just that: NOTHING.

    Therefore one could conclude that any opinion or knowledge based on Foster’s so-called expertise is worth nothing too.
    Climate modelers should learn about the past before attempting to prophesy the future.

  13. An ever increasing number of climate scientists have an ever increasing need for data to feed into insatiable climate models whose requirement for data has grown exponentially as computer systems become larger and more sophisticated.

    So what do we feed them with?

    * Global temperature data which is highly inconsistent

    * Sea surface temperatures which were derived in a manner that cannot be called scientific and get more unreliable as we travel back in time

    *Sea ice data based on assumptions about historic sea ice

    * Sea level data based on a tiny number of historic tide gauges from the NH before the modern record is spliced on top

    * Tree ring and other proxies that are thought to have global significance and are supposedly accurate to fractions of a degree and valid for thousands of years into the past.

    All of this excused by ‘averaging.’ This historic material is then compared on an apples and oranges basis with the Satellite record.

    What we do know is that there was an extended warm period lasting until around 1250. There was then a downturn before renewed warmth in the 15th and 18th centuries, interspersed by intense cold. Glaciers ceased to advance around 1750 and retreated mostly from around 1850.

    Land temperatures have been on the rise throughout the historic record, as confirmed by CET and BEST. Is anything strange happening in the modern era?

    To believe we have highly accurate data that tracks the ups and downs of the climate, that we can feed into climate models to determine what will happen 500 years into the future, is deluding ourselves. To use this mishmash to determine far-reaching international policy is bizarre.

    The physics need to be proven that demonstrate that CO2 concentration above 300 ppm is an elephant sitting on natural variability and not, as seems to be the case, merely a flea.

    • Tony,

      I would not make the same list of problems but your basic point is certainly valid.

      The data is not of uniform quality and the older data is sparse. It’s possible to learn from sparse data of nonuniform quality, but it’s much easier to make erroneous analyses based on it. It may be impossible to store raw data in a way that makes its use easy, as every data point may need different handling for various reasons. That leads to the need for preprocessing to make the data structure more uniform. The preprocessing may include corrections for known errors and filling in of missing data. Such preprocessing may, however, introduce new errors, and there’s commonly some disagreement on the best ways of doing that.

      These kinds of issues have influenced all the climate-related data repositories we presently have, and they are certainly one of the main reasons for the lack of better repositories. I have seen earlier initiatives for proceeding, but with little result as far as I know. Let’s hope that something genuinely better will finally come out.

      • Pekka

        Thanks for your thoughtful response.

        Let’s hope that some initiative will be made that is genuinely independent of any coercion, will follow the science and data, and will recognise that there may not be a coherent narrative that enables the data to be neatly packaged and used in models.

    • tony b

      Excellent summary.


  14. Ian Blanchard


    More succinctly, it’s time that the empiricists claimed back the lead in climate science from the modellers.

    The answers to many of the controversial questions in climate science will be based on collecting more and better quality data, not in attempting to make a silk hockey stick out of a sow’s ear.

    • Ian Blanchard

      it’s time that the empiricists claimed back the lead in climate science from the modellers.

      IF this transition occurs, it will be an extremely painful one for the modeling community and for the IPCC “CAGW” premise, which rests on model simulations based on theoretical deliberations rather than empirical scientific evidence from real-world physical observations or reproducible experimentation.


    • Ian, you write “More succinctly, it’s time that the empiricists claimed back the lead in climate science from the modellers.”

      You are, of course, absolutely correct. But you only tell half the story: how the empiricists lost the lead in the first place. The real problem goes back many decades when, for CAGW, the empiricists stayed on the sidelines and let the modellers call the shots. Any idea in physics should remain a hypothesis, unless and until there is sufficient empirical data to turn it into a theory. What happened with CAGW was that the empiricists were asleep at the switch, and the modellers convinced all the learned scientific societies of the world, led by the Royal Society and the American Physical Society, that the models provided incontrovertible proof that CAGW was real. And, unfortunately, our hostess as well.

      How we get the empiricists to resume their classic role in physics, I have no idea, but one thing I am certain of. It will be the empirical data that will prove whether CAGW is right or wrong.

  15. F**k [manners] me sideways with a bicycle! Did you guys and gals see this?

    Congratulations, Penn State!

    Mann is toast.

    • harry

      These “blanket” Nobel Peace Prize awards are great.

      First one for the “2,500 scientists” of IPCC (including Hockey-shtick Mann).

      Now one for the 500 million inhabitants of the EU!

      Goody, goody! Where’s mine?


  16. One essential element of this process will be honest, engineering standard error reporting.

    As judith has pointed out many times this is far from the case right now.

    John Kennedy, for one, seems to be moving things in the right direction at the Met Office, but current results are a long way from what would be accepted in engineering. Frankly, a lot is still being swept under the carpet.

    Substantial bias corrections are made with a fairly broad brush: e.g. start here, end here, assume 2% per. Since these adjustments are nearly as large as the resulting signal, the accuracy of these hypothetical corrections is just as important as the data measurement error. Yet this does not seem to be included in the documented error calculations.

    ARGO and satellite altimetry are two more areas where error assessment is totally unrealistic.
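    The point about correction uncertainty can be made concrete with a standard propagation-of-error calculation. All the numbers below are invented for illustration: if a bias adjustment roughly the size of the signal carries its own uncertainty, including that term in quadrature can dominate the quoted error bar.

```python
import math

# Hypothetical numbers, for illustration only (not from any dataset).
trend = 0.5             # degC: apparent signal over the period
measurement_err = 0.05  # degC: documented random measurement error
adjustment = 0.4        # degC: broad-brush bias correction applied
adjustment_err = 0.20   # degC: plausible uncertainty of that correction

# Independent error sources combine in quadrature.
total_err = math.sqrt(measurement_err**2 + adjustment_err**2)

print("quoted error (measurement only): +/- %.2f degC" % measurement_err)
print("error including adjustment term: +/- %.2f degC" % total_err)
```

    With these made-up numbers the adjustment term roughly quadruples the error bar, which is the sense in which the accuracy of the correction matters as much as the measurement error itself.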

    • Greg

      Upthread I suggested John would be a good candidate to head the SST records.

      • I get the feeling that John is doing the best he can within the constraints imposed by his employer to push things in the right direction. However, the Met Office is not about to admit they screwed up and misled the world by overstating the value of immature climate models.

        They are still claiming to the public that they are “the best means we have of predicting climate” despite their abject failure to do so even in their initial projections.

        It will probably take about 10 years to turn this particular boat around, by which time they will have been overtaken by events (and climate).

  17. Dr Curry’s last paragraph came very close to articulating some thoughts I’ve had recently when reading the words of government-funded climatologists. Specifically the use of the word “product(s)” to describe their works which previously might have been placed in the aisle signposted as “service(s)”.

    In either case, they should remember that there is currently not a free-market for much of their merchandise.

  18. Michael, I don’t think either “products” or “services” is appropriate to objective science. However, this does underline a lot of what is wrong with post-19th-century science.

    If the dollar is God and the “free” market reigns, then you tailor your “product” to fit the “needs” of the customer. He who pays the piper…

    If the marketability of your “product” is based on publishing record and citation counting, there is pressure to conform or twist the scientific results to ensure publication and next year’s grants.

    That is _one_ part of the corruption of science. Another is noble cause corruption.

    David Roberts, Grist: “If we respond to the moral imperative to raise public awareness and alarm about climate, we have to be deceptive.”

    Which is straight out of the late Stephen Schneider’s “balance of being honest and being effective”.

    If we are to get anywhere with climate science and understanding climate, we need to return to objective science and stop trying to provide “products” or be deceptive by “moral imperative”.

    • Greg, I assure you, I try not to confuse them with objective science. My pitiful publishing record might be longer otherwise.

      What I didn’t say in my first post was my speculative thought that some of the institutions might be feeling the heat behind the scenes. People call for the UK Met Office to be privatized, something I am now in favor of.

      Running with “the big-boys” may improve the frequency of good weather/climate predictions, if only by ditching the ones more likely to get funded by predicting disasters. [Sorry, "projecting" disaster].

      Much of my career to date has been on projects ultimately aimed at inventing treatments for pre-existing human illnesses, not inventing new illnesses for a pre-existing treatment [though that may happen in some places].

  19. “Products” and “services” sounds a lot like “a bill of goods”, rather than objective scientific data.

    And this leads to 70% of US poll respondents stating that climate scientists “fudged” the data.

    Unfortunately, NOAA is one of the worst culprits.

    Millions were spent to replace the old spotty, expendable XBT temperature buoys with the comprehensive ARGO system in 2003 to measure upper-ocean heat content. When it (oh horrors!) did not repeat the warming bias of the XBTs, but showed a cooling trend in the upper ocean instead, there was much consternation, with team leader Josh Willis calling it a “speed bump”. Many “adjustments”, “corrections”, “manipulations” and “eliminations of incorrect data” later, the record is flat rather than cooling. Look for it to show a warming trend soon.

    Boo, NOAA!


  20. David L. Hagen

    The UN-reliability of current data is exemplified by the raw vs. processed data at Darwin Zero.
    A declining raw temperature record was “homogenized” into a 6 degree/century rise!

    The current state of climate records is seriously lacking in maturity, to put it mildly. The numerous examples of processing “adjustments”, homogenizations, pasteurizations, etc. that change cooling trends into warming trends – which are then reprocessed to further increase the warming trends – defy common sense that the data is “reliable”.
    That is equivalent to NASA aiming for Mars and ending up on Venus!

    As an “engineer” I strongly endorse the request to upgrade the climate records from “scientist” to “engineering” standards of reliability and maturity.

    Adoption of this standard by the climate community would help ensure quality long-term CDRs and facilitate their use in decision making across all natural and social science disciplines.

  21. Seems to me the consensus climate community would not be interested in collecting high-quality data for climate records, because that might imply the existing data has a problem. So it will probably be the climate radicals who get organised to properly address the core problem of what exactly we “should” measure to get a handle on “global climate”, if there is such a beast – perhaps something involving the integration of measures of energy apparent at the surface….

    (Lest we forget the thermosphere is so hot the earth should be roasting from above….)

  22. Pingback: Weekly Climate and Energy News Roundup | Watts Up With That?

  23. Pingback: 20th century mean global sea level rise | Climate Etc.

  24. Pingback: 20th Century Mean Global Sea Level Rise - CLIMATE HIMALAYA

  25. Interesting! See below. Yet another driver of uncertainty in results. Nova says results may “differ significantly”. In an iterative algorithm, one in which a prior result is an input to the next result(s), the error is compounded. It might be useful to know how big the resultant error becomes, particularly in climate models. Perhaps that is discussed in the paywalled source (A maturity model for assessing the completeness of climate data records).

    Nova, Joanne. “WARNING: Using a Different Computer Could Change the Climate Catastrophe.” Scientific. JoNova: Science, Carbon, Climate and Tax, July 28, 2013.

    When the same model code with the same data is run in a different computing environment (hardware, operating system, compiler, libraries, optimizer), the results can differ significantly. So even if reviewers or critics obtained a climate model, they could not replicate the results without knowing exactly what computing environment the model was originally run in.

    This raises that telling question: What kind of planet do we live on? Do we have an Intel Earth or an IBM one? It matters. They get different weather, apparently.

    There is a chaotic element (or two) involved, and the famous random butterfly effect on the planet’s surface is also mirrored in the way the code is handled. There is a binary butterfly effect. But don’t for a moment think that this “mirroring” is useful: these are different butterflies, and two random events don’t produce order, they produce chaos squared.
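    The cross-environment irreproducibility Nova describes stems largely from the fact that floating-point addition is not associative, so a compiler or optimizer that regroups a sum changes the result. A minimal self-contained illustration (my own example, not from the article):

```python
# Floating-point addition is not associative: regrouping the same
# three numbers changes the result, because each 1.0 is smaller than
# the rounding granularity (ulp = 2.0) of a double near 1e16.

left = (1e16 + 1.0) + 1.0   # each 1.0 is individually rounded away
right = 1e16 + (1.0 + 1.0)  # grouped first, the 2.0 survives exactly

print(left == right)   # False: left == 1e16, right == 1e16 + 2
```

    A compiler that vectorizes or reorders a long summation is effectively choosing between such groupings millions of times per model step, which is why bit-identical results across computing environments cannot be assumed.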

  26. Pingback: So what is the best available scientific evidence, anyways? | Climate Etc.