Post-mortem on the forecasts of Hurricanes Hermine and Matthew

by Judith Curry

Reflections on forecasting hurricanes in light of U.S. landfalling Hurricanes Hermine and Matthew, highlighting the complexities of forecast ensemble interpretation.

A recent issue of the NYTimes Magazine has an article:  Why Isn’t the U.S. Better at Predicting Extreme Weather? Subheading:  Hurricanes like Matthew have laid bare the dirty secret of the National Weather Service: its technologies and methods are woefully behind the times.

The article highlights comments from Cliff Mass, a well-known critic of U.S. (NOAA) weather forecasting (see in particular Cliff’s post on Matthew). A previous CE post on Cliff Mass’ articles is here. Both the article and Cliff’s post on Matthew are well worth reading.

So, is the U.S. National Hurricane Center woefully behind the times in hurricane forecasting?  The issue is not nearly as clear-cut as the NYTimes article implies.  At the end of the day, the NHC does a pretty good job with hurricane forecasts out to 5 days.

In this post, I describe:

  • Ensemble hurricane forecasting, including problems with NOAA’s approach in this context.
  • The forecast history of Hurricanes Hermine and Matthew, highlighting the advanced forecast techniques used by my company, Climate Forecast Applications Network (CFAN).
  • The superb role of private-sector weather services companies in analyzing the forecasts, providing specialized insights for commercial clients, and communicating them to the public.

Ensemble hurricane forecasting

For background on ensemble hurricane forecasting see these previous articles at Climate Etc.:

From the NYTimes Magazine article:

While Mass is the most outspoken on the subject, many experts insist that if the Weather Service wants to meaningfully improve its predictions, it must employ a technique called ensemble forecasting. The basic premise is either to tweak the physics equations or to make repeated changes to a model’s variables: You might bump up the temperature slightly, for example, and then run the model again. After a half-dozen or so reruns, you get a set, or “ensemble,” of forecasts that can be compared with one another. When all the forecasts in an ensemble agree, it’s a reasonably sure bet that the predictions will pan out.
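The premise the article describes (perturb the initial state slightly and rerun the model) can be sketched with a toy chaotic system. The Lorenz-63 equations below are a standard stand-in for illustration, not any operational model; all numbers are invented:

```python
import random

def lorenz_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # One forward-Euler step of the Lorenz-63 system, a toy chaotic "atmosphere".
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

def run_member(initial, steps=3000):
    # Integrate one ensemble member forward in time.
    state = initial
    for _ in range(steps):
        state = lorenz_step(state)
    return state

random.seed(0)
base = (1.0, 1.0, 1.0)
# "Bump the variables slightly" and rerun: 20 perturbed members, like GEFS.
members = [run_member((base[0] + random.gauss(0.0, 1e-3),
                       base[1] + random.gauss(0.0, 1e-3),
                       base[2] + random.gauss(0.0, 1e-3)))
           for _ in range(20)]
xs = [m[0] for m in members]
spread = max(xs) - min(xs)
print(f"ensemble spread in x after 15 time units: {spread:.2f}")
```

Tiny initial perturbations grow into a wide spread of outcomes, which is exactly why the spread of the ensemble, not just its mean, carries forecast information.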

The NOAA GEFS does produce a global ensemble forecast (20 ensemble members) from which hurricane tracks are produced (note: ECMWF has 51 ensemble members). My post Hurricane Sandy Part n provides clarification on the National Hurricane Center’s approach:

The problem with the NOAA National Hurricane Center forecasts is more fundamental than the NOAA GEFS model not performing very well. The NHC forecasters are using 20th-century forecast methods, while the private sector is using 21st-century ensemble-based methods. The NHC does a ‘poor man’s’ ensemble, pooling the deterministic runs of different forecast models. They use only the NOAA GFS deterministic forecast to force the regional models (e.g., HWRF, GFDL) which they have counted on to provide better track and intensity forecasts; these have been all but useless because they are being forced by the erroneous large-scale fields from the NOAA GFS forecast.

As an example of ensemble hurricane forecasts in the sense described in the NYTimes Magazine article, see this figure from Hurricane Sandy.  The figure below shows forecast tracks from the models that were initialized on Oct 23 12Z (for reference, Sandy made landfall on Oct 29).


The raw tracks from the NOAA GFS model are shown in the right-hand panel, the probability distribution of the raw tracks from the ECMWF (European) model is shown in the center panel, and CFAN’s calibrated tracks are shown in the left-hand panel.

Does the ECMWF model always do the best on hurricane forecasts?  In terms of tracks, sometimes the NOAA model does better on shorter timescales (particularly in the 3-4 day window), but beyond 4 days we have found that the ECMWF model is nearly always superior to the NOAA model.

With regards to CFAN’s track calibration: our calibration increases probabilistic skill on average by 8%, with 10% improvement for days 4-7 and 17% improvement for days 7-10.
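Probabilistic track skill of the kind quoted above is typically measured with a score such as the continuous ranked probability score (CRPS), where lower is better. Below is a minimal empirical-CRPS sketch with synthetic numbers; the "calibration" here is just an invented bias shift, not CFAN's actual method or verification data:

```python
def crps_ensemble(members, obs):
    """Empirical CRPS of an ensemble forecast against a single observation.
    Lower is better; for a one-member ensemble it reduces to absolute error."""
    n = len(members)
    term1 = sum(abs(x - obs) for x in members) / n
    term2 = sum(abs(a - b) for a in members for b in members) / (2 * n * n)
    return term1 - term2

obs = 0.0                             # verifying position (synthetic)
raw = [1.5, 2.0, 2.5, 3.0]            # biased raw ensemble (synthetic)
calibrated = [x - 2.0 for x in raw]   # invented bias correction for illustration
print(f"raw CRPS:        {crps_ensemble(raw, obs):.4f}")
print(f"calibrated CRPS: {crps_ensemble(calibrated, obs):.4f}")
```

A percentage skill improvement like the ones quoted is then just the relative reduction in a score of this kind, averaged over many forecast cases.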

The NYTimes Magazine article stated:

When all the forecasts in an ensemble agree, it’s a reasonably sure bet that the predictions will pan out.

Agreement among all of the forecasts in an ensemble does not by itself mean it is a good bet that the predictions will pan out.  The members may agree simply because the ensemble generation method produces an insufficiently dispersive ensemble. Also, if a narrow ensemble is followed by a subsequent ensemble that falls well outside the bounds of the previous one, then the original ensemble forecast was essentially useless.  The discussion of Hurricane Matthew below highlights some complex issues of hurricane ensemble interpretation.
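One common diagnostic for an insufficiently dispersive ensemble is to check how often the verifying value falls outside the ensemble envelope; for a statistically consistent N-member ensemble this should happen only about 2/(N+1) of the time. Here is a toy 1-D sketch with synthetic numbers (stand-ins for track coordinates, not real forecast data):

```python
import random

def outside_envelope_rate(n_members, n_cases, spread_factor, seed=1):
    # Fraction of cases where the "truth" falls outside the ensemble's
    # min-max envelope; spread_factor < 1 mimics an underdispersive ensemble.
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_cases):
        truth = rng.gauss(0.0, 1.0)  # verifying value, unit forecast uncertainty
        members = [rng.gauss(0.0, spread_factor) for _ in range(n_members)]
        if truth < min(members) or truth > max(members):
            misses += 1
    return misses / n_cases

well = outside_envelope_rate(20, 5000, 1.0)    # spread matches true uncertainty
under = outside_envelope_rate(20, 5000, 0.5)   # spread half what it should be
print(f"well-dispersed miss rate:  {well:.3f} (ideal ~ {2 / 21:.3f})")
print(f"underdispersive miss rate: {under:.3f}")
```

When the miss rate runs well above the ideal value, the ensemble is agreeing with itself more than the real uncertainty warrants, which is exactly the failure mode discussed above.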

Hurricane Hermine

Hurricane Hermine was the first hurricane to make landfall in Florida since Hurricane Wilma in 2005 (landfall on 9/2 as a Cat 1 hurricane).

We first spotted this African Easterly Wave on 8/11 as having the potential to develop into a tropical cyclone.  By 8/18, both ECMWF and GFS had identified this as a system with potential U.S. landfall impacts.  By 8/28, both models had homed in on the eventual track for a landfall on the Florida Panhandle (our calibrated ECMWF forecasts caught it maybe a day earlier).   In the interim 10 days, the big question was whether the strike would be on the Atlantic or Gulf Coast of Florida – for a few days around 8/24, both models suggested a scenario where the storm would head into the Gulf of Mexico and impact the oil production region.


In terms of intensity forecasts, by 8/29 CFAN’s ECMWF calibrated intensity forecasts settled down to a Tropical Storm or Cat 1, with a small probability of Cat 2.


On 8/29, NOAA’s HWRF was predicting Cat 4, while the others were predicting a tropical storm (NOAA’s consensus forecast was reasonable).

All in all, the models did a good job predicting Hermine’s landfall impacts 5 days in advance, with several weeks warning of a possible landfall strike in Florida.

Hurricane Matthew

Hurricane Matthew was the first Category 5 Atlantic hurricane since Hurricane Felix in 2007. Originating from a tropical wave that emerged off Africa on September 22, Matthew developed into a tropical storm on September 28, became a hurricane on September 29, and achieved Category 5 intensity the following day.  Matthew was a strong Category 4 hurricane when it made its first landfall over Haiti on October 4. The storm then paralleled the coast of the southeastern United States, made its fourth and final landfall over South Carolina as a Category 1 cyclone on October 8, and turned away from Cape Hatteras, North Carolina on October 9.

The NOAA model spotted this one on 9/20, and the ECMWF picked it up two days later.  Take a look at the difference in tracks between the two models on 9/26 (NOAA is on top, ECMWF is on bottom):



So which of these is more realistic?  NOAA captured the Atlantic track early on and pretty much stuck with it, whereas ECMWF forecasts showed a wide spread.  Given the forecast uncertainty at this point, I would say that ECMWF was more realistic, whereas NOAA’s ensemble was underdispersive.  At this point, none of the ensemble members in either model closely tracks the eventual trajectory skirting the southeast Atlantic coast.

The next big forecast challenge was the explosive intensification that occurred on 9/29.  None of the models captured this.  On 9/28, our ECMWF intensity forecasts did show a slow creep upwards towards a major hurricane.  Of the NOAA models, only GFDL showed a major hurricane about 4 days out.  Forecasting such explosive rapid intensification is THE major challenge for hurricane forecasting.

The next major forecast challenge was whether/where there would be a U.S. landfall.  It wasn’t until 9/30 that the NOAA model started to show some tracks striking the southeast U.S., whereas the ECMWF showed a broader range of possible tracks striking the Southeast, with the center of mass shifting to the Atlantic coast.  On October 3, both models converged on a track that would skirt the Florida coast.  To illustrate the impact of CFAN’s track calibration, compare the raw ECMWF tracks on 10/3 (top) with CFAN’s calibrated tracks (bottom):



And the NOAA track forecast initialized on 10/3:


The next forecast challenge was whether Matthew would head northwards after South Carolina (and strike North Carolina and/or locations to the north), or turn eastward and head out into the central Atlantic, or loop around to the south.

The loop scenario was first suggested on 10/2 by the ECMWF deterministic/high-resolution run.  By 10/5 12Z, the NOAA model had gone ‘all in’ for the loop scenario:


Whereas the ECMWF ensemble was split between the loop scenario and a scenario that went out to sea:


Again, we see that the NOAA ensemble was underdispersive, and as of 10/5 the people in North Carolina were not expecting to be impacted.  In fact, the NOAA GEFS persisted with the loop scenario until 10/8.

So, there were good and bad aspects of the forecast for Matthew, with NOAA performing better for some aspects and ECMWF performing better for other aspects.

Ensemble interpretation

I would like to comment on CFAN’s high predictability track cluster, which was publicized in a series of posts at WeatherUnderground:

These posts referenced CFAN’s High Predictability Cluster (HPC), which selects the four ECMWF ensemble members [gray lines] that most closely match the deterministic (high-resolution) run [red line] during the first 72 hours.  Shown below is the HPC forecast for 10/2:


The idea behind this is that the high-resolution ‘deterministic’ run is expected to be the best out to 2-3 days; focusing on the ensemble members that match the deterministic most closely for the first few days should help cull the ensemble members to provide a narrower estimate of where the tracks might go.
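The selection step described above can be sketched in a few lines. The 1-D "tracks" and member names below are invented toy data; the real product works on full latitude/longitude tracks:

```python
def rmse(track_a, track_b):
    # Root-mean-square separation between two tracks at matching times.
    return (sum((a - b) ** 2 for a, b in zip(track_a, track_b)) / len(track_a)) ** 0.5

def high_predictability_cluster(members, deterministic, n_keep=4, hours=72, step=6):
    # Keep the n_keep ensemble members whose tracks sit closest to the
    # deterministic run over the first `hours` of the forecast.
    n_points = hours // step + 1
    ranked = sorted(members.items(),
                    key=lambda kv: rmse(kv[1][:n_points], deterministic[:n_points]))
    return [name for name, _ in ranked[:n_keep]]

# Toy 1-D positions every 6 h out to 72 h; member i is offset 0.1*i from the
# deterministic run, so the four smallest offsets should be selected.
det = [float(p) for p in range(13)]
members = {f"m{i}": [p + 0.1 * i for p in det] for i in range(10)}
print(high_predictability_cluster(members, det))  # -> ['m0', 'm1', 'm2', 'm3']
```

The culled cluster is only as good as the assumption behind it: that early agreement with the deterministic run predicts later agreement with reality.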

Well, sometimes the deterministic run doesn’t do well — in fact, it was the first run to predict the loop scenario.  Hence, for the period  up to 10/8, the HPC went with the (erroneous) loop scenario.

So, how should we interpret the HPC, and when does it do better than the ensemble average or the full ensemble spread?  Here is what the statistics say for the 2009-2011 hurricane seasons for track error:

  • When the ensemble spread is narrow, the high-resolution deterministic run overall performs the best.
  • When the ensemble spread is large, the High Predictability Cluster performs best beyond 108 hours.

Matthew suggests that when there are two genuinely divergent scenarios, the HPC may not help if the deterministic solution turns out to be poor.

Is the ECMWF ensemble large enough for tropical cyclone forecasts?  Not quite, although the track ensemble nearly always bounds the eventual track out to at least 10 days in advance.

The NCEP GEFS ensemble is too small, with a spread that is too narrow and often does not bound the actual track at the longer time horizons.

The National Hurricane Center has focused its efforts on intensity forecasts, developing a number of different high-resolution regional models that are forced by the GFS deterministic run.  These models produce an ensemble of intensity forecasts, which the NHC considers in its consensus forecast of intensity.  For tracks, they produce a consensus track, with a cone of uncertainty that is derived from historical forecast errors.  The high-resolution simulations are all but useless if the GFS deterministic run isn’t any good.

CFAN, by contrast, produces an interactive cone of uncertainty based upon the ensembles of ECMWF and GEFS forecasts, and produces an ensemble intensity forecast using statistical models that are applied to each ensemble member.
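The per-member approach can be sketched as follows; the linear predictor, its coefficients, and the environmental values are all invented for illustration (CFAN's actual statistical intensity models are not public):

```python
def toy_intensity_model(sst, shear):
    # Invented linear stand-in for a statistical intensity model:
    # warmer sea surface raises intensity, vertical wind shear lowers it.
    return 25.0 + 3.0 * (sst - 26.0) - 2.0 * shear  # m/s

# Hypothetical environmental conditions along three ensemble members' tracks.
members = [
    {"sst": 29.5, "shear": 5.0},
    {"sst": 28.0, "shear": 12.0},
    {"sst": 30.0, "shear": 3.0},
]
# Apply the statistical model to each member to get an intensity *ensemble*
# rather than a single consensus number.
intensities = [toy_intensity_model(m["sst"], m["shear"]) for m in members]
print(f"intensity ensemble (m/s): {intensities}")
print(f"spread: {max(intensities) - min(intensities):.1f} m/s")
```

Because each track member samples a different environment, the resulting intensity spread reflects track uncertainty as well as intensity-model uncertainty.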

I note that other weather modeling centers produce tropical cyclone tracks; however, CFAN focuses on ECMWF and NOAA GEFS since these are the models that receive the most scrutiny in terms of tropical cyclone tracks.

While the ECMWF was an absolute star relative to the NOAA models for Hurricane Sandy, we clearly don’t see the extreme dominance of ECMWF over the NOAA models for all hurricanes. The bottom line is that it is very useful to examine both models, and the forecasts from each model benefit significantly from CFAN’s calibrations for tracks and intensity.

Forecast applications

So, isn’t 5 days sufficient warning for a hurricane landfall?  It certainly is sufficient time for those intending to evacuate.  However, there are many different types of decisions that benefit from extended range outlooks of hurricane landfall impacts.

CFAN has clients in the energy trading sector, regional power providers and state/local emergency management.  The original driver for our extended range hurricane forecasts was natural gas trading; our clients were looking to get a jump on the market forecasts so that they could make more lucrative natural gas trading decisions.  Regional power providers face the challenges of minimizing the vulnerability of their electric power delivery and minimizing the duration of power outages.  This requires having maintenance personnel in place before the storm strikes.

While CFAN doesn’t have any clients in the retail and logistics sectors, extended range tropical cyclone forecasts can help such companies protect inventory, stock stores with supplies to help citizens cope with the emergency (before and after the storm), and redirect transportation and shipping.  The insurance sector also uses hurricane forecasts to assess damage.

With the advent of social media, a major segment of public information on tropical cyclones has shifted to the internet.  WeatherUnderground provides superb, in-depth blog posts.  Ryan Maue of WeatherBell, along with AccuWeather, was very active in tweeting the latest information.

CFAN used to have a tropical cyclone forecast team making daily forecasts for our clients, but it was too labor intensive.  Further, as our web-based forecast products improved, there was little added benefit from human forecasters.  CFAN’s hurricane forecasts are disseminated by The Weather Company, so you can keep up with our forecasts by following WeatherUnderground and Michael Ventrice on Twitter.


Returning to the premise of the NYTimes article: I would agree with Cliff Mass that NOAA is behind the times in its use of ensemble forecasting methods for hurricane forecasts.  I would rather see the size of the GEFS ensemble increase than a continued focus on high-resolution regional forecasts using multiple models.  Personally, I would drop the high-resolution model forecasts beyond, say, 2-3 days from landfall, unless they can be run in ensemble mode with different boundary forcings from different ensemble members.

The paucity of U.S. landfalling hurricanes for the past decade, and particularly the lack of a major hurricane landfall, seems to have reduced the impetus and customer base for high quality, detailed hurricane forecast products.  Social media is taking over informing the public about hurricane risk, particularly outside the 5 day horizon for the National Hurricane Center forecasts.  Apart from the daunting challenge of predicting rapid intensification, the frontier in hurricane forecasting seems to be predicting high-resolution probabilistic landfall impacts.  CFAN is providing high-resolution probabilistic wind speeds at landfall for specific regions where we have clients.  We will see whether interest in such forecast products increases as a result of Matthew.


27 responses to “Post-mortem on the forecasts of Hurricanes Hermine and Matthew”

  1. Pingback: Post-mortem on the forecasts of Hurricanes Hermine and Matthew – Enjeux énergies et environnement

  2. The ‘cone of uncertainty’ is another way of saying we don’t have a good handle on the ‘God factor,’ right?

  3. nice, the first storm heading in bold should be Hermine not Matthew.

  4. Curious George

    There is no occurrence of the word “supercomputer” in this post. It would be hard to find an article on “climate modelling” without it. A shift in resource allocation is indicated.

  5. Did the NYT contact Judith Curry while researching their article?

    • Nope. The fact that my company provides the best, state-of-the-art hurricane forecasts is largely unknown. I’ve tweeted this article on my personal twitter account, and also the CFAN account; so far not even a ‘like’. I will try tweeting again tomorrow after I figure out a more attention-grabbing approach.

  6. I think the prediction by NOAA in 2015 of the trajectory of Hurricane Joaquin is very telling – note the uncertainty range:

    “The graphic above shows all of the NOAA forecast tracks, and the “cone of uncertainty” on October 1 and October 7. It is clear that they claim certainty much greater than is realistic.”
    Final Joaquin Scorecard

    • But they all seem to converge on the Bahamas.

      Oh. Nevermind.

        • Yes, …”seem to converge on the Bahamas”….if one reads them backwards…..

        Numeric weather models, like those used to predict hurricane paths, are “profoundly sensitive to initial conditions”: thus very small differences in initial positions, wind speeds, air pressure readings etc. cause the ensemble members to give results that differ far more than the proportional differences in these inputs. The results disperse, diverge, spread out, become more and more different as they project further and further into the future. What the ensemble members do not do is “converge”….

  7. Really interesting insider’s view of hurricane forecasting. While hurricanes are more an initial conditions problem than climate models (which tend to be more boundary condition e.g. Forcings determined), it certainly shows the difficulty in basing radical energy policy on climate models. Especially when these can be shown wrong now in three ways. 1. They produce a tropical troposphere hotspot that does not in fact exist. Even Santer’s new paper, which threw in stratosphere adjustments to reduce the discrepancy but that are physically irrelevant to the tropics, concluded by 1.7x. Big miss. 2. They missed the pause. 3. Their ECS is 2x what is observed since 1880.

  8. Why can’t the government do better at predicting extreme weather?

    Well … they weren’t able to predict that building western style social structures in the Levant and Transoxiana would take decades, be super expensive, and mostly impossible.

    Is healthcare economic modeling easier than weather modeling?
    Just wondering.

  9. Steven Mosher

    Has the calibration that is applied been validated against “reality”

    or has ECMWF been through IV&V?

    or does a model have to reflect reality perfectly to be of use to its users and customers?

    In the end, who decides if the model is useful? and

    where is the dang Damage function? ( peter lang is looking for it)

    • Curious George

      For a damage function, ask insurance companies.

    • Not only am I wanting to know the empirical evidence for the damage functions – used in IAMs to justify SCC, policy actions and the beliefs of Alarmists (like Mosher) that GHG emissions are dangerous – but so are all policy analysts who are involved in assessing climate policy options rationally.

    • Hurricane predictions are two-part- will this season have a lot of hurricanes, will that specific storm hit me. Living in a hurricane-prone area, I look at both.
      Over the 10-year unprecedented (love that word) hurricane drought, NOAA forecast five of the seasons to be “active” or “above normal,” three to be “normal” and two to be “below normal.”
      Two out of 10 isn’t very useful. And the reason for the article is that people are, shall we say, unsatisfied with tracking forecasts.
      There is another aspect to “useful.” Millions of people are now accustomed to believing that an official “above normal” or “normal” forecast means “you won’t see any hurricanes.” That is not useful, in fact it’s downright frightening to the people who have to plan for emergency response for a population that has no experience of hurricanes and no idea what to do when one is coming. It’s also frightening to those of us who live in those zones and now have to contend not only with the normal damage, but deal with lawn furniture missiles from people who didn’t prepare. When Isabel hit in ’03 people in my area boarded up doors and windows. When Matthew hit this summer there were people who had left the big table umbrellas open by their pools.

      • jeffnsails850: “Over the 10-year unprecedented (love that word) hurricane drought, NOAA forecast five of the seasons to be “active” or “above normal,” three to be “normal” and two to be “below normal.”
        Two out of 10 isn’t very useful”

        Yet the same so-called “scientists” STILL reckon they can predict the climate over the next 100 years, and World governments are planning to spend many trillions of dollars and cause the deaths of untold millions of people on the strength of those predictions.

        How much longer is the population at large going to put up with this buffoonery?

  10. Tweet by Roger Pielke, Jr:

    “President Obama saw 4 hurricane landfalls
    President GW Bush saw 18
    BHO had the fewest landfalls per year of any President (to 1901)
    Lucky”

    • Hell, Obama (not to mention Hokey Schtick Mann) had a Nobel Prize before he even walked into the White House for mending the climate, what do you expect?

  11. A better question is why do the weather networks and weather awfulizers always choose to portray the worst possible choice of forecasts to highlight?

    • Jeff ==> I have spent the last 12 years in hurricane areas, mostly living on our boat. The greatest danger to human life is to those who “walk on the sunny side” and ignore warnings based on “hoping for the best” and belief that “it can’t happen to me”.

      This is a self-fulfilling prophecy, except when it isn’t. Usually, the hurricane predicted to make landfall at your town, or at Cape Canaveral/Cocoa Beach, as a major hurricane does what Matthew did: changes course at the last minute and doesn’t hit there, hits someone else, or stays out at sea a bit longer (as Matthew did). This gives people a false sense of security and they refuse to evacuate, don’t move their boat to the mangroves or other safe anchorage, don’t head for the hills.

      In short, people are generally stupid about hurricane threats.

      I have been in a predicted hurricane path a dozen times. I have prepared my boat for a hurricane hit five times, and been hit twice. I was always glad I had prepared. I did my own forecasting with hourly weather checks.

      I was happy for the three misses.

      At 10 pm Friday night, Matthew was still scheduled to come ashore at Cocoa Beach as a Cat 3 hurricane. Had it done so, it would have been truly devastating: Cocoa Beach, Patrick AFB, and Cape Canaveral/Kennedy Space Center are all on nothing more than a sand bar, with a maximum elevation of ten feet or so. My boat would have been there too (but for fortune). As you know, Matthew jinked east 20 or 30 miles and the eye stayed out to sea. Does that mean that all those people should have stayed in their homes and taken their chances? No, no, and no again.

      Preparing for a hurricane landfall takes days, and path prediction is only good out 24 hours, maybe 48. One has to prepare when one knows the chances are good you won’t get hit, but the loss if you are hit is enormous.

  12. One word – albeit hyphenated – non-linear.

  13. “for reference, Sandy made landfall on Oct 29”

    Um, not as a hurricane…

  14. It has been a few years since I was in the TC forecasting game, but I clearly remember that low latitude storms like Matthew are more prone to get to cat four or five. I was looking at Matthew as it made the Caribbean and was surprised to see how slowly the NHC intensified it. I think they were looking for some shear to hold it back. Back in the day, we didn’t use numerical models to forecast intensity, but it might be interesting to see if they consistently under-forecast intensity for low latitude storms. Chris Landsea might remember something about this.

    The track forecast for Matthew looked great to me. The real problem is that the public does not fully understand what they are getting and probably never will.

  15. Reblogged this on Climate Collections.