by Judith Curry
Reflections on forecasting hurricanes in light of U.S. landfalling Hurricanes Hermine and Matthew, highlighting the complexities of forecast ensemble interpretation.
A recent issue of the NYTimes Magazine has an article: Why Isn’t the U.S. Better at Predicting Extreme Weather? Subheading: Hurricanes like Matthew have laid bare the dirty secret of the National Weather Service: its technologies and methods are woefully behind the times.
The article highlights comments from Cliff Mass, a well known critic of U.S. (NOAA) weather forecasting (see in particular Cliff’s post on Matthew). A previous CE post on Cliff Mass’ articles is here. The article is well worth reading, as well as Cliff’s post on Matthew.
So, is the U.S. National Hurricane Center woefully behind the times in hurricane forecasting? The issue is not nearly as clear cut as implied by the NYTimes article. At the end of the day, the NHC does a pretty good job with hurricane forecasts out to 5 days.
In this post, I describe:
- Ensemble hurricane forecasting, including problems with NOAA’s approach in this context.
- The forecast history of Hurricanes Hermine and Matthew, highlighting the advanced forecast techniques used by my company Climate Forecast Applications Network (CFAN)
- The superb role of private sector weather services companies in analyzing the forecasts, providing specialized insights for commercial clients, and communicating them to the public.
Ensemble hurricane forecasting
For background on ensemble hurricane forecasting see these previous articles at Climate Etc.:
From the NYTimes Magazine article:
While Mass is the most outspoken on the subject, many experts insist that if the Weather Service wants to meaningfully improve its predictions, it must employ a technique called ensemble forecasting. The basic premise is either to tweak the physics equations or to make repeated changes to a model’s variables: You might bump up the temperature slightly, for example, and then run the model again. After a half-dozen or so reruns, you get a set, or “ensemble,” of forecasts that can be compared with one another. When all the forecasts in an ensemble agree, it’s a reasonably sure bet that the predictions will pan out.
The NOAA GEFS does produce a global ensemble forecast (20 ensemble members) from which hurricane tracks are produced (note: ECMWF has 51 ensemble members). My post Hurricane Sandy Part n provides clarification on the National Hurricane Center’s approach:
The problem with the NOAA National Hurricane Center forecasts is more fundamental than the NOAA GEFS model not performing very well. The NHC forecasters are using 20th century forecast methods, while the private sector is using 21st century ensemble-based methods. The NHC does a 'poor man's' ensemble, pooling the deterministic runs of different forecast models. They use only the NOAA GFS deterministic forecast to force the regional models (e.g. HWRF, GFDL), which they have counted on to provide better track and intensity forecasts; these have been all but useless because they are being forced by the erroneous large-scale fields from the NOAA GFS forecast.
As an example of ensemble hurricane forecasts in the sense described in the NYTimes Magazine article, see this figure from Hurricane Sandy. The figure below shows forecast tracks from the models that were initialized on Oct 23 12Z (for reference, Sandy made landfall on Oct 29).
The raw tracks from the NOAA GFS model are shown in the right-hand panel, the probability distribution of the raw tracks from the ECMWF (European) model is shown in the center panel, and CFAN's calibrated tracks are shown in the left-hand panel.
Does the ECMWF model always do the best on hurricane forecasts? In terms of tracks, sometimes the NOAA model does better on shorter timescales (particularly in the 3-4 day window), but beyond 4 days we have found that the ECMWF model is nearly always superior to the NOAA model.
With regards to CFAN’s track calibration: our calibration increases probabilistic skill on average by 8%, with 10% improvement for days 4-7 and 17% improvement for days 7-10.
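CFAN's calibration algorithm itself is proprietary and not described here. As a generic illustration of what 'calibrating' ensemble tracks can mean, the sketch below (all numbers hypothetical) rescales each member's departure from the ensemble mean so that, over a training period, the mean spread matches the mean ensemble-mean error, a simple spread-skill balance:

```python
def spread_inflation_factor(hist_spreads, hist_errors):
    """Scale factor that brings mean historical ensemble spread into line
    with mean historical ensemble-mean track error (spread-skill balance)."""
    return sum(hist_errors) / sum(hist_spreads)

def calibrate_member(member_pos, mean_pos, factor):
    """Inflate (factor > 1) or shrink (factor < 1) a member's departure
    from the ensemble-mean position (lat, lon in degrees)."""
    return (mean_pos[0] + factor * (member_pos[0] - mean_pos[0]),
            mean_pos[1] + factor * (member_pos[1] - mean_pos[1]))

# Hypothetical training statistics at one lead time: the ensemble has been
# running too tight (mean spread 1.0 deg vs mean error 1.5 deg), so inflate
factor = spread_inflation_factor([1.0, 1.25, 0.75], [1.5, 2.0, 1.0])
print(factor)                                          # 1.5
print(calibrate_member((27.0, -79.0), (26.0, -78.0), factor))  # (27.5, -79.5)
```

Real calibration schemes also correct systematic track biases and vary the correction with lead time, which is consistent with the larger skill gains reported above at days 4-10.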
The NYTimes Magazine article stated:
When all the forecasts in an ensemble agree, it’s a reasonably sure bet that the predictions will pan out.
Agreement among all of the forecasts in an ensemble does not mean it is a good bet that the predictions will pan out. The ensemble members may agree simply because the ensemble generation method produces an insufficiently dispersive ensemble. Also, if a narrow ensemble is followed by a subsequent ensemble that falls well outside the bounds of the previous one, then the original ensemble forecast was essentially useless. The discussion of Hurricane Matthew below highlights some complex issues of hurricane ensemble interpretation.
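To make the dispersion point concrete, here is a minimal sketch (positions and distances in flat lat/lon degrees, purely illustrative) that checks whether a verifying position falls inside the ensemble envelope; a tight ensemble that misses the verification is the signature of underdispersion:

```python
import math

def ensemble_spread(members):
    """Mean distance of members from the ensemble-mean position (degrees)."""
    mean_lat = sum(lat for lat, lon in members) / len(members)
    mean_lon = sum(lon for lat, lon in members) / len(members)
    return sum(math.hypot(lat - mean_lat, lon - mean_lon)
               for lat, lon in members) / len(members)

def bounds_verification(members, observed):
    """True if the observed position lies no farther from the ensemble
    mean than the farthest ensemble member."""
    mean_lat = sum(lat for lat, lon in members) / len(members)
    mean_lon = sum(lon for lat, lon in members) / len(members)
    max_r = max(math.hypot(lat - mean_lat, lon - mean_lon)
                for lat, lon in members)
    obs_r = math.hypot(observed[0] - mean_lat, observed[1] - mean_lon)
    return obs_r <= max_r

# Hypothetical 96-h positions (lat, lon) for a tightly clustered 5-member ensemble
members = [(27.0, -79.0), (27.5, -78.5), (26.8, -79.4), (27.2, -78.8), (27.1, -79.1)]
observed = (30.5, -75.0)  # verification well outside the cluster
print(ensemble_spread(members))               # small: members agree
print(bounds_verification(members, observed)) # False: agreement was misleading
```

A well-calibrated ensemble should bound the verifying track at roughly the rate its nominal probabilities imply; when it repeatedly fails to, the agreement among members tells you about the ensemble design, not the atmosphere.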
Hurricane Hermine

Hurricane Hermine was the first hurricane to make landfall in Florida since Hurricane Wilma in 2005, making landfall on 9/2 as a Cat 1 hurricane.
We first spotted this African Easterly Wave on 8/11 as having the potential to develop into a tropical cyclone. By 8/18, both ECMWF and GFS had identified this as a system with potential U.S. landfall impacts. By 8/28, both models had homed in on the eventual track for a landfall on the Florida Panhandle (our calibrated ECMWF forecasts caught it perhaps a day earlier). In the interim 10 days, the big question was whether the strike would be on the Atlantic or Gulf coast of Florida; for a few days around 8/24, both models suggested a scenario where the storm would head into the Gulf of Mexico and impact the oil production region.
In terms of intensity forecasts, by 8/29 CFAN’s ECMWF calibrated intensity forecasts settled down to a Tropical Storm or Cat 1, with a small probability of Cat 2.
On 8/29, NOAA’s HWRF was predicting Cat 4, while the others were predicting a tropical storm (NOAA’s consensus forecast was reasonable).
All in all, the models did a good job predicting Hermine’s landfall impacts 5 days in advance, with several weeks warning of a possible landfall strike in Florida.
Hurricane Matthew

Hurricane Matthew was the first Category 5 Atlantic hurricane since Hurricane Felix in 2007. Originating from a tropical wave that emerged off Africa on September 22, Matthew developed into a tropical storm on September 28 and a hurricane on September 29, achieving Category 5 intensity the following day. Matthew was a strong Category 4 hurricane as it made its first landfall over Haiti on October 4. The storm then paralleled the coast of the southeastern United States, making its fourth and final landfall over South Carolina as a Category 1 cyclone on October 8 and turning away from Cape Hatteras, North Carolina on October 9.
The NOAA model spotted this one on 9/20, and the ECMWF picked it up two days later. Take a look at the difference in tracks between the two models on 9/26 (NOAA is on top, ECMWF is on bottom):
So which of these is more realistic? NOAA captured the Atlantic track early on and pretty much stuck with it, whereas the ECMWF forecasts showed a wide spread. Given the forecast uncertainty at this point, I would say that ECMWF was more realistic, whereas NOAA's ensemble was underdispersive. At this point, none of the ensemble members in either model closely tracked the eventual trajectory skirting the southeast Atlantic coast.
The next big forecast challenge was the explosive intensification that occurred on 9/29. None of the models captured this. On 9/28, our ECMWF intensity forecasts did show a slow creep upwards towards a major hurricane. Of the NOAA models, only GFDL showed a major hurricane about 4 days out. Forecasting such explosive rapid intensification is THE major challenge for hurricane forecasting.
The next major forecast challenge was whether, and where, there would be a U.S. landfall. It wasn't until 9/30 that the NOAA model started to show some tracks striking the southeast U.S., whereas the ECMWF showed a broader range of possible tracks striking the Southeast, with the center of mass shifting to the Atlantic coast. On October 3, both models converged on a track that would skirt the Florida coast. To illustrate the impact of CFAN's track calibration, compare the raw ECMWF tracks on 10/3 (top) with CFAN's calibrated tracks (bottom):
And the NOAA track forecast initialized on 10/3:
The next forecast challenge was whether Matthew would head northwards after South Carolina (and strike North Carolina and/or locations to the north), turn eastward and head out into the central Atlantic, or loop around to the south.
The loop scenario was first suggested on 10/2 by the ECMWF deterministic (high-resolution) run. By 10/5 12Z, the NOAA model had gone 'all in' for the loop scenario:
Whereas the ECMWF ensemble was split between the loop scenario and a scenario that went out to sea:
Again, we see that the NOAA ensemble was underdispersive; as of 10/5, people in North Carolina were not expecting to be impacted. In fact, NOAA GEFS persisted with the loop scenario until 10/8.
So, there were good and bad aspects of the forecast for Matthew, with NOAA performing better for some aspects and ECMWF performing better for other aspects.
I would like to comment on CFAN’s high predictability track cluster, which was publicized in a series of posts at WeatherUnderground:
These posts referenced CFAN's High Predictability Cluster (HPC), which selects the four ECMWF ensemble members [gray lines] that most closely match the deterministic (high-resolution) run [red line] during the first 72 hours. Shown below is the HPC forecast for 10/2:
The idea behind this is that the high-resolution 'deterministic' run is expected to be the best out to 2-3 days; focusing on the ensemble members that match the deterministic run most closely for the first few days should help cull the ensemble to provide a narrower estimate of where the tracks might go.
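The member-selection idea can be sketched as follows (a minimal illustration, not CFAN's actual implementation; flat lat/lon degrees stand in for great-circle distance, and all tracks are hypothetical):

```python
import math

def track_distance(track_a, track_b):
    """Mean separation between two tracks sampled at the same forecast
    hours (flat lat/lon degrees; real systems use great-circle distance)."""
    return sum(math.hypot(a[0] - b[0], a[1] - b[1])
               for a, b in zip(track_a, track_b)) / min(len(track_a), len(track_b))

def high_predictability_cluster(ensemble_tracks, deterministic_track, n=4, n_points=4):
    """Indices of the n ensemble members closest to the deterministic run
    over the first n_points track positions (here 4 points = 0/24/48/72 h)."""
    ranked = sorted(range(len(ensemble_tracks)),
                    key=lambda i: track_distance(ensemble_tracks[i][:n_points],
                                                 deterministic_track[:n_points]))
    return ranked[:n]

# Hypothetical 0/24/48/72-h tracks: a deterministic run plus 6 offset members
det = [(14.0, -50.0), (15.0, -55.0), (16.0, -60.0), (17.0, -65.0)]
offsets = [0.1, 2.0, 0.2, 3.0, 0.3, 0.15]
members = [[(lat + d, lon + d) for lat, lon in det] for d in offsets]
print(high_predictability_cluster(members, det))  # the four smallest offsets win
```

The culled members are then followed out to the longer lead times where the full ensemble has fanned out.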
Well, sometimes the deterministic run doesn't do well; in fact, it was the first run to predict the loop scenario. Hence, for the period up to 10/8, the HPC went with the (erroneous) loop scenario.
So, how should we interpret the HPC, and when does it do better than the ensemble average or the full ensemble spread? Here is what the statistics say for 2009-2011 hurricane seasons for track error:
- When the ensemble spread is narrow, the high-resolution deterministic run overall performs the best.
- The High Predictability Cluster performs best beyond 108 hours, when the ensemble spread is large.
Matthew suggests that when there are two genuinely divergent scenarios, the HPC may not help if the deterministic solution turns out to be poor.
Is the ECMWF ensemble large enough for tropical cyclone forecasts? Not quite, although the track ensemble nearly always bounds the eventual track out to at least 10 days in advance.
The NCEP GEFS ensemble is too small, with a spread that often fails to bound the actual track at longer time horizons.
The National Hurricane Center has focused its efforts on intensity forecasts, developing a number of different high-resolution regional models that are forced by the GFS deterministic run. These models produce an ensemble of intensity forecasts, which the NHC considers in its consensus forecast of intensity. For tracks, they produce a consensus track, with a cone of uncertainty derived from the errors of historical forecasts. The high-resolution simulations are all but useless if the GFS deterministic run isn't any good.
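The historical-error cone construction can be sketched simply: the cone radius at each lead time is set so that a fixed fraction (the NHC uses roughly two-thirds) of past official track errors would have fallen inside it. The error values below are made up for illustration:

```python
def cone_radii(historical_errors_by_lead, pct=0.67):
    """Cone radius per lead time: a crude index-based pct-quantile of
    historical track errors (in nautical miles)."""
    radii = {}
    for lead, errors in historical_errors_by_lead.items():
        s = sorted(errors)
        radii[lead] = s[min(len(s) - 1, int(pct * len(s)))]
    return radii

# Hypothetical historical track errors (n mi) at 24-h and 72-h lead times
hist = {24: [20, 30, 40, 50, 60], 72: [60, 90, 120, 150, 180]}
print(cone_radii(hist))  # {24: 50, 72: 150}: the cone widens with lead time
```

Note that a cone built this way reflects average historical skill, not the flow-dependent uncertainty of today's forecast, which is exactly the information an ensemble provides.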
By contrast, CFAN produces an interactive cone of uncertainty based upon the ensemble of ECMWF and GEFS forecasts, and produces an ensemble intensity forecast using statistical models applied to each ensemble member.
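The per-member intensity idea can be illustrated with a toy statistical model (the linear predictor and every coefficient below are invented for illustration; CFAN's actual statistical models are not public): apply the model to the environment along each member's track, then read category probabilities off the resulting distribution:

```python
def predict_intensity(sst_c, shear_kt):
    """Toy (invented) statistical relation: warmer SST raises, and vertical
    wind shear lowers, the predicted peak wind in knots."""
    return max(0.0, 50.0 + 15.0 * (sst_c - 26.0) - 2.0 * shear_kt)

def prob_major_hurricane(member_environments, threshold_kt=96.0):
    """Fraction of ensemble members whose predicted intensity reaches
    major-hurricane strength (Cat 3, >= 96 kt)."""
    winds = [predict_intensity(sst, shear) for sst, shear in member_environments]
    return sum(w >= threshold_kt for w in winds) / len(winds)

# Hypothetical (SST degC, shear kt) along each of five members' tracks
env = [(29.5, 5.0), (29.0, 8.0), (28.5, 15.0), (30.0, 4.0), (27.5, 20.0)]
print(prob_major_hurricane(env))  # 0.2: one member in five reaches Cat 3
```

Because each member sees a different track, each samples a different SST and shear environment, so the intensity forecast inherits the track uncertainty instead of being tied to a single deterministic run.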
I note that other weather modeling centers produce tropical cyclone tracks; however, CFAN focuses on ECMWF and NOAA GEFS since these are the models that receive the most scrutiny in terms of tropical cyclone tracks.
While the ECMWF was an absolute star relative to the NOAA models for Hurricane Sandy, we clearly don’t see the extreme dominance of ECMWF over the NOAA models for all hurricanes. The bottom line is that it is very useful to examine both models, and the forecasts from each model benefit significantly from CFAN’s calibrations for tracks and intensity.
So, isn’t 5 days sufficient warning for a hurricane landfall? It certainly is sufficient time for those intending to evacuate. However, there are many different types of decisions that benefit from extended range outlooks of hurricane landfall impacts.
CFAN has clients in the energy trading sector, regional power providers, and state/local emergency management. The original driver for our extended range hurricane forecasts was natural gas trading; our clients were looking to get a jump on the market forecasts so that they could make more lucrative natural gas trading decisions. Regional power providers face the challenges of minimizing the vulnerability of their electric power delivery and minimizing the duration of power outages, which requires having maintenance personnel in place before the storm strikes.
While CFAN doesn't have any clients in the retail and logistics sectors, extended range tropical cyclone forecasts can help such companies protect inventory, stock stores with supplies to help citizens cope with the emergency (before and after the storm), and redirect transportation and shipping. The insurance sector also uses hurricane forecasts to assess potential damage.
With the advent of social media, we have seen a major segment of public information on tropical cyclones shift to the internet and social media. WeatherUnderground provides superb, in-depth blog posts. Ryan Maue of WeatherBell, and also AccuWeather, were very active in tweeting the latest information.
CFAN used to have a tropical cyclone forecast team making daily forecasts for our clients, but it was too labor-intensive. Further, as our web-based forecast products improved, there was little added benefit from human forecasters. CFAN's hurricane forecasts are disseminated by The Weather Company, so you can keep up with our forecasts by following WeatherUnderground and Michael Ventrice on Twitter.
Returning to the premise of the NYTimes article: I would agree with Cliff Mass that NOAA is behind the times in its use of ensemble forecasting methods for hurricane forecasts. I would rather see the size of the GEFS ensemble increase than a focus on high-resolution regional forecasts using multiple models. Personally, I would drop the high-resolution model forecasts beyond, say, 2-3 days from landfall, unless they can be run in ensemble mode with different boundary forcings from different ensemble members.
The paucity of U.S. landfalling hurricanes for the past decade, and particularly the lack of a major hurricane landfall, seems to have reduced the impetus and customer base for high quality, detailed hurricane forecast products. Social media is taking over informing the public about hurricane risk, particularly outside the 5 day horizon for the National Hurricane Center forecasts. Apart from the daunting challenge of predicting rapid intensification, the frontier in hurricane forecasting seems to be predicting high-resolution probabilistic landfall impacts. CFAN is providing high-resolution probabilistic wind speeds at landfall for specific regions where we have clients. We will see whether interest in such forecast products increases as a result of Matthew.