There is an interesting news article ($) in Science this week by Paul Voosen on the increasing amount of transparency on climate model tuning. (Full disclosure: I spoke to him a couple of times for this article and I’m working on a tuning description paper for the US climate modeling centers.) The main points of the article are worth highlighting here, even if a few of the characterizations are slightly off.

The basic thrust of the article is that climate modeling groups are making significant efforts to increase the transparency and availability of model tuning processes for the next round of intercomparisons (CMIP6). This partly stems from a paper from the MPI-Hamburg group (Mauritsen et al., 2012), which was perhaps the first article to concentrate solely on the tuning process and the impact that it has on important behaviour of the model (such as its sensitivity to increasing CO2). That isn’t to say that details of tunings were not discussed previously, but the tendency was to describe them briefly in the model description papers (such as Schmidt et al. (2006) for the GISS model). Some discussion has appeared in IPCC reports too (h/t Gareth Jones), but not in much depth. Thus useful information was hard to collate and compare across all model groups, and it turns out that matters.

For instance, if an analysis of the model ensemble tries to weight models based on their skill compared to observations, it is obviously important to know whether a model group tuned their model to achieve a good result or whether it arose naturally from the basic physics. In a more general sense this relates to whether “data accommodation” improves a model’s predictive skill or not. This is quite subtle though – weather forecast models obviously do better if they have initial conditions that are closer to the observations, and one might argue that for particular climate model predictions that are strongly dependent on the base climatology (such as for Arctic sea ice) tuning to the climatology will be worthwhile. The nature of the tuning also matters: allowing an uncertain parameter to vary within reasonable bounds and picking the value that gives the best result is quite different to inserting completely artificial fluxes to correct for biases. Both have been done historically, but the latter is now much rarer.
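To make the distinction between the two kinds of tuning concrete, here is a minimal sketch using a zero-dimensional energy balance model. It is purely illustrative and is not how any modeling center tunes a GCM: the function names, the effective emissivity, the albedo bounds and the target temperature are all assumptions chosen for the example.

```python
# Toy zero-dimensional energy balance model (illustrative assumptions throughout).
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
S0 = 1361.0             # solar constant, W m^-2
EMISSIVITY = 0.612      # assumed effective emissivity (toy value)
T_OBS = 288.0           # assumed observed global mean surface temperature, K


def toa_imbalance(albedo, temperature=T_OBS, flux_correction=0.0):
    """Net top-of-atmosphere flux (W m^-2): absorbed solar minus emitted longwave."""
    absorbed = S0 / 4.0 * (1.0 - albedo)
    emitted = EMISSIVITY * SIGMA * temperature ** 4
    return absorbed - emitted + flux_correction


def tune_albedo(lo=0.28, hi=0.32, steps=401):
    """Parameter tuning: scan an uncertain parameter within assumed plausible
    bounds and keep the value that best closes the radiation budget."""
    candidates = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(candidates, key=lambda a: abs(toa_imbalance(a)))


def flux_correction_for(albedo):
    """Flux correction: leave the parameter alone and add an artificial flux
    that cancels whatever imbalance remains."""
    return -toa_imbalance(albedo)


if __name__ == "__main__":
    tuned = tune_albedo()
    print("tuned albedo:", round(tuned, 4))
    print("residual imbalance (W m^-2):", toa_imbalance(tuned))
    print("flux correction needed at albedo 0.30 (W m^-2):", flux_correction_for(0.30))
```

In the first case the budget is closed by a value that stays inside a physically defensible range; in the second, the budget is closed by a term with no physical counterpart, which is why the two practices are viewed so differently.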

A recent summary paper in BAMS (Hourdin et al., 2016) discussed current practices and gave results from a survey of the modeling groups. In that survey, it was almost universal that groups tuned for radiation balance at the top of the atmosphere (usually by adjusting uncertain cloud parameters), but there is a split on practices like using flux corrections (2/3rds of groups disagreed with that). This figure gives some more details:





Summary results on tuning practices from the survey of CMIP5 modeling groups published in Hourdin et al. (2016).



The Science article though does make some claims that I don’t think are correct. I assume these are statements that are paraphrases from scientists that the writer talked to, but they would have been better as quotes, as opposed to generalisations. For instance, the article claims that

“… climate modelers [will now] openly discuss and document tuning in ways that they had long avoided, fearing criticism by climate skeptics.

…

The taboo reflected fears that climate contrarians would use the practice of tuning to seed doubt about models— and, by extension, the reality of human driven warming. “The community became defensive,” [Bjorn] Stevens says. “It was afraid of talking about things that they thought could be unfairly used against them.”

This is, I think, demonstrably untrue, since tuning has been discussed widely in papers and elsewhere, including here on RealClimate. Perhaps it does reflect some people’s opinion, but it is not true generally.

The targets for tuning vary across groups, and again, it matters which you pick. Tuning to the seasonal cycle, or to the climatological average, or to the variance of some field (all of which can be well characterised from observations) is different to tuning to a transient change over time, which is often less well known. Indeed, many groups specifically leave transient changes out of their tuning procedures in order to maintain those trends for out-of-sample evaluation of the model (approximately half the groups according to the Hourdin et al. survey).

The article says something a little ambiguous on this:

Indeed, whether climate scientists like to admit it or not, nearly every model has been calibrated precisely to the 20th century climate records—otherwise it would have ended up in the trash. “It’s fair to say all models have tuned it,” says Isaac Held.

Does that mean the global mean surface temperature trends over the 20th Century, or just that some 20th Century data is used? And what does ‘precisely’ mean in this context? The spread of 20th Century trends (1900-1999) in the CMIP5 simulations [0.25, 1.17]°C is clearly too broad to be the result of precisely tuning anything! On a similar issue, the article contains an example of the MPI-Hamburg model being tuned to avoid a 7°C sensitivity. That is probably justified since there is plenty of evidence to rule out such a high value, but tuning to a specific value (albeit within the nominal range of 2 to 4.5°C) is not justified. My experience is that most groups do not ‘precisely’ tune their models to 20th Century trends or climate sensitivity, but given this example and the Hourdin results, more clarity on exactly what is done (whether explicitly or implicitly) is needed.

One odd comment relates to the UK Met Office/Hadley Centre models:

Proprietary concerns also get in the way. For example, the United Kingdom’s Met Office sells weather forecasts driven by its climate model. Disclosing too much about its code could encourage copycats and jeopardize its business.

It would be worrying if the centers didn’t discuss tuning in the science literature through fear of commercial rivals, and I don’t think this really characterises the Hadley Centre position. Some groups’ codes (incl. the Hadley Centre’s) are, however, restricted for various reasons, though I personally see that as an unsustainable position in the long term if groups want to partake in international model intercomparisons that will be used for public policy.

The article ends on an interesting note:

Daniel Williamson, a statistician at the University of Exeter in the United Kingdom, says that centers should submit multiple versions of their models for comparison, each representing a different tuning strategy. The current method obscures uncertainty and inhibits improvement, he says. “Once people start being open, we can do it better.”

I think this is exactly right. We should be using alternate tunings to expand the representation of structural uncertainty in the ensemble, and I hope many of the groups will take this opportunity to do so.
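As a toy illustration of why alternate tunings add information, the sketch below uses a zero-dimensional energy balance model in which several parameter choices all satisfy the same tuning target (a closed radiation budget at an assumed observed temperature) but still respond differently to a unit forcing. Everything here is an assumption made for the example: the model, the function names, the albedo values and the no-feedback response formula are not taken from any modeling center’s procedure.

```python
# Toy example: equally well-tuned variants that differ in their forced response.
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
S0 = 1361.0             # solar constant, W m^-2
T_OBS = 288.0           # assumed tuning target for surface temperature, K


def balanced_emissivity(albedo, temperature=T_OBS):
    """Effective emissivity that closes the radiation budget for this albedo."""
    absorbed = S0 / 4.0 * (1.0 - albedo)
    return absorbed / (SIGMA * temperature ** 4)


def warming_per_forcing(albedo, temperature=T_OBS):
    """No-feedback warming per unit forcing, K per (W m^-2),
    from linearising emitted longwave: 1 / (4 * eps * sigma * T^3)."""
    eps = balanced_emissivity(albedo, temperature)
    return 1.0 / (4.0 * eps * SIGMA * temperature ** 3)


def alternate_tunings(albedos=(0.28, 0.30, 0.32)):
    """Build one model variant per assumed albedo; each hits the same target."""
    return [
        {
            "albedo": a,
            "emissivity": balanced_emissivity(a),
            "response": warming_per_forcing(a),
        }
        for a in albedos
    ]


if __name__ == "__main__":
    for variant in alternate_tunings():
        print(variant)
```

Each variant would pass the same radiation-balance check, yet the spread in the `response` values is exactly the kind of structural uncertainty that a single submitted tuning hides.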

References