It is a truism that all models are wrong. Just as no map can capture the real landscape and no portrait the true self, numerical models by necessity have to contain approximations to the complexity of the real world and so can never be perfect replications of reality. Similarly, any specific observations are only partial reflections of what is actually happening and have multiple sources of error. It is therefore to be expected that there will be discrepancies between models and observations. However, why these arise and what one should conclude from them are interesting and more subtle than most people realise. Indeed, such discrepancies are the classic way we learn something new – and it often isn’t what people first thought of.



The first thing to note is that any climate model-observation mismatch can have multiple (non-exclusive) causes which (simply put) are:

1. The observations are in error
2. The models are in error
3. The comparison is flawed

In climate science there have been multiple examples of each possibility and multiple ways in which each set of errors has arisen, and so we’ll take them in turn.

1. Observational Error

These errors can be straight-up mistakes in transcription, instrument failures, or data corruption, but these are generally easy to spot and so I won’t dwell on this class of error. More subtly, most of the “observations” that we compare climate models to are actually syntheses of large numbers of raw observations. These data products are not just a function of the raw observations, but also of the assumptions and the “model” (usually statistical) that go into building the synthesis. These assumptions can relate to space or time interpolation, corrections for non-climate related factors, or inversions of the raw data to get the relevant climate variable. Examples of these kinds of errors being responsible for a climate model/observation discrepancy range from the omission of orbital decay effects in producing the UAH MSU data sets to the problems of no-modern analogs in the CLIMAP reconstruction of ice age ocean temperatures.

In other fields, these kinds of issues arise from unacknowledged laboratory effects or instrument calibration errors. Examples abound; a recent one is the supposed ‘observation’ of ‘faster-than-light’ neutrinos.

2. Model Error

There are of course many model errors. These range from the inability to resolve sub-grid features of the topography and approximations made for computational efficiency, to the necessarily incomplete physical scope of the models and inevitable coding bugs. Sometimes model-observation discrepancies can be easily traced to such issues. However, more often, model output is a function of multiple aspects of a simulation, and so even if the model is undoubtedly biased (a good example is the persistent ‘double ITCZ’ bias in simulations of tropical rainfall) it can be hard to associate this with a specific conceptual or coding error. The most useful comparisons are then those that allow for the most direct assessment of the cause of any discrepancy. “Process-based” diagnostics – where comparisons are made for specific processes, rather than specific fields – are becoming very useful in this respect.

When a comparison is being made in a specific experiment though, there are a few additional considerations. Any particular simulation (and hence any diagnostic from it) arises from a collection of multiple assumptions – in the model physics itself, the forcings of the simulation (such as the history of aerosols in a 20th Century experiment), and the initial conditions used in the simulation. Each potential source of the mismatch needs to be independently examined.

3. Flawed Comparisons

Even with a near-perfect model and accurate observations, model-observation comparisons can show big discrepancies because the diagnostics being compared, while similar in both cases, actually end up being subtly (and perhaps importantly) different. This can be as simple as assuming an estimate of the global mean surface temperature anomaly is truly global when it in fact has large gaps in regions that are behaving anomalously. This can be dealt with by masking the model fields prior to averaging, but it isn’t always done. Other examples have involved assuming the MSU-TMT record can be compared to temperatures at a specific height in the model, instead of using the full weighting profile. Yet another might be comparing satellite retrievals of low clouds with the model averages, but forgetting that satellites can’t see low clouds if they are hiding behind upper level ones. In paleo-climate, simple transfer functions of proxies like isotopes can often be complicated by other influences on the proxy (e.g. Werner et al, 2000). It is therefore incumbent on the modellers to try and produce diagnostics that are commensurate with what the observations actually represent.
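To make the masking point concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything taken from a real data product: the tiny 3x4 grid, the anomaly values, the coverage mask, and the helper name area_weighted_mean are all hypothetical. It simply compares a cosine-latitude weighted mean over the full model grid with the same mean restricted to cells where observations exist.

```python
import math

def area_weighted_mean(field, lats, mask=None):
    """Cosine-latitude weighted mean of field[lat][lon].

    Cells where mask[lat][lon] is False are skipped, mimicking
    gaps in observational coverage. (Hypothetical helper.)
    """
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        weight = math.cos(math.radians(lat))  # grid-cell area scales with cos(latitude)
        for j, value in enumerate(field[i]):
            if mask is not None and not mask[i][j]:
                continue
            total += weight * value
            weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("mask excludes every grid cell")
    return total / weight_sum

# Toy grid: 3 latitude bands x 4 longitudes, made-up anomalies in K.
lats = [-60.0, 0.0, 60.0]
model_anomaly = [
    [0.2, 0.2, 0.2, 0.2],
    [0.4, 0.4, 0.4, 0.4],
    [1.5, 1.5, 1.5, 1.5],  # assumed strong warming in the northern band
]
# Assumed observational coverage: the northern band is mostly unobserved.
obs_coverage = [
    [True, True, True, True],
    [True, True, True, True],
    [False, False, False, True],
]

full_mean = area_weighted_mean(model_anomaly, lats)
masked_mean = area_weighted_mean(model_anomaly, lats, obs_coverage)

print(f"model mean, full grid:       {full_mean:.3f} K")
print(f"model mean, obs-masked grid: {masked_mean:.3f} K")
```

In this toy case the masked mean is lower than the full-grid mean because the poorly observed band is the one warming fastest, so comparing the unmasked model mean against the gappy observational estimate would overstate the mismatch.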

Flaws in comparisons can be more conceptual as well – for instance comparing the ensemble mean of a set of model runs to the single realisation of the real world. Or comparing a single run with its own weather to a short term observation. These are not wrong so much as potentially misleading – since it is obvious why there is going to be a discrepancy, albeit one that doesn’t have many implications for our understanding.
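The ensemble mean point can be illustrated with a toy calculation. This is a sketch under stated assumptions, not a climate model: each hypothetical run is a linear trend plus independent Gaussian noise standing in for weather, and the trend, noise level, ensemble size, and function names are all made up for illustration. Real internal variability is autocorrelated, which this ignores.

```python
import random
import statistics

def simulate_member(n_years, trend, noise_sd, rng):
    """One hypothetical run: a linear forced trend plus independent 'weather' noise."""
    return [trend * year + rng.gauss(0.0, noise_sd) for year in range(n_years)]

def residual_sd(series, trend):
    """Spread of a series around the known forced trend."""
    return statistics.pstdev([x - trend * year for year, x in enumerate(series)])

rng = random.Random(0)       # fixed seed so the sketch is reproducible
N_YEARS, N_MEMBERS = 30, 20  # assumed record length and ensemble size
TREND, NOISE_SD = 0.02, 0.1  # assumed K/year trend and K of year-to-year noise

members = [simulate_member(N_YEARS, TREND, NOISE_SD, rng) for _ in range(N_MEMBERS)]

# Averaging across members at each year cancels most of the uncorrelated noise.
ensemble_mean = [
    sum(member[year] for member in members) / N_MEMBERS
    for year in range(N_YEARS)
]

single_sd = residual_sd(members[0], TREND)
ensemble_sd = residual_sd(ensemble_mean, TREND)

print(f"noise around trend, single member: {single_sd:.3f} K")
print(f"noise around trend, ensemble mean: {ensemble_sd:.3f} K")
```

The ensemble mean is much smoother than any one member, so a single realisation (including the real world) will wander around it even if the model is perfect. A fairer comparison is whether the observations fall within the spread of the individual members.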

Implications

The implications of any specific discrepancy therefore aren’t immediately obvious (for those who like their philosophy a little more academic, this is basically a rephrasing of the Quine/Duhem position on scientific underdetermination). Since any actual model prediction depends on a collection of hypotheses together, as do the ‘observation’ and the comparison, there are multiple chances for errors to creep in. It takes work to figure out where though.

The alternative ‘Popperian’ view – well encapsulated by Richard Feynman:

… we compare the result of the computation to nature, with experiment or experience, compare it directly with observation, to see if it works. If it disagrees with experiment it is wrong.

actually doesn’t work except in the purest of circumstances (and I’m not even sure I can think of a clean example). A recent obvious counter-example in physics was the fact that the ‘faster-than-light’ neutrino experiment has not falsified special relativity – despite Feynman’s dictum.

But does this exposition help in any current issues related to climate science? I think it does – mainly because it forces one to think about what the other ancillary hypotheses are. For three particular mismatches – sea ice loss rates being much too low in CMIP3, tropical MSU-TMT rising too fast in CMIP5, or the ensemble mean global mean temperatures diverging from HadCRUT4 – it is likely that there are multiple sources of these mismatches across all three categories described above. The sea ice loss rate seems to be very sensitive to model resolution and has improved in CMIP5 – implicating aspects of the model structure as the main source of the problem. MSU-TMT trends have a lot of structural uncertainty in the observations (note the differences in trends between the UAH and RSS products). And global mean temperature trends are quite sensitive to observational products, masking, forcings in the models, and initial condition sensitivity.

Working out what is responsible for what is, as they say, an “active research question”.

Update: From the comments:

“our earth is a globe

whose surface we probe

no map can replace her

but just try to trace her”

– Steve Waterman, The World of Maps

References