Around Dec 8, 2009, the UK Met Office released “value added” data for a “subset” of 1741 stations – see here, describing the release as follows:

The data downloadable from this page are a subset of the full HadCRUT3 record of global temperatures, which is one of the global temperature records that have underpinned IPCC assessment reports and numerous scientific studies. The data subset consists of a network of individual land stations that has been designated by the World Meteorological Organization for use in climate monitoring. The data show monthly average temperature values for over 1,500 land stations

In question 7 of the webpage, they asked and answered rhetorically:

7. Why are you releasing a subset of the data now? We can only release data from NMSs when we have permission from them to do so. In the meantime we are releasing data from a network of stations designated by the World Meteorological Organisation for climate monitoring together with any additional data for which we have permission to release.

Today, I’m going to do a quick analysis of the Hadley subset, which has some interesting attributes.

A “Subset”?

First, I checked whether the Hadley Subset is actually a subset of the CRU station list archived in response to Willis Eschenbach’s FOI request.

The Hadley subset is available in the following zipfile http://www.metoffice.gov.uk/climatechange/science/monitoring/reference/All.zip . I’ve downloaded this zipfile and collated the station information into an organized list (which I’ll post up when I learn how to upload data to the reconfigured CA).

The CRU stationlist was formerly available at http://www.cru.uea.ac.uk/cru/data/landstations/crustnsused.txt (presently unavailable). I’d taken the precaution of saving a copy (which I’ll make available soon.)

Of 1741 Hadley Subset stations, 1521 had identification numbers matching CRU identification numbers. However, 220 Hadley Subset identifications did not match the CRU station list.
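
The matching step described above amounts to a set-membership test on station identifiers. Here is a minimal sketch in Python, assuming both lists have already been reduced to identifier strings. The function name and the sample identifiers are made up for illustration and are not taken from either file.

```python
def split_by_membership(subset_ids, reference_ids):
    """Partition subset_ids into those present in reference_ids and those absent."""
    reference = set(reference_ids)
    matched = [i for i in subset_ids if i in reference]
    unmatched = [i for i in subset_ids if i not in reference]
    return matched, unmatched


# Hypothetical identifiers, kept as strings so leading zeros are not lost.
hadley_ids = ["100001", "100002", "999999"]
cru_ids = ["100001", "100002", "100003"]

matched, unmatched = split_by_membership(hadley_ids, cru_ids)
print(len(matched), "matched;", len(unmatched), "unmatched")
```

Run against the real lists, the two counts would be the 1521 and 220 reported above, provided the identifiers are formatted the same way in both files.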

The metadata for the Hadley Subset states that the “source” of the data is either “Jones” or “Jones+Anders” (Anders Moberg being the coauthor of Jones and Moberg 2003, the “peer-reviewed” publication of CRUTEM2 in the litchurchur.)

What is one to make of this inconsistency? My surmise is that the CRU stationlist provided in response to the FOI request must have been inaccurate in respect to the omitted stations, though it’s also possible that there were some changes between 2007 and 2009. I strongly doubt that the Met Office obtained and collated station data from third party sources not via CRU.



Any Additional Data for which we have Permission to Release

The Met Office said that the Subset was a “network of stations designated by the World Meteorological Organisation for climate monitoring together with any additional data for which we have permission to release”.

CRU said that they would seek permission from national meteorological services (NMSs) to release station data. The Met Office didn’t report on the progress of this supposed program. However, if the Subset includes “any additional data for which we have permission to release”, as the Met Office has represented, then this should be evident in the station lists as follows: all stations in the CRU station list for an agreeing country would necessarily be in the Subset station list.

I compared the two lists by country. (This comparison is hampered by the appalling sloppiness of the country designations in both CRU and Met Office station lists. I know that climate scientists like to make fun of taking care of such details, but it doesn’t take that long to allocate 1741 stations to consistently spelled countries and it’s the sort of thing that’s helpful if you’re administering a set of confidentiality agreements. The 2-digit codes are related to countries, but don’t yield a precise matching.) In any event, I spent an hour and a half or so making a consistent allocation to countries and the Met Office is welcome to the results.

Denmark has been in the news lately. So let’s examine whether all the Danish CRU stations are in the Hadley Subset. Alborg and Koebenhavn are in the Hadley Subset, while CRU stations Vestervig, Tarm, Bogo, Hammerodde Fyr, Nordby and Tranebjerg aren’t. If we can rely on the Met Office statement that they included “any additional data for which we have permission to release”, this means that Denmark has not yet provided permission for release of 6 stations. (It’s a little odd to think that they would have consented to the release of 2 stations and not the other 6, but hey…)

Similar puzzles abound in the Hadley Subset. Other countries in a similar situation to Denmark – with some stations released and others not – include Norway, Sweden, Finland, Ireland, Iceland, Greenland, Netherlands, Switzerland, Spain, Portugal, Germany, Canada, USA, Australia, Russia, India, China … Actually most countries in the world.

The list of countries with complete CRU inventories in the Met Office Subset is much shorter: UK, France, New Zealand and some small countries like Latvia and Lithuania. The most surprising aspect of the list is that it is dominated by Third World and especially African countries: Laos, Vietnam, Ecuador, Tunisia, Guinea, Senegal, Niger, Mali, Ethiopia, Somalia, Eritrea, Kenya, Uganda, Tanzania, Chad, Liberia, Cote d’Ivoire, Ghana, Benin, Zimbabwe, Zambia, Madagascar, Namibia.

It seems surprising that these countries would have answered the bell for permission to release data so much faster than European countries.
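
The country-by-country test can be sketched as a count of how many of each country's CRU stations also appear in the Subset. This is only an illustration: the function name is hypothetical, "Exampleland" and its stations are invented, and stations are keyed by name here for readability, whereas a real comparison would presumably key on the station identifiers. The Danish names are the ones listed above.

```python
from collections import defaultdict


def coverage_by_country(cru_stations, subset_stations):
    """Return {country: (number of CRU stations in the subset, total CRU stations)}.

    Both arguments are iterables of (country, station) pairs.
    """
    subset = set(subset_stations)
    counts = defaultdict(lambda: [0, 0])
    for country, station in cru_stations:
        counts[country][1] += 1
        if (country, station) in subset:
            counts[country][0] += 1
    return {country: tuple(pair) for country, pair in counts.items()}


danish = ["Alborg", "Koebenhavn", "Vestervig", "Tarm", "Bogo",
          "Hammerodde Fyr", "Nordby", "Tranebjerg"]
cru = [("Denmark", name) for name in danish]
cru += [("Exampleland", "Station A"), ("Exampleland", "Station B")]  # invented

subset = [("Denmark", "Alborg"), ("Denmark", "Koebenhavn"),
          ("Exampleland", "Station A"), ("Exampleland", "Station B")]

coverage = coverage_by_country(cru, subset)
complete = sorted(c for c, (k, n) in coverage.items() if k == n)
partial = sorted(c for c, (k, n) in coverage.items() if 0 < k < n)
print("complete:", complete)
print("partial:", partial)
```

A country whose permission had really been obtained should land in the "complete" group; Denmark, with 2 of 8, lands in "partial".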

Reviewing the Met Office statement on the release:

In the meantime we are releasing data from a network of stations designated by the World Meteorological Organisation for climate monitoring together with any additional data for which we have permission to release.

My interpretation is that the “additional data for which we have permission” has made a negligible contribution to the Hadley Subset – indeed, it looks possible that the only “additional data” in the Hadley Subset comes from Britain itself (plus possibly France, New Zealand.)

The Designated Network

This raises an interesting question about the “network of stations designated by the World Meteorological Organisation for climate monitoring” and the permissions attached to this network, as it doesn’t appear that new permissions have a material impact on the release.

The Met Office did not provide a reference to a document describing the “network”, linking only to the WMO. Googling the phrase “network of stations designated by the World Meteorological Organisation for climate monitoring” leads to the GCOS network – a list of 1025 stations is here – and perhaps this is what the Hadley Centre has in mind.

There is considerable overlap between the Hadley Subset and the GCOS network but neither is a subset of the other. In a first cut matching effort, I compared the 5-digit identification of the GCOS network to the first 5 digits of the Hadley identifier and matched 760 stations, leaving 981 unmatched. There didn’t appear to be any particular country pattern to the residuals.

Conversely, from the 1025 GCOS stations, there were the 760 matches leaving 265 unmatched.
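
The first-cut match can be sketched as a comparison of the leading five characters of each longer identifier against the set of 5-digit identifiers. The function name and all identifiers below are made up for illustration, and I'm assuming both lists are held as zero-padded strings.

```python
def match_on_prefix(long_ids, short_ids, width=5):
    """Match long identifiers to short ones on their first `width` characters.

    Returns (matched long ids, unmatched long ids, unmatched short ids).
    """
    short = set(short_ids)
    matched = [i for i in long_ids if i[:width] in short]
    unmatched_long = [i for i in long_ids if i[:width] not in short]
    matched_prefixes = {i[:width] for i in matched}
    unmatched_short = sorted(short - matched_prefixes)
    return matched, unmatched_long, unmatched_short


# Hypothetical identifiers: 6-digit on one side, 5-digit on the other.
hadley_ids = ["111110", "111120", "222220", "333330"]
gcos_ids = ["11111", "22222", "44444"]

matched, hadley_only, gcos_only = match_on_prefix(hadley_ids, gcos_ids)
print(len(matched), len(hadley_only), len(gcos_only))
```

One caveat with this kind of first cut: if two longer identifiers share the same five leading digits, both will count as matches against a single 5-digit station, so the two sides' match counts need not agree.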

This leaves a number of puzzle-type questions about the Hadley Subset:

– how were these selected?

– why is the Jones version of GCOS stations not considered confidential?

– why is the Jones version of non-GCOS stations in the Hadley Subset not considered confidential?

– why was this not made available last summer?

I’m not suggesting that climate science stands or falls on these questions. For now, they are merely interesting little puzzles.

Update (10 pm Dec 27). A reader observed in the comments that the Regional Basic Climatological Network (RBCN) dataset (see here; the data list is at ftp://ftp.wmo.int/wmo-ddbs/RBCN_DEC2009.xls ) has a much closer genetic relationship to the Hadley Subset than GCOS does. I’ve confirmed the closer relationship, though not all problems are resolved.

The CRU data set (FOI version) contained 4138 stations, of which 1521 were matched in the Hadley Subset and 2617 were unmatched.

Of the 1521 Hadley Subset matches, the first five digits matched RBCN WMO numbers for 1477, with 44 unmatched. The 44 unmatched were from only a few jurisdictions: UK, France, New Zealand (three jurisdictions mentioned in my notes above) plus one from Liberia and one from Bahamas.

Conversely, of the 2617 CRU FOI stations excluded from the Hadley Subset, 145 were matched in the RBCN data set and 2472 were unmatched. The 145 stations matched in the RBCN data set but not included in the Hadley Subset were mostly from USA, Canada, Mexico and South Africa, with a few stragglers from Mongolia, Panama, Dominica, Israel and one from Russia (or as CRU refers to it – USSR). [Update Dec 28 – a reader observed below that these 145 stations all had a non-zero sixth digit. If an extended 6-digit RBCN identification is defined using the SUBINDEX field, there is no match at the 6-digit level for 144 of the 145 stations. Using 6-digit identifications, of the 2617 FOI exclusions, there is only one common identification in the RBCN data set – 718011: St John’s West CDA in RBCN and St Johns UA in CRU.]

Let’s hypothesize for now that (1) the Hadley Subset consists of CRU stations in the RBCN network; (2) additional permissions to date are restricted to UK, France and New Zealand. Then the failure to include the 145 excluded matches might be sloppiness; or perhaps a different RBCN version; or perhaps we haven’t quite got the right network. [Update – as noted above, a reader has clarified that the 6-digit identification resolves the 145 stations.]
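
The 6-digit resolution can be sketched as follows, on the assumption that the extended identifier is simply the 5-digit WMO number followed by a one-digit SUBINDEX value. The function name is hypothetical, the pairing of 71801 with SUBINDEX 1 is my inference from the 718011 identifier quoted above, and the second station is invented.

```python
def extended_id(wmo_number, subindex):
    """Build a 6-digit identifier string: 5-digit WMO number plus 1-digit sub-index."""
    wmo = int(wmo_number)
    sub = int(subindex)
    if not 0 <= wmo <= 99999 or not 0 <= sub <= 9:
        raise ValueError("expected a 5-digit WMO number and a 1-digit sub-index")
    return f"{wmo:05d}{sub:d}"


# (WMO number, SUBINDEX) rows; the second row is made up.
rbcn_rows = [(71801, 1), (12345, 0)]
rbcn_6digit = {extended_id(wmo, sub) for wmo, sub in rbcn_rows}
rbcn_5digit = {code[:5] for code in rbcn_6digit}

# Hypothetical excluded CRU identifiers with a non-zero sixth digit.
cru_excluded = ["718011", "123451"]

five_digit_matches = [i for i in cru_excluded if i[:5] in rbcn_5digit]
six_digit_matches = [i for i in cru_excluded if i in rbcn_6digit]
print(len(five_digit_matches), len(six_digit_matches))
```

In this toy case both stations match on five digits but only one survives at six digits, which is the same effect that takes the 145 five-digit matches down to a single common identification.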

As noted in a prior post, the webpage stated:

The stations that we have released are those in the CRUTEM3 database that are also either in the WMO Regional Basic Climatological Network (RBCN) and so freely available without restrictions on re-use; or those for which we have received permission from the national met. service which owns the underlying station data.

I remain puzzled as to how the RBCN network would supersede the supposed confidentiality agreements.



