For quite a while, I’ve urged people interested in gridded temperatures to really look at the SST data: real data, not adjusted data. SST makes up 2/3 of the record, but temperature critics spend 99.99% of their time on land data. In part, that’s because the SST data sets are much larger, but the increased power of ordinary laptops is making them accessible without workstations, which was not the case even a few years ago. I’ve taken a first look at ICOADS and, since we’re talking climate science, naturally the data has some peculiar features. The ICOADS collators deserve great credit for their care with metadata; they obviously feel a responsibility towards the data that wasn’t felt by CRU (who notoriously kept only the “value added” version). The data sets are large and rich and deserve a great deal of statistical analysis, starting with basic cross-classifications rather than rushing off to make bucket adjustments. (Unfortunately I’m going to put this file away, but I commend it to others.)

While I was at Erice, I chatted with William Kininmonth of Australia quite a bit. I’m mildly interested in examining what the most unchanging tropical gridcell looks like, by looking closely at different data sets for such a gridcell. I was thinking of something in the tropical Pacific. Will suggested the gridcell containing Honiara (near Bougainville) at about 9S, 160E, which is almost at the border of two gridcells.

The range for Honiara in GISS dset1 (monthly) is from 25.1 to 27.5 deg C (GISS has data only for 1951-1986, for some inexplicable reason). Now that I could extract ICOADS SST into R, I thought that it would be interesting to see what the real SST data looked like. It’s climate science, so naturally a surprise awaited.

I collated ICOADS data from ftp://ftp.ncdc.noaa.gov/pub/data/icoads/ into R objects for 1900 to 1980. The script was very short, but it probably took over 8 hours to run. The data sets get very large as one approaches the present: the 1900 data set is 6 MB, the 1980 one is 160 MB, and the 2006 one was over 1 GB. The data sets have all sorts of information, including wind speeds and lots of coded metadata, and several versions of the data seem to be carried in the same record (I haven’t decoded all the fields yet). Each measurement has its own record, so you can see why the files are so large.

The most relevant fields for SST purposes seem to be the time of measurement (year, month, day, hour), lat, long, air temp and SST, plus metadata such as country and, as I found out, the “deck” from which the measurement originated. I extracted the data from the two gridcells adjacent to Honiara (157 and 162E; 7.5S), yielding 21,138 records with SST values from 1900-1980. (I didn’t bring it forward only because the later data sets get very large and I was experimenting.)
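The gridcell selection can be sketched roughly as follows. This is a Python illustration rather than the R script actually used, and it assumes 5-degree gridcells and records already decoded into dictionaries with `lat`, `lon` and `sst` keys; those field names and the three sample records are made up for the example.

```python
import math

CELL = 5.0  # assumed gridcell size in degrees


def cell_center(lat, lon, size=CELL):
    """Return the (lat, lon) centre of the size-degree cell containing a point."""
    return (
        math.floor(lat / size) * size + size / 2,
        math.floor(lon / size) * size + size / 2,
    )


def in_cells(record, centers):
    """True if the record falls inside one of the cells listed by centre."""
    return cell_center(record["lat"], record["lon"]) in centers


# Centres of the two cells on either side of 160E, just north of 10S.
HONIARA_CELLS = {(-7.5, 157.5), (-7.5, 162.5)}

# Synthetic records for illustration only (not ICOADS values).
records = [
    {"lat": -9.0, "lon": 159.9, "sst": 28.9},
    {"lat": -8.2, "lon": 161.3, "sst": 29.4},
    {"lat": 21.0, "lon": -157.8, "sst": 25.1},  # near Hawaii, outside both cells
]

selected = [r for r in records if in_cells(r, HONIARA_CELLS)]
print(len(selected), "records in the Honiara cells")
```

A point at 159.9E falls in the western cell and one at 160.0E in the eastern cell, which is why both cells are kept for a station sitting almost on the boundary.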

As a plain vanilla exercise, I plotted the average SST by month for these gridcells, which should show minimal variation. Instead, I got the following strange series.



Figure 1. Average monthly SST from ICOADS for two Honiara gridcells
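The monthly averaging behind a plot like Figure 1 is just a group-by on (year, month). Here is a minimal Python sketch, assuming records with `year`, `month` and `sst` keys (hypothetical field names) and using synthetic values; it also shows how a single sub-zero reading drags a tropical monthly mean far out of its normal range.

```python
from collections import defaultdict


def monthly_means(records):
    """Average SST for each (year, month) pair, in chronological order."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for r in records:
        key = (r["year"], r["month"])
        sums[key][0] += r["sst"]
        sums[key][1] += 1
    return {k: s / n for k, (s, n) in sorted(sums.items())}


# Synthetic records for illustration only (not ICOADS values).
records = [
    {"year": 1973, "month": 1, "sst": 29.0},
    {"year": 1973, "month": 1, "sst": 29.5},
    {"year": 1973, "month": 1, "sst": -2.0},  # one bad reading
    {"year": 1973, "month": 2, "sst": 29.0},
    {"year": 1973, "month": 2, "sst": 30.0},
]

means = monthly_means(records)
for (year, month), value in means.items():
    print(year, month, round(value, 2))
```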

For comparison, here is the GISS dset1 version for the Honiara station plotted on the same y-scale. Why were these so different?



Figure 2. Honiara temperature (GISS dset1)

On re-examining the data, I found that it included records in which both SST and air temperature were negative. This is easy to show after the fact, but was not so easy to see on the first pass. I experimented with permutations of typos in the SST column before it became clear to me that both air temperature and SST were negative. Here is the data for January 1973, during a problem period. Obviously something is wrong if Honiara has below-zero SST and air temperature.



Figure 3. Honiara gridcells: SST versus air temperature
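A crude screen for this kind of problem can be sketched in a few lines of Python. It assumes records with `sst` and `air` keys in deg C (hypothetical field names) and simply treats any sub-zero value in a tropical cell as unphysical; the zero threshold is my assumption for illustration, not a documented quality-control rule, and the sample values are synthetic.

```python
def is_unphysical(record, floor_c=0.0):
    """Crude tropical screen: flag any SST or air temperature below floor_c."""
    return record["sst"] < floor_c or record["air"] < floor_c


# Synthetic records for illustration only (not ICOADS values).
records = [
    {"sst": 29.1, "air": 28.4},
    {"sst": -1.2, "air": -3.5},  # both negative, as in the problem records
    {"sst": 28.7, "air": 27.9},
    {"sst": -0.4, "air": -6.0},
]

flagged = [r for r in records if is_unphysical(r)]
both_negative = [r for r in flagged if r["sst"] < 0 and r["air"] < 0]

print(len(flagged), "flagged;", len(both_negative), "with both values negative")
```

Checking whether SST and air temperature are negative together is what separates a typo in one column from records that belong to a different place entirely.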

Parsing the January 1973 data, it turned out that all the negative data came from “deck” 732. Here’s a barplot of the January 1973 average of SST measurements by “deck”.



Figure 4. Honiara gridcells January SST by deck.
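The cross-classification by deck is again a simple group-by. Below is a minimal Python sketch, assuming records with `deck` and `sst` keys; deck 732 is the one identified above, while decks "A" and "B" and all the values are placeholders invented for the example.

```python
from collections import defaultdict


def mean_by(records, key, value="sst"):
    """Mean of `value` for each distinct value of `key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r[value])
    return {k: sum(v) / len(v) for k, v in groups.items()}


# Synthetic records for illustration only (not ICOADS values).
records = [
    {"deck": "732", "sst": -1.5},
    {"deck": "732", "sst": -2.5},
    {"deck": "A", "sst": 29.0},
    {"deck": "A", "sst": 30.0},
    {"deck": "B", "sst": 28.0},
]

deck_means = mean_by(records, "deck")

# Decks whose mean is below zero are suspect in a tropical cell.
suspect = sorted(d for d, m in deck_means.items() if m < 0)

# Spread between the remaining decks once the suspect ones are set aside.
plausible = {d: m for d, m in deck_means.items() if d not in suspect}
spread = max(plausible.values()) - min(plausible.values())

print("suspect decks:", suspect, "spread among the rest:", spread)
```

The same `mean_by` call with `key="country"` gives the comparison by country of origin.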

I tracked down information on deck identifications at ICOADS here, which showed that deck 732 derived from:

Russian Marine Met. Data Set (MORMET) (received at NCAR)

Which perhaps explains the negative values, though it’s not the most obvious source of data for Honiara gridcells.

The problem seems to arise in the lat-longs of the Russian data as transcribed by NCAR. I presume that this sort of thing gets screened out as unphysical when the gridded versions are made, but it seems like the sort of thing that people in the trade might have picked up by now, given that I noticed it in my first couple of hours of handling the data. It is very much to the credit of the ICOADS compilers that they’ve preserved metadata with such diligence, making this sort of check possible. Collectors of marine data seem to feel an obligation towards their data that was obviously not felt by CRU when they discarded their underlying station data (keeping only the “value-added” version).

I don’t know how the SST grids check for unphysical values, but one would presume that this error arose not in the original data but in its handling at NCAR, and that it is correctable.

More problematic is that even if one scraps the obviously unphysical Russian data, the differences between values by “deck” remain very large as shown below – ranging from 28.3 deg C to 30.6 deg C in this very stable gridcell.



The largest value came from “International Marine (US- or foreign-keyed ship data)”, the lowest value from “Autodin (US Dept. of Defense Automated Digital Network)”.

I did a similar experiment with data extracted from gridcells near Hawaii (where I compared CRU and ERSST data on another occasion). Here is a similar example for gridcell 22.5N, 157.5W, where there is a 3.2 deg C difference in the average between the Netherlands and Germany/USSR. There is a similar pattern in gridcell 17.5N, 157W, where the Netherlands was again the lowest.

It seems odd that the scientists in this field are able to make bucket adjustments to tenths of a degree in the 19th century when differences as large as this arise in the 1970s. Since the Netherlands accounts for a larger proportion of early data than later data, it would be interesting to check whether this is a potential bias.
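One way to frame that check is as a composition effect: if one source reads systematically lower and its share of the observations falls over time, the pooled average rises even when nothing physical has changed. Here is a rough Python sketch, assuming records with `year` and `country` keys; the country labels, the record list, the shares and the two source means are all hypothetical, with only the 3.2 deg C gap taken from the Hawaii example above.

```python
from collections import Counter


def source_share(records, source):
    """Fraction of records in each decade that come from `source`."""
    total, hits = Counter(), Counter()
    for r in records:
        decade = r["year"] // 10 * 10
        total[decade] += 1
        if r["country"] == source:
            hits[decade] += 1
    return {d: hits[d] / total[d] for d in sorted(total)}


def pooled_mean(share, source_mean, other_mean):
    """Average of a mix where `share` of the records come from the source."""
    return share * source_mean + (1 - share) * other_mean


# Synthetic records for illustration only (labels are placeholders).
records = [
    {"year": 1902, "country": "DE"}, {"year": 1905, "country": "NL"},
    {"year": 1908, "country": "NL"}, {"year": 1909, "country": "US"},
    {"year": 1971, "country": "US"}, {"year": 1972, "country": "US"},
    {"year": 1975, "country": "NL"}, {"year": 1979, "country": "DE"},
]

shares = source_share(records, "NL")

# Hypothetical means that differ by 3.2 deg C and never change over time.
early = pooled_mean(shares[1900], 24.0, 27.2)
late = pooled_mean(shares[1970], 24.0, 27.2)

print(shares, "apparent change:", round(late - early, 2))
```

With these made-up numbers the pooled mean rises by the gap times the change in share (3.2 x 0.25 = 0.8 deg C), which is the size of effect one would want to rule out before trusting adjustments of tenths of a degree.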

Please do not interpret these as anything more than notes. The ICOADS data set is huge; the compilers have taken care to preserve enough metadata to make possible some interesting statistical analyses. I don’t think that I’ll follow up much further right now as these sorts of data sets, no matter how interesting, quickly become black holes for time. However, I urge some of the people who worry about UHI to start looking at SST data. I’ve posted up some tools to facilitate access to ICOADS.



