Something odd in the CRUTEM3 station errors Out of the blue I got a



Ilya went on to say that he was having a hard time reproducing the Met Office's numbers. I encouraged him to write a blog post with an example. He did that (and it looks like he had to create a



The trouble was Ilya's numbers didn't match the Met Office's. And his numbers weren't off by a constant factor or constant difference. They followed a similar pattern to the Met Office's, but they were not correct. At first I assumed Ilya was wrong and so I checked and double checked has calculations. His calculations looked right; the Met Office numbers looked wrong.



Then I wrote out the mathematics from the Brohan et al. paper and looked for where the error could be. And I found the source. I quickly emailed Ilya and boarded the plane to dream of CRUTEM and HadCRUT as I tried to sleep upright.



Mathematical Interlude



The station error consists of three components: the measurement error, the homogenisation error and the normal error. The first two are estimated in the paper as 0.03°C and 0.4°C respectively. The normal error is calculated from the standard deviation information in the station files.



The formula for the normal error for a single month, i, is as follows:







Unfortunately, the paper uses rather sloppy mathematical language because the N on the left is not the N on the right, the subscript i isn't defined, and so I am going to express this a bit more clearly as follows:







This means that normal error for month i is the standard deviation for month i (that's σ i ) divided by the square root of the number of years used to generate the normal values in the station files (which I call m i ). Typically we have:







because 30 years of data from 1961 to 1990 are used. In cases where less than 30 years are available (because of missing data) then a number less than 30 is used.



Now to get the station error, ε i , the three error components are joined together by







That works for any grid square where for any month there's just a single station reporting a temperature, but in general there are more. So when there are many station errors they are averaged using a



Suppose there are n stations each with a station error ε i,j (to which I've added the subscript j to differentiate them) then the final station error for a month i is as follows:









Return to narrative



What Ilya had discovered was that the formula above (from the paper) works only when there is a single station in a grid square. When there were two or more it failed; that's when he approached me asking for help.



What I discovered at the airport was that if you replaced the number 30 with 15 the formula worked and the values for station errors for grid squares containing exactly two stations were now correct.



Both Ilya and I came to the same conclusion that in fact the number 15 wasn't picked from thin air, but in fact was 30 divided by 2 (the number of stations in the grid square). We both tested this hypothesis on squares with more than two stations and found that it worked.



So it appears that the normal error used as part of the calculation of the station error is being scaled by the number of stations in the grid square. This leads to an odd situation that Ilya noted: the more stations in a square the worse the error range. That's counterintuitive, you'd expect the more observations the better estimate you'd have.



Examples



Ilya had shown me an example in 1947, but I didn't want to take his word for it (although he later showed me a program to check all the stations errors so I should have believed him), and so I took a look at three locations in January 1850. For these three locations all the data underlying CRUTEM3 had been released:



1. The grid square which consists of the single station 723660: this corresponds to the grid square with corner 35N, 105W. Here the Met Office data gives station errors of: 0.5072 0.5424 0.4857 0.4962 -1e+30 -1e+30 0.4407 0.4407 -1e+30 0.4756 -1e+30 0.5186 . The strange negative numbers are missing data (it's missing because in the underlying file there are no normals for 1850 in those months, although the actual normals aren't needed for the station error calculation so it doesn't matter). Using the formula from the paper give the correct answer: 0.5072 0.5424 0.4857 0.4962 0.4486 0.4756 0.4407 0.4407 0.4661 0.4756 0.5072 0.5186 . This makes sense since our correction value of 1 for 1 station in the square doesn't change the formula.



There is, however, something else wrong with this. The paper says that if less than 30 years of data are available the number m i should be set to the number of years. In 723660 there are only 17 years of data, so this station error appears to have been incorrectly calculated based on 30 years.



2. The grid square which consists of the two stations 753041, 756439: this corresponds to the grid square with corner 35N, 80W. Here the Met Office data gives station errors of: 0.6168 0.569 0.5452 0.4008 0.4345 0.3642 0.3373 0.353 0.3881 0.4624 0.4076 0.5767 and using the formula from the paper (without our correction): 0.4801 0.4496 0.4346 0.3472 0.3669 0.3264 0.3116 0.3202 0.3399 0.3836 0.3511 0.4545 . If a correction of 2 is used so that each σ i is divided by the square root of 15 instead of 30 the correct values are generated.



3. The grid square which consists of the four stations 720388, 724080, 756192, 756490: this corresponds to the grid square with corner 35N, 75W. Here the Met Office data gives station errors of: 0.5073 0.4409 0.4329 0.3361 0.3286 0.2905 0.2712 0.2807 0.2973 0.3739 0.3325 0.4613 and using the formula from the paper (without our correction): 0.3074 0.2807 0.2775 0.2417 0.2391 0.2264 0.2204 0.2233 0.2286 0.2552 0.2404 0.2887 . If a correction of 4 is used so that each σ i is divided by the square root of 7.5 instead of 30 the correct values are generated.



Conclusion



I have no idea why the correction given in this blog post by Ilya and I works: perhaps it indicates a genuine bug in the software used to generate CRUTEM3, perhaps it means Ilya and I have failed to understand something, or perhaps it indicates a missing explanation from Brohan et al. I also don't understand why when there are less than 30 years of data the number 30 appears to still be used.



If these are bugs then it indicates that CRUTEM3 will need to be reissued because the error ranges will be all wrong.



I've emailed the Met Office asking them to help. If you see an error in our working please let us know! Out of the blue I got a comment on my blog about CRUTEM3 station errors. The commenter wanted to know if I'd tried to verify them: I said I hadn't since not all the underlying data for CRUTEM3 had been released. The commenter (who I now know to be someone called Ilya Goz) correctly pointed out that although a subset had been released, for some years and some locations on the globe that subset was in fact the entire set of data and so the errors could be checked.Ilya went on to say that he was having a hard time reproducing the Met Office's numbers. I encouraged him to write a blog post with an example. He did that (and it looks like he had to create a blog to do it). Sitting in the departures lounge at SFO I read through his blog post and Brohan et al. . Ilya's reasoning seemed sound, his example was clear and I checked his underlying data against that given by the Met Office.The trouble was Ilya's numbers didn't match the Met Office's. And his numbers weren't off by a constant factor or constant difference. They followed a similar pattern to the Met Office's, but they were not correct. At first I assumed Ilya was wrong and so I checked and double checked has calculations. His calculations looked right; the Met Office numbers looked wrong.Then I wrote out the mathematics from the Brohan et al. paper and looked for where the error could be. And I found the source. I quickly emailed Ilya and boarded the plane to dream of CRUTEM and HadCRUT as I tried to sleep upright.The station error consists of three components: the measurement error, the homogenisation error and the normal error. The first two are estimated in the paper as 0.03°C and 0.4°C respectively. The normal error is calculated from the standard deviation information in the station files.The formula for the normal error for a single month,, is as follows:Unfortunately, the paper uses rather sloppy mathematical language because theon the left is not theon the right, the subscriptisn't defined, and so I am going to express this a bit more clearly as follows:This means that normal error for monthis the standard deviation for month(that's) divided by the square root of the number of years used to generate the normal values in the station files (which I call). Typically we have:because 30 years of data from 1961 to 1990 are used. In cases where less than 30 years are available (because of missing data) then a number less than 30 is used.Now to get the station error,, the three error components are joined together by quadrature as follows:That works for any grid square where for any month there's just a single station reporting a temperature, but in general there are more. So when there are many station errors they are averaged using a root mean square and then divided by the square root of the number of stations.Suppose there arestations each with a station error(to which I've added the subscriptto differentiate them) then the final station error for a monthis as follows:What Ilya had discovered was that the formula above (from the paper) works only when there is a single station in a grid square. When there were two or more it failed; that's when he approached me asking for help.What I discovered at the airport was that if you replaced the number 30 with 15 the formula worked and the values for station errors for grid squares containing exactly two stations were now correct.Both Ilya and I came to the same conclusion that in fact the number 15 wasn't picked from thin air, but in fact was 30 divided by 2 (the number of stations in the grid square). We both tested this hypothesis on squares with more than two stations and found that it worked.So it appears that the normal error used as part of the calculation of the station error is being scaled by the number of stations in the grid square. This leads to an odd situation that Ilya noted: the more stations in a square the worse the error range. That's counterintuitive, you'd expect the more observations the better estimate you'd have.Ilya had shown me an example in 1947, but I didn't want to take his word for it (although he later showed me a program to check all the stations errors so I should have believed him), and so I took a look at three locations in January 1850. For these three locations all the data underlying CRUTEM3 had been released:1. The grid square which consists of the single station: this corresponds to the grid square with corner 35N, 105W. Here the Met Office data gives station errors of:. The strange negative numbers are missing data (it's missing because in the underlying file there are no normals for 1850 in those months, although the actual normals aren't needed for the station error calculation so it doesn't matter). Using the formula from the paper give the correct answer:. This makes sense since our correction value of 1 for 1 station in the square doesn't change the formula.There is, however, something else wrong with this. The paper says that if less than 30 years of data are available the numbershould be set to the number of years. In 723660 there are only 17 years of data, so this station error appears to have been incorrectly calculated based on 30 years.2. The grid square which consists of the two stations: this corresponds to the grid square with corner 35N, 80W. Here the Met Office data gives station errors of:and using the formula from the paper (without our correction):. If a correction of 2 is used so that eachis divided by the square root of 15 instead of 30 the correct values are generated.3. The grid square which consists of the four stations: this corresponds to the grid square with corner 35N, 75W. Here the Met Office data gives station errors of:and using the formula from the paper (without our correction):. If a correction of 4 is used so that eachis divided by the square root of 7.5 instead of 30 the correct values are generated.I have no idea why the correction given in this blog post by Ilya and I works: perhaps it indicates a genuine bug in the software used to generate CRUTEM3, perhaps it means Ilya and I have failed to understand something, or perhaps it indicates a missing explanation from Brohan et al. I also don't understand why when there are less than 30 years of data the number 30 appears to still be used.If these are bugs then it indicates that CRUTEM3 will need to be reissued because the error ranges will be all wrong.I've emailed the Met Office asking them to help. If you see an error in our working please let us know! Labels: climate change Available Now The Geek Atlas

With this unique traveler's guide, you'll learn about 128 destinations around the world where discoveries in science, mathematics, or technology occurred or is happening now. Travel to Munich to see the world's largest science museum, watch Foucault's pendulum swinging in Paris, ponder a descendant of Newton's apple tree at Trinity College, Cambridge, and more. Each site in The Geek Atlas focuses on discoveries or inventions, and includes information about the people and the science behind them. With this unique traveler's guide, you'll learn about 128 destinations around the world where discoveries in science, mathematics, or technology occurred or is happening now. Travel to Munich to see the world's largest science museum, watch Foucault's pendulum swinging in Paris, ponder a descendant of Newton's apple tree at Trinity College, Cambridge, and more. Each site in The Geek Atlas focuses on discoveries or inventions, and includes information about the people and the science behind them. GNU Make Unleashed

230 pages of GNU Make from basics to advanced. Covering topics not covered in other GNU Make books such as: eliminating recursive make, doing arithmetic, Makefile debugging techniques and more.



Everything you wanted to know about making real Makefiles. Search Enter your search terms Web www.jgc.org Submit search form Previous Posts A compliment from The Times

New version of CRUTEM3 and HADCRUT3

John's Amazing Diet Secrets Revealed!

£1,000 for Bletchley Park thanks to The Geek Atlas...

The squawking Squaw King was stabbed in a stab bed...

Price drop on GNU Make Unleashed

A not very illuminating reply from the Met Office

Update list of my GNU Make articles

Stay classy, SoftwareFX, stay classy

Met Office has confirmed that it's ok to use the e... 230 pages of GNU Make from basics to advanced. Covering topics not covered in other GNU Make books such as: eliminating recursive make, doing arithmetic, Makefile debugging techniques and more.Everything you wanted to know about makingMakefiles. Subscribe to

Posts [Atom]