

use strict;
use warnings;



# Program to read data released by the UK Met Office on 08/12/2009 and
# produce a KML file showing the location of all of the temperature
# stations in the released data. This KML file can be imported into
# Google Earth or Google Maps for display.
#
# In reading the Met Office data the program also looks for missing
# information (such as missing fields in the files and missing data in
# the observations). It also checks for oddities such as
# non-numeric values in numeric fields.

# A directory called MetOffice08122009 is assumed to exist and to contain
# a collection of two-digit subdirectories which contain the actual
# observation files.



my $top = 'MetOffice08122009/';
my @subdirs = glob "$top*";



# Verify that all the subdirectories have two-digit names and that
# there's no extraneous data. In doing so, build up a list of all the
# observation files. These should also be in a specific form: each
# has a six-digit name where the first two digits correspond to the
# two digits in the directory name.
#
# If this code doesn't force an error, the @obs array will contain the
# names of all the observation files in the released data. Each entry
# will be the path to that file.



my @obs;

foreach my $s (@subdirs) {
    if ( $s !~ /^$top(\d\d)$/ ) {
        die "Unexpected subdirectory $s\n";
    }

    # Capture the two-digit portion of the directory name

    my $two = $1;

    my @o = glob "$top$two/*";
    foreach my $f (@o) {
        if ( $f !~ /^$s\/$two\d{4}$/ ) {
            die "Unexpected file $f\n";
        } else {
            push @obs, $f;
        }
    }
}



print "Found ", scalar @obs, " observation files\n";



# This hash contains the fields that are expected in the released data
# and maps each to a regular expression for recognizing the data. These
# expressions are used to look for anomalies (such as an alpha
# character in a numeric field).



my %fields = (
    'Number' => qr/^\d{6}$/,

    # The Name and Country fields contain all sorts of extra characters
    # that you might not expect, e.g. SPAIN is followed by a line of -
    # characters

    'Name'    => qr/^[A-Za-z \.\-\/\(\)\d\*',]+$/,
    'Country' => qr/^[A-Z \.\-\(\)']+$/,
    'Lat'     => qr/^-?\d{1,2}\.\d$/,
    'Long'    => qr/^-?\d{1,3}\.\d$/,
    'Height'  => qr/^-?\d+$/,

    'Start year'      => qr/^[12]\d{3}$/,
    'End year'        => qr/^[12]\d{3}$/,
    'First Good year' => qr/^[12]\d{3}$/,

    'Source ID'   => qr/^\d+$/,
    'Source file' => qr/^.+$/,

    'Jones data to' => qr/^[12]\d{3}$/,

    'Normals source'            => qr/^.+$/,
    'Normals source start year' => qr/^[12]\d{3}$/,
    'Normals source end year'   => qr/^[12]\d{3}$/,
    'Normals'                   => qr/^(\s*-?\d+\.\d\s*){12}$/,

    'Normals source percent availability' => qr/^\d+$/,
    'Normals source variable code'        => qr/^\d+$/,

    'Standard deviations source'            => qr/^.+$/,
    'Standard deviations source start year' => qr/^[12]\d{3}$/,
    'Standard deviations source end year'   => qr/^[12]\d{3}$/,
    'Standard deviations'                   => qr/^(\s*-?\d+\.\d\s*){12}$/
);



# Now read through each of the observation files and verify that they
# have all the fields that are required. Extract the latitude and
# longitude and use these to fill in hash elements named 'latitude'
# and 'longitude' in the %observations hash. This hash is keyed by
# the observation number.
#
# For example, an entry in this hash might look like
#
# $observations{988510}
# $observations{988510}{latitude}  = 6.1
# $observations{988510}{longitude} = -125.2
# $observations{988510}{name}      = 'REJAH BUAYAN'



my %observations;

foreach my $f (@obs) {
    if ( open F, "<$f" ) {

        # Parse the name of the file to extract the expected Number,
        # which should match the Number field inside the file

        $f =~ /$top\d{2}\/(\d{6})/;
        my $number = $1;

        # This will get set to 1 as soon as we find the Obs: marker in the
        # file and hence are done with the header portion and are on to
        # the actual observations

        my $doing_obs = 0;

        # This is used to make sure that the observations are all there,
        # i.e. that there are no gaps in the years in a single file

        my $year_obs = 0;

        while (<F>) {
            chomp;

            my $line = $_;

            if ( !$doing_obs ) {

                # The line should have an = sign in it, or it should be the
                # start of the actual observations, which begins with Obs:

                if ( $line =~ /^([^=]+)\s*=\s*(.+)/ ) {
                    my ( $field_name, $field_value ) = ( $1, $2 );

                    if ( exists( $fields{$field_name} ) ) {
                        if ( $field_value =~ /$fields{$field_name}/ ) {

                            # Check that the Number field is the same as
                            # the number in the file name

                            if ( ( $field_name eq 'Number' )
                                && ( $field_value != $number ) ) {
                                die "Number field $field_value doesn't match file name $f\n";
                            }

                            # Capture latitude, longitude and name

                            if ( $field_name eq 'Lat' ) {
                                $observations{$number}{latitude} = $field_value;
                            }
                            if ( $field_name eq 'Long' ) {

                                # Note that I am flipping the sign on the
                                # longitude here because it appears that the
                                # Met Office is swapping east and west
                                # compared to how KML thinks about longitude

                                $observations{$number}{longitude} = -$field_value;
                            }
                            if ( $field_name eq 'Name' ) {
                                $observations{$number}{name} = $field_value;
                            }
                        } else {
                            die "Value for field $field_name in $f is incorrect ($field_value)\n";
                        }
                    } else {

                        # Since I am not using any extraneous data I just
                        # report it and carry on.

                        print STDERR "Unexpected header value $field_name in file $f\n";
                    }
                } elsif ( $line =~ /^Obs:$/ ) {
                    $doing_obs = 1;
                } else {
                    die "Unexpected line [$line] in $f\n";
                }
            } else {

                # Observation lines should start with a four-figure year
                # followed by 12 observations. The years should be
                # contiguous.

                if ( $line =~ /([12]\d{3})\s+(\s*-?\d+\.\d\s*){12}/ ) {
                    if ( $year_obs == 0 ) {
                        $year_obs = $1;
                    } else {
                        $year_obs += 1;
                    }

                    if ( $year_obs != $1 ) {
                        die "Gap in observation record at year $1 in file $f\n";
                    }
                } else {
                    die "Unexpected observation line [$line] in $f\n";
                }
            }
        }

        close F;

        # Verify that we have captured latitude, longitude and name

        if (   !exists( $observations{$number}{latitude} )
            || !exists( $observations{$number}{longitude} )
            || !exists( $observations{$number}{name} ) ) {
            die "Incomplete information for $f\n";
        }
    } else {
        die "Failed to open observation file $f\n";
    }
}



# Now use the %observations hash to build the output KML file. This
# will be stored in a file called MetOffice08122009.kml.
#
# IMPORTANT NOTE: this currently outputs an altitude of 0 for
# each observation.



my $kml = 'MetOffice08122009.kml';

if ( open F, ">$kml" ) {
    print F <<EOF;
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<name>$kml</name>
EOF

    foreach my $o ( sort keys %observations ) {
        print F <<EOF;
<Placemark>
<name>$observations{$o}{name}</name>
<description>$o</description>
<Point>
<coordinates>$observations{$o}{longitude},$observations{$o}{latitude},0</coordinates>
</Point>
</Placemark>
EOF
    }

    print F <<EOF;
</Document>
</kml>
EOF
} else {
    die "Failed to create output KML file $kml\n";
}



The UK Met Office has released a subset of the HadCRUT data that contains temperature observations from 1729 stations around the world. To get started with this data you need to go here.

I downloaded the data and wrote a program to verify its integrity (it looks OK so far) and then extract information about the stations and plot them. To do this I created a KML file from the observation files. Important note: if you are repeating what I am doing, you will need to flip the sign on the longitude (the Met Office and KML have different ideas about what the sign on a longitude means).

The first step was to import this into Google Maps. You can view the map of all the observation stations here. Here's a snapshot of the UK.

You can grab the actual KML file I generated here. It can also be imported into Google Earth, where you can see all the points.

Here's the code I used to generate it; take it under the General Public License.

Next step: map the temperature data onto Google Earth.
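If you are reproducing this with your own tooling, the sign flip is the easy thing to get wrong. Here is a minimal sketch of turning a Met Office Lat/Long pair into a KML coordinate tuple; the station values are hypothetical, not taken from the released files:

```perl
use strict;
use warnings;

# Hypothetical example values in the form they appear in a station file
my $lat  = 6.1;      # Lat field
my $long = 125.2;    # Long field

# KML wants "longitude,latitude,altitude" with east-positive longitude,
# so flip the sign on the Met Office Long value before writing it out
my $coordinates = sprintf "%s,%s,0", -$long, $lat;

print "$coordinates\n";    # -125.2,6.1,0
```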

Labels: climate change, pseudo-randomness