This is the file containing the data I talk about in this post, in case you aren’t interested in the rest of the waffle.

An article appeared on the BBC website a couple of days ago with the headline ‘Pollution Hotspots Revealed’. It’s a webpage with a widget that lets you put your postcode in, and it gives you an indication, on a scale of 1–6, of how polluted the air is in that postcode. Pretty cool, and useful if you want to check air quality levels where you live.

The data has been prepared by EarthSense, a company that specialises in air quality monitoring, measuring and modelling tools.

Of the data, the BBC says:

In the study, each area is rated on a scale from 1, least polluted, to 6, most polluted. More than four in five postcodes in Great Britain fall into the least polluted category (1). The scale is worked out based on a probability that each area will break the annual legal limit for NO2. Areas fall foul if they average more than 40 micrograms of NO2 per cubic metre. Fewer than 1% of postcodes are rated either 4 or 5, and no postcodes in Great Britain fall into the highest category — six — for areas that average over 100 micrograms of NO2 per cubic metre.

There is a link for more information, and following that takes us to the EarthSense website, where there are FAQs, a technicolor map, and the opportunity to buy the data for £25 per square km.

Now it’s all well and good that over 80% of postcodes in GB fall into the least polluted category. But that means nearly one in five postcodes are not ‘good’. With a tool like this, the first postcode I check is my home. The second is my kids’ school — after all, they spend over 30 hours a week there. But after that, I’m stuck, because I don’t really know any other postcodes.

With tools like this, I’d much rather have all the data, so I can look at it in greater detail. Having access to the underlying data also makes it possible for other people to make good things with it, and it helps people who might want to use the data to lobby local/central government, or transport authorities, or whoever to make their environment better.

Because of this, and egged on by a few people in real life, I decided to write a scraper that used the BBC tool to get the scores for all schools in Greater Manchester. After much googling and Stack Overflowing, it ended up being pretty straightforward: I used Python and Selenium to mimic the act of typing a postcode in the box and pressing enter, then grabbing the result and putting it into a text file. Here’s the script.
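For anyone who doesn't want to click through, the idea looks roughly like the sketch below. This is not the linked script, just a minimal illustration of typing a postcode and reading the result. The URL and both CSS selectors are made-up placeholders, and `extract_score` and `lookup_score` are hypothetical helper names, so all of them would need swapping for whatever the real page uses. The Selenium imports sit inside the function so the file still loads on a machine without Selenium installed.

```python
import re

# Placeholders only: none of these are the real page details.
TOOL_URL = "https://example.com/pollution-checker"  # hypothetical URL
INPUT_SELECTOR = "input[name='postcode']"           # hypothetical selector
RESULT_SELECTOR = ".result-score"                   # hypothetical selector


def extract_score(text):
    """Return the first standalone digit 1-6 in the result text, else None."""
    match = re.search(r"\b([1-6])\b", text)
    return int(match.group(1)) if match else None


def lookup_score(driver, postcode, timeout=10):
    """Type a postcode into the box, press enter, and read the rating."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(TOOL_URL)
    box = driver.find_element(By.CSS_SELECTOR, INPUT_SELECTOR)
    box.clear()
    box.send_keys(postcode, Keys.ENTER)
    # Wait for the result to appear rather than sleeping for a fixed time.
    result = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, RESULT_SELECTOR))
    )
    return extract_score(result.text)
```

With a real browser you would create a driver (for example `selenium.webdriver.Firefox()`), call `lookup_score(driver, postcode)` for each postcode, and call `driver.quit()` at the end.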

The output is this text file with the scores for all schools in Greater Manchester. Please feel free to do what you want with it: make a map, send it to your local councillor/MP, write a blog post. But whatever you do, please remember to include a credit to MappAir with this statement (as per this document):

MappAir®100 © EarthSense Systems Limited [year of publication]

The Data

A closer look at the data shows that there is one school in Greater Manchester with a rating of 5 out of 6, which according to EarthSense means:

Pollution concentrations in this area are likely to frequently exceed WHO guidelines and regulatory limits. Annual average concentrations are highly likely to be above 40 ug/m3 of NO2 with associated health impacts. Residents should monitor air quality forecasts and manage exposure appropriately.

This is St. Mary’s Roman Catholic Primary School in Stockport. I looked on Google Maps, and it sits right in the shadow of the M60.

St Mary’s Roman Catholic School in Stockport

If I were a parent, governor, or teacher at that school, I’d want to know what was being done about this.

There are 5 schools (actually 3 schools and 2 universities) with a score of 4 out of 6.

The schools in GM that are in the most-polluted postcodes

EarthSense say this means:

There are likely to be regular episodes of moderate pollution in these areas, with annual average concentrations above regulatory guidelines. Sensitised individuals (eg. Asthma sufferers) should manage exposure and exercise levels accordingly and there are likely to be some health impacts of long-term exposure.

This still sounds pretty serious, and I’d want to find out more about what’s being done to mitigate this.

No schools in GM have a score of 3 out of 6, while 326 have a score of 2 out of 6. According to EarthSense, a score of 2 means:

The air in your area is generally cleaner than the regulatory limits and should not cause health concerns except in exceptional weather conditions.

hmmm

Not bad at all, but that phrase “should not cause health concerns except in exceptional weather conditions” leaves a bit of room for badness.
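If you want to count the scores yourself, a tally along these lines would do it. It's a sketch under assumptions: it treats the output as comma-separated rows with the score in the last column, which may not match the actual file, and the sample rows are invented purely to show the function running. `tally_scores` is a hypothetical name.

```python
import csv
import io
from collections import Counter


def tally_scores(lines):
    """Count rows per score, assuming the score is the last CSV column."""
    counts = Counter()
    for row in csv.reader(lines):
        if not row:
            continue
        try:
            counts[int(row[-1])] += 1
        except ValueError:
            continue  # header row, or a lookup that returned no score
    return counts


# Invented sample rows, not real schools or real scores.
sample = io.StringIO(
    "name,urn,postcode,score\n"
    "School A,100001,M1 1AA,2\n"
    "School B,100002,M2 2BB,1\n"
    "School C,100003,M3 3CC,2\n"
)
print(sorted(tally_scores(sample).items()))  # [(1, 1), (2, 2)]
```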

Method

This isn’t a how-to guide, but if you want to recreate it, here’s what I did. I used the Get Information About Schools service from the Department for Education as my source of schools’ data. I used Greater Manchester because it’s a subset of schools that I’m familiar with, and scraping 1,400 schools and children’s centres took about 15 minutes. Doing 30,000 would take hours, and I don’t have that sort of time. I’d probably be happy to do other subsets if anyone’s interested. The Python script is here. If you want to run it yourself, you’ll need a CSV file with columns for school name, URN, postcode and local authority. Of course, you could adapt it to use postcodes of houses, or football stadia, or Dunkin’ Donuts, or whatever.
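The loop over the CSV can be sketched like this. Again, it's an illustration rather than the linked script: the column names (`name`, `urn`, `postcode`, `local_authority`) are assumptions about how the CSV is laid out, and `lookup` stands in for whatever function actually fetches a score for a postcode. The second function just does the timing arithmetic: 1,400 lookups in about 15 minutes is roughly 0.64 seconds each, which puts 30,000 lookups at a bit over five hours.

```python
import csv
import io
import time


def scrape_schools(csv_file, lookup, delay=0.0):
    """Yield one result row per school, calling lookup(postcode) for each."""
    for row in csv.DictReader(csv_file):
        postcode = row["postcode"].strip().upper()
        try:
            score = lookup(postcode)
        except Exception:
            score = None  # keep going if one postcode fails
        yield [row["name"], row["urn"], postcode, row["local_authority"], score]
        time.sleep(delay)  # optional pause between requests


def estimated_hours(n_postcodes, seconds_each):
    """Rough running time in hours for a given number of lookups."""
    return n_postcodes * seconds_each / 3600


# 15 minutes (900 seconds) for 1,400 postcodes, scaled up to 30,000.
print(round(estimated_hours(30000, 900 / 1400), 1))  # 5.4
```

Passing the lookup in as a function means the loop can be tried out with a dummy lookup before pointing it at a real browser.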

‘Legal’

I’m still not sure what is legal and what isn’t when scraping stuff. This data is publicly available, as it’s on the BBC website, and I could have manually typed 1,400 postcodes into the tool and put the results into a spreadsheet. But after the great Open Data Manchester Metrolink throwdown, I’m cautious about what is and isn’t allowed. I’m not recreating the entire database here, but still…

Any feedback very welcome — comments, criticism, discussion, love, etc.