The UK, like many other countries, runs a food hygiene inspection system that tries to ensure that establishments with poor hygiene standards improve or are shut down. As is often the case, the data collected for operational reasons can provide a rich source of insight when viewed as a whole.

Questions like “Where in the UK has the poorest food hygiene?”, “What kinds of places are the most unhygienic?”, and “What kinds of food are the most unhygienic?” spring to mind. I thought I would apply Mathematica and a little basic data science and provide the answers.

The collected data, over half a million records, is updated daily and is openly available from an API, but this API seems to be targeted at performing individual lookups, so I found it more efficient to import the 414 files from this site instead.



All data in this blog post was extracted on July 15, 2016. You can find the Wolfram Language code for importing and other utilities in the CDF at the foot of this post.

Oxford

As a warmup, I started with somewhere I knew, so here is the Dataset representing my local city of Oxford.





There are 1,285 places to buy food in Oxford, and the rating scheme grades them from 0 (“Urgent improvement necessary”) through 5 (“Very good”).

We can throw the ratings onto a map of Oxford and see, as I would expect, concentrations of establishments around the tourist center and along major arterial roads.

We can see that the vast majority are rated 4 or 5 (in green). We should only be concerned about the 0, 1, and 2 ratings (“Urgent improvement necessary”, “Major improvement necessary”, and “Improvement necessary”), so let’s look at just those.

There are obvious clusters in the center (where all the tourists go) and along Cowley Road (leading to Temple Cowley), which is where a lot of students live. But these also have lots of good establishments. So to normalize for that, we must find the average rating for a location. Since no two establishments are in exactly the same place, I need to create a function that collects all the data within a certain distance of a geo position and finds the average rating.
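The author's own function is written in Wolfram Language and lives in the CDF. As a rough illustration of the same idea, here is a minimal Python sketch that averages the ratings of every establishment within a given radius of a point. The record keys `lat`, `lon`, and `rating` and both function names are assumptions made for this example, and the sample records are invented.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def mean_rating_near(records, lat, lon, radius_miles):
    """Mean rating of records within radius_miles of (lat, lon); None if there are none."""
    nearby = [r["rating"] for r in records
              if haversine_miles(lat, lon, r["lat"], r["lon"]) <= radius_miles]
    return sum(nearby) / len(nearby) if nearby else None


# Invented sample records (hypothetical keys), two close together and one about 10 miles north.
sample = [
    {"lat": 51.752, "lon": -1.258, "rating": 5},
    {"lat": 51.753, "lon": -1.257, "rating": 1},
    {"lat": 51.900, "lon": -1.258, "rating": 3},
]
print(mean_rating_near(sample, 51.752, -1.258, 0.4))  # 3.0
```

Returning `None` for an empty disk keeps "no data" distinct from a genuinely low average.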

We can now run that function over the entire map grid to create a moving average value of hygiene. I have used 0.4 miles for the averaging disk, which is large enough to collect quite a few establishments at a time but small enough to avoid blurring the whole city together.
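To make the grid step concrete, here is a hedged Python sketch (not the Wolfram Language code from the CDF) that evaluates a disk average at every node of a regular latitude/longitude grid. The flat-earth distance approximation, the 69-miles-per-degree constant, the record keys, and the function names are all assumptions of this example. Only the 0.4-mile radius comes from the post.

```python
from math import cos, hypot, radians

MILES_PER_DEG_LAT = 69.0  # rough conversion, good enough at city scale


def approx_miles(lat1, lon1, lat2, lon2):
    """Equirectangular (flat-earth) distance in miles between two nearby points."""
    dy = (lat2 - lat1) * MILES_PER_DEG_LAT
    dx = (lon2 - lon1) * MILES_PER_DEG_LAT * cos(radians((lat1 + lat2) / 2))
    return hypot(dx, dy)


def moving_average_grid(records, lat_range, lon_range, steps, radius_miles=0.4):
    """Return a steps x steps list of disk-averaged ratings (None where the disk is empty).

    steps must be at least 2 so that both ends of each range are sampled.
    """
    (lat_min, lat_max), (lon_min, lon_max) = lat_range, lon_range
    grid = []
    for i in range(steps):
        lat = lat_min + (lat_max - lat_min) * i / (steps - 1)
        row = []
        for j in range(steps):
            lon = lon_min + (lon_max - lon_min) * j / (steps - 1)
            near = [r["rating"] for r in records
                    if approx_miles(lat, lon, r["lat"], r["lon"]) <= radius_miles]
            row.append(sum(near) / len(near) if near else None)
        grid.append(row)
    return grid


# Invented sample records with hypothetical keys.
sample = [
    {"lat": 51.75, "lon": -1.26, "rating": 5},
    {"lat": 51.75, "lon": -1.26, "rating": 3},
    {"lat": 51.77, "lon": -1.24, "rating": 1},
]
grid = moving_average_grid(sample, (51.75, 51.77), (-1.26, -1.24), steps=3)
```

This tests every record against every grid node, which is fine for one city but slow for a whole country.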

My initial intuition proved right. Cowley Road and the area between the city center and the station are areas of poor average hygiene, but there is also a hotspot in the southwest that I can’t explain. The best average hygiene is in the north, from Walton Manor to Summertown (the expensive parts of Oxford), and in the Headington area.

Which councils are failing to protect us?

I am happy that the data is plausible and I have understood it, but there is another issue we must consider before going for our answers: data quality. While the Food Hygiene Rating Scheme is controlled by the national Food Standards Agency, it is operated by over 400 different local authorities. Are they all doing a consistent job? One of the promised benefits of open data is that we can hold our governments accountable—so let’s do that. This is the kind of analysis that I hope central government is doing too.

We can easily look at who is on top of the workload by counting the fraction of businesses that are not yet rated.
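A minimal Python sketch of that count follows. It is an illustration, not the post's Wolfram Language code. The record keys `authority` and `rating`, and the marker string `"AwaitingInspection"` for an unrated business, are assumptions, so check them against the actual data before relying on this.

```python
from collections import defaultdict


def unrated_fraction_by_authority(records, unrated_value="AwaitingInspection"):
    """Map each authority to the fraction of its records that carry the unrated marker."""
    totals = defaultdict(int)
    unrated = defaultdict(int)
    for r in records:
        totals[r["authority"]] += 1
        if r["rating"] == unrated_value:  # assumed marker for "not yet rated"
            unrated[r["authority"]] += 1
    return {a: unrated[a] / totals[a] for a in totals}


# Invented sample records with hypothetical authority names.
sample = [
    {"authority": "A", "rating": "5"},
    {"authority": "A", "rating": "AwaitingInspection"},
    {"authority": "A", "rating": "4"},
    {"authority": "A", "rating": "3"},
    {"authority": "B", "rating": "5"},
    {"authority": "B", "rating": "2"},
]
print(unrated_fraction_by_authority(sample))  # {'A': 0.25, 'B': 0.0}
```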

Unrated establishments

So if you eat out in North Norfolk, you might be nervous to discover that nearly 25% of establishments have never been inspected.

Suspicious in another way is that around a third of the authorities have inspected every business. That would be great if it were true, but since new businesses must open regularly, you would expect to find a few that are awaiting inspection, so this may just indicate that these authorities don’t record (or perhaps even know about) new establishments until they are inspected.

Time since rating

We can also see how often the average establishment is inspected. The best authorities inspect establishments at least once per year.
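The same kind of grouping gives the average time since the last rating. The Python sketch below is an illustration only. The keys `authority` and `rating_date` are hypothetical names, the sample dates are invented, and the reference date is the July 15, 2016 extraction date given in this post.

```python
from collections import defaultdict
from datetime import date


def mean_years_since_rating(records, as_of):
    """Map each authority to the mean time, in years, since its establishments were rated."""
    days = defaultdict(list)
    for r in records:
        if r["rating_date"] is not None:  # skip records with no rating date
            days[r["authority"]].append((as_of - r["rating_date"]).days)
    return {a: sum(d) / len(d) / 365.25 for a, d in days.items()}


# Invented sample records with hypothetical keys.
sample = [
    {"authority": "A", "rating_date": date(2015, 7, 15)},
    {"authority": "A", "rating_date": date(2014, 7, 15)},
    {"authority": "B", "rating_date": None},
]
years = mean_years_since_rating(sample, as_of=date(2016, 7, 15))
```

An authority with no dated records is simply absent from the result.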

But alarmingly, Croydon has an average time since inspection of over 3.5 years. A lot can change in that time.

I can’t see an easy way to measure if the different authorities are applying the rules in a consistent way when they do inspect, so I am just going to have to trust that the values are equivalent.

Regional differences

So back to our original questions. First I am going to throw out all data that does not have a numerical rating. Unfortunately, this excludes Scotland, which runs a different scheme that provides only a pass-or-fail-type conclusion.
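As a small illustration of that filter, here is a Python sketch that keeps only records whose rating is a whole number from 0 to 5. The key name `RatingValue` and the non-numeric example strings are assumptions about the data format, and the function name is invented for this example.

```python
def numeric_ratings_only(records):
    """Keep records whose rating is an integer 0-5, adding it as an int under 'rating'."""
    kept = []
    for r in records:
        value = str(r["RatingValue"]).strip()  # "RatingValue" is an assumed key name
        if value.isdigit() and 0 <= int(value) <= 5:
            kept.append({**r, "rating": int(value)})
    return kept


# Invented sample records; the non-numeric strings are hypothetical placeholders.
sample = [
    {"RatingValue": "5"},
    {"RatingValue": "Pass"},
    {"RatingValue": "AwaitingInspection"},
    {"RatingValue": "0"},
]
rated = numeric_ratings_only(sample)
print([r["rating"] for r in rated])  # [5, 0]
```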

We still have plenty of data to work with…

The good news is that most establishments are “Good” or “Very good.”

The average rating value across the country is 4.37.

Here is a quick map of all the 0-rated establishments in the country.

The easiest way to group the data is by the local authority that collected it, since that is stored in every record. By that measure, Newham in London is the worst, with an average rating of 3.4.

And the best is Torridge in Devon at 4.86.

But we can use the "PostCode" key to be much more precise. A full UK postcode is shared by around 15 properties. That is too fine-grained, as we would find a lot of postcodes with only one restaurant. We need a collection of establishments to infer anything about a neighborhood, so I will use only the first part of the postcode and throw out all postcodes that do not contain at least 10 establishments.
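A hedged Python sketch of this grouping is below. It is not the post's Wolfram Language code. It assumes each postcode is written with a space between its two parts, and that each record already carries a numeric `rating` key, which is a hypothetical name. The sample postcodes are made up.

```python
from collections import defaultdict


def mean_rating_by_outward_code(records, min_count=10):
    """Average rating per first postcode part, keeping groups of at least min_count."""
    groups = defaultdict(list)
    for r in records:
        parts = r["PostCode"].split()  # assumes "E13 9AA"-style formatting
        if parts:  # skip blank postcodes
            groups[parts[0].upper()].append(r["rating"])
    return {code: sum(v) / len(v)
            for code, v in groups.items() if len(v) >= min_count}


# Invented sample: ten records in one area, nine in another.
sample = ([{"PostCode": "E13 9AA", "rating": 3}] * 10
          + [{"PostCode": "OX1 1AA", "rating": 5}] * 9)
print(mean_rating_by_outward_code(sample))  # {'E13': 3.0}
```

The second area is dropped because it falls one establishment short of the threshold.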

Finally, I hooked up a postcode API to translate back from the partial postcode to a location name.

The result puts E13, in East London, at the bottom of the list, with adjacent postcodes E12, E7, E8, and E15 also on the list. Indeed, nearly all of the worst postcodes are parts of London, apart from a few Birmingham postcodes.

Topping the best hygiene-rated postcodes is Craigavon in Northern Ireland, with a perfect score.

Regional trends

Can we infer some long-distance trends? For the whole country, we have lots of data and are not looking for very small features, so there is a much faster method than the one I used on Oxford. Essentially, by aggregating over square regions rather than circular ones, I can round each geo position once, rather than having to test it repeatedly for membership of the region. I round all the locations to the nearest 20 miles and then aggregate all the points that now share the same location. I then repeat the process, shifting the box centers by 5 miles to create a moving average square. The Wolfram Language code is attached in a CDF at the bottom of this post. Here is the result.
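The rounding trick can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code from the CDF. It assumes each location has already been projected to planar (x, y) coordinates in miles, and the function name and tuple layout are invented. The 20-mile box and 5-mile shift are the values given above.

```python
from collections import defaultdict


def square_moving_average(points, box=20.0, shift=5.0):
    """Average ratings over box-sized squares whose centers step by `shift` miles.

    points: iterable of (x_miles, y_miles, rating) in an assumed planar projection.
    Returns {(center_x, center_y): mean rating}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # center -> [rating total, count]
    n_shifts = int(box / shift)
    for ox in range(n_shifts):
        for oy in range(n_shifts):
            dx, dy = ox * shift, oy * shift
            for x, y, rating in points:
                # Round once per offset: snap to the nearest box center on this lattice.
                cx = round((x - dx) / box) * box + dx
                cy = round((y - dy) / box) * box + dy
                acc = sums[(cx, cy)]
                acc[0] += rating
                acc[1] += 1
    return {c: total / n for c, (total, n) in sums.items()}


# Invented sample points: two close together and one far away.
sample = [(0.0, 0.0, 5), (4.0, 0.0, 3), (100.0, 100.0, 1)]
averages = square_moving_average(sample)
print(averages[(0.0, 0.0)])  # 4.0
```

Each point is rounded once per offset, 16 times in total here, instead of being distance-tested against every grid node.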

So there is an unhygienic center in London (as we already saw) that spreads toward Birmingham (going around north Oxfordshire) before turning east at Manchester until it reaches Hull. There is another notable low area in South Wales around, but not centered on, Cardiff. Generally, rural areas appear to be more hygienic, particularly North Devon, North Wales, and East Cumbria.

What kind of establishments are least hygienic?

Enough regional anthropology. Let’s consider what kind of food is safe. The analysis of the "BusinessType" key is reassuringly predictable. Fast food is the worst; schools and hospitals are the best.

We can drill deeper by inferring something about the food from the business name. Here is a function to measure the average hygiene rating for all establishments containing a particular word.

To reduce the search and ensure enough data for conclusions, I will pick out a list of all words that appear in at least 100 different business names.

And now for each word, we calculate the average rating for businesses using that word in their names.
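The word analysis can be sketched in Python as below. This is a hedged illustration, not the post's Wolfram Language function. The key `BusinessName` is an assumed field name, `rating` is a hypothetical numeric key, and the simple regular-expression tokenizer is a choice made for this example. The threshold of 100 names comes from the text above.

```python
import re
from collections import defaultdict


def mean_rating_by_word(records, min_names=100):
    """Average rating per word, for words found in at least min_names business names."""
    ratings = defaultdict(list)
    for r in records:
        # A set, so a word repeated within one name counts that business only once.
        words = set(re.findall(r"[a-z']+", r["BusinessName"].lower()))
        for w in words:
            ratings[w].append(r["rating"])
    return {w: sum(v) / len(v)
            for w, v in ratings.items() if len(v) >= min_names}


# Invented sample records; a low threshold is used so the toy data yields results.
sample = [
    {"BusinessName": "Lucky House", "rating": 2},
    {"BusinessName": "Lucky Star", "rating": 4},
    {"BusinessName": "Star Cafe", "rating": 5},
]
print(mean_rating_by_word(sample, min_names=2))
```

Sorting the resulting dictionary by value gives the worst and best word associations.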

Amusingly, “lucky” appears on the list of the worst word associations. The worst is “halal.” With the exception of Dixy (which appears to mostly be linked to a chain), they are words associated with small, independent businesses.

We can see it more easily, though less precisely, as a WordCloud of the 80 worst-rated words.

The words associated with the best ratings mostly belong to large chains, which presumably can put more effort into good management processes. At the top of the list is the Japanese-inspired restaurant chain Wagamama, followed by the upmarket supermarket chain Waitrose. There are also some school- and hospital-related words.

Of course, none of this necessarily has anything to do with how good the food tastes, and it is unproven whether there is any link between satisfying the food inspectors and making safe food.

If you really care about food hygiene, then the best advice is probably just to never be rude to the waiter until after you have gotten your food!

Download this post as a Computable Document Format (CDF) file. New to CDF? Get your copy for free with this one-time download.