We’ve all seen those letter grades affixed to the doors of our favorite restaurants. While I don’t give them a second look, I know people who are nervous to eat at restaurants that are anything but A’s. So I imagine, anecdotally, that they can affect business quite a bit.

But where do these grades come from? It turns out the system is a lot like school grading, where letter grades map to more granular number grades underneath. First, an inspector visits the restaurant and looks for violations. Next, the points associated with each violation are summed; the more violations, the higher the score. Lastly, a letter grade is assigned. If you want an A, you need 13 points or fewer. Want a B? Try 14-27 violation points. And a C comes from even more violation points. Seems pretty straightforward, right?
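As a minimal sketch of that mapping (the function name is mine, purely for illustration; the cutoffs are the ones described above, with a C taken to mean anything past the B range, i.e. 28 points or more):

```python
def letter_grade(points: int) -> str:
    """Map an inspection's total violation points to a letter grade."""
    if points < 0:
        raise ValueError("violation points cannot be negative")
    if points <= 13:   # 13 points or fewer is an A
        return "A"
    if points <= 27:   # 14-27 points is a B
        return "B"
    return "C"         # anything higher is a C


# The cusp this post is about sits between 13 and 14.
for score in (13, 14, 27, 28):
    print(score, letter_grade(score))
```

Note how a single point, 13 versus 14, is all that separates an A from a B.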

Well, wait a minute. There is one more thing about school grades that you may remember: the ability of a teacher to use a bit of discretion when setting a final grade. When a student is just on the cusp of a higher letter grade, let’s say an 89, some teachers might bump the grade up to an A from a B+, using things like class participation, etc. After all, we all know how bad it feels when you are just below the cusp. Many teachers don’t want to put students there.

Well, it turns out that our health inspectors don’t want to either. The histogram below shows that inspectors don’t like giving out the best possible B, a score of 14. A’s are in green, B’s in blue and C’s in red:

In fact, since the start of the current grading system, three times as many restaurants have been given a score of 13 as a score of 14. Given the way inspections work, this does not make a lot of sense. It would be extremely difficult for a restaurant to do just enough to get by with that sort of precision.

One of the only explanations I could come up with is that the inspectors are using discretion. They seem to be turning a blind eye to that last violation that would put a restaurant over the edge. Why does this matter? Well, do you really want a restaurant’s grade to depend on how nice the inspector is feeling, or on whether the restaurant gets a visit from a “mean” inspector or a “nice” one? This is public health. When the stakes are as high as they are here, the system should be done right.

It’s not surprising that there are some inspectors like this. What is shocking, at least to me, is how widespread this is. To estimate how often scores seem to be adjusted, I assumed that if all were fair there should be a fairly flat count between 12 and 16, namely about 7,200 inspections in each of these score buckets. (I chose 7,200 by drawing a smooth curve through the histogram.) If that’s the case, there should be about 21,600 inspections (7,200 x 3) graded B in that range, split among 14, 15 and 16. In fact, there were only 13,029, about 40% fewer than expected. And we can see on the graph exactly where those restaurants ended up: they are A’s with 12 and 13.
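For anyone who wants to check the arithmetic, here is the same back-of-the-envelope calculation as a short sketch. The 7,200 figure is the eyeballed estimate described above, not a fitted value, and the variable names are just for illustration:

```python
expected_per_score = 7_200   # eyeballed from a smooth curve through the histogram
b_scores = (14, 15, 16)      # the B scores closest to the A cutoff
observed_b = 13_029          # inspections actually scored 14, 15 or 16

expected_b = expected_per_score * len(b_scores)   # 7,200 x 3 = 21,600
shortfall = expected_b - observed_b               # inspections "missing" from the B side
shortfall_pct = 100 * shortfall / expected_b

print(f"expected {expected_b:,}, observed {observed_b:,}")
print(f"shortfall {shortfall:,} ({shortfall_pct:.1f}% below expected)")
```

The shortfall works out to 8,571 inspections, or a little under 40% of what a flat distribution would predict.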

How could we prevent this? Well, for one, you could have two inspectors visit each restaurant, each independently scoring a different set of items, and then add the two scores at the end. This would prevent either inspector from knowing that a restaurant is right on the cusp, and allow them to do their jobs properly. Another option is for the Department of Health to look at these score distributions for each inspector, and figure out which ones are doing this.
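The per-inspector check could be sketched roughly as follows. This assumes the Department has records that pair an inspector with each score, which I did not have; the `(inspector_id, score)` record layout, the function names, the thresholds and the sample rows are all hypothetical and made up for illustration:

```python
from collections import Counter, defaultdict


def cusp_counts(inspections):
    """Count scores of 13 and 14 per inspector.

    `inspections` is an iterable of (inspector_id, score) pairs
    (a hypothetical layout, not the format of the public data).
    """
    counts = defaultdict(Counter)
    for inspector_id, score in inspections:
        if score in (13, 14):
            counts[inspector_id][score] += 1
    return counts


def flag_inspectors(inspections, min_ratio=3.0, min_total=20):
    """Return inspectors who give 13s at least `min_ratio` times as often as 14s.

    Inspectors with fewer than `min_total` scores on the cusp are skipped,
    since a lopsided ratio from a handful of visits means little.
    Both thresholds are illustrative assumptions.
    """
    flagged = []
    for inspector_id, c in cusp_counts(inspections).items():
        n13, n14 = c[13], c[14]
        if n13 + n14 < min_total:
            continue
        if n13 >= min_ratio * n14:
            flagged.append(inspector_id)
    return sorted(flagged)


# Made-up sample: inspector "X" rarely hands out a 14, inspector "Y" is balanced.
sample = [("X", 13)] * 30 + [("X", 14)] * 5 + [("Y", 13)] * 12 + [("Y", 14)] * 11
print(flag_inspectors(sample))
```

A flag like this would only be a starting point for a closer look, not proof that any one inspector is adjusting scores.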

There are those that will say that being lenient is a good thing. My response would be that we should then loosen the letter grading a bit to make it easier to get an A, and then standardize the inspections to keep them in line with one another. No one wants randomness in the process.

So the data tells us that the current grading system has introduced some serious issues, and I hope the city works to rectify them.

Now, as a bonus, I computed the median health score in each zip code and mapped the results. Enjoy.

(The data above consists only of those inspections where it has been at least 30 days since the last inspection, to avoid the immediate reinspections of B’s and C’s that occur. The map was made with QGIS. The histogram was made in IPython, with Pandas. Health inspection data came from here. Zip code data came from here.)
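The 30-day filter mentioned above could look something like the sketch below. It is not the code I actually ran; the `(restaurant_id, inspection_date, score)` layout and the function name are assumptions for illustration, and I am assuming a restaurant's first recorded inspection is always kept:

```python
from datetime import date


def drop_quick_reinspections(inspections, min_gap_days=30):
    """Keep inspections at least `min_gap_days` after the same restaurant's previous one.

    `inspections` is an iterable of (restaurant_id, inspection_date, score)
    tuples, a hypothetical layout. A restaurant's first inspection is kept.
    """
    kept = []
    last_seen = {}  # restaurant_id -> date of its most recent inspection
    for restaurant_id, when, score in sorted(inspections, key=lambda r: (r[0], r[1])):
        previous = last_seen.get(restaurant_id)
        if previous is None or (when - previous).days >= min_gap_days:
            kept.append((restaurant_id, when, score))
        last_seen[restaurant_id] = when
    return kept


# Made-up rows: the second visit is a quick reinspection and gets dropped.
rows = [
    ("r1", date(2013, 1, 10), 20),
    ("r1", date(2013, 1, 24), 12),
    ("r1", date(2013, 6, 1), 13),
]
filtered = drop_quick_reinspections(rows)
print([score for _, _, score in filtered])
```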