Let’s start with a short problem description. Among other things, snapADDY builds products that scan and parse business cards. The first step, of course, is to take a photo of the business card in question. This is usually done out in the field, using our apps on a smartphone or tablet.

However, people sometimes take blurry images (e.g. when they’re in a hurry). This results in our OCR engine failing to properly recognize the text on the card, which in turn derails the whole subsequent recognition pipeline. To avoid this problem, we decided to add blur detection to our apps’ camera view and immediately warn users if they take a blurry photo.

Goal: Provide instant feedback if a user takes a blurry photo.

In this project, we followed a two-step development approach:

first, build a prototype in Python (since it has nice libraries for image processing and machine learning readily available)

second, port what’s necessary for production to JavaScript (since our apps are built with Ionic)

Image Processing

Step one, before even thinking about algorithms and implementations, was to get our hands on test data. In my experience, this is a best practice that should be followed in any research & development task:

Try to obtain realistic test data as early as possible.

In our case, we collected a test set of 25 business cards and took two photos of each of them: a blurry one and a sharp one. We did this using a DSLR camera and manually setting the focus. Let’s call this test set synthetic data, since it was artificially produced under something like “laboratory conditions.”

Taking test photos of business cards with blurry focus and difficult light settings. Lenovo product placement unintentional.

In addition, one of our customers kindly provided 66 photos of business cards that were taken out in the field.

With the test data available, we started experimenting with a couple of standard algorithms from computer vision. There is a myriad of different algorithms for blur (or edge) detection in the literature; we decided to keep it simple and focus on the well-known Laplace and Sobel filters.

The basic approach is this:

use a Laplace (or Sobel) filter to find edges in the input image

compute the variance and the maximum over the pixel values of the filtered image

high variance (and a high maximum) suggests clearly distinguished edges, i.e. a sharp image; low variance suggests a blurred image

Implementations of both filters are available in Python via the scikit-image package. With image loading, resizing and grayscaling, we arrive at the following short script:
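The original script is not reproduced here, but a minimal sketch of the idea, using scikit-image, might look like the following (the function name blur_features and the target resolution are my own choices for illustration):

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.filters import laplace
from skimage.transform import resize

def blur_features(image, size=(400, 600)):
    """Return (variance, maximum) of the Laplace-filtered image.

    The image is grayscaled and downscaled to a fixed resolution first,
    so the values are comparable across differently sized photos.
    """
    if image.ndim == 3:                       # color image -> grayscale
        image = rgb2gray(image)
    image = resize(image, size, anti_aliasing=True)
    edges = laplace(image, ksize=3)           # swap in skimage.filters.sobel for the Sobel variant
    return edges.var(), np.abs(edges).max()
```

A photo loaded via skimage.io.imread can be passed straight to this function; high returned values point toward a sharp image, low values toward a blurry one.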

Edge detection via the Sobel filter can be done similarly (by using skimage.filters.sobel instead of laplace). We chose to downscale the image to a fixed resolution for comparability between various images and lower running times in production.

However, a crucial question remains:

What is the right threshold to distinguish sharp from blurry images (based on the computed values)?

This is where machine learning comes into play.

Machine Learning

With the code above, we are able to compute the variance and maximum values based on the Laplace and Sobel filters for any given image. Below, you see two photos of a business card from our synthetic data set, together with the computed values.

Two photos of the same business card from our test set, one being sharp and the other blurry.

There is a clear difference between the sharp and the blurred image in all four measures. This is a good sign: it seems like these measures can be used as features for discriminating between the two classes of images (blurry and non-blurry). Let’s check for all images in the synthetic data set:
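One simple way to learn a threshold from labeled examples (a sketch of the general idea, not necessarily the exact approach used here) is to fit a depth-1 decision tree, a so-called decision stump, on one of the computed features. The feature values below are made up purely for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical Laplace-variance values for four labeled photos
# (1 = blurry, 0 = sharp); real values would come from blur_features().
variances = np.array([[0.0002], [0.0003], [0.0015], [0.0020]])
labels = np.array([1, 1, 0, 0])

# A depth-1 tree learns a single split point on the feature axis.
stump = DecisionTreeClassifier(max_depth=1).fit(variances, labels)
threshold = stump.tree_.threshold[0]   # the learned decision threshold
```

New photos with a variance below the learned threshold would then be flagged as blurry. With more than one feature (e.g. variance and maximum from both filters), a slightly deeper tree or another simple classifier could combine them.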