When I first started learning about Scale Invariance, I assumed we could just run the Harris Corner detector across the different levels in a scale space to find invariant features. And, we can — this specific method is called the Harris-Laplace Corner detector. But as I researched and learned more about Scale Invariant detectors, there was one method that kept coming up over and over again. This method is the Difference of Gaussian (DoG) detector.

I found the Difference of Gaussian (DoG) technique really interesting for a couple of reasons. While the Harris Corner detector focuses on, well, corners, the DoG technique identifies blobs. Blobs are amorphous regions of pixels that share something in common, such as intensity. I also noticed that the DoG technique is often referenced in papers that deal with Feature Descriptors, which we will learn about later. The last bit worth mentioning is that DoG identifies both edge and blob features in the same operation.

Start with Gaussian Scale

To better understand how the Difference of Gaussian works, let’s consider the following example. We are looking at different levels in a Gaussian Scale Space. In other words, these images are blurred with a Gaussian Blur at different sigmas.

The same image with Gaussian blurs of varying strengths (Original Image Source: Carlos Spitzer)

Let’s pick this apart a little bit. If you look closely at the above images, you can probably point out some details (such as the eye) that appear clearly in all three images. These details are visible at all of the above sigmas. Now, consider details (such as the tiny feathers around the beak) that you can only see in the original image or the center image — but cannot see well in the rightmost image. These details are only visible at that specific sigma. This is because, as we discussed previously, the sigma represents scale. Some of the details are larger and are visible even with stronger blurs, while other details are smaller and are only visible with weaker blurs. In other words, there is a correlation between the strength of the blur, which is represented by the sigma, and the details or features that are visible at that sigma. The sigma is a scale factor that gives us a clue as to which features are visible at a particular scale.
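To make this relationship concrete, here is a minimal 1-D sketch in NumPy (the signals and sigma values are made up purely for illustration): a high-frequency "detail" is weakened by a gentle blur and nearly erased by a strong one, while a low-frequency "detail" survives both.

```python
import numpy as np

def gaussian_kernel(sigma):
    # Discrete 1-D Gaussian kernel, normalized so its weights sum to 1.
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(signal, sigma):
    # Convolve with the Gaussian; "same" keeps the original length.
    return np.convolve(signal, gaussian_kernel(sigma), mode="same")

x = np.linspace(0, 1, 512)
fine = np.sin(2 * np.pi * 60 * x)    # small detail: high frequency
coarse = np.sin(2 * np.pi * 4 * x)   # large detail: low frequency

for sigma in (2.0, 8.0):
    print(f"sigma={sigma}: fine detail amplitude {blur(fine, sigma).max():.2f}, "
          f"coarse detail amplitude {blur(coarse, sigma).max():.2f}")
```

At sigma 2 the fine detail is weakened but still present; at sigma 8 it is essentially gone, while the coarse detail barely changes — the same behavior we see across the blurred bird images.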

Subtract Next

Let’s look at how we can identify which details are visible at a particular scale in a way that the computer can understand. Think back to when we first discussed Image Processing. We discussed that we can do simple math operations on images. For this next section, we’re now going to focus on image subtraction.

When we subtract one image from another, the result is an image representing the differences between the two source images. When a pixel has the same value in both source images, the result is 0 (black). When the values differ, the result is brighter the greater the difference between the two source images. Check out the following example:

When the pixels are the same on both the source images, the result is a black pixel. When they are different, the result is a white pixel.

Any black pixel in the result image indicates that the corresponding pixel values are the same in both source images. With this simple idea in mind, we can see how this can help us find features that are visible at one sigma and not another. Instead of two arbitrary images, we are going to use the same image blurred at two different sigmas and then evaluate the results. Features that are visible in both images will appear black, while features that are only visible at one sigma will show up as white. The difference between each pair of images is the set of details that are visible at a given scale.
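Here is a small sketch of this idea in NumPy, using a synthetic step-edge image instead of a photograph (the blur helper and the sigma values are my own choices, not from any particular library):

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable 2-D Gaussian blur with zero-padded borders.
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

# A synthetic image: dark left half, bright right half (a vertical step edge).
img = np.zeros((64, 64))
img[:, 32:] = 1.0

# Subtract the same image blurred at two different sigmas.
dog = gaussian_blur(img, 1.0) - gaussian_blur(img, 2.0)

flat_response = abs(dog[32, 8])           # deep inside a flat region
edge_response = abs(dog[:, 28:36]).max()  # around the step edge
print(f"flat: {flat_response:.4f}, edge: {edge_response:.4f}")
```

The flat region subtracts to (nearly) zero — black — while the pixels around the edge differ noticeably between the two blurs and light up in the result.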

White pixels in the resulting images are details that are only visible at sigma 0.7 and no longer visible at sigma 1 (Original Image Source: Carlos Spitzer)

Observant readers will notice that the result image resembles the output from the Sobel operation we discussed when talking about Edge Detection. That’s because the Difference of Gaussian extracts Edges as well as Blobs.

We can repeat this process on another two sets of sigmas as seen below.

White pixels in the resulting images are details that are only visible at sigma 2 and no longer visible at sigma 2.8 (Original Image Source: Carlos Spitzer)

The first pair of images is at sigma 0.7 and 1, and the resulting image contains a lot of fine detail in the feathers. The second pair is between sigma 2 and 2.8. You can see that a lot of the feather detail is no longer prominent at sigma 2. Larger details, such as the eye, are clearly visible at both sigma 0.7 and sigma 2.

The magic here is that the Gaussian blur “smooths” out the image. Areas that have very little change in contrast appear similar despite one image having a stronger blur than the other. When there is little change in an area, the result of the subtraction is closer to zero (black). In high-contrast areas (edges, blobs), the strength of the blur has a much bigger impact. The greater the contrast, the more an area changes between the two blurs, and the brighter the result of the subtraction.

We repeat this process numerous times to generate differences at various scales.

When looking for features, we need to work at different scales. To achieve this, we repeatedly subtract adjacent scales to produce many Difference of Gaussian results. (Original Image Source: Carlos Spitzer)
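As a sketch, building the whole stack of differences might look like this in NumPy (the sigma values mirror the ones used in the figures; the blur helper is a simple zero-padded separable convolution of my own, not a library call):

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable 2-D Gaussian blur with zero-padded borders.
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

rng = np.random.default_rng(0)
img = rng.random((64, 64))  # stand-in for a grayscale photo

sigmas = [0.7, 1.0, 1.4, 2.0, 2.8]  # each roughly sqrt(2) times the previous
blurred = [gaussian_blur(img, s) for s in sigmas]

# One Difference of Gaussian image per adjacent pair of sigmas.
dog_stack = [a - b for a, b in zip(blurred, blurred[1:])]
print(f"{len(sigmas)} blurred images -> {len(dog_stack)} DoG images")
```

Five blurred images yield four difference images, each one capturing the details that live between its pair of scales.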

SIDENOTE: The Difference of Gaussian technique evolved from another technique called the Laplacian of Gaussian (LoG). It turns out that subtracting two Gaussian-blurred images is a fast, less compute-intensive approximation of the Laplacian of Gaussian calculation.
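We can check this claim numerically in 1-D. Since the Gaussian satisfies ∂G/∂σ = σ∇²G, the difference G(x, kσ) − G(x, σ) should be approximately (k − 1)σ² times the Laplacian of Gaussian; the values of σ and k below are arbitrary choices for the demonstration:

```python
import numpy as np

sigma, k = 1.6, 1.1          # arbitrary base sigma and scale ratio
x = np.linspace(-8, 8, 401)

def G(x, s):
    # 1-D Gaussian with unit area.
    return np.exp(-x**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

dog = G(x, k * sigma) - G(x, sigma)

# Analytic second derivative (Laplacian) of the Gaussian.
log = (x**2 - sigma**2) / sigma**4 * G(x, sigma)
approx = (k - 1) * sigma**2 * log

corr = np.corrcoef(dog, approx)[0, 1]
print(f"correlation between DoG and scaled LoG: {corr:.4f}")
```

The two curves are nearly identical in shape, which is why the cheap subtraction can stand in for the more expensive Laplacian.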

Find the Features

At this point, some of you may have noticed that we have a stack of images resulting from subtracting images at different Gaussian blurs, but we don’t yet have a list of features — as we did when we used the Harris Corner detector. Let’s see how we can turn these images into a list of features we can start working with.

To start, we have a stack of images with Gaussian blurs at different scales (remember, we’re using the Sigma value as a scale value) — let’s make sure we sort the images based on the scale value.

We are going to skip the image at the highest scale and start with the second image (as we will see shortly, we need a neighboring scale on both sides of the one we are examining). It’s ideal to start at the high end of the scale range because those images are smallest in pixel resolution. Next, we are going to look at each of the pixels in this image. For every pixel, we are going to look at its neighbors — the eight pixels immediately around a given pixel. We will evaluate whether this pixel is a local maximum by comparing its value against its eight neighbor pixels and determining if it has the highest value. Likewise, we determine that this pixel is a local minimum if it has the lowest value compared to its neighbors. Pixel values that are either a local maximum or a local minimum are referred to as local extrema.

This is where things get really interesting! Once we determine that a particular pixel is a local extremum at a particular scale, we then compare that pixel with the corresponding pixel in the previous scale image and the next scale image, along with all of their neighbors.

For example, let’s say that we’ve determined that the pixel at position (4,5) is a local extremum for a given scale. We compare the value of our pixel against the value of the pixel at the same location (4,5), and all of its neighbors, in the previous scale. We then do the same for the following scale. All-in-all, we compare each pixel against its eight neighbors in the same scale and nine pixels in each of the previous and next scales. If our pixel is still the local extremum when compared to all of its closest neighbors across three scales, then we have an x and y position for this feature along with a scale value.

The pixels that are local extrema on their own scale and both of the adjacent scales after the Difference of Gaussian has been calculated are great feature candidates

We now repeat this process for the next DoG image scale. This allows us to find features that are local extrema at each of the scales we started out with.
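A minimal sketch of this 3×3×3 extrema search, assuming the DoG images have been stacked into one 3-D NumPy array (the function name and threshold are my own inventions):

```python
import numpy as np

def find_extrema(dog_stack, threshold=0.03):
    # dog_stack: 3-D array of shape (scale, y, x) holding the DoG images.
    keypoints = []
    n_scales, h, w = dog_stack.shape
    for s in range(1, n_scales - 1):          # need a scale on each side
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                value = dog_stack[s, y, x]
                if abs(value) < threshold:
                    continue                  # too weak to bother with
                # The 3x3x3 cube: 8 same-scale neighbors plus 9 pixels
                # in each of the previous and next scales.
                cube = dog_stack[s-1:s+2, y-1:y+2, x-1:x+2]
                if value == cube.max() or value == cube.min():
                    keypoints.append((x, y, s))
    return keypoints

# Tiny synthetic stack with one bright spike planted in the middle scale.
stack = np.zeros((3, 9, 9))
stack[1, 4, 4] = 1.0
print(find_extrema(stack))
```

The triple loop makes the 26-neighbor comparison explicit; a real implementation would vectorize this, but the logic is the same.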

SIDENOTE: The approach of starting at the smallest resolution (highest scale) and working towards larger resolutions (at lower scales) is referred to as a coarse-to-fine search. This is generally considered a performance optimization since operations on images at a smaller pixel resolution are faster than on their larger counterparts.

We’ve made a lot of progress, but we’re still not done with finding our scale-invariant features. At this point, we have a list of local extrema that are defined by an x and y position along with a scale value represented by σ. Chances are, our collection of potential feature points includes features which are not robust enough to be useful. The next several steps will help us weed out those weak features. First, there is usually a thresholding step: if the magnitude of the extremum is less than a specific value, it is rejected. This allows us to hold onto only high-contrast features. The DoG method also has a tendency to highlight edges and, as we’ve already discussed, edges do not make for great features. To remove edge features, most scale-invariant algorithms follow up with a pass that is similar to the Harris Corner detector, discarding edge-like features and concentrating on features which we can use reliably later on.
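The edge-rejection pass can be sketched with a 2×2 Hessian ratio test in the spirit of the Harris detector (this particular formulation follows the style popularized by SIFT; the threshold r = 10 is a conventional choice, not something fixed by the method):

```python
import numpy as np

def is_edge_like(dog, y, x, r=10.0):
    # 2x2 Hessian of the DoG image, estimated with finite differences.
    dxx = dog[y, x + 1] - 2 * dog[y, x] + dog[y, x - 1]
    dyy = dog[y + 1, x] - 2 * dog[y, x] + dog[y - 1, x]
    dxy = (dog[y + 1, x + 1] - dog[y + 1, x - 1]
           - dog[y - 1, x + 1] + dog[y - 1, x - 1]) / 4.0
    tr = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:
        return True  # curvatures disagree in sign: not a blob
    # Edges have one large and one small principal curvature,
    # which makes the trace^2 / det ratio blow up.
    return tr ** 2 / det >= (r + 1) ** 2 / r

blob = np.zeros((5, 5)); blob[2, 2] = 1.0    # curved in both directions
ridge = np.zeros((5, 5)); ridge[2, :] = 1.0  # curved in only one direction
print(is_edge_like(blob, 2, 2), is_edge_like(ridge, 2, 2))
```

The blob-like peak passes the test while the ridge, which only curves in one direction, is flagged as an edge and rejected.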

Once we have filtered out the low-contrast and edge features, we should have a robust list of features, each of which includes a scale.

The middle of each circle is the location of a feature. The size of the circle corresponds to the scale at which the feature was found. (Original Image Source: Carlos Spitzer)

The above image illustrates the results of our Difference of Gaussian algorithm. The center of each circle is the location of a feature. The size of the circle corresponds to the scale at which the feature was found. The larger circles are pointing out features which are still visible even after the image has been blurred significantly. Smaller circles are showing us features which are only prominent in weaker blurs. Notice that a lot of the bark features are represented by very tiny circles — that’s because the detail in the bark is only visible at the smallest scales.

Steps to the Difference of Gaussian Blob Detector

Much like we did with the Harris Corner detector, let’s summarize all of the steps involved in detecting scale-invariant features using the Difference of Gaussian Blob detector:

1. Convert the image to grayscale to focus on intensity rather than individual color channels
2. Create a Gaussian Image Pyramid by running a Gaussian filter and subsampling
3. Repeatedly run a Gaussian blur filter through the Gaussian Image Pyramid to create additional scale steps
4. Calculate the difference between each consecutive pair of blurred images
5. Starting with the smallest image (largest scale), determine which pixel values are extrema (the smallest and largest values in their neighborhood)
6. For every extremum, compare it against its neighbors in both the previous and following scales, and only keep extrema that remain extrema across the adjacent scales
7. Filter out potential feature points whose extremum does not meet a threshold value
8. Filter out potential feature points that correspond to edges
9. The resulting feature points should be robust and are available in scale-space
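Putting the steps above together (minus the pyramid subsampling, which is omitted to keep things short), a toy end-to-end sketch in NumPy might look like this; every name, sigma, and threshold here is my own choice for illustration:

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable 2-D Gaussian blur with zero-padded borders.
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def dog_features(gray, sigmas=(0.7, 1.0, 1.4, 2.0, 2.8), threshold=0.05):
    # Blur at each sigma, subtract adjacent pairs, then hunt for 3x3x3 extrema.
    blurred = np.stack([gaussian_blur(gray, s) for s in sigmas])
    dogs = blurred[:-1] - blurred[1:]
    features = []
    for s in range(1, dogs.shape[0] - 1):
        for y in range(1, gray.shape[0] - 1):
            for x in range(1, gray.shape[1] - 1):
                v = dogs[s, y, x]
                if abs(v) < threshold:
                    continue  # low-contrast rejection (the threshold step)
                cube = dogs[s-1:s+2, y-1:y+2, x-1:x+2]
                if v == cube.max() or v == cube.min():
                    features.append((x, y, sigmas[s]))
    return features

# A soft Gaussian blob of radius ~1.5 pixels on a dark background.
yy, xx = np.mgrid[0:32, 0:32]
img = np.exp(-((xx - 16.0)**2 + (yy - 16.0)**2) / (2 * 1.5**2))
features = dog_features(img)
print(features)
```

The blob’s characteristic size falls between sigma 1 and 2, so it is reported at a mid-range scale; the edge-rejection pass would normally run as a final step after this.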

TLDR

The Difference of Gaussian (DoG) technique is a highly influential approach to detecting blob features in a scale-invariant manner. The process involves blurring an image with Gaussian convolutions at various sigmas and then subtracting adjacent images from each other. The resulting images highlight edge and blob features. We then look at all of the pixels across all of the scales to find the local extrema — the pixels which represent the greatest amount of change. The last step is to filter out any unwanted features using thresholding and edge detection techniques.

Sources and More Info