Still using the original, plain ole’ implementation of SIFT by David Lowe?

Well, according to Arandjelovic and Zisserman in their 2012 paper, Three things everyone should know to improve object retrieval, you’re selling yourself (and your accuracy) short by using the original implementation.

Instead, you should be utilizing a simple extension to SIFT, called RootSIFT, that can dramatically improve object recognition, quantization, and retrieval accuracy.

Whether you’re matching descriptors of regions surrounding keypoints, clustering SIFT descriptors using k-means, or building a bag of visual words model, the RootSIFT extension can be used to improve your results.

Best of all, the RootSIFT extension sits on top of the original SIFT implementation and does not require changes to the original SIFT source code.

You do not have to recompile or modify your favorite SIFT implementation to utilize the benefits of RootSIFT.

So if you’re using SIFT regularly in your computer vision applications, but have yet to level-up to RootSIFT, read on.

This blog post will show you how to implement RootSIFT in Python and OpenCV without (1) having to change a single line of code in the original OpenCV SIFT implementation or (2) having to compile the entire library.

Sound interesting? Check out the rest of this blog post to learn how to implement RootSIFT in Python and OpenCV.


OpenCV and Python versions:

In order to run this example, you’ll need Python 2.7 and OpenCV 2.4.X.

Why RootSIFT?

It is well known that when comparing histograms, the Euclidean distance often yields inferior performance compared to the chi-squared distance or the Hellinger kernel [Arandjelovic et al. 2012].
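For reference, here is a minimal sketch of the three comparisons mentioned above, written with NumPy. The function names and the toy histograms are my own, and the chi-squared form shown is one common variant, so treat this as an illustration rather than the definitions used in the paper.

```python
import numpy as np

def euclidean_distance(p, q):
    # straight-line distance between two histograms
    return float(np.linalg.norm(p - q))

def chi_squared_distance(p, q, eps=1e-10):
    # one common form: 0.5 * sum((p - q)^2 / (p + q)); eps avoids 0/0
    return float(0.5 * np.sum((p - q) ** 2 / (p + q + eps)))

def hellinger_kernel(p, q):
    # a similarity, not a distance: sum of sqrt(p_i * q_i), which is 1
    # for identical L1-normalized histograms
    return float(np.sum(np.sqrt(p * q)))

# two toy L1-normalized histograms (hypothetical values)
p = np.array([0.5, 0.5, 0.0, 0.0])
q = np.array([0.25, 0.25, 0.25, 0.25])

print("euclidean=%.4f" % euclidean_distance(p, q))
print("chi-squared=%.4f" % chi_squared_distance(p, q))
print("hellinger=%.4f" % hellinger_kernel(p, q))
```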

And if this is the case, why do we often use the Euclidean distance to compare SIFT descriptors when matching keypoints? Or clustering SIFT descriptors to form a codebook? Or quantizing SIFT descriptors to form a bag of visual words?

Remember, while the original SIFT papers discuss comparing descriptors using the Euclidean distance, SIFT is still a histogram itself — and wouldn’t other distance metrics offer greater accuracy?

It turns out, the answer is yes. And instead of comparing SIFT descriptors using a different metric we can instead modify the 128-dim descriptor returned from SIFT directly.

You see, Arandjelovic et al. suggest a simple algebraic extension to the SIFT descriptor itself, called RootSIFT, that allows SIFT descriptors to be “compared” using a Hellinger kernel while still utilizing the Euclidean distance.
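To see why this works, here is a short derivation. It assumes x and y are non-negative, L1-normalized SIFT vectors, and the symbols x, y, and H are my own notation:

```latex
% Hellinger kernel between two L1-normalized, non-negative vectors
H(x, y) = \sum_{i=1}^{128} \sqrt{x_i \, y_i}

% Squared Euclidean distance between the element-wise square roots
\left\lVert \sqrt{x} - \sqrt{y} \right\rVert_2^2
  = \sum_{i} x_i + \sum_{i} y_i - 2 \sum_{i} \sqrt{x_i \, y_i}
  = 2 - 2\,H(x, y)
```

Since the two sums of x and y are each 1, the Euclidean distance between the square-rooted vectors depends only on H(x, y). A small Euclidean distance therefore means a large Hellinger kernel value, and vice versa.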

Here is the simple algorithm to extend SIFT to RootSIFT:

Step 1: Compute SIFT descriptors using your favorite SIFT library.

Step 2: L1-normalize each SIFT vector.

Step 3: Take the square root of each element in the SIFT vector. Then the vectors are L2-normalized.
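The steps above can be sketched in a few lines of NumPy, independent of any particular SIFT library. The function name rootsift_from_sift and the random stand-in for real SIFT output are assumptions made for illustration only.

```python
import numpy as np

def rootsift_from_sift(descs, eps=1e-7):
    # descs: (N, 128) array of non-negative SIFT descriptors (Step 1 output)
    descs = np.asarray(descs, dtype=np.float32)
    # Step 2: L1-normalize each row
    descs = descs / (descs.sum(axis=1, keepdims=True) + eps)
    # Step 3: element-wise square root, then L2-normalize each row
    descs = np.sqrt(descs)
    descs = descs / (np.linalg.norm(descs, axis=1, keepdims=True) + eps)
    return descs

# stand-in for real SIFT output: random non-negative values (an assumption)
fake_sift = np.random.RandomState(0).randint(0, 256, size=(5, 128))
root = rootsift_from_sift(fake_sift)
print("shape=%s" % (root.shape,))
```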

That’s it!

It’s a simple extension, but this little modification can dramatically improve results. Whether you’re matching keypoints, clustering SIFT descriptors, or quantizing to form a bag of visual words, Arandjelovic et al. have shown that RootSIFT can be used in all scenarios that SIFT is, while improving results.

In the rest of this blog post, I’ll show you how to implement RootSIFT using Python and OpenCV. Using this implementation, you’ll be able to incorporate RootSIFT into your own applications — and improve your results!

Implementing RootSIFT in Python and OpenCV

Open up your favorite editor, create a new file and name it rootsift.py , and let’s get started:

    # import the necessary packages
    import numpy as np
    import cv2

    class RootSIFT:
        def __init__(self):
            # initialize the SIFT feature extractor
            self.extractor = cv2.DescriptorExtractor_create("SIFT")

        def compute(self, image, kps, eps=1e-7):
            # compute SIFT descriptors
            (kps, descs) = self.extractor.compute(image, kps)

            # if there are no keypoints or descriptors, return an empty tuple
            if len(kps) == 0:
                return ([], None)

            # apply the Hellinger kernel by first L1-normalizing and taking the
            # square-root
            descs /= (descs.sum(axis=1, keepdims=True) + eps)
            descs = np.sqrt(descs)
            #descs /= (np.linalg.norm(descs, axis=1, ord=2) + eps)

            # return a tuple of the keypoints and descriptors
            return (kps, descs)

The first thing we’ll do is import our necessary packages. We’ll use NumPy for numerical processing and cv2 for our OpenCV bindings.

We then define our RootSIFT class on Line 5 and the constructor on Lines 6-8. The constructor simply initializes the OpenCV SIFT descriptor extractor.

The compute function on Line 10 then handles the computation of the RootSIFT descriptor. This function requires two arguments and an optional third argument.

The first argument to the compute function is the image that we want to extract RootSIFT descriptors from. The second argument is the list of keypoints, or local regions, from where the RootSIFT descriptors will be extracted. And finally, an epsilon variable, eps , is supplied to prevent any divide-by-zero errors.

From there, we extract the original SIFT descriptors on Line 12.

We make a check on Lines 15 and 16: if there are no keypoints or descriptors, we simply return a tuple containing an empty list and None.

Converting the original SIFT descriptors to RootSIFT descriptors takes place on Lines 20-22.

We first L1-normalize each vector in the descs array (Line 20).

From there, we take the square root of each element in the SIFT vector (Line 21).
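The listing leaves its L2-normalization line commented out. The sketch below is my own reading rather than the author's stated reasoning. It suggests that after L1-normalizing and taking the square root, each row already has roughly unit L2 norm, so the explicit step changes very little. It also shows that if you do enable the step, passing keepdims=True to np.linalg.norm keeps the norms in shape (N, 1) so that the division broadcasts row by row. The random descriptors are a stand-in for real SIFT output.

```python
import numpy as np

eps = 1e-7
# hypothetical stand-in for SIFT output: 4 non-negative 128-dim rows
descs = np.random.RandomState(42).rand(4, 128).astype(np.float32)

# L1-normalize, then take the element-wise square root (as in the listing)
descs /= (descs.sum(axis=1, keepdims=True) + eps)
descs = np.sqrt(descs)

# each row already has (approximately) unit L2 norm at this point
norms_before = np.linalg.norm(descs, axis=1)

# explicit L2-normalization; keepdims=True keeps shape (N, 1) so the
# division broadcasts across the 128 columns of each row
descs /= (np.linalg.norm(descs, axis=1, keepdims=True) + eps)
norms_after = np.linalg.norm(descs, axis=1)

print("before: %s" % norms_before)
print("after:  %s" % norms_after)
```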

Lastly, all we have to do is return the tuple of keypoints and RootSIFT descriptors to the calling function on Line 25.
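The class above targets OpenCV 2.4.X, where cv2.DescriptorExtractor_create is available. If you are on a newer OpenCV, one possible adaptation is to pass the extractor in rather than create it inside the class. The sketch below assumes only that the extractor exposes compute(image, kps) returning (kps, descs). The FakeExtractor is a hypothetical stand-in so the example runs without OpenCV installed.

```python
import numpy as np

class RootSIFT(object):
    def __init__(self, extractor):
        # any object with compute(image, kps) -> (kps, descs), for example
        # cv2.SIFT_create() on OpenCV 4.4+ (an assumption, not from this post)
        self.extractor = extractor

    def compute(self, image, kps, eps=1e-7):
        (kps, descs) = self.extractor.compute(image, kps)
        # nothing to convert if no descriptors came back
        if descs is None or len(kps) == 0:
            return ([], None)
        # L1-normalize each row, then take the element-wise square root
        descs = descs / (descs.sum(axis=1, keepdims=True) + eps)
        return (kps, np.sqrt(descs))


class FakeExtractor(object):
    # hypothetical stand-in so the sketch runs without OpenCV installed
    def compute(self, image, kps):
        rng = np.random.RandomState(1)
        return (kps, rng.rand(len(kps), 128).astype(np.float32))

rs = RootSIFT(FakeExtractor())
(kps, descs) = rs.compute(None, ["kp0", "kp1", "kp2"])
print("RootSIFT: kps=%d, descriptors=%s" % (len(kps), descs.shape))
```

With OpenCV 4.4 or later, usage would look something like sift = cv2.SIFT_create(), then kps = sift.detect(gray, None), then RootSIFT(sift).compute(gray, kps). I have not tested this against every OpenCV release.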

Running RootSIFT

To actually see RootSIFT in action, open up a new file, name it driver.py , and we’ll explore how to extract SIFT and RootSIFT descriptors from images:

    # import the necessary packages
    from rootsift import RootSIFT
    import cv2

    # load the image we are going to extract descriptors from and convert
    # it to grayscale
    image = cv2.imread("example.png")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # detect Difference of Gaussian keypoints in the image
    detector = cv2.FeatureDetector_create("SIFT")
    kps = detector.detect(gray)

    # extract normal SIFT descriptors
    extractor = cv2.DescriptorExtractor_create("SIFT")
    (kps, descs) = extractor.compute(gray, kps)
    print "SIFT: kps=%d, descriptors=%s " % (len(kps), descs.shape)

    # extract RootSIFT descriptors
    rs = RootSIFT()
    (kps, descs) = rs.compute(gray, kps)
    print "RootSIFT: kps=%d, descriptors=%s " % (len(kps), descs.shape)

On Lines 2 and 3 we import our RootSIFT descriptor along with our OpenCV bindings.

We then load our example image, convert it to grayscale, and detect Difference of Gaussian keypoints on Lines 7-12.

From there, we extract the original SIFT descriptors on Lines 15-17.

And we extract the RootSIFT descriptors on Lines 20-22.

To execute our script, simply issue the following command:

$ python driver.py

Your output should look like this:

SIFT: kps=1006, descriptors=(1006, 128)
RootSIFT: kps=1006, descriptors=(1006, 128)

As you can see, we have extracted 1,006 DoG keypoints. And for each keypoint we have extracted 128-dim SIFT and RootSIFT descriptors.

From here, you can take this RootSIFT implementation and apply it to your own applications, including keypoint and descriptor matching, clustering descriptors to form centroids, and quantizing to create a bag of visual words model — all of which we will cover in future posts.
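As a small taste of the first use case, here is a rough sketch of nearest-neighbor descriptor matching using the plain Euclidean distance. The match_nearest helper and the synthetic unit-norm descriptors are hypothetical. Real matching pipelines typically add filtering steps that are not shown here.

```python
import numpy as np

def match_nearest(query, train):
    # pairwise squared Euclidean distances, shape (len(query), len(train))
    d2 = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    # index of the closest train descriptor for each query descriptor
    idx = d2.argmin(axis=1)
    return idx, np.sqrt(d2[np.arange(len(query)), idx])

# hypothetical unit-norm descriptors standing in for RootSIFT output:
# square roots of L1-normalized non-negative vectors
rng = np.random.RandomState(7)
train = np.sqrt(rng.dirichlet(np.ones(128), size=6))
query = train[[4, 1, 3]]  # exact copies, so each should match itself

idx, dist = match_nearest(query, train)
print("matches=%s" % idx)
```

Because RootSIFT descriptors are compared with the ordinary Euclidean distance, any existing SIFT matching code along these lines should work on them unchanged.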

Summary

In this blog post, I showed you how to extend the original OpenCV SIFT implementation by David Lowe to create the RootSIFT descriptor, a simple extension suggested by Arandjelovic and Zisserman in their 2012 paper, Three things everyone should know to improve object retrieval.

The RootSIFT extension does not require you to modify the source of your favorite SIFT implementation — it simply sits on top of the original implementation.

The simple 3-step process to compute RootSIFT is:

Step 1: Compute SIFT descriptors using your favorite SIFT library.

Step 2: L1-normalize each SIFT vector.

Step 3: Take the square root of each element in the SIFT vector. Then the vectors are L2-normalized.

No matter if you are using SIFT to match keypoints, form cluster centers using k-means, or quantize SIFT descriptors to form a bag of visual words, you should definitely consider utilizing RootSIFT rather than the original SIFT to improve your object retrieval accuracy.