The following problem appeared as a project in this Computer Vision course (CS4670/5670, Spring 2015) at Cornell. In this article, a Python implementation is described. The description of the problem is taken (with some modifications) from the project description. The same problem appeared in this assignment as well. The images used for testing the implemented algorithms are mostly taken from these assignments / projects.

The Problem

In this project, we need to detect discriminating features in an image and find the best matching features in other images. The features should be reasonably invariant to translation, rotation, and illumination. The slides presented in class can be used as reference.

Description

The project has three parts: feature detection, feature description, and feature matching.

1. Feature detection

In this step, we need to identify points of interest in the image using the Harris corner detection method. The steps are as follows:

- For each point in the image, consider a window of pixels around that point.
- Compute the Harris matrix $H$ for (the window around) that point, defined as
  $$H = \sum_{p} w_p \begin{pmatrix} I_x^2(p) & I_x(p)\,I_y(p) \\ I_x(p)\,I_y(p) & I_y^2(p) \end{pmatrix},$$
  where the summation is over all pixels $p$ in the window, $I_x(p)$ is the $x$ derivative of the image at point $p$, and the notation is similar for the $y$ derivative. The weights $w_p$ are chosen to be circularly symmetric; a 9×9 Gaussian kernel with sigma 0.5 is used to achieve this. Note that $H$ is a 2×2 matrix.
- To find interest points, first we need to compute the corner strength function
  $$c(H) = \det(H) - \kappa\,(\operatorname{trace}(H))^2.$$
- Once $c$ is computed for every point in the image, we need to choose points where $c$ is above a threshold.
- We also want $c$ to be a local maximum in a 9×9 neighborhood (with non-maximum suppression).
- In addition to computing the feature locations, we need to compute a canonical orientation for each feature, and then store this orientation (in degrees) in each feature element. To compute the canonical orientation at each pixel, we compute the gradient of the blurred image and use the angle of the gradient as the orientation.
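The following is a minimal Python sketch of this detection step, assuming a grayscale image stored as a 2-D NumPy float array; the function name `harris_corner_strength` and the use of `scipy.ndimage` are illustrative choices, not the project's actual code.

```python
import numpy as np
from scipy import ndimage

def harris_corner_strength(img, sigma=0.5, kappa=0.04):
    """Per-pixel Harris corner strength and canonical orientation
    (a sketch of the steps above, not the project's exact code)."""
    # x and y derivatives of the image
    Ix = ndimage.sobel(img, axis=1, mode='reflect')
    Iy = ndimage.sobel(img, axis=0, mode='reflect')

    # entries of the Harris matrix H, each weighted with a circularly
    # symmetric Gaussian window (sigma = 0.5)
    Ixx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Ixy = ndimage.gaussian_filter(Ix * Iy, sigma)
    Iyy = ndimage.gaussian_filter(Iy * Iy, sigma)

    # corner strength c(H) = det(H) - kappa * trace(H)^2 at every pixel
    c = (Ixx * Iyy - Ixy ** 2) - kappa * (Ixx + Iyy) ** 2

    # canonical orientation (in degrees): the angle of the gradient
    # of the blurred image
    blurred = ndimage.gaussian_filter(img, sigma)
    bx = ndimage.sobel(blurred, axis=1, mode='reflect')
    by = ndimage.sobel(blurred, axis=0, mode='reflect')
    orientation = np.degrees(np.arctan2(by, bx))
    return c, orientation
```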

2. Feature description

Now that the points of interest are identified, the next step is to come up with a descriptor for the feature centered at each interest point. This descriptor will be the representation used to compare features in different images to see if they match. In this article, we shall implement a simple descriptor: an 8×8 square window without orientation. This should be very easy to implement and should work well when the images we're comparing are related by a translation. We also normalize the window to have zero mean and unit variance, in order to obtain illumination invariance. Rotational invariance can be obtained with the MOPS descriptor, which takes the orientation into account; it is not discussed in this article for the time being.
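A minimal sketch of this descriptor, assuming `img` is a 2-D grayscale NumPy array and the 8×8 window lies fully inside the image (the helper name `simple_descriptor` is illustrative):

```python
import numpy as np

def simple_descriptor(img, x, y, size=8):
    """Extract an 8x8 window centered near (x, y) and normalize it to
    zero mean and unit variance for illumination invariance."""
    half = size // 2
    window = img[y - half:y + half, x - half:x + half].astype(float)
    window -= window.mean()
    std = window.std()
    if std > 1e-8:            # avoid division by ~0 in flat regions
        window /= std
    return window.ravel()     # 64-dimensional feature vector
```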

3. Feature matching

Now that the features in the images are detected and described, the next step is to write code to match them, i.e., given a feature in one image, find the best matching feature in one or more other images. The simplest approach is the following: write a procedure that compares two features and outputs a distance between them. For example, we simply sum the absolute values of the differences between the descriptor elements; the distance used here is the Manhattan distance. We then use this distance to compute the best match between a feature in one image and the set of features in another image by finding the one with the smallest distance.
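A sketch of this matching step, assuming `desc1` and `desc2` are sequences of 1-D descriptor vectors (NumPy arrays) computed per image; the names are illustrative:

```python
import numpy as np

def match_features(desc1, desc2):
    """For each descriptor in desc1, find the descriptor in desc2 with
    the smallest Manhattan (sum of absolute differences) distance."""
    matches = []
    for i, d1 in enumerate(desc1):
        # Manhattan distance from d1 to every descriptor of the other image
        dists = [np.sum(np.abs(d1 - d2)) for d2 in desc2]
        j = int(np.argmin(dists))
        matches.append((i, j, dists[j]))   # (index1, index2, distance)
    return matches
```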

The theory and math for Harris corner detection used below are taken from this YouTube video.

The following figure shows the structure of the Python code used to implement the algorithm.

Feature Detection (with Harris Corner Detection): Results on a few images

The threshold used for Harris corner detection is varied, as shown in the following animations (in red), with the value of the threshold being 10^x, where x (the common logarithm of the threshold) is displayed. The corner strength function $c(H) = \det(H) - \kappa\,(\operatorname{trace}(H))^2$, with $\kappa = 0.04$, is used instead of the minimum eigenvalue of $H$, since it's faster to compute.

As can be seen from the following animations, fewer and fewer corner features are detected as the threshold is increased.

The direction and magnitude of each feature are shown by the bounding (green) square's angle with the horizontal and by the size of the square, respectively, both computed from the gradient matrices.

Input Image

Harris Corner Features Detected for different threshold values (log10)



Input Image



The following figure shows the result of thresholding the Harris corner strength $R$ values and the minimum eigenvalue of the Harris matrix, respectively, for each pixel, before applying non-maximum suppression (computing the local maximum).
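A small sketch of the thresholding and 9×9 non-maximum suppression step, assuming `c` is the per-pixel corner-strength array from the detection sketch above:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def select_interest_points(c, threshold):
    """Keep pixels whose corner strength exceeds the threshold and is
    the maximum within a 9x9 neighborhood (non-maximum suppression)."""
    local_max = (c == maximum_filter(c, size=9))
    ys, xs = np.nonzero((c > threshold) & local_max)
    return list(zip(xs, ys))   # (x, y) interest-point coordinates
```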

The next animation shows the features detected after applying non-maximum suppression, with different threshold values.

Harris Corner Features Detected for different threshold values (log10)



Input Image



Harris Corner Features Detected for different threshold values (log10)

Input Image



Harris Corner Features Detected for different threshold values (log10)

Input Image



Harris Corner Features Detected for different threshold values (log10)



Computing Feature descriptors

In this article, we implement a very simple descriptor: an 8×8 square window without orientation. This is expected to work well when the images being compared are related by a translation. We also normalize the window to have zero mean and unit variance, in order to obtain illumination invariance. Rotational invariance can be obtained with the MOPS descriptor, which takes the orientation into account; it is not discussed in this article for the time being.

Matching Images with Detected Features: Results on a few images

First, the Harris corner features and the simple descriptors are computed for each of the images to be compared. Next, the distance between each pair of corner feature descriptors is computed, by simply summing the absolute values of the differences between the descriptor elements. This distance is then used to compute the best match between a feature in one image and the set of features in another image by finding the one with the smallest distance. The following examples show how the matching works with simple feature descriptors around the Harris corners for images related by translations.
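Putting the sketches above together, a hypothetical end-to-end run on a pair of translated images might look like this; the synthetic test images and the percentile-based threshold are illustrative choices, not part of the original project:

```python
import numpy as np

# synthetic test pair: img2 is a pure translation of img1
rng = np.random.default_rng(0)
img1 = rng.random((128, 128))
img2 = np.roll(img1, shift=(5, 8), axis=(0, 1))

c1, _ = harris_corner_strength(img1)
c2, _ = harris_corner_strength(img2)

# keep only strong local maxima; the 99th percentile is an arbitrary cutoff
pts1 = select_interest_points(c1, np.percentile(c1, 99))
pts2 = select_interest_points(c2, np.percentile(c2, 99))

# drop points too close to the border to fit a full 8x8 window
inside = lambda x, y: 4 <= x < 124 and 4 <= y < 124
pts1 = [(x, y) for x, y in pts1 if inside(x, y)]
pts2 = [(x, y) for x, y in pts2 if inside(x, y)]

desc1 = [simple_descriptor(img1, x, y) for x, y in pts1]
desc2 = [simple_descriptor(img2, x, y) for x, y in pts2]
matches = match_features(desc1, desc2)   # (i, j, distance) triples
```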

Input images (one is a translation of the other)



Harris Corner Features Detected for the images



Matched Features with minimum sum of absolute distance

Input images



Harris Corner Features Detected for the images

Matched Features with minimum sum of absolute distance



The following example shows input images related by more complex transformations (not only translation). As expected, the simple feature descriptor does not work well in this case, as shown. We need feature descriptors like SIFT to obtain robustness against rotation and scaling as well.

Input images



Harris Corner Features Detected for the images



Matched Features with minimum sum of absolute distance

