1. Introduction

When talking about distances, we usually mean the shortest one: for instance, if a point X is said to be at distance D from a polygon P, we generally assume that D is the distance from X to the nearest point of P. The same logic applies to polygons: if two polygons A and B are at some distance from each other, we commonly understand that distance as the shortest one between any point of A and any point of B. Formally, this is called a minimin function, because the distance D between A and B is given by:

D (A, B) = min a∈A { min b∈B d (a, b) } (eq. 1)

This equation reads like a computer program: "for every point a of A, find its smallest distance to any point b of B; finally, keep the smallest distance found among all points a".
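That reading translates almost directly into code. A minimal sketch in Python, for finite point sets (the function name is ours, and points are assumed to be 2D tuples):

```python
import math

def minimin_distance(A, B):
    """Shortest (minimin) distance of eq. 1 between two point
    sets: for every a in A, find its smallest distance to any
    b in B, then keep the smallest value found overall."""
    return min(min(math.dist(a, b) for b in B) for a in A)
```

For example, `minimin_distance([(0, 0), (1, 0)], [(4, 0), (5, 0)])` returns 3.0, the gap between the two closest points.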

That definition of the distance between polygons can be quite unsatisfactory for some applications; consider for example fig. 1. We could say the triangles are close to each other, considering their shortest distance, shown by their red vertices. However, we would naturally expect a small distance between two polygons to mean that no point of one polygon is far from the other polygon. In this sense, the two polygons of fig. 1 are not so close: their furthest points, shown in blue, could actually be very far from the other polygon. Clearly, the shortest distance is totally independent of the polygonal shapes.

Figure 1: The shortest distance doesn't consider the whole shape.



Another example is given by fig. 2, where we have the same two triangles at the same shortest distance as in fig. 1, but in a different position. Clearly, the shortest distance concept carries little information: the distance value did not change from the previous case, while something did change in the arrangement of the objects.

Figure 2: The shortest distance doesn't account for the position of the objects.



As we'll see in the next section, in spite of its apparent complexity, the Hausdorff distance does capture these subtleties, which are ignored by the shortest distance.



2. What is Hausdorff distance?

Named after Felix Hausdorff (1868-1942), the Hausdorff distance is the "maximum distance of a set to the nearest point in the other set" [Rote91]. More formally, the Hausdorff distance from set A to set B is a maximin function, defined as

h (A, B) = max a∈A { min b∈B d (a, b) } (eq. 2)

where a and b are points of sets A and B respectively, and d(a, b) is any metric between these points; for simplicity, we'll take d(a, b) to be the Euclidean distance between a and b. If, for instance, A and B are two sets of points, a brute-force algorithm would be:





Brute force algorithm:

1. h = 0
2. for every point a_i of A,
   2.1 shortest = Inf
   2.2 for every point b_j of B
           d_ij = d(a_i, b_j)
           if d_ij < shortest then
               shortest = d_ij
   2.3 if shortest > h then
           h = shortest

Figure 3: Hausdorff distance on point sets.



This is illustrated in fig. 3 : just click on the arrow to see the basic steps of this computation. This algorithm obviously runs in O(n m) time, with n and m the number of points in each set.
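The brute-force pseudocode above transcribes directly into Python. A sketch (our own function name; points are assumed to be 2D tuples):

```python
import math

def directed_hausdorff(A, B):
    """Brute-force h(A, B) of eq. 2, following fig. 3: for every
    point of A, find its shortest distance to B, and keep the
    largest of those shortest distances.  Runs in O(n m) time."""
    h = 0.0
    for a in A:
        shortest = math.inf
        for b in B:
            d = math.dist(a, b)
            if d < shortest:
                shortest = d
        if shortest > h:
            h = shortest
    return h
```

Note the asymmetry: with A = {(0,0), (3,0)} and B = {(0,0)}, h(A, B) = 3 while h(B, A) = 0.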



It should be noted that the Hausdorff distance is oriented (we could also say asymmetric), which means that in general h(A, B) is not equal to h(B, A). This holds for the example of fig. 3, where h(A, B) = d(a1, b1) while h(B, A) = d(b2, a1). This asymmetry is a property of maximin functions, whereas minimin functions are symmetric.



A more general definition of the Hausdorff distance is:

H (A, B) = max { h (A, B), h (B, A) } (eq. 3)

which defines the Hausdorff distance between A and B, while eq. 2 applies to the Hausdorff distance from A to B (also called the directed Hausdorff distance). The two distances h(A, B) and h(B, A) are sometimes termed the forward and backward Hausdorff distances of A to B. Although the terminology is not yet stable among authors, eq. 3 is usually what is meant by "Hausdorff distance". Unless otherwise mentioned, from now on we will also refer to eq. 3 when saying "Hausdorff distance".
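In code, eq. 3 is a one-line combination of the two directed distances. A self-contained sketch (names are ours; a compact comprehension replaces the explicit loops):

```python
import math

def h(A, B):
    """Directed Hausdorff distance of eq. 2."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def H(A, B):
    """Symmetric Hausdorff distance of eq. 3: the larger of the
    forward and backward directed distances."""
    return max(h(A, B), h(B, A))
```

With A = {(0,0)} and B = {(0,0), (3,0)}, h(A, B) = 0 but H(A, B) = 3: the symmetric form catches the point of B that is far from A.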

If sets A and B are made of lines or polygons instead of single points, then H(A, B) applies to all the points defining those lines or polygons, not only to their vertices. The brute force algorithm can no longer be used to compute the Hausdorff distance between such sets, as they involve an infinite number of points.

So, what about the polygons of fig. 1? Remember, some of their points were close, but not all of them. The Hausdorff distance gives an interesting measure of their mutual proximity, by indicating the maximal distance between any point of one polygon and the other polygon. It does better than the shortest distance, which applies to only one point of each polygon, irrespective of all the other points.







Figure 4: Hausdorff distance shown around the extremum of each triangle of fig. 1. Each circle has a radius of H(P1, P2).



The other concern was the insensitivity of the shortest distance to the position of the polygons: we saw that this distance doesn't take their arrangement into account at all. Here again, the Hausdorff distance has the advantage of being sensitive to position, as shown in fig. 5.







Figure 5: Hausdorff distance for the triangles of fig. 4, at the same shortest distance but in a different position.



3. Computing Hausdorff distance between convex polygons

3.1 Assumptions

Throughout the rest of our discussion, we assume the following facts about polygons A and B:

- Polygons A and B are simple convex polygons;
- Polygons A and B are disjoint from each other, that is:
  - they don't intersect;
  - neither polygon contains the other.









3.2 Lemmas

The algorithm explained in the next section is based on three geometric observations, presented here. To simplify the text, we assume two points a and b, belonging respectively to polygons A and B, such that:





d (a, b) = h (A, B)





In simple words, a is the point of A that is furthest from polygon B, and b is the point of B that is closest to a.

Lemma 1a: The perpendicular to ab at a is a supporting line of A, and A is on the same side as B relative to that line. Proof of lemma 1a





Lemma 1b: The perpendicular to ab at b is a supporting line of B, and a and B are on different sides relative to that line. Proof of lemma 1b





Lemma 2: There is a vertex x of A such that the distance from x to B is equal to h(A, B). Proof of lemma 2





Lemma 3: Let b_i be the closest point of B from a vertex a_i of A. If µ is the moving direction (clockwise or counterclockwise) from b_i to b_i+1, then, over a complete cycle through all vertices of A, µ changes no more than twice. Proof of lemma 3







3.3 Algorithm

The algorithm presented here was proposed by [Atallah83]. Its basic strategy is to compute h(A, B) and h(B, A) successively; thanks to lemma 2, there is no need to examine every point of the starting polygon, only its vertices.





An important fact used by this algorithm is that a closest point can only be a vertex of the target polygon, or the foot z of a perpendicular dropped from the source point onto one of its edges.



This fact suggests a function to check for the existence of a possible closest point. Given a source point a and a target edge defined by a point b_1 and a vertex b_2:



Function z = CheckForClosePoint(a, b_1, b_2):

    Compute the position z where the line through b_1 and b_2 crosses its perpendicular through a;
    if z lies between b_1 and b_2 then return z;
    else compute at b_2 a line P perpendicular to the line a b_2;
        if P is a supporting line of B then return b_2;
        else return NULL.







That function uses lemma 1b to decide whether or not the closest point of B might be located on the target edge, which should be close to a. It also supposes that the source point a and b_2 are not located on different sides of the perpendicular to [b_1 b_2] at b_1, in accordance with lemma 3.
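A geometric sketch of that function in Python, under the stated assumptions (points as 2D tuples; the vertex list of polygon B is passed explicitly for the supporting-line test, which the pseudocode leaves implicit; the name and tolerance are ours):

```python
def check_for_close_point(a, b1, b2, B):
    """Project a onto the line (b1, b2); if the foot z falls on
    the segment, return it.  Otherwise, test whether the line
    perpendicular to (a, b2) at b2 supports B, i.e. whether all
    vertices of B lie on the side opposite to a; if so return b2
    (per lemma 1b), else return None."""
    ex, ey = b2[0] - b1[0], b2[1] - b1[1]          # edge vector b1 -> b2
    # Parameter t of the orthogonal projection of a onto the edge line.
    t = ((a[0] - b1[0]) * ex + (a[1] - b1[1]) * ey) / (ex * ex + ey * ey)
    if 0.0 <= t <= 1.0:                            # foot z lies between b1 and b2
        return (b1[0] + t * ex, b1[1] + t * ey)
    # Supporting-line test at b2: every vertex v of B must satisfy
    # (v - b2) . (a - b2) <= 0, i.e. lie across the perpendicular from a.
    dx, dy = a[0] - b2[0], a[1] - b2[1]
    if all((v[0] - b2[0]) * dx + (v[1] - b2[1]) * dy <= 1e-12 for v in B):
        return b2
    return None
```

For instance, with a = (0, 2) above the edge from (-1, 0) to (1, 0), the foot of the perpendicular is (0, 0) and is returned; with a = (3, 0), the foot falls outside the edge and the vertex b2 is returned when it supports B.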



Now we are ready for the main algorithm; the vertices of both polygons are presumed to be enumerated counterclockwise:







Algorithm for computing h(A, B):

1. From a_1, find the closest point b_1 and compute d_1 = d(a_1, b_1)
2. h(A, B) = d_1
3. for each vertex a_i of A,
   3.1 if a_i+1 is to the left of a_i b_i
           find b_i+1, scanning B counterclockwise with CheckForClosePoint from b_i
       if a_i+1 is to the right of a_i b_i
           find b_i+1, scanning B clockwise with CheckForClosePoint from b_i
       if a_i+1 is anywhere on a_i b_i
           b_i+1 = b_i
   3.2 Compute d_i+1 = d(a_i+1, b_i+1)
   3.3 h(A, B) = max { h(A, B), d_i+1 }
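The left/right decisions of step 3.1 reduce to the sign of a cross product, a standard orientation test. A minimal helper (our naming; points as 2D tuples):

```python
def side(p, q, r):
    """Signed area test: which side of the directed line p -> q
    the point r lies on.  Positive means left, negative means
    right, zero means r is on the line."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
```

In step 3.1, calling `side(a_i, b_i, a_i1)` and branching on its sign selects the scanning direction for B.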



Complexity

If polygons A and B respectively have n and m vertices, then:

- Step 1 can clearly be done in O(m) time;
- Step 2 takes constant time O(1);
- Step 3 is executed (n-1) times, that is, O(n) times;
- Step 3.1 is executed no more than 2m times in total; this is a consequence of lemma 3, which guarantees that polygon B cannot be scanned more than twice;
- Steps 3.2 and 3.3 are done in constant time O(1).

The total is thus O(m) + O(n) + O(2m) = O(n + m).



To find H(A, B), the algorithm needs to be executed twice; the total complexity for computing the Hausdorff distance thus remains linear, O(n + m).





3.5 Interactive applet

This applet illustrates the algorithm for computing h(A, B). You only need to draw two polygons, then press the "step" or "run" button. Left-click to define a new vertex, and close the polygon by clicking near the first vertex. Polygon A is the first one you draw, in green; polygon B appears next, in red.



The algorithm was slightly modified to make it more visually appealing. Even though this algorithm is intended for two polygons totally separated from each other, it also works when B is inside A. However, it won't work if A is inside B, or when A and B partially intersect. You're welcome to try these cases anyway, to see what happens!



When defining your polygons, you will see a yellow area indicating where you can add the next vertex so that the polygon remains convex. The applet won't let you define a non-convex polygon.



Please note that the first time you draw the second half of a polygon, you will have to wait a few seconds while the Jama package loads.







Java Applet

4. Application examples

One of the main applications of the Hausdorff distance is image matching, used for instance in image analysis, visual navigation of robots, computer-assisted surgery, etc. Basically, the Hausdorff metric serves to check whether a template image is present in a test image: the lower the distance value, the better the match. The method gives interesting results, even in the presence of noise or occlusion (when the target is partially hidden).



Say the small image below is our template, and the large one is the test image:





We want to find whether, and where, the small image is present in the large image. The first step is to extract the edges of both images, so as to work with binary sets of points, lines or polygons:





Edge extraction is usually done with one of the many edge detectors known in image processing, such as the Canny edge detector, the Laplacian, or the Sobel operator. After applying Rucklidge's algorithm, which minimizes the Hausdorff distance between two images, the computer found the best match:







For this example, at least 50% of the template points had to lie within 1 pixel of a test image point, and vice versa. Some scaling and skew were also allowed, to prevent rejection due to a different viewing angle of the template in the test image (these images and results come from Michael Leventon's pages). Other algorithms allow more complicated geometric transformations for registering the template on the test image.
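That "at least 50% of the points within some distance" criterion corresponds to a ranked, or partial, variant of the directed Hausdorff distance: instead of taking the maximum of the point-to-set distances, one takes the value below which a given fraction of them fall, which makes the measure robust to occlusion and outliers. A sketch of that idea (names, signature and default fraction are ours, not Rucklidge's exact formulation):

```python
import math

def partial_directed_hausdorff(A, B, fraction=0.5):
    """Ranked directed Hausdorff distance: the smallest value r
    such that the given fraction of the points of A lie within
    distance r of B.  fraction=1.0 recovers the ordinary h(A, B)."""
    dists = sorted(min(math.dist(a, b) for b in B) for a in A)
    k = max(0, math.ceil(fraction * len(dists)) - 1)  # index of the ranked value
    return dists[k]
```

With A = {(0,0), (1,0), (10,0), (11,0)} and B = {(0,0)}, the full directed distance is 11, but the 50%-partial distance is only 1: half of A's points are within 1 of B, and the two far outliers are ignored.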



In spite of my interest in the topic, an online demo is definitely beyond the scope of this Web project! So here are some Web resources about image matching with the Hausdorff distance:





Glossary

References