Finding rectangles

This post is based on a part of my masters thesis. The topic of my thesis was OCR of historical documents. A problem that came up there was the following:

Given a binary image, find the largest axis aligned rectangle that consists only of foreground pixels.

These largest rectangles can be used, for instance, to find columns in a page of text. Although in that case one would use large rectangles of background pixels.

Here is an example image,

.

White pixels are background and blue is the foreground. The rectangle with the largest area is indicated in red. The images you encounter in practical application will be much larger than this example, so efficiency is going to be important.

Specification

Let's start with the types of images and rectangles

type Image = [ [ Bool ] ] data Rect = Rect { left , top , width , height :: Int } deriving ( Eq , Ord , Show )

And some properties of them,

imWidth , imHeight :: Image -> Int imHeight = length imWidth ( x : _ ) = length x imWidth [ ] = 0 area , perimeter :: Rect -> Int area rect = width rect * height rect perimeter rect = 2 * width rect + 2 * height rect

I will say that an image 'contains' a rectangle if all pixels inside the rectangle are foreground pixels.

contains :: Image -> Rect -> Bool contains im ( Rect x y w h ) = and pixelsInRect where pixelsInRect = concatMap cols ( rows im ) rows = take h . drop y . ( ++ repeat [ ] ) cols = take w . drop x . ( ++ repeat False )

Now the obvious, stupid, way of finding the largest rectangle is to enumerate all rectangles in the image, and pick the largest from that list:

allRects :: Image -> [ Rect ] allRects im = filter ( im `contains` ) rects where rects = [ Rect x y w h | x <- [ 0 .. iw ] , y <- [ 0 .. ih ] , w <- [ 1 .. iw - x ] , h <- [ 1 .. ih - y ] ] iw = imWidth im ih = imHeight im

For now, I will take 'largest rectangle' to mean one with the maximal area. I will come back to this choice soon.

largestRect spec :: Image -> Rect largestRect spec = maximalRectBy area . allRects maximalRectBy :: Ord a => ( Rect -> a ) -> [ Rect ] -> Rect maximalRectBy f = maximumBy ( comparing f `mappend` compare ) . ( noRect : ) where noRect = Rect 0 0 0 0

The above code should hopefully be easy to understand. It will find the correct answer for the above example:

λ> largestRect spec example Rect { left = 3 , top = 2 , width = 4 , height = 5 }

Of course largestRect spec is horribly slow. In an n by n image there are O(n4) rectangles to consider, and checking if one is contained in the image takes O(n2) work, for a total of O(n6).

What is 'largest'?

Before continuing, let's determine what it means for a rectangle to be the largest. We could compare the area of rectangles, as we did before. But it is equally valid to look for the rectangle with the largest perimeter.

Can we pick the maximum according to any arbitrary function f :: (Rect -> a) ? Not all of these functions will correspond to the intuitive notion of 'largest'. For example f = negate . area will actually lead to the smallest rectangle. In general there is going to be no efficient way of finding the rectangle that maximizes f . All we could do is optimize contains , to get an O(n4) algorithm.

We should therefore restrict f to be monotonic. What I mean by monotonic is that f x >= f y whenever rectangle x contains rectangle y . In QuickCheck code:

prop_isMonotonic :: Ord a => ( Rect -> a ) -> Property prop_isMonotonic f = property $ \ x y -> x `rectContains` y ==> f x >= f y rectContains :: Rect -> Rect -> Bool rectContains ( Rect x y w h ) ( Rect x' y' w' h' ) = x <= x' && y <= y' && x + w >= x' + w' && y + h >= y' + h'

Area is a monotonic function, and so is perimeter. But you could also add weird constraints. For example, only consider rectangles that are at least 10 pixels tall, or only rectangles that contain the point (123,456).

Maximizing a monotonic function, as opposed to just any function, means that we can skip a lot of rectangles. In particular, whenever rectangle x contains rectangle y , rectangle y doesn't need to be considered. I will call rectangles in the image that are not contained in other (larger) rectangles maximal. The strategy for finding the largest rectangle is then simply to enumerate only the maximal rectangles, and pick the best of those:

largestRect fast :: Image -> Rect largestRect fast = maximalRectBy area . allMaximalRects

For each maximal rectangle there is (trivially) a monotonic function that is maximal for that rectangle. So we can't do any better without taking the specific function f into account.

Machinery

To find maximal rectangles, we are first of all going to need some machinery for working with images. In particular, zipping images together,

zip2d :: [ [ a ] ] -> [ [ b ] ] -> [ [ ( a , b ) ] ] zip2d = zipWith zip zipWith2d :: ( a -> b -> c ) -> [ [ a ] ] -> [ [ b ] ] -> [ [ c ] ] zipWith2d = zipWith . zipWith zipWith2d4 :: ( a -> b -> c -> d -> e ) -> [ [ a ] ] -> [ [ b ] ] -> [ [ c ] ] -> [ [ d ] ] -> [ [ e ] ] zipWith2d4 = zipWith4 . zipWith4

And accumulating/scanning over images. This scanning can be done in four directions. Each scanX function takes a function to apply, and the initial value to use just outside the image. The scans that I use here are slightly different from scanl and scanr , because the output will have the same size as the input, instead of being one element larger.

scanLeftward , scanRightward , scanUpward , scanDownward :: ( a -> b -> b ) -> b -> [ [ a ] ] -> [ [ b ] ] scanLeftward f z = map ( init . scanr f z ) scanRightward f z = map ( tail . scanl ( flip f ) z ) scanUpward f z = init . scanr ( \ as bs -> zipWith f as bs ) ( repeat z ) scanDownward f z = tail . scanl ( \ as bs -> zipWith f bs as ) ( repeat z )

Here is an example of a scan that calculates the x-coordinate of each pixel,

let x = scanRightward ( \ a x -> x + 1 ) ( -1 ) a x = .

And the y-coordinates are of course

let y = scanDownward ( \ a x -> x + 1 ) ( -1 ) a y = .

Finding lines

If we were looking for one-dimensional images, then a 'rectangle' would just be a single line of pixels. Now each pixel is contained in at most one maximal line of foreground pixels. To find the coordinates of this line, we just need to know the left and right endpoints.

For a foreground pixel, the left endpoint of the line it is in is the same as the left endpoint of its left neighbor. On the other hand, a background pixel is not in any foreground line. So the left endpoint of all lines to the right of it will be at least x+1 , where x is the x-coordinate of the background pixel. In both these cases information flows from left to right; and so the left endpoint for all pixels can be determined with a rightward scan.

Unsurprisingly, we can find the right endpoints of all foreground lines with a leftward scan. Now let's do this for all lines in the image. Notice that we need the x coordinates defined previously:

let l = scanRightward ( \ ( a , x ) l -> if a then l else x + 1 ) 0 ( zip2d a x ) l = let r = scanLeftward ( \ ( a , x ) r -> if a then r else x ) ( imWidth a ) ( zip2d a x ) r =

In the images I have marked the left and right endpoints of the foreground lines in red. Note also, the values in the background pixels are not important, and you should just ignore them.

Vertically we can of course do the same thing, giving top and bottom endpoints:

let t = scanDownward ( \ ( a , y ) t -> if a then t else y + 1 ) 0 ( zip2d a y ) t = let b = scanUpward ( \ ( a , y ) b -> if a then b else y ) ( imHeight a ) ( zip2d a y ) b =

However, combining these left/right/top/bottom line endpoints does not yet give rectangles containing only foreground pixels. Rather, it gives something like a cross. For example using the endpoints for (6,4) leads to the following incorrect rectangle,

.

In fact, there are many rectangles around this point (6,4):

,

and before looking at the area (or whatever function we are maximizing) there is way no telling which is the best one.

If there was some way to find just a single maximal rectangle for each pixel, then we would have an O(n2) algorithm. Assuming of course that we do find all maximal rectangles.

Finding maximal rectangles

Suppose that Rect x y w h is a maximal rectangle. What does that mean? First of all, one of the points above the rectangle, (x,y-1),(x+1,y-1),..,(x+w-1,y-1) , must not be the a foreground pixel. Because if all these points are foreground, then the rectangle could be extended upwards, and it would not be maximal. So, suppose that (u,y-1) is a background pixel (or outside the image). Then (u,y) is the top endpoint of the vertical line that contains (u,y+h-1) .

If we start from (u,v) , we can recover the height of a maximal rectangle using the top endpoint image t . Just take t!!(u,v) as the top coordinate, and u+1 as the bottom. This image illustrates the idea:

.

Here the green point (u,v) has the red top endpoint, and it gives the height and vertical position of the yellow maximal rectangle.

Then to make this vertical line into a maximal rectangle, we just extend it horizontally as far as possible:

.

For this last step, we need to know the first background pixel that will be encountered when extending the rectangle to the left. That is the maximum value of all left endpoints in the rows t,t+1,..,b-1 . This maximum can again be determined with a scan over the image:

let lt = scanDownward ( \ ( a , l ) lt -> if a then max l lt else minBound ) minBound ( zip2d a l ) lt =

For extending to the right the story is exactly the same, only taking the minimum right endpoint instead:

let rt = scanDownward ( \ ( a , r ) rt -> if a then min r rt else maxBound ) maxBound ( zip2d a r ) rt =

Now we have all the ingredients for finding maximal rectangles:

For a foreground pixel ( u , v ) :

: Take as top t !! ( u , v )

Take as left lt !! ( u , v )

Take as right rt !! ( u , v )

Take as bottom v + 1 .

Every maximal rectangle can be found in this way. However, not all rectangles we get in this way are maximal. In particular, they could potentially still be extended downward. However, for finding the largest rectangle, it doesn't matter if we also see some non-maximal ones. There might also be duplicates, which again does not matter.

So now finishing up is just a matter of putting all the steps together in a function:

allMaximalRects :: Image -> [ Rect ] allMaximalRects a = catMaybes . concat $ zipWith2d4 mkRect lt rt t y where x = scanRightward ( \ _ x -> x + 1 ) ( -1 ) a y = scanDownward ( \ _ y -> y + 1 ) ( -1 ) a l = scanRightward ( \ ( a , x ) l -> if a then l else x + 1 ) 0 ( zip2d a x ) r = scanLeftward ( \ ( a , x ) r -> if a then r else x ) ( imWidth a ) ( zip2d a x ) t = scanDownward ( \ ( a , y ) t -> if a then t else y + 1 ) 0 ( zip2d a y ) lt = scanDownward ( \ ( a , l ) lt -> if a then max l lt else minBound ) minBound ( zip2d a l ) rt = scanDownward ( \ ( a , r ) rt -> if a then min r rt else maxBound ) maxBound ( zip2d a r ) mkRect l r t y | l /= minBound = Just $ Rect l t ( r - l ) ( y - t + 1 ) | otherwise = Nothing

A quick QuickCheck shows that largestRect fast finds the same answer as the slow specification:

prop fast_spec = forAll genImage $ \ a -> largestRect spec a == largestRect fast a

λ> quickCheck prop fast_spec +++ OK , passed 100 tests .

Conclusion

It is possible to find all maximal rectangles that consist entirely of foreground pixels in an n*n image in O(n2) time. That is linear in the number of pixels. Obviously it is not possible to do any better in general.

You may wonder whether this method also works in higher dimensions. And the answer to that question is no. The reason is that there can be more than O(n3) maximal cubes in a three dimensional image. In fact, there can be at least O(n(d-1)2) maximal hypercubes in d dimensions. Just generalize this image to 3D:

. Or click here for a 3D version.