Thus, downsampling categorical labels consists of defining windows on an image and selecting an exemplar from that block. A common method is to choose the exemplar by picking among the most frequent pixels in a block, also known as finding the mode. The most obvious means of accomplishing this involves counting the frequency of each label, which is easily accomplished in a high performance language like C. However, Python loops are very slow which makes this method untenable without the use of C extensions (Cython) which can make a project more cumbersome to maintain and requires specialized knowledge. Here I present a method COUNTLESS that computes the mode of four unsigned integers without counting along with a Numpy implementation useful for generating 2x downsamples of labeled images.

COUNTLESS Algorithm

The simplest 2D downsampling problem to solve is the four pixel 2x2 image. A 2x2 image can be summarized by its single most frequent pixel to achieve a 2x reduction on each side. Sticking to even sized images for now, a larger image can be divided into 2x2 blocks, and if this process is repeated independently across each block, this will result in an overall 2x reduction of the whole image.

The 2x2 image consists of five problems listed in Figure 1. In 1(a), 1(c), and 1(e), all the pixels are in the most frequent class and are thus valid solutions. 1(b) and 1(d) require a more sophisticated approach. Notably, in all of the five cases, choosing a random pixel is more likely than not to be in the majority, which shows why striding can be an acceptable, though not perfect, approach. The key insight that forms the foundation of the algorithm is that in all cases, if two pixels match, they are in the majority.

Figure 2. The Five Cases. Capital letters A,B,C,D refer to the identity of a non-zero pixel. (a) All the same. (b) Two the same. (c) Two pairs the same. (d) Three the same. (e) All different.

In the following text, capital letters A,B,C,D refer to a pixel location’s non-zero value. We define the comparison operator PICK(A,B) that generates either a real pixel value or zero.

PICK(A,B) := A if A == B else 0 EQN. 1

In frameworks like Python/numpy where True and False are represented as one and zero respectively, this can be implemented numerically as:

PICK(A,B) := A * (A == B) EQN. 2

Next, let’s consider the various interactions between A, B, & C. In the following table, the notation AB means PICK(A,B) and lower case letters refer to a particular value, so a repeated letter ‘a’ in two columns means for example that both pixels are red.

Pixel PICK(X,Y)

A B C AB BC AC

a a a => a a a

a b a => 0 0 a

a a b => a 0 0

b a a => 0 a 0

a b c => 0 0 0 <-- Only fully zeroed row TABLE 1. PICK(X,Y) (denoted XY) interactions between A, B, and C

Table 1 shows that the only the majority pixel or zero of A, B, and C will appear as the result of PICK operations. In the case that A, B, and C are all different, all PICKs will return zero. This makes the problem of pixel selection amenable to a simple logical. A, B, and C all being different corresponds to either case 1(b) or 1(e) in Figure 1, with D being in the majority in the 1(b) case. If the case is 1(b), that means D is an acceptable solution. If the case is 1(e), there is no majority pixel and D is also an acceptable solution.

Therefore, when A, B, or C match, choose the match, and when none of them do, choose D. This can be expressed in a computer language with a short circuiting logical OR (||) as:

MODE(A,B,C,D) := PICK(A,B) || PICK(B,C) || PICK(A,C) || D EQN. 3

We can implement logical OR numerically as:

LOGICAL_OR(X,Y) := X + (X == 0) * Y EQN. 4

EQN. 3 and EQN. 4 will handle all unsigned integer values correctly except for zero. Therefore, in cases where zero is a valid pixel, we can add one to the image at the beginning of the algorithm and subtract one before returning the result.

The 2x2 approach can be easily extended to cover any even dimensioned image. Simply cover the image with non-overlapping 2x2 blocks and solve the mode within each of them to generate the downsampled image. However, we must still deal with odd images, where the edge is not perfectly covered by a 2x2 block.

Luckily, there is a simple solution. For any odd image, mirror the edge to generate an even image. There are two cases: a corner (1x1) and an edge (2x1 or 1x2). Mirroring a corner will generate case 1(a), which will lead to that same pixel being drawn. Mirroring an edge will lead to either case 1(a) if the pixels are the same or else 1(c), both of which will be handled correctly by COUNTLESS.

Implementation

Figure 3. Results of applying COUNTLESS. (a) A source image of 128 x 128 pixels is reduced to (b) an image of 64 x 64 pixels.

Implementation of COUNTLESS in Numpy is straightforward. First the image must be divided into a covering of 2x2 blocks. This is represented by creating four 2D arrays, a, b, c, & d, each conceptually representing its eponymous pixel from Figure 1, but executed in parallel across each 2x2 block in the image. This is accomplished by striding (2,2) offset by (0,0), (0,1), (1,0), and (1,1) from the upper-left corner. Next, we begin computing the result using COUNTLESS. Numpy does not support logical OR, but it does support bitwise OR. Luckily, according to Table 1, the values resulting from PICK(A,B), PICK(A,C), and PICK(B,C) will be either a single, possibly duplicated, value or zero. Therefore, in this special case, bitwise OR behaves the same as logical OR, saving us several operations that would otherwise be required in EQN. 4. Listing 1 shows the implementation:

Listing 1. The simplest implementation of countless that doesn’t handle black pixels.

This implementation will work for most cases, but it has an important failure mode. If the matching pixels are zeros, we’ll choose D by accident as the result will look the same as the last row in Table 1. Unfortunately, this problem can’t be completely eliminated when working with finite integer representations, but we can get very close.

The strategy is to add one to the image before executing COUNTLESS and subtract one after. This turns zeros into ones and makes the algorithm work correctly, but it causes an overflow for the maximum valued integer (255 for uint8s, 65,535 for uint16, et cetera). However, casting to the next largest data type before adding one eliminates the overflow effect (i.e. casting a uint8 array to uint16). On current hardware, this method is feasible up to uint64. The zero problem is completely eliminated for uint8, uint16, uint32, but not uint64. This means the algorithm will fail if your labels include 2⁶⁴-1 which is about 1.84 x 10¹⁹. For many uses, this should be acceptable. Cast back to the original data type after subtracting one. For coding schemes which treat a maximum uint64 as a special flag, simply change the offset sufficiently to account for it.

Listing 2. zero_corrected_countless.py: simplest_countless.py updated to handle black pixels correctly.

One last thing, we’ve added a few operations to account for the zero label, but that hurts performance. We can recover some of it by noticing that ab and ac both multiply by a. We can simplify that multiplication to remove an operation.

Listing 3. zero_corrected_countless.py augmented with an algebraic simplification to slightly improve performance.

Performance

In order to ascertain how fast this algorithm is, a comparison suite was developed and run on a dual core 2.8 GHz, i7 Macbook Pro (circa 2014), 256 kB L2 cache, 4MB L3 cache. While the algorithm was developed for segmentation labels, ordinary photographs are included to demonstrate how the algorithms perform when the data aren’t nicely uniform. Max pooling, averaging, and striding were included for speed comparisons even though they are unsuitable for this task.

I used Python 3.6.2 with numpy-1.13.3 and clang-802.0.42 for the following experiments.

The tested algorithms:

striding: Pick every other pixel.

Pick every other pixel. counting: Count the frequency of each label's occurrence.

Count the frequency of each label's occurrence. simplest_countless: simplest_countless.py

simplest_countless.py quick_countless: simplest_countless.py + algebraic simplification

simplest_countless.py + algebraic simplification zero_corrected_countless: zero_corrected_countless.py

zero_corrected_countless.py countless: countless.py

countless.py countless_if: Using if statements instead of fancy tricks

These algorithms were also tested even though they are inappropriate for handling segmentation to provide a point of comparison for other image processing algorithms:

downsample_with_averaging: Average pixels in the 2x2 window.

Average pixels in the 2x2 window. downsample_with_max_pooling: Pick highest value pixel in 2x2 window.

Pick highest value pixel in 2x2 window. ndzoom: Use the scipy function ndimage.interpolation.zoom to downscale 2x2.

The code used to test the algorithms can be found here.

After this article was published Aleks Zlateski contributed a Knight’s Landing (KNL) vectorized version of bitwise COUNTLESS that is reported to have run at 1 GPx/sec, 4 GB/sec on random int32 data. It can be found on Github.

Trial 1–Segmentation of Neural Tissue

Figure 4. Integer labels representing a segmentation of a neural tissue micrograph.

RGB Segmentation is a 1024x1024 image of pixel labels assigned by a convolutional neural network amusingly looking at neural tissue. This is the kind of image countless was designed with in mind. Each pixel is an RGB triad that taken together represents a single unsigned integer. For example, (R,G,B): (15, 1, 0) represents 271 (15 + 1 * 256).

In Table 2, while it doesn’t fulfill our criteria of choosing the most frequent pixel, striding is clearly the speed demon. It seems to be (sensibly) limited only by memory bandwidth. Naïve counting runs at only 38 kPx/sec, meaning that it takes about 27.6 seconds to compute a single image.

Various versions of the countless algorithm clock in across a wide range from 986 kPx/sec to 38.59 MPx/sec, beating counting handily. countless_if is actually a variation the counting implementation that uses if statements to test if two pixels match. The major differences in performance between the other countless variants depends on whether we handle zero correctly (a 37% difference between countless and quick_countless and 39% between simplest_countless and zero_corrected_countless) and whether a multiplication is simplified away (a 13.8% speedup between simplest_countless and quick_countless and a 15.6% speedup between zero_corrected_countless and countless).

Comparing countless, the fastest comprehensive variant of the algorithm with two other common approaches to downsampling, it comes out to be about 1.7x slower than averaging and 3.1x slower than max pooling. If we were to process images that do not contain zero as a valid pixel, the relative differences would be 1.3x slower and 2.3x slower respectively.