In this tutorial, we will compare the performance of the forEach method of the Mat class to other ways of accessing and transforming pixel values in OpenCV. We will show how forEach is much faster than naively using the at method or even efficiently using pointer arithmetic.

OpenCV has a few hidden gems that are not very well known. One of them is the forEach method of the Mat class, which uses all the cores on your machine to apply a function to every pixel.

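Let us first define a function complicatedThreshold. It takes a three-channel pixel (stored in BGR order when the image is loaded with imread) and applies a complicated threshold to it, based on the value of the first channel.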

    // Define a pixel
    typedef Point3_<uint8_t> Pixel;

    // A complicated threshold is defined so
    // a non-trivial amount of computation
    // is done at each pixel.
    void complicatedThreshold(Pixel &pixel)
    {
        if (pow(double(pixel.x)/10,2.5) > 100)
        {
            pixel.x = 255;
            pixel.y = 255;
            pixel.z = 255;
        }
        else
        {
            pixel.x = 0;
            pixel.y = 0;
            pixel.z = 0;
        }
    }

This function is computationally much heavier compared to a simple threshold. This way we are not just testing pixel access time but also how forEach uses all the cores when each pixel operation is computationally heavy.
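Because pow is monotonic for non-negative inputs, the condition pow(x/10, 2.5) > 100 reduces to x > 10 * 100^(1/2.5), which is roughly 63.1. In other words, the function behaves like a plain threshold at 64 on the first channel, only far more expensive to evaluate. The sketch below checks this in plain C++ without OpenCV; firesOnChannel and smallestFiringValue are hypothetical helper names introduced only for illustration.

```cpp
#include <cmath>
#include <cstdint>

// The same test that complicatedThreshold applies to the first channel (pixel.x).
bool firesOnChannel(std::uint8_t value) {
    return std::pow(double(value) / 10, 2.5) > 100;
}

// Scan all 8-bit values and return the smallest one that passes the test,
// or -1 if none does.
int smallestFiringValue() {
    for (int v = 0; v <= 255; ++v) {
        if (firesOnChannel(static_cast<std::uint8_t>(v))) {
            return v;
        }
    }
    return -1;
}
```

Calling smallestFiringValue() returns 64, so every pixel whose first channel is 64 or more turns white and the rest turn black.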

Next, we will go over four different ways of applying this function to every pixel in an image and examine the relative performance.

Method 1 : Naive Pixel Access Using the at Method

The Mat class has a convenient method called at to access a pixel at location (row, column) in the image. The following code uses the at method to access every pixel and applies complicatedThreshold to it.

    // Naive pixel access
    // Loop over all rows
    for (int r = 0; r < image.rows; r++)
    {
        // Loop over all columns
        for (int c = 0; c < image.cols; c++)
        {
            // Obtain pixel at (r, c)
            Pixel pixel = image.at<Pixel>(r, c);
            // Apply complicatedThreshold
            complicatedThreshold(pixel);
            // Put result back
            image.at<Pixel>(r, c) = pixel;
        }
    }

The above method is considered inefficient because the location of a pixel in memory is being calculated every time we call the at method. This involves a multiplication operation. The fact that the pixels are located in a contiguous block of memory is not used.
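To make that cost concrete, the address computation behind a (row, column) lookup in a row-major image can be sketched as follows. This is a simplified model in plain C++, not OpenCV's actual implementation; pixelOffset, stepBytes and elemBytes are illustrative names, with stepBytes playing the role of the row stride.

```cpp
#include <cstddef>

// Byte offset of pixel (row, col) from the start of a row-major image buffer.
// stepBytes: bytes per row, including any padding at the end of the row.
// elemBytes: bytes per pixel (3 for an 8-bit, three-channel pixel).
std::size_t pixelOffset(int row, int col,
                        std::size_t stepBytes, std::size_t elemBytes) {
    return static_cast<std::size_t>(row) * stepBytes
         + static_cast<std::size_t>(col) * elemBytes;
}
```

For a 10-pixel-wide, three-channel image without padding, stepBytes is 30 and pixel (2, 5) sits at byte offset 2 * 30 + 5 * 3 = 75. A pointer walk replaces this per-pixel calculation with a single increment.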

Method 2 : Pixel Access Using Pointer Arithmetic

In OpenCV, all pixels in a row are stored in one contiguous block of memory. If the Mat object is created using the create method, all pixels of the image are stored in one contiguous block of memory. Since we are reading the image from disk and imread uses the create method, we can loop over all pixels using simple pointer arithmetic that does not require a multiplication.

The code is shown below.

    // Using pointer arithmetic
    // Get pointer to first pixel
    Pixel* pixel = image1.ptr<Pixel>(0,0);

    // Mat objects created using the create method are stored
    // in one contiguous memory block.
    const Pixel* endPixel = pixel + image1.cols * image1.rows;

    // Loop over all pixels
    for (; pixel != endPixel; pixel++)
    {
        complicatedThreshold(*pixel);
    }

Method 3 : Using forEach

The forEach method of the Mat class takes a function object (functor). Its signature is

void cv::Mat::forEach (const Functor &operation)

The easiest way to understand the above usage is by way of an example shown below. We define a function object ( Operator ) for use with forEach.

    // Parallel execution with function object.
    struct Operator
    {
        void operator ()(Pixel &pixel, const int * position) const
        {
            // Apply complicatedThreshold to the pixel
            complicatedThreshold(pixel);
        }
    };

Calling forEach is straightforward and is done in just one line of code

    // Call forEach
    image2.forEach<Pixel>(Operator());
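Conceptually, forEach divides the image among worker threads and calls the functor with a reference to each pixel and a pointer to its (row, column) position. The sketch below imitates that behaviour with std::thread on a plain vector. It is a simplified stand-in, not OpenCV's implementation; Pixel3, parallelForEach and the way rows are assigned to workers are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Stand-in for an 8-bit, three-channel pixel (hypothetical, not cv::Point3_).
struct Pixel3 {
    std::uint8_t x, y, z;
};

// Call op(pixel, position) for every pixel of a rows x cols image,
// spreading the rows over the available hardware threads.
template <typename Op>
void parallelForEach(std::vector<Pixel3>& image, int rows, int cols, Op op) {
    // hardware_concurrency() may return 0 when the count is unknown.
    const unsigned hw = std::thread::hardware_concurrency();
    const int workers = std::max(1, std::min<int>(hw == 0 ? 1 : hw, rows));

    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&image, rows, cols, workers, w, op]() {
            // Worker w handles rows w, w + workers, w + 2 * workers, ...
            for (int r = w; r < rows; r += workers) {
                for (int c = 0; c < cols; ++c) {
                    const int position[2] = {r, c};
                    op(image[static_cast<std::size_t>(r) * cols + c], position);
                }
            }
        });
    }
    // Wait for all workers before returning.
    for (std::thread& t : pool) {
        t.join();
    }
}
```

Each pixel is visited by exactly one thread, so a functor that only modifies its own pixel needs no locking. The order in which pixels are processed is not defined, which is why the operation should not depend on neighbouring pixels having been processed already.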

Method 4 : Using forEach with C++11 Lambda

Some of you are looking at Method 3, shaking your head in disgust and shouting, “lambda, Lambda, LAMBDA!”

Well, here you go, C++11 junkie!

    image3.forEach<Pixel>
    (
        [](Pixel &pixel, const int * position) -> void
        {
            complicatedThreshold(pixel);
        }
    );

Comparing Performance of forEach

The function complicatedThreshold was applied to all pixels of a large image of size 9000 x 6750 five times in a row. The 2.5 GHz Intel Core i7 processor used in the experiment has four cores. The following timings were obtained. Note that using forEach made the code about five times faster than using either the Naive Pixel Access or the Pointer Arithmetic method.

    Method Type               Time (milliseconds)
    Naive Pixel Access        6656
    Pointer Arithmetic        6575
    forEach                   1221
    forEach (C++11 Lambda)    1272
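The timing code itself is not shown in this post. One way such measurements might be taken is with std::chrono, as in the sketch below; timeRepeatedMs is a hypothetical helper, and the choice of steady_clock is an assumption.

```cpp
#include <chrono>

// Run work() `repeats` times and return the total wall-clock time in milliseconds.
template <typename Work>
double timeRepeatedMs(int repeats, Work work) {
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count();
}
```

Each method would be wrapped in a lambda and timed with repeats set to 5. Because the threshold overwrites its input, each method should run on its own copy of the image, which is why the snippets above use image, image1, image2 and image3.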

I have been writing code in OpenCV for more than a decade, and whenever I had to write optimized code that accessed a pixel, I used pointer arithmetic instead of the naive at method. However, while writing this blog post, I was shocked to find that there does not seem to be much of a difference between the two methods, even for large images.
