Transform-reduce is a pattern in which a set of data is first modified by applying a transformation to each of its elements and then reduced to a single value. In C++, this can be implemented straightforwardly with std::transform and std::accumulate . C++17 provides an alternative to std::accumulate : std::reduce sums a range of elements just like std::accumulate , except that it may do so out of order. That means you cannot use it with operators that are not commutative and associative (including overloads of operator+ that don't exhibit these properties). On the other hand, there is yet another algorithm, called std::transform_reduce , that applies a functor to all the elements of a range and then reduces them, all in an out-of-order manner. And then there are also parallel versions of these algorithms. In this post, I will try to compare the performance of these possible alternatives for implementing transform-reduce.

Let's start with a problem: we have a list of order prices on which we need to apply a discount; if an order price is greater than 100, then a 20% discount applies. We need to compute the total sum of all orders after the discounts have been applied.

For convenience, we will use the following function to apply a discount on an order price:

```cpp
inline double apply_discount(double const value)
{
   return value > 100 ? 0.8 * value : value;
}
```

The traditional way to solve this problem is to first use std::transform to modify the elements of the range by applying the discount (with apply_discount ) and then sum all the resulting values with std::accumulate . That should look like the following:

```cpp
template <typename C>
long double transform_and_accummulate(C&& c)
{
   std::transform(std::begin(c), std::end(c),
                  std::begin(c),
                  apply_discount);

   return std::accumulate(std::cbegin(c), std::cend(c), 0.0L);
}
```

In C++17, we can replace std::accumulate with std::reduce , since summing the elements of the price range can be done in any order; the result will be the same. std::reduce has various overloads, but for this problem we need one that takes the range's begin and end iterators and an initial value (and implicitly uses std::plus<> to sum the elements).

```cpp
template <typename C>
long double transform_and_reduce(C&& c)
{
   std::transform(std::begin(c), std::end(c),
                  std::begin(c),
                  apply_discount);

   return std::reduce(std::cbegin(c), std::cend(c), 0.0L);
}
```

C++17 also provides parallel versions of dozens of algorithms, including std::transform and std::reduce , but not std::accumulate . The algorithms that do have parallel versions have overloads that take an execution policy. This can be one of:

std::execution::seq : execution of the algorithm is sequential;

std::execution::par : execution of the algorithm may be parallelized on the calling thread or on another thread;

std::execution::par_unseq : execution of the algorithm may be parallelized, vectorized, or migrated across threads.

When using std::execution::seq , the execution is the same as when using an overload without an execution policy. On the other hand, std::execution::par and std::execution::par_unseq may execute the algorithm in parallel. par_unseq requires stronger guarantees than par ; the function calls are unsequenced with respect to each other. Because of that, you must not perform vectorization-unsafe operations, such as allocating or deallocating memory, acquiring mutexes, or using non-lock-free std::atomic specializations, when using this policy.

With this in mind, we can rewrite the transform_and_reduce function to also specify an execution policy, as follows:

```cpp
template <typename C, class ExecutionPolicy>
long double transform_and_reduce(C&& c, ExecutionPolicy&& policy)
{
   std::transform(std::forward<ExecutionPolicy>(policy),
                  std::begin(c), std::end(c),
                  std::begin(c),
                  apply_discount);

   return std::reduce(std::forward<ExecutionPolicy>(policy),
                      std::cbegin(c), std::cend(c), 0.0L);
}
```

This can, however, be replaced with the new C++17 standard algorithm std::transform_reduce . This, again, has multiple overloads to support different scenarios and needs, but the one we are interested in takes a policy, the range's bound iterators, an initial value, a binary functor to reduce the values (we can use std::plus<> ), and a unary functor to transform the range elements.

```cpp
template <typename C, class ExecutionPolicy>
long double transform_reduce(C&& c, ExecutionPolicy&& policy)
{
   return std::transform_reduce(
      std::forward<ExecutionPolicy>(policy),
      std::begin(c), std::end(c),
      0.0L,
      std::plus<>(),
      apply_discount);
}
```

The question is how these alternatives perform compared to each other for various range sizes. To check that, I have written the following test program. It generates vectors of random values, from 100 to 100 million elements, calls each of these functions with all three execution policies, and prints the results.

```cpp
inline void assert_are_equal(long double const d1, long double const d2,
                             long double diff = 0.001)
{
   if (std::abs(d1 - d2) > diff)
      std::terminate();
}

int main()
{
   std::random_device rd;
   auto seed_data = std::array<int, std::mt19937::state_size>{};
   std::generate(std::begin(seed_data), std::end(seed_data), std::ref(rd));
   std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
   auto eng = std::mt19937{ seq };
   auto dist = std::uniform_real_distribution<>{ 1, 250 };

   std::cout
      << std::left << std::setw(15) << "size"
      << std::left << std::setw(10) << "t+a/seq"
      << std::left << std::setw(10) << "t+r/seq"
      << std::left << std::setw(10) << "t+r/par"
      << std::left << std::setw(10) << "t+r/vec"
      << std::left << std::setw(10) << "t_r/sec"
      << std::left << std::setw(10) << "t_r/par"
      << std::left << std::setw(10) << "t_r/vec"
      << '\n';

   size_t size = 100;
   for (size_t i = 0; i < 13; ++i)
   {
      std::vector<double> v(size);
      std::generate_n(v.data(), size, [&eng, &dist]() { return dist(eng); });

      auto v1 = v;
      auto s1 = std::chrono::high_resolution_clock::now();
      auto r1 = transform_and_accummulate(v1);
      auto e1 = std::chrono::high_resolution_clock::now();

      auto v2 = v;
      auto s2 = std::chrono::high_resolution_clock::now();
      auto r2 = transform_and_reduce(v2, std::execution::seq);
      auto e2 = std::chrono::high_resolution_clock::now();

      auto v3 = v;
      auto s3 = std::chrono::high_resolution_clock::now();
      auto r3 = transform_and_reduce(v3, std::execution::par);
      auto e3 = std::chrono::high_resolution_clock::now();

      auto v4 = v;
      auto s4 = std::chrono::high_resolution_clock::now();
      auto r4 = transform_and_reduce(v4, std::execution::par_unseq);
      auto e4 = std::chrono::high_resolution_clock::now();

      auto v5 = v;
      auto s5 = std::chrono::high_resolution_clock::now();
      auto r5 = transform_reduce(v5, std::execution::seq);
      auto e5 = std::chrono::high_resolution_clock::now();

      auto v6 = v;
      auto s6 = std::chrono::high_resolution_clock::now();
      auto r6 = transform_reduce(v6, std::execution::par);
      auto e6 = std::chrono::high_resolution_clock::now();

      auto v7 = v;
      auto s7 = std::chrono::high_resolution_clock::now();
      auto r7 = transform_reduce(v7, std::execution::par_unseq);
      auto e7 = std::chrono::high_resolution_clock::now();

      auto t1 = std::chrono::duration_cast<std::chrono::microseconds>(e1 - s1);
      auto t2 = std::chrono::duration_cast<std::chrono::microseconds>(e2 - s2);
      auto t3 = std::chrono::duration_cast<std::chrono::microseconds>(e3 - s3);
      auto t4 = std::chrono::duration_cast<std::chrono::microseconds>(e4 - s4);
      auto t5 = std::chrono::duration_cast<std::chrono::microseconds>(e5 - s5);
      auto t6 = std::chrono::duration_cast<std::chrono::microseconds>(e6 - s6);
      auto t7 = std::chrono::duration_cast<std::chrono::microseconds>(e7 - s7);

      assert_are_equal(r1, r2);
      assert_are_equal(r1, r3);
      assert_are_equal(r1, r4);
      assert_are_equal(r1, r5);
      assert_are_equal(r1, r6);
      assert_are_equal(r1, r7);

      std::cout
         << std::left << std::setw(15) << size
         << std::left << std::setw(10) << t1.count()
         << std::left << std::setw(10) << t2.count()
         << std::left << std::setw(10) << t3.count()
         << std::left << std::setw(10) << t4.count()
         << std::left << std::setw(10) << t5.count()
         << std::left << std::setw(10) << t6.count()
         << std::left << std::setw(10) << t7.count()
         << '\n';

      size *= (i % 2 == 0 ? 5 : 2);
   }
}
```

Here is an example of the output (Visual Studio 2017 15.6, release build for x64):

```
size           t+a/seq   t+r/seq   t+r/par   t+r/vec   t_r/sec   t_r/par   t_r/vec
100            0         0         89        6         0         3         2
500            2         1         37        20        1         7         6
1000           4         4         37        21        4         14        20
5000           22        23        52        64        22        32        36
10000          47        46        76        90        45        47        49
50000          229       238       170       177       220       132       136
100000         456       473       296       300       491       194       188
500000         2250      2688      922       890       2922      829       977
1000000        5200      5117      1618      1662      4604      1457      1461
5000000        24074     27139     9195      12550     24576     7296      7768
10000000       46948     52491     17609     26444     46819     23617     15381
50000000       237774    260770    92045     91120     234912    86378     101222
100000000      475621    526457    205715    184198    475033    180723    208145
```

What I can see from here is that:

up to 50,000 elements, std::transform + std::accumulate , sequential std::transform + std::reduce , and std::transform_reduce have similar times;

after 50,000 elements, the parallel version of std::transform_reduce performs best, with parallel std::transform + std::reduce coming close;

the par_unseq version of std::transform + std::reduce is slightly better than the par version after more than 10 million elements; that is not the case for std::transform_reduce , whose vectorized version is only better at around 10 million elements.

Of course, you could argue that the call to std::transform is not actually needed here and that the discount can be applied while computing the sum. Although this has slightly different semantics (the input range is no longer modified in place), these functions could simply be rewritten as follows:

```cpp
template <typename C>
long double transform_and_accummulate(C&& c)
{
   return std::accumulate(
      std::cbegin(c), std::cend(c),
      0.0L,
      [](auto const sum, auto const price) {
         return sum + apply_discount(price); });
}

template <typename C, class ExecutionPolicy>
long double transform_and_reduce(C&& c, ExecutionPolicy&& policy)
{
   return std::reduce(
      std::forward<ExecutionPolicy>(policy),
      std::cbegin(c), std::cend(c),
      0.0L,
      [](auto const sum, auto const price) {
         return sum + apply_discount(price); });
}
```

In this case, however, you cannot execute transform_and_reduce with the par or par_unseq policies, because the binary operation is neither commutative nor associative, and they would not yield correct results.

```
size           t+a/seq   t+r/seq
100            0         0
500            2         2
1000           4         4
5000           21        21
10000          43        42
50000          223       226
100000         433       449
500000         2455      2264
1000000        4774      4605
5000000        23687     23554
10000000       45510     45672
50000000       229429    229375
100000000      459254    460425
```

Let us complicate the problem a bit and consider a list of orders, each order having a quantity and a price per item. We should again compute the total price of the orders, by summing individual order prices (quantity * price) after applying a discount the same way we did earlier. We could use the following order structure:

```cpp
struct order
{
   int    quantity;
   double price;
};
```

The implementation using std::transform + std::accumulate could look as follows:

```cpp
template <typename C>
long double transform_and_accumulate(C&& c)
{
   std::transform(
      std::begin(c), std::end(c),
      std::begin(c),
      [](order const & o) {
         return order{ o.quantity, apply_discount(o.price) }; });

   return std::accumulate(
      std::cbegin(c), std::cend(c),
      0.0L,
      [](auto const & sum, auto const & o) {
         return sum + (o.quantity * o.price); });
}
```

The alternative with std::transform + std::reduce is not as straightforward as it may seem. First of all, std::reduce cannot use the same binary functor as std::accumulate because of some key requirements:

T , the type of the initial value, must meet the requirements of MoveConstructible, and

binary_op(init, *first) , binary_op(*first, init) , binary_op(init, init) , and binary_op(*first, *first) must be convertible to T .

That means we must perform a certain trick to make std::reduce work:

the type of the initial value should be order and not long double , and

the return type of the binary operation should also be order ; this functor would actually return a new order value with the quantity being irrelevant (and set to zero) and the price being the accumulated total price.

However, this no longer makes it possible for std::reduce to execute in parallel and yield correct results. Therefore, the implementation in this case could be as follows:

```cpp
template <typename C, class ExecutionPolicy>
long double transform_and_reduce(C&& c, ExecutionPolicy&& policy)
{
   std::transform(
      std::forward<ExecutionPolicy>(policy),
      std::begin(c), std::end(c),
      std::begin(c),
      [](order const & o) {
         return order{ o.quantity, apply_discount(o.price) }; });

   return std::reduce(
      std::cbegin(c), std::cend(c),
      order{ 0, 0.0L },
      [](order const & sum, order const & o) {
         return order{ 0, sum.price + (o.quantity * o.price) }; }).price;
}
```

This does not look great, and it is exactly where std::transform_reduce comes to the rescue. This standard algorithm allows us to supply a unary operation that transforms each element of the input range; we can use a lambda that returns quantity * discount(price) . Then, we can use the binary functor std::plus<> to sum the results of the unary operation onto an initial value. And all this can be done in parallel, or even parallel unsequenced.

```cpp
template <typename C, class ExecutionPolicy>
long double transform_reduce(C&& c, ExecutionPolicy&& policy)
{
   return std::transform_reduce(
      std::forward<ExecutionPolicy>(policy),
      std::cbegin(c), std::cend(c),
      0.0L,
      std::plus<>(),
      [](order const & o) {
         return o.quantity * apply_discount(o.price); });
}
```

The results in this case look like the following:

```
size           t+a/seq   t+r/seq   t+r/par   t+r/vec   t_r/sec   t_r/par   t_r/vec
100            0         1         66        8         0         3         2
500            2         6         14        20        2         11        11
1000           5         12        54        27        4         20        18
5000           29        60        80        86        22        37        26
10000          60        128       135       205       45        56        51
50000          392       612       495       552       227       114       107
100000         590       1214      1786      1238      478       193       186
500000         3155      6279      4706      4187      2526      813       704
1000000        6709      12746     9185      9654      5522      1750      1374
5000000        31477     63989     42310     41512     26201     8015      7190
10000000       62533     128311    92530     81102     51302     16070     14188
50000000       313559    643133    427432    485140    270360    101922    81585
100000000      632420    1282368   877781    941224    599787    244742    284056
```

What we can see from here is that:

std::transform + std::accumulate performs much better than std::transform + std::reduce ;

std::transform_reduce performs better than either of the two, whether sequential or parallel, after about 10,000 elements;

the parallel unsequenced version performs better, or much better, than the sequential and parallel runs when the size of the input range is above 50,000 elements.

Conclusion

In C++17, there are various alternatives for implementing the transform-reduce pattern: sequentially, in parallel, or even parallel and vectorized. These can accommodate different needs, but performance may differ depending on the actual problem you are solving and the size of the input datasets. Therefore, you should use the one that fits your needs best.
