Primary Motivation

My Github repository, Boost Concurrent Permutations and Combinations on CPU, has been up since January 2017. However, download is close to none while my older single threaded next_combination has plenty of downloads. So I figured it must be the code usage is deemed too complex. Aim of this article is to show a very simple, albeit inefficient, multithreaded example to give users a guide. It should be relatively easy for user to come up with an efficient multithreaded code, once they understand the examples in this article. For each permutation and combination section, I am going to start off with a single threaded example then followed by a multi-threaded example.

Single Threaded Permutation

First I’ll show you the example on using next_permutation in single threaded scenario. The while loop display std_permuted until next_permutation returned false when std_permuted is detected to be in descending order. That is “54321”.

void usage_of_next_perm() { std::string std_permuted = "12345"; do { std::cout << std_permuted << std::endl; } while (std::next_permutation(std_permuted.begin(), std_permuted.end())); }

Output is follows.

12345 12354 12435 12453 12534 12543 ... 54123 54132 54213 54231 54312 54321

Multi-Threaded Permutation

Next, I’ll show you on how to finding permutation in multithreaded scenario. There is a total of 120 different permutations in collection of 5 elements. We’ll use 4 threads in my example, meaning each thread will find 120/4=30 permutations. To tell you the truth, each thread will find 30-1=29 permutations, because the 1st one is already computed by find_perm_by_idx in the main thread and passed to the thread. To keep the discussion simple, we pretend each thread computes 30 permutations. 1st thread finds 0th to 29th permutations, while 2nd thread starts from 30th to 59th permutations, and 3rd thread begins from 60th to 89 and so on. From 2nd thread and 3rd thread to start from 30th and 60th permutation respectively, we need to make use of find_perm_by_idx to find 30th and 60th permutation given their index, we’ll find the rest with next_permutation . Techincally, it is possible to use find_perm_by_idx to find all permutations by giving indices from 0 to 119 but it is simply not fleasible to do so, because find_perm_by_idx is much much slower than next_permutation . As a word of caution, index_to_find must be of type that is large enough to store the factorial(n). For instance, you are just finding the 1st 10 permutations, index_to_find type can store 0 to 10 but if it cannot store the factorial(n), find_perm_by_idx will not generate the permutation correctly. This applies to find_comb_by_idx as well. If your largest integer type is not large enough, you can consider using Boost Multiprecision Integer. And vector_type can be any collection type that supports push_back() and size() methods, like std::string . You can see I use std::string in my example.

I join my theads before starting the next thread because all_results is shared with all threads and it is not guarded with a lock, therefore not thread-safe. It is only meant to be a simple example on how to find_perm_by_idx . You can see I call the the slow find_perm_by_idx in the main thread. It is your job to try figure out on how to call find_perm_by_idx in the worker threads.

template<typename int_type, typename vector_type> vector_type find_perm_by_idx(int_type index_to_find, vector_type& original_vector); void usage_of_perm_by_idx() { uint64_t index_to_find = 0; std::string original_text = "12345"; std::string std_permuted = "12345"; std::vector all_results; size_t repeat_times = 30; for (size_t i=0; i<4; ++i) { std::string permuted = concurrent_perm::find_perm_by_idx(index_to_find, original_text); std::thread th([permuted, &all_results, repeat_times] () mutable { // do your work on the permuted result, instead of pushing to vector all_results.push_back(permuted); for (size_t j=1; j<repeat_times; ++j) { std::next_permutation(permuted.begin(), permuted.end()); // do your work on the permuted result, instead of pushing to vector all_results.push_back(permuted); } }); th.join(); index_to_find += repeat_times; } }

I skip the output here, I have verified they are the same as generated by single threaded next_permutation .

Single Threaded Combination

Next, we have come to next_combination section. To compute the total combination count, we use compute_total_comb . total must be of type that is large enough to store the factorial(n).

void usage_of_next_comb() { std::string original_text = "123456"; std::string std_combined = "123"; uint64_t total = 0; if(concurrent_comb::compute_total_comb(original_text.size(), std_combined.size(), total)) { std::cout << std_combined << std::endl; for (uint64_t i = 1; i < total; ++i) { stdcomb::next_combination(original_text.begin(), original_text.end(), std_combined.begin(), std_combined.end()); std::cout << std_combined << std::endl; } } }

Output is follows. If your collection is sorted like original_text , all generated combination will be sorted as well, as you can see below.

123 124 125 126 134 135 136 145 146 156 234 235 236 245 246 256 345 346 356 456

Multi-Threaded Combination

index_to_find must be of type that is large enough to store the factorial(n) and vector_type can be any collection type that supports push_back() and size() methods, like std::string . Similar to find_perm_by_idx mentioned above, it is possible to find all combinations with find_comb_by_idx but it is much slower compared to next_combination , so we resort to next_combination to find the rest of combinations. There are total of 20 combinations, so each of the 4 thread will find 20/4=5 combinations. Actually, each thread will find 5-1=4 combination, because the 1st one is already computed by find_comb_by_idx in the main thread and passed to the thread.

template<typename int_type, typename vector_type> vector_type find_comb_by_idx(const uint32_t subset, int_type index_to_find, vector_type& original_vector); void usage_of_comb_by_idx() { uint64_t index_to_find = 0; std::string original_text = "123456"; std::string std_combined = "123"; std::vector all_results; size_t repeat_times = 5; for (size_t i = 0; i < 4; ++i) { std::string combined = concurrent_comb::find_comb_by_idx(std_combined.size(), index_to_find, original_text); std::thread th([original_text, combined, &all_results, repeat_times] () mutable { // do your work on the combined result, instead of pushing to vector all_results.push_back(combined); for (size_t j = 1; j < repeat_times; ++j) { stdcomb::next_combination(original_text.begin(), original_text.end(), combined.begin(), combined.end()); // do your work on the combined result, instead of pushing to vector all_results.push_back(combined); } }); th.join(); index_to_find += repeat_times; } }

Output is skipped here, afterall I have verified they are the same as generated by single threaded next_combination .