C++ Papers for Chicago: Part 1 - Concurrency

published at 14.09.2013 14:40 by Jens Weller

As I did write a series about the papers for Bristol, this is the start of the series for Chicago, as at the end of this month the C++ committee will meet again for standardization. I try to cover most papers from the July mailing, and those from the September (or pre-chicago) mailing. But this might take sometime, as I am also currently occupied with organizing my own C++ conference Meeting C++ 2013. This conference also provides me with the necessary funding for running this site and blog. And makes this series possible...

For the start, I will first take a look at a selection of papers from the domain of concurrency, before I go on and cover most or all other papers. The reason for this is, that in the last weeks I already have read a lot of them, either for preparing the blog post about resumable functions, or as a part of an overview of coming concurrency features in C++ in my boost/std thread training course.

I will not change the format, only that this first post will contain the papers about concurrency, maybe I'll keep this with the next posts too, so that each post contains papers from a certain area. This series will contain features of C++14, but even more features which might be part of C++1y or beyond. Also, all of those papers are proposals, the committee might not integrate all of them in a future version of C++. Some were already part of the last series, and have been now updated.

N3696 - Proposal to extend atomic with priority update functions

This proposal is about extending std::atomic from <atomic>, obviously with something called a priority update function. The goal of such a function is easily explained: only change(update) the value of the value of the atomic, if the predicate returns true. For this reason, the new value has to be compared with the old value. There are 3 different member functions for this proposed:

template<class V> T priority_update(T value, V predicate) T fetch_min(T value) T fetch_max(T value)

T is atomics value type in std::atomic<T>, the first version compares value with the stored value via the predicate. The two following versions use less<T> and greater<T> to compare. So, the value does not have to be compared for equality, it can be compared after a predicate the user chooses. The authors claim, that this can improve the performance of some concurrent algorithms.

N3710 - Specifying the absence of "out of thin air" results

This paper cares about the wording with atomics. There is a possibility for an implementation to speculate what a concurrent situation of loading and storing x / y on one thread, and y / x on the other will result to. No known implementation does such, but the standard should take care of such a situation. Also the author points out, that this situation is an actual problem in Java. The paper presents options to prohibit those "out of thin air" results, a load of memory_order_relaxed should be prevented from being reordered with a store.

N3711 - Taskgroups as a lower level C++ library solution to fork join parallelism

This paper deals with grouping tasks in a class task_group. The claim is, that this enables developers to write expressive and portable parallel code. The task_group proposal is build upon a common subset of the taskgroups of Microsofts Parallel Pattern Library(PPL) and Intels Threading Building Blocks(TBB). This proposal complements the parallel STL proposal by enabling arbitrary fork-join parallelism. Together with the parallel STL proposal this paper presents an alternative to a language level implementation of low-level fork-join parallelism (cilk_spawn/cilk_sync f.e.).

The class task_group offers the interface:

static const auto ignore_exceptions = implementation-defined; template<class ExceptionHandler> task_group(ExceptionHandler&& handler); ~task_group()nothrow; task_group(const task_group&) = delete; task_group& operator=(const task_group&) = delete; template<typename Function, typename Args...> void run(Function&& func, Args&&...args);

The paper specifies, that the destructor of the task_group shall call join for all pending tasks, unlike the destructor of std::thread calling std::terminate. Exceptions are handled over the given ExceptionHandler in the constructor.

N3712 - Policy based design for save destruction in concurrent containers

When implementing concurrent containers, e.g. containers supporting concurrent read/write on items, one of the problems to deal with is, to decide when its safe to delete an item. There are several solutions to this problem, and the authors conclude, that with policy based design, those solutions could be abstracted. This would allow the user to choose the right solution for his means. The paper goes into some details for possible solutions, and how they could fit into a policy based design. Also as an example a few concurrent containers using this solution are sketched in the paper.

N3721 - Improvements to std::future<T> and related APIs

This paper proposes to extend std::future with a few member or freestanding functions, allowing various operations.

Which are:

then

unwrap

is_ready

when_any / when_all

make_ready_future

The .then member function shall take a callable, which will be called with the resulting value of the future once its ready (calculated). In some cases, its useful to wrap a future into a future, unwrap lets you access the future inside the outer future. The authors claim that it is not easy to get this right for such nested futures (exception handling f.e.), such the standard should provide such functionality. The method bool is_ready lets the user query in a non blocking way if the future is ready. when_any and when_all represent freestanding functions, which have a range of futures as an argument, and return either when any future or all futures have been calculated. The function make_ready_future returns a future that is ready, and has the value given as an argument to it. Sometimes it is necessary to wrap a result into a future, so make_ready_future provides the corresponding make function.

N3722 - resumable functions

Well, this proposal is about how to deal with concurrency at the language level, and make a function or method resumable. Which means, that parts of its inner calls are executed concurrently, and the first of such call leads the resumable function to return. A resumable function is restricted in its returntype, it must return a valid future implementation, e.g. std::future<T>. I have already written a whole blog post about resumable functions, and there was also a very good talk about resumable functions at GoingNative 2013.

N3724 - A parallel Algorithms Library

This paper proposes to create a parallel version of the STL in the C++ ISO standard. The proposed library ads a policy to known STL Algorithms, specifying the parallel execution type such as GPU, vectorization, parallel or sequential. This paper is a follow up to N3554, which I already wrote a little more detailed about in the 2nd part of my bristol papers series.

N3731 - Executors and Schedulers, Revision 2

This paper tries to define a way to get executors and schedulers into the C++ standard. While for example futures and std::async or resumable functions are the frontend used by the user to access concurrency, executors and schedulers can be understand as the backend, the engine of concurrency. As concurrently executed work items such as tasks should not spawn a new thread each time they are executed, there is the need for some backend managing the execution of tasks. This can be represented through the executor pattern, where several implementations of the executor would allow the user to switch between different models of execution. The fundamental design is a base class that takes closures (aka tasks) and runs them, usually asynchronously. The paper defines a closure as std::function<void()> as a common interface. Running futures on the executer can be achieved via std::packaged_task. The proposal would like to add 4 different executors to the Standard:

thread_pool

serial_executor

loop_executor

inline_executor

Plus the base class executor. Each class is contained in a header <$name>.

N3732 - Value-oriented Concurrent Unordered Containers

This paper is about adding a concurrent variation of std::unordered_map to the standard. Currently std::unordered_map allows concurrent find operations, but not find, insert and erase concurrently. The authors propose a conceptual interface for the new container:

find

insert

exchange - insert, and replace an existing key,value in the map

erase

reduce - insert, but perform a reduction if the key already exists

All of those methods return a std::optional<mapped_type> with the previous value.

N3734 - Vector Programming - a Proposal for WG21

Thought I have seen it all, but this is actually a proposal worth looking at, as its an presentation exported to PDF. It features a lot of graphics and bullet points. Not sure how to summarize it, but it claims to present a possible syntax for vectorisation in C++. It covers vector-loops, elemental functions and array notation. The last might not be presented in Chicago, but as the paper defines them:

vector loops: loops executed in "vector order" and use vector instructions (unroll loops to SIMD)

elemental functions: compiled as if part of a vector loop

array notation: element-wise operations (foreach) on arrays using vector order semantics

I'll be honest and say that I'm not sure if a presentation is the right format for a proposal. Maybe its great to get started, as its a great way to give an overview. But this also makes it not very detailed, and I hope that this will be transfered into a more detailed paper in the future.

N3735 - On the difference between parallel loops and vector loops

Again a powerpoint presentation turned into a PDF. Second slide claims its not a proposal, but its listed as one (having a N#### Number makes it for me a proposal), so most people might treat it as one. Well, to be fair, this is a clarification on existing proposals. Which proposals? The author doesn't mention that totally unimportant fact. So, if you are interested in the title, the paper might be worth a look at, other wise, well its not a proposal...

N3747 - A Universal Model for Asynchronous Operations

This paper tries to find a universal model for async operations. It points to the papers for improving std::future and resumable functions, but in their version from the May mailing, not the current ones. The paper states that futures can be a poor choice for implementing async operations, and that a pure callback based approach can have its advantages. The author presents an approach to a unified model, usable with both models, futures and callbacks. The callback oriented approach is known for example from boost::asio. In performance critical application such as finance, it can be that the user would like to switch the underlying model of asynchronous operations. Futures do have a little overhead, the callback based approach can be a few microseconds more performant. The author looks very detailed at both models, and then presents the universal approach introduced into boost::asio with boost 1.54, handling callbacks, futures and resumable functions or coroutines.

Adding this universal model to the standard would only impact the library not the language, introducing two typedefs (handler_type and async_result).

N3750 - C++ ostream buffers

This paper is about the concern of synchronizing output streams in C++. While its currently guaranteed that this isn't producing a race condition, but the standard currently does not define means of synchronization. The paper names 3 previously submitted papers dealing with this or similar issues:

N3535 - C++ Stream mutexes

N3678 - C++ Stream guards

N3665 - Uninterleaved String output streaming

The paper claims that at the July meeting of WG21 (the concurrency subgroup), concerns were that buffering should be explicit. This is, what this paper presents as a solution, using a stream buffer for means of synchronization:

{ std::ostream_buffer bout(std::cout); bout.stream() << "Hello, " << "World!" << std::endl; }

The proposed ostream_buffer will automatically transfer its buffered content to a ostream when destroyed. Internally it buffers its content into a basic_ostringstream. This approach also ensures some output when exceptions are raised. The implementation either could use the proposed stream mutex from N3535 or leverage on the solution proposed in N3665 (using Posix file locks).

This is the end of part 1, Part 2 about core, concepts and evolution.

Join the Meeting C++ patreon community!

This and other posts on Meeting C++ are enabled by my supporters on patreon!