Urbana Proposals - C++17 insight? - Concurrency

published at 25.10.2014 01:17 by Jens Weller

A short series to give you an overview of the papers submitted in the latest mailing for the C++ committee meeting in Urbana-Champaign, Illinois. At the beginning of November the C++ committee will hold its third meeting this year. As C++14 is now finished, the focus is clearly on the upcoming C++17 standard.

Still, I think it is important to understand that not all of these papers aim for C++17, nor can a single proposal directly become part of the C++ standard. The model that will define C++17 is that certain groups of papers form technical specifications, which later get adopted into the next standard. So, if you want to know what to expect from C++17, look at the technical specifications for an overview. I expect that C++17 will take shape by next year, and that in 2016 the last corrections will be applied so that C++17 can be released in 2017.

Also, there is a wide range of opinions about the release cycle of C++: some think that C++16 would be better, some think that releasing a new standard every 5 years is about right. Currently it seems that releasing a major standard followed by a minor standard, in periods of 2-3 years, is the favored model.

One thing has changed since I started this series in 2013: today the proposals get a lot more attention, with isocpp publishing most of them even before the mailing, and others have picked up the idea of posting a best-of or an overview of the papers they like.

Like in the last series, I'd like to group the papers by the corresponding subgroups. I'm not sure if I'll have time to list all papers, so some postings might only contain the highlights. I'll start with concurrency. You might also want to watch what Michael Wong thinks about the upcoming concurrency features in C++.

C++ Papers for Concurrency & Parallelism

The concurrency subgroup deals with parallelism and concurrency issues in standardizing C++. Threading, futures, executors, mutexes and many other features belong to this group. At the last panel at CppCon there was a great explanation of what concurrency and parallelism actually are: concurrency is basketball, while parallelism is track and field. So, parallelism is the art of doing the same thing in many parallel ways, while concurrency is having parallel processes that depend on and communicate with each other.

This is the update to the current TS for concurrency. The implementation of this TS lives in the namespace std::experimental::concurrency_v1. A technical specification spells out the features of a certain field in great detail, so this document contains the code of the headers for concurrency and the corresponding definitions. Currently that is the header <future>, which will contain a lot more functionality in C++17, such as:

changes to future/shared_future

async

when_all

when_any

when_any_back

make_ready_future

make_exceptional_future

This paper is obviously about atomics. C++11 brought atomics into the standard; this paper discusses the current issues with atomics and tries to find solutions for some of them. The current issues include things like:

uninitialized state

structs comparing equal

compatibility with C

The first point is about non-trivial default constructors and atomic: the standard requires atomics to have an "uninitialized state", so in current implementations the default constructor is never called, in order to achieve exactly this. This was put into the standard for compatibility with C. The second point addresses atomic_compare_exchange, and how to define that two compared structs are equal; part of this problem are possible padding bits added to a struct. The last point states that the committee wants to keep atomics compatible with C.

This is an update on resumable functions, a language feature planned maybe for C++17. Resumable functions need to build on a lot of things which are not yet in the standard; this paper mostly discusses a possible backend for resumable functions. So, this paper tries to answer the question of how to implement a driver for resumable functions.

The paper also contains the implementation experience from Microsoft, which already has a working extension for resumable functions. The paper proposes the await and yield keywords/operators and an await-for loop. The authors define a resumable function as:

A function or a lambda is called resumable function or resumable lambda if a body of the function or lambda contains at least one suspend/resume point. Suspend/resume points are expressions with one or more await operators, yield statements or await-for statements. From this point on, we will use the term resumable function to refer to either resumable lambda or resumable function.

So, resumable functions now also extend to lambdas. The paper has a lot of details about the mechanics needed to get resumable functions right. I hope that this feature gets into C++17, but it's going to be really difficult, as it needs to build on other proposals which are not yet in the C++ standard.
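For a flavor of the proposed syntax, here is a sketch in the spirit of the paper. This is not compilable with any standard compiler today, and the exact spelling (the generator type, the placement of the keywords) is an assumption of mine based on the paper's direction:

```cpp
// Hypothetical generator built on the proposed yield keyword (sketch only):
generator<int> fibonacci() {
    int a = 0, b = 1;
    while (true) {
        yield a;            // suspend here, handing a to the consumer
        int next = a + b;
        a = b;
        b = next;
    }
}

// Hypothetical use of the proposed await operator (sketch only):
future<int> add_async(future<int> lhs, future<int> rhs) {
    return await lhs + await rhs; // suspends until both operands are ready
}
```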

This paper deals with the challenges of concurrency in C, and is a draft of an actual academic paper (this format is rather rare for proposals). It mainly deals with the differences in the memory models of C and C++.

The motivation for this proposal comes from the HPC field, where working with very large arrays is common. The author proposes to add support for such data structures to the atomics section of the C++ standard. Parallel algorithms executed on these arrays need to be able to guard sections against changes. This could be implemented with an atomic_array<T> class, which guards changes to the array through atomics. The paper presents a possible interface for this type.

The 4th revision of this basic building block for concurrency. The paper tries to define a simple framework for task execution. Executors define how a work item will be executed; for example there is a std::thread_pool_executor. The paper lists the following executors, each of which executes work items differently:

thread_per_task_executor

Spawns a new thread for each executed item

thread_pool_executor

Items are executed in a thread pool

loop_executor

an executor which collects work items, and executes them when a call to loop, run_queued_closures or try_run_one_closure occurs.

serial_executor

all work items are executed serially.

system_executor

usually a global (singleton) executor which behaves like a thread pool. This is also the default executor.



The paper continues with function_wrapper, a needed class, as std::function does not support move-only types, which would for example hinder the usage of packaged_task in such a context. The paper also presents a few thoughts on how a type-erased interface for executors should look.

This paper defines the semantics of these lighter-weight modes of execution by defining certain kinds of execution agents (EAs). An EA is a mechanism that executes a particular thread of execution. There is a one-to-one correspondence between an EA and a thread of execution. std::thread is a simple way to create and manage a certain type of EA. The entity that is a running thread (i.e., what was created by std::thread) is an EA. See N4321 for a more detailed explanation of this terminology (and of the existing terminology in the standard).

Currently it is proposed that a parallel algorithm collects exceptions in an exception_list, but if only one exception occurs, this is overhead:

If an algorithm invocation throws only a single exception, then it should be allowed to propagate the singleton exception directly instead of returning it wrapped in an exception_list.

With C++11, smart pointers came into the C++ standard; with C++14, make_unique enables C++ to be written without direct use of new or delete. Smart pointers should own dynamic allocations in C++, yet in lock-free code the usage of smart pointers is not possible. The paper aims to provide atomic versions of the standard smart pointers:

atomic_unique_ptr

atomic_shared_ptr

atomic_weak_ptr

This is an alternative to std::atomic<unique/shared/weak_ptr<T>>; SG1 has decided against specializing std::atomic for the smart pointers. There are several arguments for this, the most important one seemingly being that the smart pointers don't always meet all requirements of std::atomic.

This paper tries to refine N4071, and mainly adds the transform_reduce algorithm to it:

template<typename InIter, typename T, typename Reduce, typename Convert>
T transform_reduce(InIter first, InIter last, T init,
                   Reduce red_op, Convert conv_op);

The paper shows a short usage example:

double result = std::experimental::parallel::transform_reduce(
    std::experimental::parallel::par,
    std::begin(values), std::end(values),
    0.0,
    std::plus<double>(),
    [](Point r) { return r.x * r.y; });

This is a paper on SIMD semantics; it proposes a vector type which holds the array for SIMD operations. The paper relates to the Vc SIMD library. The SIMD-related papers provide an interesting overview of how to design a SIMD library, yet IMHO it seems a long way until this is in a form that could make it into the standard.

This paper deals with how to design a mask type for SIMD:

This paper describes a template class for portable SIMD Mask types. Most importantly it shows how conditional code can be expressed with SIMD types. Different variants of a syntax for write-masking will be discussed.

It is proposed that waiting operations should be provided by way of synchronic objects that implement the atomic concept and are extended with synchronic operations on the underlying type. For this the template std::synchronic<T> is introduced, which offers 4 methods:

void store

T load_when_not_equal

T load_when_equal

void expect_update

This paper aims to add latches and barriers to the C++ standard. std::latch, barrier and flex_barrier are proposed. The paper defines 3 concepts for this:

ArriveAndWaitable

arrive_and_wait()

Latch

arrive(), wait(), count_down(N)

Barrier

arrive_and_wait(), arrive_and_drop()



Latch and Barrier both build up on ArriveAndWaitable.

The authors state that memory_order_consume seems to be the most obscure member of the C11 and C++11 memory_order enum. The authors discuss the best possible implementation for memory_order_consume, and why it's not replaceable with memory_order_acquire, which has the overhead of fencing. Yet no implementation offers an efficient memory_order_consume, which the authors of this paper would like to change.

This paper deals with Out of thin Air (OOTA) values in the memory model.

This paper aims at adding two new concepts to "Latches and Barriers in C++" (N4204):

self-destroying latches

flex-latch

So this is mainly an addition to N4204.



When talking about concurrency and parallelism, many terms are involved. Yet it is often not clear how to define what such a term means; this paper aims at filling this hole. The term thread is ambiguous, but the paper gives definitions for the following terms:

thread of execution

std::thread

thread -> thread of execution

execution agent

Further the paper looks at how those terms are defined in WG14 (ISO C Standard).

This paper unifies two different lines of proposals, coroutines and resumable functions, in a stackless fashion. The stackless part is important, as otherwise the creation of coroutines on modern systems would be limited too much. The authors aim at a nearly unlimited number (billions) of possible coroutines on a system. For resumable functions, stackless coroutines are a possible, and very elegant, implementation. This proposal is the vision of how stackless coroutines could drive resumable functions in the background.

The aim is to add a vector programming extension to C++. The proposal builds on Intel Cilk and OpenMP 4.0, but favors the keyword-based approach over the pragma-based version of OpenMP. The paper starts by describing the C++ constructs used, and which constraints apply to them (e.g. counted loops are for or range-based for loops). The paper proposes 3 main changes to the language:

array notations (in Part II, not yet included afaik)

SIMD Loops

SIMD Functions

A SIMD loop is a simple for loop with the keyword simd added: for simd(...); the paper has no example code to clarify this. The authors plan to add simd as a new, context-dependent keyword (like e.g. override). A SIMD-enabled function could look like this:

void vec_add(float *r, float *op1, float *op2, int k)
    simd(uniform(r, op1, op2) linear(k:1))
simd {
    r[k] = op1[k] + op2[k];
}

The function's body is marked as simd, and there is a block describing which variables play which role. I think this paper is a good step forward towards getting SIMD into the standard, but it's still at a very early stage.

This paper is actually not a proposal; rather it aims at establishing an overview of vector parallelism (SIMD), to enable further discussions. The paper describes 3 different models of execution for vector parallelism:

lockstep execution

wavefront execution

explicit-barrier execution

C++14 added a new mutex type to C++: std::shared_timed_mutex. This paper now contains the wording for adding another mutex type to the C++ standard: std::shared_mutex.

This document rivals the previous executors proposal in N4143; the authors claim that

"this framework(N4143) is built around some deliberate design choices that make it unsuited to the execution of fine grained tasks and, in particular, asynchronous operations. Primary among these choices is the use of abstract base classes and type erasure, but it is not the only such issue. The sum effect of these choices is that the framework is unable to exploit the full potential of the C++ language."

This is actually interesting: this proposal shows an alternative approach to implementing executors and schedulers. The authors present "an alternative executors design that uses a lightweight, template-based policy approach." The already available implementation reimplements concepts from boost::asio in C++14.

This is a very interesting paper. It expresses concerns that the paper N4232 leaves out stackless coroutines, which are still needed. The author presents a possible implementation of stackless coroutines as resumable lambdas. Those seem to share yield, but not await, with resumable functions.

A simple generator as a resumable lambda could look like this:

auto g = [n = int(10)]() resumable {
    std::cout << "Counting down from " << n << "\n";
    while (n > 0) {
        if (n == 1) return n;
        yield n;
        n--;
    }
};

The execution would pause at yield. If the execution reached the end of the lambda, the implementation would throw a std::stop_iteration exception. This can be prevented by returning a value from the lambda.

Continue reading part 2: Proposals from Core, Modules, Networking, Reflection and undefined behavior

Join the Meeting C++ patreon community!

This and other posts on Meeting C++ are enabled by my supporters on patreon!