To address these temporary challenges and build toward the future, C++ must lay a foundation for controlling program execution. First, C++ must provide flexible facilities to control where and when work happens. This paper proposes a design for those facilities. After much discussion and collaboration, SG1 adopted this design by universal consensus at the Cologne meeting in 2019.

When we imagine the future of C++ programs, we envision elegant compositions of networked, asynchronous parallel computations accelerated by diverse hardware, ranging from tiny mobile devices to giant supercomputers. In the present, hardware diversity is greater than ever, but C++ programmers lack satisfying parallel programming tools for that hardware. Industrial-strength concurrency primitives like std::thread and std::atomic are powerful but hazardous. std::async and std::future suffer from well-known problems. And the standard algorithms library, though parallelized, remains inflexible and non-composable.

This proposal defines requirements for two key components of execution: a work execution interface and a representation of work and their interrelationships. Respectively, these are executors, and senders and receivers:

1.3 Executors Execute Work


Executors provide a uniform interface for work creation by abstracting underlying resources where work physically executes. The previous code example’s underlying resource was a thread pool. Other examples include SIMD units, GPU runtimes, or simply the current thread. In general, we call such resources execution contexts. As lightweight handles, executors impose uniform access to execution contexts. Uniformity enables control over where work executes, even when it is executed indirectly behind library interfaces.

The basic executor interface is the execute function through which clients execute work:

// obtain an executor
executor auto ex = ...

// define our work as a nullary invocable
invocable auto work = []{ cout << "My work" << endl; };

// execute our work via the execute customization point
execution::execute(ex, work);

On its own, execute is a primitive “fire-and-forget”-style interface. It accepts a single nullary invocable, and returns nothing to identify or interact with the work it creates. In this way, it trades convenience for universality. As a consequence, we expect most programmers to interact with executors via more convenient higher-level libraries, our envisioned asynchronous STL being such an example.

Consider how std::async could be extended to interoperate with executors, enabling client control over execution:

template<class Executor, class F, class... Args>
future<invoke_result_t<decay_t<F>, decay_t<Args>...>>
async(const Executor& ex, F&& f, Args&&... args) {
  // package up the work
  packaged_task work(forward<F>(f), forward<Args>(args)...);

  // get the future
  auto result = work.get_future();

  // execute work on the given executor
  execution::execute(ex, move(work));

  return result;
}

The benefit of such an extension is that a client can select from among multiple thread pools to control exactly which pool std::async uses simply by providing a corresponding executor. Inconveniences of work packaging and submission become the library’s responsibility.
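For instance, a sketch of such client code, under the assumption that a static_thread_pool execution context exposes an executor() accessor; render_frame, read_file, and scene are placeholder names used only for illustration:

// two distinct execution contexts, each owning its own threads
static_thread_pool compute_pool{8};   // pool for CPU-heavy work
static_thread_pool io_pool{2};        // pool for blocking I/O

// the same extended async, steered to different pools by executor choice
auto frame = async(compute_pool.executor(), render_frame, scene);
auto bytes = async(io_pool.executor(), read_file, "scene.dat");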

Authoring executors. Programmers author custom executor types by defining a type with an execute function. Consider the implementation of an executor whose execute function executes the client’s work “inline”:

struct inline_executor {
  // define execute
  template<class F>
  void execute(F&& f) const noexcept {
    std::invoke(std::forward<F>(f));
  }

  // enable comparisons
  auto operator<=>(const inline_executor&) const = default;
};

Additionally, a comparison function determines whether two executor objects refer to the same underlying resource and therefore execute with equivalent semantics. Concepts executor and executor_of summarize these requirements. The former validates executors in isolation; the latter, when both executor and work are available.
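As a rough sketch (assuming the concept names are reachable as written here, and reusing inline_executor from above), these concepts can constrain generic code directly:

// executor_of checks an executor against the concrete work type it will run
template<class E, class F>
  requires execution::executor_of<E, F>
void submit_work(const E& ex, F&& f) {
  execution::execute(ex, std::forward<F>(f));
}

// executor validates an executor type in isolation
static_assert(execution::executor<inline_executor>);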

Executor customization can accelerate execution or introduce novel behavior. The previous example demonstrated custom execution at the granularity of a new executor type, but finer-grained and coarser-grained customization techniques are also possible. These are executor properties and control structures, respectively.

Executor properties communicate optional behavioral requirements beyond the minimal contract of execute, and this proposal specifies several. We expect expert implementors to impose these requirements beneath higher-level abstractions. In principle, optional, dynamic data members or function parameters could communicate these requirements, but C++ requires the ability to introduce customization at compile time. Moreover, optional parameters lead to combinatorially many function variants.

Instead, statically-actionable properties factor such requirements and thereby avoid a combinatorial explosion of executor APIs. For example, consider the requirement to execute blocking work with priority. An unscalable design might embed these options into the execute interface by multiplying individual factors into separate functions: execute, blocking_execute, execute_with_priority, blocking_execute_with_priority, etc.

Executors avoid this unscalable situation by adopting P1393’s properties design based on require and prefer:

// obtain an executor
executor auto ex = ...;

// require the execute operation to block
executor auto blocking_ex = std::require(ex, execution::blocking.always);

// prefer to execute with a particular priority p
executor auto blocking_ex_with_priority = std::prefer(blocking_ex, execution::priority(p));

// execute my blocking, possibly prioritized work
execution::execute(blocking_ex_with_priority, work);

Each application of require or prefer transforms an executor into one with the requested property. In this example, if ex cannot be transformed into a blocking executor, the call to require will fail to compile. prefer is a weaker request used to communicate hints and consequently always succeeds because it may ignore the request.
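A small sketch of the difference, reusing inline_executor from above (which runs work inline and therefore blocks its caller); whether a given require compiles depends on the executor at hand:

inline_executor ex;

// prefer communicates a hint; it always succeeds and may simply
// return the original executor unchanged
auto hinted_ex = std::prefer(ex, execution::priority(3));

// require must honor the property; an inline executor cannot promise
// blocking.never, so this line would fail to compile
// auto nonblocking_ex = std::require(ex, execution::blocking.never);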

Consider a version of std::async which never blocks the caller:

template<executor E, class F, class... Args>
auto really_async(const E& ex, F&& f, Args&&... args) {
  using namespace execution;

  // package up the work
  packaged_task work(forward<F>(f), forward<Args>(args)...);

  // get the future
  auto result = work.get_future();

  // execute the nonblocking work on the given executor
  execute(require(ex, blocking.never), move(work));

  return result;
}

Such an enhancement could address a well-known hazard of std::async:

// confusingly, always blocks in the returned but discarded future's destructor
std::async(foo);

// *never* blocks
really_async(foo);

Control structures permit customization at a higher level of abstraction by allowing executors to “hook” them; such hooks are useful when an efficient implementation is possible on a particular execution context. The first such control structure this proposal defines is bulk_execute, which creates a group of function invocations in a single operation. This pattern permits a wide range of efficient implementations and is of fundamental importance to C++ programs and the standard library.

By default, bulk_execute invokes execute repeatedly, but repeatedly executing individual work items is inefficient at scale. Consequently, many platforms provide APIs that explicitly and efficiently execute bulk work. In such cases, a custom bulk_execute avoids inefficient platform interactions via direct access to these accelerated bulk APIs while also optimizing the use of scalar APIs.
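For illustration, the fallback path can be pictured roughly as follows (a sketch only, not the proposal's specified wording; naive_bulk_execute is a made-up name):

// sketch: absent a customization, each of the n invocations becomes an
// individual execute submission, which scales poorly for large n
template<class Executor, class F>
void naive_bulk_execute(const Executor& ex, F f, size_t n) {
  for (size_t i = 0; i != n; ++i) {
    execution::execute(ex, [f, i]{ std::invoke(f, i); });
  }
}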

bulk_execute receives an invocable and an invocation count. Consider a possible implementation:

struct simd_executor : inline_executor { // first, satisfy executor requirements via inheritance
  template<class F>
  simd_sender bulk_execute(F f, size_t n) const {
    #pragma simd
    for(size_t i = 0; i != n; ++i) {
      std::invoke(f, i);
    }

    return {};
  }
};

To accelerate bulk_execute, simd_executor uses a SIMD loop.

bulk_execute should be used in cases where multiple pieces of work are available at once:

template<class Executor, class F, class Range>
void my_for_each(const Executor& ex, F f, Range rng) {
  // request bulk execution, receive a sender
  sender auto s = execution::bulk_execute(ex, [=](size_t i) {
    f(rng[i]);
  }, std::ranges::size(rng));

  // initiate execution and wait for it to complete
  execution::sync_wait(s);
}
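As a hypothetical usage sketch (the container and lambda are illustrative; std::span keeps the range a cheap view, so element writes reach the underlying vector even though my_for_each takes the range by value):

std::vector<int> data(1024, 1);

// doubles every element; simd_executor's customized bulk_execute runs the
// index loop under #pragma simd
my_for_each(simd_executor{}, [](int& x) { x *= 2; }, std::span(data));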