Commit b2bf5b7 (1 parent: 47376af)

A few fixes for the R5 revision, based on Ruslan's feedback. (#28)
1 file changed: +29 additions, -41 deletions

paper_framework_sources/p2079_system_execution_context.bs
@@ -11,7 +11,7 @@ Editor: Lee Howes, lwh@fb.com
 Lucian Radu Teodorescu, lucteo@lucteo.ro
 Audience: SG1, LEWG
 URL: http://wg21.link/P2079R4
-Abstract: A standard execution context based on the facilities in [[P2300R9]] that implements parallel-forward-progress to
+Abstract: A standard execution context based on the facilities in [[P2300R10]] that implements parallel-forward-progress to
 maximise portability. A set of <code>system_context</code>`s share an underlying shared thread pool implementation, and may
 provide an interface to an OS-provided system thread pool.
 Markup Shorthands: markdown yes
@@ -52,15 +52,15 @@ Markup Shorthands: markdown yes
 - First revision
 
 # Introduction # {#introduction}
-[[P2300R9]] describes a rounded set of primitives for asynchronous and parallel execution that give a firm grounding for the future.
+[[P2300R10]] describes a rounded set of primitives for asynchronous and parallel execution that give a firm grounding for the future.
 However, the paper lacks a standard execution context and scheduler.
 It has been broadly accepted that we need some sort of standard scheduler.
 
 As part of [[P3109R0]], `system_context` was voted as a must-have for the initial release of senders/receivers.
 It provides a convenient and scalable way of spawning concurrent work for the users of senders/receivers.
 
 As noted in [[P2079R1]], an earlier revision of this paper, the `static_thread_pool` included in later revisions of [[P0443R14]] had many shortcomings.
-This was removed from [[P2300R9]] based on that and other input.
+This was removed from [[P2300R10]] based on that and other input.
 
 One of the biggest problems with local thread pools is that they lead to CPU oversubscription.
 This introduces a performance problem for complex systems that are composed from many independent parts.
@@ -72,10 +72,9 @@ This can create several problems:
 * oversubscription because of different thread pools
 * problems with nested parallel loops (one parallel loop is called from the other)
 * problems related to interaction between different parallel engines
-* other performance problems
 * etc.
 
-To solve these problems we propose a shared parallel execution context that:
+To solve these problems we propose a parallel execution context that:
 * can be shared between multiple parts of the application
 * does not suffer from oversubscription
 * can integrate with the OS scheduler
@@ -84,7 +83,6 @@ To solve these problems we propose a shared parallel execution context that:
 ## Design overview ## {#design_overview}
 
 The system context is a parallel execution context of undefined size, supporting explicitly *parallel forward progress*.
-By requiring only parallel forward progress, any created parallel context is able to be a view onto the underlying shared global context. (TODO: this phrase is not quite clear; we should probably remove)
 
 The execution resources of the system context are envisioned to be shared across all binaries in the same process.
 System scheduler works best with CPU-intensive workloads, and thus, limiting oversubscription is a key goal.
@@ -104,15 +102,14 @@ Other key concerns of this design are:
 # Examples # {#examples}
 As a simple parallel scheduler we can use it locally, and `sync_wait` on the work to make sure that it is complete.
 With forward progress delegation this would also allow the scheduler to delegate work to the blocked thread.
-This example is derived from the Hello World example in [[P2300R9]]. Note that it only adds a well-defined context
+This example is derived from the Hello World example in [[P2300R10]]. Note that it only adds a well-defined context
 object, and queries that for the scheduler.
 Everything else is unchanged about the example.
 
 ```cpp
 using namespace std::execution;
 
-system_context ctx;
-scheduler auto sch = ctx.scheduler();
+scheduler auto sch = get_system_scheduler();
 
 sender auto begin = schedule(sch);
 sender auto hi = then(begin, []{
@@ -121,32 +118,33 @@ sender auto hi = then(begin, []{
 });
 sender auto add_42 = then(hi, [](int arg) { return arg + 42; });
 
-auto [i] = this_thread::sync_wait(add_42).value();
+auto [i] = std::this_thread::sync_wait(add_42).value();
 ```
 
-We can structure the same thing using `execution::on`, which better matches structured concurrency:
+We can structure the same thing using `on`, which better matches structured concurrency:
 ```cpp
 using namespace std::execution;
 
-system_context ctx;
-scheduler auto sch = ctx.scheduler();
+scheduler auto sch = get_system_scheduler();
 
 sender auto hi = then(just(), []{
   std::cout << "Hello world! Have an int.";
   return 13;
 });
 sender auto add_42 = then(hi, [](int arg) { return arg + 42; });
 
-auto [i] = this_thread::sync_wait(on(sch, add_42)).value();
+auto [i] = std::this_thread::sync_wait(on(sch, add_42)).value();
 ```
 
 The `system_scheduler` customises `bulk`, so we can use `bulk` dependent on the scheduler.
 Here we use it in structured form using the parameterless `get_scheduler` that retrieves the scheduler from the receiver, combined with `on`:
 ```cpp
+using namespace std::execution;
+
 auto bar() {
   return
-    ex::let_value(
-      ex::get_scheduler(), // Fetch scheduler from receiver.
+    let_value(
+      read_env(get_scheduler), // Fetch scheduler from receiver.
       [](auto current_sched) {
         return bulk(
           current_sched.schedule(),
@@ -159,13 +157,9 @@ auto bar() {
 
 void foo()
 {
-  using namespace std::execution;
-
-  system_context ctx;
-
-  auto [i] = this_thread::sync_wait(
+  auto [i] = std::this_thread::sync_wait(
     on(
-      ctx.scheduler(), // Start bar on the system_scheduler
+      get_system_scheduler(), // Start bar on the system_scheduler
       bar())) // and propagate it through the receivers
     .value();
 }
@@ -178,13 +172,11 @@ In this case we assume it has no threads of its own and has to take over the main thread.
 ```cpp
 using namespace std::execution;
 
-system_context ctx;
-
 int result = 0;
 
 {
   async_scope scope;
-  scheduler auto sch = ctx.scheduler();
+  scheduler auto sch = get_system_scheduler();
 
   sender auto work =
     then(just(), [&](auto sched) {
@@ -215,7 +207,7 @@ int result = 0;
 terminal_scope.spawn(
   scope.on_empty() | then([&]{ my_os::exit(ctx); }));
 my_os::drive(ctx);
-this_thread::sync_wait(terminal_scope);
+std::this_thread::sync_wait(terminal_scope);
 };
 
 // The scope ensured that all work is safely joined, so result contains 13
@@ -249,23 +241,19 @@ public:
 
 class <i>impl-defined-system_sender</i> { // exposition only
 public:
-  friend pair&lt;std::execution::system_scheduler, delegatee_scheduler> tag_invoke(
-    std::execution::get_completion_scheduler_t&lt;set_value_t>,
-    const system_scheduler&) noexcept;
-  friend pair&lt;std::execution::system_scheduler, delegatee_scheduler> tag_invoke(
-    std::execution::get_completion_scheduler_t&lt;set_stopped_t>,
-    const system_scheduler&) noexcept;
+  system_scheduler query(get_completion_scheduler_t&lt;set_value_t>) const noexcept;
+  system_scheduler query(get_completion_scheduler_t&lt;set_stopped_t>) const noexcept;
 
   template&lt;receiver R>
-    requires receiver_of<R>
+    requires receiver_of&lt;R>
   <i>impl-defined-operation_state</i> connect(R&&) && noexcept(std::is_nothrow_constructible_v&lt;std::remove_cvref_t&lt;R>, R>);
 };
 </pre>
 
 - `get_system_scheduler()` returns a scheduler that provides a view on some underlying execution context supporting *parallel forward progress*, with at least one thread of execution (which may be the main thread).
 - two objects returned by `get_system_scheduler()` may share the same execution context.
   If work submitted by one can consume the underlying thread pool, that can block progress of another.
-- if `Sch` is the type of objects returned by `get_system_scheduler()`, then:
+- if `Sch` is the type of object returned by `get_system_scheduler()`, then:
   - `Sch` is implementation-defined, but must be nameable.
   - `Sch` models the `scheduler` concept.
   - `Sch` implements the `get_forward_progress_guarantee` query to return `parallel`.
@@ -403,13 +391,13 @@ The paper considers compile-time replaceability as not being a valid option because
 design principles of a `system_context`, i.e. having one, shared, application-wide execution context, which avoids
 oversubscription.
 
-Replaceability is also part of the [[P2900R7]] proposal for the contract-violation handler.
+Replaceability is also part of the [[P2900R8]] proposal for the contract-violation handler.
 The paper proposes that whether the handler is replaceable be implementation-defined.
 If an implementation chooses to support replaceability, it shall be done similarly to replacing the global `operator new` and `operator delete` (link-time replaceability).
 
 The feedback we received from Microsoft is that they are not interested in supporting replaceability on their platforms.
 They would prefer that we offer implementations an option to not implement replaceability.
-Moreover, for systems for which replaceability is supported they would prefer to make the replaceabiilty mechanism to be implementation defined.
+Moreover, for systems where replaceability is supported, they would prefer the replaceability mechanism to be implementation-defined.
 
 The authors disagree with the idea that replaceability is not needed for Windows platforms (or other platforms that provide an OS scheduler).
 The OS scheduler is optimized for certain workloads, and it's not the best choice for all workloads.
@@ -426,9 +414,9 @@ However, in accordance with the feedback, the paper proposes the following:
 * the replaceability mechanism (if the implementation decides to support it), including the interfaces that a backend should implement, is implementation-defined.
 
 During the development of this paper, we received constant feedback that the replaceability mechanism should be standardized, even if we standardize just the interfaces that a backend needs to implement (leaving the replaceability mechanism to be implementation-defined).
-However, as time went by, more and more people agreed that standardizing an API for replaceability is problematic.
+However, as time went by, more and more people came to think that agreeing on the same replaceability API shape would be problematic.
 Here are a few reasons why:
-* Different standard library vendors have different needs; if the replaceability API is too generic to cover all the needs, we compromize on performance. Example:
+* Different standard library vendors might have different needs; if the replaceability API is too generic to cover all the needs, we compromise on performance. Example:
   * for a simple `schedule` operation, some implementations would want cancellation in the backend, some would not (cancellation is better handled in the frontend).
   * including cancellation in the replaceability API would satisfy the needs of those who want cancellation, but would add extra performance penalties
   * in general, including a runtime environment in the backend may be costly, and some implementations may not need it
@@ -543,7 +531,7 @@ implementation-defined-system_scheduler get_scheduler(priority_t priority);
 
 This approach would offer priorities at scheduler granularity and apply to large sections of a program at once.
 
-The other approach, which matches the receiver query approach taken elsewhere in [[P2300R9]] is to add a `get_priority()` query on the receiver, which, if available, passes a priority to the scheduler in the same way that we pass an `allocator` or a `stop_token`.
+The other approach, which matches the receiver query approach taken elsewhere in [[P2300R10]], is to add a `get_priority()` query on the receiver, which, if available, passes a priority to the scheduler in the same way that we pass an `allocator` or a `stop_token`.
 This would work at task granularity: for each `schedule()` call that we connect a receiver to, we might pass a different priority.
 
 In either case we can add the priority in a separate paper.
@@ -561,7 +549,7 @@ A few key points of the implementation:
 * Uses preallocated storage on the host side, so that the default implementation doesn't need to allocate memory on the heap when adding new work to `system_scheduler`.
 * Guarantees a lifetime of at least the duration of `main()`.
 * As the default implementation is created outside of the host part, it can be shared between multiple binaries in the same process.
-* uses `libdispatch` on MacOS; uses a `static_thread_pool`-based implementation as a default on other platforms.
+* uses a `static_thread_pool`-based implementation as a default on generic platforms (we have a patch that uses `libdispatch` as the default implementation on MacOS; at the time of writing this paper revision, the patch is not yet merged into the mainline).
 
 ## Addressing received feedback ## {#addressing_feedback}
 