[[P2300R10]] describes a rounded set of primitives for asynchronous and parallel execution that give a firm grounding for the future.
However, the paper lacks a standard execution context and scheduler.
It has been broadly accepted that we need some sort of standard scheduler.
As part of [[P3109R0]], `system_context` was voted as a must-have for the initial release of senders/receivers.
It provides a convenient and scalable way of spawning concurrent work for the users of senders/receivers.
As noted in [[P2079R1]], an earlier revision of this paper, the `static_thread_pool` included in later revisions of [[P0443R14]] had many shortcomings.
This was removed from [[P2300R10]] based on that and other input.
One of the biggest problems with local thread pools is that they lead to CPU oversubscription.
This introduces a performance problem for complex systems that are composed from many independent parts.
This can create several problems:
* oversubscription because of different thread pools
* problems with nested parallel loops (one parallel loop is called from the other)
* problems related to interaction between different parallel engines
* etc.
To solve these problems we propose a parallel execution context that:
* can be shared between multiple parts of the application
* does not suffer from oversubscription
* can integrate with the OS scheduler
## Design overview ## {#design_overview}
The system context is a parallel execution context of undefined size, supporting explicitly *parallel forward progress*.
Because only parallel forward progress is required, any scheduler created from the system context can act as a view onto an underlying shared global context.
The execution resources of the system context are envisioned to be shared across all binaries in the same process.
The system scheduler works best with CPU-intensive workloads; thus, limiting oversubscription is a key goal.
# Examples # {#examples}
As a simple parallel scheduler we can use it locally, and `sync_wait` on the work to make sure that it is complete.
With forward progress delegation this would also allow the scheduler to delegate work to the blocked thread.
This example is derived from the Hello World example in [[P2300R10]]. Note that it only adds a well-defined context
object, and queries that for the scheduler.
Everything else is unchanged about the example.
```cpp
using namespace std::execution;
scheduler auto sch = get_system_scheduler();
sender auto begin = schedule(sch);
sender auto hi = then(begin, []{
    std::cout << "Hello world! Have an int.";
    return 13;
});

sender auto add_42 = then(hi, [](int arg) { return arg + 42; });

auto [i] = this_thread::sync_wait(add_42).value();
```
- `get_system_scheduler()` returns a scheduler that provides a view on some underlying execution context supporting *parallel forward progress*, with at least one thread of execution (which may be the main thread).
- two objects returned by `get_system_scheduler()` may share the same execution context.
If work submitted through one of them consumes the underlying thread pool, it can block the progress of work submitted through the other.
- if `Sch` is the type of object returned by `get_system_scheduler()`, then:
- `Sch` is implementation-defined, but must be nameable.
- `Sch` models the `scheduler` concept.
- `Sch` implements the `get_forward_progress_guarantee` query to return `parallel`.
The paper considers compile-time replaceability not to be a valid option because it conflicts with the design principles of a `system_context`, i.e., having one shared, application-wide execution context, which avoids oversubscription.
Replaceability is also part of the [[P2900R8]] proposal for the contract-violation handler.
The paper proposes that whether the handler is replaceable be implementation-defined.
If an implementation chooses to support replaceability, it shall be done similarly to replacing the global `operator new` and `operator delete` (link-time replaceability).
The feedback we received from Microsoft is that they are not interested in supporting replaceability on their platforms.
They would prefer that we offer implementations an option not to implement replaceability.
Moreover, for systems where replaceability is supported, they would prefer the replaceability mechanism to be implementation-defined.
The authors disagree with the idea that replaceability is not needed for Windows platforms (or other platforms that provide an OS scheduler).
The OS scheduler is optimized for certain workloads, and it's not the best choice for all workloads.
However, in accordance with the feedback, the paper proposes the following:
* the replaceability mechanism (if the implementation decides to support it), including the interfaces that a backend should implement is implementation-defined.
During the development of this paper, we received constant feedback that the replaceability mechanism should be standardized, even if we standardize just the interfaces that a backend needs to implement (leaving the replaceability mechanism to be implementation-defined).
However, as time went by, more and more people came to think that agreeing on a single replaceability API shape would be problematic.
Here are a few reasons why:
* Different standard library vendors might have different needs; if the replaceability API is too generic to cover all the needs, we compromise on performance. Example:
    * for a simple `schedule` operation, some implementations would want cancellation in the backend, some would not (cancellation is better handled in the frontend).
    * including cancellation in the replaceability API would satisfy the needs of those who want cancellation, but it would add extra performance penalties for everyone else
* in general, including a runtime environment in the backend may be costly, and some implementations may not need it
This approach would offer priorities at scheduler granularity and apply to large sections of a program at once.
The other approach, which matches the receiver query approach taken elsewhere in [[P2300R10]], is to add a `get_priority()` query on the receiver, which, if available, passes a priority to the scheduler in the same way that we pass an `allocator` or a `stop_token`.
This would work at task granularity: for each `schedule()` call that we connect a receiver to, we might pass a different priority.
In either case we can add the priority in a separate paper.
A few key points of the implementation:
* Uses preallocated storage on the host side, so that the default implementation doesn't need to allocate memory on the heap when adding new work to `system_scheduler`.
* Guarantees a lifetime of at least the duration of `main()`.
* As the default implementation is created outside of the host part, it can be shared between multiple binaries in the same process.
* uses a `static_thread_pool`-based implementation as a default on generic platforms (we have a patch that uses `libdispatch` as the default implementation on macOS; at the time of writing this revision, the patch is not yet merged into the mainline).
## Addressing received feedback ## {#addressing_feedback}