-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Do not store last event for in-order queues #18277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ce2652e
to
dac0398
Compare
dac0398
to
375b895
Compare
a7b84a1
to
ce4ac8a
Compare
ce4ac8a
to
7ce77ca
Compare
7ce77ca
to
39c5740
Compare
39c5740
to
1e2bf93
Compare
e94b095
to
e6158b5
Compare
e6158b5
to
156335d
Compare
unless Host Tasks are used. Without Host Tasks, we can just rely on UR for ordering. Having no last event means that ext_oneapi_get_last_event() needs to submit a barrier to return an event to the user. Similarly, ext_oneapi_submit_barrier() now always submits a barrier, even for in-order queues. Whenever Host Tasks are used we need to start recording all events. This is needed because of how kernel submission synchronizes with Host Tasks. With a following scenario: q.host_task(); q.submit_kernel(); q.host_task(): The kernel won't even be submitted to UR until the first Host Task completes. To properly synchronize the second Host Task we need to keep the event describing kernel submission.
For opencl, always store the last event to support queue_empty(), just don't use it for synchronization
@intel/llvm-gatekeepers I belive this is ready to be merged. I'm not really happy about having to introduce a separate path for opencl (for queue_empty()) but I'll address it in separate PR. |
@intel/llvm-gatekeepers could you please merge this? |
We still need a review from @intel/sycl-graphs-reviewers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Jenkins precommit is a known infrastructural issue. |
unless Host Tasks are used. Without Host Tasks, we can just rely on UR for ordering. Having no last event means that ext_oneapi_get_last_event() needs to submit a barrier to return an event to the user. Similarly, ext_oneapi_submit_barrier() now always submits a barrier, even for in-order queues. Whenever Host Tasks are used we need to start recording all events. This is needed because of how kernel submission synchronizes with Host Tasks. With a following scenario: q.host_task(); q.submit_kernel(); q.host_task(): The kernel won't even be submitted to UR until the first Host Task completes. To properly synchronize the second Host Task we need to keep the event describing kernel submission.
Optimizes the `enqueue()` function of sycl graphs to bypass the scheduler whenever possible and avoid creating events when not needed. * Refactors the executable graph `enqueue()` to have different paths depending on workload: * The direct path will be used when there are no host-tasks or accessor requirements in the graph and the execution dependencies are considered safe to bypass the scheduler. * The scheduler path will be used when there are requirements in the graph but no host-tasks or, if the execution dependencies require using the scheduler. * The multiple partitions path will be used when the graph contains `host-tasks` which requires scheduling multiple graph partitions. The implementation was also changed to avoid adding unnecessary event dependencies to partition executions and avoiding copying `CGData` when possible. * Extends the changes in #18277 to sycl graphs. This means that no implicit events will be created when using in-order queues and graphs without `host-tasks`. Also updates the handler to only request events from the graph `enqueue()` when they are needed.
Optimizes the `enqueue()` function of sycl graphs to bypass the scheduler whenever possible and avoid creating events when not needed. * Refactors the executable graph `enqueue()` to have different paths depending on workload: * The direct path will be used when there are no host-tasks or accessor requirements in the graph and the execution dependencies are considered safe to bypass the scheduler. * The scheduler path will be used when there are requirements in the graph but no host-tasks or, if the execution dependencies require using the scheduler. * The multiple partitions path will be used when the graph contains `host-tasks` which requires scheduling multiple graph partitions. The implementation was also changed to avoid adding unnecessary event dependencies to partition executions and avoiding copying `CGData` when possible. * Extends the changes in intel/llvm#18277 to sycl graphs. This means that no implicit events will be created when using in-order queues and graphs without `host-tasks`. Also updates the handler to only request events from the graph `enqueue()` when they are needed.
Optimizes the `enqueue()` function of sycl graphs to bypass the scheduler whenever possible and avoid creating events when not needed. * Refactors the executable graph `enqueue()` to have different paths depending on workload: * The direct path will be used when there are no host-tasks or accessor requirements in the graph and the execution dependencies are considered safe to bypass the scheduler. * The scheduler path will be used when there are requirements in the graph but no host-tasks or, if the execution dependencies require using the scheduler. * The multiple partitions path will be used when the graph contains `host-tasks` which requires scheduling multiple graph partitions. The implementation was also changed to avoid adding unnecessary event dependencies to partition executions and avoiding copying `CGData` when possible. * Extends the changes in intel/llvm#18277 to sycl graphs. This means that no implicit events will be created when using in-order queues and graphs without `host-tasks`. Also updates the handler to only request events from the graph `enqueue()` when they are needed.
unless Host Tasks are used.
Without Host Tasks, we can just rely on UR for ordering. Having no last event means that ext_oneapi_get_last_event() needs to submit a barrier to return an event to the user. Similarly, ext_oneapi_submit_barrier() now always submits a barrier, even for in-order queues.
Whenever Host Tasks are used we need to start recording all events. This is needed because of how kernel submission synchronizes with Host Tasks. With a following scenario:
q.host_task();
q.submit_kernel();
q.host_task():
The kernel won't even be submitted to UR until the first Host Task completes. To properly synchronize the second Host Task we need to keep the event describing kernel submission.