[SYCL] Do not store last event for in-order queues #18277

Merged
merged 7 commits into from
May 21, 2025

igchor
Member

@igchor igchor commented Apr 30, 2025

Do not store the last event for in-order queues unless Host Tasks are used.

Without Host Tasks, we can just rely on UR for ordering. Having no last event means that ext_oneapi_get_last_event() needs to submit a barrier to return an event to the user. Similarly, ext_oneapi_submit_barrier() now always submits a barrier, even for in-order queues.
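
As a rough illustration of the paragraph above, here is a toy model in plain C++. It is not the runtime's actual code: `ToyInOrderQueue`, `ToyEvent` and `demo_last_event` are made-up names, and the "backend" is just a log of enqueued commands. It only shows the idea that kernels record no event, so asking for the last event has to enqueue a barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an event handle (hypothetical, not sycl::event).
struct ToyEvent {
  std::size_t command_index;
};

// Toy stand-in for an in-order queue that does not store a last event.
class ToyInOrderQueue {
public:
  // Kernels rely on the backend's in-order guarantee; nothing is recorded.
  void submit_kernel() { log_.push_back("kernel"); }

  // A barrier is always enqueued, and its event is returned to the caller.
  ToyEvent submit_barrier() {
    log_.push_back("barrier");
    return ToyEvent{log_.size() - 1};
  }

  // There is no stored event to hand out, so a barrier is submitted
  // to produce one that completes after all prior work.
  ToyEvent get_last_event() { return submit_barrier(); }

  const std::vector<std::string> &log() const { return log_; }

private:
  std::vector<std::string> log_;
};

// Two kernels followed by a request for the last event.
inline std::vector<std::string> demo_last_event() {
  ToyInOrderQueue q;
  q.submit_kernel();
  q.submit_kernel();
  q.get_last_event();
  return q.log();
}
```

In this model the cost of not tracking events shows up only when the user explicitly asks for one.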

Whenever Host Tasks are used, we need to start recording all events. This is needed because of how kernel submission synchronizes with Host Tasks. Consider the following scenario:

q.host_task();
q.submit_kernel();
q.host_task();

The kernel won't even be submitted to UR until the first Host Task completes. To properly synchronize the second Host Task we need to keep the event describing kernel submission.
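
The scenario above can be sketched with another toy model in plain C++. All names here (`ToyQueue`, `Command`, `Kind`) are hypothetical and the logic is a simplification, not the runtime's implementation: once a Host Task has been seen, every submission records its event and depends on the previously recorded one. How the real runtime orders the first Host Task after untracked kernels is not covered by this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

enum class Kind { Kernel, HostTask };

struct Command {
  Kind kind;
  // Index of the command this one explicitly waits on, if any.
  std::optional<std::size_t> depends_on;
};

class ToyQueue {
public:
  std::size_t submit(Kind kind) {
    // The first Host Task switches the queue into event-tracking mode.
    if (kind == Kind::HostTask)
      track_events_ = true;

    Command cmd{kind, std::nullopt};
    if (track_events_)
      cmd.depends_on = last_; // may still be empty for the first command
    commands_.push_back(cmd);

    const std::size_t index = commands_.size() - 1;
    if (track_events_)
      last_ = index; // remember this submission as the last event
    return index;
  }

  const std::vector<Command> &commands() const { return commands_; }

private:
  bool track_events_ = false;
  std::optional<std::size_t> last_;
  std::vector<Command> commands_;
};
```

With `host_task, kernel, host_task`, the kernel depends on the first Host Task and the second Host Task depends on the kernel, which is the dependency the stored event makes possible. A kernel-only queue never records any dependency.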

igchor added 6 commits May 19, 2025 23:46
For OpenCL, always store the last event to support queue_empty(),
but do not use it for synchronization.
@igchor
Member Author

igchor commented May 20, 2025

@intel/llvm-gatekeepers I believe this is ready to be merged.

I'm not really happy about having to introduce a separate path for OpenCL (for queue_empty()), but I'll address it in a separate PR.

@igchor
Member Author

igchor commented May 20, 2025

@intel/llvm-gatekeepers could you please merge this?

@sarnex
Contributor

sarnex commented May 20, 2025

We still need a review from @intel/sycl-graphs-reviewers

Contributor

@reble reble left a comment

LGTM

@steffenlarsen
Contributor

The Jenkins precommit failure is a known infrastructure issue.

@steffenlarsen steffenlarsen merged commit 7cac7a1 into intel:sycl May 21, 2025
34 of 37 checks passed
igchor added a commit to igchor/llvm that referenced this pull request May 21, 2025
uditagarwal97 pushed a commit that referenced this pull request Jun 18, 2025
Optimizes the `enqueue()` function of SYCL graphs to bypass the
scheduler whenever possible and to avoid creating events when they are
not needed.

* Refactors the executable graph `enqueue()` to have different paths
  depending on the workload:
  * The direct path is used when there are no host-tasks or accessor
    requirements in the graph and the execution dependencies are
    considered safe to bypass the scheduler.
  * The scheduler path is used when there are requirements in the
    graph but no host-tasks, or when the execution dependencies require
    using the scheduler.
  * The multiple partitions path is used when the graph contains
    `host-tasks`, which requires scheduling multiple graph partitions.
    The implementation was also changed to avoid adding unnecessary
    event dependencies to partition executions and to avoid copying
    `CGData` when possible.
* Extends the changes in #18277 to SYCL graphs. This means that no
  implicit events will be created when using in-order queues and graphs
  without `host-tasks`. Also updates the handler to only request events
  from the graph `enqueue()` when they are needed.
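
The three paths listed above can be read as a small decision function. The following is only a sketch in plain C++ under that reading: `GraphTraits`, `EnqueuePath` and `choose_path` are hypothetical names, and the real `enqueue()` may order or refine these checks differently.

```cpp
#include <cassert>

// Hypothetical labels for the three enqueue paths described above.
enum class EnqueuePath { Direct, Scheduler, MultiplePartitions };

// Hypothetical summary of the graph properties the description mentions.
struct GraphTraits {
  bool has_host_tasks;
  bool has_accessor_requirements;
  bool deps_safe_to_bypass_scheduler;
};

inline EnqueuePath choose_path(const GraphTraits &g) {
  // Host-tasks split the graph into several partitions to schedule.
  if (g.has_host_tasks)
    return EnqueuePath::MultiplePartitions;
  // Requirements or unsafe dependencies still need the scheduler.
  if (g.has_accessor_requirements || !g.deps_safe_to_bypass_scheduler)
    return EnqueuePath::Scheduler;
  // Otherwise the scheduler can be bypassed entirely.
  return EnqueuePath::Direct;
}
```

For example, a graph with no host-tasks, no accessor requirements and safe dependencies maps to the direct path in this sketch.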
github-actions bot pushed a commit to oneapi-src/unified-runtime that referenced this pull request Jun 19, 2025
kbenzie pushed a commit to oneapi-src/unified-runtime that referenced this pull request Jun 19, 2025