-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL][Graph] Command Graph PoC #7627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fix merge conflict
Co-authored-by: Ronan Keryell <ronan@keryell.fr>
Changes to common code from #6 which has already been reviewed and merged into the `sycl-graph-record-replay` branch. This patch should not contain anything specific to the record and replay API.
- getSyclObjImpl and createSyclObjFromImpl support added - Minor renaming to enable this. - Adds basic results validation to dotp test - Minor fixes to address warnings etc.
* Add RUN lines to tests so that tests are run by LIT * clang-format existing tests, and other minor cleanups * Add `graph-explicit-reduction.cpp` which shows fail from #24 by using the `sycl::ext::oneapi::property::queue::lazy_execution` property on a queue which uses a reduction outwith the graph building API
Refactor the command_graph and node classes so that we interface with the impl types rather than user exposed types, and just the interface lives in the public facing headers. This change also means we can use a `.cpp` file for implementation code rather than being header only. The motivation for these changes was trying to get graph submission through a handler, at which point only the `sycl::detail::queue_impl` class is available rather than `sycl::queue`
Update API to match the spec change from #26 to execute a graph via the handler rather than queue submit. This spec update includes queue shortcut functions, which i've added a new test for.
* PI Minor version bump for new flag * Document new PI property as comments * Make value next consecutive bit `1 << 5`, rather than `1 << 11`.
What is the status of this? I'm pretty sure that you'd need some specific reviewers if/when it's ready :) |
I'd also suggest to split the final implementation into several PRs. I think something like this would be reasonable:
|
This is an experimental implementation of the basic features as described in #5626 |
Thanks for your feedback. Yes, we were planning to split the final implementation into multiple PRs. And the stages you are suggesting make sense from my perspective. Implementing the Explicit API on top of SYCL (stage 1) is possible and something I have done before as an initial prototype. This would be without any benefits in terms of performance. What we have right now is a mix of stage 1 + 2 as a PoC implementation. |
* Graph submission now properly creates a host visible event on the command list allowing auxilliary resources to be cleaned up * executeCommandList slightly modified to block execution only for command lists not allowed to be batched.
- Enable submitting a sub-graph as part of a larger command_graph - Flag added to queue_impl to enable graph to be aware it is a sub-graph and delay flush - Adds an example whichuses a subgraph in the middle of a command_graph
[SYCL] handler::ext_oneapi_graph Update to reflect changes from #65 - In line with recent spec changes, rename handler and queue shortcut functions from exec_graph to ext_oneapi_graph - Also updated usage in the examples Co-authored-by: Ewan Crawford <ewan@codeplay.com>
- Add some unit tests for the command graph POC -Add missing specializations for lazy queue property
Adds the `sycl::property_list` to the constructor of `command_graph<modifiable>()` and `finalize()` to match spec change #67
Co-authored-by: Ben Tracy <ben.tracy@codeplay.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good start, but I have a couple of initial comments.
Additionally, please adhere to the LLVM Coding Standard when it does not conflict with the style of the surrounding code. We have some places where this has slipped, but for new files it would be good to stay true to the coding standard, especially naming.
|
||
namespace detail { | ||
struct queue_impl; | ||
using queue_ptr = std::shared_ptr<queue_impl>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that this alias is very descriptive. It's more of a queue impl pointer rather than a pointer to a queue. But are we really saving much by having this as opposed to just writing out std::shared_ptr<queue_impl>
the places it is used?
const detail::plugin &Plugin = getPlugin(); | ||
if (Plugin.getBackend() == backend::ext_oneapi_level_zero) | ||
Plugin.call<detail::PiApiKind::piQueueFlush>(getHandleRef()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if I call wait on an event instead? Will it have flushed somewhere else or will I stall?
Also, do we need to check for L0 here? It would probably be better to stop the user from even passing the property to any other backend, as I suspect if/once another backend adds support for lazy execution this is easy to miss and will be hard to find.
} // namespace oneapi | ||
} // namespace ext | ||
} // __SYCL_INLINE_VER_NAMESPACE(_V1) | ||
} // namespace sycl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell the proposed extension states that these operations should be thread-safe, but they don't look to be it at the moment. Is that planned for a later patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'll add a note that this is added later.
@@ -47,6 +47,7 @@ __SYCL_INLINE_VER_NAMESPACE(_V1) { | |||
#define SYCL_EXT_ONEAPI_FREE_FUNCTION_QUERIES 1 | |||
#define SYCL_EXT_ONEAPI_GROUP_ALGORITHMS 1 | |||
#define SYCL_EXT_ONEAPI_GROUP_SORT 1 | |||
#define SYCL_EXT_ONEAPI_LAZY_QUEUE 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is not intended for use by the user, is there a reason to expose it as a feature?
Co-authored-by: Steffen Larsen <steffen.larsen@intel.com>
Implementation of Record & Replay API with tests Co-authored-by: Ben Tracy <ben.tracy@codeplay.com>
Replaced by #9375. The Feedback shared here has been addressed and incorporated. |
This patch adds a prototype implementation of the proposed command graph extension without major changes to the PI API and covers basic functionality.
The prototype relies on introducing a lazy execution property that postpones kernel execution for a specific queue until it is explicitly flushed. This property is not intended to be exposed to a user.