Skip to content

[SYCL][Graph] Command Graph PoC #7627

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 37 commits into from
Closed

Conversation

reble
Copy link
Contributor

@reble reble commented Dec 2, 2022

This patch adds a prototype implementation of the proposed command graph extension without major changes to the PI API and covers basic functionality.

The prototype relies on introducing a lazy execution property that postpones kernel execution for a specific queue until it is explicitly flushed. This property is not intended to be exposed to a user.

reble and others added 29 commits October 13, 2022 07:00
Fix merge conflict
Co-authored-by: Ronan Keryell <ronan@keryell.fr>
Changes to common code from #6
which has already been reviewed and merged into the
`sycl-graph-record-replay` branch.

This patch should not contain anything specific to the record and
replay API.
- getSyclObjImpl and createSyclObjFromImpl support added
- Minor renaming to enable this.
- Adds basic results validation to dotp test
- Minor fixes to address warnings etc.
* Add RUN lines to tests so that tests are run by LIT
* clang-format existing tests, and other minor cleanups
* Add `graph-explicit-reduction.cpp` which shows fail from #24 by using the `sycl::ext::oneapi::property::queue::lazy_execution` property on a queue which uses a reduction outwith  the graph building API
Refactor the command_graph and node classes so that
we interface with the impl types rather than
user exposed types, and just the interface lives in the
public facing headers.

This change also means we can use a `.cpp` file for implementation
code rather than being header only.

The motivation for these changes was trying to get graph submission
through a handler, at which point only the `sycl::detail::queue_impl` class
is available rather than `sycl::queue`
Update API to match the spec change from #26
to execute a graph via the handler rather than queue submit.

This spec update includes queue shortcut functions, which i've added
a new test for.
* PI Minor version bump for new flag
* Document new PI property as comments
* Make value next consecutive bit `1 << 5`, rather
  than `1 << 11`.
@reble reble requested a review from a team as a code owner December 2, 2022 22:39
@reble reble requested a review from a team as a code owner December 2, 2022 22:39
@reble reble requested a review from aelovikov-intel December 2, 2022 22:39
@aelovikov-intel
Copy link
Contributor

What is the status of this? I'm pretty sure that you'd need some specific reviewers if/when it's ready :)

@aelovikov-intel
Copy link
Contributor

I'd also suggest to split the final implementation into several PRs. I think something like this would be reasonable:

  1. Explicitly constructed graph (not queue-recorded) with unittests, without execution support
  2. Plugin implementation, end-to-end test is possible at this moment
  3. Queue recording support. I'm not sure what would the best scenario to test it, but I'd think unittest would be a better match.

@reble
Copy link
Contributor Author

reble commented Dec 6, 2022

What is the status of this? I'm pretty sure that you'd need some specific reviewers if/when it's ready :)

This is an experimental implementation of the basic features as described in #5626
Tagging @steffenlarsen @smaslov-intel @againull since this PR is replacing the one on my fork. Where we started a discussion on the design. Going forward we would like to use a lazy execution property as a temporary solution to enable the command graph extension with minor changes to the PI API.

@reble
Copy link
Contributor Author

reble commented Dec 6, 2022

I'd also suggest to split the final implementation into several PRs. I think something like this would be reasonable:

  1. Explicitly constructed graph (not queue-recorded) with unittests, without execution support
  2. Plugin implementation, end-to-end test is possible at this moment
  3. Queue recording support. I'm not sure what would the best scenario to test it, but I'd think unittest would be a better match.

Thanks for your feedback. Yes, we were planning to split the final implementation into multiple PRs. And the stages you are suggesting make sense from my perspective. Implementing the Explicit API on top of SYCL (stage 1) is possible and something I have done before as an initial prototype. This would be without any benefits in terms of performance. What we have right now is a mix of stage 1 + 2 as a PoC implementation.

@steffenlarsen steffenlarsen self-requested a review December 6, 2022 08:22
* Graph submission now properly creates a host visible event on the command list allowing auxilliary resources to be cleaned up

* executeCommandList slightly modified to block execution only for command lists not allowed to be batched.
Bensuo and others added 5 commits December 21, 2022 09:10
- Enable submitting a sub-graph as part of a larger command_graph
- Flag added to queue_impl to enable graph to be aware it is a sub-graph and delay flush
- Adds an example whichuses a subgraph in the middle of a command_graph
[SYCL] handler::ext_oneapi_graph

Update to reflect changes from #65

- In line with recent spec changes, rename handler and queue shortcut functions from exec_graph to ext_oneapi_graph
- Also updated usage in the examples

Co-authored-by: Ewan Crawford <ewan@codeplay.com>
- Add some unit tests for the command graph POC
-Add missing specializations for lazy queue property
Adds the `sycl::property_list` to the constructor of
`command_graph<modifiable>()` and `finalize()` to
match spec change #67
Co-authored-by: Ben Tracy <ben.tracy@codeplay.com>
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good start, but I have a couple of initial comments.

Additionally, please adhere to the LLVM Coding Standard when it does not conflict with the style of the surrounding code. We have some places where this has slipped, but for new files it would be good to stay true to the coding standard, especially naming.


namespace detail {
struct queue_impl;
using queue_ptr = std::shared_ptr<queue_impl>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that this alias is very descriptive. It's more of a queue impl pointer rather than a pointer to a queue. But are we really saving much by having this as opposed to just writing out std::shared_ptr<queue_impl> the places it is used?

const detail::plugin &Plugin = getPlugin();
if (Plugin.getBackend() == backend::ext_oneapi_level_zero)
Plugin.call<detail::PiApiKind::piQueueFlush>(getHandleRef());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if I call wait on an event instead? Will it have flushed somewhere else or will I stall?

Also, do we need to check for L0 here? It would probably be better to stop the user from even passing the property to any other backend, as I suspect if/once another backend adds support for lazy execution this is easy to miss and will be hard to find.

} // namespace oneapi
} // namespace ext
} // __SYCL_INLINE_VER_NAMESPACE(_V1)
} // namespace sycl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell the proposed extension states that these operations should be thread-safe, but they don't look to be it at the moment. Is that planned for a later patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'll add a note that this is added later.

@@ -47,6 +47,7 @@ __SYCL_INLINE_VER_NAMESPACE(_V1) {
#define SYCL_EXT_ONEAPI_FREE_FUNCTION_QUERIES 1
#define SYCL_EXT_ONEAPI_GROUP_ALGORITHMS 1
#define SYCL_EXT_ONEAPI_GROUP_SORT 1
#define SYCL_EXT_ONEAPI_LAZY_QUEUE 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not intended for use by the user, is there a reason to expose it as a feature?

Co-authored-by: Steffen Larsen <steffen.larsen@intel.com>
@reble reble marked this pull request as draft February 23, 2023 05:11
Implementation of Record & Replay API with tests

Co-authored-by: Ben Tracy <ben.tracy@codeplay.com>
@reble
Copy link
Contributor Author

reble commented May 9, 2023

Replaced by #9375.

The Feedback shared here has been addressed and incorporated.

@reble reble closed this May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants