-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL][Graph] Add initial support for SYCL Graph (1/4) #9728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codeplays contributions to this work aren't attributed in the initial split commit of this PR, @EwanC (ewan@codeplay.com) & @Bensuo (ben.tracy@codeplay.com) should be named as co-authors in the merge commit when this goes in. |
The current approach uses the same LO PI event for all submissions of the same graph, and also doesn't use the wait events to enforce dependencies on the command-list submission. By doing these in the L0 adapter, we can remove the blocking queue wait from our graphs submission code in the runtime. Closes issue #139
FYI - https://github.com/intel/llvm/blob/sycl/CONTRIBUTING.md#merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part one of the code review.
Thanks a lot for the comments in the code. They are very helpful.
sycl/source/detail/queue_impl.hpp
Outdated
@@ -32,6 +32,8 @@ | |||
#include <sycl/queue.hpp> | |||
#include <sycl/stl.hpp> | |||
|
|||
#include "detail/graph_impl.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: technically, we can use forward declaration of graph_impl
instead of including the header with the full declaration. In this header we only use it as shared_ptr
type.
Using forward declaration will improve compile time. As this header is not included to the user's code, this change will help to build the DPC++ runtime faster, so it impacts only DPC++ developers, but not DPC++ users.
Similar to the approach applied to sycl/include/sycl/handler.hpp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in reble#212
// MHostKernel will be used elsewhere. | ||
std::unique_ptr<HostKernelBase> *CopyHostKernel = | ||
new std::unique_ptr<HostKernelBase>(std::move(HostTask->MHostKernel)); | ||
std::shared_ptr<HostKernelBase> CopyHostKernel = HostTask->MHostKernel; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like someone from the runtime team to confirm that this change is okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there somebody in particular, or a group, we can tag for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@intel/llvm-reviewers-runtime is the team, but I would especially like @sergey-semenov's thoughts on this and the deep-copies and copy-ctor on CG
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linking to Ben's comment on this in another resolved thread for reference https://github.com/intel/llvm/pull/9728/files#r1222976913
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change seems fine to me. As for CG copies, we might be able to share some parts like MArgsStorage between them (assuming my understanding that arguments are guaranteed to stay the same is correct).
/// An executable command-graph is not user constructable. | ||
command_graph() = delete; | ||
|
||
/// Constructor used by internal runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the modifiable command_graph constructor is private, whereas the executable command_graph constructor is public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in reble#219
If we pull in the code from these two commits it should fix two of the Windows CI fails:
It might be that the alignment part of 194 is stale though, and needs redone to reflect new upstream changes. |
With our change from storing a host-task in a command-group to a shared-pointer from a unique shared-pointer we didn't cover all the usages. When the native kernel is called in `DispatchNativeKernel`, the `void*` argument is currently casted to a unique_ptr and then freed at the end of the function. However, we set this `void*` to a raw pointer from the shared-pointer `get()`. This results in freeing the underlying memory held by the shared-pointer in `DispatchNativeKernel` before the shared-pointer's reference count is decremented to zero. I initially tried removing the `delete` in `DispatchNativeKernel`, which fixes `Basic/handler/run_on_host_intel.cpp` but doesn't fix `Basic/handler/run_on_host_intel2.cpp`, because user code can result in the command-group being deleted before the native kernel is run, causing the shared-pointer to decrement to a zero reference count and free the memory. The proposed fix in this patch creates a shared-pointer on the heap every time the host-task is enqueued, this increments the reference count of the shared-pointer for the lifetime it takes for the `DispatchNativeKernel` callback to run, and the heap shared-pointer is then freed at the end of `DispatchNativeKernel`. Fixes the following test-e2e tests for OpenCL CPU: ``` Basic/handler/run_on_host_intel.cpp Basic/handler/run_on_host_intel2.cpp DiscardEvents/discard_events_host_task.cpp ```
`test/abi/sycl_symbols_windows.dump` was invalidated by the last commit, regenerate from script
Failure in post-commit: |
This commit attempts to fix the exported symbols after intel#9728 by creating templateless base classes for the different graph class specializations. This is in line with how other classes export their symbols. Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
This commit attempts to fix the exported symbols after #9728 by creating templateless base classes for the different graph class specializations. This is in line with how other classes export their symbols. --------- Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
The SYCL-Graphs extension code which landed in intel#9728 is a collaboration between Codeplay and Intel. When graphs specific code is touched, the teams across both Intel and Codeplay should be notified. For example, in intel#9977 no Codeplay developers were tagged (Not that we have any issues with that PR).
Initial SYCL-Graph Patch
This is the first patch of a series that adds support for an experimental command graph extension
A snapshot of the complete work can be seen in draft PR #9375 which has support all the specification defined ways of
adding nodes and edges to the graph, including both Explicit and Record & Replay graph construction. The two types of nodes currently implemented are kernel execution and memcpy commands.
See https://github.com/reble/llvm#implementation-status for the status of our total work.
Scope
This first patch focuses on ABI breaking changes and includes:
sycl::handler
constructor and graph related member variables.CG::CGTYPE::ExecCommandBuffer
.CGExecKernel::MHostKernel
member variable from a unique ptr to a shared ptr so that the command-group can be copied.unittests
tests.Following Split PRs
Future follow-up PRs with the remainder of our work on the extension will include:
Authors
Co-authored-by: Pablo Reble pablo.reble@intel.com
Co-authored-by: Julian Miller julian.miller@intel.com
Co-authored-by: Ben Tracy ben.tracy@codeplay.com
Co-authored-by: Ewan Crawford ewan@codeplay.com