Skip to content

[SYCL] Enable post-enqueue execution graph cleanup #5070

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Dec 27, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
de35323
[DO NOT MERGE] Graph cleanup experiments
sergey-semenov Dec 1, 2021
0651e70
Fix read after free
sergey-semenov Dec 9, 2021
ef4b476
Fix post-enqueue and graph traversal cleanup conflict
sergey-semenov Dec 13, 2021
b0ce766
Disable post enqueue cleanup for unsupported commands
sergey-semenov Dec 13, 2021
99686b5
Add a workaround for spec const issue
sergey-semenov Dec 13, 2021
b6b371b
Turn on old cleanup
sergey-semenov Dec 13, 2021
ed4a785
Isolate nodes scheduled for post-enqueue cleanup
sergey-semenov Dec 14, 2021
307c2ae
Apply clang-format
sergey-semenov Dec 14, 2021
75f5ef7
Merge branch 'sycl' into graphcleanup2electricboogaloo
sergey-semenov Dec 14, 2021
8e3d9bd
Add deferred cleanup on scheduler destruction
sergey-semenov Dec 14, 2021
c9e5e36
Disable cleanup for some of the unit tests
sergey-semenov Dec 14, 2021
54fd4b4
Minor stylistic changes
sergey-semenov Dec 14, 2021
63524d1
Fix an issue with cleaning up nodes on their removal from leaves
sergey-semenov Dec 15, 2021
23da2d7
Merge remote-tracking branch 'origin/sycl' into graphcleanup2electric…
sergey-semenov Dec 15, 2021
2c1e939
Minor non-functional updates
sergey-semenov Dec 15, 2021
140fce7
Adjust queue flushing unit test to the new changes
sergey-semenov Dec 15, 2021
b4e9463
Apply clang-format
sergey-semenov Dec 15, 2021
909a634
Remove unnecessary finished command cleanup & cut down on stored events
sergey-semenov Dec 16, 2021
a2c3046
Remove Level Zero workaround in queue::wait
sergey-semenov Dec 16, 2021
271210d
Apply clang-format
sergey-semenov Dec 16, 2021
1e40a05
Apply some comments
sergey-semenov Dec 21, 2021
8d97039
Merge branch 'sycl' into graphcleanup2electricboogaloo
sergey-semenov Dec 21, 2021
c720274
Apply more comments
sergey-semenov Dec 21, 2021
abdcc62
Fix unused variable with assertions disabled
sergey-semenov Dec 21, 2021
f43f1f1
Add unit tests
sergey-semenov Dec 23, 2021
0f37897
Merge remote-tracking branch 'origin/sycl' into graphcleanup2electric…
sergey-semenov Dec 23, 2021
5275fdf
Non-functional test changes
sergey-semenov Dec 23, 2021
27a2235
Merge branch 'sycl' into graphcleanup2electricboogaloo
sergey-semenov Dec 24, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions sycl/doc/EnvironmentVariables.md
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,7 @@ variables in production code.</span>
| `SYCL_DEVICELIB_NO_FALLBACK` | Any(\*) | Disable loading and linking of device library images |
| `SYCL_PRINT_EXECUTION_GRAPH` | Described [below](#sycl_print_execution_graph-options) | Print execution graph to DOT text file. |
| `SYCL_DISABLE_EXECUTION_GRAPH_CLEANUP` | Any(\*) | Disable cleanup of finished command nodes at host-device synchronization points. |
| `SYCL_DISABLE_POST_ENQUEUE_CLEANUP` | Any(\*) | Disable cleanup of enqueued command nodes during submission. |
| `SYCL_THROW_ON_BLOCK` | Any(\*) | Throw an exception on attempt to wait for a blocked command. |
| `SYCL_DEVICELIB_INHIBIT_NATIVE` | String of device library extensions (separated by a whitespace) | Do not rely on device native support for devicelib extensions listed in this option. |
| `SYCL_PROGRAM_COMPILE_OPTIONS` | String of valid OpenCL compile options | Override compile options for all programs. |
Expand Down
1 change: 1 addition & 0 deletions sycl/include/CL/sycl/detail/cg.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -290,6 +290,7 @@ class CGExecKernel : public CG {
}

void clearStreams() { MStreams.clear(); }
bool hasStreams() { return !MStreams.empty(); }
};

/// "Copy memory" command group class.
Expand Down
1 change: 1 addition & 0 deletions sycl/source/detail/config.def
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@

CONFIG(SYCL_PRINT_EXECUTION_GRAPH, 32, __SYCL_PRINT_EXECUTION_GRAPH)
CONFIG(SYCL_DISABLE_EXECUTION_GRAPH_CLEANUP, 1, __SYCL_DISABLE_EXECUTION_GRAPH_CLEANUP)
CONFIG(SYCL_DISABLE_POST_ENQUEUE_CLEANUP, 1, __SYCL_DISABLE_POST_ENQUEUE_CLEANUP)
CONFIG(SYCL_DEVICE_ALLOWLIST, 1024, __SYCL_DEVICE_ALLOWLIST)
CONFIG(SYCL_BE, 16, __SYCL_BE)
CONFIG(SYCL_PI_TRACE, 16, __SYCL_PI_TRACE)
Expand Down
7 changes: 6 additions & 1 deletion sycl/source/detail/device_image_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -186,9 +186,14 @@ class device_image_impl {
std::lock_guard<std::mutex> Lock{MSpecConstAccessMtx};
if (nullptr == MSpecConstsBuffer && !MSpecConstsBlob.empty()) {
const detail::plugin &Plugin = getSyclObjImpl(MContext)->getPlugin();
// Uses PI_MEM_FLAGS_HOST_PTR_COPY instead of PI_MEM_FLAGS_HOST_PTR_USE
// since post-enqueue cleanup might trigger destruction of
// device_image_impl and, as a result, destruction of MSpecConstsBlob
// while MSpecConstsBuffer is still in use.
// TODO consider changing the lifetime of device_image_impl instead
memBufferCreateHelper(Plugin,
detail::getSyclObjImpl(MContext)->getHandleRef(),
PI_MEM_FLAGS_ACCESS_RW | PI_MEM_FLAGS_HOST_PTR_USE,
PI_MEM_FLAGS_ACCESS_RW | PI_MEM_FLAGS_HOST_PTR_COPY,
MSpecConstsBlob.size(), MSpecConstsBlob.data(),
&MSpecConstsBuffer, nullptr);
}
Expand Down
2 changes: 1 addition & 1 deletion sycl/source/detail/event_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -364,7 +364,6 @@ std::vector<EventImplPtr> event_impl::getWaitList() {
}

void event_impl::flushIfNeeded(const QueueImplPtr &UserQueue) {
assert(MEvent != nullptr);
if (MIsFlushed)
return;

Expand All @@ -379,6 +378,7 @@ void event_impl::flushIfNeeded(const QueueImplPtr &UserQueue) {
return;

// Check if the task for this event has already been submitted.
assert(MEvent != nullptr);
pi_event_status Status = PI_EVENT_QUEUED;
getPlugin().call<PiApiKind::piEventGetInfo>(
MEvent, PI_EVENT_INFO_COMMAND_EXECUTION_STATUS, sizeof(pi_int32), &Status,
Expand Down
11 changes: 11 additions & 0 deletions sycl/source/detail/event_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -201,6 +201,11 @@ class event_impl {
/// \return true if this event is discarded.
bool isDiscarded() const { return MState == HES_Discarded; }

void setNeedsCleanupAfterWait(bool NeedsCleanupAfterWait) {
MNeedsCleanupAfterWait = NeedsCleanupAfterWait;
}
bool needsCleanupAfterWait() { return MNeedsCleanupAfterWait; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be const?


private:
// When instrumentation is enabled emits trace event for event wait begin and
// returns the telemetry event generated for the wait
Expand Down Expand Up @@ -231,6 +236,12 @@ class event_impl {
// HostEventState enum.
std::atomic<int> MState;

// A temporary workaround for the current limitations of post enqueue graph
// cleanup. Indicates that the command associated with this event isn't
// handled by post enqueue cleanup yet and has to be deleted by cleanup after
// wait.
bool MNeedsCleanupAfterWait = false;

std::mutex MMutex;
};

Expand Down
101 changes: 37 additions & 64 deletions sycl/source/detail/queue_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -74,10 +74,7 @@ event queue_impl::memset(const std::shared_ptr<detail::queue_impl> &Self,

event ResEvent = prepareUSMEvent(Self, NativeEvent);
// Track only if we won't be able to handle it with piQueueFinish.
// FIXME these events are stored for level zero until as a workaround, remove
// once piEventRelease no longer calls wait on the event in the plugin.
if (!MSupportOOO ||
getPlugin().getBackend() == backend::ext_oneapi_level_zero)
if (!MSupportOOO)
addSharedEvent(ResEvent);
return MDiscardEvents ? createDiscardedEvent() : ResEvent;
}
Expand All @@ -99,10 +96,7 @@ event queue_impl::memcpy(const std::shared_ptr<detail::queue_impl> &Self,

event ResEvent = prepareUSMEvent(Self, NativeEvent);
// Track only if we won't be able to handle it with piQueueFinish.
// FIXME these events are stored for level zero until as a workaround, remove
// once piEventRelease no longer calls wait on the event in the plugin.
if (!MSupportOOO ||
getPlugin().getBackend() == backend::ext_oneapi_level_zero)
if (!MSupportOOO)
addSharedEvent(ResEvent);
return MDiscardEvents ? createDiscardedEvent() : ResEvent;
}
Expand All @@ -125,29 +119,30 @@ event queue_impl::mem_advise(const std::shared_ptr<detail::queue_impl> &Self,

event ResEvent = prepareUSMEvent(Self, NativeEvent);
// Track only if we won't be able to handle it with piQueueFinish.
// FIXME these events are stored for level zero until as a workaround, remove
// once piEventRelease no longer calls wait on the event in the plugin.
if (!MSupportOOO ||
getPlugin().getBackend() == backend::ext_oneapi_level_zero)
if (!MSupportOOO)
addSharedEvent(ResEvent);
return MDiscardEvents ? createDiscardedEvent() : ResEvent;
}

void queue_impl::addEvent(const event &Event) {
EventImplPtr Eimpl = getSyclObjImpl(Event);
Command *Cmd = (Command *)(Eimpl->getCommand());
EventImplPtr EImpl = getSyclObjImpl(Event);
auto *Cmd = static_cast<Command *>(EImpl->getCommand());
if (!Cmd) {
// if there is no command on the event, we cannot track it with MEventsWeak
// as that will leave it with no owner. Track in MEventsShared only if we're
// unable to call piQueueFinish during wait.
// FIXME these events are stored for level zero until as a workaround,
// remove once piEventRelease no longer calls wait on the event in the
// plugin.
if (is_host() || !MSupportOOO ||
getPlugin().getBackend() == backend::ext_oneapi_level_zero)
if (is_host() || !MSupportOOO)
addSharedEvent(Event);
} else {
std::weak_ptr<event_impl> EventWeakPtr{Eimpl};
}
// As long as the queue supports piQueueFinish we only need to store events
// with command nodes in the following cases:
// 1. Unenqueued commands, since they aren't covered by piQueueFinish.
// 2. Kernels with streams, since they are not supported by post enqueue
// cleanup.
// 3. Host tasks, for both reasons.
else if (is_host() || !MSupportOOO || EImpl->getHandleRef() == nullptr ||
EImpl->needsCleanupAfterWait()) {
std::weak_ptr<event_impl> EventWeakPtr{EImpl};
std::lock_guard<std::mutex> Lock{MMutex};
MEventsWeak.push_back(std::move(EventWeakPtr));
}
Expand All @@ -157,10 +152,7 @@ void queue_impl::addEvent(const event &Event) {
/// but some events have no other owner. In this case,
/// addSharedEvent will have the queue track the events via a shared pointer.
void queue_impl::addSharedEvent(const event &Event) {
// FIXME The assertion should be corrected once the Level Zero workaround is
// removed.
assert(is_host() || !MSupportOOO ||
getPlugin().getBackend() == backend::ext_oneapi_level_zero);
assert(is_host() || !MSupportOOO);
std::lock_guard<std::mutex> Lock(MMutex);
// Events stored in MEventsShared are not released anywhere else aside from
// calls to queue::wait/wait_and_throw, which a user application might not
Expand Down Expand Up @@ -293,50 +285,31 @@ void queue_impl::wait(const detail::code_location &CodeLoc) {
// directly. Otherwise, only wait for unenqueued or host task events, starting
// from the latest submitted task in order to minimize total amount of calls,
// then handle the rest with piQueueFinish.
// TODO the new workflow has worse performance with Level Zero, keep the old
// behavior until this is addressed
if (!is_host() &&
getPlugin().getBackend() == backend::ext_oneapi_level_zero) {
const bool SupportsPiFinish = !is_host() && MSupportOOO;
for (auto EventImplWeakPtrIt = WeakEvents.rbegin();
EventImplWeakPtrIt != WeakEvents.rend(); ++EventImplWeakPtrIt) {
if (std::shared_ptr<event_impl> EventImplSharedPtr =
EventImplWeakPtrIt->lock()) {
// A nullptr PI event indicates that piQueueFinish will not cover it,
// either because it's a host task event or an unenqueued one.
if (!SupportsPiFinish || nullptr == EventImplSharedPtr->getHandleRef()) {
EventImplSharedPtr->wait(EventImplSharedPtr);
}
}
}
if (SupportsPiFinish) {
const detail::plugin &Plugin = getPlugin();
Plugin.call<detail::PiApiKind::piQueueFinish>(getHandleRef());
for (std::weak_ptr<event_impl> &EventImplWeakPtr : WeakEvents)
if (std::shared_ptr<event_impl> EventImplSharedPtr =
EventImplWeakPtr.lock())
EventImplSharedPtr->wait(EventImplSharedPtr);
if (EventImplSharedPtr->needsCleanupAfterWait())
EventImplSharedPtr->cleanupCommand(EventImplSharedPtr);
assert(SharedEvents.empty() && "Queues that support calling piQueueFinish "
"shouldn't have shared events");
} else {
for (event &Event : SharedEvents)
Event.wait();
} else {
bool SupportsPiFinish = !is_host() && MSupportOOO;
for (auto EventImplWeakPtrIt = WeakEvents.rbegin();
EventImplWeakPtrIt != WeakEvents.rend(); ++EventImplWeakPtrIt) {
if (std::shared_ptr<event_impl> EventImplSharedPtr =
EventImplWeakPtrIt->lock()) {
// A nullptr PI event indicates that piQueueFinish will not cover it,
// either because it's a host task event or an unenqueued one.
if (!SupportsPiFinish ||
nullptr == EventImplSharedPtr->getHandleRef()) {
EventImplSharedPtr->wait(EventImplSharedPtr);
}
}
}
if (SupportsPiFinish) {
const detail::plugin &Plugin = getPlugin();
Plugin.call<detail::PiApiKind::piQueueFinish>(getHandleRef());
for (std::weak_ptr<event_impl> &EventImplWeakPtr : WeakEvents)
if (std::shared_ptr<event_impl> EventImplSharedPtr =
EventImplWeakPtr.lock())
EventImplSharedPtr->cleanupCommand(EventImplSharedPtr);
// FIXME these events are stored for level zero until as a workaround,
// remove once piEventRelease no longer calls wait on the event in the
// plugin.
if (Plugin.getBackend() == backend::ext_oneapi_level_zero) {
SharedEvents.clear();
}
assert(SharedEvents.empty() &&
"Queues that support calling piQueueFinish "
"shouldn't have shared events");
} else {
for (event &Event : SharedEvents)
Event.wait();
}
}
#ifdef XPTI_ENABLE_INSTRUMENTATION
instrumentationEpilog(TelemetryEvent, Name, StreamID, IId);
Expand Down
Loading