
[SYCL] Enable post-enqueue execution graph cleanup #5070


Merged: 28 commits merged into intel:sycl from graphcleanup2electricboogaloo on Dec 27, 2021

Conversation

@sergey-semenov (Contributor) commented Dec 2, 2021:

This patch contains the initial implementation of post-enqueue cleanup of
command nodes. It is primarily motivated by the significant overhead of
post-wait cleanup in queue::wait when it is lowered to piQueueFinish: in that
case we cannot alternate between waiting for individual events and cleaning up
their commands while other tasks are still executing on the device.

Post-enqueue cleanup is performed for enqueued non-leaf nodes, so it can be
triggered either by the enqueue process or by removing an enqueued node from
the leaves. The initial implementation has several exceptions: host tasks
(which currently cannot be cleaned up after enqueue), kernels with streams
(stream handling is tied to finished-command cleanup), and CGs without
dependencies (for now, still deleted in addCG as before). Because of this,
finished-command cleanup is still triggered unconditionally in event::wait()
and, for host tasks and kernels with streams, in queue::wait().

In addition, this patch removes the queue::wait workarounds for Level Zero
that were required to bypass the overhead of finished-command cleanup.
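
For illustration, here is a minimal, self-contained sketch of the eligibility rule described above. The `Command` fields and the `isCleanupCandidate` helper are simplified, hypothetical stand-ins, not the actual runtime code:

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for the scheduler's command node.
struct Command {
  std::size_t MLeafCounter = 0; // number of leaf lists this node belongs to
  bool MEnqueued = false;       // successfully enqueued to the backend
  bool MIsHostTask = false;     // exception: host tasks are not cleaned up here
  bool MHasStreams = false;     // exception: stream handling needs post-wait cleanup
};

// A command qualifies for post-enqueue cleanup once it is no longer a leaf
// and has been successfully enqueued, unless it falls under one of the
// exceptions listed in the description above.
bool isCleanupCandidate(const Command &Cmd) {
  return Cmd.MEnqueued && Cmd.MLeafCounter == 0 && !Cmd.MIsHostTask &&
         !Cmd.MHasStreams;
}
```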

@sergey-semenov (Contributor, Author) commented:

/summary:run

A very rough implementation of cleaning up command nodes after they're
enqueued and stop being leaves, alloca commands excluded. Handles only a
subset of cases.
@sergey-semenov force-pushed the graphcleanup2electricboogaloo branch from 2030398 to de35323 on December 9, 2021 at 13:26

@romanovvlad (Contributor) left a comment:

Did a partial review. Please look at the latest comment first.

Cmd->MLeafCounter -= Record->MReadLeaves.remove(Cmd);
Cmd->MLeafCounter -= Record->MWriteLeaves.remove(Cmd);
if (WasLeaf && Cmd->MLeafCounter == 0 && Cmd->isSuccessfullyEnqueued() &&
@romanovvlad (Contributor) commented Dec 17, 2021:

I'm a little bit lost. Wouldn't it be simpler to:

  1. Add a readyForCleanUp method to Command which does all the required checks for a command.
  2. Add a "scheduler global" vector ToCleanUp which is guarded by a mutex (can later be converted to thread-local if needed).
  3. In GraphProcessor::enqueue, check if a dep of a command is readyForCleanup (1.) and, if yes, add it to the ToCleanUp vector (can have a local cache if needed to reduce the number of locks for accessing the vector).
  4. In all* Scheduler high-level entry points, check if we have something in the ToCleanUp vector and run cleanup.

?
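
A minimal sketch of the shape being suggested here, using hypothetical names rather than the actual patch (the comments map to the numbered steps above):

```cpp
#include <mutex>
#include <utility>
#include <vector>

struct Command {
  bool MEnqueued = false;
  int MLeafCounter = 0;
  // Step 1: all the checks a command must pass before it can be cleaned up.
  bool readyForCleanup() const { return MEnqueued && MLeafCounter == 0; }
};

class Scheduler {
public:
  // Step 3: called from the enqueue path when a dependency may have become
  // eligible for cleanup.
  void markForCleanup(Command *Cmd) {
    if (!Cmd->readyForCleanup())
      return;
    std::lock_guard<std::mutex> Lock(MCleanupMutex);
    MToCleanUp.push_back(Cmd);
  }

  // Step 4: called from high-level Scheduler entry points.
  void cleanupDeferredCommands() {
    std::vector<Command *> Local;
    {
      std::lock_guard<std::mutex> Lock(MCleanupMutex);
      std::swap(Local, MToCleanUp);
    }
    for (Command *Cmd : Local)
      delete Cmd; // real cleanup would also detach the node from the graph
  }

private:
  // Step 2: a "scheduler global" vector guarded by a mutex.
  std::mutex MCleanupMutex;
  std::vector<Command *> MToCleanUp;
};
```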

@sergey-semenov (Contributor, Author) commented:

I mainly wanted to keep the number of cleanup-vector mutex locks to a minimum: currently either the graph mutex or the deferred-cleanup-vector mutex is locked once per attempted cleanup. I hadn't considered that, if needed, the same could be achieved with a thread-local member variable in the scheduler. Also, if we use a mutex instead of a thread-local variable, we still cannot minimize its locking (to one lock per Scheduler entry point) with a local cache at the GraphProcessor::enqueue level. So out of those three options I would prefer either the current solution or a thread-local member variable.

I currently make the check inside Command::enqueue and populate the vector there, since the function only reports that the command has been enqueued at some point rather than as part of this specific Command::enqueue call. We could return that second piece of information from Command::enqueue to move this logic to GraphProcessor::enqueue, but considering that it is only needed for cleanup, I don't think that would be a significant improvement.

So, do you think that switching to a thread-local member variable in the scheduler and populating it in Command::enqueue and GraphBuilder::updateLeaves makes sense?
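
For context, the thread-local alternative being weighed here would look roughly like the sketch below (hypothetical names; in C++ a thread-local member has to be a static data member):

```cpp
#include <vector>

struct Command;

class Scheduler {
public:
  // Hypothetical: populated from Command::enqueue and
  // GraphBuilder::updateLeaves without any locking, since each thread only
  // ever touches its own copy; drained at Scheduler entry points.
  static thread_local std::vector<Command *> MDeferredCleanupCommands;
};

thread_local std::vector<Command *> Scheduler::MDeferredCleanupCommands;
```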

@romanovvlad (Contributor) commented:

I don't think using thread-local variables here is any better. While it might end up being less code, I don't think maintaining it will be easy.

@sergey-semenov (Contributor, Author) commented:

I tend to agree. I suggest keeping the current solution for now then, unless any other reviewers disagree with that. @romanovvlad Feel free to continue the review post-merge so that we can revisit this topic if needed.

@sergey-semenov (Contributor, Author) commented Dec 21, 2021:

@intel/llvm-reviewers-runtime Since Vlad is on vacation, could someone else please review and weigh in on some of the unresolved discussions (mainly the one about the current solution vs. using a thread-local member vector of commands)?

@sergey-semenov (Contributor, Author) commented:

@alexbatashev @intel/llvm-reviewers-runtime Added unit tests for the change. Please take another look.

@sergey-semenov (Contributor, Author) commented:

/verify with intel/llvm-test-suite#631

@bader (Contributor) left a comment:

sycl/doc/EnvironmentVariables.md looks good to me.

@bader (Contributor) commented Dec 23, 2021:

Do you know why SYCL :: Tracing/pi_tracing_test.cpp fails?

@bader (Contributor) commented Dec 23, 2021:

I suggest merging with the recent sycl head, updating intel/llvm-test-suite#631 with more fixes if necessary, and restarting Jenkins testing. I think we should clean the pre-commit results for this PR. After that, we can check the summary job status again.

@sergey-semenov (Contributor, Author) commented:

I think the tracing test fails in the "verify with" run because this branch has been merged with #5172, while the llvm-test-suite branch doesn't have intel/llvm-test-suite#637. Still, I agree that it's better to be safe here; I've updated both PRs.

@sergey-semenov (Contributor, Author) commented:

/verify with intel/llvm-test-suite#631

@bader (Contributor) commented Dec 24, 2021:

"Generate Doxygen documentation / build (pull_request)" will fail due to issues caused by pulling llorg commits. Please ignore this failure.

@sergey-semenov (Contributor, Author) commented:

/summary:run

@bader bader merged commit 6fd6098 into intel:sycl Dec 27, 2021
@sergey-semenov sergey-semenov deleted the graphcleanup2electricboogaloo branch December 27, 2021 13:22
void setNeedsCleanupAfterWait(bool NeedsCleanupAfterWait) {
MNeedsCleanupAfterWait = NeedsCleanupAfterWait;
}
bool needsCleanupAfterWait() { return MNeedsCleanupAfterWait; }
A Contributor commented on this snippet:

shouldn't this be const?
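
For illustration, the const-qualified form the comment is asking about would look like this (the enclosing class is a simplified stand-in for the real one):

```cpp
class Command {
public:
  // The setter mutates state, so it stays non-const.
  void setNeedsCleanupAfterWait(bool NeedsCleanupAfterWait) {
    MNeedsCleanupAfterWait = NeedsCleanupAfterWait;
  }
  // The getter only reads the flag, so it can be marked const as suggested.
  bool needsCleanupAfterWait() const { return MNeedsCleanupAfterWait; }

private:
  bool MNeedsCleanupAfterWait = false;
};
```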

dm-vodopyanov added a commit to dm-vodopyanov/llvm that referenced this pull request on Jan 14, 2022:

This patch reverts a performance regression for the Level Zero backend introduced in intel#5070.