-
Notifications
You must be signed in to change notification settings - Fork 772
[SYCL] Reuse discarded L0 events in scope of command list #7256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Patch implements reset and reuse of Level Zero events in scope of cmd list. Same level zero events are reused inside different command lists in scope of the queue, we need new event only to switch between command lists. The scheme in scope of command list looks like this: Operation1 = zeCommantListAppendMemoryCopy (signal event1) zeCommandListAppendBarrier(wait for event1) zeCommandListAppendEventReset(event1) Operation2 = zeCommandListAppendMemoryCopy (signal event2) zeCommandListAppendBarrier(wait for event2) zeCommandListAppendEventReset(event2) Operation3 = zeCommandListAppendMemoryCopy (signal event1) If we switch to a different command list then we signal new event and insert a barrier into new command list waiting or that event. CmdList1: Operation1 = zeCommantListAppendMemoryCopy (signal event1) zeCommandListAppendBarrier(wait for event1) zeCommandListAppendEventReset(event1) zeCommandListAppendSignalEvent(NewEvent) CmdList2: zeCommandListAppendBarrier(wait for NewEvent)
This reverts commit 0448cb8.
442f937
to
41d518c
Compare
@smaslov-intel Could you please review. |
// events in the in-order queue with discard_events property. | ||
static const bool ReuseDiscardedEvents = [] { | ||
const char *ReuseDiscardedEventsFlag = | ||
std::getenv("SYCL_PI_LEVEL_ZERO_REUSE_DISCARDED_EVENTS"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document this env var
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documented.
// Create new pi_event but with the same ze_event_handle_t. We are going | ||
// to use this pi_event for the next command with discarded event. | ||
pi_event PiEvent; | ||
try { | ||
PiEvent = new _pi_event(LastCommandEvent->ZeEvent, | ||
LastCommandEvent->ZeEventPool, Context, | ||
PI_COMMAND_TYPE_USER, true); | ||
} catch (const std::bad_alloc &) { | ||
return PI_ERROR_OUT_OF_HOST_MEMORY; | ||
} catch (...) { | ||
return PI_ERROR_UNKNOWN; | ||
} | ||
|
||
if (LastCommandEvent->isHostVisible()) | ||
PiEvent->HostVisibleEvent = PiEvent; | ||
|
||
PI_CALL(addEventToCache(PiEvent)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code snippet does not logically belong to this method called "appendWaitAndReset", but I see how it is convenient to do it here where you know the "LastCommandEvent->ZeEvent". Maybe rename function to like "resetDiscardedEvent"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, renamed.
// any command but its ZeEvent is used by many pi_event objects. | ||
// Commands to wait and reset ZeEvent must be submitted to the queue before | ||
// calling this method. | ||
pi_result addEventToCache(pi_event Event); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pi_result addEventToCache(pi_event Event); | |
pi_result addEventToQueueCache(pi_event Event); |
and rename in pi_context to "addEventToContextCache" (for readability)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, renamed.
this->LastCommandEvent = CommandList->second.EventList.back(); | ||
if (ReuseDiscardedEvents && isInOrderQueue() && isDiscardEvents()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create a helper query for multiple repeated conditions:
if (ReuseDiscardedEvents && isInOrderQueue() && isDiscardEvents()) { | |
if (doReuseDiscardedEvents) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a helper method.
if (ReuseDiscardedEvents && isInOrderQueue() && isDiscardEvents()) { | ||
// If we have in-order queue with discarded events then we want to | ||
// treat this event as regular event. We insert a barrier in the next | ||
// command list to wait for this event. | ||
LastCommandEvent = HostVisibleEvent; | ||
} else { | ||
// For all other queues treat this as a special event and indicate no | ||
// cleanup is needed. | ||
PI_CALL(piEventReleaseInternal(HostVisibleEvent)); | ||
HostVisibleEvent->CleanedUp = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we change to treat it "regular" in all cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can, I added TODO for this because it may require test changes as well.
// signal new event from the last immediate command list. We are going | ||
// to insert a barrier in the new command list waiting for that event. | ||
auto QueueGroup = CurQueue->getQueueGroup(UseCopyEngine); | ||
auto NextImmCmdList = QueueGroup.ImmCmdLists[QueueGroup.NextIndex]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's fragile to use "NextIndex" directly here, maybe add a boolean to getQueueIndex that we just want to know the index (but not advance it)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, fixed as you suggested.
} | ||
} else { | ||
// Ensure LastCommandEvent's batch is submitted if it is differrent | ||
// from the one this command is going to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also comment that the "signalEventFromCmdListIfLastEventDiscarded" will be called at batch close if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment.
// If the last event is discarded then we already have a barrier waiting for | ||
// that event, so don't need to include the last command event into the wait | ||
// list. | ||
if (ReuseDiscardedEvents && CurQueue->isDiscardEvents() && | ||
CurQueue->LastCommandEvent->IsDiscarded) | ||
IncludeLastCommandEvent = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an optimization? Is it important? I'd prefer we don't do that (at least like this) because it relies on other places to properly add the barrier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is important, because we reset discarded event, so we can't add it into the wait list -> it will cause waiting for reset event. I added a clarifying comment. Unfortunately, currently I don't see how to get rid of this part.
This reverts commit b5a1c29.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
SYCL :: GroupAlgorithm/SYCL2020/sort.cpp failure on HIP/CUDA is known. |
Patch implements reset and reuse of Level Zero events in scope of cmd list. Same level zero events are reused inside different command lists in scope of the queue, we need new event only to switch between command lists.
The scheme in scope of command list looks like this:
Operation1 = zeCommantListAppendMemoryCopy (signal event1)
zeCommandListAppendBarrier(wait for event1)
zeCommandListAppendEventReset(event1)
Operation2 = zeCommandListAppendMemoryCopy (signal event2)
zeCommandListAppendBarrier(wait for event2)
zeCommandListAppendEventReset(event2)
Operation3 = zeCommandListAppendMemoryCopy (signal event1)
If we switch to a different command list then we signal new event and insert a barrier into new command list waiting on that event.
CmdList1:
Operation1 = zeCommantListAppendMemoryCopy (signal event1)
zeCommandListAppendBarrier(wait for event1)
zeCommandListAppendEventReset(event1)
zeCommandListAppendSignalEvent(NewEvent)
CmdList2:
zeCommandListAppendBarrier(wait for NewEvent)