[CUDA][HIP] Add Event Caching #1538
Conversation
Don't destroy events that are no longer needed; instead, push them onto a stack owned by their associated queue (if there is one). Use events from this stack before creating new ones. This caching mechanism significantly reduces the number of CUDA API calls for event creation and destruction.
Follow similar logic to #13f097c to reduce the number of HIP API calls for event creation and destruction.
Make retrieving events from the stack and pushing them onto it thread-safe by guarding the stack with a mutex. Keep this mutex inside the queue, which owns the event cache. Apply further minor changes: return the result status of event releases directly, use the proper UR assertions, and style variable names consistently.
Copy the thread-safety mechanism for event caching from CUDA to HIP (#590897e9103c06ef1f1d86a6edf7c7eb525bd7d8). Delete cached events atomically in the queue destructor. Clean up minor styling and redundancy issues in the CUDA and HIP adapters.
Use a mutex to make the emptiness check and retrieval of cached events atomic.
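Taken together, the notes above describe one pattern. Below is a minimal, hedged sketch of it; the names CachedEvents and CacheMutex mirror the diff later in this conversation, while native_event_t and the surrounding struct are hypothetical stand-ins for the adapter's real types.

#include <mutex>
#include <stack>

// Hypothetical stand-in for a native CUDA/HIP event handle; the real
// adapter stores a CUevent / hipEvent_t inside ur_event_handle_t_.
using native_event_t = int;

struct event_cache_sketch_t {
  std::stack<native_event_t> CachedEvents;
  // The cache may be accessed from multiple threads, so every push and
  // every check-and-pop pair is guarded by the same mutex.
  std::mutex CacheMutex;

  // On release: instead of destroying the native event, park it on the
  // stack so a later event creation can skip the driver API call.
  void cache_event(native_event_t Event) {
    std::lock_guard<std::mutex> CacheGuard(CacheMutex);
    CachedEvents.push(Event);
  }

  // On creation: reuse a cached event if one exists. The emptiness check
  // and the pop happen under one lock, so no other thread can steal the
  // event in between.
  bool try_get_cached_event(native_event_t &Event) {
    std::lock_guard<std::mutex> CacheGuard(CacheMutex);
    if (CachedEvents.empty())
      return false;
    Event = CachedEvents.top();
    CachedEvents.pop();
    return true;
  }
};

Only when try_get_cached_event comes back empty does the adapter fall back to creating a fresh native event, which is where the reduction in CUDA/HIP driver API calls comes from.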
Return UR_RESULT_ERROR_INVALID_EVENT only at the end of a try/catch block, with UR_RESULT_SUCCESS as the default.
Modify the return value inside urEventRelease accordingly.
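A rough sketch of the return-value shape those two notes describe, assuming a urEventRelease-like entry point; the stand-in types and the release_or_cache helper are hypothetical simplifications of the real UR definitions.

#include <atomic>
#include <cstdint>

// Stand-ins so the sketch compiles on its own (hypothetical).
enum ur_result_t { UR_RESULT_SUCCESS, UR_RESULT_ERROR_INVALID_EVENT };
struct ur_event_handle_t_ { std::atomic<std::uint32_t> RefCount{1}; };
using ur_event_handle_t = ur_event_handle_t_ *;

// Hypothetical helper: parks the native event on its queue's cache or
// destroys it, and reports the status of that operation.
ur_result_t release_or_cache(ur_event_handle_t) { return UR_RESULT_SUCCESS; }

ur_result_t urEventReleaseSketch(ur_event_handle_t hEvent) {
  ur_result_t Result = UR_RESULT_SUCCESS; // success is the default
  try {
    // Drop one reference; only the last release touches the native event.
    if (hEvent->RefCount.fetch_sub(1) == 1)
      Result = release_or_cache(hEvent); // returned directly, not remapped
  } catch (...) {
    // The invalid-event error is produced only here, at the end of the
    // try/catch block.
    Result = UR_RESULT_ERROR_INVALID_EVENT;
  }
  return Result;
}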
Don't take a lock_guard inside a function that is called while the same mutex is already locked; this caused deadlocks in the CUDA and HIP adapters.
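For reference, the deadlock that note describes comes from re-locking a non-recursive std::mutex on the same thread. A condensed, hypothetical illustration:

#include <cassert>
#include <mutex>
#include <stack>

struct queue_sketch_t {
  std::stack<int> CachedEvents;
  std::mutex CacheMutex;

  // A helper that takes the lock itself.
  bool has_cached_events() {
    std::lock_guard<std::mutex> Guard(CacheMutex);
    return !CachedEvents.empty();
  }

  int get_cached_event() {
    std::lock_guard<std::mutex> Guard(CacheMutex);
    // Deadlock variant: calling has_cached_events() here would try to
    // lock CacheMutex a second time while this frame already holds it,
    // which hangs (and is undefined behaviour for std::mutex):
    //
    //   if (!has_cached_events()) ...   // deadlocks
    //
    // Fix: do the emptiness check inline, under the single lock.
    assert(!CachedEvents.empty());
    int Event = CachedEvents.top();
    CachedEvents.pop();
    return Event;
  }
};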
@@ -57,6 +61,8 @@ struct ur_queue_handle_t_ {
  std::mutex ComputeStreamMutex;
  std::mutex TransferStreamMutex;
  std::mutex BarrierMutex;
  // The event cache might be accessed in multiple threads.
  std::mutex CacheMutex;
I might be inclined to keep this close in the code to CachedEvents.
Small nit. Otherwise LGTM
// Returns and removes an event from the CachedEvents stack.
ur_event_handle_t get_cached_event() {
  std::lock_guard<std::mutex> CacheGuard(CacheMutex);
  assert(!CachedEvents.empty());
@konradkusiak97 spotted this. Need to change this to detail::ur::assertion
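For context, the requested change swaps the bare assert for the adapter's own assertion helper. A hedged sketch of the swap, with a stand-in definition so the fragment is self-contained; the real detail::ur::assertion lives in the adapter's common code, and its exact signature and behaviour (report, then terminate, including in release builds) are assumed here.

#include <cstdio>
#include <cstdlib>

// Stand-in for the adapter's helper (behaviour assumed).
namespace detail::ur {
inline void assertion(bool Condition, const char *Message = nullptr) {
  if (!Condition) {
    std::fprintf(stderr, "%s\n", Message ? Message : "assertion failed");
    std::abort();
  }
}
} // namespace detail::ur

void check_cache_not_empty(bool CacheIsEmpty) {
  // Before: assert(!CachedEvents.empty());  // compiled out with NDEBUG
  // After:
  detail::ur::assertion(!CacheIsEmpty, "event cache is unexpectedly empty");
}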
Closed since it did not yield a significant improvement in benchmarking.
Don't destroy events that are no longer needed; instead, push them onto a stack
owned by their associated queue (if there is one).
Use events from this stack before creating new ones.
Make the operations that push events onto the stack and retrieve them from it
atomic to ensure thread safety.
This caching mechanism significantly reduces the number of CUDA API calls
for event creation and destruction.