[CUDA][HIP] Add Event Caching #1538

Closed

Conversation


@MartinWehking MartinWehking commented Apr 22, 2024

Don't destroy events that are no longer needed; instead, push them onto a stack
owned by their associated queue (if there is one).

Use events from the stack before creating new ones.

Make the operations for pushing events onto the stack and retrieving them
atomic to ensure thread safety.

This caching mechanism significantly reduces the number of CUDA API calls
for event creation and destruction.
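The described mechanism can be sketched as a small, self-contained C++ class. All names here (EventCachingQueue, NativeEvent) are illustrative stand-ins, not the adapter's actual types; the native create/destroy calls stand in for cuEventCreate/cuEventDestroy and their HIP equivalents:

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <stack>

// Hypothetical stand-in for a native CUDA/HIP event handle.
struct NativeEvent { int Id; };

// Minimal sketch of a queue-owned event cache: released events are
// pushed onto a stack instead of being destroyed, and new requests
// pop from the stack before falling back to a fresh allocation.
class EventCachingQueue {
  std::stack<std::unique_ptr<NativeEvent>> CachedEvents;
  std::mutex CacheMutex; // the cache may be accessed from multiple threads
  std::atomic<int> NextId{0};

public:
  std::unique_ptr<NativeEvent> acquireEvent() {
    {
      std::lock_guard<std::mutex> Guard(CacheMutex);
      if (!CachedEvents.empty()) {
        auto Ev = std::move(CachedEvents.top());
        CachedEvents.pop();
        return Ev; // reuse: no native event-create call needed
      }
    }
    // Cache miss: stands in for cuEventCreate / hipEventCreate.
    return std::make_unique<NativeEvent>(NativeEvent{NextId.fetch_add(1)});
  }

  void releaseEvent(std::unique_ptr<NativeEvent> Ev) {
    // Instead of cuEventDestroy / hipEventDestroy, recycle the event.
    std::lock_guard<std::mutex> Guard(CacheMutex);
    CachedEvents.push(std::move(Ev));
  }

  std::size_t cachedCount() {
    std::lock_guard<std::mutex> Guard(CacheMutex);
    return CachedEvents.size();
  }
};
```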

@github-actions github-actions bot added cuda CUDA adapter specific issues hip HIP adapter specific issues labels Apr 22, 2024
@MartinWehking MartinWehking force-pushed the event_cuda_amd_reuse branch from c3879cf to 590897e Compare May 10, 2024 09:17
@MartinWehking MartinWehking force-pushed the event_cuda_amd_reuse branch from 97e39dc to ed75dc7 Compare May 10, 2024 14:44
Martin Wehking added 5 commits May 13, 2024 12:18
Don't destroy events that are no longer needed; instead, push them onto a stack
owned by their associated queue (if there is one).

Use events from the stack before creating new ones.

This caching mechanism significantly reduces the number of CUDA API calls
for event creation and destruction.
Follow similar logic to #13f097c.

Reduce the number of HIP API calls for event creations and destructions.
Make the retrieval and pushing of events onto the stack thread-safe
by locking with mutexes.
Use these mutexes inside the queue, which is responsible for caching
the events.

Apply further minor changes, such as returning the result status
for event releases directly, using the proper UR assertions,
and proper styling for variable names.
Copy thread-safety mechanism for event caching from CUDA to HIP.
(#590897e9103c06ef1f1d86a6edf7c7eb525bd7d8)

Use atomic deletion of cached events in queue destructor.

Refactor minor styling/redundancy issues for the CUDA and HIP adapter.
Use a mutex to make the emptiness check and retrieval of cached events
atomic.
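The point of the last commit, that the emptiness check and the retrieval must happen under the same lock, can be illustrated with a hypothetical locked stack (illustrative names, not the adapter's code). A check outside the lock would be a time-of-check/time-of-use race: another thread could pop the last event between the check and the pop:

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <stack>
#include <utility>

// Sketch: the emptiness check and the pop are one atomic step under
// a single lock, so concurrent callers can never pop an empty stack.
template <typename T> class LockedStack {
  std::stack<T> Items;
  std::mutex Mutex;

public:
  void push(T Item) {
    std::lock_guard<std::mutex> Guard(Mutex);
    Items.push(std::move(Item));
  }

  // Check-and-pop as one atomic step; returns nullopt when empty
  // instead of leaving the caller to do a separate, racy empty() call.
  std::optional<T> tryPop() {
    std::lock_guard<std::mutex> Guard(Mutex);
    if (Items.empty())
      return std::nullopt;
    T Item = std::move(Items.top());
    Items.pop();
    return Item;
  }
};
```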
@MartinWehking MartinWehking force-pushed the event_cuda_amd_reuse branch from 0938386 to ffa30c7 Compare May 13, 2024 11:30
Return UR_RESULT_ERROR_INVALID_EVENT only at the end of a try-catch
block and UR_RESULT_SUCCESS as the default.
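The return-value convention described in that commit can be sketched as follows. This is a hypothetical simplification (the real urEventRelease takes a ur_event_handle_t and the real ur_result_t has many more values); only the control-flow shape is the point:

```cpp
#include <cassert>
#include <stdexcept>

// Reduced stand-in for the UR result codes relevant here.
enum ur_result_t { UR_RESULT_SUCCESS, UR_RESULT_ERROR_INVALID_EVENT };

// Sketch of the release pattern: the happy path returns
// UR_RESULT_SUCCESS directly, and only a thrown error maps to
// UR_RESULT_ERROR_INVALID_EVENT, at the end of the try/catch.
ur_result_t releaseEvent(bool Valid) {
  try {
    if (!Valid)
      throw std::runtime_error("invalid event"); // stands in for real checks
    // ... decrement the refcount, recycle the event into the cache ...
    return UR_RESULT_SUCCESS;
  } catch (...) {
    return UR_RESULT_ERROR_INVALID_EVENT;
  }
}
```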
Martin Wehking added 2 commits May 13, 2024 15:36
Modify the return value inside urEventRelease.
Don't use a lock_guard inside another mutex locking function.

This led to deadlocks for the CUDA and HIP adapters.
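The deadlock mentioned above, and the usual fix, can be sketched with illustrative names (not the adapter's code). A std::mutex is non-recursive, so a method that already holds CacheMutex must not call another method that also takes a lock_guard on it; the fix is an unlocked private helper that the locked caller uses:

```cpp
#include <cassert>
#include <mutex>
#include <stack>

class Cache {
  std::stack<int> Events;
  std::mutex CacheMutex;

  // Unlocked helper: the caller must already hold CacheMutex.
  bool hasCachedEventUnlocked() const { return !Events.empty(); }

public:
  void push(int E) {
    std::lock_guard<std::mutex> Guard(CacheMutex);
    Events.push(E);
  }

  int popOrDefault(int Default) {
    std::lock_guard<std::mutex> Guard(CacheMutex);
    // Calling a public method that itself locks CacheMutex here would
    // deadlock (std::mutex is non-recursive); the unlocked helper
    // avoids re-acquiring the mutex on the same thread.
    if (!hasCachedEventUnlocked())
      return Default;
    int E = Events.top();
    Events.pop();
    return E;
  }
};
```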
@MartinWehking MartinWehking changed the title Event cuda amd reuse Add Event Caching for CUDA and HIP May 14, 2024
@MartinWehking MartinWehking changed the title Add Event Caching for CUDA and HIP [CUDA][HIP] Add Event Caching May 14, 2024
@@ -57,6 +61,8 @@ struct ur_queue_handle_t_ {
std::mutex ComputeStreamMutex;
std::mutex TransferStreamMutex;
std::mutex BarrierMutex;
// The event cache might be accessed in multiple threads.
std::mutex CacheMutex;
I might be inclined to have this close in the code to CachedEvents

@hdelan
Copy link
Contributor

hdelan commented May 15, 2024

Small nit. Otherwise LGTM

// Returns and removes an event from the CachedEvents stack.
ur_event_handle_t get_cached_event() {
std::lock_guard<std::mutex> CacheGuard(CacheMutex);
assert(!CachedEvents.empty());
@MartinWehking MartinWehking May 15, 2024
@konradkusiak97 spotted this. Need to change this to detail::ur::assertion
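The suggestion is to replace the plain assert with the adapter's own assertion helper. A minimal sketch of what such a helper looks like is below; the real detail::ur::assertion lives in the adapter's common code, and its exact signature is assumed here:

```cpp
#include <cstdio>
#include <cstdlib>

namespace detail::ur {
// Assumed sketch of the adapter's assertion helper: unlike assert(),
// it is not compiled out in release builds, and it prints a message
// before terminating when the condition does not hold.
inline void assertion(bool Condition, const char *Message = nullptr) {
  if (!Condition) {
    if (Message)
      std::fprintf(stderr, "%s\n", Message);
    std::abort();
  }
}
} // namespace detail::ur
```

With this helper, the snippet above would call detail::ur::assertion(!CachedEvents.empty()) instead of assert.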

MartinWehking commented May 21, 2024

Closed since it did not yield a significant improvement in benchmarking.

Labels
cuda CUDA adapter specific issues hip HIP adapter specific issues