Skip to content

[UR][CUDA][HIP] Allow adapter objects to block full adapter teardown. #17571

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Apr 10, 2025

Conversation

aarongreig
Copy link
Contributor

@aarongreig aarongreig commented Mar 21, 2025

In the cuda adapter the adapter struct itself is currently an extern global defined in adapter.cpp. This means fully tearing down the adapter is subject to the same destructor ordering as all other static and global variables, it's first in last out. This presents a problem because an application can declare a static sycl object like a buffer right up top before doing anything else, which results in the sycl object being destroyed after the cuda adapter struct.

The UR spec doesn't put the onus on users to keep their parent object lifetimes in order, i.e. there is no statement about "the context you use to create a ur_mem_handle_t must not be released until after the mem_handle". It's assumed (by omission rather than explicitly) that adapters will have their objects keep a reference to any parent objects alive for the duration of their own lifetime.

This change moves the cuda adapter structs ownership into a global shared_ptr, which allows child objects of the adapter to keep their own references to it alive past the point where its initial definition goes out of scope. Also adjusts how some other objects track parent object references so that the destructors correctly cascade back to the top: mem handle releases its context, which releases its adapter, which releases the platform + devices, etc.

All of this also applies to the hip adapter, although it seems something in hip itself prevents this change from fixing the static_buffer_dtor test - see #17571 (comment)

Fixes #17450

In the cuda adapter the adapter struct itself is currently an extern
global defined in adapter.cpp. This means fully tearing down the adapter
is subject to the same destructor ordering as all other static and
global variables, it's first in last out. This presents a problem
because an application can declare a static sycl object like a buffer
right up top before doing anything else, which results in the sycl
object being destroyed after the cuda adapter struct.

The UR spec doesn't put the onus on users to keep their parent object
lifetimes in order, i.e. there is no statement about "the context you
use to create a ur_mem_handle_t must not be released until after the
mem_handle". It's assumed (by omission rather than explicitly) that
adapters will have their objects keep a reference to any parent objects
alive for the duration of their own lifetime.

This change moves the cuda adapter structs ownership into a global
shared_ptr, which allows child objects of the adapter to keep their own
references to it alive past the point where its initial definition goes
out of scope. Also adjusts how some other objects track parent object
references so that the destructors correctly cascade back to the top:
mem handle releases its context, which releases its adapter, which
releases the platform + devices, etc.

Fixes intel#17450
@aarongreig aarongreig marked this pull request as ready for review March 21, 2025 16:58
@aarongreig aarongreig requested review from a team as code owners March 21, 2025 16:58
@aarongreig aarongreig requested review from Seanst98 and againull March 21, 2025 16:58
@aarongreig
Copy link
Contributor Author

this is a wider problem with the adapters (and spec to some extent) which I plan to address, but I think it's fine for this to go through as a stand-alone patch for now

@aarongreig aarongreig changed the title [UR][CUDA] Allow cuda adapter objects to block full adapter teardown. [UR][CUDA][HIP] Allow adapter objects to block full adapter teardown. Mar 25, 2025
Copy link
Contributor

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sycl changes LGTM

npmiller added a commit to npmiller/llvm that referenced this pull request Apr 9, 2025
`urDeviceRetain` and `urDeviceRelease` are no-ops, so this patch removes
all uses.

Uses of it in the context will be removed in intel#17571
@npmiller npmiller requested a review from againull April 9, 2025 16:45
@aarongreig
Copy link
Contributor Author

@intel/llvm-gatekeepers please merge

@martygrant martygrant merged commit 18e36c5 into intel:sycl Apr 10, 2025
32 checks passed
kbenzie pushed a commit to oneapi-src/unified-runtime that referenced this pull request Apr 10, 2025
In the cuda adapter the adapter struct itself is currently an extern
global defined in adapter.cpp. This means fully tearing down the adapter
is subject to the same destructor ordering as all other static and
global variables, it's first in last out. This presents a problem
because an application can declare a static sycl object like a buffer
right up top before doing anything else, which results in the sycl
object being destroyed after the cuda adapter struct.

The UR spec doesn't put the onus on users to keep their parent object
lifetimes in order, i.e. there is no statement about "the context you
use to create a ur_mem_handle_t must not be released until after the
mem_handle". It's assumed (by omission rather than explicitly) that
adapters will have their objects keep a reference to any parent objects
alive for the duration of their own lifetime.

This change moves the cuda adapter structs ownership into a global
shared_ptr, which allows child objects of the adapter to keep their own
references to it alive past the point where its initial definition goes
out of scope. Also adjusts how some other objects track parent object
references so that the destructors correctly cascade back to the top:
mem handle releases its context, which releases its adapter, which
releases the platform + devices, etc.

All of this also applies to the hip adapter, although it seems something
in hip itself prevents this change from fixing the static_buffer_dtor
test - see
intel/llvm#17571 (comment)

Fixes intel/llvm#17450
martygrant pushed a commit that referenced this pull request Apr 14, 2025
`urDeviceRetain` and `urDeviceRelease` are no-ops, so this patch removes
all uses.

Uses of it in the context will be removed in
#17571
kbenzie pushed a commit to oneapi-src/unified-runtime that referenced this pull request Apr 16, 2025
`urDeviceRetain` and `urDeviceRelease` are no-ops, so this patch removes
all uses.

Uses of it in the context will be removed in
intel/llvm#17571
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SYCL][UR][CUDA] wrong order of the ~ur_device_handle_t_() destructor and a user-app static buffer destructor
4 participants