[SYCL][ROCm] Fix freeing USM managed pointer with NVIDIA #4123

Closed
15 changes: 13 additions & 2 deletions sycl/plugins/rocm/pi_rocm.cpp
@@ -4281,9 +4281,20 @@ pi_result rocm_piextUSMFree(pi_context context, void *ptr) {
   ScopedContext active(context);
   unsigned int type;
   hipPointerAttribute_t hipPointerAttributeType;
-  result =
-      PI_CHECK_ERROR(hipPointerGetAttributes(&hipPointerAttributeType, ptr));
+  hipError_t ret = hipPointerGetAttributes(&hipPointerAttributeType, ptr);
   type = hipPointerAttributeType.memoryType;
+#ifdef __HIP_PLATFORM_NVIDIA__
+  // The NVIDIA hipPointerGetAttributes implementation doesn't know about
Contributor

What about the other calls to hipPointerGetAttributes, which will continue to return an error?

Contributor Author

That's a good point. I only ran into this call failing, but yeah, it will happen with the others too. I'll update them.

Contributor

Please consider making this adjustment in the function itself, if feasible.

Contributor Author

Just a quick update on this patch: I'm currently looking at this and some of the other HIP workarounds in the ROCm plugin, and figuring out whether I can fix them in the HIP headers directly and submit the fixes to HIP upstream.

However, I do think we'll probably still need the workarounds in the ROCm plugin, at least until the next HIP release, assuming my fixes get approved. So I'll come back to this patch in a bit and make it a proper workaround for all the other uses of hipPointerGetAttributes as well.
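
For readers following the thread, the "workaround for all uses" idea could look roughly like the sketch below. It is not code from this PR. To stay compilable without the HIP headers, everything lives in a hypothetical `sketch` namespace: the enum values are arbitrary stand-ins, `hipPointerGetAttributes` is a stub that only imitates the failure described in the diff comment, and `getPointerAttributesCompat` and `kNvidiaPlatform` are invented names. Real code would call the real HIP function and guard the fallback with `#ifdef __HIP_PLATFORM_NVIDIA__`, as the diff does.

```cpp
#include <cassert>

namespace sketch {

// Stand-in types mirroring only the names used in the diff.
// The numeric values are arbitrary and do not match the HIP headers.
enum hipError_t { hipSuccess, hipErrorUnknown };
enum hipMemoryType { hipMemoryTypeHost, hipMemoryTypeDevice };
struct hipPointerAttribute_t {
  hipMemoryType memoryType;
};

// Fake allocations, identified by address only.
static int hostBuf, deviceBuf, managedBuf;

// Stub imitating the behaviour described in the diff comment:
// a managed pointer yields hipErrorUnknown and no usable attributes.
inline hipError_t hipPointerGetAttributes(hipPointerAttribute_t *attr,
                                          const void *ptr) {
  if (ptr == &managedBuf)
    return hipErrorUnknown;
  attr->memoryType =
      (ptr == &deviceBuf) ? hipMemoryTypeDevice : hipMemoryTypeHost;
  return hipSuccess;
}

// Assumption: stands in for `#ifdef __HIP_PLATFORM_NVIDIA__`.
constexpr bool kNvidiaPlatform = true;

// Hypothetical helper: every call site goes through this wrapper, so the
// fallback is applied in one place instead of being repeated per call.
inline hipError_t getPointerAttributesCompat(hipPointerAttribute_t *attr,
                                             const void *ptr) {
  assert(attr != nullptr);
  hipError_t ret = hipPointerGetAttributes(attr, ptr);
  if (kNvidiaPlatform && ret == hipErrorUnknown) {
    // Treat the pointer as device memory, as the diff does. Only
    // memoryType is set here; other attributes would not be meaningful.
    attr->memoryType = hipMemoryTypeDevice;
    ret = hipSuccess;
  }
  return ret;
}

} // namespace sketch
```

A call site such as the one in `rocm_piextUSMFree` would then pass the wrapper's return value to `PI_CHECK_ERROR` unchanged. One trade-off of this shape is that any other cause of `hipErrorUnknown` is also reported as a device pointer, so a real version might want a narrower check.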

Contributor

Has this been changed upstream?

Contributor Author

@FMarno, did we end up submitting a patch upstream for this?

Contributor

Not yet, but it's one of the things we are looking at in the next couple of weeks. I'll leave a note here to let you know when we do something about it.

Contributor

@npmiller, it looks like AMD put in a patch for this last week: ROCm/hipamd@88f1622. You should be able to do a proper fix now.

+  // managed pointers and will return hipErrorUnknown when encountering them,
+  // managed pointers are released just like device pointer so treat this as
+  // a device pointer. Note that other attributes of hipPointerAttributeType
+  // won't be set correctly here, but only the type is used in this function.
+  if (ret == hipErrorUnknown) {
+    ret = hipSuccess;
+    type = hipMemoryTypeDevice;
+  }
+#endif
+  result = PI_CHECK_ERROR(ret);
   assert(type == hipMemoryTypeDevice or type == hipMemoryTypeHost);
   if (type == hipMemoryTypeDevice) {
     result = PI_CHECK_ERROR(hipFree(ptr));