[SYCL][CUDA] Support host-device memcpy2D #8181
sycl/plugins/cuda/pi_cuda.cpp
Outdated
? reinterpret_cast<CUdeviceptr>(src_ptr)
: 0;
cpyDesc.srcHost = (src_type == CU_MEMORYTYPE_HOST) ? src_ptr : nullptr;
}
Since this behavior is becoming more complex, would it make sense to split it into a helper function so we can avoid repeating the code? I can see two ways of doing it:
- Implement it as a function returning a triple, i.e. something like:
template<typename PtrT>
static std::tuple<CUmemorytype, CUdeviceptr, PtrT> getUSMHostOrDevicePtr(PtrT usm_ptr) { ... }
...
std::tie(cpyDesc.srcMemoryType, cpyDesc.srcDevice, cpyDesc.srcHost) = getUSMHostOrDevicePtr(src_ptr);
...
std::tie(cpyDesc.dstMemoryType, cpyDesc.dstDevice, cpyDesc.dstHost) = getUSMHostOrDevicePtr(dst_ptr);
- Do the same, but make it void and pass the members to set as pointers/references:
template<typename PtrT>
static void getUSMHostOrDevicePtr(PtrT usm_ptr, CUmemorytype *out_mem_type, CUdeviceptr *out_dev_ptr, PtrT *out_host_ptr) { ... }
...
getUSMHostOrDevicePtr(src_ptr, &cpyDesc.srcMemoryType, &cpyDesc.srcDevice, &cpyDesc.srcHost);
...
getUSMHostOrDevicePtr(dst_ptr, &cpyDesc.dstMemoryType, &cpyDesc.dstDevice, &cpyDesc.dstHost);
Note I use pointers above because I like to think it makes the intention clearer here, but I do not have a strong preference for one or the other.
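For illustration, the first (tuple-returning) variant could look roughly like the sketch below. This is not the code that landed in the PR: the CUDA types are replaced by stand-ins so the snippet compiles without cuda.h, and fakeDevicePtrs and queryMemoryType are hypothetical helpers that take the place of the real pointer-attribute query.

```cpp
#include <cstdint>
#include <set>
#include <tuple>

// Stand-ins for the CUDA driver API types (assumption: the real definitions
// come from cuda.h in pi_cuda.cpp).
using CUdeviceptr = std::uintptr_t;
enum CUmemorytype { CU_MEMORYTYPE_HOST = 1, CU_MEMORYTYPE_DEVICE = 2 };

// Hypothetical registry of pointers that this sketch treats as device memory.
static std::set<const void *> &fakeDevicePtrs() {
  static std::set<const void *> ptrs;
  return ptrs;
}

// Hypothetical replacement for the driver query: registered pointers are
// reported as device memory, everything else as host memory.
static CUmemorytype queryMemoryType(const void *ptr) {
  return fakeDevicePtrs().count(ptr) ? CU_MEMORYTYPE_DEVICE
                                     : CU_MEMORYTYPE_HOST;
}

// Returns (memory type, device pointer, host pointer). Only the field that
// matches the memory type is populated; the other one is zero/null.
template <typename PtrT>
static std::tuple<CUmemorytype, CUdeviceptr, PtrT>
getUSMHostOrDevicePtr(PtrT usm_ptr) {
  using Result = std::tuple<CUmemorytype, CUdeviceptr, PtrT>;
  const CUmemorytype type = queryMemoryType(usm_ptr);
  if (type == CU_MEMORYTYPE_DEVICE)
    return Result(type, reinterpret_cast<CUdeviceptr>(usm_ptr), PtrT{nullptr});
  return Result(type, CUdeviceptr{0}, usm_ptr);
}
```

With this shape, each side of the copy descriptor is filled by a single std::tie assignment, so the source and destination paths share the same logic.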
Sure, I agree.
sycl/plugins/cuda/pi_cuda.cpp
Outdated
"ARRAY, UNIFIED types are not supported!");

// pointer not known to the CUDA subsystem (possibly a system allocated ptr)
if (ret == CUDA_ERROR_INVALID_VALUE) {
Is it safe to check only for two possible error codes from cuPointerGetAttribute?
Not really. I could classify the result into three cases: CUDA_ERROR_INVALID_VALUE, since that is the only way to detect host pointers; CUDA_SUCCESS; and anything else, which asserts through the existing macro. Good tip.
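A minimal sketch of that three-way classification, under stated assumptions: the enums are stand-ins for the cuda.h definitions, classifyPointerQuery is a hypothetical name, and an exception stands in for the plugin's existing checking macro.

```cpp
#include <stdexcept>

// Stand-ins for the CUDA driver API enums (assumption: the real ones come
// from cuda.h).
enum CUresult {
  CUDA_SUCCESS = 0,
  CUDA_ERROR_INVALID_VALUE = 1,
  CUDA_ERROR_NOT_INITIALIZED = 3
};
enum CUmemorytype { CU_MEMORYTYPE_HOST = 1, CU_MEMORYTYPE_DEVICE = 2 };

// Hypothetical helper: maps the result of the pointer-attribute query to a
// memory type. `queried_type` is only meaningful when `ret` is CUDA_SUCCESS.
static CUmemorytype classifyPointerQuery(CUresult ret,
                                         CUmemorytype queried_type) {
  // Pointer not known to the CUDA subsystem: treat it as a host pointer.
  if (ret == CUDA_ERROR_INVALID_VALUE)
    return CU_MEMORYTYPE_HOST;
  // Any other failure is a genuine error. The plugin would route this
  // through its existing macro; an exception is used here as a stand-in.
  if (ret != CUDA_SUCCESS)
    throw std::runtime_error("unexpected CUDA error from pointer query");
  return queried_type;
}
```

Handling the invalid-value case first keeps the host-pointer path explicit, and every remaining non-success code falls through to the error path instead of being silently ignored.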
/verify with intel/llvm-test-suite#1597
LGTM
Looks good!
Addresses to support host-device memcpy2D copies