Description
TLDR: OpenCL adapter implementation of urEnqueueUSMFill
calls clEnqueueMemFillINTEL
for power-of-2 pattern size without checking destination memory alignment required by clEnqueueMemFillINTEL
.
Full Story:
I ran into this issue when trying to add sycl::queue::fill
test in intel/llvm#15991 (specific CI run failure: https://github.com/intel/llvm/actions/runs/11971631535/job/33695502538?pr=15991)
terminate called after throwing an instance of 'sycl::_V1::exception'
what(): Enqueue process failed.
Aborted (core dumped)
I will disable the OpenCL CPU backend in the e2e test, linking this issue in a comment, so it can be re-enabled when the problem is solved.
The minimal reproducer for the issue is:
#include <sycl/sycl.hpp>
#include <array>
constexpr size_t PatternSize{32}; // bytes
constexpr size_t NumElements{10};
int main() {
sycl::queue q{};
using T = std::array<uint8_t, PatternSize>;
T value{};
T *dptr{sycl::malloc_device<T>(NumElements, q)};
q.fill(dptr, value, NumElements).wait();
sycl::free(dptr, q);
return 0;
}
compiled and ran with:
clang++ -fsycl -fsycl-targets=spir64_x86_64 -o mini minimal.cpp
ONEAPI_DEVICE_SELECTOR=opencl:cpu ./mini
Debugging further with this UR change:
diff --git a/source/adapters/opencl/usm.cpp b/source/adapters/opencl/usm.cpp
index dfcc1dfa..ed18659e 100644
--- a/source/adapters/opencl/usm.cpp
+++ b/source/adapters/opencl/usm.cpp
@@ -276,12 +276,15 @@ UR_APIEXPORT ur_result_t UR_APICALL urEnqueueUSMFill(
cl_ext::getExtFuncFromContext<clEnqueueMemFillINTEL_fn>(
CLContext, cl_ext::ExtFuncPtrCache->clEnqueueMemFillINTELCache,
cl_ext::EnqueueMemFillName, &EnqueueMemFill));
-
- CL_RETURN_ON_FAILURE(
+ CLErr =
EnqueueMemFill(cl_adapter::cast<cl_command_queue>(hQueue), ptr,
pPattern, patternSize, size, numEventsInWaitList,
cl_adapter::cast<const cl_event *>(phEventWaitList),
- cl_adapter::cast<cl_event *>(phEvent)));
+ cl_adapter::cast<cl_event *>(phEvent));
+ std::cout << "EnqueueMemFillINTEL(patternSize=" << patternSize << ") return code " << CLErr << std::endl;
+ if (CLErr != CL_SUCCESS) {
+ return mapCLErrorToUR(CLErr);
+ }
return UR_RESULT_SUCCESS;
}
confirmed that it's the clEnqueueMemFillINTEL
call which sometimes returns -30 (CL_INVALID_VALUE
).
EnqueueMemFillINTEL(patternSize=32) return code -30
terminate called after throwing an instance of 'sycl::_V1::exception'
what(): Enqueue process failed.
Aborted (core dumped)
I noticed that in my build of UR / DPC++ this happens when the binary name is shorter than 24 characters, but stops happening when it is longer. Simply renaming the file changes the behaviour. In another build I got the opposite behaviour where short-named binary succeeds but long-named fails. I assume what happens is that long file name causes heap allocation for argv[0]
and shifts the memory layout, and thus alignment of the device allocation.
$ # short name fails
$ ./mini
EnqueueMemFillINTEL(patternSize=32) return code -30
terminate called after throwing an instance of 'sycl::_V1::exception'
what(): Enqueue process failed.
Aborted (core dumped)
$ # rename the file
$ cp mini mini12341234123412341234
$ # long name succeeds
$ ./mini12341234123412341234
EnqueueMemFillINTEL(patternSize=32) return code 0
I note that clEnqueueMemFillINTEL
as described here:
https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_unified_shared_memory.html
may return:
CL_INVALID_VALUE if dst_ptr is NULL, or if dst_ptr is not aligned to pattern_size bytes
so I assume this is what happens as I checked the other conditions for returning CL_INVALID_VALUE
are not met.
IIUC neither SYCL API nor UR API make any requirements about the destination memory alignment for their USM fill functions, therefore it is incorrect for the implementation to assume alignment. I think the solution here could be to check the alignment and take the other (slower) path which doesn't call clEnqueueMemFillINTEL
when the alignment requirement is not met.
Side note: something seems to be lost in error handling here as the user is informed neither about the error code (INVALID_VALUE) nor its origin (USM fill). There is only a generic exception thrown by the SYCL runtime.