Description
The CUDA plugin fails to launch a kernel when the global iteration size is greater than 65535, but only for sycl::range/sycl::id launches. The same kernel works with sycl::nd_range/sycl::nd_item. There are no issues with the L0 plugin on Intel devices. I have not yet tested this with the HIP plugin.
To Reproduce
#include <sycl/sycl.hpp>
int main() {
  sycl::queue q;
  size_t N=65536;
  size_t M=1;
  // Empty kernel: the launch itself is what fails.
  q.parallel_for(sycl::range<2>{N, M}, [=](sycl::id<2> idx) {
  });
}
Error:
PI CUDA ERROR:
Value: 1
Name: CUDA_ERROR_INVALID_VALUE
Description: invalid argument
Function: cuda_piEnqueueKernelLaunch
Source Location: ....llvm/sycl/plugins/cuda/pi_cuda.cpp:3186
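For context, 65535 is the CUDA limit on the y and z grid dimensions (x allows up to 2^31-1). The sketch below is not the plugin's code. It is a hypothetical helper, `fits_cuda_grid_2d`, that checks a 2-D launch shape against those limits. It assumes that SYCL dimension 1 maps to CUDA x and dimension 0 to CUDA y, and that a sycl::range launch ends up with a work-group size of 1 along dimension 0. Both are assumptions about the backend, not something the error output confirms.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// CUDA grid-size limits for (x, y, z) on compute capability 3.0 and later.
constexpr std::array<std::uint64_t, 3> kMaxGrid = {2147483647ull, 65535ull, 65535ull};

// Hypothetical helper (not part of SYCL or the CUDA plugin).
// Returns true if a 2-D launch with the given global and local sizes
// produces a grid that fits the CUDA limits.
// Assumption: SYCL dimension 1 -> CUDA x, SYCL dimension 0 -> CUDA y.
bool fits_cuda_grid_2d(std::array<std::size_t, 2> global,
                       std::array<std::size_t, 2> local) {
    if (local[0] == 0 || local[1] == 0) return false;
    // The global size must be a multiple of the local size in each dimension.
    if (global[0] % local[0] != 0 || global[1] % local[1] != 0) return false;
    const std::uint64_t grid_x = global[1] / local[1];
    const std::uint64_t grid_y = global[0] / local[0];
    return grid_x <= kMaxGrid[0] && grid_y <= kMaxGrid[1];
}
```

Under these assumptions, a global size of {65536, 1} with a local size of {1, 1} gives a y grid of 65536 and is rejected, while the same global size with a local size of {32, 1} gives a y grid of 2048 and passes.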
Additional info:
The same kernel is fine with an nd_range launch (see below). I also tried the flag -fno-sycl-id-queries-fit-in-int with sycl::range/id, but it seems to have no effect.
#include <sycl/sycl.hpp>
int main() {
  sycl::queue q;
  size_t N=65536;
  size_t M=1;
  sycl::range<2> global(N, M);
  sycl::range<2> local(32, 1);
  q.parallel_for(sycl::nd_range<2>{global, local}, [=](sycl::nd_item<2> item) {
  });
}
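Another possible workaround, untested here, is to launch a 1-D sycl::range over N*M items and recover the 2-D index inside the kernel, so that the large extent lands on the dimension with the larger limit. The helper below, `delinearize`, is a hypothetical name used only for illustration. It is a minimal sketch of the index arithmetic, not a confirmed fix.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper: map a 1-D index over an N x M iteration space
// back to (row, col), with col as the fastest-moving index.
inline std::pair<std::size_t, std::size_t> delinearize(std::size_t linear,
                                                       std::size_t M) {
    return {linear / M, linear % M};
}

// Possible use inside a kernel (requires a SYCL toolchain, not compiled here):
//   q.parallel_for(sycl::range<1>{N * M}, [=](sycl::id<1> i) {
//     auto [row, col] = delinearize(i[0], M);
//   });
```

With M = 1 as in the reproducer, every linear index maps to (index, 0), which matches the original sycl::id<2> values.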
Git commit: 74af442
CUDA: 12.0