
[SYCL][CUDA] parallel_for with sycl::range fails when the global size exceeds 65535 #7854

Closed
@abagusetty

Description


The CUDA plugin fails to launch the kernel when the global iteration size exceeds 65535, but only for sycl::range/sycl::id launches. The same launch works with sycl::nd_range/sycl::nd_item. There are no issues with the Level Zero (L0) plugin on Intel devices. I have not yet tested this with the HIP plugin.

To Reproduce

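#include <sycl/sycl.hpp>

int main() {
    sycl::queue q;
    size_t N = 65536; // one more than 65535
    size_t M = 1;

    // Plain range launch: fails on the CUDA plugin with CUDA_ERROR_INVALID_VALUE
    q.parallel_for(sycl::range<2>{N, M}, [=](sycl::id<2> idx) {
        // empty kernel body
    });
    q.wait();
}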

Error:

PI CUDA ERROR:
	Value:           1
	Name:            CUDA_ERROR_INVALID_VALUE
	Description:     invalid argument
	Function:        cuda_piEnqueueKernelLaunch
	Source Location: ....llvm/sycl/plugins/cuda/pi_cuda.cpp:3186

Additional info:
The same kernel runs fine when launched with an nd_range (below). Compiling with -fno-sycl-id-queries-fit-in-int while still using sycl::range/sycl::id appears to have no effect.

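#include <sycl/sycl.hpp>

int main() {
    sycl::queue q;
    size_t N = 65536;
    size_t M = 1;

    // Same global size, but with an explicit work-group size of 32 x 1
    sycl::range<2> global(N, M);
    sycl::range<2> local(32, 1);
    q.parallel_for(sycl::nd_range<2>{global, local}, [=](sycl::nd_item<2> item) {
        // empty kernel body
    });
    q.wait(); // this launch succeeds on the CUDA plugin
}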

Git commit: 74af442
CUDA: 12.0

Metadata

Assignees: none
Labels: bug (Something isn't working), cuda (CUDA back-end)
