[Level Zero] sycl::parallel_for with ranges larger than INT_MAX deadlocks or aborts #4255

Closed as not planned
@masterleinad

Describe the bug
Running

#include <climits>  // INT_MAX
#include <iostream>
#include <CL/sycl.hpp>

int main(int, char**) {
   cl::sycl::default_selector device_selector;
   cl::sycl::queue queue(device_selector);
   std::cout << "Running on "
             << queue.get_device().get_info<cl::sycl::info::device::name>()
             << "\n";
   size_t N = INT_MAX; //breaks for CUDA
   // size_t N = 5000000000; // breaks for Intel
   sycl::range<1> range(N+1);
   auto parallel_for_event = queue.submit([&](sycl::handler& cgh) {
     cgh.parallel_for(range, [=](sycl::item<1> /*item*/) {});
   });

   return 0;
}

deadlocks on CUDA devices or gives

C++ exception with description "PI backend failed. PI backend returns: -54 (CL_INVALID_WORK_GROUP_SIZE) -54 (CL_INVALID_WORK_GROUP_SIZE)" thrown in the test body.

on Intel GPUs. For CUDA, the program was compiled and run via

clang++ -fsycl -fsycl-unnamed-lambda -fno-sycl-id-queries-fit-in-int -fsycl-targets=nvptx64-nvidia-cuda-sycldevice dummy.cc && ./a.out

and for Intel GPUs via

clang++ -fsycl -fsycl-unnamed-lambda -fno-sycl-id-queries-fit-in-int dummy.cc && ./a.out
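
For context, the failing sizes are INT_MAX + 1 = 2147483648 on CUDA and 5000000000 on Intel GPUs; the latter also exceeds the 32-bit unsigned limit of 4294967295. One possible workaround, which is not proposed anywhere in this report and is untested against either backend, is to split the iteration space into chunks of at most INT_MAX work items and submit one kernel per chunk. The sketch below shows only the host-side arithmetic in plain C++. The names `Chunk` and `split_range` are hypothetical helpers, not part of SYCL or DPC++.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one sub-range of the full iteration space.
struct Chunk {
  std::uint64_t offset;  // global index of the first work item in this chunk
  std::uint64_t size;    // number of work items in this chunk
};

// Hypothetical helper: split `total` work items into consecutive chunks of
// at most `max_chunk` items each (INT_MAX by default).
std::vector<Chunk> split_range(std::uint64_t total,
                               std::uint64_t max_chunk = INT_MAX) {
  assert(max_chunk > 0);
  std::vector<Chunk> chunks;
  std::uint64_t offset = 0;
  while (offset < total) {
    // The last chunk may be smaller than max_chunk.
    const std::uint64_t size = std::min(max_chunk, total - offset);
    chunks.push_back({offset, size});
    offset += size;
  }
  return chunks;
}
```

With such a helper, each chunk could presumably be submitted as its own `parallel_for` over `sycl::range<1>(chunk.size)`, with the kernel adding `chunk.offset` to the item index to recover the global index. Whether this avoids the deadlock or the `CL_INVALID_WORK_GROUP_SIZE` error on a given backend is an assumption that would need to be checked.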

Environment:

  • OS: Linux
  • Target device and vendor: Intel GPU, NVIDIA GPU
  • DPC++ version: nightly release 20210621

Metadata

Labels

Stale, bug (Something isn't working), runtime (Runtime library related issue)
