[Level Zero] sycl::parallel_for with ranges larger than INT_MAX deadlocks or aborts

**Describe the bug**
The following program
```C++
#include <climits>  // INT_MAX
#include <iostream>
#include <CL/sycl.hpp>

int main(int, char**) {
   cl::sycl::default_selector device_selector;
   cl::sycl::queue queue(device_selector);
   std::cout << "Running on "
             << queue.get_device().get_info<cl::sycl::info::device::name>()
             << "\n";
   size_t N = INT_MAX; // a range of N+1 work-items breaks on CUDA
   // size_t N = 5000000000; // a range of N+1 work-items breaks on Intel
   sycl::range<1> range(N+1);
   auto parallel_for_event = queue.submit([&](sycl::handler& cgh) {
     cgh.parallel_for(range, [=](sycl::item<1> /*item*/) {});
   });

   return 0;
}
```
deadlocks on CUDA devices or fails with
```
C++ exception with description "PI backend failed. PI backend returns: -54 (CL_INVALID_WORK_GROUP_SIZE) -54 (CL_INVALID_WORK_GROUP_SIZE)" thrown in the test body.
```
on Intel GPUs when compiled and run via
```
clang++ -fsycl -fsycl-unnamed-lambda -fno-sycl-id-queries-fit-in-int -fsycl-targets=nvptx64-nvidia-cuda-sycldevice dummy.cc && ./a.out
```
for CUDA and, for Intel GPUs, via
```
clang++ -fsycl -fsycl-unnamed-lambda -fno-sycl-id-queries-fit-in-int dummy.cc && ./a.out
```

**Environment:**

- OS: Linux
- Target device and vendor: Intel GPU, NVIDIA GPU
- DPC++ version: nightly release 20210621

[Level Zero] sycl::parallel_for with ranges larger than INT_MAX deadlocks or aborts #4255
