[CUDA] Misleading error handling for hasExceededMaxRegistersPerBlock #1308

This comes from #1299, which originally included a change of the error code. After further discussion with @GeorgeWeb, we agreed that the error-handling improvement should be a separate PR, paired with correct handling of the changed error code in intel/llvm.

The issue is that when a user submits a kernel with a reasonable work-group size that nonetheless exceeds the number of registers available on the device, the `setKernelParams` function in `source/adapters/cuda/enqueue.cpp` returns `UR_RESULT_ERROR_INVALID_WORK_GROUP_SIZE`, which presents the user with the following error:
```
terminate called after throwing an instance of 'sycl::_V1::nd_range_error'
  what():  Non-uniform work-groups are not supported by the target device -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)
```
even if they submit perfectly uniform work-groups. This first came up in https://github.com/intel/llvm/issues/12363, where the global and local sizes were 2048x788 and 1024x1 respectively, so the global size divides evenly by the local size in both dimensions. **The error is simply wrong in this case: an "out of resources" error should be reported instead.**
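To make the intended distinction concrete, here is a minimal, self-contained sketch of launch validation that separates work-group size problems from resource exhaustion. It is not the adapter's actual code: `validateLaunch`, `DeviceLimits`, `KernelProps` and `LaunchResult` are hypothetical names, `LaunchResult::OutOfResources` stands in for `UR_RESULT_ERROR_OUT_OF_RESOURCES`, and the register check assumes the same idea as `hasExceededMaxRegistersPerBlock` (work-group size times registers per thread, compared against the per-block register limit).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the relevant ur_result_t values; the real
// enum lives in the Unified Runtime headers.
enum class LaunchResult {
  Success,
  InvalidWorkGroupSize, // e.g. non-uniform or over-sized work-groups
  OutOfResources        // well-formed launch that the device cannot fit
};

// Hypothetical device and kernel properties (not the adapter's real types).
struct DeviceLimits {
  std::size_t maxThreadsPerBlock;
  std::size_t maxRegsPerBlock;
};

struct KernelProps {
  std::size_t regsPerThread;
};

using Range3 = std::array<std::size_t, 3>;

LaunchResult validateLaunch(const Range3 &global, const Range3 &local,
                            const DeviceLimits &dev, const KernelProps &kern) {
  std::size_t blockSize = 1;
  for (std::size_t i = 0; i < 3; ++i) {
    assert(local[i] != 0 && "local size must be non-zero");
    // Non-uniform work-groups: global size not a multiple of local size.
    if (global[i] % local[i] != 0)
      return LaunchResult::InvalidWorkGroupSize;
    blockSize *= local[i];
  }
  // Too many work-items in one work-group: still a work-group size problem.
  if (blockSize > dev.maxThreadsPerBlock)
    return LaunchResult::InvalidWorkGroupSize;
  // The work-group size itself is valid, but the kernel's registers per
  // thread multiplied by the block size exceed the per-block register limit:
  // this is a resource problem, not a work-group size problem.
  if (blockSize * kern.regsPerThread > dev.maxRegsPerBlock)
    return LaunchResult::OutOfResources;
  return LaunchResult::Success;
}
```

With the sizes from the linked report (global 2048x788, local 1024x1), illustrative limits of 1024 threads and 65536 registers per block, and a hypothetical kernel using 80 registers per thread, the launch needs 1024 * 80 = 81920 registers, so `validateLaunch` returns `LaunchResult::OutOfResources` rather than a work-group size error.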

Side note (a related but separate issue): the intel/llvm message for `UR_RESULT_ERROR_INVALID_WORK_GROUP_SIZE` also seems misleading in general, because an "invalid work group size" does not always mean non-uniform work-groups; it could also mean, for example, that the maximum work-group size was exceeded.
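Along the same lines, a hypothetical helper such as `explainInvalidWorkGroupSize` could name the constraint that was actually violated instead of always assuming non-uniform work-groups. This is only a sketch: it covers just the two causes mentioned here, assumes a single total work-group limit and non-zero local sizes, and is not intel/llvm's actual diagnostic code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

using Range3 = std::array<std::size_t, 3>;

// Hypothetical helper: explain *why* a work-group size was rejected instead
// of always reporting "non-uniform work-groups".
std::string explainInvalidWorkGroupSize(const Range3 &global,
                                        const Range3 &local,
                                        std::size_t maxWorkGroupSize) {
  std::size_t total = 1;
  for (std::size_t i = 0; i < 3; ++i) {
    assert(local[i] != 0 && "local size assumed to be validated as non-zero");
    // Cause 1: the global size is not a multiple of the local size.
    if (global[i] % local[i] != 0)
      return "global size " + std::to_string(global[i]) +
             " is not a multiple of local size " + std::to_string(local[i]) +
             " in dimension " + std::to_string(i) +
             " (non-uniform work-groups)";
    total *= local[i];
  }
  // Cause 2: uniform work-groups that are larger than the device allows.
  if (total > maxWorkGroupSize)
    return "work-group size " + std::to_string(total) +
           " exceeds the device maximum of " +
           std::to_string(maxWorkGroupSize);
  return "work-group size is valid";
}
```

For the uniform 2048x788 / 1024x1 launch from the report, this helper would not blame non-uniformity at all, which is the point: any remaining failure there has to come from something other than the work-group shape.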
