Description
This follows from #1299, which originally included a change of the error code. After further discussion with @GeorgeWeb, we agreed that the error handling improvement should be a separate PR, paired with correct handling of the changed error code in intel/llvm.
The issue is that when a user submits a kernel that has a reasonable work group size but exceeds the number of registers available on the device, the setKernelParams function in source/adapters/cuda/enqueue.cpp returns UR_RESULT_ERROR_INVALID_WORK_GROUP_SIZE, which presents the user with the following error:
terminate called after throwing an instance of 'sycl::_V1::nd_range_error'
what(): Non-uniform work-groups are not supported by the target device -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)
even if they submit perfectly uniform work groups. This came up initially in intel/llvm#12363, where the global and local sizes were 2048x788 and 1024x1, respectively. The error is simply wrong in this case: an "out of resources" error should be reported instead.
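For illustration, here is a minimal sketch of the distinction this description argues for. It is not the adapter code: `LaunchCheck`, `Limits`, and `checkLaunch` are hypothetical names, the enum stands in for the real `ur_result_t` values, and the assumption that register pressure can be checked as threads per block times registers per thread is a simplification.

```cpp
#include <cstddef>

// Hypothetical result codes for illustration only; the real adapter
// returns ur_result_t values.
enum class LaunchCheck {
  Success,
  InvalidWorkGroupSize, // the work group size itself is not acceptable
  OutOfResources,       // the size is fine, but the kernel needs too many registers
};

// Hypothetical limits; in a real adapter these would be queried from the
// device and from the compiled kernel.
struct Limits {
  std::size_t MaxThreadsPerBlock;   // device limit on work group size
  std::size_t MaxRegistersPerBlock; // registers available to one block
  std::size_t RegistersPerThread;   // registers the kernel uses per thread
};

// Report the size error only when the size is actually at fault, and
// report register exhaustion separately.
constexpr LaunchCheck checkLaunch(std::size_t ThreadsPerBlock,
                                  const Limits &L) {
  if (ThreadsPerBlock == 0 || ThreadsPerBlock > L.MaxThreadsPerBlock)
    return LaunchCheck::InvalidWorkGroupSize;
  if (ThreadsPerBlock * L.RegistersPerThread > L.MaxRegistersPerBlock)
    return LaunchCheck::OutOfResources;
  return LaunchCheck::Success;
}
```

With made-up limits such as `Limits{1024, 65536, 128}`, a local size of 1024 passes the size check but fails the register check, so the sketch reports `OutOfResources` rather than `InvalidWorkGroupSize`.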
Side note / another related but separate issue:
It also seems that the intel/llvm message for UR_RESULT_ERROR_INVALID_WORK_GROUP_SIZE is misleading in any case, because "invalid work group size" does not always mean non-uniform work-groups; it could also mean, e.g., exceeding the max work group size.