[CUDA] Misleading error handling for hasExceededMaxRegistersPerBlock #1308

Open
@rafbiels

This follows from #1299, which originally included a change of the error code. After further discussion with @GeorgeWeb, we agreed that the error handling improvement should be a separate PR, paired with correct handling of the changed error code in intel/llvm.

The issue is that when a user submits a kernel with a reasonable work-group size that nevertheless exceeds the number of registers available on the device, the setKernelParams function in source/adapters/cuda/enqueue.cpp returns UR_RESULT_ERROR_INVALID_WORK_GROUP_SIZE, which presents the user with the following error:

terminate called after throwing an instance of 'sycl::_V1::nd_range_error'
  what():  Non-uniform work-groups are not supported by the target device -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)

even if they submit perfectly uniform work-groups. This came up initially in intel/llvm#12363, where the global and local sizes were 2048x788 and 1024x1 respectively. The error is simply wrong in this case: an "out of resources" error should be reported instead.
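
To make the distinction concrete, here is a minimal, self-contained sketch of the kind of check that would separate the two failure causes. It is not the adapter's actual code: `LaunchCheck`, `DeviceLimits`, `KernelInfo` and `checkLaunch` are hypothetical names invented for illustration, and the numeric limits used in the usage note are assumed values, not ones queried from a device.

```cpp
#include <array>
#include <cstdint>

// Hypothetical result codes for illustration only; not the real UR enum.
enum class LaunchCheck {
  Success,
  InvalidWorkGroupSize, // local size is zero, too large, or non-uniform
  OutOfResources,       // the work-group needs more registers than allowed
};

// Hypothetical device limits (assumed to be queried elsewhere).
struct DeviceLimits {
  std::uint64_t maxThreadsPerBlock;
  std::uint64_t maxRegistersPerBlock;
};

// Hypothetical per-kernel attribute.
struct KernelInfo {
  std::uint64_t registersPerThread;
};

LaunchCheck checkLaunch(const DeviceLimits &dev, const KernelInfo &kernel,
                        const std::array<std::uint64_t, 2> &global,
                        const std::array<std::uint64_t, 2> &local) {
  std::uint64_t threadsPerBlock = 1;
  for (std::size_t i = 0; i < global.size(); ++i) {
    // A zero local size or one that does not divide the global size is a
    // genuine work-group size problem.
    if (local[i] == 0 || global[i] % local[i] != 0)
      return LaunchCheck::InvalidWorkGroupSize;
    threadsPerBlock *= local[i];
  }

  // Exceeding the maximum work-group size is also a size problem.
  if (threadsPerBlock > dev.maxThreadsPerBlock)
    return LaunchCheck::InvalidWorkGroupSize;

  // The size is valid, but the kernel may still need too many registers.
  // This is a resource problem and gets its own result code.
  if (kernel.registersPerThread * threadsPerBlock > dev.maxRegistersPerBlock)
    return LaunchCheck::OutOfResources;

  return LaunchCheck::Success;
}
```

With assumed limits of 1024 threads and 65536 registers per block, a kernel using 80 registers per thread launched with global size 2048x788 and local size 1024x1 passes both size checks and fails only the register check, so the sketch reports `OutOfResources` rather than `InvalidWorkGroupSize`.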

Side note / another related but separate issue:
It also seems the intel/llvm message for UR_RESULT_ERROR_INVALID_WORK_GROUP_SIZE is misleading in any case, because "invalid work group size" does not always mean non-uniform work-groups. It could also mean, for example, that the max work-group size was exceeded.

Labels

cuda (CUDA adapter specific issues)