Skip to content

[CUDA] Max local mem size check should return OUT_OF_RESOURCES #1322

Open
@rafbiels

Description

@rafbiels

Building on top of intel/llvm#12604 + #1318 which adds handleOutOfResources to dpcpp and returns UR_RESULT_ERROR_OUT_OF_RESOURCES, the local mem size check:

if (LocalSize > static_cast<uint32_t>(Device->getMaxCapacityLocalMem())) {
setErrorMessage("Excessive allocation of local memory on the device",
UR_RESULT_ERROR_ADAPTER_SPECIFIC);
return UR_RESULT_ERROR_ADAPTER_SPECIFIC;
}

should also return UR_RESULT_ERROR_OUT_OF_RESOURCES and have dedicated error handling case added in handleOutOfResources.

Right now submitting a kernel with too large local mem size results in:

Native API failed. Native API returns: -996 (The plugin has emitted a backend specific error)
Excessive allocation of local memory on the device
 -996 (The plugin has emitted a backend specific error)

which does contain a helpful exception message, but wrapped in generic and confusing "backend specific error" messages and the unhelpful code -996. Having this returning ERROR_OUT_OF_RESOURCES would make it easier for us to cover in the troubleshooting guide, and for users to find it with web search engines.

Metadata

Metadata

Assignees

Labels

cudaCUDA adapter specific issues

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions