Building on intel/llvm#12604 and #1318, which add handleOutOfResources to DPC++ and return UR_RESULT_ERROR_OUT_OF_RESOURCES, the local memory size check at
unified-runtime/source/adapters/cuda/enqueue.cpp (lines 294 to 298 in f086f36)
should also return UR_RESULT_ERROR_OUT_OF_RESOURCES and have a dedicated error-handling case added in handleOutOfResources.
Right now, submitting a kernel with too large a local memory size results in:
Native API failed. Native API returns: -996 (The plugin has emitted a backend specific error)
Excessive allocation of local memory on the device
-996 (The plugin has emitted a backend specific error)
This does contain a helpful exception message, but it is wrapped in generic and confusing "backend specific error" messages and the unhelpful code -996. Returning UR_RESULT_ERROR_OUT_OF_RESOURCES instead would make it easier for us to cover this case in the troubleshooting guide, and for users to find it with web search engines.
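To make the request concrete, here is a minimal, self-contained sketch of the proposed behavior. It is not the adapter code: the Result enum, checkLocalMemSize, and describe are hypothetical stand-ins invented for illustration, and the real check in enqueue.cpp and the real handleOutOfResources may look quite different.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the UR result codes discussed in this issue.
// The enumerator names and values are illustrative, not the real API.
enum class Result {
  Success,
  AdapterSpecific, // what the user sees today as the generic -996 error
  OutOfResources,  // stands in for UR_RESULT_ERROR_OUT_OF_RESOURCES
};

// Hypothetical local memory size check: instead of reporting an
// adapter-specific error, return the dedicated out-of-resources code
// when the request exceeds the device limit.
Result checkLocalMemSize(std::size_t requestedBytes,
                         std::size_t deviceMaxBytes) {
  if (requestedBytes > deviceMaxBytes) {
    return Result::OutOfResources;
  }
  return Result::Success;
}

// Hypothetical mapping from a result code to a user-facing message,
// playing the role a dedicated case in handleOutOfResources could play.
std::string describe(Result r) {
  switch (r) {
  case Result::Success:
    return "success";
  case Result::AdapterSpecific:
    return "The plugin has emitted a backend specific error";
  case Result::OutOfResources:
    return "Out of resources: excessive allocation of local memory on the "
           "device";
  }
  return "unknown result";
}
```

With a dedicated code, the message a user sees can name the actual problem directly, which is what makes it searchable and easy to document.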