[RFC] Fixing dangling pointers caused by urCommandBufferRelease #1898

Closed
wants to merge 3 commits into from

@ianayl ianayl commented Jul 25, 2024

Some functions in UR leave dangling pointers behind, such as commandBufferReleaseInternal, which is called by urCommandBufferReleaseExp. Coverity has detected that on intel/llvm, when urCommandBufferReleaseExp is called from PI with tracing enabled, the dangling pointer left by urCommandBufferReleaseExp is used again here (specifically, the pi_ext_command_buffer/ur_exp_command_buffer_handle_t passed into Args, which was already freed in urCommandBufferReleaseExp).

The best course of action to fix this Coverity hit, to the best of my knowledge, is to fix the dangling pointers left by e.g. urCommandBufferReleaseExp. This PR contains a potential fix: urCommandBufferReleaseExp takes its ur_exp_command_buffer_handle_t by reference instead of by value, and sets the pointer to nullptr after the object is deleted. This would allow code calling urCommandBufferReleaseExp to actually detect that the command buffer has been freed.
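To make the difference concrete, here is a minimal sketch of the two release shapes. `FakeCommandBuffer`, `FakeHandle`, `releaseByValue` and `releaseByRef` are hypothetical stand-ins invented for illustration, not UR types or entry points, and the reference counting is simplified.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a reference-counted command-buffer object.
struct FakeCommandBuffer {
  std::atomic<unsigned> RefCount{1};
};
using FakeHandle = FakeCommandBuffer *;

// By-value release (the existing shape): only the callee's copy of the
// pointer is visible here, so the caller's handle still holds the old
// address after the object is deleted.
inline void releaseByValue(FakeHandle Handle) {
  assert(Handle != nullptr);
  if (Handle->RefCount.fetch_sub(1) == 1)
    delete Handle;
}

// By-reference release (the shape sketched in this RFC): once the last
// reference is dropped, the caller's handle is reset to nullptr.
inline void releaseByRef(FakeHandle &Handle) {
  assert(Handle != nullptr);
  if (Handle->RefCount.fetch_sub(1) != 1)
    return; // Other references remain; leave the handle alone.
  delete Handle;
  Handle = nullptr;
}
```

With `releaseByRef`, a later `if (Handle)` check in the caller can tell that the object is gone; with `releaseByValue`, the same check passes on a freed pointer.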

Unfortunately, this requires an API change, as I've changed the function signature of urCommandBufferReleaseExp. I'm also aware that changing only urCommandBufferReleaseExp may introduce asymmetry within the API, so if this change were adopted, more functions might have to be changed in a similar manner. I am therefore opening this PR as an RFC to see what other people think, and to find out whether there are any potential issues with this approach.

Additionally, if anyone has better ideas for fixing or handling this Coverity hit without changing the UR API, please feel free to let me know. From what I can tell, this is the course of action that will cause the fewest problems in the future.

The respective draft DPC++ testing PR is here: intel/llvm#14782

@github-actions github-actions bot added loader Loader related feature/bug level-zero L0 adapter specific issues cuda CUDA adapter specific issues hip HIP adapter specific issues command-buffer Command Buffer feature addition/changes/specification labels Jul 25, 2024
@github-actions github-actions bot added the native-cpu Native CPU adapter specific issues label Jul 25, 2024
@@ -8314,7 +8314,7 @@ urCommandBufferRetainExp(
/// - ::UR_RESULT_ERROR_OUT_OF_HOST_MEMORY
UR_APIEXPORT ur_result_t UR_APICALL
urCommandBufferReleaseExp(
-ur_exp_command_buffer_handle_t hCommandBuffer ///< [in][release] Handle of the command-buffer object.
+ur_exp_command_buffer_handle_t &hCommandBuffer ///< [in][release] Handle of the command-buffer object.
Contributor

This header is a C API, so C++ references cannot be used in it. Additionally, this header is generated from the spec description in the YAML files that reside in scripts/core/*.yml.
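For illustration only: C has no reference types, so the C-compatible way for a function to modify the caller's handle is to take a pointer to it. The sketch below uses made-up names (`example_buffer_t`, `example_buffer_handle_t`, `exampleBufferRelease`) and plain `int` return codes; it is not part of the UR spec or of this PR, and adopting such a shape would still be an API change.

```cpp
#include <cassert>
#include <cstdlib>

extern "C" {
// Hypothetical handle type, standing in for a UR-style handle.
typedef struct example_buffer_t *example_buffer_handle_t;
struct example_buffer_t {
  unsigned RefCount;
};

// Hypothetical entry point: takes the address of the caller's handle so it
// can be cleared without using a C++ reference in the signature.
// Returns 0 on success and 1 on a null argument.
int exampleBufferRelease(example_buffer_handle_t *phBuffer) {
  if (phBuffer == nullptr || *phBuffer == nullptr)
    return 1;
  if (--(*phBuffer)->RefCount != 0)
    return 0; // Other references remain.
  std::free(*phBuffer);
  *phBuffer = nullptr; // The caller's handle no longer dangles.
  return 0;
}
}
```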

if (!CommandBuffer->RefCount.decrementAndTest())
return UR_RESULT_SUCCESS;

delete CommandBuffer;
CommandBuffer = nullptr;
Contributor


It is not the responsibility of the Level Zero adapter to zero out the handle pointer as part of the urCommandBufferReleaseExp entry point. The proper fix for this should be in the layer above, which is the one calling this entry point.
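One way to read this suggestion is that the calling layer clears its own copy of the handle when it gives up its reference, leaving the entry point's by-value signature unchanged. A minimal sketch, assuming hypothetical names throughout (`DemoBuffer`, `DemoHandle`, `demoRelease`, `releaseAndClear`); none of these exist in UR or intel/llvm.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a reference-counted object behind a handle.
struct DemoBuffer {
  int RefCount = 1;
};
using DemoHandle = DemoBuffer *;

// Stand-in for a by-value, C-style release entry point. Returns 0 on success.
inline int demoRelease(DemoHandle Handle) {
  if (--Handle->RefCount == 0)
    delete Handle;
  return 0;
}

// Hypothetical caller-side wrapper: the calling layer moves the handle out
// of its own variable before releasing it. The caller has given up its
// reference either way, so its copy is cleared even if other owners remain.
// Returns 1 if the handle was already null.
inline int releaseAndClear(DemoHandle &Handle) {
  DemoHandle Released = std::exchange(Handle, nullptr);
  if (Released == nullptr)
    return 1;
  return demoRelease(Released);
}
```

Any code that runs after the release in the calling layer, such as a tracing hook, then sees a null handle rather than a freed address.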

ianayl commented Jul 29, 2024

With the removal of the PI layer in intel/llvm#14145, there is no longer a need for such a mechanism, so I am closing this. Thanks for your time and patience regardless!

@ianayl ianayl closed this Jul 29, 2024