CUDA: HIP: maintain_cuda_graph use of cudaGraphKernelNodeGetParams is incorrect.

According to cuda [documentation](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__GRAPH.html#group__CUDART__GRAPH_1g931a1af826d30fdad0b556da917f6705) the memory of the cudaKernelNodeParams struct as returned by cudaGraphKernelNodeGetParams is owned by the associated node. In the case here https://github.com/ggml-org/llama.cpp/blob/14dec0c2f29ae56917907dbf2eed6b19438d0a0e/ggml/src/ggml-cuda/ggml-cuda.cu#L2546 the nodes in question where created using stream capture, ie they are owned by the cuda/hip runtime.

we later modify this struct by replacing one of its pointer members with the address inside a block of memory we own:
https://github.com/ggml-org/llama.cpp/blob/14dec0c2f29ae56917907dbf2eed6b19438d0a0e/ggml/src/ggml-cuda/ggml-cuda.cu#L2565


We have thus modified the node by simply replacing a pointer to a member with memory owned by the runtime with a pointer to memory we own. **There is no way this results in well defined behavior** and indeed the cuda documentation prohibits this action, see the link to the documentation above:
> This memory remains valid until the node is destroyed or its parameters are modified, and should not be modified directly.

Presumably this happens to work on cuda right now either because the runtime happens to allocate the pointer we are updating as part of a larger block separately malloced and the pointer happens to not be the first address in the allocated block, or because the runtime simply is leaking this memory.

On hip, the runtime mallocs() memory all the kernel pointers separately and then frees() the memory when the node is destroyed. This of course causses an invalid free() when the runtime encounters the pointer we changed to point to our memory.

We could avoid this by not changing the pointer to memory we own, but to instead simply update the value it holds:

```
diff --git a/ggml/src/ggml-cuda/ggml-cuda.cu b/ggml/src/ggml-cuda/ggml-cuda.cu
index d23686d1..1825124d 100644
--- a/ggml/src/ggml-cuda/ggml-cuda.cu
+++ b/ggml/src/ggml-cuda/ggml-cuda.cu
@@ -2562,7 +2562,7 @@ static void maintain_cuda_graph(ggml_backend_cuda_context * cuda_ctx, std::vecto
         for (size_t i = 0; i < cuda_ctx->cuda_graph->num_nodes; i++) {
             if(count(ggml_cuda_cpy_fn_ptrs.begin(), ggml_cuda_cpy_fn_ptrs.end(), cuda_ctx->cuda_graph->params[i].func) > 0) {
                 char ** updated_kernel_arg_ptr = cuda_ctx->cuda_graph->updated_kernel_arg.at(k++);
-                cuda_ctx->cuda_graph->params[i].kernelParams[1] = updated_kernel_arg_ptr;
+                *(void**)cuda_ctx->cuda_graph->params[i].kernelParams[1] = *(void**)updated_kernel_arg_ptr;
                 CUDA_CHECK(cudaGraphKernelNodeSetParams(cuda_ctx->cuda_graph->nodes[i], &cuda_ctx->cuda_graph->params[i]));
             }
         }
 ```

I have verfied by looking at the hip runtime code and discussion with an [amd engeneer](https://github.com/ROCm/clr/issues/138) that this is fine to do for hip, but this still violates the provision to not modify the parameters given by the cuda documentation and i have no idea if this is safe to do there. The only way to not violate the constraints given by the documentation would be assemble a cudaKernelNodeParams struct by hand from scratch in this possition:

https://github.com/ggml-org/llama.cpp/blob/14dec0c2f29ae56917907dbf2eed6b19438d0a0e/ggml/src/ggml-cuda/ggml-cuda.cu#L2564 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CUDA: HIP: maintain_cuda_graph use of cudaGraphKernelNodeGetParams is incorrect. #12152

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

CUDA: HIP: maintain_cuda_graph use of cudaGraphKernelNodeGetParams is incorrect. #12152

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions