Closed
Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [YES] I am running the latest code (commit bc9d3e3).
- [YES] I carefully followed the README.md.
- [YES] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [YES] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Correct uploading of contiguous 3D tensor data to GPU.
Current Behavior
`ggml_cl_h2d_tensor_2d` uses its `offset` argument as a byte offset in a call to `clEnqueueWriteBuffer`. `ggml_cl_transform_tensor` passes an element count as `offset` to `ggml_cl_h2d_tensor_2d`. An element count equals the byte offset only if the element size is exactly 1 byte.
Also, I don't understand why `ggml_cl_mul_f32` passes a non-zero offset to `ggml_cl_h2d_tensor_2d`.
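To make the mismatch concrete, here is a minimal sketch of the two offset calculations. The helper names are hypothetical and not part of ggml, and the indexing `(i3 * ne2 + i2) * ne0 * ne1` is my assumption about how the per-plane offset is derived for a contiguous tensor. The only point is that a byte-addressed write needs the element offset multiplied by the element size.

```c
#include <stddef.h>

/* Hypothetical helper (not a ggml function): number of elements that precede
   the 2D plane (i2, i3) in a contiguous tensor with ne0 * ne1 elements per
   plane and ne2 planes per outer index. */
static size_t plane_element_offset(size_t ne0, size_t ne1, size_t ne2,
                                   size_t i2, size_t i3) {
    return (i3 * ne2 + i2) * ne0 * ne1;
}

/* Hypothetical helper: the offset in bytes that a byte-addressed write,
   such as clEnqueueWriteBuffer, would need for the same plane. */
static size_t plane_byte_offset(size_t ne0, size_t ne1, size_t ne2,
                                size_t i2, size_t i3, size_t elem_size) {
    return plane_element_offset(ne0, ne1, ne2, i2, i3) * elem_size;
}
```

For a 4 x 3 x 2 tensor, plane `i2 = 1` starts at element 12. That is byte 24 for 2-byte F16 data and byte 48 for 4-byte F32 data, so passing 12 as the byte offset lands inside the previous plane.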
Environment and Context
AMD GPU
Linux
Steps to Reproduce
- Pass a 3D tensor with contiguous `GGML_TYPE_F16` or `GGML_TYPE_F32` data to `ggml_cl_transform_tensor`.
- Read the data back from GPU memory or perform `ggml_cl_mul_mat` on that tensor.
- Observe incorrect data or result.
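The symptom can also be illustrated without a GPU. The sketch below is a host-only simulation, not the ggml code path: `fake_write_buffer` is a hypothetical stand-in for a byte-addressed device write (a plain `memcpy`), and the 2 x 2 x 2 `float` tensor is an arbitrary example. Each plane is uploaded at `elem_offset * scale` bytes; `scale = 1` mimics passing an element count as a byte offset, and `scale = sizeof(float)` mimics the corrected call.

```c
#include <stddef.h>
#include <string.h>

enum { NE0 = 2, NE1 = 2, NE2 = 2, N = NE0 * NE1 * NE2 };

/* Hypothetical stand-in for a byte-addressed device write:
   copies nbytes from src into the "device" buffer at byte_offset. */
static void fake_write_buffer(unsigned char *dev, size_t byte_offset,
                              const void *src, size_t nbytes) {
    memcpy(dev + byte_offset, src, nbytes);
}

/* Uploads the tensor one 2D plane at a time, then reads it back.
   `scale` multiplies the element offset before it is used as a byte offset.
   Returns 1 if the readback matches the source data, 0 otherwise. */
static int upload_and_check(size_t scale) {
    float src[N];
    float back[N];
    unsigned char dev[N * sizeof(float)];

    for (size_t i = 0; i < (size_t)N; i++) {
        src[i] = (float)(i + 1);
    }
    memset(dev, 0, sizeof dev);

    for (size_t i2 = 0; i2 < (size_t)NE2; i2++) {
        size_t elem_offset = i2 * NE0 * NE1;   /* offset in elements */
        fake_write_buffer(dev, elem_offset * scale,
                          src + elem_offset,
                          (size_t)NE0 * NE1 * sizeof(float));
    }

    memcpy(back, dev, sizeof back);
    return memcmp(src, back, sizeof src) == 0;
}
```

In this simulation `upload_and_check(1)` reports a mismatch because the second plane overwrites most of the first, while `upload_and_check(sizeof(float))` reads back the original data.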
Ping