Question Regarding Operation offload & Graph Split #1143
-
Hi,
The graph contains the following node counts per op: ADD 16 nodes, SUB 0 nodes, MUL 25 nodes, DIV 0 nodes. However, my backend is only given a subset of these nodes to compute. I have been looking at ggml_backend_sched_split_graph, which runs several passes to assign nodes to backends, with the whole assignment logic implemented in that one function. Can you please guide me on which part of the implementation I need to work on so that all 16 ADD operations and all 25 MUL operations reach my custom backend's compute function? I would really appreciate an explanation here, since there is no documentation on this, and it is blocking my further progress.
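For reference, the scheduler only considers assigning a node to a backend if the backend reports that op as supported, and even then ggml_backend_sched_split_graph may keep the node elsewhere to avoid extra tensor copies between splits. Below is a minimal sketch of a supports_op callback; the function name my_backend_device_supports_op is hypothetical, and the exact callback signature depends on the ggml version and on whether the backend implements the device interface.

```c
#include "ggml.h"
#include "ggml-backend.h"

// Hypothetical sketch: report ADD/SUB/MUL/DIV on F32 tensors as supported.
// The signature mirrors the supports_op callback of the device interface in
// recent ggml; check the headers of the ggml version you build against.
static bool my_backend_device_supports_op(ggml_backend_dev_t dev, const struct ggml_tensor * op) {
    GGML_UNUSED(dev);
    switch (op->op) {
        case GGML_OP_ADD:
        case GGML_OP_SUB:
        case GGML_OP_MUL:
        case GGML_OP_DIV:
            // this example backend only handles F32 results with F32 sources
            return op->type == GGML_TYPE_F32
                && op->src[0] && op->src[0]->type == GGML_TYPE_F32
                && op->src[1] && op->src[1]->type == GGML_TYPE_F32;
        default:
            return false;
    }
}
```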
-
Hi,
We have implemented GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, and GGML_OP_DIV, and we also support GGML_TYPE_F32 on our custom hardware. I ran llama-cli with --device my-custom-hardware vs. none and noticed the following:
GGML_OP_MUL: 45 tensors with this op were offloaded to my backend, but the total number of tensors with this op is 67. I would like to know why the remaining 22 (67 - 45) GGML_OP_MUL tensors were not offloaded. ----- Looking for this answer.
During debugging I also see ggml_backend_tensor_copy copying from a src tensor to a dst tensor --- this is one example.
So I inspected the src tensor, and the following are my observations:
p *src ---- op = GGML_OP_RMS_NORM (this is copied to the dst tensor, which becomes a leaf for my custom hardware and is used as an input to a node operation)
p *src->src[0] ---- op = GGML_OP_ADD
p *src->src[0]->src[0] ---- op = GGML_OP_MUL_MAT (which we do not support)
I see that only one GGML_OP_ADD tensor and 45 GGML_OP_MUL tensors were offloaded to my backend.
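One way to confirm the assignment per node, instead of inspecting individual src tensors in the debugger, is to ask the scheduler directly after the graph has been split. Here is a sketch, assuming access to the ggml_backend_sched_t and ggml_cgraph in use (sched and graph below are placeholder names); if the ggml build supports it, setting the GGML_SCHED_DEBUG environment variable also makes the scheduler print its splits.

```c
#include <stdio.h>
#include "ggml.h"
#include "ggml-backend.h"

// Sketch: after the scheduler has split/computed the graph, query which
// backend each node was assigned to. 'sched' and 'graph' are placeholders
// for the scheduler and cgraph used by the application.
static void print_node_assignments(ggml_backend_sched_t sched, struct ggml_cgraph * graph) {
    for (int i = 0; i < ggml_graph_n_nodes(graph); i++) {
        struct ggml_tensor * node = ggml_graph_node(graph, i);
        ggml_backend_t backend = ggml_backend_sched_get_tensor_backend(sched, node);
        printf("node %4d: op=%-12s name=%-24s backend=%s\n",
               i,
               ggml_op_name(node->op),
               node->name,
               backend ? ggml_backend_name(backend) : "(unassigned)");
    }
}
```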
Then I did another experiment: I disabled the GGML_OP_MUL operation in my backend code. With this change, GGML_OP_ADD was no longer offloaded to my backend either. ----- I would like to understand the logic behind this (why disabling GGML_OP_MUL also disables offloading of GGML_OP_ADD).
Thanks a lot!