Question Regarding Operation offload & Graph Split #1143
-
Hi,
The graph contains the following node counts per op: ADD 16 nodes, SUB 0 nodes, MUL 25 nodes, DIV 0 nodes. However, my backend is only given a subset of these nodes to compute. I have been looking at ggml_backend_sched_split_graph, which runs several passes to assign nodes to backends, with the whole assignment logic implemented in that one function. Can you please guide me on which part of the implementation I need to work on so that all 16 ADD operations and all 25 MUL operations reach my custom backend's compute function? I would really appreciate an explanation here, since there is no documentation on this, and it is blocking my further progress.
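For reference, the scheduler only considers assigning a node to a backend if the backend reports that op as supported, and even then ggml_backend_sched_split_graph may keep the node elsewhere to avoid extra tensor copies between splits. Below is a minimal sketch of a supports_op callback; the function name my_backend_device_supports_op is hypothetical, and the exact callback signature depends on the ggml version and on whether the backend implements the device interface.

```c
#include "ggml.h"
#include "ggml-backend.h"

// Hypothetical sketch: report ADD/SUB/MUL/DIV on F32 tensors as supported.
// The signature mirrors the supports_op callback of the device interface in
// recent ggml; check the headers of the ggml version you build against.
static bool my_backend_device_supports_op(ggml_backend_dev_t dev, const struct ggml_tensor * op) {
    GGML_UNUSED(dev);
    switch (op->op) {
        case GGML_OP_ADD:
        case GGML_OP_SUB:
        case GGML_OP_MUL:
        case GGML_OP_DIV:
            // this example backend only handles F32 results with F32 sources
            return op->type == GGML_TYPE_F32
                && op->src[0] && op->src[0]->type == GGML_TYPE_F32
                && op->src[1] && op->src[1]->type == GGML_TYPE_F32;
        default:
            return false;
    }
}
```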
-
Hi,
We have implemented GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, and GGML_OP_DIV, and we also support GGML_TYPE_F32 on our custom hardware. I ran llama-cli with --device my-custom-hardware vs. none and noticed the following:
GGML_OP_MUL: 45 tensors with this op were offloaded to my backend, but the total number of tensors with this op is 67. I would like to know why the remaining 22 (67 - 45) GGML_OP_MUL tensors were not offloaded. ----- Looking for this answer.
During debugging I also see ggml_backend_tensor_copy copying from a src tensor to a dst tensor --- this is one example.
So I inspected the src tensor, and the following are my observations:
p *src ---- op = GGML_OP_RMS_NORM (this is copied to the dst tensor, which becomes a leaf for my custom hardware and is used as an input to a node operation)
p *src->src[0] ---- op = GGML_OP_ADD
p *src->src[0]->src[0] ---- op = GGML_OP_MUL_MAT (which we do not support)
I see that only one GGML_OP_ADD tensor and 45 GGML_OP_MUL tensors were offloaded to my backend.
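One way to confirm the assignment per node, instead of inspecting individual src tensors in the debugger, is to ask the scheduler directly after the graph has been split. Here is a sketch, assuming access to the ggml_backend_sched_t and ggml_cgraph in use (sched and graph below are placeholder names); if the ggml build supports it, setting the GGML_SCHED_DEBUG environment variable also makes the scheduler print its splits.

```c
#include <stdio.h>
#include "ggml.h"
#include "ggml-backend.h"

// Sketch: after the scheduler has split/computed the graph, query which
// backend each node was assigned to. 'sched' and 'graph' are placeholders
// for the scheduler and cgraph used by the application.
static void print_node_assignments(ggml_backend_sched_t sched, struct ggml_cgraph * graph) {
    for (int i = 0; i < ggml_graph_n_nodes(graph); i++) {
        struct ggml_tensor * node = ggml_graph_node(graph, i);
        ggml_backend_t backend = ggml_backend_sched_get_tensor_backend(sched, node);
        printf("node %4d: op=%-12s name=%-24s backend=%s\n",
               i,
               ggml_op_name(node->op),
               node->name,
               backend ? ggml_backend_name(backend) : "(unassigned)");
    }
}
```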
Then I did another experiment: I disabled the GGML_OP_MUL operation in my backend code. With this change, GGML_OP_ADD was no longer offloaded to my backend either. ----- I would like to understand the logic behind this (why disabling GGML_OP_MUL also disables offloading of GGML_OP_ADD).
Thanks a lot!