
Conversation

@max-krasnyansky
Collaborator

Introduce fastdiv and fix test-backend-ops failures for ADD/SUB/MUL
Thanks @chraac!
Subsumes #17042

Fixed inference with Qwen3-VL models that generate graphs with ne[1] == 0

@chraac
Contributor

chraac commented Nov 10, 2025

Great work! I've tested the fix on my device and it works well.

One small thing: `init_fastdiv_values` is called for each binary operation during each calculation. However, this should have minimal performance impact since it only contains constant shifts and mpys, so I think it's acceptable.

Looking ahead, have you considered maintaining a similar structure to store these values on the CPU? That could be optimal since the weight dimensions remain fixed throughout all stages.
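For context, the "constant shifts and mpys" refer to the standard division-by-invariant-integer trick: precompute a magic multiplier and a shift once per divisor, after which every division is one widening multiply plus shifts. A minimal self-contained sketch of the idea (illustrative names, not the actual ggml-hexagon `init_fastdiv_values`):

```c
#include <assert.h>
#include <stdint.h>

// Sketch of the fastdiv technique: replace n / d (d runtime-invariant)
// with a 32x32->64 multiply and constant shifts. Uses the "round-up"
// magic-number method from Hacker's Delight; names are illustrative.
typedef struct {
    uint32_t mp;     // magic multiplier
    uint32_t shift;  // ceil(log2(d))
} fastdiv_vals;

static fastdiv_vals fastdiv_init(uint32_t d) {
    fastdiv_vals v;
    uint32_t l = 0;
    while ((1ULL << l) < d) l++;  // l = ceil(log2(d)), d >= 1
    // mp = floor(2^32 * (2^l - d) / d) + 1
    v.mp    = (uint32_t)((((uint64_t)((1ULL << l) - d)) << 32) / d + 1);
    v.shift = l;
    return v;
}

static uint32_t fastdiv(uint32_t n, fastdiv_vals v) {
    uint64_t hi = ((uint64_t)n * v.mp) >> 32;          // high half of the mpy
    return (uint32_t)(((uint64_t)n + hi) >> v.shift);  // == n / d
}
```

Since the per-call work in `fastdiv` is a single multiply plus shifts, re-running the init per operation costs little relative to the op itself, which matches the profiling result below.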


@chraac left a comment


lgtm!

max-krasnyansky and others added 4 commits November 10, 2025 15:03
llm_graph_context::build_inp_out_ids() can generate tensors with zero nrows.
Somehow other backends seem to handle this without obvious explicit checks.
In the hexagon case we need to check explicitly and skip them.
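The explicit check the commit message describes can be a simple early-out on empty tensors. A hypothetical sketch, assuming a ggml-style `ne[4]` dimension array (struct and function names are illustrative, not the actual hexagon backend code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Illustrative stand-in for a ggml tensor's dimension array.
struct tensor_dims {
    int64_t ne[4];  // number of elements per dimension
};

// A tensor with zero extent along any dimension (e.g. ne[1] == 0, as
// produced by llm_graph_context::build_inp_out_ids()) has no rows to
// process, so the backend can skip the op entirely.
static bool tensor_is_empty(const struct tensor_dims *t) {
    return t->ne[0] == 0 || t->ne[1] == 0 || t->ne[2] == 0 || t->ne[3] == 0;
}
```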
@max-krasnyansky marked this pull request as ready for review November 10, 2025 23:45
@max-krasnyansky
Collaborator Author

> Great work! I've tested the fix on my device and it works well.
>
> One small thing: `init_fastdiv_values` is called for each binary operation during each calculation. However, this should have minimal performance impact since it only contains constant shifts and mpys, so I think it's acceptable.
>
> Looking ahead, have you considered maintaining a similar structure to store these values on the CPU? That could be optimal since the weight dimensions remain fixed throughout all stages.

Yeah, I did lots of profiling runs and the overall perf is the same as before, but all test-backend-ops for ADD/SUB/MUL/ADD_ID are passing now. Definitely acceptable :)

And yes, let's think about caching those. I was thinking host at first as we discussed in the other PR. But maybe it makes sense to allocate a little bit of vtcm and cache them there.
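The caching idea above could be as small as memoizing the precomputed values per divisor, so the init work is skipped when tensor dimensions stay fixed across graph evaluations. A hedged sketch under those assumptions (all names illustrative; a real version might place the table in VTCM as suggested):

```c
#include <assert.h>
#include <stdint.h>

// Precomputed division-by-invariant values (same shape as the fastdiv idea).
typedef struct {
    uint32_t mp;
    uint32_t shift;
} fastdiv_vals;

static fastdiv_vals fastdiv_init(uint32_t d) {
    fastdiv_vals v;
    uint32_t l = 0;
    while ((1ULL << l) < d) l++;
    v.mp    = (uint32_t)((((uint64_t)((1ULL << l) - d)) << 32) / d + 1);
    v.shift = l;
    return v;
}

#define FD_CACHE_SIZE 64

// Tiny direct-mapped cache keyed by divisor; divisor == 0 marks an empty
// slot (valid because ggml dimensions used as divisors are nonzero here).
typedef struct {
    uint32_t     divisor[FD_CACHE_SIZE];
    fastdiv_vals vals[FD_CACHE_SIZE];
} fastdiv_cache;

static fastdiv_vals fd_cache_get(fastdiv_cache *c, uint32_t d) {
    uint32_t slot = d % FD_CACHE_SIZE;
    if (c->divisor[slot] != d) {     // miss (or conflict): recompute
        c->vals[slot]    = fastdiv_init(d);
        c->divisor[slot] = d;
    }
    return c->vals[slot];            // hit: no shifts/mpys re-run
}
```

Because weight dimensions are fixed across the run, the common case after warm-up is a pure table lookup.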

@max-krasnyansky
Collaborator Author

@lhez please take a look


@lhez left a comment


Looks good!

@max-krasnyansky merged commit c273d75 into ggml-org:master Nov 11, 2025
70 of 71 checks passed

Labels

ggml changes relating to the ggml tensor library for machine learning
