
@loci-dev

Mirrored from ggml-org/llama.cpp#18401

For the motivation of this proposal, please refer to this discussion: ggml-org/ggml#1401

For demo purposes, only 2 backends are implemented in this PR:

  • CPU
  • Metal

Main differences

| `ggml_rope_ext` | `ggml_rope_comp` |
| --- | --- |
| One single API call | Multiple composable calls |
| High-level input params like `freq_base`, YaRN scaling | Lower-level input params; most of the static params can be customized by user code |
| Multiple kernels, one per mode | One single kernel, templated by mode (mrope/vision) or controlled via input arg (neox/normal ordering) |
| I32 type for position | F32 type for position; shape is 1D for text and 4D for m-rope |
| m-rope only supports neox ordering | m-rope supports both neox and normal ordering |
| Does not support offset `n_rot` | Allows offset `n_rot` [1] |

[1] This is necessary because we may want to implement 2D-rope (vision mode) as two separate calls to ggml_rope_comp: one call with offset = 0 and the other with offset = n_rot/2. This is particularly useful for vision models like Pixtral, where the two parts of the 2D-rope do not use the same freq configuration.

Performance

I still couldn't get a meaningful perf result because --output csv does not work with test-backend-ops. But at a glance via --output console, it provides the same performance as the existing ggml_rope_ext.

