
Conversation

@ikawrakow
Owner

When performing the RoPE operation one needs to compute the rotation angles for each token in the batch and for each Q and K attention head. When there are no per-layer frequency factors, this computation depends only on the token position and the index within the attention head, so it is exactly the same (per token) for every layer and every head. Hence, one could simply compute the cosine and sine of the rotation angles once per graph and then reuse the result for all layers.
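
To make the idea concrete, here is a minimal sketch of the caching step, assuming the standard NEOX-style angle formula `pos * freq_base^(-2i/n_rot)` with no frequency factors. The names (`RopeAngleCache`, `build_rope_cache`) are hypothetical and are not the PR's actual API; this is only an illustration of why the cache can be shared across layers.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Cos/sin of the rotation angles for the current batch, computed once per graph.
// The values depend only on the token positions and the index within the head,
// so every layer and every Q/K head can reuse the same cache.
struct RopeAngleCache {
    int n_rot;                  // number of rotated dimensions per head
    std::vector<float> cos_v;   // [n_tokens * n_rot/2]
    std::vector<float> sin_v;   // [n_tokens * n_rot/2]
};

RopeAngleCache build_rope_cache(const std::vector<int32_t>& positions,
                                int n_rot, float freq_base = 10000.0f) {
    RopeAngleCache cache;
    cache.n_rot = n_rot;
    const int half = n_rot / 2;
    cache.cos_v.resize(positions.size() * half);
    cache.sin_v.resize(positions.size() * half);
    for (size_t t = 0; t < positions.size(); ++t) {
        for (int i = 0; i < half; ++i) {
            // angle = pos * freq_base^(-2i/n_rot); no per-layer frequency factors
            const float inv_freq = std::pow(freq_base, -2.0f * i / n_rot);
            const float angle    = positions[t] * inv_freq;
            cache.cos_v[t*half + i] = std::cos(angle);
            cache.sin_v[t*half + i] = std::sin(angle);
        }
    }
    return cache;
}
```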

This PR implements this idea for a subset of the supported models (Qwen3, Qwen3-MoE, Ling/Ring, GPT-OSS, GLM-4.5-MoE).

We observe small but noticeable performance gains for PP (prompt processing) and TG (token generation).

The PR needs a bit more work to add a command-line argument to enable/disable this feature, as the implementation currently covers only the CUDA and CPU back-ends. Also, for now only the NEOX and NORM RoPE variants are implemented, so the vision-related RoPE variants still need to be added.

Still, putting it out there for testing.

It will also be interesting to see how long it will take until this optimization is fully independently discovered in mainline llama.cpp /s

Iwan Kawrakow added 9 commits November 1, 2025 15:58
When computing RoPE, the rotation angles in each layer
are exactly the same and depend only on the token positions
(and other constant, model-dependent parameters).
So, I wonder, why don't we compute the angles just once
and then reuse them for the Q and K RoPE in each layer?

This commit does that as a POC on the CPU and uses it in
the Qwen3-MoE compute graph.
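
As a hedged continuation of the sketch above (again with hypothetical names, not the PR's actual code), this is what the per-layer reuse could look like for the NEOX layout, where element i is paired with element i + n_rot/2: every layer and every head applies the same cached cos/sin values, so only the multiply-adds remain per layer.

```cpp
// Apply NEOX-style RoPE to one head of Q (or K) for one token,
// reusing the cos/sin values cached once per graph.
void apply_rope_neox(float* head,                  // [n_rot] values of one head, one token
                     const RopeAngleCache& cache,
                     size_t token_index) {
    const int half = cache.n_rot / 2;
    const float* c = cache.cos_v.data() + token_index * half;
    const float* s = cache.sin_v.data() + token_index * half;
    for (int i = 0; i < half; ++i) {
        const float x0 = head[i];
        const float x1 = head[i + half];
        head[i]        = x0 * c[i] - x1 * s[i];
        head[i + half] = x0 * s[i] + x1 * c[i];
    }
}
```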