llama: use a more efficient rope implementation #12434

Merged
comfyanonymous merged 1 commit into Comfy-Org:master from rattus128:prs/llama-rope
Feb 13, 2026

@rattus128
Contributor

Get rid of the cat and the unary negation, and instead apply an in-place add-cmul to the two halves of the RoPE. Precompute -sin once at the start of the model rather than in every transformer block.

This is slightly faster on both GPU-bound and CPU-bound setups.
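
For readers unfamiliar with the pattern, here is a minimal sketch of the idea described above, assuming the common "rotate half" RoPE formulation and PyTorch. It is not the code from this PR: the function names (`build_tables`, `rope_reference`, `rope_fused`) and tensor shapes are hypothetical and chosen only for illustration.

```python
import torch


def build_tables(seq_len, head_dim, base=10000.0):
    # Hypothetical helper: cos/sin tables of shape (seq_len, head_dim).
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    angles = torch.cat((angles, angles), dim=-1)
    return angles.cos(), angles.sin()


def rope_reference(x, cos, sin):
    # Common formulation: build rotate_half(x) with a negation and a cat.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    rotated = torch.cat((-x2, x1), dim=-1)
    return x * cos + rotated * sin


def rope_fused(x, cos, sin, neg_sin):
    # Same result without cat or per-call negation: start from x * cos,
    # then accumulate each half in place with addcmul_.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    out = x * cos
    out[..., :half].addcmul_(x2, neg_sin[..., :half])  # += x2 * (-sin)
    out[..., half:].addcmul_(x1, sin[..., half:])      # += x1 * sin
    return out


cos, sin = build_tables(seq_len=16, head_dim=8)
neg_sin = -sin  # computed once, then reused by every block
x = torch.randn(2, 4, 16, 8)  # (batch, heads, seq_len, head_dim)
for _ in range(3):  # stand-in for a stack of transformer blocks
    y = rope_fused(x, cos, sin, neg_sin)
assert torch.allclose(y, rope_reference(x, cos, sin), atol=1e-5)
```

The fused variant avoids allocating the negated and concatenated intermediates on every call, which is where savings would be expected in this sketch; the actual gains depend on the model and hardware.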

I found this while profiling ace-step dynamic_vram: the RoPE was the biggest item on the flame graph.

Example test conditions:

RTX 5090, Linux, Ryzen 9600 underclocked to 2 GHz (CPU-bound)
ACE-Step 1.5 turbo, 195s

Before:

LM sampling: 100%|██████████| 975/975 [00:15<00:00, 63.05it/s]
Requested to load ACEStep15
loaded completely; 25390.26 MB usable, 4565.35 MB loaded, full load: True
100%|██████████| 8/8 [00:00<00:00,  9.51it/s]
Requested to load AudioOobleckVAE
loaded completely;  321.70 MB loaded, full load: True
Prompt executed in 30.84 seconds

After:

LM sampling: 100%|██████████| 975/975 [00:14<00:00, 66.60it/s]
Requested to load ACEStep15
loaded completely; 25390.26 MB usable, 4565.35 MB loaded, full load: True
100%|██████████| 8/8 [00:00<00:00,  9.56it/s]
Requested to load AudioOobleckVAE
loaded completely;  321.70 MB loaded, full load: True
Prompt executed in 30.02 seconds

@comfyanonymous merged commit ae79e33 into Comfy-Org:master on Feb 13, 2026
12 checks passed