
fix kl mismatch by resetting prefix cache #1650

Merged

hallerite merged 4 commits into main from hallerite/reset_prefix_cache
Jan 24, 2026
Conversation


@hallerite hallerite commented Jan 24, 2026

After bumping verifiers in #1572, the KL mismatch increased: Verifiers stopped returning prompt logprobs in PrimeIntellect-ai/verifiers#666, so vLLM falls back to prefix caching in that case, and the prefix cache must be reset after weight updates for correctness.


Note

Ensures KV cache invalidation when model weights/adapters change to prevent stale-prefix usage.

  • Resets prefix cache after update_weights and reload_weights RPCs in server.py
  • Adds POST /load_lora_adapter server endpoint wrapping vLLM’s loader; on success resets prefix cache and returns {status: ok}; propagates error responses
  • Client now calls /load_lora_adapter (was /v1/load_lora_adapter); updated docstrings/comments
  • Unit tests updated to reflect new endpoint and retry behavior
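The invalidation rule described in the bullets above can be sketched as follows. This is a minimal stand-alone illustration, not the PR's actual code: `FakeEngine`, the handler names, and the RPC plumbing are all hypothetical stand-ins for the real server and vLLM engine.

```python
# Sketch of the core rule: any RPC that changes weights or adapters must also
# reset the prefix cache, otherwise KV blocks cached under the old weights
# would be reused. All names here are hypothetical stand-ins.

class FakeEngine:
    """Stand-in for an inference engine exposing a prefix-cache reset."""

    def __init__(self):
        self.cache_resets = 0

    def update_weights(self, checkpoint_path):
        pass  # load new weights here

    def load_lora_adapter(self, name, path):
        pass  # attach the adapter here

    def reset_prefix_cache(self):
        self.cache_resets += 1  # drop all cached prefix KV blocks


def handle_update_weights(engine, checkpoint_path):
    engine.update_weights(checkpoint_path)
    engine.reset_prefix_cache()  # the fix: invalidate stale prefixes
    return {"status": "ok"}


def handle_load_lora_adapter(engine, name, path):
    try:
        engine.load_lora_adapter(name, path)
    except Exception as exc:
        # propagate the error response instead of resetting the cache
        return {"status": "error", "message": str(exc)}
    engine.reset_prefix_cache()  # reset only after a successful load
    return {"status": "ok"}
```

The point of resetting only on success is that a failed adapter load leaves the weights unchanged, so the existing cache is still valid.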

Written by Cursor Bugbot for commit 17539f9.

@Jackmin801 Jackmin801 left a comment


Good catch! Can we modify the /update_weights and /reload_weights to do the reset instead of having a new path?

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@hallerite hallerite changed the title [WIP] fix kl mismatch by resetting prefix cache fix kl mismatch by resetting prefix cache Jan 24, 2026


@router.post("/load_lora_adapter")
async def load_lora_adapter(request: Request):
Member

Can we use a BaseModel dataclass definition so the Swagger docs are correct? It's useful for debugging sometimes.
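The suggestion above refers to FastAPI's pattern of typing the request body with a pydantic `BaseModel` instead of reading the raw `Request`, which lets the generated Swagger/OpenAPI docs show the expected schema. A minimal sketch, with illustrative field names rather than the PR's actual schema:

```python
# Hypothetical typed request body for /load_lora_adapter. Declaring the
# pydantic model as the handler parameter makes FastAPI validate the body
# and render its schema in the Swagger docs. Field names are illustrative.
from pydantic import BaseModel


class LoadLoraAdapterRequest(BaseModel):
    lora_name: str
    lora_path: str


# In the server the handler signature would then become, roughly:
#
# @router.post("/load_lora_adapter")
# async def load_lora_adapter(request: LoadLoraAdapterRequest):
#     ...
```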

@Jackmin801 Jackmin801 left a comment


nice! lgtm

@hallerite hallerite merged commit e67022b into main Jan 24, 2026
8 checks passed
@hallerite hallerite deleted the hallerite/reset_prefix_cache branch January 24, 2026 02:51
samsja pushed a commit that referenced this pull request Jan 24, 2026
3 participants