
fix kl mismatch by resetting prefix cache #1650

Merged

hallerite merged 4 commits into main from hallerite/reset_prefix_cache
Jan 24, 2026
Conversation


@hallerite hallerite commented Jan 24, 2026

After bumping verifiers in #1572, the KL mismatch increased: Verifiers stopped returning prompt logprobs in PrimeIntellect-ai/verifiers#666, so vLLM falls back to prefix caching in that case, and the prefix cache must be reset after weight updates for correctness.


Note

Ensures KV cache invalidation when model weights/adapters change to prevent stale-prefix usage.

  • Resets prefix cache after update_weights and reload_weights RPCs in server.py
  • Adds POST /load_lora_adapter server endpoint wrapping vLLM’s loader; on success resets prefix cache and returns {status: ok}; propagates error responses
  • Client now calls /load_lora_adapter (was /v1/load_lora_adapter); updated docstrings/comments
  • Unit tests updated to reflect new endpoint and retry behavior
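The invalidation rule described in the bullets above can be sketched as follows. This is a minimal stand-alone illustration, not the PR's actual code: `FakeEngine`, the handler names, and the RPC plumbing are all hypothetical stand-ins for the real server and vLLM engine.

```python
# Sketch of the core rule: any RPC that changes weights or adapters must also
# reset the prefix cache, otherwise KV blocks cached under the old weights
# would be reused. All names here are hypothetical stand-ins.

class FakeEngine:
    """Stand-in for an inference engine exposing a prefix-cache reset."""

    def __init__(self):
        self.cache_resets = 0

    def update_weights(self, checkpoint_path):
        pass  # load new weights here

    def load_lora_adapter(self, name, path):
        pass  # attach the adapter here

    def reset_prefix_cache(self):
        self.cache_resets += 1  # drop all cached prefix KV blocks


def handle_update_weights(engine, checkpoint_path):
    engine.update_weights(checkpoint_path)
    engine.reset_prefix_cache()  # the fix: invalidate stale prefixes
    return {"status": "ok"}


def handle_load_lora_adapter(engine, name, path):
    try:
        engine.load_lora_adapter(name, path)
    except Exception as exc:
        # propagate the error response instead of resetting the cache
        return {"status": "error", "message": str(exc)}
    engine.reset_prefix_cache()  # reset only after a successful load
    return {"status": "ok"}
```

The point of resetting only on success is that a failed adapter load leaves the weights unchanged, so the existing cache is still valid.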

Written by Cursor Bugbot for commit 17539f9.

@Jackmin801 Jackmin801 left a comment


Good catch! Can we modify the /update_weights and /reload_weights to do the reset instead of having a new path?

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@hallerite hallerite changed the title [WIP] fix kl mismatch by resetting prefix cache fix kl mismatch by resetting prefix cache Jan 24, 2026


@router.post("/load_lora_adapter")
async def load_lora_adapter(request: Request):
Member

Can we use a BaseModel dataclass definition so the Swagger docs are correct? It's useful for debugging sometimes.
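The suggestion above refers to FastAPI's pattern of typing the request body with a pydantic `BaseModel` instead of reading the raw `Request`, which lets the generated Swagger/OpenAPI docs show the expected schema. A minimal sketch, with illustrative field names rather than the PR's actual schema:

```python
# Hypothetical typed request body for /load_lora_adapter. Declaring the
# pydantic model as the handler parameter makes FastAPI validate the body
# and render its schema in the Swagger docs. Field names are illustrative.
from pydantic import BaseModel


class LoadLoraAdapterRequest(BaseModel):
    lora_name: str
    lora_path: str


# In the server the handler signature would then become, roughly:
#
# @router.post("/load_lora_adapter")
# async def load_lora_adapter(request: LoadLoraAdapterRequest):
#     ...
```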

@Jackmin801 Jackmin801 left a comment


nice! lgtm

@hallerite hallerite merged commit e67022b into main Jan 24, 2026
8 checks passed
@hallerite hallerite deleted the hallerite/reset_prefix_cache branch January 24, 2026 02:51
samsja pushed a commit that referenced this pull request Jan 24, 2026
3 participants