Integrate token-in route with vLLM v0.12.0 #1444
Merged
mikasenghaas merged 47 commits into upgrade-vllm on Dec 17, 2025
Conversation
```python
if self.enable_lora:
    self.api_server_count = 1  # LoRA requires only one API server
return self
```
Bug: LoRA forces incompatible API server count
auto_setup_api_server_count first enforces api_server_count >= parallel.dp but then unconditionally sets api_server_count = 1 when enable_lora is true. This can produce an inconsistent configuration (e.g., parallel.dp > 1 with only one API server), which likely breaks the intended DP setup and contradicts existing LoRA-in-DP support implied elsewhere.
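The ordering problem described above can be sketched as a config validator. This is a hypothetical reconstruction, not the actual prime-rl code: the class and field names (`InferenceConfig`, `parallel.dp`, `api_server_count`, `enable_lora`) are taken from the review comment, and raising instead of silently clamping is one possible fix.

```python
from dataclasses import dataclass, field


@dataclass
class ParallelConfig:
    dp: int = 1  # data-parallel degree


@dataclass
class InferenceConfig:
    parallel: ParallelConfig = field(default_factory=ParallelConfig)
    api_server_count: int = 1
    enable_lora: bool = False

    def __post_init__(self):
        self.auto_setup_api_server_count()

    def auto_setup_api_server_count(self):
        # Step 1: scale API servers with data parallelism.
        if self.api_server_count < self.parallel.dp:
            self.api_server_count = self.parallel.dp
        # Step 2: the reviewed code unconditionally clamps to 1 for LoRA,
        # silently undoing step 1. Raising instead surfaces the conflict.
        if self.enable_lora:
            if self.parallel.dp > 1:
                raise ValueError(
                    "enable_lora forces api_server_count=1, which is "
                    f"inconsistent with parallel.dp={self.parallel.dp}"
                )
            self.api_server_count = 1
        return self
```

With this variant, `InferenceConfig(parallel=ParallelConfig(dp=2), enable_lora=True)` fails loudly instead of producing a DP setup served by a single API server.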
Member
Author
This is forced by vLLM. But admittedly we should check; maybe it's no longer an issue with v0.12.
samsja pushed a commit that referenced this pull request on Dec 18, 2025
* dont use enum setter for logprobs mode
* fix: stale imports
* update to torch 2.9
* init_app_state doesnt take vllm config anymore somehow
* use runnable because of CUDAGraphWrapper
* vllm now uses default seed 0
* fix import
* moe venv
* use mjun flash attn for torch 2.9 and up vllm version
* Revert "moe venv" (reverts commit 8934ceb)
* remove some todos
* remove unused import
* Apply suggestions from code review (Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>)
* set very high max cpu loras to patch around areal lora hack
* make flash attn optional and put uv sync extras everywhere
* Integrate token-in route with vLLM v0.12.0 (#1444)
* Move LoRA out of experimental section (#1440)
* Add Bug Bot instructions for changelog enforcement (#1441) (Co-authored-by: Cursor Agent <cursoragent@cursor.com>)
* duplicate chat completions endpoint into /generate
* serve chat with token-in functionality
* use field to avoid misleading warning
* nicer error msg
* lock feature branch
* make use tokens prompt configurable
* use setter and print info
* bump
* include inference
* do not print warning log (logs all the time)
* bump
* bump + bring back warning log
* bump vf
* bump vf
* use dp=6 in wordle example
* no deepcopy and no warning
* do not tokenize on the server
* add field names so that tokens is cached and no warning of unrecognized field is shown
* bump vf
* auto install
* bump vf
* bump vf + set vllm tokenize method
* skip applying chat template
* Revert "skip applying chat template" (reverts commit 43c6a2b)
* Revert "do not tokenize on the server" (reverts commit 9182191)
* bring back log
* use route /v1/chat/completions/tokens
* fix log
* bump vf and make everything configurable
* bump and more informative log
* bump and make non-exact tokenization default
* use token prompts by default
* remove retokenization issue from docs
* rename class
* bump vf
* fix auto asc setup for lora
* bump vf
* bump vf
* bump vf
* bring back setter
* bump vf
* bump vf to latest prime-rl
* make custom routes v0.12.0 compatible
* monkey patch api server worker proc again to enable multi api server mode

Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Mika Senghaas <mail@mikasenghaas.de>
Co-authored-by: will brown <williambrown97@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Contains recent changes on main, the token-in PR (#1422), and the migration to vLLM v0.12.0 for this endpoint, plus some general cleanup that should make future maintenance easier.
Note
Adds a token-in chat completions endpoint to the vLLM server and promotes `model.experimental.lora` to `model.lora`, updating code, configs, and validations.

- New `/v1/chat/completions/tokens` endpoint with an `OpenAIServingChatWithTokens` handler and wiring (router, validation, streaming, error handling).
- `init_app_state` now registers the token-in handler; worker procs are delegated via vLLM's `run_api_server_worker_proc`.
- `InferenceConfig`: auto-set `api_server_count` to `dp`, and force `1` when `enable_lora=true`.
- With `trajectory_strategy="interleaved"`, enable token prompts in environments to avoid retokenization discrepancies.
- Rename `model.experimental.lora` to `model.lora`; remove `ExperimentalConfig`; update all usages (validators, model setup, ckpt/weight broadcast paths) in the RL and SFT trainers and `rl.py`.
- Configs updated to `[trainer.model.lora]` and related fields.
- Bump the `verifiers` revision; minor example/config tweaks (env IDs, W&B names, parallel settings).
- Docs updated for the `model.lora` move out of experimental.

Written by Cursor Bugbot for commit 1e512cc. This will update automatically on new commits.
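The retokenization discrepancy that token prompts avoid can be shown with a toy example. Nothing below is from the PR: the vocabulary and the `encode`/`decode` functions are made up, but they illustrate why decoding sampled tokens to text and re-encoding on the server can produce different token ids than the ones originally sampled.

```python
# Toy tokenizer with one merged token; greedy longest-match encoding means
# decode-then-re-encode need not round-trip to the same token ids.
VOCAB = {"ab": 0, "a": 1, "b": 2}
INV = {v: k for k, v in VOCAB.items()}


def encode(text):
    # Greedy longest-match: prefer the 2-char token "ab" when it fits.
    ids, i = [], 0
    while i < len(text):
        if text[i:i + 2] in VOCAB:
            ids.append(VOCAB[text[i:i + 2]])
            i += 2
        else:
            ids.append(VOCAB[text[i]])
            i += 1
    return ids


def decode(ids):
    return "".join(INV[i] for i in ids)


# Suppose the sampler emitted "a" then "b" as two separate tokens...
sampled = [VOCAB["a"], VOCAB["b"]]   # [1, 2]
# ...but re-encoding the decoded text merges them into the single token "ab".
reencoded = encode(decode(sampled))  # [0]
assert sampled != reencoded
# Sending token ids directly (token-in) sidesteps this mismatch entirely.
```

This is why the PR enables token prompts by default: the trainer's sampled ids reach the server unchanged instead of going through a lossy text round-trip.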