Upgrade vllm by Jackmin801 · Pull Request #1427 · PrimeIntellect-ai/prime-rl

Jackmin801 · 2025-12-13T08:07:04Z

Note

Upgrades vLLM to 0.12 with server/router migration and config updates, bumps PyTorch to 2.9, adds optional flash-attn extra, and switches CI/Docker/install to uv sync --all-extras.

Inference/server:
- Migrate to vLLM 0.12.0 APIs: use built-in router, engine_client, and run_api_server_worker_proc; register custom endpoints (/update_weights, /reload_weights, /init_broadcaster).
- Update Chat Completions with tokens flow to new tokenizer access, logits processors validation, prompt handling, and DP-rank propagation; refine logging/streaming paths; keep LoRA in-place monkey-patch.
- Filesystem worker now accesses model as model_runner.model.runnable.
Config:
- Add max_cpu_loras (default 100) and map to vLLM; change seed to required int default 0; minor field mapping updates.
Build/CI/docs:
- Use uv sync --all-extras in CPU/GPU/nightly workflows, Dockerfile, install script, and README (including FlashAttention note).
Dependencies:
- Bump: vllm→0.12.0, torch→2.9.0 (and torchaudio/torchvision), fastapi/starlette, pydantic/pydantic-core, triton, plus related packages.
- Introduce optional flash-attn extra (prebuilt wheel) and lockfile updates; remove direct flash-attn from core deps.

^{Written by Cursor Bugbot for commit fc84210. This will update automatically on new commits. Configure here.}

This reverts commit 8934ceb.

src/prime_rl/inference/config.py

src/prime_rl/inference/vllm/server.py

src/prime_rl/inference/vllm/worker/filesystem.py

pyproject.toml

src/prime_rl/inference/config.py

Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>

src/prime_rl/inference/vllm/server.py

Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>

pyproject.toml

scripts/install.sh

* Move LoRA out of experimental section (#1440) * Add Bug Bot instructions for changelog enforcement (#1441) Co-authored-by: Cursor Agent <cursoragent@cursor.com> * duplicate chat completinos endpoint into /generate * serve chat with token in functionality * use field to avoid misleading warning * nicer error msg * lock feature branch * make use tokens prompt configurable * use setter and print info * bump * include inference * do not print warning log (logs all the time) * bump * bump + bring back warning log * bump vf * bump vf * use dp=6 in wordle example * no deepcopy and no warning * do not tokenize on the server * add field names so that tokens is cached and no warning of unrecognized field is shown * bump vf * auto install * bump vf * bump vf + set vllm tokenize method * skip applying chat template * Revert "skip applying chat template" This reverts commit 43c6a2b. * Revert "do not tokenize on the server" This reverts commit 9182191. * bring back log * use route /v1/chat/completions/tokens * fix log * bump vf and make everything configurable * bump and more informative log * bump and make non-exact tokenization default * use token prompts by default * remove retokenization issue from docs * rename class * bump vf * fix auto asc setup for lora * bump vf * bump vf * bump vf * bring back setter * bump vf * bump vf to latest prime-rl * make custom routes v0.12.0 compatible * monkey patch api server worker proc again to enable multi api server mode --------- Co-authored-by: will brown <williambrown97@gmail.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com>

cursor · 2025-12-17T22:50:23Z

src/prime_rl/inference/config.py

        ),
-    ] = None
+    ] = 0



Bug: Config changes missing changelog entry (Bugbot Rules)

InferenceConfig adds max_cpu_loras and changes seed defaults/typing, which modifies configuration usage patterns, but CHANGELOG.md has no entry for these updates. This violates the PR rule requiring changelog updates for config field additions/behavior changes.

eh sure bro

Jackmin801

yee looks good. if pass tests shud be fine to merge

samsja

lgtm

This reverts commit 72c5f5f.

Jackmin801 added 10 commits December 13, 2025 06:09

dont use enum setter for logprobs mode

c9f7c5a

fix: stale imports

114c158

update to torch 2.9

87d8f18

init_app_state doesnt take vllm config anymore somehow

2d5c0ee

use runnable because of CUDAGraphWrapper

50b4717

vllm now uses default seed 0

02c1d2d

fix import

7c51c89

moe venv

8934ceb

use mjun flash attn for torch 2.9 and up vllm version

ce9fe3a

Revert "moe venv"

2e4c16f

This reverts commit 8934ceb.

Jackmin801 force-pushed the upgrade-vllm branch from a93faa5 to 2e4c16f Compare December 14, 2025 09:24

remove some todos

c1322b3

Jackmin801 marked this pull request as ready for review December 14, 2025 09:27

remove unused import

92d2895

cursor bot reviewed Dec 14, 2025

View reviewed changes

src/prime_rl/inference/config.py Show resolved Hide resolved

src/prime_rl/inference/vllm/server.py Outdated Show resolved Hide resolved

src/prime_rl/inference/vllm/worker/filesystem.py Show resolved Hide resolved

pyproject.toml Outdated Show resolved Hide resolved

Jackmin801 commented Dec 14, 2025

View reviewed changes

src/prime_rl/inference/config.py Outdated Show resolved Hide resolved

Apply suggestions from code review

cd3c297

Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>

cursor bot reviewed Dec 14, 2025

View reviewed changes

src/prime_rl/inference/vllm/server.py Outdated Show resolved Hide resolved

Jackmin801 and others added 2 commits December 16, 2025 19:47

Merge branch 'main' into upgrade-vllm

219b5a6

Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>

set very high max cpu loras to patch around areal lora hack

fe3b5c5

samsja reviewed Dec 17, 2025

View reviewed changes

pyproject.toml Outdated Show resolved Hide resolved

make flash attn optional and put uv sync extras everywhere

d99258a

cursor bot reviewed Dec 17, 2025

View reviewed changes

scripts/install.sh Show resolved Hide resolved

mikasenghaas and others added 2 commits December 17, 2025 14:36

Merge branch 'main' into upgrade-vllm

fc84210

mikasenghaas requested a review from samsja December 17, 2025 22:42

cursor bot reviewed Dec 17, 2025

View reviewed changes

Jackmin801 commented Dec 17, 2025

View reviewed changes

samsja approved these changes Dec 18, 2025

View reviewed changes

samsja merged commit 72c5f5f into main Dec 18, 2025
6 checks passed

mikasenghaas added a commit that referenced this pull request Dec 19, 2025

Revert "Upgrade vllm (#1427)"

09a3f3c

This reverts commit 72c5f5f.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Upgrade vllm#1427

Upgrade vllm#1427
samsja merged 18 commits intomainfrom
upgrade-vllm

Jackmin801 commented Dec 13, 2025 •

edited by cursor bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot Dec 17, 2025

Uh oh!

Jackmin801 Dec 17, 2025

Uh oh!

Jackmin801 left a comment

Uh oh!

samsja left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Jackmin801 commented Dec 13, 2025 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot Dec 17, 2025

Choose a reason for hiding this comment

Bug: Config changes missing changelog entry (Bugbot Rules)

Uh oh!

Jackmin801 Dec 17, 2025

Choose a reason for hiding this comment

Uh oh!

Jackmin801 left a comment

Choose a reason for hiding this comment

Uh oh!

samsja left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Jackmin801 commented Dec 13, 2025 •

edited by cursor bot

Loading