Upgrade vllm #1427

Merged: samsja merged 18 commits into main from upgrade-vllm on Dec 18, 2025

Conversation

Member

@Jackmin801 Jackmin801 commented Dec 13, 2025

Note

Upgrades vLLM to 0.12 with server/router migration and config updates, bumps PyTorch to 2.9, adds optional flash-attn extra, and switches CI/Docker/install to uv sync --all-extras.

  • Inference/server:
    • Migrate to vLLM 0.12.0 APIs: use built-in router, engine_client, and run_api_server_worker_proc; register custom endpoints (/update_weights, /reload_weights, /init_broadcaster).
    • Update Chat Completions with tokens flow to new tokenizer access, logits processors validation, prompt handling, and DP-rank propagation; refine logging/streaming paths; keep LoRA in-place monkey-patch.
    • Filesystem worker now accesses model as model_runner.model.runnable.
  • Config:
    • Add max_cpu_loras (default 100) and map to vLLM; change seed to required int default 0; minor field mapping updates.
  • Build/CI/docs:
    • Use uv sync --all-extras in CPU/GPU/nightly workflows, Dockerfile, install script, and README (including FlashAttention note).
  • Dependencies:
  • Bump: vllm to 0.12.0, torch to 2.9.0 (and torchaudio/torchvision), fastapi/starlette, pydantic/pydantic-core, triton, plus related packages.
    • Introduce optional flash-attn extra (prebuilt wheel) and lockfile updates; remove direct flash-attn from core deps.

Written by Cursor Bugbot for commit fc84210. This will update automatically on new commits. Configure here.

@Jackmin801 Jackmin801 marked this pull request as ready for review December 14, 2025 09:27
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Jackmin801 and others added 2 commits December 16, 2025 19:47
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
mikasenghaas and others added 2 commits December 17, 2025 14:36
* Move LoRA out of experimental section (#1440)

* Add Bug Bot instructions for changelog enforcement (#1441)

Co-authored-by: Cursor Agent <cursoragent@cursor.com>

* duplicate chat completions endpoint into /generate

* serve chat with token in functionality

* use field to avoid misleading warning

* nicer error msg

* lock feature branch

* make use tokens prompt configurable

* use setter and print info

* bump

* include inference

* do not print warning log (logs all the time)

* bump

* bump + bring back warning log

* bump vf

* bump vf

* use dp=6 in wordle example

* no deepcopy and no warning

* do not tokenize on the server

* add field names so that tokens is cached and no warning of unrecognized field is shown

* bump vf

* auto install

* bump vf

* bump vf + set vllm tokenize method

* skip applying chat template

* Revert "skip applying chat template"

This reverts commit 43c6a2b.

* Revert "do not tokenize on the server"

This reverts commit 9182191.

* bring back log

* use route /v1/chat/completions/tokens

* fix log

* bump vf and make everything configurable

* bump and more informative log

* bump and make non-exact tokenization default

* use token prompts by default

* remove retokenization issue from docs

* rename class

* bump vf

* fix auto asc setup for lora

* bump vf

* bump vf

* bump vf

* bring back setter

* bump vf

* bump vf to latest prime-rl

* make custom routes v0.12.0 compatible

* monkey patch api server worker proc again to enable multi api server mode

---------

Co-authored-by: will brown <williambrown97@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
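The final commit in the list above mentions monkey-patching the API server worker proc again to enable multi API server mode. A generic, hedged sketch of that monkey-patching pattern (the module and function names here are illustrative stand-ins, not vLLM's actual symbols):

```python
# Hedged sketch of the monkey-patch pattern: swap a module-level function
# for a wrapper so extra setup runs inside each API server worker process.
import types

# Stand-in for the module being patched (illustrative, not vLLM's module).
server = types.SimpleNamespace()

def run_api_server_worker_proc(*args, **kwargs):
    return "original"

server.run_api_server_worker_proc = run_api_server_worker_proc

# Capture the original before replacing it, so the wrapper can delegate.
_original = server.run_api_server_worker_proc

def patched(*args, **kwargs):
    # Custom setup (e.g. registering the extra endpoints) would go here.
    return "patched:" + _original(*args, **kwargs)

server.run_api_server_worker_proc = patched
```

Capturing the original callable before reassignment is the key detail; every worker process that imports the patched module then runs the wrapper transparently.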
@mikasenghaas mikasenghaas requested a review from samsja December 17, 2025 22:42
(Inline comment on the `seed` field, whose default changed from `= None` to `= 0`.)


Bug: Config changes missing changelog entry (Bugbot Rules)

InferenceConfig adds max_cpu_loras and changes seed defaults/typing, which modifies configuration usage patterns, but CHANGELOG.md has no entry for these updates. This violates the PR rule requiring changelog updates for config field additions/behavior changes.
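For concreteness, the two InferenceConfig changes this comment flags look roughly like the following. This is a hedged sketch using a plain dataclass (the repo's actual models and field options are assumptions here): a new max_cpu_loras field defaulting to 100, and seed changing from an optional value to a required int defaulting to 0.

```python
# Hedged sketch of the config fields described in the PR summary;
# not the repository's actual model definitions.
from dataclasses import dataclass

@dataclass
class InferenceConfig:
    max_cpu_loras: int = 100  # new field, mapped to vLLM's LoRA CPU cache size
    seed: int = 0             # previously Optional[int] = None
```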


Jackmin801 (Member Author) replied:

eh sure bro

@Jackmin801 Jackmin801 (Member Author) left a comment:

yee looks good. if pass tests shud be fine to merge

@samsja samsja (Member) left a comment:

lgtm

@samsja samsja merged commit 72c5f5f into main Dec 18, 2025
6 checks passed
mikasenghaas added a commit that referenced this pull request Dec 19, 2025