Merged
This reverts commit 8934ceb.
Force-pushed the branch from a93faa5 to 2e4c16f.
Jackmin801 commented on Dec 14, 2025
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
samsja reviewed on Dec 17, 2025
* Move LoRA out of experimental section (#1440)
* Add Bug Bot instructions for changelog enforcement (#1441)
  Co-authored-by: Cursor Agent <cursoragent@cursor.com>
* duplicate chat completions endpoint into /generate
* serve chat with token in functionality
* use field to avoid misleading warning
* nicer error msg
* lock feature branch
* make use tokens prompt configurable
* use setter and print info
* bump
* include inference
* do not print warning log (logs all the time)
* bump
* bump + bring back warning log
* bump vf
* bump vf
* use dp=6 in wordle example
* no deepcopy and no warning
* do not tokenize on the server
* add field names so that tokens is cached and no warning of unrecognized field is shown
* bump vf
* auto install
* bump vf
* bump vf + set vllm tokenize method
* skip applying chat template
* Revert "skip applying chat template" (reverts commit 43c6a2b)
* Revert "do not tokenize on the server" (reverts commit 9182191)
* bring back log
* use route /v1/chat/completions/tokens
* fix log
* bump vf and make everything configurable
* bump and more informative log
* bump and make non-exact tokenization default
* use token prompts by default
* remove retokenization issue from docs
* rename class
* bump vf
* fix auto asc setup for lora
* bump vf
* bump vf
* bump vf
* bring back setter
* bump vf
* bump vf to latest prime-rl
* make custom routes v0.12.0 compatible
* monkey patch api server worker proc again to enable multi api server mode

Co-authored-by: will brown <williambrown97@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
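Several commits above ("do not tokenize on the server", "use token prompts by default", "use route /v1/chat/completions/tokens") describe moving tokenization to the client and shipping token IDs to the server, so the server never re-applies a chat template and drifts from what the client tokenized. A minimal sketch of that idea, assuming a hypothetical payload shape and a toy whitespace tokenizer (the real project would use the model's tokenizer and the actual prime-rl/verifiers request schema):

```python
# Sketch of "token prompts": tokenize once on the client and send ids,
# so the server skips re-tokenization. All names here are illustrative.

def toy_tokenize(text: str) -> list[int]:
    # Stand-in for a real tokenizer: assign each new whitespace-split
    # word the next integer id, reusing ids for repeated words.
    vocab: dict[str, int] = {}
    ids = []
    for word in text.split():
        ids.append(vocab.setdefault(word, len(vocab)))
    return ids

def build_token_prompt_request(messages: list[dict], use_token_prompts: bool = True) -> dict:
    """Build a hypothetical /v1/chat/completions/tokens-style payload."""
    flat = " ".join(m["content"] for m in messages)
    if use_token_prompts:
        # Exact tokenization: the server consumes these ids directly.
        return {"prompt_token_ids": toy_tokenize(flat)}
    # Fallback path: the server applies the chat template and tokenizes.
    return {"messages": messages}
```

The `use_token_prompts` flag mirrors the "make use tokens prompt configurable" commit: the exact path avoids retokenization mismatches, while the fallback keeps compatibility with plain chat-completions clients.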
Bug: Config changes missing changelog entry (Bugbot Rules)
InferenceConfig adds max_cpu_loras and changes seed defaults/typing, which modifies configuration usage patterns, but CHANGELOG.md has no entry for these updates. This violates the PR rule requiring changelog updates for config field additions/behavior changes.
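The two config changes the Bugbot flags can be sketched as follows, using stdlib dataclasses (the real InferenceConfig likely uses pydantic, and the field names beyond the two mentioned are not shown here):

```python
# Minimal sketch, assuming only what the Bugbot comment states: a new
# max_cpu_loras field (default 100) and seed tightened to a required int.
from dataclasses import dataclass

@dataclass
class InferenceConfig:
    # New field: how many LoRA adapters may be cached on CPU; mapped to vLLM.
    max_cpu_loras: int = 100
    # Previously something like `seed: int | None = None`; now a plain int
    # defaulting to 0, matching the `] = None` -> `] = 0` diff hunk.
    seed: int = 0
```

Either change would alter how downstream configs validate, which is why the rule asks for a changelog entry.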
Jackmin801 (Member, Author) commented on Dec 17, 2025
Yee, looks good. If it passes tests it should be fine to merge.
samsja approved these changes on Dec 18, 2025
mikasenghaas added a commit that referenced this pull request on Dec 19, 2025
This reverts commit 72c5f5f.
Note
Upgrades vLLM to 0.12 with server/router migration and config updates, bumps PyTorch to 2.9, adds an optional flash-attn extra, and switches CI/Docker/install to uv sync --all-extras.
* 0.12.0 APIs: use the built-in router, engine_client, and run_api_server_worker_proc; register custom endpoints (/update_weights, /reload_weights, /init_broadcaster); access the model via model_runner.model.runnable.
* Config: add max_cpu_loras (default 100) and map it to vLLM; change seed to a required int with default 0; minor field mapping updates.
* Install/CI: uv sync --all-extras in CPU/GPU/nightly workflows, the Dockerfile, the install script, and the README (including a FlashAttention note).
* Dependencies: vllm to 0.12.0, torch to 2.9.0 (plus torchaudio/torchvision), fastapi/starlette, pydantic/pydantic-core, triton, and related packages.
* Packaging: add a flash-attn extra (prebuilt wheel) with lockfile updates; remove the direct flash-attn dependency from core deps.

Written by Cursor Bugbot for commit fc84210. This will update automatically on new commits.
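The summary's last bullet on the server side, and the commit "monkey patch api server worker proc again to enable multi api server mode", describe replacing a library entrypoint so custom setup runs in every API server worker. A generic sketch of that patching pattern, using a stand-in namespace instead of the real vLLM module (the actual target would be vLLM's run_api_server_worker_proc, whose signature is not reproduced here):

```python
# Hedged sketch of monkey-patching a worker entrypoint. `worker_module` is a
# stand-in for the third-party module; in the PR the patched target is vLLM's
# API server worker proc, and the "custom setup" would register extra routes
# such as /update_weights, /reload_weights, and /init_broadcaster.
import types

# Stand-in for the imported third-party module that owns the entrypoint.
worker_module = types.SimpleNamespace()

def original_worker_proc(*args, **kwargs):
    # Stand-in for the library's real worker loop.
    return "served"

worker_module.run_api_server_worker_proc = original_worker_proc

def patched_worker_proc(*args, **kwargs):
    # Custom per-worker setup would run here (e.g. endpoint registration),
    # then defer to the original implementation.
    result = original_worker_proc(*args, **kwargs)
    return f"patched:{result}"

# The patch itself: anything that later looks up the attribute on the
# module gets the wrapped version, so every spawned worker runs the setup.
worker_module.run_api_server_worker_proc = patched_worker_proc
```

The fragility the commit history hints at ("again", "make custom routes v0.12.0 compatible") is inherent to this pattern: the patch must track the library's internal symbol across upgrades.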