Pull requests: vllm-project/vllm
[V0 deprecation] Deprecate V0 Neuron backend
ci/build
documentation
Improvements or additions to documentation
ready
ONLY add when PR is ready to merge/full CI is needed
speculative-decoding
#21159
opened Jul 18, 2025 by
WoosukKwon
[WIP] Support relaxed acceptance for thinking tokens in speculative decoding
speculative-decoding
#21157
opened Jul 18, 2025 by
Ximingwang-09
Draft
4 tasks
Add an optimization doc on TPU
documentation
Improvements or additions to documentation
tpu
Related to Google TPUs
#21155
opened Jul 18, 2025 by
bvrockwell
Draft
4 tasks
[V0 Deprecation] Remove V0 Spec Decode workers
ci/build
new-model
Requests to new models
ready
ONLY add when PR is ready to merge/full CI is needed
rocm
Related to AMD ROCm
speculative-decoding
#21152
opened Jul 18, 2025 by
WoosukKwon
Implement structural_tag and json_schema for non-chat completion
frontend
#21150
opened Jul 17, 2025 by
pathorn
[Bugfix] Capture ray import error as string to avoid persistent references
#21149
opened Jul 17, 2025 by
tjohnson31415
[Core] disable gc during cuda graph capture
codex
startup-ux
v1
#21146
opened Jul 17, 2025 by
mgoin
Enable multi-image support benchmarking for serving
performance
Performance-related issues
#21145
opened Jul 17, 2025 by
leopck
[Misc] allow pulling vllm in Ray runtime environment
#21143
opened Jul 17, 2025 by
eric-higgins-ai
[Core] Add request preprocess counter in v1
v1
#21139
opened Jul 17, 2025 by
vladmihailescu
3 of 4 tasks
[Attention] Optimize FlashInfer MetadataBuilder Build call
rocm
Related to AMD ROCm
speculative-decoding
v1
#21137
opened Jul 17, 2025 by
LucasWilkinson
3 of 4 tasks
[Misc] change default request logging behavior to off
codex
#21135
opened Jul 17, 2025 by
simon-mo
Convert tests to ruff-format
deepseek
Related to DeepSeek models
llama
Related to Llama models
multi-modality
Related to multi-modality (#4194)
performance
Performance-related issues
qwen
Related to Qwen models
rocm
Related to AMD ROCm
speculative-decoding
structured-output
tool-calling
v1
#21129
opened Jul 17, 2025 by
hmellor
[Core] Set pooling params based on task and model
frontend
ready
ONLY add when PR is ready to merge/full CI is needed
tpu
Related to Google TPUs
v1
#21128
opened Jul 17, 2025 by
DarkLight1337
2 of 4 tasks
docker: docker-aware precompiled wheel support
ci/build
#21127
opened Jul 17, 2025 by
dougbtv
4 tasks done
[Refactor] Remove Unused Naive Moe Kernels
performance
Performance-related issues
#21125
opened Jul 17, 2025 by
yewentao256
[Bugfix] Allocate less memory in non-batched CUTLASS MoE
bug
Something isn't working
ready
ONLY add when PR is ready to merge/full CI is needed
#21121
opened Jul 17, 2025 by
ElizaWszola
security policy: take 1
documentation
Improvements or additions to documentation
#21119
opened Jul 17, 2025 by
sidhpurwala-huzaifa
feat: add fused MLA QKV + strided layernorm
deepseek
Related to DeepSeek models
#21116
opened Jul 17, 2025 by
mickaelseznec
3 of 4 tasks