Pull requests: vllm-project/vllm

Pull requests list

[V0 deprecation] Deprecate V0 Neuron backend
Labels: ci/build, documentation, ready, speculative-decoding
#21159 opened Jul 18, 2025 by WoosukKwon
Add an optimization doc on TPU
Labels: documentation, tpu
#21155 opened Jul 18, 2025 by bvrockwell Draft
4 tasks
[V0 Deprecation] Remove V0 Spec Decode workers
Labels: ci/build, new-model, ready, rocm, speculative-decoding
#21152 opened Jul 18, 2025 by WoosukKwon
[Misc] Make MM embedding merge interface explicit in model runner
Labels: ready, tpu, v1
#21147 opened Jul 17, 2025 by ywang96
4 tasks
Enable multi-image support benchmarking for serving
Labels: performance
#21145 opened Jul 17, 2025 by leopck
[Core] Add request preprocess counter in v1
Labels: v1
#21139 opened Jul 17, 2025 by vladmihailescu
3 of 4 tasks
[Attention] Optimize FlashInfer MetadataBuilder Build call
Labels: rocm, speculative-decoding, v1
#21137 opened Jul 17, 2025 by LucasWilkinson
3 of 4 tasks
[Perf] Using mul instead of div for int8 quant
#21136 opened Jul 17, 2025 by yewentao256
Convert tests to ruff-format
Labels: deepseek, llama, multi-modality, performance, qwen, rocm, speculative-decoding, structured-output, tool-calling, v1
#21129 opened Jul 17, 2025 by hmellor
[Core] Set pooling params based on task and model
Labels: frontend, ready, tpu, v1
#21128 opened Jul 17, 2025 by DarkLight1337
2 of 4 tasks
docker: docker-aware precompiled wheel support
Labels: ci/build
#21127 opened Jul 17, 2025 by dougbtv
4 tasks done
[WIP] Use FlashInfer RoPE
#21126 opened Jul 17, 2025 by mgoin
4 tasks
[Refactor] Remove Unused Naive Moe Kernels
Labels: performance
#21125 opened Jul 17, 2025 by yewentao256
[UPDATED] - Large Block_size solution
Labels: v1
#21123 opened Jul 17, 2025 by nadathurv
[Bugfix] Allocate less memory in non-batched CUTLASS MoE
Labels: bug, ready
#21121 opened Jul 17, 2025 by ElizaWszola
security policy: take 1
Labels: documentation
#21119 opened Jul 17, 2025 by sidhpurwala-huzaifa
feat: add fused MLA QKV + strided layernorm
Labels: deepseek
#21116 opened Jul 17, 2025 by mickaelseznec
3 of 4 tasks