
Pull requests: vllm-project/vllm

[misc] ignore marlin_moe_wna16 local gen codes (label: ready)
#16760 by DefTruth was merged Apr 17, 2025
[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel (label: ready)
#16693 by DefTruth was merged Apr 16, 2025
[Misc] fix local pytest broken
#16582 by DefTruth was closed Apr 15, 2025
[Misc] remove warning if triton>=3.2.0 (label: ready)
#16553 by DefTruth was merged Apr 14, 2025
Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (label: ready)
#16453 by DefTruth was merged Apr 11, 2025
[Kernel] support merge_attn_states CUDA kernel, 3x speedup (labels: ci/build, ready, v1)
#16173 by DefTruth was merged Apr 11, 2025 (9 tasks done)
[Kernel] Remove redundant Exp calculations (label: ready)
#16123 by DefTruth was merged Apr 15, 2025
[Bugfix] hotfix for gptq-marlin non-contiguous error
#15374 by DefTruth was closed Mar 25, 2025
[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 (label: ready)
#15322 by DefTruth was merged Mar 23, 2025
[W8A8] Add w8a8 block fp8 tuning script
#15126 by DefTruth was closed Mar 22, 2025
[Misc] Add w8a8 block fp8 tune script
#15118 by DefTruth was closed Mar 19, 2025
[MoE] Tune Fused MoE for R1 on NVIDIA_L20
#15117 by DefTruth was closed Mar 22, 2025
[Bugfix] fix torch.compiled cache hash error (label: ready)
#14953 by DefTruth was merged Mar 23, 2025
[Bugfix] fix triton mla + awq crash while prefix cache hit (labels: bug, ready)
#14946 by DefTruth was closed Mar 22, 2025
[Bugfix][V1] Fix compiled graph hash (labels: ci/build, documentation, needs-rebase, v1)
#14867 by DefTruth was closed Mar 17, 2025
[Bugfix][V1] Fix flashinfer sampling (label: v1)
#14815 by DefTruth was merged Mar 15, 2025
[Bugfix][W8A8] fixed cutlass block fp8 binding (labels: bug, ready)
#14796 by DefTruth was merged Mar 14, 2025
[VLM][Bugfix] enable internvl running with num_scheduler_steps > 1 (label: ready)
#8614 by DefTruth was merged Sep 25, 2024
[Doc] add env docs for flashinfer backend
#6437 by DefTruth was merged Jul 15, 2024
[Misc] remove chunk detected debug logs
#4571 by DefTruth was merged May 3, 2024