[Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine #15946


Merged

Conversation

jinzhen-lin
Contributor

@jinzhen-lin jinzhen-lin commented Apr 2, 2025

#14138 optimized the marlin kernel and introduced a new parameter use_atomic_add and a new env var VLLM_MARLIN_USE_ATOMIC_ADD. However, when using it with the V1 engine, we get an error like

File "/root/miniconda3/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 644, in dumps
    self.dump(obj)
TypeError: cannot pickle 'PyCapsule' object

To reproduce

VLLM_MARLIN_USE_ATOMIC_ADD=1 vllm serve Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4

I'm not sure of the actual reason, but I found that if I don't use m (input_size) as part of the condition indicating whether to use atomic add, the error goes away.

This PR fixes the problem by moving the m-dependent condition into the C++ code. This also improves kernel performance in some special cases, since marlin may launch several kernels with different m sizes in a single gptq_marlin_gemm.
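A minimal pure-Python sketch of the idea (all names, thresholds, and signatures here are hypothetical illustrations, not vLLM's actual code): before, the atomic-add decision read the dynamic dimension m in Python, where torch.compile traces it; after, Python only consults static shape information, and the per-m decision happens in the C++ launcher, which can decide separately for each sub-launch.

```python
# Hypothetical sketch of the fix, NOT vLLM's actual code or thresholds.

# Before: Python decides using the dynamic dim m, so Dynamo traces
# the comparison on m and the graph specializes on its value.
def use_atomic_add_python(m: int, n: int, max_par: int = 16) -> bool:
    return m <= max_par and n >= 4096  # m is batch-dependent (dynamic)

# After: Python only checks static shape info (n, k known at init);
# the m-dependent half of the condition moves into the C++ launcher.
def use_atomic_add_static(n: int, k: int) -> bool:
    return n >= 4096  # static-only condition; m handled in C++

def cpp_launcher_decides(m_sizes, allow_atomic_add: bool, max_par: int = 16):
    # The launcher may split one gemm into several sub-kernels with
    # different m sizes, so it can choose atomic-add per sub-launch.
    return [allow_atomic_add and m <= max_par for m in m_sizes]
```

Because the Python side no longer branches on m, the compiled graph stays shape-generic, and the C++ side can even make a better choice per sub-kernel than a single Python-level decision could.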

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

github-actions bot commented Apr 2, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

Member

@mgoin mgoin left a comment


Interesting... if we can do the logic in the C++, why not move it all there? I'm sure we have the information on n and k at that level. cc @bnellnm @ProExpertProg, who might know why this specific operation fails compilation

@ProExpertProg
Contributor

If I had to guess: looking at m in Python is traced by Dynamo, and because m is dynamic and used in a max expression, the compilation gets specialized on m, and we then potentially recompile whenever m changes.

I'll look at the repro to make sure there aren't other footguns here. And we should add this case to compilation tests.
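The specialization described above can be illustrated with a toy guard-and-recompile cache (a loose analogy to Dynamo guards, not actual Dynamo internals): when a branch depends on the concrete value of a dynamic dimension m, each new m that flips the condition forces another "compilation".

```python
# Toy analogy for Dynamo shape specialization; not real Dynamo behavior.
compiled_variants = {}

def run(m: int, threshold: int = 64) -> str:
    # Branching on the concrete value of m installs a guard on it;
    # each distinct guard outcome needs its own compiled variant.
    key = m <= threshold              # the guard a tracer would record
    if key not in compiled_variants:  # guard miss -> "recompile"
        compiled_variants[key] = f"kernel(atomic_add={key})"
    return compiled_variants[key]

run(8)    # first call: compiles the atomic_add=True variant
run(16)   # guard still holds: variant is reused, no recompile
run(128)  # guard fails: a second variant is compiled
assert len(compiled_variants) == 2
```

Keeping the m check out of the traced Python code avoids installing such a guard in the first place, which is exactly what moving the condition into C++ accomplishes.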

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 5, 2025
Member

@mgoin mgoin left a comment


Okay, feel free to add this as a test case, Luka; for now we can land.

@mgoin
Member

mgoin commented Apr 5, 2025

BTW @jinzhen-lin it would be great if you could join the developer slack to participate in discussions (https://slack.vllm.ai/). We've been refactoring fused moe recently.

Unrelated to this PR, but could you consider adding support for channelwise scales for moe_wna16? There are several places that assume that group_size != -1 (example 1, example 2) and it would be good to support channelwise quantization in the triton implementation.

@jinzhen-lin
Contributor Author

> BTW @jinzhen-lin it would be great if you could join the developer slack to participate in discussions (https://slack.vllm.ai/). We've been refactoring fused moe recently.
>
> Unrelated to this PR, but could you consider adding support for channelwise scales for moe_wna16? There are several places that assume that group_size != -1 (example 1, example 2) and it would be good to support channelwise quantization in the triton implementation.

I'll work on channelwise quantization when I have time. BTW, #14447 supports channelwise quantization and has better performance than the triton/cuda version.

@vllm-bot vllm-bot merged commit 2fa66ef into vllm-project:main Apr 6, 2025
78 of 81 checks passed
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…gine (vllm-project#15946)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
lengrongfu pushed a commit to lengrongfu/vllm that referenced this pull request Apr 7, 2025
…gine (vllm-project#15946)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
@ProExpertProg
Contributor

Btw, I wasn't able to reproduce this (I made sure I used a commit from before this one was merged), but ping me if similar issues occur.

nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025
…gine (vllm-project#15946)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025
…gine (vllm-project#15946)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Yang Wang <elainewy@meta.com>
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
…gine (vllm-project#15946)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
sfc-gh-mhidayetoglu added a commit to sfc-gh-mhidayetoglu/vllm that referenced this pull request Apr 29, 2025
* [Docs] Add Ollama meetup slides (vllm-project#15905)

Signed-off-by: simon-mo <simon.mo@hey.com>

* [Docs] Add Intel as Sponsor (vllm-project#15913)

Signed-off-by: simon-mo <simon.mo@hey.com>

* [Spec Decode] Fix input triton kernel for eagle (vllm-project#15909)

* [V1] Fix: make sure `k_index` is int64 for `apply_top_k_only` (vllm-project#15907)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>

* [Bugfix] Fix imports for MoE on CPU (vllm-project#15841)

Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>

* [V1][Minor] Enhance SpecDecoding Metrics Log in V1 (vllm-project#15902)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

* [Doc] Update rocm.inc.md (vllm-project#15917)

Signed-off-by: chun37 <chun.jb.37@gmail.com>

* [V1][Bugfix] Fix typo in MoE TPU checking (vllm-project#15927)

Signed-off-by: Roger Wang <ywang@roblox.com>

* [Benchmark]Fix error message (vllm-project#15866)

Signed-off-by: wangli <wangli858794774@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>

* [Misc] Replace print with logger (vllm-project#15923)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

* [CI/Build] Further clean up LoRA tests (vllm-project#15920)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* [Bugfix] Fix cache block size calculation for CPU MLA (vllm-project#15848)

Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>

* [Build/CI] Update lm-eval to 0.4.8 (vllm-project#15912)

Signed-off-by: Chris Thi <chris.c.thi@gmail.com>

* [Kernel] Add more dtype support for GGUF dequantization (vllm-project#15879)

Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>

* [core] Add tags parameter to wake_up() (vllm-project#15500)

Signed-off-by: Eric <erictang000@gmail.com>

* [V1] Fix json_object support with xgrammar (vllm-project#15488)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

* Add minimum version for `huggingface_hub` to enable Xet downloads (vllm-project#15873)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Bugfix][Benchmarks] Ensure `async_request_deepspeed_mii` uses the OpenAI choices key (vllm-project#15926)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>

* [CI] Remove duplicate entrypoints-test (vllm-project#15940)

Signed-off-by: Kay Yan <kay.yan@daocloud.io>

* [Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. (vllm-project#15938)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

* [Metrics] Hide deprecated metrics (vllm-project#15458)

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

* [Frontend] Implement Tool Calling with `tool_choice='required'` (vllm-project#13483)

Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: mgoin <michael@neuralmagic.com>

* [CPU][Bugfix] Using custom allreduce for CPU backend (vllm-project#15934)

Signed-off-by: jiang1.li <jiang1.li@intel.com>

* [Model] use AutoWeightsLoader in model load_weights (vllm-project#15770)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

* [Misc] V1 LoRA support CPU offload (vllm-project#15843)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* Restricted cmake to be less than version 4 as 4.x breaks the build of… (vllm-project#15859)

Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>

* [misc] instruct pytorch to use nvml-based cuda check (vllm-project#15951)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [V1] Support Mistral3 in V1 (vllm-project#15950)

Signed-off-by: mgoin <mgoin64@gmail.com>

* Fix `huggingface-cli[hf-xet]` -> `huggingface-cli[hf_xet]` (vllm-project#15969)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [V1][TPU] TPU-optimized top-p implementation (avoids scattering). (vllm-project#15736)

Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>

* [TPU] optimize the all-reduce performance (vllm-project#15903)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

* [V1][TPU] Do not compile sampling more than needed (vllm-project#15883)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [ROCM][KERNEL] Paged attention for V1 (vllm-project#15720)

Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>

* fix: better error message for get_config close vllm-project#13889 (vllm-project#15943)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [bugfix] add seed in torchrun_example.py (vllm-project#15980)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [ROCM][V0] PA kennel selection when no sliding window provided (vllm-project#15982)

Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>

* [Benchmark] Add AIMO Dataset to Benchmark (vllm-project#15955)

Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com>
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>

* [misc] improve error message for "Failed to infer device type" (vllm-project#15994)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process (vllm-project#15367)

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>

* [doc] update contribution link (vllm-project#15922)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* fix: tiny fix make format.sh excutable (vllm-project#16015)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [SupportsQuant] Bert, Blip, Blip2, Bloom (vllm-project#15573)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [SupportsQuant] Chameleon, Chatglm, Commandr (vllm-project#15952)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Neuron][kernel] Fuse kv cache into a single tensor (vllm-project#15911)

Signed-off-by: Liangfu Chen <liangfc@amazon.com>

* [Minor] Fused experts refactor (vllm-project#15914)

Signed-off-by: Bill Nell <bnell@redhat.com>

* [Misc][Performance] Advance tpu.txt to the most recent nightly torch … (vllm-project#16024)

* Re-enable the AMD Testing for the passing tests. (vllm-project#15586)

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

* [TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (vllm-project#15732)

Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>

* [TPU] Switch Test to Non-Sliding Window (vllm-project#15981)

Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>

* [Bugfix] Fix function names in test_block_fp8.py (vllm-project#16033)

Signed-off-by: Bill Nell <bnell@redhat.com>

* [ROCm] Tweak the benchmark script to run on ROCm (vllm-project#14252)

* [Misc] improve gguf check (vllm-project#15974)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [TPU][V1] Remove ragged attention kernel parameter hard coding (vllm-project#16041)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

* doc: add info for macos clang errors (vllm-project#16049)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [V1][Spec Decode] Avoid logging useless nan metrics (vllm-project#16023)

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

* [Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt (vllm-project#15939)

Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>

* [Hardware][Gaudi][BugFix] fix arguments of hpu fused moe (vllm-project#15945)

Signed-off-by: zhenwei <zhenweiliu@habana.ai>

* [Bugfix][kernels] Fix half2float conversion in gguf kernels (vllm-project#15995)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [Benchmark][Doc] Update throughput benchmark and README (vllm-project#15998)

Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>

* [CPU] Change default block_size for CPU backend (vllm-project#16002)

Signed-off-by: jiang1.li <jiang1.li@intel.com>

* [Distributed] [ROCM] Fix custom allreduce enable checks (vllm-project#16010)

Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>

* [ROCm][Bugfix] Use platform specific FP8 dtype (vllm-project#15717)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

* [ROCm][Bugfix] Bring back fallback to eager mode removed in vllm-project#14917, but for ROCm only (vllm-project#15413)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

* [Bugfix] Fix default behavior/fallback for pp in v1 (vllm-project#16057)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [CI] Reorganize .buildkite directory (vllm-project#16001)

Signed-off-by: kevin <kevin@anyscale.com>

* [V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue (vllm-project#15906)

Signed-off-by: Nick Hill <nhill@redhat.com>

* [V1] Scatter and gather placeholders in the model runner (vllm-project#15712)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>

* Revert "[V1] Scatter and gather placeholders in the model runner" (vllm-project#16075)

* [Kernel][Minor] Re-fuse triton moe weight application (vllm-project#16071)

Signed-off-by: Bill Nell <bnell@redhat.com>

* [Bugfix][TPU] Fix V1 TPU worker for sliding window (vllm-project#16059)

Signed-off-by: Michael Goin <mgoin64@gmail.com>

* [V1][Spec Decode] Update N-gram Proposer Interface (vllm-project#15750)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

* [Misc] Auto detect bitsandbytes pre-quantized models (vllm-project#16027)

Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>

* [CI] Fix benchmark script level (vllm-project#16089)

* fix: support clang17 for macos and fix the real libomp (vllm-project#16086)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [doc] fix 404 (vllm-project#16082)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Revert "doc: add info for macos clang errors (vllm-project#16049)" (vllm-project#16091)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* Fix some capitalisations in generated examples doc titles (vllm-project#16094)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Misc] format output for encoder_decoder.py (vllm-project#16095)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Misc] Remove redundant code (vllm-project#16098)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

* [Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine (vllm-project#15946)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

* [Model] use AutoWeightsLoader for phi, gemma, deepseek (vllm-project#16088)

Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>

* [Model] fix model testing for TeleChat2ForCausalLM and V0 llama4 (vllm-project#16112)

Signed-off-by: Lu Fang <fanglu@fb.com>

* [Benchmark] Add sampling parameters to benchmark_serving. (vllm-project#16022)

Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>

* [Frontend] Fix typo in tool chat templates for llama3.2 and toolace (vllm-project#14501)

Signed-off-by: Ben Jackson <ben@ben.com>

* [CI][V1] Fix passing `tokenizer` as kwarg to `validate_guidance_grammar` (vllm-project#16117)

Signed-off-by: Roger Wang <ywang@roblox.com>

* [Misc] refactor example eagle (vllm-project#16100)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Doc][Bugfix] Add missing EOF in k8s deploy doc (vllm-project#16025)

* [Misc] Improve model redirect to accept json dictionary (vllm-project#16119)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 (vllm-project#16103)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

* [Bugfix] LoRA : Fix the order in which the kernels process LoRAs  (vllm-project#16040)

Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

* [Bugfix] add hf_token to EngineArgs (vllm-project#16093)

Signed-off-by: paolovic <paul-philipp.luley@uzh.ch>
Co-authored-by: paolovic <paul-philipp.luley@uzh.ch>

* [Misc] update requires-python in pyproject.toml (vllm-project#16116)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [TPU] Update PyTorch/XLA (vllm-project#16130)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

* [V1][Minor] Optimize get_cached_block (vllm-project#16135)

* Fix requires-python (vllm-project#16132)

* [Metrics] Add bucket for `request_latency`, `time_to_first_token` and `time_per_output_token` (vllm-project#15202)

Signed-off-by: Kay Yan <kay.yan@daocloud.io>

* [V1][Minor] Minor simplification for get_computed_blocks  (vllm-project#16139)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

* [Misc] Update Mistral-3.1 example (vllm-project#16147)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Make dummy encoder prompt padding alternative and add missing warnings (vllm-project#16129)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [CI] Set max transformers version for Ultravox model test  (vllm-project#16149)

Signed-off-by: Roger Wang <ywang@roblox.com>

* doc: fix some typos in doc (vllm-project#16154)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [VLM] Florence-2 supports online serving (vllm-project#16164)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [V1][Structured Output] Add `supports_structured_output()` method to Platform (vllm-project#16148)

Signed-off-by: shen-shanshan <467638484@qq.com>

* [Model] Add Qwen3 and Qwen3MoE (vllm-project#15289)

Signed-off-by: YamPengLi <yampayne.lyp@alibaba-inc.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

* [Misc] improve example mlpspeculator and llm_engine_example (vllm-project#16175)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Doc]Update image to latest version (vllm-project#16186)

Signed-off-by: WangErXiao <863579016@qq.com>

* Upstream Llama4 Support to Main (vllm-project#16113)

Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com>
Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
Signed-off-by: drisspg <drisspguessous@gmail.com>
Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Xiaodong Wang <xdwang@meta.com>
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Lu Fang <lufang@fb.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Re-enable support for `ChatGLMForConditionalGeneration` (vllm-project#16187)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [V1] Revert the default `max_num_seqs` to V0 values for most hardware (vllm-project#16158)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* Print the warning only once (vllm-project#16193)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

* [Misc] Human-readable `max-model-len` cli arg (vllm-project#16181)

Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

* [Misc] Move Llama 4 projector call into encoder execution (vllm-project#16201)

* [Bugfix] Fix guidance backend for Qwen models (vllm-project#16210)

Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>

* [V1][BugFix] Exit properly if engine core fails during startup (vllm-project#16137)

Signed-off-by: Nick Hill <nhill@redhat.com>

* [Misc] add description attribute in CLI (vllm-project#15921)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Bugfix][V0] XGrammar structured output supports Enum (vllm-project#15878)

Signed-off-by: Leon Seidel <leon.seidel@fau.de>

* Torchao (vllm-project#14231)

Signed-off-by: drisspg <drisspguessous@gmail.com>

* [ROCm][Bugfix][FP8] Make fp8 quant respect fused modules mapping (vllm-project#16031)

Signed-off-by: mgoin <michael@neuralmagic.com>

* [core] do not send error across process (vllm-project#16174)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [Misc] Update compressed-tensors to version 0.9.3 (vllm-project#16196)

Signed-off-by: Miles Williams <42222518+mlsw@users.noreply.github.com>

* Update BASE_IMAGE to 2.22 release of Neuron (vllm-project#16218)

* [V1] Scatter and gather placeholders in the model runner (vllm-project#16076)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>

* [Bugfix] fix use-ep bug to enable ep by dp/tp size > 1 (vllm-project#16161)

* Add warning for Attention backends that do not support irope yet (vllm-project#16212)

* [Bugfix] Do not skip "empty" parts of chats that are parsable (vllm-project#16219)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Bugfix] Fix and reorganize broken GGUF tests and bump gguf version (vllm-project#16194)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [torch.compile][TPU] Make @support_torch_compile work for XLA backend (vllm-project#15782)

Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>

* [V1] Add `disable_chunked_mm_input` arg to disable partial mm input prefill (vllm-project#15837)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Misc] Merge the logs of pp layers partitions (vllm-project#16225)

Signed-off-by: Kebe <mail@kebe7jun.com>

* [Docs] Add Slides from Singapore Meetup (vllm-project#16213)

Signed-off-by: simon-mo <simon.mo@hey.com>

* [Misc] format and refactor some examples (vllm-project#16252)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Misc] Add warning for multimodal data in LLM.beam_search (vllm-project#16241)

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* [Model] use AutoWeightsLoader for phimoe,qwen2_moe,qwen3_moe (vllm-project#16203)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

* [BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm (vllm-project#16247)

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

* [Bugfix] Remove triton do_bench fast_flush arg (vllm-project#16256)

Signed-off-by: Kebe <mail@kebe7jun.com>

* Update to transformers==4.51.1 (vllm-project#16257)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [New Model]: jinaai/jina-embeddings-v3 (vllm-project#16120)

* [Misc] Avoid stripping meaningful whitespace from `nvidia-smi topo -m` output in collect_env.py (vllm-project#16272)

Signed-off-by: imkero <kerorek@outlook.com>

* [Bugfix] Proper input validation for multi-modal encoder-decoder models (vllm-project#16156)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Handle `process_weights_after_loading` for `QKVCrossParallelLinear` (vllm-project#15328)

Signed-off-by: Isotr0py <2037008807@qq.com>

* Add warning that content below line in template will be removed (vllm-project#16276)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [BugFix] Fix Llama4 - Index Error When Single Request Near Max Context (vllm-project#16209)

Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

* [Bugfix] fix deepseek fp16 scale bug (vllm-project#14809)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>

* [V1] Update structured output offline inference example (vllm-project#15721)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

* [CI/Build] Fix CI LoRA failure (vllm-project#16270)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* Add support to modelopt quantization of Mixtral model (vllm-project#15961)

Signed-off-by: Yue <yueshen@nvidia.com>

* [Model] Add smolvlm support (vllm-project#16017)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

* [Bug] [ROCm] Fix Llama 4 Enablement Bug on ROCm: V0 ROCmFlashAttentionImpl and Triton Fused MoE bugs (vllm-project#16198)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Co-authored-by: Hongxia Yang <hongxia.yang@amd.com>
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>

* [Bugfix] fix gettid method is not define (vllm-project#16084)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

* [Feature] Estimate max-model-len use available KV cache memory (vllm-project#16168)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

* [Core] Upgrade to xgrammar 0.1.18, add cache size limit (vllm-project#16283)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

* [CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding (vllm-project#16221)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [TPU] Update PyTorch/XLA (vllm-project#16288)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

* [BugFix] Fix fusion test and add them to CI (vllm-project#16287)

Signed-off-by: luka <luka@neuralmagic.com>

* [Misc] Fix test_sharded_state_loader.py(vllm-project#16004) (vllm-project#16005)

Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com>

* [Bugfix] Avoid transferring cached multi-modal items from P0 to P1 (vllm-project#16273)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* Update label-tpu mergify and remove removal bot (vllm-project#16298)

* [BugFix] logger is not callable (vllm-project#16312)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [BugFix] llama4 qknorm should be not shared across head (vllm-project#16311)

Signed-off-by: Lu Fang <fanglu@fb.com>

* update neuron config (vllm-project#16289)

Signed-off-by: Ajay Vohra <ajayvohr@amazon.com>

* [BugFix] fix some typos found by typos. (vllm-project#16314)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [Model] Add `SupportsMultiModal.get_language_model` interface (vllm-project#16007)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [Bugfix][Frontend] respect provided default guided decoding backend (vllm-project#15476)

Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>

* Revert "Update label-tpu mergify and remove removal bot" (vllm-project#16350)

* [Bugfix] Fix profiling.py (vllm-project#16202)

Signed-off-by: zh Wang <rekind133@outlook.com>

* [Bugfix] catch AssertionError in MistralTokenizer as ValueError (vllm-project#16344)

Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>

* [CI]Fix hpu docker and numpy version for CI (vllm-project#16355)

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

* Fix `benchmark_throughput.py --backend=hf` (vllm-project#16352)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Build/CI] Add tracing deps to vllm container image (vllm-project#15224)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

* [Hardware] add platform-specific request validation api (vllm-project#16291)

Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

* [Misc] refactor Structured Outputs example (vllm-project#16322)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (vllm-project#16275)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

* Add GLM-4-0414 support (vllm-project#16338)

Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Ajay Vohra <ajayvohr@amazon.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
Co-authored-by: Accelerator1996 <lvfei.lv@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: yihong <zouzou0208@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: ajayvohra2005 <ajayvohr@amazon.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com>

* [Bugfix]: do not shutdown server if `skip_special_use=False` for MistralTokenizer (vllm-project#14094)

Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>

* [Model] use AutoWeightsLoader for granite, granitemoe, granitemoeshared, grok1, mixtral (vllm-project#16325)

Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>

* [TPU] Fix dummy loading OOM (vllm-project#16372)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

* [bugfix] Avoid the time consumption caused by creating dummy videos. (vllm-project#16371)

* [CI][Bugfix] Pin triton version for CPU (vllm-project#16384)

Signed-off-by: Roger Wang <ywang@roblox.com>

* [misc] use tqdm.auto where appropriate (vllm-project#16290)

Signed-off-by: Benjamin Kitor <bkitor@gigaio.com>

* [Bugfix][TPU] Fix TPU validate_request (vllm-project#16369)

Signed-off-by: Michael Goin <mgoin64@gmail.com>

* fix sonnet dataset sample when prefix len is very small (vllm-project#16379)

Signed-off-by: Chenyaaang <chenyangli@google.com>

* [Model] use AutoWeightsLoader for deepseek_v2, internlm2 (vllm-project#16383)

Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>

* [Misc] Update transformers version limits of multi-modal tests (vllm-project#16381)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Fix validation error for text-only Mllama 3.2 (vllm-project#16377)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models (vllm-project#16038)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [doc] add download model tips (vllm-project#16389)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Update Numba to 0.61.2 (vllm-project#16376)

Signed-off-by: cyy <cyyever@outlook.com>

* [Model] Remove image mm limit for LLaMa4  (vllm-project#16365)

Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>

* [doc] update the wrong link (vllm-project#16401)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [CI] Add auto update workflow for Dockerfile graph (vllm-project#11879)

Signed-off-by: wineandchord <guoqizhou19@gmail.com>

* Fix the torch version parsing logic (vllm-project#15857)

* [VLM] Remove `BaseProcessingInfo.get_mm_max_tokens_per_item` (vllm-project#16408)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [TPU][V1] Use `language_model` interface for getting text backbone in MM (vllm-project#16410)

Signed-off-by: NickLucche <nlucches@redhat.com>

* Improve configs - `ParallelConfig` (vllm-project#16332)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [V1] Set structured output backend to `auto` by default (vllm-project#15724)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

* [V1][Spec Decode] Eagle Model loading (vllm-project#16035)

Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>

* [Bugfix] Fix bug when dataset is json (vllm-project#15899)

Signed-off-by: Chenyaaang <chenyangli@google.com>

* [Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (vllm-project#15423)

Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* [V1] Zero-copy tensor/ndarray serialization/transmission (vllm-project#13790)

Signed-off-by: Nick Hill <nhill@redhat.com>

* [VLM] Avoid unnecessary dummy multimodal data during processing (vllm-project#16416)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Fix output token length check logic (vllm-project#16419)

Signed-off-by: look <eeslook@163.com>

* [TPU][V1] Disable per-request seed/Generator (vllm-project#16172)

Signed-off-by: NickLucche <nlucches@redhat.com>

* Fix range_ratio Bug in RandomDataset (vllm-project#16126)

Signed-off-by: jadewang21 <jadewangcn@outlook.com>

* check input length of sonnet samples (vllm-project#16423)

Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>

* update benchmark_serving_structured_output to include auto backend (vllm-project#16438)

Signed-off-by: Chenyaaang <chenyangli@google.com>

* [Llama4] Enable attention temperature tuning by default for long context (>32k) (vllm-project#16439)

Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>

* Update supported_hardware.md for TPU INT8 (vllm-project#16437)

* [Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (vllm-project#16424)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [CPU][Bugfix] Fix CPU docker issues (vllm-project#16454)

Signed-off-by: jiang.li <jiang1.li@intel.com>

* [Bugfix] Don't set an upper bound on repetition penalty (vllm-project#16403)

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Nick Hill <nhill@redhat.com>

* Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (vllm-project#16453)

* [Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (vllm-project#15990)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (vllm-project#16447)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Misc] Raise error for V1 not supporting Long LoRA. (vllm-project#16415)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* [Misc] update api_client example (vllm-project#16459)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Don't install triton on `ppc64le` platform (vllm-project#16470)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Kernel] support merge_attn_states CUDA kernel, 3x speedup (vllm-project#16173)

Signed-off-by: DefTruth <qiustudent_r@163.com>

* [Bugfix] Fix bugs of running Quark quantized models (vllm-project#16236)

Signed-off-by: chaow <chaow@amd.com>

* [Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (vllm-project#12779)

Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>

* Fix erroneous "model doesn't support compile" warning (vllm-project#16486)

Signed-off-by: rzou <zou3519@gmail.com>

* [TPU][V1] Make `--disable_chunked_mm_input` mandatory for serving MM models (vllm-project#16483)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (vllm-project#16366)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Doc] Document InternVL3 support (vllm-project#16495)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [Bugfix] handle alignment of encoder_seq_lens in mllama.py (vllm-project#14784)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

* Improve configs - `LoadConfig` (vllm-project#16422)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Frontend] Added chat templates for LLaMa4 pythonic tool calling (vllm-project#16463)

Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Kai Wu <kaiwu@meta.com>

* [Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100  (vllm-project#16488)

* Update openai_compatible_server.md (vllm-project#16507)

Signed-off-by: Christian Sears <csears@redhat.com>

* [Bugfix] clean up duplicated code (vllm-project#16485)

Signed-off-by: Gogs <gogs@fake.local>
Co-authored-by: Gogs <gogs@fake.local>

* Bugfix for PixtralHF models without spatial_merge_size (vllm-project#16513)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Doc] Fix link to vLLM blog (vllm-project#16519)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* [CI][Bugfix] Add mistral_tool_use to Ci (vllm-project#16517)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [BugFix] Handle non-contiguous tensors properly when serializing (vllm-project#16492)

Signed-off-by: Nick Hill <nhill@redhat.com>

* [Doc] Update Llama4 Model Names in Supported Models (vllm-project#16509)

Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>

* Optimized topk for topk=1 (Llama-4) (vllm-project#16512)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Feature][V1] Add xgrammar to support minLength, maxLength with test (vllm-project#16516)

Signed-off-by: Leon Seidel <leon.seidel@fau.de>

* [Frontend] support matryoshka representation / support embedding API dimensions (vllm-project#16331)

* fix: spelling (vllm-project#16466)

Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>

* [Misc] Update chat utils tests (vllm-project#16520)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Misc] Openai transcription client example use same Whisper model (vllm-project#16487)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [V1] Enable multi-input by default (vllm-project#15799)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [MISC] Make GroupCoordinator compatible with out-of-tree devices (vllm-project#16464)

Signed-off-by: hzji210@gmail.com <hzji210@gmail.com>

* [Misc] Delete redundant code (vllm-project#16530)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* Fix syntaxWarning: invalid escape sequence '\s' (vllm-project#16532)

Signed-off-by: Jie Fu <jiefu@tencent.com>

* [Perf] Optimize Preparing Inputs for GPU Model Runner (vllm-project#16484)

Signed-off-by: snowcharm <snowcharmqq@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>

* [Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (vllm-project#16529)

Signed-off-by: Ryan McConville <ryan@ryanmcconville.com>

* [V1][Spec Decode] KV cache slots for eagle heads (vllm-project#16370)

Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>

* Enable PTPC FP8 for CompressedTensorsW8A8Fp8MoEMethod (triton fused_moe) (vllm-project#16537)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py (vllm-project#16556)

* [Core][V0] Enable regex support with xgrammar (vllm-project#13228)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------

Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: chun37 <chun.jb.37@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>
Signed-off-by: Eric <erictang000@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com>
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: reidliu41 <reid201711@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: kevin <kevin@anyscale.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Ben Jackson <ben@ben.com>
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Signed-off-by: paolovic <paul-philipp.luley@uzh.ch>
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: YamPengLi <yampayne.lyp@alibaba-inc.com>
Signed-off-by: WangErXiao <863579016@qq.com>
Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com>
Signed-off-by: drisspg <drisspguessous@gmail.com>
Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Xiaodong Wang <xdwang@meta.com>
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Lu Fang <lufang@fb.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Leon Seidel <leon.seidel@fau.de>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Miles Williams <42222518+mlsw@users.noreply.github.com>
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
Signed-off-by: imkero <kerorek@outlook.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Yue <yueshen@nvidia.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: luka <luka@neuralmagic.com>
Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com>
Signed-off-by: Ajay Vohra <ajayvohr@amazon.com>
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
Signed-off-by: zh Wang <rekind133@outlook.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>
Signed-off-by: Benjamin Kitor <bkitor@gigaio.com>
Signed-off-by: Chenyaaang <chenyangli@google.com>
Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: wineandchord <guoqizhou19@gmail.com>
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Signed-off-by: look <eeslook@163.com>
Signed-off-by: jadewang21 <jadewangcn@outlook.com>
Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>
Signed-off-by: jiang.li <jiang1.li@intel.com>
Signed-off-by: DefTruth <qiustudent_r@163.com>
Signed-off-by: chaow <chaow@amd.com>
Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>
Signed-off-by: rzou <zou3519@gmail.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Christian Sears <csears@redhat.com>
Signed-off-by: Gogs <gogs@fake.local>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>
Signed-off-by: hzji210@gmail.com <hzji210@gmail.com>
Signed-off-by: Jie Fu <jiefu@tencent.com>
Signed-off-by: snowcharm <snowcharmqq@gmail.com>
Signed-off-by: Ryan McConville <ryan@ryanmcconville.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: chun <chun.jb.37@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Li Wang <wangli858794774@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Chris Thi <chris.c.thi@gmail.com>
Co-authored-by: LukasBluebaum <38468743+LukasBluebaum@users.noreply.github.com>
Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kay Yan <kay.yan@daocloud.io>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Matthias Matt <37695050+meffmadd@users.noreply.github.com>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: rongfu.leng <lenronfu@gmail.com>
Co-authored-by: Nishidha <nishidha.panpaliya@partner.ibm.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Hyesoo Yang <45211235+hyeygit@users.noreply.github.com>
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
Co-authored-by: yihong <zouzou0208@gmail.com>
Co-authored-by: Ziji Shi (Steven) <shi.ziji.sm@gmail.com>
Co-authored-by: wwl2755 <wangwenlong2755@gmail.com>
Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: yarongmu-google <150371854+yarongmu-google@users.noreply.github.com>
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
Co-authored-by: iefgnoix <isaacwxf23@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Jonghyun Choe <andy.choe729@gmail.com>
Co-authored-by: liuzhenwei <zhenweiliu@habana.ai>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Ilya Markov <markovilya197@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Tristan Leclercq <49700633+tristanleclercq@users.noreply.github.com>
Co-authored-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Ben Jackson <ben@ben.com>
Co-authored-by: Paul Schweigert <paul@paulschweigert.com>
Co-authored-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: paolovic <91155454+paolovic@users.noreply.github.com>
Co-authored-by: paolovic <paul-philipp.luley@uzh.ch>
Co-authored-by: Martin Hoyer <mhoyer@redhat.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: YamPengLi <yampayne.lyp@alibaba-inc.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Robin <863579016@qq.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: leon-seidel <83984854+leon-seidel@users.noreply.github.com>
Co-authored-by: Driss Guessous <32754868+drisspg@users.noreply.github.com>
Co-authored-by: Miles Williams <42222518+mlsw@users.noreply.github.com>
Co-authored-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
Co-authored-by: zxfan-cpu <zxfanzhang@tencent.com>
Co-authored-by: Yong Hoon Shin <48474650+sarckk@users.noreply.github.com>
Co-authored-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Kebe <mail@kebe7jun.com>
Co-authored-by: Alex Brooks <alex.brooks@ibm.com>
Co-authored-by: TY-AMD <tianyuan.wu@amd.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Kero Liang <kerorek@outlook.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yueshen2016 <39203804+yueshen2016@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Hongxia Yang <hongxia.yang@amd.com>
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Accelerator1996 <lvfei.lv@alibaba-inc.com>
Co-authored-by: ajayvohra2005 <ajayvohr@amazon.com>
Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com>
Co-authored-by: zh Wang <rekind133@outlook.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Yuxuan Zhang <2448370773@qq.com>
Co-authored-by: Aaron Ang <67321817+aaron-ang@users.noreply.github.com>
Co-authored-by: Jintao <huangjintao@mail.ustc.edu.cn>
Co-authored-by: Benjamin Kitor <bkitor@gigaio.com>
Co-authored-by: Chenyaaang <42742451+Chenyaaang@users.noreply.github.com>
Co-authored-by: cyyever <cyyever@outlook.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: wineandchord <guoqizhou123123@qq.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Lily Liu <lilyliupku@gmail.com>
Co-authored-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Co-authored-by: look <eeslook@163.com>
Co-authored-by: WWW <jadewangcn@outlook.com>
Co-authored-by: Alexey Belyakov <alexey.belyakov@intel.com>
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com>
Co-authored-by: chaow-amd <chaow@amd.com>
Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Kai Wu <kaiwu@meta.com>
Co-authored-by: Christian Sears <117944059+Chr1st1anSears@users.noreply.github.com>
Co-authored-by: Gogs <gogs@fake.local>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Tianer Zhou <ezhoureal@gmail.com>
Co-authored-by: Huazhong Ji <hzji210@gmail.com>
Co-authored-by: Jie Fu (傅杰) <jiefu@tencent.com>
Co-authored-by: SnowCharm <qiuyilun@u.nus.edu>
Co-authored-by: Ryan McConville <ryan@ryanmcconville.com>
sfc-gh-mhidayetoglu added a commit to sfc-gh-mhidayetoglu/vllm that referenced this pull request May 1, 2025
* [V1] Fix: make sure `k_index` is int64 for `apply_top_k_only` (vllm-project#15907)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>

* [Bugfix] Fix imports for MoE on CPU (vllm-project#15841)

Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>

* [V1][Minor] Enhance SpecDecoding Metrics Log in V1 (vllm-project#15902)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

* [Doc] Update rocm.inc.md (vllm-project#15917)

Signed-off-by: chun37 <chun.jb.37@gmail.com>

* [V1][Bugfix] Fix typo in MoE TPU checking (vllm-project#15927)

Signed-off-by: Roger Wang <ywang@roblox.com>

* [Benchmark] Fix error message (vllm-project#15866)

Signed-off-by: wangli <wangli858794774@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>

* [Misc] Replace print with logger (vllm-project#15923)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

* [CI/Build] Further clean up LoRA tests (vllm-project#15920)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* [Bugfix] Fix cache block size calculation for CPU MLA (vllm-project#15848)

Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>

* [Build/CI] Update lm-eval to 0.4.8 (vllm-project#15912)

Signed-off-by: Chris Thi <chris.c.thi@gmail.com>

* [Kernel] Add more dtype support for GGUF dequantization (vllm-project#15879)

Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>

* [core] Add tags parameter to wake_up() (vllm-project#15500)

Signed-off-by: Eric <erictang000@gmail.com>

* [V1] Fix json_object support with xgrammar (vllm-project#15488)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

* Add minimum version for `huggingface_hub` to enable Xet downloads (vllm-project#15873)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Bugfix][Benchmarks] Ensure `async_request_deepspeed_mii` uses the OpenAI choices key (vllm-project#15926)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>

* [CI] Remove duplicate entrypoints-test (vllm-project#15940)

Signed-off-by: Kay Yan <kay.yan@daocloud.io>

* [Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. (vllm-project#15938)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

* [Metrics] Hide deprecated metrics (vllm-project#15458)

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

* [Frontend] Implement Tool Calling with `tool_choice='required'` (vllm-project#13483)

Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: mgoin <michael@neuralmagic.com>

* [CPU][Bugfix] Using custom allreduce for CPU backend (vllm-project#15934)

Signed-off-by: jiang1.li <jiang1.li@intel.com>

* [Model] use AutoWeightsLoader in model load_weights (vllm-project#15770)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

* [Misc] V1 LoRA support CPU offload (vllm-project#15843)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* Restricted cmake to be less than version 4 as 4.x breaks the build of… (vllm-project#15859)

Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>

* [misc] instruct pytorch to use nvml-based cuda check (vllm-project#15951)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [V1] Support Mistral3 in V1 (vllm-project#15950)

Signed-off-by: mgoin <mgoin64@gmail.com>

* Fix `huggingface-cli[hf-xet]` -> `huggingface-cli[hf_xet]` (vllm-project#15969)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [V1][TPU] TPU-optimized top-p implementation (avoids scattering). (vllm-project#15736)

Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>

* [TPU] optimize the all-reduce performance (vllm-project#15903)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

* [V1][TPU] Do not compile sampling more than needed (vllm-project#15883)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [ROCM][KERNEL] Paged attention for V1 (vllm-project#15720)

Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>

* fix: better error message for get_config close vllm-project#13889 (vllm-project#15943)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [bugfix] add seed in torchrun_example.py (vllm-project#15980)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [ROCM][V0] PA kernel selection when no sliding window provided (vllm-project#15982)

Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>

* [Benchmark] Add AIMO Dataset to Benchmark (vllm-project#15955)

Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com>
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>

* [misc] improve error message for "Failed to infer device type" (vllm-project#15994)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process (vllm-project#15367)

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>

* [doc] update contribution link (vllm-project#15922)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* fix: tiny fix make format.sh executable (vllm-project#16015)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [SupportsQuant] Bert, Blip, Blip2, Bloom (vllm-project#15573)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [SupportsQuant] Chameleon, Chatglm, Commandr (vllm-project#15952)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Neuron][kernel] Fuse kv cache into a single tensor (vllm-project#15911)

Signed-off-by: Liangfu Chen <liangfc@amazon.com>

* [Minor] Fused experts refactor (vllm-project#15914)

Signed-off-by: Bill Nell <bnell@redhat.com>

* [Misc][Performance] Advance tpu.txt to the most recent nightly torch … (vllm-project#16024)

* Re-enable the AMD Testing for the passing tests. (vllm-project#15586)

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

* [TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (vllm-project#15732)

Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>

* [TPU] Switch Test to Non-Sliding Window (vllm-project#15981)

Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>

* [Bugfix] Fix function names in test_block_fp8.py (vllm-project#16033)

Signed-off-by: Bill Nell <bnell@redhat.com>

* [ROCm] Tweak the benchmark script to run on ROCm (vllm-project#14252)

* [Misc] improve gguf check (vllm-project#15974)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [TPU][V1] Remove ragged attention kernel parameter hard coding (vllm-project#16041)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

* doc: add info for macos clang errors (vllm-project#16049)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [V1][Spec Decode] Avoid logging useless nan metrics (vllm-project#16023)

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

* [Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt (vllm-project#15939)

Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>

* [Hardware][Gaudi][BugFix] fix arguments of hpu fused moe (vllm-project#15945)

Signed-off-by: zhenwei <zhenweiliu@habana.ai>

* [Bugfix][kernels] Fix half2float conversion in gguf kernels (vllm-project#15995)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [Benchmark][Doc] Update throughput benchmark and README (vllm-project#15998)

Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>

* [CPU] Change default block_size for CPU backend (vllm-project#16002)

Signed-off-by: jiang1.li <jiang1.li@intel.com>

* [Distributed] [ROCM] Fix custom allreduce enable checks (vllm-project#16010)

Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>

* [ROCm][Bugfix] Use platform specific FP8 dtype (vllm-project#15717)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

* [ROCm][Bugfix] Bring back fallback to eager mode removed in vllm-project#14917, but for ROCm only (vllm-project#15413)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

* [Bugfix] Fix default behavior/fallback for pp in v1 (vllm-project#16057)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [CI] Reorganize .buildkite directory (vllm-project#16001)

Signed-off-by: kevin <kevin@anyscale.com>

* [V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue (vllm-project#15906)

Signed-off-by: Nick Hill <nhill@redhat.com>

* [V1] Scatter and gather placeholders in the model runner (vllm-project#15712)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>

* Revert "[V1] Scatter and gather placeholders in the model runner" (vllm-project#16075)

* [Kernel][Minor] Re-fuse triton moe weight application (vllm-project#16071)

Signed-off-by: Bill Nell <bnell@redhat.com>

* [Bugfix][TPU] Fix V1 TPU worker for sliding window (vllm-project#16059)

Signed-off-by: Michael Goin <mgoin64@gmail.com>

* [V1][Spec Decode] Update N-gram Proposer Interface (vllm-project#15750)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

* [Misc] Auto detect bitsandbytes pre-quantized models (vllm-project#16027)

Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>

* [CI] Fix benchmark script level (vllm-project#16089)

* fix: support clang17 for macos and fix the real libomp (vllm-project#16086)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [doc] fix 404 (vllm-project#16082)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Revert "doc: add info for macos clang errors (vllm-project#16049)" (vllm-project#16091)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* Fix some capitalisations in generated examples doc titles (vllm-project#16094)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Misc] format output for encoder_decoder.py (vllm-project#16095)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Misc] Remove redundant code (vllm-project#16098)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

* [Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine (vllm-project#15946)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

* [Model] use AutoWeightsLoader for phi, gemma, deepseek (vllm-project#16088)

Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>

* [Model] fix model testing for TeleChat2ForCausalLM and V0 llama4 (vllm-project#16112)

Signed-off-by: Lu Fang <fanglu@fb.com>

* [Benchmark] Add sampling parameters to benchmark_serving. (vllm-project#16022)

Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>

* [Frontend] Fix typo in tool chat templates for llama3.2 and toolace (vllm-project#14501)

Signed-off-by: Ben Jackson <ben@ben.com>

* [CI][V1] Fix passing `tokenizer` as kwarg to `validate_guidance_grammar` (vllm-project#16117)

Signed-off-by: Roger Wang <ywang@roblox.com>

* [Misc] refactor example eagle (vllm-project#16100)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Doc][Bugfix] Add missing EOF in k8s deploy doc (vllm-project#16025)

* [Misc] Improve model redirect to accept json dictionary (vllm-project#16119)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 (vllm-project#16103)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

* [Bugfix] LoRA : Fix the order in which the kernels process LoRAs  (vllm-project#16040)

Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

* [Bugfix] add hf_token to EngineArgs (vllm-project#16093)

Signed-off-by: paolovic <paul-philipp.luley@uzh.ch>
Co-authored-by: paolovic <paul-philipp.luley@uzh.ch>

* [Misc] update requires-python in pyproject.toml (vllm-project#16116)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [TPU] Update PyTorch/XLA (vllm-project#16130)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

* [V1][Minor] Optimize get_cached_block (vllm-project#16135)

* Fix requires-python (vllm-project#16132)

* [Metrics] Add bucket for `request_latency`, `time_to_first_token` and `time_per_output_token` (vllm-project#15202)

Signed-off-by: Kay Yan <kay.yan@daocloud.io>

* [V1][Minor] Minor simplification for get_computed_blocks  (vllm-project#16139)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

* [Misc] Update Mistral-3.1 example (vllm-project#16147)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Make dummy encoder prompt padding alternative and add missing warnings (vllm-project#16129)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [CI] Set max transformers version for Ultravox model test  (vllm-project#16149)

Signed-off-by: Roger Wang <ywang@roblox.com>

* doc: fix some typos in doc (vllm-project#16154)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [VLM] Florence-2 supports online serving (vllm-project#16164)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [V1][Structured Output] Add `supports_structured_output()` method to Platform (vllm-project#16148)

Signed-off-by: shen-shanshan <467638484@qq.com>

* [Model] Add Qwen3 and Qwen3MoE (vllm-project#15289)

Signed-off-by: YamPengLi <yampayne.lyp@alibaba-inc.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

* [Misc] improve example mlpspeculator and llm_engine_example (vllm-project#16175)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Doc]Update image to latest version (vllm-project#16186)

Signed-off-by: WangErXiao <863579016@qq.com>

* Upstream Llama4 Support to Main (vllm-project#16113)

Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com>
Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
Signed-off-by: drisspg <drisspguessous@gmail.com>
Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Xiaodong Wang <xdwang@meta.com>
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Lu Fang <lufang@fb.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Re-enable support for `ChatGLMForConditionalGeneration` (vllm-project#16187)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [V1] Revert the default `max_num_seqs` to V0 values for most hardware (vllm-project#16158)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* Print the warning only once (vllm-project#16193)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

* [Misc] Human-readable `max-model-len` cli arg (vllm-project#16181)

Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

* [Misc] Move Llama 4 projector call into encoder execution (vllm-project#16201)

* [Bugfix] Fix guidance backend for Qwen models (vllm-project#16210)

Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>

* [V1][BugFix] Exit properly if engine core fails during startup (vllm-project#16137)

Signed-off-by: Nick Hill <nhill@redhat.com>

* [Misc] add description attribute in CLI (vllm-project#15921)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Bugfix][V0] XGrammar structured output supports Enum (vllm-project#15878)

Signed-off-by: Leon Seidel <leon.seidel@fau.de>

* Torchao (vllm-project#14231)

Signed-off-by: drisspg <drisspguessous@gmail.com>

* [ROCm][Bugfix][FP8] Make fp8 quant respect fused modules mapping (vllm-project#16031)

Signed-off-by: mgoin <michael@neuralmagic.com>

* [core] do not send error across process (vllm-project#16174)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [Misc] Update compressed-tensors to version 0.9.3 (vllm-project#16196)

Signed-off-by: Miles Williams <42222518+mlsw@users.noreply.github.com>

* Update BASE_IMAGE to 2.22 release of Neuron (vllm-project#16218)

* [V1] Scatter and gather placeholders in the model runner (vllm-project#16076)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>

* [Bugfix] fix use-ep bug to enable ep by dp/tp size > 1 (vllm-project#16161)

* Add warning for Attention backends that do not support irope yet (vllm-project#16212)

* [Bugfix] Do not skip "empty" parts of chats that are parsable (vllm-project#16219)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Bugfix] Fix and reorganize broken GGUF tests and bump gguf version (vllm-project#16194)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [torch.compile][TPU] Make @support_torch_compile work for XLA backend (vllm-project#15782)

Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>

* [V1] Add `disable_chunked_mm_input` arg to disable partial mm input prefill (vllm-project#15837)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Misc] Merge the logs of pp layers partitions (vllm-project#16225)

Signed-off-by: Kebe <mail@kebe7jun.com>

* [Docs] Add Slides from Singapore Meetup (vllm-project#16213)

Signed-off-by: simon-mo <simon.mo@hey.com>

* [Misc] format and refactor some examples (vllm-project#16252)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Misc] Add warning for multimodal data in LLM.beam_search (vllm-project#16241)

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* [Model] use AutoWeightsLoader for phimoe,qwen2_moe,qwen3_moe (vllm-project#16203)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

* [BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm (vllm-project#16247)

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

* [Bugfix] Remove triton do_bench fast_flush arg (vllm-project#16256)

Signed-off-by: Kebe <mail@kebe7jun.com>

* Update to transformers==4.51.1 (vllm-project#16257)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [New Model]: jinaai/jina-embeddings-v3 (vllm-project#16120)

* [Misc] Avoid stripping meaningful whitespace from `nvidia-smi topo -m` output in collect_env.py (vllm-project#16272)

Signed-off-by: imkero <kerorek@outlook.com>

* [Bugfix] Proper input validation for multi-modal encoder-decoder models (vllm-project#16156)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Handle `process_weights_after_loading` for `QKVCrossParallelLinear` (vllm-project#15328)

Signed-off-by: Isotr0py <2037008807@qq.com>

* Add warning that content below line in template will be removed (vllm-project#16276)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [BugFix] Fix Llama4 - Index Error When Single Request Near Max Context (vllm-project#16209)

Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

* [Bugfix] fix deepseek fp16 scale bug (vllm-project#14809)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>

* [V1] Update structured output offline inference example (vllm-project#15721)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

* [CI/Build] Fix CI LoRA failure (vllm-project#16270)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* Add support to modelopt quantization of Mixtral model (vllm-project#15961)

Signed-off-by: Yue <yueshen@nvidia.com>

* [Model] Add smolvlm support (vllm-project#16017)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

* [Bug] [ROCm] Fix Llama 4 Enablement Bug on ROCm: V0 ROCmFlashAttentionImpl and Triton Fused MoE bugs (vllm-project#16198)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Co-authored-by: Hongxia Yang <hongxia.yang@amd.com>
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>

* [Bugfix] fix gettid method is not define (vllm-project#16084)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

* [Feature] Estimate max-model-len use available KV cache memory (vllm-project#16168)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

* [Core] Upgrade to xgrammar 0.1.18, add cache size limit (vllm-project#16283)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

* [CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding (vllm-project#16221)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [TPU] Update PyTorch/XLA (vllm-project#16288)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

* [BugFix] Fix fusion test and add them to CI (vllm-project#16287)

Signed-off-by: luka <luka@neuralmagic.com>

* [Misc] Fix test_sharded_state_loader.py (vllm-project#16004) (vllm-project#16005)

Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com>

* [Bugfix] Avoid transferring cached multi-modal items from P0 to P1 (vllm-project#16273)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* Update label-tpu mergify and remove removal bot (vllm-project#16298)

* [BugFix] logger is not callable (vllm-project#16312)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [BugFix] llama4 qknorm should be not shared across head (vllm-project#16311)

Signed-off-by: Lu Fang <fanglu@fb.com>

* update neuron config (vllm-project#16289)

Signed-off-by: Ajay Vohra <ajayvohr@amazon.com>

* [BugFix] fix some typos found by typos. (vllm-project#16314)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* [Model] Add `SupportsMultiModal.get_language_model` interface (vllm-project#16007)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [Bugfix][Frontend] respect provided default guided decoding backend (vllm-project#15476)

Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>

* Revert "Update label-tpu mergify and remove removal bot" (vllm-project#16350)

* [Bugfix] Fix profiling.py (vllm-project#16202)

Signed-off-by: zh Wang <rekind133@outlook.com>

* [Bugfix] catch AssertionError in MistralTokenizer as ValueError (vllm-project#16344)

Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>

* [CI]Fix hpu docker and numpy version for CI (vllm-project#16355)

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

* Fix `benchmark_throughput.py --backend=hf` (vllm-project#16352)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Build/CI] Add tracing deps to vllm container image (vllm-project#15224)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

* [Hardware] add platform-specific request validation api (vllm-project#16291)

Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

* [Misc] refactor Structured Outputs example (vllm-project#16322)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (vllm-project#16275)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

* Add GLM-4-0414 support (vllm-project#16338)

Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Ajay Vohra <ajayvohr@amazon.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
Co-authored-by: Accelerator1996 <lvfei.lv@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: yihong <zouzou0208@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: ajayvohra2005 <ajayvohr@amazon.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com>

* [Bugfix]: do not shutdown server if `skip_special_use=False` for MistralTokenizer (vllm-project#14094)

Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>

* [Model] use AutoWeightsLoader for granite, granitemoe, granitemoeshared, grok1, mixtral (vllm-project#16325)

Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>

* [TPU] Fix dummy loading OOM (vllm-project#16372)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

* [bugfix] Avoid the time consumption caused by creating dummy videos. (vllm-project#16371)

* [CI][Bugfix] Pin triton version for CPU (vllm-project#16384)

Signed-off-by: Roger Wang <ywang@roblox.com>

* [misc] use tqdm.auto where appropriate (vllm-project#16290)

Signed-off-by: Benjamin Kitor <bkitor@gigaio.com>

* [Bugfix][TPU] Fix TPU validate_request (vllm-project#16369)

Signed-off-by: Michael Goin <mgoin64@gmail.com>

* fix sonnet dataset sample when prefix len is very small (vllm-project#16379)

Signed-off-by: Chenyaaang <chenyangli@google.com>

* [Model] use AutoWeightsLoader for deepseek_v2, internlm2 (vllm-project#16383)

Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>

* [Misc] Update transformers version limits of multi-modal tests (vllm-project#16381)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Fix validation error for text-only Mllama 3.2 (vllm-project#16377)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models (vllm-project#16038)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [doc] add download model tips (vllm-project#16389)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Update Numba to 0.61.2 (vllm-project#16376)

Signed-off-by: cyy <cyyever@outlook.com>

* [Model] Remove image mm limit for LLaMa4  (vllm-project#16365)

Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>

* [doc] update the wrong link (vllm-project#16401)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [CI] Add auto update workflow for Dockerfile graph (vllm-project#11879)

Signed-off-by: wineandchord <guoqizhou19@gmail.com>

* Fix the torch version parsing logic (vllm-project#15857)

* [VLM] Remove `BaseProcessingInfo.get_mm_max_tokens_per_item` (vllm-project#16408)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [TPU][V1] Use `language_model` interface for getting text backbone in MM (vllm-project#16410)

Signed-off-by: NickLucche <nlucches@redhat.com>

* Improve configs - `ParallelConfig` (vllm-project#16332)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [V1] Set structured output backend to `auto` by default (vllm-project#15724)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

* [V1][Spec Decode] Eagle Model loading (vllm-project#16035)

Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>

* [Bugfix] Fix bug when dataset is json (vllm-project#15899)

Signed-off-by: Chenyaaang <chenyangli@google.com>

* [Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (vllm-project#15423)

Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* [V1] Zero-copy tensor/ndarray serialization/transmission (vllm-project#13790)

Signed-off-by: Nick Hill <nhill@redhat.com>

* [VLM] Avoid unnecessary dummy multimodal data during processing (vllm-project#16416)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Fix output token length check logic (vllm-project#16419)

Signed-off-by: look <eeslook@163.com>

* [TPU][V1] Disable per-request seed/Generator (vllm-project#16172)

Signed-off-by: NickLucche <nlucches@redhat.com>

* Fix range_ratio Bug in RandomDataset (vllm-project#16126)

Signed-off-by: jadewang21 <jadewangcn@outlook.com>

* check input length of sonnet samples (vllm-project#16423)

Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>

* update benchmark_serving_structured_output to include auto backend (vllm-project#16438)

Signed-off-by: Chenyaaang <chenyangli@google.com>

* [Llama4] Enable attention temperature tuning by default for long context (>32k) (vllm-project#16439)

Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>

* Update supported_hardware.md for TPU INT8 (vllm-project#16437)

* [Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (vllm-project#16424)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [CPU][Bugfix] Fix CPU docker issues (vllm-project#16454)

Signed-off-by: jiang.li <jiang1.li@intel.com>

* [Bugfix] Don't set an upper bound on repetition penalty (vllm-project#16403)

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Nick Hill <nhill@redhat.com>

* Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (vllm-project#16453)

* [Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (vllm-project#15990)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (vllm-project#16447)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Misc] Raise error for V1 not supporting Long LoRA. (vllm-project#16415)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* [Misc] update api_client example (vllm-project#16459)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Don't install triton on `ppc64le` platform (vllm-project#16470)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Kernel] support merge_attn_states CUDA kernel, 3x speedup (vllm-project#16173)

Signed-off-by: DefTruth <qiustudent_r@163.com>

* [Bugfix] Fix bugs of running Quark quantized models (vllm-project#16236)

Signed-off-by: chaow <chaow@amd.com>

* [Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (vllm-project#12779)

Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>

* Fix erroneous "model doesn't support compile" warning (vllm-project#16486)

Signed-off-by: rzou <zou3519@gmail.com>

* [TPU][V1] Make `--disable_chunked_mm_input` mandatory for serving MM models (vllm-project#16483)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (vllm-project#16366)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Doc] Document InternVL3 support (vllm-project#16495)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [Bugfix] handle alignment of encoder_seq_lens in mllama.py (vllm-project#14784)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

* Improve configs - `LoadConfig` (vllm-project#16422)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Frontend] Added chat templates for LLaMa4 pythonic tool calling (vllm-project#16463)

Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Kai Wu <kaiwu@meta.com>

* [Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100  (vllm-project#16488)

* Update openai_compatible_server.md (vllm-project#16507)

Signed-off-by: Christian Sears <csears@redhat.com>

* [Bugfix] clean up duplicated code (vllm-project#16485)

Signed-off-by: Gogs <gogs@fake.local>
Co-authored-by: Gogs <gogs@fake.local>

* Bugfix for PixtralHF models without spatial_merge_size (vllm-project#16513)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Doc] Fix link to vLLM blog (vllm-project#16519)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* [CI][Bugfix] Add mistral_tool_use to Ci (vllm-project#16517)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [BugFix] Handle non-contiguous tensors properly when serializing (vllm-project#16492)

Signed-off-by: Nick Hill <nhill@redhat.com>

* [Doc] Update Llama4 Model Names in Supported Models (vllm-project#16509)

Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>

* Optimized topk for topk=1 (Llama-4) (vllm-project#16512)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Feature][V1] Add xgrammar to support minLength, maxLength with test (vllm-project#16516)

Signed-off-by: Leon Seidel <leon.seidel@fau.de>

* [Frontend] support matryoshka representation / support embedding API dimensions (vllm-project#16331)

* fix: spelling (vllm-project#16466)

Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>

* [Misc] Update chat utils tests (vllm-project#16520)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Misc] Openai transcription client example use same Whisper model (vllm-project#16487)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [V1] Enable multi-input by default (vllm-project#15799)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [MISC] Make GroupCoordinator compatible with out-of-tree devices (vllm-project#16464)

Signed-off-by: hzji210@gmail.com <hzji210@gmail.com>

* [Misc] Delete redundant code (vllm-project#16530)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* Fix syntaxWarning: invalid escape sequence '\s' (vllm-project#16532)

Signed-off-by: Jie Fu <jiefu@tencent.com>

* [Perf] Optimize Preparing Inputs for GPU Model Runner (vllm-project#16484)

Signed-off-by: snowcharm <snowcharmqq@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>

* [Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (vllm-project#16529)

Signed-off-by: Ryan McConville <ryan@ryanmcconville.com>

* [V1][Spec Decode] KV cache slots for eagle heads (vllm-project#16370)

Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>

* Enable PTPC FP8 for CompressedTensorsW8A8Fp8MoEMethod (triton fused_moe) (vllm-project#16537)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py (vllm-project#16556)

* [Core][V0] Enable regex support with xgrammar (vllm-project#13228)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

* capture only SP * batch_size <= max_batch_size case to cover small max_batch_size

---------

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: chun37 <chun.jb.37@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>
Signed-off-by: Eric <erictang000@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com>
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: reidliu41 <reid201711@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: kevin <kevin@anyscale.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Ben Jackson <ben@ben.com>
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Signed-off-by: paolovic <paul-philipp.luley@uzh.ch>
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: YamPengLi <yampayne.lyp@alibaba-inc.com>
Signed-off-by: WangErXiao <863579016@qq.com>
Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com>
Signed-off-by: drisspg <drisspguessous@gmail.com>
Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Xiaodong Wang <xdwang@meta.com>
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Lu Fang <lufang@fb.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Leon Seidel <leon.seidel@fau.de>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Miles Williams <42222518+mlsw@users.noreply.github.com>
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
Signed-off-by: imkero <kerorek@outlook.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Yue <yueshen@nvidia.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: luka <luka@neuralmagic.com>
Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com>
Signed-off-by: Ajay Vohra <ajayvohr@amazon.com>
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
Signed-off-by: zh Wang <rekind133@outlook.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>
Signed-off-by: Benjamin Kitor <bkitor@gigaio.com>
Signed-off-by: Chenyaaang <chenyangli@google.com>
Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: wineandchord <guoqizhou19@gmail.com>
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Signed-off-by: look <eeslook@163.com>
Signed-off-by: jadewang21 <jadewangcn@outlook.com>
Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>
Signed-off-by: jiang.li <jiang1.li@intel.com>
Signed-off-by: DefTruth <qiustudent_r@163.com>
Signed-off-by: chaow <chaow@amd.com>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…gine (vllm-project#15946)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>