forked from vllm-project/vllm
Commit 3def9c4
Fix interleaved_sliding_window config (#8)
* [Bugfix] disable processor cache (vllm-project#19068)
Signed-off-by: raushan <raushan@huggingface.co>
* [Doc] Improve the Pull Request template with key components (vllm-project#19086)
Signed-off-by: Lu Fang <lufang@fb.com>
* [Misc] Add missing `_Backend` enums (vllm-project#19081)
Signed-off-by: nicklucche <nlucches@redhat.com>
* [Misc] fix: add missing best_of param validation (vllm-project#18555)
Signed-off-by: googs1025 <googs1025@gmail.com>
* [Misc] Add SPDX-FileCopyrightText (vllm-project#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
* [Doc] Readme standardization (vllm-project#18695)
Co-authored-by: Soren Dreano <soren@numind.ai>
* [doc] update docker version (vllm-project#19074)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
* [Kernel] DeepEP dispatch-combine kernel integration (vllm-project#18434)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
* [V1] Support cross-layer KV sharing (vllm-project#18212)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
* [Perf] Tune `scaled_fp8_quant` by increasing vectorization (vllm-project#18844)
Signed-off-by: mgoin <mgoin64@gmail.com>
* Fix interaction between `Optional` and `Annotated` in CLI typing (vllm-project#19093)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Yikun Jiang <yikun@apache.org>
* [v1] Re-init input batch for multiple kv cache groups (vllm-project#18654)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
* [V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder with prompt fix (vllm-project#18971)
* [Bugfix] get_num_blocks_to_allocate with null_block (vllm-project#19031)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
* [Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled (vllm-project#19075)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
* [Bugfix][P/D] Fix Prefix Cache Bug (vllm-project#18411)
Signed-off-by: nicklucche <nlucches@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
* [Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (vllm-project#19029)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
* feat: add data parallel rank to KVEventBatch (vllm-project#18925)
* [Misc] Fix path and python alias errors in disagg_prefill examples (vllm-project#18919)
* [Docs] Add developer doc about CI failures (vllm-project#18782)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
* [CPU] V1 support for the CPU backend (vllm-project#16441)
* [Core] Cast multimodal input in hf processor (vllm-project#18862)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
* [KERNEL] Sampler. CUDA kernel for applying repetition penalty (vllm-project#18437)
* [Cleanup][v1]: remove guided-decoding-backend for example (vllm-project#19059)
Signed-off-by: calvin chen <120380290@qq.com>
* [NVIDIA] Add Cutlass MLA backend (vllm-project#17625)
* [Bugfix] Fix FA3 full cuda graph correctness (vllm-project#19106)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
* Fix vllm-project#19130 (vllm-project#19132)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
* [TPU] Skip hanging tests (vllm-project#19115)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
* Fix ValueError: Missing value for tag key(s): model_name,engine. (vllm-project#19113)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
* [Misc] Add packages for benchmark as extra dependency (vllm-project#19089)
Signed-off-by: Isotr0py <2037008807@qq.com>
* Improve the output precision of embedding models (vllm-project#19092)
* [CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (vllm-project#18678)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
* Add DeepSeek-R1-0528 function call chat template (vllm-project#18874)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
* Sm100 blockwise fp8 swap ab (vllm-project#18564)
* [Doc] Update V1 Guide for embedding models (vllm-project#19141)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
* Allow AsyncLLMEngine.generate to target a specific DP rank (vllm-project#19102)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
* [Bugfix][EP+DP] Fix internode check (vllm-project#19112)
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
* [Perf] Tunings for SM100 FP8 CUTLASS kernel (vllm-project#18778)
Signed-off-by: mgoin <mgoin64@gmail.com>
* [TPU] Update dynamo dump file name in compilation test (vllm-project#19108)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
* [Bugfix] fix v1 cpu worker fails on macOS (vllm-project#19121)
* [Kernel] Integrate batched/masked deepgemm kernel (vllm-project#19111)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
* [Misc] refactor: simplify EngineCoreClient.make_async_mp_client in AsyncLLM (vllm-project#18817)
Signed-off-by: googs1025 <googs1025@gmail.com>
* [P/D] Heterogeneous TP (vllm-project#18833)
Signed-off-by: nicklucche <nlucches@redhat.com>
* [doc] small fix (vllm-project#19167)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
* [Bugfix][Nixl] Fix full prefix cache hit bug (vllm-project#18632)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
* [Bugfix] Fix port handling in make_zmq_path (vllm-project#19117)
* [Torch Nightly]add missing dependency (vllm-project#18770)
Signed-off-by: Yang Wang <elainewy@meta.com>
* Handle non-serializable objects when dumping benchmark results (vllm-project#19114)
* [BugFix][Minor] Fix full cuda graph bug when max_num_seqs < 512 (vllm-project#19171)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
* [Bugfix]: Fix the incompatibility issue with stream when Thinking is disabled (vllm-project#19135)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
* [Build] Annotate wheel and container path for release workflow (vllm-project#19162)
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* [Misc] Remove unnecessary fallback to prefill-decode attention (vllm-project#19138)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
* [Misc] Do not override NCCL_CUMEM_ENABLE if set explicitly (vllm-project#19105)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
* [Frontend] improve vllm run-batch --help display (vllm-project#19187)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
* [Bugfix] properly catch PIL-related errors for vision models when incorrect data urls are provided (vllm-project#19202)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
* [mistral_common] Add v11 tokenizer (vllm-project#19193)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add H20-3e fused MoE kernel tuning configs for DeepSeek-R1/V3 (vllm-project#19205)
* [Hardware][NVIDIA] FP4 MoE kernel optimization (vllm-project#19110)
Signed-off-by: Chiyue Wei <chiyuew@nvidia.com>
Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>
* [MISC][Bugfix] Use less CPU when message queue has been empty for some time (vllm-project#16226)
Signed-off-by: Povilas Kanapickas <povilas@radix.lt>
* [P/D][NixlConnector] Enable FlashInfer backend (vllm-project#19090)
* [Quantization] Skip Fp4 Test for `compressed-tensors` (vllm-project#19217)
* [V1] Use FlashInfer by default on Blackwell GPUs (vllm-project#19118)
* [Model] NemotronH support (vllm-project#18863)
Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
* Fix AOPerModuleConfig name changes (vllm-project#18869)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
* [Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (vllm-project#19033)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
* [v1] Hybrid Memory Allocator (vllm-project#17996)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
* [TPU] update torch_xla pin (vllm-project#19231)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
* Support allowed_token_ids in ChatCompletionRequest (vllm-project#19143)
Signed-off-by: Xu Song <xusong.vip@gmail.com>
* [Chore] update CODEOWNERS (vllm-project#19247)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
* [v1][P/D] Fix an edge case in kv cache schedule (vllm-project#19182)
Co-authored-by: jinghui <jinghui@fb.com>
* [TPU] fix kv cache dtype in model runner (vllm-project#19244)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
* [Quantization] Bump compressed-tensors version; update NVFP4A16 test model (vllm-project#19224)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
* [Docs] Improve V1 KVConnector interface documentation (vllm-project#19172)
Signed-off-by: Nick Hill <nhill@redhat.com>
* Fix CompilationConfig repr (vllm-project#19091)
Signed-off-by: rzou <zou3519@gmail.com>
* Unit Test for run_dp_sharded_vision_model (vllm-project#19103)
Signed-off-by: Siqi Yan <siqi@meta.com>
Co-authored-by: Siqi Yan <siqi@meta.com>
* [Model] Optimize nemotron_h implementation (vllm-project#19249)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
* [Core] Raise when non-multi-instance DP clients target a DP rank (vllm-project#19227)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
* improve logits bias (vllm-project#19041)
* Fixed ppc build when it runs on non-RHEL based linux distros (vllm-project#18422)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
Signed-off-by: npanpaliya <nishidha.panpaliya@partner.ibm.com>
Co-authored-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
* [BugFix] Fix MultiConnector test after HMA changes (vllm-project#19291)
Signed-off-by: Nick Hill <nhill@redhat.com>
* [Bugfix][Core] Update cancellation logic in `generate()` to handle Generator exits (vllm-project#19225)
Co-authored-by: Adolfo Victoria <adovi@meta.com>
* [Core] Fix abrupt request abort (vllm-project#18485)
Signed-off-by: nicklucche <nlucches@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
* [BugFix] Fix tpu_model_runner block_id concatenation (vllm-project#19228)
Signed-off-by: Nick Hill <nhill@redhat.com>
* [Misc][Tools][Benchmark] Fix and improve auto tune script (vllm-project#19163)
Signed-off-by: Chenyaaang <chenyangli@google.com>
* [Build][ROCm] Update Dockerfile.rocm (vllm-project#19296)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
* [Easy][Test] Simplify test_function_tool_use with multiple parametrizes (vllm-project#19269)
Signed-off-by: Lu Fang <lufang@fb.com>
* [Kernel] Integrate CUTLASS MoE kernel with PPLX (vllm-project#18762)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
* [TPU][Test] Add script to run benchmark on TPU for buildkite (vllm-project#19039)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
* [CI][PowerPC] Use a more appropriate way to select testcase in tests/models/language/pooling/test_embedding.py (vllm-project#19253)
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
* Add FlexAttention to V1 (vllm-project#16078)
Signed-off-by: drisspg <drisspguessous@gmail.com>
* [Misc] refactor context extension (vllm-project#19246)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
* [CI/Build] Improve Llama GGUF test robustness (vllm-project#19287)
Signed-off-by: Isotr0py <2037008807@qq.com>
* [Nit][Benchmark]Fix example in benchmark_serving_structured_output.py (vllm-project#19311)
Signed-off-by: Lifan Shen <lifans@meta.com>
* [AMD] Update compatible packaging version (vllm-project#19309)
Signed-off-by: pramkuma <Pramendra.Kumar@amd.com>
* [BugFix][V1] Fix memory profiling bug (vllm-project#18974)
Signed-off-by: luka <luka@neuralmagic.com>
* [Bugfix]: Fix TypeError: 'float' object cannot be interpreted as an integer (vllm-project#19283)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
* [Bugfix] Re-enable use_cudagraph in vLLM v1 (vllm-project#19299)
Signed-off-by: Richard Zou <zou3519@gmail.com>
* [Misc] Change tests/compile to use VLLM_V1 by default (vllm-project#19302)
Signed-off-by: rzou <zou3519@gmail.com>
* Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B (vllm-project#19315)
Signed-off-by: Xu Wenqing <xuwq1993@qq.com>
* [Hardware][POWER] Add IBM POWER11 Support to CPU Extension Detection (vllm-project#19082)
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
* [Quantization] Add compressed-tensors NVFP4 support (vllm-project#18312)
* [Multi Modal] Add an env var for message queue max chunk bytes (vllm-project#19242)
Signed-off-by: yZhen <yZhen@fb.com>
Co-authored-by: yZhen <yZhen@fb.com>
* [Bugfix] model_max_length should consider max_model_len in tokenizer_config (vllm-project#19201)
* [Deprecation] Remove `inputs` arg fallback in Engine classes (vllm-project#18799)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
* [Misc] Add documentation update reminder to PR template (vllm-project#19289)
Signed-off-by: Isotr0py <2037008807@qq.com>
* [Frontend] Remove unreachable code from llm.py (vllm-project#19288)
Signed-off-by: KsuParkhamchuk <k.parkhamchuk@gmail.com>
* [Misc] Cleanup compilation tests (vllm-project#19343)
Signed-off-by: rzou <zou3519@gmail.com>
* [doc] improve ci doc (vllm-project#19307)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
* [Doc] Fix description in the Automatic Prefix Caching design doc (vllm-project#19333)
Signed-off-by: cr7258 <chengzw258@163.com>
* [CI/Build] Fix LoRA test (vllm-project#19350)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
* [Fix] Allow kernel compilation for CUDA capability 8.7 (vllm-project#19328)
Signed-off-by: Conroy Cheers <conroy@corncheese.org>
* [CI] Introduce rules for llama auto-label (vllm-project#19323)
Signed-off-by: Lu Fang <lufang@fb.com>
* [Docs] Fix a bullet list in usage/security.md (vllm-project#19358)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
* [full_graph] Fix query_start_loc padding (vllm-project#19321)
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>
* [v1] Add fp32 support to v1 engine through flex attn (vllm-project#19319)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
* [Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (vllm-project#19298)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
* [Bugfix][Core] Prevent token lengths exceeding `max_model_len` in V0 (vllm-project#19348)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
* [Quantization] Bump compressed-tensors version (vllm-project#19295)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
* [Frontend] Make TIMEOUT_KEEP_ALIVE configurable through env var (vllm-project#18472)
Signed-off-by: liusiqian <liusiqian@tal.com>
* [TPU]Fix KV cache sharing tests (vllm-project#19371)
* [HOT-FIX] Add `kv_sharing_target_layer_name` argument to cutlass_mla backend (vllm-project#19374)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
* [Misc] Fix a config typo in disable_hybrid_kv_cache_manager configuration (vllm-project#19383)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
* [V1] Reuse V0's memory_profiling util for gpu worker memory profiling (vllm-project#19312)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
* [Bugfix] Fix benchmark_moe.py (vllm-project#19016)
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
* Use xla flag to improve the quantized model performance (vllm-project#19303)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
* Fix docs/mkdocs/hooks/remove_announcement.py (vllm-project#19382)
* [Frontend] Add tqdm_leave_pbar to control progress bar visibility (vllm-project#19357)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
* [Core] Use tuple for kv cache group block ids (vllm-project#19175)
Signed-off-by: Nick Hill <nhill@redhat.com>
* [Bugfix] Fix modelscope token passed in (vllm-project#19389)
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
* [Core] Batch multi modal input using pinned memory (vllm-project#19169)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
* Add security warning to bug report template (vllm-project#19365)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [Misc] refactor neuron_multimodal and profiling (vllm-project#19397)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
* Add clear documentation around the impact of debugging flag (vllm-project#19369)
Signed-off-by: Anna Pendleton <pendleton@google.com>
* Automatically bind CPU OMP Threads of a rank to CPU ids of a NUMA node. (vllm-project#17930)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
* Revert "[v1] Add fp32 support to v1 engine through flex attn" (vllm-project#19404)
* [BugFix][FlashInfer] Fix attention backend interface mismatch with unexpected keyword `use_irope` (vllm-project#19134)
Signed-off-by: Yunqiu Guo <guorachel@meta.com>
* [BugFix][CPU] Fix CPU CI by ignore collecting test_pixtral (vllm-project#19411)
Signed-off-by: jiang.li <jiang1.li@intel.com>
* Simplify ep kernels installation (vllm-project#19412)
Signed-off-by: youkaichao <youkaichao@gmail.com>
* [Misc] Slight improvement of the BNB (vllm-project#19418)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* [Docs] Note that alternative structured output backends are supported (vllm-project#19426)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
* [ROCm][V1] Adding ROCm to the list of platforms using V1 by default (vllm-project#19440)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
* [Model] use AutoWeightsLoader for commandr (vllm-project#19399)
Signed-off-by: py-andy-c <pychen1017@gmail.com>
* Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B-FP8 (vllm-project#19401)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
* [BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 (vllm-project#19390)
Signed-off-by: rzou <zou3519@gmail.com>
* [New Model]: Support Qwen3 Embedding & Reranker (vllm-project#19260)
* [BugFix] Fix docker build cpu-dev image error (vllm-project#19394)
Signed-off-by: niu_he <carlton2tang@gmail.com>
* Fix test_max_model_len in tests/entrypoints/llm/test_generate.py (vllm-project#19451)
Signed-off-by: Lu Fang <lufang@fb.com>
* [CI] Disable failing GGUF model test (vllm-project#19454)
Signed-off-by: mgoin <mgoin64@gmail.com>
* [Misc] Remove unused `MultiModalHasher.hash_prompt_mm_data` (vllm-project#19422)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
* Add fused MOE config for Qwen3 30B A3B on B200 (vllm-project#19455)
Signed-off-by: Junhao Li <junhao@ubicloud.com>
* Fix Typo in Documentation and Function Name (vllm-project#19442)
* [ROCm] Add rules to automatically label ROCm related PRs (vllm-project#19405)
Signed-off-by: Lu Fang <lufang@fb.com>
* [Kernel] Support deep_gemm for linear methods (vllm-project#19085)
Signed-off-by: artetaout <lulala341@gmail.com>
* [Doc] Update V1 User Guide for Hardware and Models (vllm-project#19474)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
* [Doc] Fix quantization link titles (vllm-project#19478)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
* [Doc] Support "important" and "announcement" admonitions (vllm-project#19479)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
* [Misc] Reduce warning message introduced in env_override (vllm-project#19476)
Signed-off-by: Lu Fang <lufang@fb.com>
* Support non-string values in JSON keys from CLI (vllm-project#19471)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
* Add cache to cuda get_device_capability (vllm-project#19436)
Signed-off-by: mgoin <mgoin64@gmail.com>
* Fix some typo (vllm-project#19475)
Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com>
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
* Support no privileged mode on CPU for docker and kubernetes deployments (vllm-project#19241)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
* [Bugfix] Update the example code, make it work with the latest lmcache (vllm-project#19453)
Signed-off-by: Runzhen Wang <wangrunzhen@gmail.com>
* [CI] Update FlashInfer to 0.2.6.post1 (vllm-project#19297)
Signed-off-by: mgoin <mgoin64@gmail.com>
* [doc] fix "Other AI accelerators" getting started page (vllm-project#19457)
Signed-off-by: David Xia <david@davidxia.com>
* [Misc] Fix misleading ROCm warning (vllm-project#19486)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
* [Docs] Remove WIP features in V1 guide (vllm-project#19498)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
* [Kernels] Add activation chunking logic to FusedMoEModularKernel (vllm-project#19168)
Signed-off-by: Bill Nell <bnell@redhat.com>
* [AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger (vllm-project#17331)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
* [UX] Add Feedback During CUDAGraph Capture (vllm-project#19501)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* [CI/Build] Fix torch nightly CI dependencies (vllm-project#19505)
Signed-off-by: Richard Zou <zou3519@gmail.com>
* [CI] change spell checker from codespell to typos (vllm-project#18711)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
* [BugFix] Force registration of w8a8_block_fp8_matmul_deepgemm via lazy import (vllm-project#19514)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
* Add Triton Fused MoE kernel config for E=16 on B200 (vllm-project#19518)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
* [Frontend] Improve error message in tool_choice validation (vllm-project#19239)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
* [BugFix] Work-around incremental detokenization edge case error (vllm-project#19449)
Signed-off-by: Nick Hill <nhill@redhat.com>
* [BugFix] Handle missing sep_token for Qwen3-Reranker in Score API (vllm-project#19522)
Signed-off-by: strutive07 <strutive07@gmail.com>
* [AMD][Kernel][BugFix] fix test_rocm_compressed_tensors_w8a8 for rocm (vllm-project#19509)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
* Fix typo (vllm-project#19525)
Signed-off-by: 2niuhe <carlton2tang@gmail.com>
* [Security] Prevent new imports of (cloud)pickle (vllm-project#18018)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
* [Bugfix][V1] Allow manual FlashAttention for Blackwell (vllm-project#19492)
Signed-off-by: mgoin <mgoin64@gmail.com>
* [Bugfix] Respect num-gpu-blocks-override in v1 (vllm-project#19503)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
* [Quantization] Improve AWQ logic (vllm-project#19431)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
* [Doc] Add V1 column to supported models list (vllm-project#19523)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
* [V1][NixlConnector] Drop `num_blocks` check (vllm-project#19532)
Signed-off-by: NickLucche <nlucches@redhat.com>
* [Perf] Vectorize static / dynamic INT8 quant kernels (vllm-project#19233)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
* Fix TorchAOConfig skip layers (vllm-project#19265)
Signed-off-by: mobicham <hicham@mobiuslabs.com>
* [torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (vllm-project#16756)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
* [doc] Make top navigation sticky (vllm-project#19540)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
* [Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets (vllm-project#18847)
* [Misc] Turn MOE_DP_CHUNK_SIZE into an env var (vllm-project#19506)
* [Bugfix] Enforce contiguous input for dynamic_per_token FP8/INT8 quant (vllm-project#19452)
Signed-off-by: mgoin <mgoin64@gmail.com>
* [Doc] Unify structured outputs examples (vllm-project#18196)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
* [V1] Resolve failed concurrent structured output requests (vllm-project#19565)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
* Revert "[Build/CI] Add tracing deps to vllm container image (vllm-project#15224)" (vllm-project#19378)
* [BugFix] : Fix Batched DeepGemm Experts (vllm-project#19515)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
* [Bugfix] Fix EAGLE vocab embedding for multimodal target model (vllm-project#19570)
Signed-off-by: qizixi <qizixi@meta.com>
* [Doc] uses absolute links for structured outputs (vllm-project#19582)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
* [doc] fix incorrect link (vllm-project#19586)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
* [Misc] Correct broken docs link (vllm-project#19553)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
* [CPU] Refine default config for the CPU backend (vllm-project#19539)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
* [Fix] bump mistral common to support magistral (vllm-project#19533)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
* [Fix] The zip function in Python 3.9 does not have the strict argument (vllm-project#19549)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
* use base version for version comparison (vllm-project#19587)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
* [torch.compile] reorganize the cache directory to support compiling multiple models (vllm-project#19064)
Signed-off-by: youkaichao <youkaichao@gmail.com>
* [BugFix] Honor `enable_caching` in connector-delayed kvcache load case (vllm-project#19435)
Signed-off-by: Nick Hill <nhill@redhat.com>
* [Model] Fix minimax model cache & lm_head precision (vllm-project#19592)
Signed-off-by: qingjun <qingjun@minimaxi.com>
* [Refactor] Remove unused variables in `moe_permute_unpermute_kernel.inl` (vllm-project#19573)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
* [doc][mkdocs] fix the duplicate Supported features sections in GPU docs (vllm-project#19606)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
* [CUDA] Enable full cudagraph for FlashMLA (vllm-project#18581)
Signed-off-by: luka <luka@neuralmagic.com>
* [Doc] Add troubleshooting section to k8s deployment (vllm-project#19377)
Signed-off-by: Anna Pendleton <pendleton@google.com>
* [torch.compile] Use custom ops when use_inductor=False (vllm-project#19618)
* Adding "AMD: Multi-step Tests" to amdproduction. (vllm-project#19508)
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
* [BugFix] Fix DP Coordinator incorrect debug log message (vllm-project#19624)
Signed-off-by: Nick Hill <nhill@redhat.com>
* [V1][Metrics] Deprecate metrics with gpu_ prefix for non-GPU-specific metrics. (vllm-project#18354)
Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
* [Bugfix] Fix the speculative decoding test by setting the target dtype (vllm-project#19633)
* [Misc] Modularize CLI Argument Parsing in Benchmark Scripts (vllm-project#19593)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
* [Bugfix] Fix auto dtype casting for BatchFeature (vllm-project#19316)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
* [Hardware][NVIDIA][kernel] Fp4 MOE quant kernel optimization (vllm-project#19500)
* Only build CUTLASS MoE kernels on Hopper (vllm-project#19648)
* [Bugfix] Don't attempt to use triton if no driver is active (vllm-project#19561)
* [Fix] Convert kv_transfer_config from dict to KVTransferConfig (vllm-project#19262)
* [Perf] Further tunings for SM100 FP8 CUTLASS kernel (vllm-project#19566)
* [Bugfix][2/n] Fix speculative decoding CI - Fix test_ngram_e2e_greedy_correctness (vllm-project#19644)
* [Kernel] Raise verbose error and consolidate `num_heads/num_kv_heads` divisibility check (vllm-project#19339)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
* [Benchmark] Refactor benchmark script for fp8 & int8 (vllm-project#19627)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
* Enable prefix caching with full cuda graphs (vllm-project#19617)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
* [CI/Build] Fix torch nightly CI dependencies part 2 (vllm-project#19589)
* [Misc] Remove duplicate multiproc method setting for CPU platform (vllm-project#19649)
Signed-off-by: Isotr0py <2037008807@qq.com>
* [MISC] Remove unused variables in C++ (vllm-project#19609)
Signed-off-by: Lu Fang <lufang@fb.com>
* [Bugfix][Core] Prefix caching causes incorrect outputs due to outdated ComputedBlocksTracker (vllm-project#18957)
Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
* [Misc][Frontend] passthrough `bad_words` (vllm-project#19564)
Signed-off-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
Co-authored-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
* [Misc] Fix skipped max-model-len validation when deriving max model length from tokenizer config (vllm-project#19660)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
* [TPU] support attention head dim smaller than 128 (vllm-project#19620)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
* [MISC] typo fix (vllm-project#19672)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
* [CI] Add mteb testing for rerank models (vllm-project#19344)
* [Docs] Move multiproc doc to v1 dir (vllm-project#19651)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
* [Kernel] GGUF MMVQ kernel for multiple input vectors (vllm-project#18754)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
* [BugFix] Don't catch BaseException when dumping execute_model errors (vllm-project#19626)
Signed-off-by: Nick Hill <nhill@redhat.com>
* [DOC] Add reasoning capability to vLLM streamlit code (vllm-project#19557)
* [Feature]: Allow for Granite MoE Hybrid models with _only_ shared experts. (vllm-project#19652)
Signed-off-by: Shawn Tan <shawntan@ibm.com>
* [Bugfix] Fix TP inference for Flex attention backend (vllm-project#19657)
Signed-off-by: Isotr0py <2037008807@qq.com>
* [MISC] bump huggingface_hub pkg to 0.33.0 (vllm-project#19547)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
* [Bugfix] fix missing 'finish_reason': null in streaming chat (vllm-project#19662)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
* [Kernels] Use empty for modular MoE workspaces (vllm-project#19667)
Signed-off-by: Bill Nell <bnell@redhat.com>
* [Model] Add support for MiniMaxM1ForCausalLM (shares architecture with MiniMaxText01ForCausalLM) (vllm-project#19677)
Signed-off-by: QscQ <qscqesze@gmail.com>
* [V1] Change return type on get_multimodal_embeddings() (vllm-project#19446)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
* fix
Signed-off-by: Amog Kamsetty <amogkamsetty@gmail.com>
* remove logging
Signed-off-by: Amog Kamsetty <amogkamsetty@gmail.com>
---------
Signed-off-by: raushan <raushan@huggingface.co>
Signed-off-by: Lu Fang <lufang@fb.com>
Signed-off-by: nicklucche <nlucches@redhat.com>
Signed-off-by: googs1025 <googs1025@gmail.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: reidliu41 <reid201711@gmail.com>
Signed-off-by: Varun <vsundarr@redhat.com>
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: calvin chen <120380290@qq.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Yang Wang <elainewy@meta.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Chiyue Wei <chiyuew@nvidia.com>
Signed-off-by: Povilas Kanapickas <povilas@radix.lt>
Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Signed-off-by: Xu Song <xusong.vip@gmail.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: rzou <zou3519@gmail.com>
Signed-off-by: Siqi Yan <siqi@meta.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
Signed-off-by: npanpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: Chenyaaang <chenyangli@google.com>
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
Signed-off-by: drisspg <drisspguessous@gmail.com>
Signed-off-by: Lifan Shen <lifans@meta.com>
Signed-off-by: pramkuma <Pramendra.Kumar@amd.com>
Signed-off-by: luka <luka@neuralmagic.com>
Signed-off-by: Richard Zou <zou3519@gmail.com>
Signed-off-by: Xu Wenqing <xuwq1993@qq.com>
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Signed-off-by: yZhen <yZhen@fb.com>
Signed-off-by: KsuParkhamchuk <k.parkhamchuk@gmail.com>
Signed-off-by: cr7258 <chengzw258@163.com>
Signed-off-by: Conroy Cheers <conroy@corncheese.org>
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: liusiqian <liusiqian@tal.com>
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: Anna Pendleton <pendleton@google.com>
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Yunqiu Guo <guorachel@meta.com>
Signed-off-by: jiang.li <jiang1.li@intel.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: py-andy-c <pychen1017@gmail.com>
Signed-off-by: niu_he <carlton2tang@gmail.com>
Signed-off-by: Junhao Li <junhao@ubicloud.com>
Signed-off-by: artetaout <lulala341@gmail.com>
Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com>
Signed-off-by: Runzhen Wang <wangrunzhen@gmail.com>
Signed-off-by: David Xia <david@davidxia.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Andy Xie <andy.xning@gmail.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: strutive07 <strutive07@gmail.com>
Signed-off-by: 2niuhe <carlton2tang@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: mobicham <hicham@mobiuslabs.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: qizixi <qizixi@meta.com>
Signed-off-by: Zerohertz <ohg3417@gmail.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: qingjun <qingjun@minimaxi.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Signed-off-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
Signed-off-by: Shawn Tan <shawntan@ibm.com>
Signed-off-by: QscQ <qscqesze@gmail.com>
Signed-off-by: Amog Kamsetty <amogkamsetty@gmail.com>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: CYJiang <86391540+googs1025@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: SorenDreano <71752785+SorenDreano@users.noreply.github.com>
Co-authored-by: Soren Dreano <soren@numind.ai>
Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Yong Hoon Shin <48474650+sarckk@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Yikun Jiang <yikun@apache.org>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Jiaxin Shan <seedjeffwan@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: Calvin Chen <45745657+calvin0327@users.noreply.github.com>
Co-authored-by: Kaixi Hou <kaixih@nvidia.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Xu Wenqing <121550081+Xu-Wenqing@users.noreply.github.com>
Co-authored-by: Lain <fusiyuan2000@hotmail.com>
Co-authored-by: jmswen <jmswen@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Kebe <mail@kebe7jun.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yang Wang <elainewy@meta.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Chiyue Wei <92623189+dubcyfor3@users.noreply.github.com>
Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>
Co-authored-by: Povilas Kanapickas <povilas@radix.lt>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Luis Vega <vegaluisjose@users.noreply.github.com>
Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
Co-authored-by: Jerry Zhang <jerryzh168@gmail.com>
Co-authored-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Xu Song <xusong.vip@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Jinghui Zhang <jinghuizhang0804@gmail.com>
Co-authored-by: jinghui <jinghui@fb.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
Co-authored-by: Siqi Yan <ysq0807@hotmail.com>
Co-authored-by: Siqi Yan <siqi@meta.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Yu Guo <82124926+yuguo68@users.noreply.github.com>
Co-authored-by: Nishidha <nishidha.panpaliya@partner.ibm.com>
Co-authored-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
Co-authored-by: Adolfo Victoria <adolfokarim@gmail.com>
Co-authored-by: Adolfo Victoria <adovi@meta.com>
Co-authored-by: Chenyaaang <42742451+Chenyaaang@users.noreply.github.com>
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: QiliangCui <derrhein@gmail.com>
Co-authored-by: Aaruni Aggarwal <47731267+AaruniAggarwal@users.noreply.github.com>
Co-authored-by: Driss Guessous <32754868+drisspg@users.noreply.github.com>
Co-authored-by: Lifans <draftbks@gmail.com>
Co-authored-by: pramenku <7664080+pramenku@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Akash kaothalkar <61960177+Akashcodes732@users.noreply.github.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: jennyyyyzhen <47012288+jennyyyyzhen@users.noreply.github.com>
Co-authored-by: yZhen <yZhen@fb.com>
Co-authored-by: Kseniya Parkhamchuk <43078183+KsuParkhamchuk@users.noreply.github.com>
Co-authored-by: Se7en <chengzw258@163.com>
Co-authored-by: Conroy Cheers <conroy@corncheese.org>
Co-authored-by: Michael Yao <haifeng.yao@daocloud.io>
Co-authored-by: Yinghai Lu <yinghai@thinkingmachines.ai>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: liusiqian-tal <141730978+liusiqian-tal@users.noreply.github.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
Co-authored-by: XiongfeiWei <isaacwxf23@gmail.com>
Co-authored-by: Li Wang <wangli858794774@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Anna Pendleton <pendleton@google.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: py-andy-c <37168711+py-andy-c@users.noreply.github.com>
Co-authored-by: niu_he <carlton2tang@gmail.com>
Co-authored-by: Junhao Li <junhao@ubicloud.com>
Co-authored-by: leopardracer <136604165+leopardracer@users.noreply.github.com>
Co-authored-by: artetaout <128046886+artetaout@users.noreply.github.com>
Co-authored-by: Ximingwang-09 <72070413+Ximingwang-09@users.noreply.github.com>
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
Co-authored-by: runzhen <wangrunzhen@gmail.com>
Co-authored-by: David Xia <david@davidxia.com>
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: rasmith <Randall.Smith@amd.com>
Co-authored-by: Ning Xie <andy.xning@gmail.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: wonjun Jang <strutive07@gmail.com>
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mobicham <37179323+mobicham@users.noreply.github.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: kourosh hakhamaneshi <31483498+kouroshHakha@users.noreply.github.com>
Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com>
Co-authored-by: Hyogeun Oh (오효근) <ohg3417@gmail.com>
Co-authored-by: Boyuan Feng <fby.1994@gmail.com>
Co-authored-by: qscqesze <qingjun@minimaxi.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: Saheli Bhattacharjee <47847054+sahelib25@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: maobaolong <baoloongmao@tencent.com>
Co-authored-by: Ilya Markov <markovilya197@gmail.com>
Co-authored-by: quanliu <33453350+quanliu1991@users.noreply.github.com>
Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Co-authored-by: Francesco Bertolotti <f14.bertolotti@gmail.com>
Co-authored-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
Co-authored-by: Szymon Ożóg <58388001+SzymonOzog@users.noreply.github.com>
Co-authored-by: Navanit Dubey <98005188+Navanit-git@users.noreply.github.com>
Co-authored-by: Shawn Tan <shawntan@ibm.com>
Co-authored-by: qscqesze <qscqesze@gmail.com>
1 parent add726d · commit 3def9c4
File tree: 3 files changed, +4 −3
- (file under vllm/, name not shown in this extract): +1 −1, line 584 changed
- vllm/model_executor/models/transformers.py: +2 −2, lines 314–315 changed
- vllm/v1/worker/gpu_model_runner.py: +1 −0, line 2364 added

0 commit comments