Fix #9


Merged: 881 commits, Jun 16, 2025
Commits (881)
d81edde
[Bugfix] disable processor cache (#19068)
zucchini-nlp Jun 3, 2025
d00dd65
[Doc] Improve the Pull Request template with key components (#19086)
houseroad Jun 3, 2025
4b7817c
[Misc] Add missing `_Backend` enums (#19081)
NickLucche Jun 3, 2025
d054da1
[Misc] fix: add miss best_of param validation (#18555)
googs1025 Jun 3, 2025
02f0c7b
[Misc] Add SPDX-FileCopyrightText (#19100)
simon-mo Jun 3, 2025
19bdaf3
[Doc] Readme standardization (#18695)
SorenDreano Jun 3, 2025
01eee40
[doc] update docker version (#19074)
reidliu41 Jun 3, 2025
fa98d77
[Kernel] DeepEP dispatch-combine kernel integration (#18434)
varun-sundar-rabindranath Jun 3, 2025
bdf1396
[V1] Support cross-layer KV sharing (#18212)
sarckk Jun 3, 2025
e31446b
[Perf] Tune `scaled_fp8_quant` by increasing vectorization (#18844)
mgoin Jun 3, 2025
6865fe0
Fix interaction between `Optional` and `Annotated` in CLI typing (#19…
hmellor Jun 3, 2025
6cac54f
[v1] Re-init input batch for multiple kv cache groups (#18654)
heheda12345 Jun 3, 2025
135cf55
[V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder wi…
ekagra-ranjan Jun 3, 2025
b5fd950
[Bugfix] get_num_blocks_to_allocate with null_block (#19031)
heheda12345 Jun 3, 2025
4de790f
[Bugfix]: Fix the incompatibility issue with tool_choice 'required' w…
chaunceyjiang Jun 3, 2025
5d96533
[Bugfix][P/D] Fix Prefix Cache Bug (#18411)
NickLucche Jun 3, 2025
a8da78e
[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory …
heheda12345 Jun 4, 2025
b712be9
feat: add data parallel rank to KVEventBatch (#18925)
PeaBrane Jun 4, 2025
abd7df2
[Misc] Fix path and python alias errors in disagg_prefill exmaples (#…
Jeffwan Jun 4, 2025
52dceb1
[Docs] Add developer doc about CI failures (#18782)
russellb Jun 4, 2025
4555143
[CPU] V1 support for the CPU backend (#16441)
bigPYJ1151 Jun 4, 2025
1409ef9
[Core] Cast multimodal input in hf processor (#18862)
lgeiger Jun 4, 2025
5d6d1ad
[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437)
vadiklyutiy Jun 4, 2025
8d646c2
[Cleanup][v1]:remote guided-decoding-backend for example (#19059)
calvin0327 Jun 4, 2025
41aa578
[NVIDIA] Add Cutlass MLA backend (#17625)
kaixih Jun 4, 2025
b124e10
[Bugfix] Fix FA3 full cuda graph correctness (#19106)
WoosukKwon Jun 4, 2025
3336c8c
Fix #19130 (#19132)
princepride Jun 4, 2025
8e972d9
[TPU] Skip hanging tests (#19115)
lsy323 Jun 4, 2025
2669a0d
Fix ValueError: Missing value for tag key(s): model_name,engine. (#19…
eicherseiji Jun 4, 2025
8711bc5
[Misc] Add packages for benchmark as extra dependency (#19089)
Isotr0py Jun 4, 2025
35cf32d
Improve the output precision of embedding models (#19092)
noooop Jun 4, 2025
01dc9a7
[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678)
DarkLight1337 Jun 4, 2025
02658c2
Add DeepSeek-R1-0528 function call chat template (#18874)
Xu-Wenqing Jun 4, 2025
5f2cd25
Sm100 blockwise fp8 swap ab (#18564)
IwakuraRein Jun 4, 2025
8f4ffbd
[Doc] Update V1 Guide for embedding models (#19141)
DarkLight1337 Jun 4, 2025
c8dcc15
Allow AsyncLLMEngine.generate to target a specific DP rank (#19102)
jmswen Jun 4, 2025
d459fae
[Bugfix][EP+DP] Fix internode check (#19112)
tlrmchlsmth Jun 4, 2025
53a5a0c
[Perf] Tunings for SM100 FP8 CUTLASS kernel (#18778)
mgoin Jun 4, 2025
7ee2590
[TPU] Update dynamo dump file name in compilation test (#19108)
lsy323 Jun 4, 2025
ef3f98b
[Bugfix] fix v1 cpu worker fails on macOS (#19121)
kebe7jun Jun 4, 2025
c3fd4d6
[Kernel] Integrate batched/masked deepgemm kernel (#19111)
varun-sundar-rabindranath Jun 4, 2025
23027e2
[Misc] refactor: simplify EngineCoreClient.make_async_mp_client in As…
googs1025 Jun 4, 2025
b2fac67
[P/D] Heterogeneous TP (#18833)
NickLucche Jun 4, 2025
78dcf56
[doc] small fix (#19167)
reidliu41 Jun 5, 2025
c56ed8b
[Bugfix][Nixl] Fix full prefix cache hit bug (#18632)
robertgshaw2-redhat Jun 5, 2025
a408820
[Bugfix] Fix port handling in make_zmq_path (#19117)
mgoin Jun 5, 2025
25b918e
[Torch Nightly]add missing dependency (#18770)
yangw-dev Jun 5, 2025
0678b52
Handle non-serializable objects when dumping benchmark results (#19114)
huydhn Jun 5, 2025
af7fc84
[BugFix][Minor] Fix full cuda graph bug when max_num_seqs < 512 (#19171)
WoosukKwon Jun 5, 2025
8fc5750
[Bugfix]: Fix the incompatibility issue with stream when Thinking is …
chaunceyjiang Jun 5, 2025
da40380
[Build] Annotate wheel and container path for release workflow (#19162)
simon-mo Jun 5, 2025
1809308
[Misc] Remove unnecessary fallback to prefill-decode attention (#19138)
vllmellm Jun 5, 2025
188a459
[Misc] Do not override NCCL_CUMEM_ENABLE if set explicitly (#19105)
22quinn Jun 5, 2025
1aeb925
[Frontend] improve vllm run-batch --help display (#19187)
reidliu41 Jun 5, 2025
9bc8bb0
[Bugfix] properly catch PIL-related errors for vision models when inc…
gcalmettes Jun 5, 2025
f20f9f0
[mistral_common] Add v11 tokenizer (#19193)
patrickvonplaten Jun 5, 2025
ec89524
Add H20-3e fused MoE kernel tuning configs for DeepSeek-R1/V3 (#19205)
Xu-Wenqing Jun 5, 2025
61059be
[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110)
dubcyfor3 Jun 5, 2025
85e2b7b
[MISC][Bugfix] Use less CPU when message queue has been empty for som…
p12tic Jun 5, 2025
9ef9173
[P/D][NixlConnector] Enable FlashInfer backend (#19090)
NickLucche Jun 5, 2025
aa49f14
[Quantization] Skip Fp4 Test for `compressed-tensors` (#19217)
dsikka Jun 5, 2025
8736030
[V1] Use FlashInfer by default on Blackwell GPUs (#19118)
mgoin Jun 5, 2025
cb6d572
[Model] NemotronH support (#18863)
vegaluisjose Jun 5, 2025
c8134be
Fix AOPerModuleConfig name changes (#18869)
jerryzh168 Jun 6, 2025
3465b87
[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033)
benchislett Jun 6, 2025
f8a1a2d
[v1] Hybrid Memory Allocator (#17996)
heheda12345 Jun 6, 2025
b61dc5f
[TPU] update torch_xla pin (#19231)
yaochengji Jun 6, 2025
3da2313
Support allowed_token_ids in ChatCompletionRequest (#19143)
xu-song Jun 6, 2025
91a2ef9
[Chore] update CODEOWNERS (#19247)
aarnphm Jun 6, 2025
90b78ec
[v1][P/D] Fix a edge case in kv cache schedule (#19182)
KingsleyZhang123 Jun 6, 2025
0d49483
[TPU] fix kv cache dtype in model runner (#19244)
yaochengji Jun 6, 2025
9487035
[Quantization] Bump compressed-tensors version; update NVFP4A16 test …
dsikka Jun 6, 2025
65c6944
[Docs] Improve V1 KVConnector interface documentation (#19172)
njhill Jun 6, 2025
da511d5
Fix CompilationConfig repr (#19091)
zou3519 Jun 6, 2025
f168b85
Unit Test for run_dp_sharded_vision_model (#19103)
cryptopic Jun 6, 2025
7661e92
[Model] Optimize nemotron_h implementation (#19249)
jeejeelee Jun 6, 2025
7353492
[Core] Raise when non-multi-instance DP clients target a DP rank (#19…
jmswen Jun 6, 2025
8267f99
improve logits bias (#19041)
yuguo68 Jun 6, 2025
94ecee6
Fixed ppc build when it runs on non-RHEL based linux distros (#18422)
npanpaliya Jun 6, 2025
aad30bd
[BugFix] Fix MultiConnector test after HMA changes (#19291)
njhill Jun 6, 2025
ca27f0f
[Bugfix][Core] Update cancellation logic in `generate()` to handle Ge…
Adolfo-Karim Jun 6, 2025
b6a3a9f
[Core] Fix abrupt request abort (#18485)
NickLucche Jun 6, 2025
46ecc57
[BugFix] Fix tpu_model_runner block_id concatenation (#19228)
njhill Jun 6, 2025
441b65d
[Misc][Tools][Benchmark] Fix and improve auto tune script (#19163)
Chenyaaang Jun 6, 2025
e010688
[Build][ROCm] Update Dockerfile.rocm (#19296)
Alexei-V-Ivanov-AMD Jun 6, 2025
6e0cd10
[Easy][Test] Simplify test_function_tool_use with multiple parametriz…
houseroad Jun 7, 2025
84166fe
[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762)
ElizaWszola Jun 7, 2025
66c508b
[TPU][Test] Add script to run benchmark on TPU for buildkite (#19039)
QiliangCui Jun 7, 2025
c4296b1
[CI][PowerPC] Use a more appropriate way to select testcase in tests/…
AaruniAggarwal Jun 7, 2025
cf02f9b
Add FlexAttention to V1 (#16078)
drisspg Jun 7, 2025
122cdca
[Misc] refactor context extension (#19246)
reidliu41 Jun 7, 2025
d2f0e7e
[CI/Build] Improve Llama GGUF test robustness (#19287)
Isotr0py Jun 7, 2025
4e4f63a
[Nit][Benchmark]Fix example in benchmark_serving_structured_output.py…
draftbk Jun 7, 2025
88be823
[AMD] Update compatible packaging version (#19309)
pramenku Jun 7, 2025
2d8476e
[BugFix][V1] Fix memory profiling bug (#18974)
ProExpertProg Jun 7, 2025
d77f7fb
[Bugfix]: Fix TypeError: 'float' object cannot be interpreted as an i…
chaunceyjiang Jun 8, 2025
eaa2e51
[Bugfix] Re-enable use_cudagraph in vLLM v1 (#19299)
zou3519 Jun 8, 2025
3d64d36
[Misc] Change tests/compile to use VLLM_V1 by default (#19302)
zou3519 Jun 8, 2025
989dcee
Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B (#19315)
Xu-Wenqing Jun 8, 2025
b9a1791
[Hardware][POWER] Add IBM POWER11 Support to CPU Extension Detection …
Akashcodes732 Jun 8, 2025
c123bc3
[Quantization] Add compressed-tensors NVFP4 support (#18312)
dsikka Jun 8, 2025
cda10fa
[Multi Modal] Add an env var for message queue max chunk bytes (#19242)
jennyyyyzhen Jun 8, 2025
2ffb9b6
[Bugfix] model_max_length should consider max_model_len in tokenizer_…
noooop Jun 8, 2025
e31ae3d
[Deprecation] Remove `inputs` arg fallback in Engine classes (#18799)
DarkLight1337 Jun 9, 2025
e1c4380
[Misc] Add documentation update reminder to PR template (#19289)
Isotr0py Jun 9, 2025
8335667
[Frontend] Remove unreachable code from llm.py (#19288)
KsuParkhamchuk Jun 9, 2025
3a4d417
[Misc] Cleanup compilation tests (#19343)
zou3519 Jun 9, 2025
12e5829
[doc] improve ci doc (#19307)
reidliu41 Jun 9, 2025
0eca5ea
[Doc] Fix description in the Automatic Prefix Caching design doc (#19…
cr7258 Jun 9, 2025
95a6568
[CI/Build] Fix LoRA test (#19350)
jeejeelee Jun 9, 2025
59abbd8
[Fix] Allow kernel compilation for CUDA capability 8.7 (#19328)
conroy-cheers Jun 9, 2025
01810f9
[CI] Introduce rules for llama auto-label (#19323)
houseroad Jun 9, 2025
c57c941
[Docs] Fix a bullet list in usage/security.md (#19358)
windsonsea Jun 9, 2025
770e5dc
[full_graph] Fix query_start_loc padding (#19321)
yinghai Jun 9, 2025
b808919
[v1] Add fp32 support to v1 engine through flex attn (#19319)
Isotr0py Jun 9, 2025
5cf2dae
[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#1…
varun-sundar-rabindranath Jun 9, 2025
c1c7dbb
[Bugfix][Core] Prevent token lengths exceeding `max_model_len` in V0 …
22quinn Jun 9, 2025
ebb2f38
[Quantization] Bump compressed-tensors version (#19295)
kylesayrs Jun 9, 2025
31f58be
[Frontend] Make TIMEOUT_KEEP_ALIVE configurable through env var (#18472)
liusiqian-tal Jun 9, 2025
7d44c46
[TPU]Fix KV cache sharing tests (#19371)
lsy323 Jun 9, 2025
8058c91
[HOT-FIX] Add `kv_sharing_target_layer_name` argument to cutlass_mla …
pavanimajety Jun 9, 2025
3a7cd62
[Misc] Fix a config typo in disable_hybrid_kv_cache_manager configura…
lsy323 Jun 9, 2025
cc867be
[V1] Reuse V0's memory_profiling util for gpu worker memory profiling…
yeqcharlotte Jun 10, 2025
4589b94
[Bugfix] Fix benchmark_moe.py (#19016)
gty111 Jun 10, 2025
9af6d22
Use xla flag to improve the quantized model performance (#19303)
vanbasten23 Jun 10, 2025
c016047
Fix docs/mkdocs/hooks/remove_announcement.py (#19382)
hmellor Jun 10, 2025
6cd4ae8
[Frontend] Add tqdm_leave_pbar to control progress bar visibility (#1…
reidliu41 Jun 10, 2025
646d62f
[Core] Use tuple for kv cache group block ids (#19175)
njhill Jun 10, 2025
1efef71
[Bugfix] Fix modelscope token passed in (#19389)
Potabk Jun 10, 2025
319cb1e
[Core] Batch multi modal input using pinned memory (#19169)
lgeiger Jun 10, 2025
a3f66e7
Add security warning to bug report template (#19365)
russellb Jun 10, 2025
6b1391c
[Misc] refactor neuron_multimodal and profiling (#19397)
reidliu41 Jun 10, 2025
32b3946
Add clear documentation around the impact of debugging flag (#19369)
annapendleton Jun 10, 2025
9368cc9
Automatically bind CPU OMP Threads of a rank to CPU ids of a NUMA nod…
louie-tsai Jun 10, 2025
5f1ac1e
Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404)
Isotr0py Jun 10, 2025
467bef1
[BugFix][FlashInfer] Fix attention backend interface mismatch with un…
YUNQIUGUO Jun 10, 2025
e424884
[BugFix][CPU] Fix CPU CI by ignore collecting test_pixtral (#19411)
bigPYJ1151 Jun 10, 2025
64a9af5
Simplify ep kernels installation (#19412)
youkaichao Jun 10, 2025
b6553be
[Misc] Slight improvement of the BNB (#19418)
jeejeelee Jun 10, 2025
da9b523
[Docs] Note that alternative structured output backends are supported…
russellb Jun 10, 2025
5241ca5
[ROCm][V1] Adding ROCm to the list of plaforms using V1 by default (#…
gshtras Jun 10, 2025
33f8dba
[Model] use AutoWeightsLoader for commandr (#19399)
py-andy-c Jun 10, 2025
22c3c0a
Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B-FP8 (#…
Xu-Wenqing Jun 10, 2025
77f0d46
[BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 (#19390)
zou3519 Jun 10, 2025
3952731
[New Model]: Support Qwen3 Embedding & Reranker (#19260)
noooop Jun 11, 2025
a45b979
[BugFix] Fix docker build cpu-dev image error (#19394)
2niuhe Jun 11, 2025
2b1e211
Fix test_max_model_len in tests/entrypoints/llm/test_generate.py (#19…
houseroad Jun 11, 2025
1e473b3
[CI] Disable failing GGUF model test (#19454)
mgoin Jun 11, 2025
96ada38
[Misc] Remove unused `MultiModalHasher.hash_prompt_mm_data` (#19422)
lgeiger Jun 11, 2025
2d40665
Add fused MOE config for Qwen3 30B A3B on B200 (#19455)
0xjunhao Jun 11, 2025
7c644ab
Fix Typo in Documentation and Function Name (#19442)
leopardracer Jun 11, 2025
5039ec2
[ROCm] Add rules to automatically label ROCm related PRs (#19405)
houseroad Jun 11, 2025
b8e809a
[Kernel] Support deep_gemm for linear methods (#19085)
artetaout Jun 11, 2025
68b4a26
[Doc] Update V1 User Guide for Hardware and Models (#19474)
DarkLight1337 Jun 11, 2025
a5115f4
[Doc] Fix quantization link titles (#19478)
DarkLight1337 Jun 11, 2025
29a38f0
[Doc] Support "important" and "announcement" admonitions (#19479)
DarkLight1337 Jun 11, 2025
871d6b7
[Misc] Reduce warning message introduced in env_override (#19476)
houseroad Jun 11, 2025
a2142f0
Support non-string values in JSON keys from CLI (#19471)
DarkLight1337 Jun 11, 2025
7484e1f
Add cache to cuda get_device_capability (#19436)
mgoin Jun 11, 2025
3c8694e
Fix some typo (#19475)
Ximingwang-09 Jun 11, 2025
5c8d34a
Support no privileged mode on CPU for docker and kubernetes deploymen…
louie-tsai Jun 11, 2025
943ffa5
[Bugfix] Update the example code, make it work with the latest lmcach…
runzhen Jun 11, 2025
497a91e
[CI] Update FlashInfer to 0.2.6.post1 (#19297)
mgoin Jun 11, 2025
89b0f84
[doc] fix "Other AI accelerators" getting started page (#19457)
davidxia Jun 11, 2025
04a5561
[Misc] Fix misleading ROCm warning (#19486)
jeejeelee Jun 11, 2025
b2d9be6
[Docs] Remove WIP features in V1 guide (#19498)
WoosukKwon Jun 11, 2025
29fa5ca
[Kernels] Add activation chunking logic to FusedMoEModularKernel (#19…
bnellnm Jun 11, 2025
c7ea0b5
[AMD] [Quantization] Add override flag for attention dtype instead of…
rasmith Jun 11, 2025
97a9465
[UX] Add Feedback During CUDAGraph Capture (#19501)
robertgshaw2-redhat Jun 11, 2025
42f52cc
[CI/Build] Fix torch nightly CI dependencies (#19505)
zou3519 Jun 11, 2025
2f1c19b
[CI] change spell checker from codespell to typos (#18711)
andyxning Jun 12, 2025
e5d35d6
[BugFix] Force registration of w8a8_block_fp8_matmul_deepgemm via laz…
varun-sundar-rabindranath Jun 12, 2025
3f6341b
Add Triton Fused MoE kernel config for E=16 on B200 (#19518)
b8zhong Jun 12, 2025
7e3e74c
[Frontend] Improve error message in tool_choice validation (#19239)
22quinn Jun 12, 2025
d5bdf89
[BugFix] Work-around incremental detokenization edge case error (#19449)
njhill Jun 12, 2025
1b0b065
[BugFix] Handle missing sep_token for Qwen3-Reranker in Score API (#1…
strutive07 Jun 12, 2025
2e090bd
[AMD][Kernel][BugFix] fix test_rocm_compressed_tensors_w8a8 for rocm …
rasmith Jun 12, 2025
dff6800
Fix typo (#19525)
2niuhe Jun 12, 2025
4f6c42f
[Security] Prevent new imports of (cloud)pickle (#18018)
russellb Jun 12, 2025
af09b3f
[Bugfix][V1] Allow manual FlashAttention for Blackwell (#19492)
mgoin Jun 12, 2025
c9280e6
[Bugfix] Respect num-gpu-blocks-override in v1 (#19503)
jmswen Jun 12, 2025
73e2e01
[Quantization] Improve AWQ logic (#19431)
jeejeelee Jun 12, 2025
c742438
[Doc] Add V1 column to supported models list (#19523)
DarkLight1337 Jun 12, 2025
1129e2b
[V1][NixlConnector] Drop `num_blocks` check (#19532)
NickLucche Jun 12, 2025
b6efafd
[Perf] Vectorize static / dynamic INT8 quant kernels (#19233)
yewentao256 Jun 12, 2025
96846bb
Fix TorchAOConfig skip layers (#19265)
mobicham Jun 12, 2025
f98548b
[torch.compile][ROCm] Fuse quantization onto attention using a torch.…
ProExpertProg Jun 12, 2025
4b25ab1
[doc] Make top navigation sticky (#19540)
reidliu41 Jun 12, 2025
017ef64
[Spec Decode][Benchmark] Generalize spec decode offline benchmark to …
ekagra-ranjan Jun 12, 2025
9d880f5
[Misc] Turn MOE_DP_CHUNK_SIZE into an env var (#19506)
varun-sundar-rabindranath Jun 12, 2025
a3319f4
[Bugfix] Enforce contiguous input for dynamic_per_token FP8/INT8 quan…
mgoin Jun 12, 2025
dba68f9
[Doc] Unify structured outputs examples (#18196)
aarnphm Jun 12, 2025
c57bb19
[V1] Resolve failed concurrent structured output requests (#19565)
russellb Jun 12, 2025
e6aab5d
Revert "[Build/CI] Add tracing deps to vllm container image (#15224)"…
kouroshHakha Jun 13, 2025
e3b1266
[BugFix] : Fix Batched DeepGemm Experts (#19515)
varun-sundar-rabindranath Jun 13, 2025
c68698b
[Bugfix] Fix EAGLE vocab embedding for multimodal target model (#19570)
zixi-qi Jun 13, 2025
7b3c9ff
[Doc] uses absolute links for structured outputs (#19582)
aarnphm Jun 13, 2025
c707cfc
[doc] fix incorrect link (#19586)
reidliu41 Jun 13, 2025
bb4a0de
[Misc] Correct broken docs link (#19553)
Zerohertz Jun 13, 2025
6458721
[CPU] Refine default config for the CPU backend (#19539)
bigPYJ1151 Jun 13, 2025
ace5cda
[Fix] bump mistral common to support magistral (#19533)
princepride Jun 13, 2025
cefdb99
[Fix] The zip function in Python 3.9 does not have the strict argumen…
princepride Jun 13, 2025
ce688ad
use base version for version comparison (#19587)
BoyuanFeng Jun 13, 2025
d70bc7c
[torch.compile] reorganize the cache directory to support compiling m…
youkaichao Jun 13, 2025
7e8d97d
[BugFix] Honor `enable_caching` in connector-delayed kvcache load cas…
njhill Jun 13, 2025
a24cb91
[Model] Fix minimax model cache & lm_head precision (#19592)
qscqesze Jun 13, 2025
ce9dc02
[Refactor] Remove unused variables in `moe_permute_unpermute_kernel.i…
yewentao256 Jun 13, 2025
1015296
[doc][mkdocs] fix the duplicate Supported features sections in GPU d…
reidliu41 Jun 13, 2025
3597b06
[CUDA] Enable full cudagraph for FlashMLA (#18581)
ProExpertProg Jun 13, 2025
0f08745
[Doc] Add troubleshooting section to k8s deployment (#19377)
annapendleton Jun 13, 2025
aafbbd9
[torch.compile] Use custom ops when use_inductor=False (#19618)
WoosukKwon Jun 13, 2025
d65668b
Adding "AMD: Multi-step Tests" to amdproduction. (#19508)
Concurrensee Jun 14, 2025
bd517eb
[BugFix] Fix DP Coordinator incorrect debug log message (#19624)
njhill Jun 14, 2025
d1e34cc
[V1][Metrics] Deprecate metrics with gpu_ prefix for non GPU specific…
sahelib25 Jun 14, 2025
06be858
[Bugfix] Fix the speculative decoding test by setting the target dtyp…
houseroad Jun 14, 2025
6fa718a
[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593)
reidliu41 Jun 14, 2025
2db9044
[Bugfix] Fix auto dtype casting for BatchFeature (#19316)
Isotr0py Jun 14, 2025
294fc1e
[Hardware][NVIDIA][kernel] Fp4 MOE quant kernel optimization (#19500)
jiahanc Jun 14, 2025
bc956b3
Only build CUTLASS MoE kernels on Hopper (#19648)
huydhn Jun 14, 2025
861a0a0
[Bugfix] Don't attempt to use triton if no driver is active (#19561)
kzawora-intel Jun 14, 2025
0850001
[Fix] Convert kv_transfer_config from dict to KVTransferConfig (#19262)
maobaolong Jun 14, 2025
e13945f
[Perf] Further tunings for SM100 FP8 CUTLASS kernel (#19566)
ilmarkov Jun 15, 2025
ee1531b
[Bugfix][2/n] Fix speculative decoding CI - Fix test_ngram_e2e_greedy…
houseroad Jun 15, 2025
0b73736
[Kernel] Raise verbose error and consolidate `num_heads/num_kv_heads`…
22quinn Jun 15, 2025
3d330c4
[Benchmark] Refactor benchmark script for fp8 & int8 (#19627)
yewentao256 Jun 15, 2025
055915e
Enable prefix caching with full cuda graphs (#19617)
WoosukKwon Jun 15, 2025
91b2c17
[CI/Build] Fix torch nightly CI dependencies part 2 (#19589)
zou3519 Jun 15, 2025
a5e7242
[Misc] Remove duplicate multiproc method setting for CPU platform (#1…
Isotr0py Jun 16, 2025
c6703d1
[MISC] Remove unused variableds in C++ (#19609)
houseroad Jun 16, 2025
92183b4
[Bugfix][Core] Prefix caching causes incorrect outputs due to outdate…
quanliu1991 Jun 16, 2025
367871a
[Misc][Frontend] passthrough `bad_words` (#19564)
f14-bertolotti Jun 16, 2025
b692e9c
[Misc] Fix skipped max-model-len validation when deriving max model l…
yeqcharlotte Jun 16, 2025
a77aea5
[TPU] support attention head dim smaller than 128 (#19620)
yaochengji Jun 16, 2025
26bc46e
[MISC] typo fix (#19672)
andyxning Jun 16, 2025
f40f763
[CI] Add mteb testing for rerank models (#19344)
noooop Jun 16, 2025
8d12070
[Docs] Move multiproc doc to v1 dir (#19651)
russellb Jun 16, 2025
dec66d2
[Kernel] GGUF MMVQ kernel for multiple input vectors (#18754)
SzymonOzog Jun 16, 2025
ee35e96
[BugFix] Don't catch BaseException when dumping execute_model errors …
njhill Jun 16, 2025
3e75069
[DOC] Add reasoning capability to vLLM streamlit code (#19557)
Navanit-git Jun 16, 2025
4d54240
[Feature]:Allow for Granite MoE Hybrid models with _only_ shared expe…
shawntan Jun 16, 2025
1173804
[Bugfix] Fix TP inference for Flex attention backend (#19657)
Isotr0py Jun 16, 2025
c3fec47
[MISC] bump huggingface_hub pkg to 0.33.0 (#19547)
andyxning Jun 16, 2025
836d4ce
[Bugfix] fix missing 'finish_reason': null in streaming chat (#19662)
chaunceyjiang Jun 16, 2025
5e5baa9
[Kernels] Use empty for modular MoE workspaces (#19667)
bnellnm Jun 16, 2025
387bdf0
[Model] Add support for MiniMaxM1ForCausalLM (shares architecture wit…
qscqesze Jun 16, 2025
90f9c2e
[V1] Change return type on get_multimodal_embeddings() (#19446)
russellb Jun 16, 2025
eb059d5
Merge branch 'main' of github.com:vllm-project/vllm
amogkam Jun 16, 2025
27f3870
Merge branch 'main' of github.com:character-tech/vllm
amogkam Jun 16, 2025
b2337e6
Merge branch 'main' of github.com:character-tech/vllm
amogkam Jun 16, 2025
3c75f08
fix
amogkam Jun 16, 2025
1 change: 0 additions & 1 deletion vllm/engine/multiprocessing/client.py
```diff
@@ -484,7 +484,6 @@ def generate(
                            trace_headers, prompt_adapter_request,
                            priority)

-    @overload
     def encode(
         self,
         prompt: PromptType,
```
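The single-line change removes a stray `@overload` decorator from the concrete `encode` implementation. In Python's `typing` module, `@overload`-decorated definitions are stubs intended only for type checkers: at runtime the decorator rebinds the name to a dummy that raises `NotImplementedError` when called, so leaving `@overload` on the actual implementation makes the method unusable. A minimal sketch of the correct pattern (the `encode` signatures here are illustrative, not vLLM's actual API):

```python
from typing import Union, overload

# Stub signatures: seen only by type checkers, bodies never executed.
@overload
def encode(prompt: str) -> list:
    ...

@overload
def encode(prompt: list) -> list:
    ...

# The real implementation must NOT carry @overload; if it did, calling
# encode() at runtime would raise NotImplementedError.
def encode(prompt: Union[str, list]) -> list:
    if isinstance(prompt, str):
        return [ord(c) for c in prompt]
    return [[ord(c) for c in p] for p in prompt]

print(encode("ab"))        # [97, 98]
print(encode(["a", "b"]))  # [[97], [98]]
```

Type checkers match call sites against the stub signatures, while all calls dispatch to the single undecorated implementation at runtime.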