[MFM-2025-02-21] Merge main to llama fp8, DeepSeekV3 and PTPC-FP8 #445

Merged — 1,120 commits, Feb 25, 2025
Changes shown from 1 commit
64862d1
[ROCM][AMD][TRITON] Halving warps number for fw_prefill to reduce spi…
maleksan85 Feb 5, 2025
249824c
Refactor `Linear` handling in `TransformersModel` (#12727)
hmellor Feb 5, 2025
98fd089
[VLM] Add MLA with pure RoPE support for deepseek-vl2 models (#12729)
Isotr0py Feb 5, 2025
686006a
[Misc] Bump the compressed-tensors version (#12736)
dsikka Feb 5, 2025
7ff7a63
[Model][Quant] Fix GLM, Fix fused module mappings for quantization (#…
kylesayrs Feb 5, 2025
58b218d
[Doc] Update PR Reminder with link to Developer Slack (#12748)
mgoin Feb 5, 2025
fcf2e3d
[Bugfix] Fix OpenVINO model runner (#12750)
hmellor Feb 5, 2025
3d09e59
[V1][Misc] Shorten `FinishReason` enum and use constant strings (#12760)
njhill Feb 5, 2025
c53dc46
[Doc] Remove performance warning for auto_awq.md (#12743)
mgoin Feb 5, 2025
022bcc7
[Bugfix] Fix 'ModuleNotFoundError: No module named 'intel_extension_f…
Akashcodes732 Feb 5, 2025
bc1bdec
[core][distributed] exact ray placement control (#12732)
youkaichao Feb 5, 2025
f65ecc9
The code assumes WARP_SIZE to be equal to 32, which is not the case o…
gshtras Feb 5, 2025
4c3aac5
Merging PR #12536
heheda12345 Feb 5, 2025
af8486d
[Hardware][Intel-Gaudi] Enable FusedSDPA support for Intel Gaudi (HPU)
SanjuCSudhakaran Feb 5, 2025
3b2005e
Add: Support for Sparse24Bitmask Compressed Models
rahul-tuli Feb 5, 2025
a4ce74c
[VLM] Use shared field to pass token ids to model
DarkLight1337 Feb 5, 2025
9a5b155
[Docs] Drop duplicate [source] links
russellb Feb 5, 2025
bf3b79e
[VLM] Qwen2.5-VL
ywang96 Feb 5, 2025
75404d0
[VLM] Update compatibility with transformers 4.49
DarkLight1337 Feb 6, 2025
5b19b93
[ROCm][Kernel] Using the correct warp_size value
gshtras Feb 6, 2025
76abd0c
[Bugfix] Better FP8 supported defaults
LucasWilkinson Feb 6, 2025
9cdea30
[Misc][Easy] Remove the space from the file name
houseroad Feb 6, 2025
d88506d
[Model] LoRA Support for Ultravox model (#11253)
thedebugger Feb 6, 2025
56534cd
[Bugfix] Fix the test_ultravox.py's license (#12806)
houseroad Feb 6, 2025
1a6fcad
Improve `TransformersModel` UX (#12785)
hmellor Feb 6, 2025
449d1bc
[Misc] Remove duplicated DeepSeek V2/V3 model definition (#12793)
mgoin Feb 6, 2025
0408efc
[Misc] Improve error message for incorrect pynvml (#12809)
youkaichao Feb 6, 2025
7ca9934
[Misc] Update w2 scale loading for GPTQMarlinMoE (#12757)
dsikka Feb 6, 2025
cefd56e
[Docs] Add Google Cloud Slides (#12814)
simon-mo Feb 6, 2025
c786e75
[Attention] Use FA3 for MLA on Hopper (#12807)
LucasWilkinson Feb 6, 2025
e152f29
[misc] Reduce number of config file requests to HuggingFace (#12797)
khluu Feb 6, 2025
13434bd
Update README.md 20250205_aiter (#407)
arakowsk-amd Feb 6, 2025
1e57b1e
[Misc] Remove unnecessary decode call (#12833)
DarkLight1337 Feb 6, 2025
85ac82d
[Kernel] Make rotary_embedding ops more flexible with input shape (#1…
Isotr0py Feb 6, 2025
09b95e3
[torch.compile] PyTorch 2.6 and nightly compatibility (#12393)
youkaichao Feb 6, 2025
afe74f7
[Doc] double quote cmake package in build.inc.md (#12840)
jitseklomp Feb 6, 2025
8108ac8
[Bugfix] Fix unsupported FA version check for Turing GPU (#12828)
Isotr0py Feb 6, 2025
467a96a
[V1] LoRA Support (#10957)
varun-sundar-rabindranath Feb 6, 2025
aff4045
Add Bamba Model (#10909)
fabianlim Feb 6, 2025
741429a
[MISC] Check space in the file names in the pre commit checks (#12804)
houseroad Feb 6, 2025
b260782
[misc] Revert # 12833 (#12857)
khluu Feb 7, 2025
ef533d2
[Bugfix] FA2 illegal memory access (#12848)
LucasWilkinson Feb 7, 2025
433c4a4
Make vllm compatible with verl (#12824)
ZSL98 Feb 7, 2025
aa375dc
[Bugfix] Missing quant_config in deepseek embedding layer (#12836)
SzymonOzog Feb 7, 2025
6e1fc61
Prevent unecessary requests to huggingface hub (#12837)
maxdebayser Feb 7, 2025
1918aa1
[MISC][EASY] Break check file names into entry and args in the pre-co…
houseroad Feb 7, 2025
ce26b16
[Misc] Remove unnecessary detokenization in multimodal processing (#1…
DarkLight1337 Feb 7, 2025
538fab9
PR #12718 (#12718)
garg-amit Feb 7, 2025
0630d45
[V1] Logprobs and prompt logprobs support (#9880)
afeldman-nm Feb 7, 2025
eaa92d4
[ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activa…
tjtanaa Feb 7, 2025
3f610f0
fix rocm get_device name for moe configs (#359)
divakar-amd Feb 7, 2025
932c6b7
[V1] LM Eval With Streaming Integration Tests (#11590)
robertgshaw2-redhat Feb 7, 2025
45cbc49
[Bugfix] Fix disagg hang caused by the prefill and decode communicati…
houseroad Feb 8, 2025
b21f0f9
[V1][Minor] Remove outdated comment (#12928)
WoosukKwon Feb 8, 2025
3243158
[V1] Move KV block hashes from Request to KVCacheManager (#12922)
WoosukKwon Feb 8, 2025
306923d
[Bugfix] Fix Qwen2_5_VLForConditionalGeneration packed_modules_mappin…
jeejeelee Feb 8, 2025
cc01223
[Misc] Fix typo in the example file (#12896)
DK-DARKmatter Feb 8, 2025
d01f66b
[Bugfix] Fix multi-round chat error when mistral tokenizer is used (#…
zifeitong Feb 8, 2025
91dd8f7
[bugfix] respect distributed_executor_backend in world_size=1 (#12934)
youkaichao Feb 8, 2025
e31498b
[Misc] Add offline test for disaggregated prefill (#12418)
Shaoting-Feng Feb 8, 2025
4ea48fb
[V1][Minor] Move cascade attn logic outside _prepare_inputs (#12943)
WoosukKwon Feb 8, 2025
407b553
[Build] Make pypi install work on CPU platform (#12874)
wangxiyuan Feb 8, 2025
2880e21
[Hardware][Intel-Gaudi] Enable long-contexts + LoRA support for Intel…
SanjuCSudhakaran Feb 8, 2025
7e18376
[misc] Add LoRA to benchmark_serving (#12898)
varun-sundar-rabindranath Feb 8, 2025
011e612
[Misc] Log time consumption on weight downloading (#12926)
waltforme Feb 8, 2025
c45d398
[CI] Resolve transformers-neuronx version conflict (#12925)
liangfu Feb 8, 2025
256a2d2
[Doc] Correct HF repository for TeleChat2 models (#12949)
waltforme Feb 8, 2025
4c8dd12
[Misc] Add qwen2.5-vl BNB support (#12944)
Isotr0py Feb 8, 2025
8a69e0e
[CI/Build] Auto-fix Markdown files (#12941)
DarkLight1337 Feb 8, 2025
913df14
[Bugfix] Remove unused seq_group_metadata_list from ModelInputForGPU …
ShangmingCai Feb 8, 2025
fe743b7
[bugfix] fix early import of flash attention (#12959)
youkaichao Feb 8, 2025
86222a3
[VLM] Merged multi-modal processor for GLM4V (#12449)
jeejeelee Feb 8, 2025
870c374
[V1][Minor] Remove outdated comment (#12968)
WoosukKwon Feb 8, 2025
d366ccc
[RFC] [Mistral] FP8 format (#10130)
patrickvonplaten Feb 8, 2025
24700c3
[V1] Cache `uses_mrope` in GPUModelRunner (#12969)
WoosukKwon Feb 8, 2025
cf797aa
[core] port pynvml into vllm codebase (#12963)
youkaichao Feb 9, 2025
29f1d47
[MISC] Always import version library first in the vllm package (#12979)
houseroad Feb 9, 2025
59fff4a
[core] improve error handling when wake up from sleep mode (#12981)
youkaichao Feb 10, 2025
aa0ca5e
[core][rlhf] add colocate example for RLHF (#12984)
youkaichao Feb 10, 2025
67c4637
[V1] Use msgpack for core request serialization (#12918)
njhill Feb 10, 2025
44607e0
Check if selected backend is None in get_attn_backend_cls() (#12975)
terrytangyuan Feb 10, 2025
b2496bb
[core] fix sleep mode and pytorch checkpoint compatibility (#13001)
youkaichao Feb 10, 2025
2431371
[Doc] Add link to tool_choice tracking issue in tool_calling.md (#13003)
terrytangyuan Feb 10, 2025
fde7126
[misc] Add retries with exponential backoff for HF file existence che…
khluu Feb 10, 2025
51f0b5f
[Bugfix] Clean up and fix multi-modal processors (#13012)
DarkLight1337 Feb 10, 2025
2ae8890
Fix seed parameter behavior in vLLM (#13007)
SmartManoj Feb 10, 2025
29499bb
Fixing the output formatting (#414)
gshtras Feb 10, 2025
08b2d84
[Model] Ultravox Model: Support v0.5 Release (#12912)
farzadab Feb 10, 2025
6a0deb7
Merge remote-tracking branch 'upstream/main'
gshtras Feb 10, 2025
91e8767
[misc] Fix setup.py condition to avoid AMD from being mistaken with C…
khluu Feb 11, 2025
2ff4857
[V1][Minor] Move scheduler outputs to a separate file (#13062)
WoosukKwon Feb 11, 2025
2c0f582
[Docs] Annouce Meta Meetup (#13065)
simon-mo Feb 11, 2025
cb080f3
[Bugfix] Support missing tool parameters in mistral tokenizer (#12884)
fgreinacher Feb 11, 2025
58047c6
[Benchmark] Add BurstGPT to benchmark_serving (#13063)
WoosukKwon Feb 11, 2025
c320ca8
[Core] Don't do platform detection at import time (#12933)
russellb Feb 11, 2025
78a141d
[Misc] LoRA - Refactor Punica ops tests (#12970)
varun-sundar-rabindranath Feb 11, 2025
fc6485d
[Bugfix]: Reasoning output bug according to the chat template change …
gaocegege Feb 11, 2025
41c5dd4
[V1][Metrics] Add GPU prefix cache hit rate % gauge (#12592)
comaniac Feb 11, 2025
9cf4759
[executor] init `local_rank` as device index (#13027)
MengqingCao Feb 11, 2025
7539bbc
[ROCm] Using a more precise memory profiling (#12624)
gshtras Feb 11, 2025
da31719
[Build] Fix cuda link target of cumem_allocator in CPU env (#12863)
guoyuhong Feb 11, 2025
2e3b969
[Platform] add pre_register_and_update function (#12432)
wangxiyuan Feb 11, 2025
110f59a
[Bugfix] fix flaky test (#13089)
SmartManoj Feb 11, 2025
75e6e14
[V1][Metrics] Add several request timing histograms (#12644)
markmc Feb 11, 2025
ad97763
Set `torch_dtype` in `TransformersModel` (#13088)
hmellor Feb 11, 2025
bf3e052
[Misc] Fix typo at comments at metrics.py (#13024)
je1lee Feb 11, 2025
21f5d50
[Bugfix] Do not use resource module on Windows (#12858) (#13029)
MoonRide303 Feb 11, 2025
6c4dbe2
[BugFix] Pop instead of del CUDA_VISIBLE_DEVICES (#12962)
HollowMan6 Feb 11, 2025
2b25b7d
Fix initializing GGUF weights for ColumnParallelLinear when using ten…
SzymonOzog Feb 11, 2025
e2dc610
Add tuned moe config for qwen1.5_moe_A2.7B (#398)
sky0530 Feb 11, 2025
565c1ef
[CI/Build][Bugfix] Fix CPU backend default threads num (#13077)
bigPYJ1151 Feb 11, 2025
c536ed5
Removing non-existent parameter
gshtras Feb 11, 2025
869a461
Merge remote-tracking branch 'upstream/main' into upstream_merge_25_0…
gshtras Feb 11, 2025
deb6c1c
[Doc] Improve OpenVINO installation doc (#13102)
hmellor Feb 11, 2025
14ecab5
[Bugfix] Guided decoding falls back to outlines when fails to import …
terrytangyuan Feb 11, 2025
72c2b68
[Misc] Move pre-commit suggestion back to the end (#13114)
russellb Feb 11, 2025
3ee696a
[RFC][vllm-API] Support tokenizer registry for customized tokenizer i…
youngkent Feb 12, 2025
974dfd4
[Model] IBM/NASA Prithvi Geospatial model (#12830)
christian-pinto Feb 12, 2025
842b0fd
[ci] Add more source file dependencies for some tests (#13123)
khluu Feb 12, 2025
e92694b
[Neuron][Kernel] Support Longer Sequences in NKI-based Flash PagedAtt…
lingfanyu Feb 12, 2025
a0597c6
Bump helm/kind-action from 1.10.0 to 1.12.0 (#11612)
dependabot[bot] Feb 12, 2025
dd3b4a0
Bump actions/stale from 9.0.0 to 9.1.0 (#12462)
dependabot[bot] Feb 12, 2025
0c7d9ef
Bump helm/chart-testing-action from 2.6.1 to 2.7.0 (#12463)
dependabot[bot] Feb 12, 2025
d59def4
Bump actions/setup-python from 5.3.0 to 5.4.0 (#12672)
dependabot[bot] Feb 12, 2025
7c4033a
Further reduce the HTTP calls to huggingface.co (#13107)
maxdebayser Feb 12, 2025
f1042e8
[Misc] AMD Build Improvements (#12923)
842974287 Feb 12, 2025
f4d97e4
[Bug] [V1] Try fetching stop_reason from EngineOutput before checking…
bnellnm Feb 12, 2025
985b4a2
[Bugfix] Fix num video tokens calculation for Qwen2-VL (#13148)
DarkLight1337 Feb 12, 2025
314cfad
[Frontend] Generate valid tool call IDs when using `tokenizer-mode=mi…
rafvasq Feb 12, 2025
82cabf5
[Misc] Delete unused LoRA modules (#13151)
jeejeelee Feb 12, 2025
042c341
Introduce VLLM_CUDART_SO_PATH to allow users specify the .so path (#1…
houseroad Feb 12, 2025
2c2b560
[CI/Build] Use mypy matcher for pre-commit CI job (#13162)
russellb Feb 12, 2025
9917cda
Update Benchmark Profiling Scripts (#417)
AdrianAbeyta Feb 12, 2025
36a0863
[CORE] [QUANT] Support for GPTQModel's `dynamic` quantization per mod…
Qubitium Feb 12, 2025
09972e7
[Bugfix] Allow fallback to AWQ from AWQMarlin at per-layer granularit…
mgoin Feb 12, 2025
b06c154
DS V2V3 fix for same file
Concurrensee Feb 12, 2025
c9a338f
Merge remote-tracking branch 'origin/DS_V2V3_FP16_fix' into upstream_…
gshtras Feb 12, 2025
0ad02c3
Merge branch 'main' into upstream_merge_25_02_10
gshtras Feb 12, 2025
a657220
Lint
gshtras Feb 12, 2025
46476bd
updating manfiest (#416)
arakowsk-amd Feb 12, 2025
42e17aa
Merge branch 'main' into upstream_merge_25_02_10
gshtras Feb 12, 2025
14b7899
[CI] Fix failing FP8 cpu offload test (#13170)
mgoin Feb 12, 2025
d92dea8
Merge pull request #418 from ROCm/upstream_merge_25_02_10
gshtras Feb 12, 2025
cbbbecb
Aiter base (#419)
gshtras Feb 12, 2025
4c0d93f
[V1][Bugfix] Copy encoder input ids to fix set iteration issue during…
andoorve Feb 12, 2025
8eafe5e
[CI/Build] Ignore ruff warning up007 (#13182)
russellb Feb 13, 2025
9f9704d
[perf-benchmark] cleanup unused Docker images and volumes in H100 ben…
khluu Feb 13, 2025
4fc5c23
[NVIDIA] Support nvfp4 quantization (#12784)
kaixih Feb 13, 2025
d88c866
[Bugfix][Example] Fix GCed profiling server for TPU (#12792)
mgoin Feb 13, 2025
bc55d13
[VLM] Implement merged multimodal processor for Mllama (#11427)
Isotr0py Feb 13, 2025
009439c
Simplify logic of locating CUDART so file path (#13203)
houseroad Feb 13, 2025
60c68df
[Build] Automatically use the wheel of the base commit with Python-on…
comaniac Feb 13, 2025
04f50ad
[Bugfix] deepseek_r1_reasoning_parser put reason content in wrong fie…
LikeSundayLikeRain Feb 13, 2025
d46d490
[Frontend] Move CLI code into vllm.cmd package (#12971)
russellb Feb 13, 2025
cb944d5
Allow Unsloth Dynamic 4bit BnB quants to work (#12974)
danielhanchen Feb 13, 2025
0ccd876
[CI/Build] Allow ruff to auto-fix some issues (#13180)
russellb Feb 13, 2025
9605c12
[V1][core] Implement pipeline parallel on Ray (#12996)
ruisearch42 Feb 13, 2025
fa253f1
[VLM] Remove input processor from clip and siglip (#13165)
Isotr0py Feb 13, 2025
578087e
[Frontend] Pass pre-created socket to uvicorn (#13113)
russellb Feb 13, 2025
fdcf64d
[V1] Clarify input processing and multimodal feature caching logic (#…
ywang96 Feb 13, 2025
c9d3ecf
[VLM] Merged multi-modal processor for Molmo (#12966)
DarkLight1337 Feb 13, 2025
2092a6f
[V1][Core] Add worker_base for v1 worker (#12816)
AoyuQC Feb 13, 2025
02ed8a1
[Misc] Qwen2.5-VL Optimization (#13155)
wulipc Feb 13, 2025
1bc3b5e
[VLM] Separate text-only and vision variants of the same model archit…
DarkLight1337 Feb 13, 2025
37dfa60
[Bugfix] Missing Content Type returns 500 Internal Server Error (#13193)
vaibhavjainwiz Feb 13, 2025
d84cef7
[Frontend] Add `/v1/audio/transcriptions` OpenAI API endpoint (#12909)
NickLucche Feb 13, 2025
5f8d758
Initial attempt to adjust codeowners to the ROCm fork (#420)
gshtras Feb 13, 2025
aa63571
Applying weight padding to deepseek (#421)
gshtras Feb 13, 2025
bffddd9
Add label if pre-commit passes (#12527)
hmellor Feb 13, 2025
66ee774
[Model] DeepSeek Tunings (#423)
rasmith Feb 13, 2025
2344192
Optimize moe_align_block_size for deepseek_v3 (#12850)
mgoin Feb 13, 2025
c1e37bf
[Kernel][Bugfix] Refactor and Fix CUTLASS 2:4 Sparse Kernels (#13198)
tlrmchlsmth Feb 14, 2025
e38be64
Revert "Add label if pre-commit passes" (#13242)
hmellor Feb 14, 2025
4108869
[ROCm] Avoid using the default stream on ROCm (#13238)
gshtras Feb 14, 2025
8c32b08
[Kernel] Fix awq error when n is not divisable by 128 (#13227)
jinzhen-lin Feb 14, 2025
dd5ede4
[V1] Consolidate MM cache size to vllm.envs (#13239)
ywang96 Feb 14, 2025
09545c0
[Bugfix/CI] Turn test_compressed_tensors_2of4_sparse back on (#13250)
tlrmchlsmth Feb 14, 2025
0676782
[Bugfix][CI] Inherit codespell settings from pyproject.toml in the pr…
tlrmchlsmth Feb 14, 2025
84683fa
[Bugfix] Offline example of disaggregated prefill (#13214)
XiaobingSuper Feb 14, 2025
40932d7
[Misc] Remove redundant statements in scheduler.py (#13229)
WrRan Feb 14, 2025
f2b20fe
Consolidate Llama model usage in tests (#13094)
hmellor Feb 14, 2025
f0b2da7
Expand MLA to support most types of quantization (#13181)
mgoin Feb 14, 2025
cbc4012
[V1] LoRA - Enable Serving Usecase (#12883)
varun-sundar-rabindranath Feb 14, 2025
ba59b78
[ROCm][V1] Add intial ROCm support to V1 (#12790)
SageMoore Feb 14, 2025
b0ccfc5
[Bugfix][V1] GPUModelRunner._update_states should return True when th…
imkero Feb 14, 2025
45f90bc
[WIP] TPU V1 Support Refactored (#13049)
alexm-redhat Feb 14, 2025
185cc19
[Frontend] Optionally remove memory buffer used for uploading to URLs…
pooyadavoodi Feb 14, 2025
83481ce
[Bugfix] Fix missing parentheses (#13263)
xu-song Feb 14, 2025
556ef7f
[Misc] Log time consumption of sleep and wake-up (#13115)
waltforme Feb 14, 2025
4da1f66
[VLM] Keep track of whether prompt replacements have been applied (#1…
DarkLight1337 Feb 14, 2025
085b7b2
[V1] Simplify GPUModelRunner._update_states check (#13265)
njhill Feb 14, 2025
6224a9f
Support logit_bias in v1 Sampler (#13079)
houseroad Feb 14, 2025
7734e9a
[Core] choice-based structured output with xgrammar (#12632)
russellb Feb 14, 2025
c9e2d64
[Hardware][Gaudi][Bugfix] Fix error for guided decoding (#12317)
zhouyu5 Feb 14, 2025
2679970
Removing bad config (#425)
gshtras Feb 14, 2025
b96c11c
The order in the file is important. One needs to be explicitly be add…
gshtras Feb 14, 2025
5e5c8e0
[Quant][Perf] Use moe_wna16 kernel by default for MoEs with many expe…
mgoin Feb 14, 2025
3bcb8c7
[Core] Reduce TTFT with concurrent partial prefills (#10235)
joerunde Feb 14, 2025
a12934d
[V1][Core] min_p sampling support (#13191)
AoyuQC Feb 14, 2025
e7eea5a
[V1][CI] Fix failed v1-test because of min_p (#13316)
WoosukKwon Feb 15, 2025
6a854c7
[V1][Sampler] Don't apply temp for greedy-only (#13311)
njhill Feb 15, 2025
0c73026
[V1][PP] Fix memory profiling in PP (#13315)
WoosukKwon Feb 15, 2025
c9f9d5b
[Bugfix][AMD] Update torch_bindings so that scaled_fp4_quant isn't bu…
SageMoore Feb 15, 2025
579d7a6
[Bugfix][Docs] Fix offline Whisper (#13274)
NickLucche Feb 15, 2025
97a3d6d
[Bugfix] Massage MLA's usage of flash attn for RoCM (#13310)
tlrmchlsmth Feb 15, 2025
9076325
[BugFix] Don't scan entire cache dir when loading model (#13302)
njhill Feb 15, 2025
067fa22
[Bugfix]Fix search start_index of stop_checker (#13280)
xu-song Feb 15, 2025
7fdaaf4
[Bugfix] Fix qwen2.5-vl image processor (#13286)
Isotr0py Feb 15, 2025
2ad1bc7
[V1][Metrics] Add iteration_tokens_total histogram from V0 (#13288)
markmc Feb 15, 2025
ed0de3e
[AMD] [Model] DeepSeek tunings (#13199)
rasmith Feb 15, 2025
9206b3d
[V1][PP] Run engine busy loop with batch queue (#13064)
comaniac Feb 15, 2025
54ed913
[ci/build] update flashinfer (#13323)
youkaichao Feb 15, 2025
367cb8c
[Doc] [2/N] Add Fuyu E2E example for multimodal processor (#13331)
DarkLight1337 Feb 15, 2025
80f63a3
[V1][Spec Decode] Ngram Spec Decode (#12193)
LiuXiaoxuanPKU Feb 16, 2025
12913d1
[Quant] Add `SupportsQuant` to phi3 and clip (#13104)
kylesayrs Feb 16, 2025
d3d547e
[Bugfix] Pin xgrammar to 0.1.11 (#13338)
mgoin Feb 16, 2025
ccaff7f
avoid calling hf_list_repo_files for local model
Isotr0py Feb 16, 2025
7cc05dd
annotation
Isotr0py Feb 16, 2025
dc0f7cc
[BugFix] Enhance test_pos_encoding to support execution on multi-devi…
wchen61 Feb 16, 2025
b7d3098
[V1] Update doc and examples for H2O-VL (#13349)
ywang96 Feb 16, 2025
124776e
[ci] skip failed tests for flashinfer (#13352)
youkaichao Feb 16, 2025
a0231b7
[platform] add base class for communicators (#13208)
youkaichao Feb 16, 2025
5d2965b
[Bugfix] Fix 2 Node and Spec Decode tests (#13341)
DarkLight1337 Feb 16, 2025
da833b0
[Docs] Change myenv to vllm. Update python_env_setup.inc.md (#13325)
arkylin Feb 16, 2025
7b89386
[V1][BugFix] Add __init__.py to v1/spec_decode/ (#13359)
WoosukKwon Feb 16, 2025
e18227b
[V1][PP] Cache Intermediate Tensors (#13353)
WoosukKwon Feb 16, 2025
d67cc21
[Bugfix][Platform][CPU] Fix cuda platform detection on CPU backend ed…
Isotr0py Feb 16, 2025
69e1d23
[V1][BugFix] Clean up rejection sampler & Fix warning msg (#13362)
WoosukKwon Feb 16, 2025
2010f04
[V1][Misc] Avoid unnecessary log output (#13289)
jeejeelee Feb 17, 2025
46cdd59
[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304)
ShangmingCai Feb 17, 2025
f857311
Fix spelling error in index.md (#13369)
yankooo Feb 17, 2025
4518683
Run v1 benchmark and integrate with PyTorch OSS benchmark database (#…
huydhn Feb 17, 2025
238dfc8
[MISC] tiny fixes (#13378)
MengqingCao Feb 17, 2025
7b623fc
[VLM] Check required fields before initializing field config in `Dict…
DarkLight1337 Feb 17, 2025
1f69c4a
[Model] Support Mamba2 (Codestral Mamba) (#9292)
tlrmchlsmth Feb 17, 2025
30513d1
[Bugfix] fix xpu communicator (#13368)
yma11 Feb 17, 2025
ce77eb9
[Bugfix] Fix VLLM_USE_MODELSCOPE issue (#13384)
r4ntix Feb 17, 2025
ce342c7
Merge remote-tracking branch 'upstream/main' into upstream_merge_25_0…
gshtras Feb 17, 2025
669fc3f
Merge remote-tracking branch 'Isotr0py/local-lookup' into upstream_me…
gshtras Feb 17, 2025
365687d
Merge pull request #430 from ROCm/upstream_merge_25_02_17
gshtras Feb 17, 2025
4fd2f5b
Updating PR template to point people to the upstream repo. Updating c…
gshtras Feb 17, 2025
17b26bd
Enabling the ROCm-vLLM CI on MI250 machines (#432)
Alexei-V-Ivanov-AMD Feb 18, 2025
955ba64
Optimization for quantized gemm skinny sizes (#411)
amd-hhashemi Feb 19, 2025
b63a984
Restricting FP8 wvSplitk to MI300x (#439)
gshtras Feb 19, 2025
39456f3
Remove mi300a (#440)
gshtras Feb 19, 2025
5a6afcc
resolve diff for mixtral8x7B configs (#437)
divakar-amd Feb 20, 2025
ff13c7a
Torch version bump to fix tunable ops (#442)
gshtras Feb 20, 2025
32cc0fc
merge origin/main into merge-main-to-llama-fp8
vllmellm Feb 21, 2025
9dceba0
Merge remote-tracking branch 'origin/main' into merge-main-to-llama-fp8
vllmellm Feb 21, 2025
fd88257
bugfix: remove unused argument passed to the forward pass of Replica…
vllmellm Feb 21, 2025
[executor] init local_rank as device index (vllm-project#13027)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
MengqingCao authored Feb 11, 2025
commit 9cf4759493919580011f03812abf16387eafe18c
5 changes: 5 additions & 0 deletions vllm/executor/uniproc_executor.py

@@ -28,6 +28,11 @@ def _init_executor(self) -> None:
         distributed_init_method = get_distributed_init_method(
             get_ip(), get_open_port())
         local_rank = 0
+        # set local rank as the device index if specified
+        device_info = self.vllm_config.device_config.device.__str__().split(
+            ":")
+        if len(device_info) > 1:
+            local_rank = int(device_info[1])
         rank = 0
         kwargs = dict(
             vllm_config=self.vllm_config,
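For context, the added lines derive `local_rank` from the configured device string (e.g. `cuda:1`), falling back to 0 when no index is given. A minimal standalone sketch of the same parsing follows; `infer_local_rank` is an illustrative helper name, not part of vLLM's API:

    # Sketch of the device-index parsing added above, assuming a
    # torch-style device string such as "cuda", "cuda:1", or "npu:2".
    def infer_local_rank(device: object) -> int:
        # str(torch.device("cuda:1")) == "cuda:1" -> ["cuda", "1"]
        device_info = str(device).split(":")
        if len(device_info) > 1:
            return int(device_info[1])
        # bare device type ("cuda") keeps the default rank
        return 0

    assert infer_local_rank("cuda") == 0
    assert infer_local_rank("cuda:1") == 1

This lets a single-process executor respect an explicit device index (e.g. `device="cuda:1"`) instead of always initializing on device 0.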