Insights: vllm-project/vllm-ascend
Overview
1 Release published by 1 person
-
v0.9.1rc1
published
Jun 22, 2025
56 Pull requests merged by 27 people
-
[BugFix]dbo support torchair graph in decode
#1420 merged
Jun 26, 2025 -
[v0.9.1][perf] add a switch for enabling NZ layout in weights and enable NZ for GMM.
#1409 merged
Jun 26, 2025 -
[CI/UT][BugFix] Fix sampling params
#1423 merged
Jun 26, 2025 -
Handle with_prefill_across_dp for multistream mla
#1322 merged
Jun 26, 2025 -
[Bugfix] Reset all unused positions to prevent out-of-bounds in GatherV3
#1416 merged
Jun 26, 2025 -
[FOLLOWUP] fix name and format in accuracy test (#1288)
#1435 merged
Jun 25, 2025 -
[refactor] Refactoring forward_context and model_runner_v1
#1422 merged
Jun 25, 2025 -
[Perf] Use fused ops npu_top_k_top_p
#1308 merged
Jun 25, 2025 -
[BugFix]Remove not using patch_eagle.py for CI.
#1385 merged
Jun 25, 2025 -
adjusting the communication method in graph mode
#1194 merged
Jun 25, 2025 -
[Doc] Fix doc typo
#1424 merged
Jun 25, 2025 -
[Misc] Clean up useless code for LLM initialize
#1373 merged
Jun 25, 2025 -
[CI]Update accuracy report test
#1288 merged
Jun 25, 2025 -
[Doc] Add sleep mode doc
#1295 merged
Jun 25, 2025 -
[Doc] Add patch doc
#1414 merged
Jun 25, 2025 -
[DP] Tiny fix of dp and update example
#1273 merged
Jun 25, 2025 -
[Refactor] Remove duplicate multimodal codes in ModelRunner
#1393 merged
Jun 25, 2025 -
[Doc] Update FAQ and add test guidance
#1360 merged
Jun 25, 2025 -
[Bugfix] Sync MRotaryEmbedding interface change to recover CI
#1399 merged
Jun 24, 2025 -
update Disaggregate prefill README
#1379 merged
Jun 24, 2025 -
remove environment variable VLLM_ENABLE_MC2
#1406 merged
Jun 24, 2025 -
[CI/UT] Fix disaggregated prefill ci
#1313 merged
Jun 24, 2025 -
[MISC] Remove useless patch
#1366 merged
Jun 24, 2025 -
support fused_moe_allgather_ep
#1335 merged
Jun 23, 2025 -
[TEST][DOC] Fix doctest and add system package installation
#1375 merged
Jun 23, 2025 -
[V0.9.1][Bugfix] Remove scheduler patch for disaggregated PD
#1361 merged
Jun 23, 2025 -
[Doc] Add reinstall instructions doc
#1370 merged
Jun 23, 2025 -
Modify installation.md for adding pip extra index of torch-npu
#1272 merged
Jun 23, 2025 -
[Doc] Add reinstall instructions doc
#1303 merged
Jun 23, 2025 -
[bugfix] fix accuracy problem for deepseek V3/R1 models with torchair graph in long sequence predictions
#1331 merged
Jun 23, 2025 -
[Bugfix] fix env variable in dbo
#1284 merged
Jun 23, 2025 -
[CI] Update guided decoding ut
#1312 merged
Jun 23, 2025 -
[CI/UT][bugfix] fix v0 spec decode
#1321 merged
Jun 23, 2025 -
update torch-npu to 2.5.1.post1.dev20250619
#1347 merged
Jun 23, 2025 -
[Doc] Change not to no in faqs.md
#1357 merged
Jun 23, 2025 -
[CI] Enable merge trigger unit test and accuracy test schedule job
#1345 merged
Jun 22, 2025 -
Cleanup unused doc
#1352 merged
Jun 22, 2025 -
Bump v0.9.1rc1 release
#1349 merged
Jun 22, 2025 -
[0.9.1][Bugfix] fix oom issue in mla and enable mla_pa for deepseek mla decode
#1311 merged
Jun 22, 2025 -
[0.9.1][BugFix]fix accuracy in dbo after refactor MOE
#1328 merged
Jun 21, 2025 -
update torch_npu in vllm-ascend to dev20250619
#1346 merged
Jun 21, 2025 -
[0.9.1][Feature] Support Qwen3 W4A8 quantization
#1275 merged
Jun 21, 2025 -
[v0.9.1-dev][CI/UT][bugfix]fix v0 spec decode
#1323 merged
Jun 21, 2025 -
[Platform] Add initial experimental support for Atlas 300I series
#1333 merged
Jun 21, 2025 -
[Test] Enable code cov for V1 and enable push trigger
#1164 merged
Jun 20, 2025 -
Support Pangu Pro MoE model
#1204 merged
Jun 20, 2025 -
[0.9.1]support deepseek w4a8 quantization
#1320 merged
Jun 20, 2025 -
[V1][eagle3] Support eagle3 proposer for v1
#1032 merged
Jun 20, 2025 -
[CI] Add codespell check for doc
#1314 merged
Jun 20, 2025 -
Add user guide for quantization
#1206 merged
Jun 20, 2025 -
[Fix] Fix the token-wise padding mechanism
#1300 merged
Jun 20, 2025 -
[0.9.1][Bugfix] fix dp error in dbo
#1291 merged
Jun 20, 2025 -
[UT] refactor test_expert_load_balancer and fix broken CI
#1293 merged
Jun 19, 2025 -
[0.9.1][Bugfix]: fix env variables for deepseek dbo
#1285 merged
Jun 19, 2025 -
Disaggregate prefill for kv cache register style (merge into v0.9.1-dev)
#1296 merged
Jun 19, 2025
41 Pull requests opened by 28 people
-
[WIP] Add MTP dummy_run and Adapt torchair graph mode
#1294 opened
Jun 19, 2025 -
[WIP] Fix block table shape
#1297 opened
Jun 19, 2025 -
[perf] optimize rope in deepseek
#1304 opened
Jun 19, 2025 -
[WIP] support fa3 quant
#1310 opened
Jun 20, 2025 -
[Bugfix] fix disaggregated prefill bug (merge into v0.9.1)
#1317 opened
Jun 20, 2025 -
[Doc] Update user guide for using lm-eval
#1325 opened
Jun 20, 2025 -
[Perf] Improve MLA multistream performance
#1353 opened
Jun 22, 2025 -
use npu_moe_gating_top_k_softmax
#1355 opened
Jun 22, 2025 -
[V1][ModelRunner] Support pooling model for v1 engine
#1359 opened
Jun 23, 2025 -
[Doc] Add qwen2-audio eager mode tutorial
#1371 opened
Jun 23, 2025 -
Doc Enhancement: Single NPU(Qwen3-8B) aclgraph mode + eager mode
#1374 opened
Jun 23, 2025 -
[Bugfix] Fix sleep mode level 2
#1376 opened
Jun 23, 2025 -
[WIP]FC3
#1377 opened
Jun 23, 2025 -
[BugFix] Fix a bug of running chunked-prefill with torchair.
#1378 opened
Jun 23, 2025 -
[Bugfix] Fix memory-leak caused by dist._functional_collectives.reduce_scatter_tensor
#1380 opened
Jun 23, 2025 -
[Bugfix] Support Qwen3-MOE on aclgraph mode
#1381 opened
Jun 23, 2025 -
[ExternalDP][RL] Make external DP support on EP and ETP
#1384 opened
Jun 24, 2025 -
[Build] Add build info
#1386 opened
Jun 24, 2025 -
【Feature】Dynamic Expert Load Balance Zero-like-overhead
#1391 opened
Jun 24, 2025 -
[Doc] Add performance tuning doc to main
#1392 opened
Jun 24, 2025 -
[Doc] Add Qwen2.5-VL eager mode doc
#1394 opened
Jun 24, 2025 -
shared_experts+router_experts merge all_reduce(Improve TTOP 5ms)
#1395 opened
Jun 24, 2025 -
[v0.9.1][Bugfix] Reset all unused positions to prevent out-of-bounds in GatherV3
#1397 opened
Jun 24, 2025 -
V0.9.1 dev
#1402 opened
Jun 24, 2025 -
[Fix] Prevent Forced Stream Synchronization Triggered by Environment …
#1403 opened
Jun 24, 2025 -
[BugFix] Fix the problem that torchair doesn't support tp > 4.
#1404 opened
Jun 24, 2025 -
[V0.9.1] Prevent Forced Stream Synchronization Triggered by Environme…
#1405 opened
Jun 24, 2025 -
rm router logits Improve TTOP 3ms
#1407 opened
Jun 24, 2025 -
Add profiling multimodal model step and fix the OOM bug when profilin…
#1408 opened
Jun 24, 2025 -
[Perf] Add a switch to enable NZ layout in weights
#1410 opened
Jun 24, 2025 -
Br fix multi stream moe
#1417 opened
Jun 25, 2025 -
[Doc] Add multi-npu qwen3-MoE-32B Tutorials
#1419 opened
Jun 25, 2025 -
[WIP]support MERRouter
#1421 opened
Jun 25, 2025 -
[Doc] Add guidance on how to implement and register new models
#1426 opened
Jun 25, 2025 -
[V0.9.1] Optimize perf of Qwen3
#1431 opened
Jun 25, 2025 -
add chunk mc2 for prefill
#1434 opened
Jun 25, 2025 -
support w8a8c8
#1436 opened
Jun 25, 2025 -
Add toy_proxy_server chat/start_profile/stop_profile api
#1437 opened
Jun 25, 2025 -
Refactoring w4a8 and w8a8 and supporting w4a8 graph mode
#1438 opened
Jun 25, 2025 -
[Doc] Update accuracy reports for main
#1439 opened
Jun 25, 2025
16 Issues closed by 9 people
-
[Bug]: Failed to complete vllm benchmark after enable VLLM_USE_V1=1 due to gather_v3 error
#1038 closed
Jun 26, 2025 -
[Misc]: Performance of v1 graph mode drops severely with 3 concurrent requests
#1254 closed
Jun 25, 2025 -
[Bug]: UnboundLocalError: local variable 'decode_hs_or_q_c' referenced before assignment
#1369 closed
Jun 25, 2025 -
[RFC]:
#1428 closed
Jun 25, 2025 -
[Installation]: How to deploy vllm-ascend in AutoDL's 910B instance
#1363 closed
Jun 25, 2025 -
No matching distribution found for torch-npu==2.5.1.post1.dev20250619
#1413 closed
Jun 25, 2025 -
[Bug]: DeepSeek (TP8/PP2) failed to run with ACL stream synchronize failed, error code:507048
#1193 closed
Jun 25, 2025 -
[Bug]: [WARNING:swift] Please install the package: `pip install "decord" -U`
#1388 closed
Jun 24, 2025 -
[Bug]: vLLM0.8.5.post1 + vLLM_Ascend0.8.5rc1 + TRL Qwen2.5 GRPO v1 fail back v0 engine
#1305 closed
Jun 24, 2025 -
[Bug]: ModuleNotFoundError: No module named 'qwen_vl_utils'
#1356 closed
Jun 23, 2025 -
[Bug]: stateless_init_process_group is invalid on NPUs
#942 closed
Jun 23, 2025 -
[Feature]: Implement Eagle3 Acceleration on vllm-ascend
#1004 closed
Jun 20, 2025 -
[Bug]: Attribute issue for latest upstream vllm, Need pull request
#1299 closed
Jun 19, 2025
21 Issues opened by 14 people
-
[Usage]: cannot use vllm serve with docker
#1440 opened
Jun 26, 2025 -
[Bug]: [v0.9.1rc1] 310P3 start success , reasoning exit vllm
#1425 opened
Jun 25, 2025 -
[Installation]: x86 systems cannot be installed directly through "pip install vllm_ascend==xxx"
#1411 opened
Jun 25, 2025 -
[Installation]: Failed to find function aclmdlRICaptureBegin
#1401 opened
Jun 24, 2025 -
[Bug]: assert self.cpu_group is not None
#1396 opened
Jun 24, 2025 -
[Feature]: Request vllm-ascend to support torch_npu>=2.6
#1390 opened
Jun 24, 2025 -
[Doc]: add Optimization and Tuning for main
#1387 opened
Jun 24, 2025 -
[Bug]: assert coord_socket is not None
#1372 opened
Jun 23, 2025 -
[New Model]: InternVL3-8B
#1362 opened
Jun 23, 2025 -
[v0.9.1rc1] FAQ / Feedback
#1351 opened
Jun 22, 2025 -
[Bug]: Prefix cache feature does not work with the Ascend Scheduler
#1350 opened
Jun 22, 2025 -
Record vLLM main branch ci passed commit id
#1339 opened
Jun 21, 2025 -
[Bug]: vllm-ascend 0.7.3.post1 does not support w8a8 quantization, but 0.9.0rc2 does.
#1329 opened
Jun 20, 2025 -
[Bug]: qwen3 moe failed with aclgraph
#1324 opened
Jun 20, 2025 -
[release] 0.9.1rc1 release checklist
#1315 opened
Jun 20, 2025 -
[RFC]: Support Atlas 300I series
#1309 opened
Jun 20, 2025 -
[Feature]: Support PP with VLLM_USE_V1=1
#1302 opened
Jun 19, 2025 -
[RFC]: Unit test coverage improvement
#1298 opened
Jun 19, 2025
44 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Disaggregate prefill for kv cache register style
#950 commented on
Jun 26, 2025 • 13 new comments -
[Perf] Optimize perf of Qwen3
#1245 commented on
Jun 26, 2025 • 3 new comments -
[1/N][CI] Move linting system to pre-commits hooks
#1256 commented on
Jun 25, 2025 • 2 new comments -
[V1][Spec-decode] First Stage support of Eagle 1
#1128 commented on
Jun 26, 2025 • 1 new comment -
[CI/UT][Refactor] move e2e spec decode and deepseek acc test to per pr
#1136 commented on
Jun 25, 2025 • 1 new comment -
use group gemm nz
#910 commented on
Jun 23, 2025 • 0 new comments -
[Bugfix][Worker] Clear NPU memory between test profiling
#989 commented on
Jun 23, 2025 • 0 new comments -
[KVConnector][1/N] v1 kvcache connector with the Chariot-DS backend
#1080 commented on
Jun 26, 2025 • 0 new comments -
[Bugfix] Reduce _npu_flash_attention mask to 128x128 for memory savings
#1100 commented on
Jun 25, 2025 • 0 new comments -
[perf] optimize apply_penalties & topKtopP for V0&V1 Engine
#1107 commented on
Jun 20, 2025 • 0 new comments -
[Feature] Use max_num_seqs tokens with profile_run for decode
#1110 commented on
Jun 25, 2025 • 0 new comments -
[docs] Update guidance on how to implement and register new models
#1126 commented on
Jun 25, 2025 • 0 new comments -
[CI] Add accuracy ci for DP and EP and TP and ETP
#1140 commented on
Jun 26, 2025 • 0 new comments -
[Bugfix][Spec Decode] Little fix to spec decode in `model_runner_v1.py`
#1189 commented on
Jun 25, 2025 • 0 new comments -
[Feature]Moe alltoallv communication optimization for unquantized RL training scenario.
#1208 commented on
Jun 24, 2025 • 0 new comments -
[EPLB]: Correct local expert number calculation with redundant experts && add e2e test
#1223 commented on
Jun 25, 2025 • 0 new comments -
[Draft] Add MTP dummy_run and Adapt torchair graph mode
#1244 commented on
Jun 20, 2025 • 0 new comments -
Feat rope: enable npu_mrope by environment variables
#1251 commented on
Jun 21, 2025 • 0 new comments -
Feature rope: enable npu_mrope by environment variables
#1260 commented on
Jun 19, 2025 • 0 new comments -
[Refactor][WIP] Refactor mla_v1
#1263 commented on
Jun 25, 2025 • 0 new comments -
[DOC] add LoRA user guide
#1265 commented on
Jun 25, 2025 • 0 new comments -
dLLM, short for distributed LLM, an easy-to-use tool for multi-node vllm deployment
#1280 commented on
Jun 23, 2025 • 0 new comments -
[Bug]:AttributeError: 'InternVLChatConfig' object has no attribute 'num_hidden_layers'
#1276 commented on
Jun 19, 2025 • 0 new comments -
[Usage]: Modelslim
#1270 commented on
Jun 20, 2025 • 0 new comments -
[Bug]: Startup fails with an error after export VLLM_USE_V1=1
#1249 commented on
Jun 20, 2025 • 0 new comments -
[Bug]: Build Error seems like compiler renaming causing it.
#1278 commented on
Jun 20, 2025 • 0 new comments -
vLLM Ascend Roadmap Q2 2025
#448 commented on
Jun 21, 2025 • 0 new comments -
[Bug]: Flaky test: test_models_distributed_topk failed due to The IP address and port have been bound already.
#1253 commented on
Jun 21, 2025 • 0 new comments -
[Feature] Support the v1 connector API
#605 commented on
Jun 21, 2025 • 0 new comments -
[v0.9.0rc2] FAQ / Feedback
#1115 commented on
Jun 22, 2025 • 0 new comments -
[Bug]: Problems with Qwen3 pp parallel deployment
#982 commented on
Jun 23, 2025 • 0 new comments -
[Bug]: The sleep mode in version 0.8.4rc2 cannot properly release NPU memory when called in the veRL framework
#977 commented on
Jun 23, 2025 • 0 new comments -
[Feature]: Support AWQ quantization
#1233 commented on
Jun 24, 2025 • 0 new comments -
[Bug]: deepseek-R1-w8a8 VLLM_ENABLE_MC2=1 error
#1243 commented on
Jun 24, 2025 • 0 new comments -
[Bug]: AscendSampler does not handle empty logit tensor
#1133 commented on
Jun 24, 2025 • 0 new comments -
[Bug]: Ray Timeout Error running Multi-Node(tp_size=2) Online Server with Acl_Graph when handling curl request
#1238 commented on
Jun 25, 2025 • 0 new comments -
[Guide] Official Guide Index
#840 commented on
Jun 25, 2025 • 0 new comments -
[RFC]: E2E CI test for key features
#413 commented on
Jun 25, 2025 • 0 new comments -
[Bug]: Only a single TORCH_LIBRARY can be used to register the namespace _C
#1239 commented on
Jun 25, 2025 • 0 new comments -
[RFC]: Doc enhancement
#1248 commented on
Jun 26, 2025 • 0 new comments -
[Bug]: Qwen3-30B-A3B Shows Precision Issues in DP2+TP2 Parallel Mode
#1289 commented on
Jun 26, 2025 • 0 new comments -
[release] 0.9.0rc1 release checklist
#904 commented on
Jun 26, 2025 • 0 new comments -
vLLM Ascend Roadmap Q3 2025
#1168 commented on
Jun 26, 2025 • 0 new comments -
[perf][WIP]: using NZ optimization for quantized GMM
#906 commented on
Jun 23, 2025 • 0 new comments