Insights: vllm-project/vllm-ascend
Overview
43 Pull requests merged by 24 people
-
feat: add mtp ut and fix some bugs
#2453 merged
Aug 22, 2025 -
[Bugfix] Fix the bug of incorrect precision
#2479 merged
Aug 22, 2025 -
[Doc] Add release note for v0.9.1rc3
#2488 merged
Aug 22, 2025 -
[0.9.1][Doc] Add release note for v0.9.1rc3
#2431 merged
Aug 22, 2025 -
[QuickFix] Skip failed ut to recover CI quickly
#2484 merged
Aug 22, 2025 -
[Bugfix] Fix the bug that qwen3 moe doesn't work with aclgraph
#2478 merged
Aug 22, 2025 -
[Doc] Add feature branch long_seq_optimization
#2477 merged
Aug 22, 2025 -
[CI] fix ci
#2464 merged
Aug 21, 2025 -
[2/N][refactor] torchair deepseek mla backend refactor
#2459 merged
Aug 21, 2025 -
Add feature branch policy
#2432 merged
Aug 21, 2025 -
add mlp tp optimize
#2120 merged
Aug 21, 2025 -
[DOC] update doc: LoRA with ACLGraph
#2430 merged
Aug 21, 2025 -
refactor model runner v1
#2461 merged
Aug 21, 2025 -
[main][bugfix] Modify the default value of enable_shared_expert_dp to false
#2457 merged
Aug 20, 2025 -
[main][quantization] Adapt to the new format of ds w4a8 weight
#2392 merged
Aug 20, 2025 -
[0.9.1] [BUGFIX] support mtp in disaggregated-prefill scenario
#2444 merged
Aug 20, 2025 -
[CI] Fix UT
#2452 merged
Aug 20, 2025 -
refactor allgather/mc2-related fused_experts
#2369 merged
Aug 20, 2025 -
[PD] Correct the ip and port env
#2450 merged
Aug 20, 2025 -
qwen3_moe/qwen25 support torchair graph
#2403 merged
Aug 20, 2025 -
[misc] remove useless envs
#2448 merged
Aug 20, 2025 -
[CI] add lint block before running e2e
#2447 merged
Aug 20, 2025 -
Fix some CI issues and refactor modelrunner
#2445 merged
Aug 20, 2025 -
Nominate Mengqing Cao as vllm-ascend maintainer
#2433 merged
Aug 19, 2025 -
[improve] Remove redundant parentheses in pangu_moe.py
#2081 merged
Aug 19, 2025 -
Nominate ApsarasX as vllm-ascend maintainer
#2419 merged
Aug 19, 2025 -
[Bugfix] Fix grammar_bitmask IndexError caused by outdated apply_grammar_bitmask method
#2314 merged
Aug 19, 2025 -
[3/N][Refactor] Move torchair_attention to torchair dir
#2017 merged
Aug 19, 2025 -
[Bug] Fix bug in test_chunked.py
#1992 merged
Aug 19, 2025 -
[0.9.1][Fix] Removes explicit ATB extension registration
#1921 merged
Aug 19, 2025 -
[0.9.1][BUGFIX] fix error info and adapt attn_metadata refactor
#2402 merged
Aug 19, 2025 -
fix doc typo
#2407 merged
Aug 19, 2025 -
[Bugfix] Fix custom op register issue
#2409 merged
Aug 19, 2025 -
Add Custom Kernels For LoRA Performance
#2325 merged
Aug 19, 2025 -
Bump actions/checkout from 4 to 5
#2420 merged
Aug 19, 2025 -
[0.9.1][BUGFIX] fix mtp config bug
#2412 merged
Aug 18, 2025 -
[0.9.1][Bugfix] Fix header include issue in rope
#2398 merged
Aug 18, 2025 -
Add ModelRunner_prepare_inputs doc
#1493 merged
Aug 18, 2025 -
[1/N][refactor] torchair deepseek modeling refactor
#2384 merged
Aug 18, 2025 -
[Bugfix] Fix header include issue in rope
#2397 merged
Aug 18, 2025 -
[P/D] Mooncake Connector for v1 distributed
#1568 merged
Aug 18, 2025 -
[v0.9.1] MTP supports V1 scheduler
#2371 merged
Aug 16, 2025 -
[bugfix] ascend schedule encountered an incorrect req block length in…
#2394 merged
Aug 16, 2025
35 Pull requests opened by 25 people
-
[0.9.1][bugfix] Unify MoE routing init with standard torch_npu operator
#2400 opened
Aug 15, 2025 -
[main][bugfix] Unify MoE routing init with standard torch_npu operator
#2401 opened
Aug 16, 2025 -
[Core] Add GPT-OSS model support for Ascend NPU
#2421 opened
Aug 18, 2025 -
[MAIN][BUGFIX] BugFix: Resolve the issue of waiting queue accumulation when requests are canceled.
#2426 opened
Aug 18, 2025 -
[bugfix] ascend schedule encountered an incorrect req block length in…
#2429 opened
Aug 19, 2025 -
[Scheduler] validate max_num_batched_tokens and max_model_len in AscendSchedulerConfig
#2434 opened
Aug 19, 2025 -
Add gpt oss
#2436 opened
Aug 19, 2025 -
[1/N][Draft][Refactor]torchair pangu_moe modeling refactor
#2437 opened
Aug 19, 2025 -
[1/N][refactor] torchair fused_moe refactor
#2438 opened
Aug 19, 2025 -
[main] Fix AddRMSNormW8A8Quant init bug
#2440 opened
Aug 19, 2025 -
[0.9.1-DEV][BUGFIX] BugFix: Resolve the issue of waiting queue accumulation when requests are canceled.
#2441 opened
Aug 19, 2025 -
[main][bugfix] Fix bugs and refactor cached mask generation logic
#2442 opened
Aug 19, 2025 -
Adapted the independent TP partitioning of the O matrix in the Qwen3-235B model for pure DP scenarios.
#2443 opened
Aug 19, 2025 -
[Model] Optimizing gemma3 model's GemmaRMSNorm function
#2456 opened
Aug 20, 2025 -
[FEATURE][MTP] Support MTP > 1
#2458 opened
Aug 20, 2025 -
[Do not merge][0.9.1][feat] use tensor_list for gmm
#2462 opened
Aug 20, 2025 -
Refactor mla 0821
#2465 opened
Aug 21, 2025 -
long_seq branch
#2467 opened
Aug 21, 2025 -
[WIP][Feat] Add MC2 communication method for MoE layers
#2469 opened
Aug 21, 2025 -
[Fix] fix resources limit error when apply speculative decoding and aclgraph
#2472 opened
Aug 21, 2025 -
convert the format of gmm to nz
#2474 opened
Aug 21, 2025 -
add v3initingrouting.
#2475 opened
Aug 21, 2025 -
[WIP][main] FlashComm2 for Qwen3 MoE
#2480 opened
Aug 21, 2025 -
[Refactor] cleanup converting_weight_acl_format_format
#2482 opened
Aug 22, 2025 -
[CI] Fix CI
#2483 opened
Aug 22, 2025 -
Prefetching MoE weights to enhance the performance of Qwen3-235B.
#2486 opened
Aug 22, 2025 -
refactor alltoallv in fused_moe
#2487 opened
Aug 22, 2025 -
add logger.warning to avoid possible oom conditions
#2489 opened
Aug 22, 2025 -
[v0.9.1][Doc] Update FAQ
#2490 opened
Aug 22, 2025 -
Modify torchair qwen3 moe registration
#2491 opened
Aug 22, 2025 -
[Draft]optimize dp allreduce
#2492 opened
Aug 22, 2025 -
[Feat] Enable SP for DeepSeekV2 MLA
#2493 opened
Aug 22, 2025 -
[5/N][Refactor] Refactor `AscendAttentionTorchairBackendImpl`, add `_dispatch_forward()` method
#2495 opened
Aug 22, 2025 -
[Bugfix] Fix wrong torchair model register key
#2496 opened
Aug 22, 2025
6 Issues closed by 2 people
-
[Usage]: vllm 0.9.1 + turbo has no effect, while vllm 0.7.3 + turbo shows a throughput improvement
#2451 closed
Aug 22, 2025 -
[Bug]: [utils.py:741] Waiting for 1 local, 0 remote core engine proc(s) to start.
#2416 closed
Aug 21, 2025 -
[Bug]: VLLM ascend v0.9.2.rc1-310p with lora run exteremely slow
#1812 closed
Aug 19, 2025 -
[Performance]: Deploying a LoRA model with the 0.9.0rc2 image on a 910B card is very slow
#1686 closed
Aug 19, 2025 -
[Bug]: 0.9.1 version LoRA/MultiLoRA inference is slow, 2-3 tokens/s
#1629 closed
Aug 19, 2025 -
[Bug]: Lora feature cannot be used in Aclgraph mode
#1464 closed
Aug 19, 2025
20 Issues opened by 20 people
-
[Bug]: Qwen3-235B-A22B-W8A8 fails to run
#2497 opened
Aug 22, 2025 -
[Doc]: Qwen3-30B deployed across two nodes with P/D disaggregation hangs during ais_bench testing, while neither the P nor the D service reports any error
#2494 opened
Aug 22, 2025 -
[Bug]: Eagle3 Speculative Decoding graph mode start error
#2481 opened
Aug 21, 2025 -
[Bug]: Deepseek mtp torchair graph mode
#2476 opened
Aug 21, 2025 -
[RFC]: Proposal for Optimizing KVCache Transfer via Layer-wise Strategy in Disaggregation
#2470 opened
Aug 21, 2025 -
[Bug]: Deepseek bug with DBO
#2468 opened
Aug 21, 2025 -
[Usage]: Deploying Qwen3 directly with vllm-ascend performs very poorly (relative to mindie); does mindie turbo already support accelerating Qwen3?
#2466 opened
Aug 21, 2025 -
[Performance]: Does vllm 0.10.0 need turbo? No turbo release adapted to vllm 0.10.0 appears in mindie-turbo
#2463 opened
Aug 21, 2025 -
[Bug]: Qwen3-30B-A3B DP+EP offline inference timeout
#2460 opened
Aug 20, 2025 -
[Bug]: DeepSeek R1 precision issue, send 1 token to server, get response containing irrelevant things
#2455 opened
Aug 20, 2025 -
[Performance]: Extremely low throughput running glm4.5 on two 8-card 910B nodes
#2446 opened
Aug 20, 2025 -
[Bug]: Wrong use of AttentionMaskBuilder.get_splitfuse_attn_mask
#2428 opened
Aug 19, 2025 -
[Bug]: Qwen3-32b start up failed
#2424 opened
Aug 18, 2025 -
[Bug]: Running Qwen3-Embedding-0.6B with v0.10.0 on a 310P3 card consumes 18 GB of device memory
#2422 opened
Aug 18, 2025 -
[Bug]: MiniMax/MiniMax-M1-40k and MiniMax/MiniMax-Text-01 failed to start in eager and graph mode
#2414 opened
Aug 18, 2025 -
[Bug]: Deepseek-w8a8 has precision issues.
#2413 opened
Aug 18, 2025 -
[v0.9.1rc3] FAQ / Feedback
#2410 opened
Aug 18, 2025 -
[Bug]: Disaggregate prefill repro fail
#2406 opened
Aug 17, 2025 -
[Bug]: link failed using llmdatadist for disaggregated prefill
#2399 opened
Aug 15, 2025
81 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[core] Adopt graph rewriter on fx.graph to enable automatic kernel fusion
#2389 commented on
Aug 20, 2025 • 11 new comments -
[Feat]: Add custom lmhead tensor model parallel
#2309 commented on
Aug 22, 2025 • 3 new comments -
[Qwen-moe] Remove the minor operation arange
#2373 commented on
Aug 22, 2025 • 2 new comments -
[feat]: oproj tensor parallelism in pure DP and graph-mode scenarios.
#2167 commented on
Aug 22, 2025 • 2 new comments -
[Bugfix]Support Qwen3-MOE on aclgraph mode in sizes capture and add new ut
#2352 commented on
Aug 22, 2025 • 2 new comments -
[0.9.1][FEATURE][MTP] Support MTP > 1
#2180 commented on
Aug 21, 2025 • 1 new comment -
Fix the accuracy issues caused by the mrope operator
#2388 commented on
Aug 21, 2025 • 0 new comments -
refactor _process_reqs in model_runner_v1.py
#2240 commented on
Aug 20, 2025 • 0 new comments -
[main][Bugfix] Fix `AscendFusedMoE` not working with AclGraph
#2224 commented on
Aug 21, 2025 • 0 new comments -
[Bugfix] Add Eagle-3 Support for Qwen3
#2215 commented on
Aug 20, 2025 • 0 new comments -
qwen3_moe/qwen25 support torchair graph
#2214 commented on
Aug 20, 2025 • 0 new comments -
[Bugfix] Fix torchair prefix cache not working under non-moe model
#2212 commented on
Aug 19, 2025 • 0 new comments -
[bugfix] fix gd or apc or fc run error when ascend_…
#2190 commented on
Aug 21, 2025 • 0 new comments -
add super kernel for decode moe
#2157 commented on
Aug 19, 2025 • 0 new comments -
[Feat] Implement Full Graph on main branch
#2128 commented on
Aug 20, 2025 • 0 new comments -
[v0.9.1] Switch Infra to linux-aarch64-a2 and python to 3.11
#2119 commented on
Aug 19, 2025 • 0 new comments -
support fix route code
#2103 commented on
Aug 19, 2025 • 0 new comments -
[0.9.1]use npu all_reduce
#2087 commented on
Aug 19, 2025 • 0 new comments -
Fix for tp execute model output
#2074 commented on
Aug 19, 2025 • 0 new comments -
[bugfix] fix prefix cache in qwen
#2059 commented on
Aug 21, 2025 • 0 new comments -
Add graph mode for Qwen2.5/Qwen3 ..
#2041 commented on
Aug 16, 2025 • 0 new comments -
Added support for KV connector v1
#2039 commented on
Aug 21, 2025 • 0 new comments -
[0.9.1][V1][PP] Support pp with ray backend in V1 (#1800)
#2027 commented on
Aug 19, 2025 • 0 new comments -
[CustomOp] Register RotaryEmbedding instead of overwrite forward
#2385 commented on
Aug 22, 2025 • 0 new comments -
Abstraction of Independent TP Sharding Module for MatMul under Pure DP
#2381 commented on
Aug 22, 2025 • 0 new comments -
[CustomOp] Register FusedMoe
#2380 commented on
Aug 21, 2025 • 0 new comments -
[Doc] Add multi-node ray backend tutorial
#2376 commented on
Aug 18, 2025 • 0 new comments -
[4/N][Refactor] Refactor `AscendAttentionMetadataBuilder` for better extensibility and make the builder class of torchair extend from it
#2375 commented on
Aug 22, 2025 • 0 new comments -
[CORE] concurrent partial prefills
#2372 commented on
Aug 22, 2025 • 0 new comments -
[Refactor][WIP] Refactor mla_v1 by moving all MLA preprocessing ops into mla_v1 attention impl.
#2363 commented on
Aug 20, 2025 • 0 new comments -
[Refactor] refactor spec decode
#2361 commented on
Aug 22, 2025 • 0 new comments -
[0.9.1] fix dbo error info
#2358 commented on
Aug 19, 2025 • 0 new comments -
Delete duplicate code and fix a bug
#2341 commented on
Aug 22, 2025 • 0 new comments -
Torchair graph812 cov
#2337 commented on
Aug 19, 2025 • 0 new comments -
[BugFix] Fix the issue where the eagle method for speculative decoding fails to load the model
#2331 commented on
Aug 19, 2025 • 0 new comments -
[CI] Update accuracy CI
#2330 commented on
Aug 22, 2025 • 0 new comments -
[Structured Output][CI] Add test for `outlines` backend for structured output in CI
#2283 commented on
Aug 22, 2025 • 0 new comments -
Accuracy report formatting
#2279 commented on
Aug 22, 2025 • 0 new comments -
[CI] [1/2] Refactor e2e CI - singlecard
#2276 commented on
Aug 22, 2025 • 0 new comments -
[main] Fuse GroupedMatmul, Swiglu and DynamicQuant in `W8A8_DYNAMIC` quantized MoE layers
#2275 commented on
Aug 22, 2025 • 0 new comments -
[Main] prefill dbo build on alltoall_seq
#2241 commented on
Aug 20, 2025 • 0 new comments -
feat: support V1 report_usage_stats in vllm-ascend
#1061 commented on
Aug 18, 2025 • 0 new comments -
[Refactor]: Refactor torchair in vllm-ascend
#2273 commented on
Aug 22, 2025 • 0 new comments -
[Usage]: ZhipuAI/GLM-4.5 run multi node with aclgraph
#2082 commented on
Aug 22, 2025 • 0 new comments -
[Doc]: torch_npu import and calls statistics
#1511 commented on
Aug 22, 2025 • 0 new comments -
[Bug]: With verl-based LoRA training, vllm LoRA inference errors out after a few training iterations
#1845 commented on
Aug 22, 2025 • 0 new comments -
[Release]: Release checklist for `v0.9.1rc3`
#2396 commented on
Aug 22, 2025 • 0 new comments -
[Bug]: Qwen2.5-7B The process exits for this inner error, and the current working operator name is SelfAttentionOperation
#2239 commented on
Aug 22, 2025 • 0 new comments -
[Bug]: v0.10.0rc1 reports an error when starting Qwen3-235B-A22B
#2266 commented on
Aug 22, 2025 • 0 new comments -
[Usage]: Inference speed of qwen-30B-A3B on 910B2
#2328 commented on
Aug 21, 2025 • 0 new comments -
[RFC]: Refactoring fused_moe
#2321 commented on
Aug 21, 2025 • 0 new comments -
vLLM Ascend Model Support Priority
#1608 commented on
Aug 20, 2025 • 0 new comments -
[RFC]: Unit test coverage improvement
#1298 commented on
Aug 20, 2025 • 0 new comments -
[Feature]: Prometheus + Grafana Metrics Integration for vLLM
#1795 commented on
Aug 19, 2025 • 0 new comments -
[Bug]: Attempted to assign 58 = 58 multimodal tokens to 59 placeholders
#1045 commented on
Aug 19, 2025 • 0 new comments -
[Bug]: v0.9.2rc2, Qwen3-235B-A22B-Thinking-2507, with mindie_turbo, deploy failed
#2146 commented on
Aug 19, 2025 • 0 new comments -
[Bug]: A model deployed with vllm-ascend on 910B performs poorly and produces infinite looping output
#2310 commented on
Aug 19, 2025 • 0 new comments -
[Bug]: ACL stream synchronize failed, error code:507053
#2070 commented on
Aug 19, 2025 • 0 new comments -
[Usage]: How to use Full Graph? Startup fails with an error on 0.9.1-dev
#2391 commented on
Aug 18, 2025 • 0 new comments -
[Usage]: The token speed is too slow during inference. How can I improve the inference performance and speed of the Qwen series multimodal models?
#1975 commented on
Aug 18, 2025 • 0 new comments -
[Bug]: vllm-ascend 0.9.2rc1 with the qwen3-32b model reports rtKernelLaunchWithHandleV2 failed: 507035 after running for a while
#2349 commented on
Aug 15, 2025 • 0 new comments -
[main][bugfix]Fix the issue where quantized model fails to start
#2004 commented on
Aug 19, 2025 • 0 new comments -
[0.9.1]eplb support qwen3-moe
#2000 commented on
Aug 22, 2025 • 0 new comments -
[P/D] NIXL Connector for v1 distributed
#1984 commented on
Aug 19, 2025 • 0 new comments -
[Bug] use hidden_size_per_attention_head for scale_value
#1958 commented on
Aug 19, 2025 • 0 new comments -
[Misc] Refactor forward metadata retrieval across DP nodes to reduce redundant padding.
#1950 commented on
Aug 19, 2025 • 0 new comments -
[Build] Make build_py work in develop installation mode
#1925 commented on
Aug 19, 2025 • 0 new comments -
[Bugfix]: Correct handling of cos_sin_cache length
#1900 commented on
Aug 21, 2025 • 0 new comments -
[Feature] Optimize forward metadata collection across dp ranks
#1857 commented on
Aug 19, 2025 • 0 new comments -
[BugFix]fixed all_reduce_merge_allgather_ep bug
#1818 commented on
Aug 19, 2025 • 0 new comments -
[BugFix]fixed rm_router_logits_allgather_ep bug
#1817 commented on
Aug 19, 2025 • 0 new comments -
[main] Support SP for qwen2.5 and qwen3 moe
#1761 commented on
Aug 18, 2025 • 0 new comments -
[0.9.1]support fa3 quant
#1695 commented on
Aug 19, 2025 • 0 new comments -
[WIP][Feature]cpu offload connector
#1659 commented on
Aug 20, 2025 • 0 new comments -
pangumoe support rope
#1633 commented on
Aug 18, 2025 • 0 new comments -
[CI][Benchmark] Add Qwen3-30B-A3B and Qwen3-32B performance benchmark
#1613 commented on
Aug 18, 2025 • 0 new comments -
[fix] Add support for VLLM_RANDOMIZE_DP_DUMMY_INPUTS
#1567 commented on
Aug 18, 2025 • 0 new comments -
[0.9.1][Feature] Support w8a8 weight prefetch
#1509 commented on
Aug 18, 2025 • 0 new comments -
[Doc] Refactor env vars module and change the way of doc generation
#1504 commented on
Aug 18, 2025 • 0 new comments -
dLLM, short for distributed LLM, an easy-to-use tool for multi-node vllm deployment
#1280 commented on
Aug 18, 2025 • 0 new comments -
[KVConnector][1/N] v1 kvcache connector with the Chariot-DS backend
#1080 commented on
Aug 18, 2025 • 0 new comments