Insights: HabanaAI/vllm-fork
Overview
- 20 Merged pull requests
- 18 Open pull requests
- 0 Closed issues
- 1 New issue
20 Pull requests merged by 14 people
- consolidate pd/dp scripts (#1346, merged Jun 2, 2025)
- implement a load balancer in PD proxy server w/ script updates. (#1343, merged May 30, 2025)
- [habana_main][bypass inc][fp8 kv cache] Enable FP8 KV cache bypass INC (#1333, merged May 30, 2025)
- Return logprobs in delayed_sampling (#1323, merged May 29, 2025)
- Enable build hpu-vllm docker on machine without hpu (#1341, merged May 29, 2025)
- Cherry pick docs from 1.21 release to habana_main (#1332, merged May 28, 2025)
- Set vllm-hpu-extension revision to 80985d3 (#1318, merged May 28, 2025)
- Port: Fix for scheduler handling of padded-aware-scheduling (#1300, merged May 28, 2025)
- docker vllm - minor fixes (#1330, merged May 28, 2025)
- Align max_block calculation with get_kv_cache_shape changes (#1312, merged May 28, 2025)
- [deepseek_r1] fix warmup logic form contiguous_pa (#1328, merged May 28, 2025)
- [DeepSeek PP] Remove Residual (#1326, merged May 28, 2025)
- Enable qwen2vl with padding (#1327, merged May 28, 2025)
- Initial commit of vllm docker. (#1303, merged May 27, 2025)
- Fix benchmark_throughput for vllm async run. (#1313, merged May 27, 2025)
- Removed redundant variable for APC enablement (#1322, merged May 27, 2025)
- Optimized Qwen3 and Qwen3-MoE on Gaudi for the aice/v1.21.0 branch (#1320, merged May 27, 2025)
- T.compile _update_metadata method (#1311, merged May 27, 2025)
- fix has_patched_prev_output flag error (#1317, merged May 27, 2025)
18 Pull requests opened by 17 people
- Add Flag to speed up Qwen3 fp8 warmup issue (#1319, opened May 27, 2025)
- Fix vllm crash when running with lm-eval (#1321, opened May 27, 2025)
- [V1] Add new block scheduling queue to select blocks with lowest id (#1329, opened May 28, 2025)
- Upgrade to HPU docker 1.21.0 and update run_cluster.sh (#1331, opened May 28, 2025)
- Fix prefill warm up issue (#1335, opened May 29, 2025)
- fix requirements/hpu.txt for hpu extension (#1336, opened May 29, 2025)
- [deepseek_r1] Enable StaticMoE for decoding phase of static activation quant path (#1338, opened May 29, 2025)
- Revise DeepSeek-R1 README and update start scripts (#1339, opened May 29, 2025)
- [WIP][TC][FP8] Enable dynamo to create floating point data-dependent fxgraphs (#1340, opened May 29, 2025)
- [draft] merged_prefill for V1 (#1342, opened May 29, 2025)
- [WIP] Enable interleaved sliding_window for gemma3 (#1344, opened May 30, 2025)
- by default disable contiguous_pa on Gaudi2. (#1345, opened May 30, 2025)
- Add split_qkv for Mixtral (#1347, opened Jun 2, 2025)
- [V1] Increase EXECUTE_MODEL_TIMEOUT_S (#1348, opened Jun 2, 2025)
- Remove duplicate kv_b_proj from models using MLA (#1349, opened Jun 2, 2025)
- Update readme with exponential warmup as default (#1350, opened Jun 2, 2025)
- Update docker file (#1351, opened Jun 2, 2025)
- fix prefill and add lm for pd (#1352, opened Jun 2, 2025)
1 Issue opened by 1 person
- [Usage]: How to deploy vllm with deepseek-v3 (#1337, opened May 29, 2025)
18 Unresolved conversations
Sometimes conversations happen on old items that aren't yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [DRAFT] 3d warmup (#1178, commented on Jun 2, 2025; 15 new comments)
- Support PD disaggregation (#1056, commented on Jun 1, 2025; 4 new comments)
- vLLM-Base: Full enabling of ALiBi (#1055, commented on May 28, 2025; 2 new comments)
- Enabled MoE for both BF16 and INC based FP8. (#1309, commented on May 28, 2025; 2 new comments)
- Add split_qkv for Granite (#1263, commented on Jun 2, 2025; 2 new comments)
- Split gate and up projections in LLamaMLP for models with bias (#1310, commented on Jun 2, 2025; 1 new comment)
- Add info about split_qkv to README (#1089, commented on Jun 2, 2025; 1 new comment)
- [Gaudi][Model] Qwen2.5-VL optimization for 112 aligned images (#1163, commented on Jun 2, 2025; 1 new comment)
- [Gaudi][Intel] Update Dockerfile.hpu for Gaudi 1.20.1 (#1316, commented on May 29, 2025; 0 new comments)
- [draft] Optimize RotaryEmbedding for reuse in t.compile with dynamic shapes (#1315, commented on May 28, 2025; 0 new comments)
- parallel compile for fast warm up (#1304, commented on May 27, 2025; 0 new comments)
- Qwen2.5 Omni (#1296, commented on May 29, 2025; 0 new comments)
- [SW-225565] Enable traingular softmax with merged prefill (#1278, commented on Jun 2, 2025; 0 new comments)
- Enable triangular attention (#1268, commented on May 28, 2025; 0 new comments)
- Enable embedding test on jenkins (#1234, commented on Jun 2, 2025; 0 new comments)
- [Bug]: Llama 405B FP8 on Gaudi 3 (#1194, commented on May 29, 2025; 0 new comments)
- [Feature]: Hangs After Model Load, but run_example_tp.py Executes Successfull (#967, commented on May 29, 2025; 0 new comments)
- [Installation]: v0.7.2+Gaudi-1.21.0 Dockerfile.hpu build fails with incorrect base image (#1261, commented on May 29, 2025; 0 new comments)