
[UPDATED] - Large Block_size solution #21123


Open
wants to merge 9 commits into base: main

Conversation

nadathurv

@nadathurv nadathurv commented Jul 17, 2025

This PR contains the updated and current work on this issue. It is related to the To-Do item

Original Problem

Hybrid models were using extremely large block sizes (~400 tokens) because of per-layer constraints: each attention layer was padded so that its kv_hidden_size * block_size exceeded the mamba state size of a single layer, leading to inefficient memory usage.

Solution

Implement an aggregate constraint instead of per-layer constraints:

Before: each attention layer individually satisfies the mamba state requirement.
After: the combined memory of all attention layers satisfies the mamba state requirement.
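To see why the aggregate constraint shrinks block sizes, here is a toy calculation. All of the numbers below are made up for illustration; the real values depend on the model and are not taken from this PR:

```python
# Hypothetical sizes for illustration only; real values depend on the model.
mamba_state_bytes = 400_000     # state size of one mamba layer
per_token_kv_bytes = 1_000      # KV-cache bytes per token for one attention layer
num_attention_layers = 10

# Before: each attention layer alone must cover the mamba state.
block_size_individual = mamba_state_bytes // per_token_kv_bytes
print(block_size_individual)    # 400 tokens per block

# After: all attention layers share the requirement, shrinking the block size.
block_size_aggregate = mamba_state_bytes // (num_attention_layers * per_token_kv_bytes)
print(block_size_aggregate)     # 40 tokens per block
```

With ten attention layers sharing the constraint, the required block size drops by a factor of ten in this toy setup.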

Key Changes

  1. kv_cache_coordinator.py: Add calculate_optimal_block_size() method

    • Implements aggregate constraint calculation: max_mamba_state / (num_attention_layers * min_per_token_bytes)
    • Provides fallback to OPTIMAL_BLOCK_FALLBACK when calculation fails
    • Includes cached version with LRU cache for performance optimization
  2. kv_cache_utils.py: Add _get_kv_cache_config_optimal_block_size() integration

    • Deep copies all specs to prevent mutation of original configurations
    • Applies calculated optimal block size uniformly across all layer specs
    • Wraps the calculation in try/except with a fallback to the existing uniform page size logic
    • Integrates with existing get_kv_cache_config() flow for hybrid models
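A minimal sketch of how these two pieces could fit together. The function and constant names `calculate_optimal_block_size` and `OPTIMAL_BLOCK_FALLBACK` come from the PR description; everything else (signatures, the fallback value of 16, the shape of the spec objects, the helper name `apply_optimal_block_size`) is an assumption for illustration and differs from the actual vLLM code:

```python
import copy
from functools import lru_cache

OPTIMAL_BLOCK_FALLBACK = 16  # assumed fallback value; the real constant lives in vLLM


def calculate_optimal_block_size(max_mamba_state_bytes: int,
                                 num_attention_layers: int,
                                 min_per_token_bytes: int) -> int:
    """Aggregate constraint: all attention layers together cover the mamba state."""
    try:
        return max(1, max_mamba_state_bytes
                   // (num_attention_layers * min_per_token_bytes))
    except ZeroDivisionError:
        # Degenerate configuration (no attention layers / zero-size tokens).
        return OPTIMAL_BLOCK_FALLBACK


@lru_cache(maxsize=None)
def calculate_optimal_block_size_cached(max_mamba_state_bytes: int,
                                        num_attention_layers: int,
                                        min_per_token_bytes: int) -> int:
    """Cached variant: the inputs are hashable ints, so lru_cache applies directly."""
    return calculate_optimal_block_size(max_mamba_state_bytes,
                                        num_attention_layers,
                                        min_per_token_bytes)


def apply_optimal_block_size(kv_cache_specs: dict, block_size: int) -> dict:
    """Apply one uniform block size to deep-copied specs, never mutating the input."""
    specs = copy.deepcopy(kv_cache_specs)
    for spec in specs.values():
        spec.block_size = block_size
    return specs
```

The deep copy mirrors the PR's stated goal of leaving the caller's original configurations untouched when the optimal-block-size path is taken.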

cc @heheda12345 @tlrmchlsmth

Outdated links: Original Work

nadathurv and others added 7 commits July 16, 2025 11:52
Signed-off-by: nadathurv <work.vnadathur@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Co-Authored-By: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-Authored-By: nadathurv <218520480+nadathurv@users.noreply.github.com>
Signed-off-by: nadathurv <work.vnadathur@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Co-Authored-By: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-Authored-By: nadathurv <218520480+nadathurv@users.noreply.github.com>
Signed-off-by: nadathurv <work.vnadathur@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Co-Authored-By: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-Authored-By: nadathurv <work.vnadathur@gmail.com>
Signed-off-by: nadathurv <work.vnadathur@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Co-Authored-By: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-Authored-By: nadathurv <work.vnadathur@gmail.com>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Jul 17, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an intelligent way to calculate the optimal block size for hybrid models, which should improve memory efficiency. The core logic for the calculation in kv_cache_coordinator.py is robust and handles edge cases well. The integration in kv_cache_utils.py is also well-structured.

I've identified one high-severity issue regarding error handling. The use of a broad, silent except Exception could mask bugs and should be updated to include logging for better maintainability and easier debugging. Other than that, the changes look good.
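The pattern the reviewer is asking for might look something like the sketch below. The wrapper name, logger setup, and fallback constant are all hypothetical; only the idea of logging instead of silently swallowing exceptions comes from the review comment:

```python
import logging

logger = logging.getLogger(__name__)

FALLBACK_BLOCK_SIZE = 16  # assumed fallback constant for illustration


def safe_optimal_block_size(compute):
    """Run a block-size calculation, logging (not swallowing) any failure."""
    try:
        return compute()
    except Exception:
        # logger.exception records the full traceback at ERROR level, so a
        # bug masked by the fallback still leaves evidence in the logs.
        logger.exception("Optimal block size calculation failed; "
                         "falling back to %d", FALLBACK_BLOCK_SIZE)
        return FALLBACK_BLOCK_SIZE
```

This keeps the fallback behavior the PR already has while making any masked failure visible during debugging.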

Signed-off-by: nadathurv <work.vnadathur@gmail.com>
Co-Authored-By: nadathurv <work.vnadathur@gmail.com>