[Core] Relax the LoRA max rank #26461
Conversation
Code Review
This pull request aims to add support for max_lora_rank=1. The changes in vllm/config/lora.py and vllm/v1/worker/lora_model_runner_mixin.py are correct for this purpose. However, I've identified a significant performance issue related to CUDA graph capture for LoRA, which is detailed in my comment.
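For context, a hedged sketch of the kind of rank validation that has to admit a value of 1 (this is not the actual diff from vllm/config/lora.py; the class name, field, allowed set, and error message below are assumptions for illustration only):

```python
from dataclasses import dataclass

# Hypothetical sketch of LoRAConfig-style rank validation; the allowed set
# and error message are assumptions, not the code changed by this PR.
POSSIBLE_MAX_RANKS = (1, 8, 16, 32, 64, 128, 256, 320, 512)

@dataclass
class LoRAConfigSketch:
    max_lora_rank: int = 16

    def __post_init__(self) -> None:
        # Relaxing the check means 1 must be among the allowed ranks.
        if self.max_lora_rank not in POSSIBLE_MAX_RANKS:
            raise ValueError(
                f"max_lora_rank ({self.max_lora_rank}) must be one of "
                f"{POSSIBLE_MAX_RANKS}."
            )

# With 1 in the allowed set, this no longer raises.
LoRAConfigSketch(max_lora_rank=1)
```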
```python
lora_warmup_rank = (
    lora_config.max_lora_rank if lora_config.max_lora_rank < 8 else 8
)
```
This logic for determining lora_warmup_rank is correct for supporting max_lora_rank=1, but it perpetuates a significant performance issue for CUDA graph capture when max_lora_rank > 8.
The _dummy_run method, which uses this lora_warmup_rank, is responsible for both profiling and CUDA graph capture. By capping the warmup rank at 8, any captured CUDA graph will be specific to LoRA ranks up to 8. If the engine is configured with max_lora_rank > 8 (e.g., 16) and receives requests with LoRAs of a rank greater than 8, the captured graph will not be used, causing a fallback to eager execution and negating the performance benefits of CUDA graphs.
To ensure CUDA graphs are captured for the maximum configured rank, the warmup rank should be lora_config.max_lora_rank during graph capture. A simple fix is to always use max_lora_rank, which would also make it consistent with the TPU runner. This might slow down warm-up for large ranks, but it ensures that CUDA graphs are effective for all configured LoRA ranks.
A suggested change along those lines:

```python
lora_warmup_rank = lora_config.max_lora_rank
```
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
FIX #26428
Although LoRA rank 1 is already supported, this PR adds a max_lora_rank=1 option to accommodate scenarios where only LoRA rank 1 is needed.
also cc @varun-sundar-rabindranath
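For illustration, a minimal usage sketch of the new option (not part of the PR; the model name and adapter path are placeholders, and this assumes `max_lora_rank` continues to be accepted as an `LLM`/engine argument):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder model and adapter path; any rank-1 LoRA adapter would do.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    enable_lora=True,
    max_lora_rank=1,  # the option this PR allows
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=32),
    lora_request=LoRARequest("rank1-adapter", 1, "/path/to/rank1_lora"),
)
print(outputs[0].outputs[0].text)
```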
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.