[Performance]: Query: Memory (VRAM vs. RAM) and Performance Implications of Scaling LoRA Adapters in vLLM

### Proposal to improve performance

I would like to inquire about the resource allocation when deploying multiple LoRA adapters using vLLM. I am using the following command to serve the model:
```bash
# Serve Qwen2.5-VL-3B-Instruct on GPU 7 with two LoRA adapters registered at startup
CUDA_VISIBLE_DEVICES=7 vllm serve /home/gpuserver/Downloads/zzh/Qwen/Qwen2.5-VL-3B-Instruct \
    --enable-lora \
    --lora-modules lora1=/path/to/lora/sft lora2=/path/to/lora/sft
```
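For comparison, a variant of the same command that spells out the LoRA capacity limits might look like the following. It assumes the installed vLLM version accepts `--max-loras`, `--max-cpu-loras` and `--max-lora-rank` (check `vllm serve --help`). The numeric values are illustrative only, not a recommendation.

```bash
# Same model and adapters as above. The three capacity flags are assumed to be
# supported by the installed vLLM version; the values are illustrative only.
#   --max-loras:     adapters that can be active in a single GPU batch
#   --max-cpu-loras: adapters kept cached in host memory (should be >= --max-loras)
#   --max-lora-rank: largest adapter rank the server will accept
CUDA_VISIBLE_DEVICES=7 vllm serve /home/gpuserver/Downloads/zzh/Qwen/Qwen2.5-VL-3B-Instruct \
    --enable-lora \
    --lora-modules lora1=/path/to/lora/sft lora2=/path/to/lora/sft \
    --max-loras 2 \
    --max-cpu-loras 8 \
    --max-lora-rank 16
```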
My primary question is: as the number of LoRA adapters grows, which memory is mainly consumed, GPU memory (VRAM) or system memory (RAM)?
Second, regarding performance: if adapters are swapped from system memory into VRAM on demand, can inference speed still remain reasonable when requests target adapters that are not currently loaded on the GPU?
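To make the first question concrete, the weights of a single adapter can be estimated by hand: each adapted linear layer with input size d_in and output size d_out adds two low-rank matrices, A (r × d_in) and B (d_out × r), i.e. r·(d_in + d_out) parameters. The sketch below does that arithmetic in bash and then divides the result by an assumed host-to-device bandwidth to get a rough per-swap copy time for the second question. The rank, layer count, target modules, dimensions and bandwidth are all hypothetical placeholders, not values read from Qwen2.5-VL-3B-Instruct or measured on this server.

```bash
#!/usr/bin/env bash
# Back-of-envelope size of ONE LoRA adapter and its host-to-GPU copy time.
# All numbers are hypothetical placeholders; substitute values from the base
# model's config.json and the adapter's adapter_config.json.
set -euo pipefail

rank=16             # LoRA rank r (hypothetical)
num_layers=36       # decoder layers (hypothetical)
bytes_per_param=2   # fp16 / bf16 weights

# Target modules per layer as "d_in d_out" pairs (hypothetical):
# q_proj, k_proj, v_proj, o_proj.
modules=("2048 2048" "2048 256" "2048 256" "2048 2048")

# Each adapted Linear(d_in -> d_out) adds A (r x d_in) and B (d_out x r),
# i.e. r * (d_in + d_out) parameters.
params_per_layer=0
for m in "${modules[@]}"; do
  read -r d_in d_out <<< "$m"
  params_per_layer=$(( params_per_layer + rank * (d_in + d_out) ))
done

total_bytes=$(( params_per_layer * num_layers * bytes_per_param ))

# Assumed effective host-to-device bandwidth in bytes/s (hypothetical, 16 GB/s).
h2d_bw=16000000000
copy_us=$(( total_bytes * 1000000 / h2d_bw ))   # integer microseconds

echo "params_per_layer=${params_per_layer}"
echo "adapter_bytes=${total_bytes}"
echo "copy_time_us=${copy_us}"
```

With these placeholder values the script prints `params_per_layer=204800`, `adapter_bytes=14745600` (about 14.7 MB) and `copy_time_us=921`. It ignores allocator overhead, any padding applied to GPU adapter slots, and the compute cost of batches that mix several adapters, so treat the result as an order-of-magnitude figure rather than a measurement.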

### Report of performance regression

_No response_

### Misc discussion on performance

_No response_

### Your current environment (if you think it is necessary)

```text
The output of `python collect_env.py`
```


### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Performance]: Query: Memory (VRAM vs. RAM) and Performance Implications of Scaling LoRA Adapters in vLLM #20160

Proposal to improve performance

Report of performance regression

Misc discussion on performance

Your current environment (if you think it is necessary)

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Performance]: Query: Memory (VRAM vs. RAM) and Performance Implications of Scaling LoRA Adapters in vLLM #20160

Description

Proposal to improve performance

Report of performance regression

Misc discussion on performance

Your current environment (if you think it is necessary)

Before submitting a new issue...

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions