🚀 The feature, motivation and pitch
Currently vLLM supports dynamically loading and unloading LoRA adapters at runtime through dedicated API endpoints. The supported behaviour is described in the documentation: https://docs.vllm.ai/en/stable/features/lora.html#dynamically-serving-lora-adapters.
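For reference, loading and unloading at runtime looks roughly like this (a minimal sketch following the linked docs; the server URL and adapter name/path are placeholders, and the server must be started with `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True`):

```python
# Minimal sketch of the dynamic LoRA endpoints from the linked docs.
# Assumes the server was started with VLLM_ALLOW_RUNTIME_LORA_UPDATING=True;
# the URL, adapter name, and path are placeholders.
import requests

BASE_URL = "http://localhost:8000"

# Load an adapter at runtime.
requests.post(
    f"{BASE_URL}/v1/load_lora_adapter",
    json={"lora_name": "sql_adapter", "lora_path": "/path/to/sql-lora-adapter"},
).raise_for_status()

# Unload it again when it is no longer needed.
requests.post(
    f"{BASE_URL}/v1/unload_lora_adapter",
    json={"lora_name": "sql_adapter"},
).raise_for_status()
```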
For the base model, you can load the weights in safetensors format using the run:ai model streamer: https://docs.vllm.ai/en/v0.7.1/models/extensions/runai_model_streamer.html
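That path (enabled via `--load-format runai_streamer`) builds on the streamer's Python API, which can read safetensors files straight out of an S3 bucket. A rough sketch, assuming the `SafetensorsStreamer` interface documented for the `runai_model_streamer` package (the bucket/key below is a placeholder):

```python
# Rough sketch of the runai_model_streamer API that vLLM's
# --load-format runai_streamer path builds on. The S3 path is a placeholder.
from runai_model_streamer import SafetensorsStreamer

file_path = "s3://my-bucket/llama-3.1-8b/model.safetensors"

with SafetensorsStreamer() as streamer:
    streamer.stream_file(file_path)
    for name, tensor in streamer.get_tensors():
        # Tensors are yielded as they stream in from object storage.
        print(name, tensor.shape)
```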
If you try to load a LoRA adapter (adapter_model.safetensors / adapter_config.json) from S3, you get the validation error shown below:
```
ERROR 03-27 15:27:33 [utils.py:229] Error downloading the HuggingFace model
ERROR 03-27 15:27:33 [utils.py:229] Traceback (most recent call last):
ERROR 03-27 15:27:33 [utils.py:229] File "/opt/venv/lib/python3.12/site-packages/vllm/lora/utils.py", line 223, in get_adapter_absolute_path
ERROR 03-27 15:27:33 [utils.py:229] local_snapshot_path = huggingface_hub.snapshot_download(
ERROR 03-27 15:27:33 [utils.py:229] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-27 15:27:33 [utils.py:229] File "/opt/venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
ERROR 03-27 15:27:33 [utils.py:229] validate_repo_id(arg_value)
ERROR 03-27 15:27:33 [utils.py:229] File "/opt/venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 154, in validate_repo_id
ERROR 03-27 15:27:33 [utils.py:229] raise HFValidationError(
ERROR 03-27 15:27:33 [utils.py:229] huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 's3://lora-adapter/llama-3.1-8b-abliterated-lora/'. Use `repo_type` argument if needed.
```
This behaviour comes from get_adapter_absolute_path (https://github.com/vllm-project/vllm/blob/main/vllm/lora/utils.py#L192), which resolves any non-local path through huggingface_hub.snapshot_download. But looking at where the tensors are actually loaded, I don't see why you couldn't just stream the weights from S3 using the RunAI streamer (https://github.com/vllm-project/vllm/blob/main/vllm/lora/models.py#L187).
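To illustrate, a hypothetical version of that loading step that recognises s3:// URIs could look like this. This is a sketch only: `load_lora_tensors_from_s3` is an illustrative name, not existing vLLM code, and it assumes the same `SafetensorsStreamer` API used for base-model streaming.

```python
# Hypothetical sketch only: stream a LoRA's tensors directly from S3
# instead of failing in huggingface_hub.snapshot_download. The function
# name is illustrative, not existing vLLM code.
import torch
from runai_model_streamer import SafetensorsStreamer

def load_lora_tensors_from_s3(lora_path: str) -> dict[str, torch.Tensor]:
    """Stream adapter_model.safetensors for a LoRA stored under an s3:// prefix."""
    tensors: dict[str, torch.Tensor] = {}
    with SafetensorsStreamer() as streamer:
        streamer.stream_file(f"{lora_path.rstrip('/')}/adapter_model.safetensors")
        for name, tensor in streamer.get_tensors():
            tensors[name] = tensor
    return tensors
```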
My feature request is that we start supporting loading LoRA weights from S3 using the RunAI model streamer.
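From the client side, nothing would change except that an s3:// URI becomes a valid lora_path (proposed behaviour, not what vLLM does today; the path is the one from the error above):

```python
# Proposed behaviour (does not work today): pass an s3:// URI as lora_path.
import requests

requests.post(
    "http://localhost:8000/v1/load_lora_adapter",
    json={
        "lora_name": "llama-3.1-8b-abliterated-lora",
        "lora_path": "s3://lora-adapter/llama-3.1-8b-abliterated-lora/",
    },
).raise_for_status()
```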
Alternatives
You can download the LoRA weights manually and then reference the adapter using a local path, but this adds complexity to the dynamic serving setup.
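A sketch of that workaround, using boto3 to stage the files locally before registering them (the bucket and key names are taken from the error above; adjust for your own setup):

```python
# Workaround sketch: stage the adapter files locally with boto3, then
# register the local directory via the load endpoint.
import os
import boto3
import requests

local_dir = "/tmp/llama-3.1-8b-abliterated-lora"
os.makedirs(local_dir, exist_ok=True)

s3 = boto3.client("s3")
for fname in ("adapter_model.safetensors", "adapter_config.json"):
    s3.download_file(
        "lora-adapter",                            # bucket
        f"llama-3.1-8b-abliterated-lora/{fname}",  # key
        os.path.join(local_dir, fname),
    )

requests.post(
    "http://localhost:8000/v1/load_lora_adapter",
    json={"lora_name": "llama-3.1-8b-abliterated-lora", "lora_path": local_dir},
).raise_for_status()
```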
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.