
[Feature]: Support loading LoRA adapters directly from s3 bucket #15633

@robert-moyai


🚀 The feature, motivation and pitch

Currently, vLLM supports dynamically loading and unloading LoRA adapters at runtime through dedicated API endpoints. The supported behaviour is described in the documentation: https://docs.vllm.ai/en/stable/features/lora.html#dynamically-serving-lora-adapters.
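
For context, the documented flow looks roughly like the sketch below (the server address and adapter path are placeholders, and the server must be started with `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True`):

```python
import requests

# Placeholder server address; runtime adapter updates must be enabled
# on the server via VLLM_ALLOW_RUNTIME_LORA_UPDATING=True.
BASE_URL = "http://localhost:8000"

# Load an adapter at runtime from a local path (the currently supported case).
resp = requests.post(
    f"{BASE_URL}/v1/load_lora_adapter",
    json={
        "lora_name": "my_adapter",
        "lora_path": "/mnt/adapters/my_adapter",  # local path or HF repo id
    },
)
resp.raise_for_status()

# Unload it again when it is no longer needed.
requests.post(f"{BASE_URL}/v1/unload_lora_adapter", json={"lora_name": "my_adapter"})
```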

For the base model, you can already stream weights in safetensors format from S3 using the Run:ai Model Streamer: https://docs.vllm.ai/en/v0.7.1/models/extensions/runai_model_streamer.html
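
For comparison, streaming the base model from S3 already works along these lines (the s3:// path is a placeholder):

```python
from vllm import LLM

# Stream base-model safetensors weights directly from S3 using the
# Run:ai Model Streamer (the s3:// path is a placeholder).
llm = LLM(
    model="s3://my-bucket/llama-3.1-8b/",
    load_format="runai_streamer",
)
```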

If you try to load a LoRA adapter (adapter_model.safetensors / adapter_config.json) from S3, you get the validation error shown below:

ERROR 03-27 15:27:33 [utils.py:229] Error downloading the HuggingFace model
ERROR 03-27 15:27:33 [utils.py:229] Traceback (most recent call last):
ERROR 03-27 15:27:33 [utils.py:229]   File "/opt/venv/lib/python3.12/site-packages/vllm/lora/utils.py", line 223, in get_adapter_absolute_path
ERROR 03-27 15:27:33 [utils.py:229]     local_snapshot_path = huggingface_hub.snapshot_download(
ERROR 03-27 15:27:33 [utils.py:229]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-27 15:27:33 [utils.py:229]   File "/opt/venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
ERROR 03-27 15:27:33 [utils.py:229]     validate_repo_id(arg_value)
ERROR 03-27 15:27:33 [utils.py:229]   File "/opt/venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 154, in validate_repo_id
ERROR 03-27 15:27:33 [utils.py:229]     raise HFValidationError(
ERROR 03-27 15:27:33 [utils.py:229] huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 's3://lora-adapter/llama-3.1-8b-abliterated-lora/'. Use `repo_type` argument if needed.

This behaviour comes from get_adapter_absolute_path (https://github.com/vllm-project/vllm/blob/main/vllm/lora/utils.py#L192). But looking at where the tensors are actually loaded, I don't see why the weights couldn't simply be streamed from S3 using the Run:ai streamer (https://github.com/vllm-project/vllm/blob/main/vllm/lora/models.py#L187).
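
One possible shape for this, as a hedged sketch only (the `download_s3_adapter` helper is hypothetical, not an existing vLLM function): `get_adapter_absolute_path` could branch on the `s3://` scheme before falling back to `huggingface_hub.snapshot_download`, which is the call raising the `HFValidationError` today.

```python
import os
import tempfile

import huggingface_hub


def get_adapter_absolute_path(lora_path: str) -> str:
    # Existing behaviour (roughly): local paths are returned as-is.
    if os.path.exists(lora_path):
        return os.path.abspath(lora_path)

    # Hypothetical new branch: fetch s3:// URIs instead of passing them
    # to huggingface_hub, which rejects them as invalid repo ids.
    if lora_path.startswith("s3://"):
        local_dir = tempfile.mkdtemp(prefix="vllm-lora-")
        download_s3_adapter(lora_path, local_dir)  # hypothetical helper
        return local_dir

    # Existing fallback: treat the path as a Hugging Face repo id.
    return huggingface_hub.snapshot_download(repo_id=lora_path)
```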

My feature request: support loading LoRA weights from S3 using the Run:ai Model Streamer.

Alternatives

You can download the LoRA weights manually and then reference the adapter by a local path, but this adds complexity to the dynamic serving setup.
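
As a concrete sketch of that workaround (bucket, prefix, and target directory are placeholders taken from the error message above), assuming boto3:

```python
import os

import boto3

BUCKET = "lora-adapter"                    # placeholder bucket
PREFIX = "llama-3.1-8b-abliterated-lora/"  # placeholder key prefix
LOCAL_DIR = "/tmp/llama-3.1-8b-lora"

os.makedirs(LOCAL_DIR, exist_ok=True)
s3 = boto3.client("s3")

# Download the adapter files locally, then point /v1/load_lora_adapter
# at LOCAL_DIR instead of the s3:// URI.
for name in ("adapter_model.safetensors", "adapter_config.json"):
    s3.download_file(BUCKET, PREFIX + name, os.path.join(LOCAL_DIR, name))
```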

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Labels: feature request (New feature or request), stale (Over 90 days of inactivity)
