🚀 The feature, motivation and pitch
Currently vLLM supports dynamically loading and unloading LoRA adapters at runtime through dedicated API endpoints. The supported behaviour is described in the documentation: https://docs.vllm.ai/en/stable/features/lora.html#dynamically-serving-lora-adapters.
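For reference, loading and unloading at runtime looks roughly like this (a minimal sketch following the linked docs; the server URL and adapter name/path are placeholders, and the server must be started with `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True`):

```python
# Minimal sketch of the dynamic LoRA endpoints from the linked docs.
# Assumes the server was started with VLLM_ALLOW_RUNTIME_LORA_UPDATING=True;
# the URL, adapter name, and path are placeholders.
import requests

BASE_URL = "http://localhost:8000"

# Load an adapter at runtime.
requests.post(
    f"{BASE_URL}/v1/load_lora_adapter",
    json={"lora_name": "sql_adapter", "lora_path": "/path/to/sql-lora-adapter"},
).raise_for_status()

# Unload it again when it is no longer needed.
requests.post(
    f"{BASE_URL}/v1/unload_lora_adapter",
    json={"lora_name": "sql_adapter"},
).raise_for_status()
```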
For the base model, you can load the weights in safetensors format using the run:ai model streamer: https://docs.vllm.ai/en/v0.7.1/models/extensions/runai_model_streamer.html
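That path (enabled via `--load-format runai_streamer`) builds on the streamer's Python API, which can read safetensors files straight out of an S3 bucket. A rough sketch, assuming the `SafetensorsStreamer` interface documented for the `runai_model_streamer` package (the bucket/key below is a placeholder):

```python
# Rough sketch of the runai_model_streamer API that vLLM's
# --load-format runai_streamer path builds on. The S3 path is a placeholder.
from runai_model_streamer import SafetensorsStreamer

file_path = "s3://my-bucket/llama-3.1-8b/model.safetensors"

with SafetensorsStreamer() as streamer:
    streamer.stream_file(file_path)
    for name, tensor in streamer.get_tensors():
        # Tensors are yielded as they stream in from object storage.
        print(name, tensor.shape)
```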
If you try to load a LoRA adapter (adapter_model.safetensors / adapter_config.json) from S3, you get the validation error shown below:
```
ERROR 03-27 15:27:33 [utils.py:229] Error downloading the HuggingFace model
ERROR 03-27 15:27:33 [utils.py:229] Traceback (most recent call last):
ERROR 03-27 15:27:33 [utils.py:229] File "/opt/venv/lib/python3.12/site-packages/vllm/lora/utils.py", line 223, in get_adapter_absolute_path
ERROR 03-27 15:27:33 [utils.py:229] local_snapshot_path = huggingface_hub.snapshot_download(
ERROR 03-27 15:27:33 [utils.py:229] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-27 15:27:33 [utils.py:229] File "/opt/venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
ERROR 03-27 15:27:33 [utils.py:229] validate_repo_id(arg_value)
ERROR 03-27 15:27:33 [utils.py:229] File "/opt/venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 154, in validate_repo_id
ERROR 03-27 15:27:33 [utils.py:229] raise HFValidationError(
ERROR 03-27 15:27:33 [utils.py:229] huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 's3://lora-adapter/llama-3.1-8b-abliterated-lora/'. Use `repo_type` argument if needed.
```
This behaviour comes from get_adapter_absolute_path (https://github.com/vllm-project/vllm/blob/main/vllm/lora/utils.py#L192), which resolves any non-local path through huggingface_hub.snapshot_download. But looking at where the tensors are actually loaded, I don't see why you couldn't just stream the weights from S3 using the RunAI streamer (https://github.com/vllm-project/vllm/blob/main/vllm/lora/models.py#L187).
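To illustrate, a hypothetical version of that loading step that recognises s3:// URIs could look like this. This is a sketch only: `load_lora_tensors_from_s3` is an illustrative name, not existing vLLM code, and it assumes the same `SafetensorsStreamer` API used for base-model streaming.

```python
# Hypothetical sketch only: stream a LoRA's tensors directly from S3
# instead of failing in huggingface_hub.snapshot_download. The function
# name is illustrative, not existing vLLM code.
import torch
from runai_model_streamer import SafetensorsStreamer

def load_lora_tensors_from_s3(lora_path: str) -> dict[str, torch.Tensor]:
    """Stream adapter_model.safetensors for a LoRA stored under an s3:// prefix."""
    tensors: dict[str, torch.Tensor] = {}
    with SafetensorsStreamer() as streamer:
        streamer.stream_file(f"{lora_path.rstrip('/')}/adapter_model.safetensors")
        for name, tensor in streamer.get_tensors():
            tensors[name] = tensor
    return tensors
```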
My feature request is that we start supporting loading LoRA weights from S3 using the RunAI model streamer.
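From the client side, nothing would change except that an s3:// URI becomes a valid lora_path (proposed behaviour, not what vLLM does today; the path is the one from the error above):

```python
# Proposed behaviour (does not work today): pass an s3:// URI as lora_path.
import requests

requests.post(
    "http://localhost:8000/v1/load_lora_adapter",
    json={
        "lora_name": "llama-3.1-8b-abliterated-lora",
        "lora_path": "s3://lora-adapter/llama-3.1-8b-abliterated-lora/",
    },
).raise_for_status()
```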
Alternatives
You can download the LoRA weights manually and then reference the adapter using a local path, but this adds complexity to the dynamic serving setup.
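A sketch of that workaround, using boto3 to stage the files locally before registering them (the bucket and key names are taken from the error above; adjust for your own setup):

```python
# Workaround sketch: stage the adapter files locally with boto3, then
# register the local directory via the load endpoint.
import os
import boto3
import requests

local_dir = "/tmp/llama-3.1-8b-abliterated-lora"
os.makedirs(local_dir, exist_ok=True)

s3 = boto3.client("s3")
for fname in ("adapter_model.safetensors", "adapter_config.json"):
    s3.download_file(
        "lora-adapter",                            # bucket
        f"llama-3.1-8b-abliterated-lora/{fname}",  # key
        os.path.join(local_dir, fname),
    )

requests.post(
    "http://localhost:8000/v1/load_lora_adapter",
    json={"lora_name": "llama-3.1-8b-abliterated-lora", "lora_path": local_dir},
).raise_for_status()
```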
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.