Problem statement:
In a production system there should be an API to add/remove fine-tuned weights dynamically; the inference caller should not have to specify the LoRA location with each call.
The current Multi-LoRA support loads adapters during inference calls, which does not check whether the fine-tuned weights are already loaded and ready for inference.
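For context, here is a minimal sketch of the current per-call pattern with vLLM's offline LLM API; the model name, adapter name, and path below are placeholders. Every generate call must carry the full LoRARequest, including the adapter's location on disk.

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)  # placeholder model

# The adapter name, id, and local path travel with every single call.
outputs = llm.generate(
    ["Write a SQL query that ..."],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest(
        lora_name="sql-adapter",            # placeholder name
        lora_int_id=1,
        lora_local_path="/models/lora/"))   # placeholder path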
Proposal:
Introduce two endpoints, /load and /unload, to manage fine-tuned weights in vLLM.
POST /load
-> add fine-tuned weights to the set of served models.
POST /unload
-> remove fine-tuned weights from the models list.
This lets the vLLM server track the set of fine-tuned weights it is serving.
As a result, inference requests no longer need to carry fine-tuned weight names and locations; example client calls are sketched below.
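For illustration, client calls to the proposed endpoints might look like the following sketch; the host, port, and lora_path value are assumptions, not part of the proposal.

import requests

# Register fine-tuned weights once, up front. After this, inference
# calls no longer need to mention the adapter at all.
requests.post("http://localhost:8000/load",
              json={"lora_path": "/models/lora/sql-adapter"})

# Remove the weights again when they are no longer needed.
requests.post("http://localhost:8000/unload")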
Sample code:
from fastapi import FastAPI, Request, Response
from vllm.lora.request import LoRARequest

app = FastAPI()

# Currently registered adapter (None means no adapter is loaded) and
# the next integer id to assign to a LoRARequest.
lora_request = None
index = 1


@app.post("/load")
async def load(request: Request) -> Response:
    """Register fine-tuned weights from a local path so that later
    inference requests can use them without naming them."""
    global lora_request, index
    request_dict = await request.json()
    lora_local_path = request_dict.pop("lora_path", "/models/lora/")
    lora_request = LoRARequest(
        lora_name=lora_local_path,  # the path doubles as the adapter name
        lora_int_id=index,
        lora_local_path=lora_local_path)
    index = index + 1
    return Response(status_code=201)


@app.post("/unload")
async def unload(request: Request) -> Response:
    """Drop the currently registered fine-tuned weights."""
    global lora_request, index
    lora_request = None
    if index > 1:
        index = index - 1
    return Response(status_code=201)
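To close the loop, a /generate handler could then apply the stored adapter itself, so inference requests carry only the prompt and sampling options. The following is a hypothetical sketch, assuming an AsyncLLMEngine named engine is built at startup with enable_lora=True and that the installed vLLM version's generate method accepts a lora_request argument.

import uuid

from fastapi.responses import JSONResponse
from vllm import SamplingParams


@app.post("/generate")
async def generate(request: Request) -> Response:
    request_dict = await request.json()
    prompt = request_dict.pop("prompt")
    # The globally registered adapter is applied automatically; the
    # caller never names a LoRA path. `engine` is assumed to be an
    # AsyncLLMEngine created at server startup.
    final_output = None
    async for output in engine.generate(
            prompt,
            SamplingParams(**request_dict),
            request_id=str(uuid.uuid4()),
            lora_request=lora_request):
        final_output = output
    return JSONResponse({"text": [o.text for o in final_output.outputs]})

One caveat of the global-state design sketched here: only one adapter is active at a time, and concurrent /load and /unload calls race against in-flight generations.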