
Multi-LoRA - Support for providing /load and /unload API #3308

Closed
@gauravkr2108

Description


Problem statement:

In a production system, there should be an API to add/remove fine-tuned weights dynamically; the inference caller should not have to specify the LoRA location with each call.

The current Multi-LoRA support loads adapters during inference calls and does not check whether the fine-tuned weights are already loaded and ready for inference.
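For context, this is roughly how an adapter is specified per request today (a minimal sketch; the model name and adapter paths are illustrative placeholders, not part of this issue):

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Base model with LoRA support enabled; paths below are placeholders.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

# Today, the adapter must accompany every generate() call; nothing
# verifies that the weights were loaded ahead of time.
outputs = llm.generate(
    ["Write a short haiku about GPUs."],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest(
        lora_name="my-adapter",
        lora_int_id=1,
        lora_local_path="/models/lora/my-adapter"),
)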

Proposal:

Introduce two endpoints, /load and /unload, for managing fine-tuned weights in vLLM:

POST /load -> add fine-tuned weights to the set of served models.
POST /unload -> remove fine-tuned weights from the served models list.

The vLLM server would then track the set of fine-tuned weights it has loaded, so there is no need to specify fine-tuned weight names and locations as part of each inference request.

Sample code:

from fastapi import FastAPI, Request, Response
from vllm.lora.request import LoRARequest

app = FastAPI()

# Currently registered adapter and a running counter for lora_int_id.
lora_request = None
index = 1


@app.post("/load")
async def load(request: Request) -> Response:
    """Register a LoRA adapter so later inference calls can use it."""
    global lora_request, index

    request_dict = await request.json()
    lora_local_path = request_dict.pop("lora_path", "/models/lora/")
    # In this sketch the local path doubles as the adapter name.
    lora_request = LoRARequest(
        lora_name=lora_local_path,
        lora_int_id=index,
        lora_local_path=lora_local_path)

    index += 1
    return Response(status_code=201)


@app.post("/unload")
async def unload(request: Request) -> Response:
    """Drop the currently registered LoRA adapter."""
    global lora_request, index
    lora_request = None

    if index > 1:
        index -= 1

    return Response(status_code=201)
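With those endpoints in place, a caller could manage adapters over plain HTTP, for example (a hypothetical sketch; the host, port, and adapter path are placeholders):

import requests

# Register a fine-tuned adapter once; subsequent inference requests
# no longer need to carry the adapter name or path.
requests.post("http://localhost:8000/load",
              json={"lora_path": "/models/lora/sql-adapter"})

# ... issue inference requests as usual ...

# Remove the adapter when it is no longer needed.
requests.post("http://localhost:8000/unload")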
