
server : add lora hotswap endpoint #8857

Merged: 9 commits merged into ggerganov:master on Aug 6, 2024

Conversation

@ngxson commented Aug 4, 2024

TODO:

  • Update docs
  • Add tests

New argument: --lora-init-without-apply

If --lora-init-without-apply is specified, LoRA adapters are loaded into memory but not applied by llama_init_from_gpt_params.

The user can apply them later with the POST /lora-adapters endpoint described below.
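As a sketch, launching the server with the new flag might look like this (binary name, model, and adapter paths below are placeholders, not taken from this PR):

```shell
# Load two adapters into memory, but start with both disabled;
# they can be enabled later via POST /lora-adapters.
./llama-server -m base_model.gguf \
    --lora my_adapter_1.gguf \
    --lora my_adapter_2.gguf \
    --lora-init-without-apply
```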

New endpoints

GET /lora-adapters

Returns the list of all loaded adapters. If an adapter is disabled, its scale is reported as 0.

Response:

[
    {
        "id": 0,
        "path": "my_adapter_1.gguf",
        "scale": 0.0
    },
    {
        "id": 1,
        "path": "my_adapter_2.gguf",
        "scale": 0.0
    }
]
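A client can use the scale field to tell which loaded adapters are currently inactive. A minimal sketch (the response body below is hypothetical, following the shape of the example above):

```python
import json

# Hypothetical GET /lora-adapters response: adapter 0 disabled, adapter 1 active
response_body = json.dumps([
    {"id": 0, "path": "my_adapter_1.gguf", "scale": 0.0},
    {"id": 1, "path": "my_adapter_2.gguf", "scale": 0.5},
])

def disabled_adapters(body: str) -> list[int]:
    """Return the ids of adapters that are loaded but not applied (scale == 0)."""
    return [a["id"] for a in json.loads(body) if a["scale"] == 0.0]

print(disabled_adapters(response_body))  # → [0]
```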

POST /lora-adapters

Sets the scale for each adapter. To disable an adapter, either omit it from the request list or set its scale to 0.

Request:

[
  {"id": 0, "scale": 0.2},
  {"id": 1, "scale": 0.8}
]

Response:

{ "success": true }
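Building the request body from a mapping of adapter id to scale is straightforward; a small helper sketch (the function name is illustrative, not part of this PR — the resulting JSON could be sent to a running server with any HTTP client):

```python
import json

def set_adapters_payload(scales: dict[int, float]) -> str:
    """Build the request body for POST /lora-adapters from {adapter_id: scale}."""
    return json.dumps([{"id": i, "scale": s} for i, s in sorted(scales.items())])

payload = set_adapters_payload({0: 0.2, 1: 0.8})
print(payload)  # → [{"id": 0, "scale": 0.2}, {"id": 1, "scale": 0.8}]
```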

ngxson commented Aug 4, 2024

self note: maybe wait for changes from #8823 and add the list of loaded lora to struct

@Green-Sky

--lora-no-apply sounds kind of contrived, maybe --lora-available or similar is better.

ngxson commented Aug 4, 2024

I don't get what you mean. The option means "load the adapter to memory, but do not apply it right away"

probably something like --lora-apply-later or --lora-init-without-apply is more stupidly simple to understand?

@mofosyne added the Review Complexity : Low label Aug 5, 2024
@github-actions github-actions bot added the python python script changes label Aug 6, 2024
@ngxson ngxson marked this pull request as ready for review August 6, 2024 11:45
@ngxson ngxson requested a review from ggerganov August 6, 2024 11:45
ngxson commented Aug 6, 2024

@ggerganov I added tests and docs to this PR, and adapted to the changes from #8823.

Could you re-review this? Thank you.

@ngxson ngxson merged commit 1e6f655 into ggerganov:master Aug 6, 2024
54 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Aug 7, 2024
* server : add lora hotswap endpoint

* handle lora_no_apply

* fix build

* update docs

* clean up struct def

* fix build

* add LoRA test

* fix style
@ltoniazzi ltoniazzi mentioned this pull request Aug 17, 2024
7 tasks
@ngxson ngxson changed the title server : add lora hotswap endpoint (WIP) server : add lora hotswap endpoint Aug 18, 2024