server : add lora hotswap endpoint #8857
Conversation
self note: maybe wait for changes from #8823 and add the list of loaded LoRA adapters to the struct

I don't get what you mean. The option means "load the adapter into memory, but do not apply it right away", probably something like:

@ggerganov I added tests and docs to this PR, plus adapted to the changes from #8823. Could you re-review this? Thank you.
* server : add lora hotswap endpoint
* handle lora_no_apply
* fix build
* update docs
* clean up struct def
* fix build
* add LoRA test
* fix style
TODO:
New argument: `--lora-init-without-apply`

If `--lora-init-without-apply` is specified, LoRA adapters will be loaded but not applied by `llama_init_from_gpt_params`. The user can apply them later with the `POST /lora-adapters` endpoint below; an example launch command follows.
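For example (the model and adapter paths here are hypothetical), the server could be started with `./llama-server -m model.gguf --lora my_adapter.gguf --lora-init-without-apply`; the adapter is then loaded into memory but reported with scale 0 until it is enabled via `POST /lora-adapters`.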
New endpoints
GET /lora-adapters

Get the list of all adapters. If an adapter is disabled, its scale will be set to 0.
Response:
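An illustrative response body (the adapter ids and paths are hypothetical examples, not values from this PR), for a server started with `--lora-init-without-apply`, where both adapters are loaded but not yet applied:

```json
[
  {
    "id": 0,
    "path": "my_adapter_1.gguf",
    "scale": 0.0
  },
  {
    "id": 1,
    "path": "my_adapter_2.gguf",
    "scale": 0.0
  }
]
```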
POST /lora-adapters

Set the list of adapters. To disable an adapter, either omit it from the request list or set its scale to 0.
Request:
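A sketch of a request body (the ids and scales are illustrative), enabling adapter 0 at scale 0.2 and adapter 1 at scale 0.8:

```json
[
  {
    "id": 0,
    "scale": 0.2
  },
  {
    "id": 1,
    "scale": 0.8
  }
]
```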
Response:
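Assuming the endpoint echoes the resulting adapter state (an assumption; this page does not show the response body), the response might mirror the `GET /lora-adapters` format with the updated scales:

```json
[
  {
    "id": 0,
    "path": "my_adapter_1.gguf",
    "scale": 0.2
  },
  {
    "id": 1,
    "path": "my_adapter_2.gguf",
    "scale": 0.8
  }
]
```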