sidebar_position: 2
---

# PEFT LoRA Serving

This example illustrates the process of deploying and serving a base large language model enhanced with LoRA (Low-Rank Adaptation) adapters in a ServerlessLLM cluster. It demonstrates how to start the cluster, deploy a base model with multiple LoRA adapters, perform inference using different adapters, and update or remove the adapters dynamically.

## Pre-requisites

To run this example, we will use Docker Compose to set up a ServerlessLLM cluster. Before proceeding, please ensure you have read the [Quickstart Guide](../getting_started.md).
9
11
10
-
We will use the following example base model & LoRA adapter
12
+
We will use the following example base model & LoRA adapters
If you wish to switch to a different set of LoRA adapters, you can still use the `sllm-cli deploy` command with updated adapter configurations. ServerlessLLM will automatically reload the new adapters without restarting the backend.
For example, to update the adapter (located at `ft_facebook/opt-125m_adapter1`) used by `facebook/opt-125m`:
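A minimal sketch of the redeploy step is below. It assumes `sllm-cli deploy` accepts LoRA options of the form `--enable-lora` and `--lora-adapters <name>=<path>`, and that the adapter name `demo_lora1` was chosen at initial deployment; check `sllm-cli deploy --help` for the exact flags in your installed version.

```shell
# Redeploy the same base model, pointing the existing adapter name at the
# updated adapter weights; ServerlessLLM reloads the adapter in place,
# without restarting the backend.
sllm-cli deploy \
  --model facebook/opt-125m \
  --enable-lora \
  --lora-adapters demo_lora1=ft_facebook/opt-125m_adapter1
```

Subsequent inference requests that reference `demo_lora1` will then be served with the updated adapter weights.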