program-data-separation/README.md (3 additions, 3 deletions)
@@ -2,7 +2,7 @@
This directory provides an example of the Program Data Separation APIs in ExecuTorch.
1. Program data separation examples using a linear model with the portable operators and XNNPACK.
- 2. LoRA inference example with a LoRA and non-LoRA model sharing foundation weights.
+ 2. LoRA inference example with multiple LoRA models sharing a single foundation weight file.
## Program Data Separation
@@ -16,7 +16,7 @@ PTD files are used to store data outside of the PTE file. Some use-cases:
For more information on the PTD data format, please see the [flat_tensor](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) directory.
## Linear example
- For a demo of the program-data separation APIs using a linear model, please see [program-data-separation/cpp/linear_example](linear_example/). This example generates and runs a program-data separated linear model, with weights and bias in a separate .ptd file.
+ For a demo of the program-data separation APIs using a linear model, please see [program-data-separation/cpp/linear_example](cpp/linear_example/README.md). This example generates and runs a program-data separated linear model, with the program in a .pte file and the weights and bias in a separate .ptd file.
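For orientation, the export half of that flow looks roughly like the sketch below. This is a minimal sketch, not the example's actual script: the `external_constants` config flag and the `write_tensor_data_to_file` method are my assumptions about the ExecuTorch export API and may be named differently in your version.

```python
import torch
from executorch.exir import ExecutorchBackendConfig, to_edge

class LinearModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(3, 3)

    def forward(self, x):
        return self.linear(x)

model = LinearModule().eval()
example_inputs = (torch.randn(1, 3),)

# Lower to an ExecuTorch program. external_constants=True (assumed flag name)
# asks the emitter to keep weights out of the PTE so they land in a .ptd instead.
exported = torch.export.export(model, example_inputs)
et_program = to_edge(exported).to_executorch(
    ExecutorchBackendConfig(external_constants=True)
)

# The program goes into linear.pte; tensor data goes into a companion .ptd file
# (write_tensor_data_to_file is an assumed method name).
with open("linear.pte", "wb") as f:
    et_program.write_to_file(f)
et_program.write_tensor_data_to_file(".")
```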
## LoRA example
A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small compared to LLM foundation weights, on the order of KBs to MBs depending on the fine-tuning setup and model size.
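To make the size claim concrete, here is a small, self-contained PyTorch illustration (mine, not part of the example) of why adapters are so light: a rank-r adapter adds only `r * (in_features + out_features)` trainable parameters next to a frozen `in_features * out_features` base weight.

```python
import torch

class LoRALinear(torch.nn.Module):
    """Frozen base projection plus a low-rank update: y = W x + (alpha / r) * B(A x)."""

    def __init__(self, in_features, out_features, rank=16, alpha=32.0):
        super().__init__()
        self.base = torch.nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # foundation weight: frozen, shareable
        self.lora_a = torch.nn.Linear(in_features, rank, bias=False)   # A: in -> r
        self.lora_b = torch.nn.Linear(rank, out_features, bias=False)  # B: r -> out
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(2048, 2048, rank=16)
base_params = layer.base.weight.numel()                                    # 4,194,304
adapter_params = layer.lora_a.weight.numel() + layer.lora_b.weight.numel()  # 65,536
print(f"base: {base_params:,} params, adapter: {adapter_params:,} params")
```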
@@ -27,4 +27,4 @@ To enable LoRA, we generate:
Multiple LoRA-adapted PTE files can share the same foundation weights, so adding a model adapted to a new task incurs minimal binary-size and runtime-memory overhead.
- Please take a look at [program-data-separation/cpp/lora_example](lora_example/) for a demo of the program-data separation APIs with LoRA. This example generates and runs a LoRA and a non-LoRA model that share foundation weights. At runtime, we see that memory usage does not double.
+ Please take a look at [program-data-separation/cpp/lora_example](cpp/lora_example/README.md) for a demo of the program-data separation APIs with LoRA. This example shows how to generate and run multiple LoRA adapter PTEs with a shared foundation weight file.
program-data-separation/cpp/lora_example/README.md
-rw-r--r-- 1 lfq users 45555712 Oct 17 18:00 nobel_lora.pte # Nobel prize winners LoRA model.
```
Notice that the adapter PTE files are about the same size as the `adapter_model.safetensors`/`adapter_model.pt` files generated during training: each PTE contains the adapter weights (which are not shared) and the program.
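If you want to verify that locally, a quick size check is enough; the sketch below assumes the artifact names used in this example plus the adapter checkpoint from your fine-tuning run.

```python
import os

# Hypothetical paths: adjust to wherever your export and fine-tuning outputs live.
for path in ("nobel_lora.pte", "adapter_model.safetensors", "foundation.ptd"):
    if os.path.exists(path):
        print(f"{path}: {os.path.getsize(path) / 1e6:.1f} MB")
    else:
        print(f"{path}: not found")
```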
@@ -167,15 +163,15 @@ cd ~/executorch-examples/program-data-separation/cpp/lora_example
DOWNLOADED_PATH=~/path/to/Llama-3.2-1B-Instruct/
./build/bin/executorch_program_data_separation \
--tokenizer_path="${DOWNLOADED_PATH}" \
- --model1="et.pte" \
- --model2="nobel.pte" \
+ --model1="executorch_lora.pte" \
+ --model2="nobel_lora.pte" \
--weights="foundation.ptd" \
--prompt="Who were the winners of the Nobel Prize in Physics in 2025?" \
--apply_chat_template
```
Passing in the `DOWNLOADED_PATH` as the tokenizer directory will invoke the HFTokenizer, which parses the additional tokenizer files `tokenizer_config.json` and `special_tokens_map.json`. `special_tokens_map.json` tells us which BOS/EOS tokens to use, especially if there are multiple.
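For reference, the BOS/EOS lookup described here is just a read of the Hugging Face `special_tokens_map.json`; a minimal sketch (the download path is a placeholder) looks like:

```python
import json
import os

# Placeholder path: point this at your Llama-3.2-1B-Instruct download.
downloaded_path = os.path.expanduser("~/path/to/Llama-3.2-1B-Instruct")
with open(os.path.join(downloaded_path, "special_tokens_map.json")) as f:
    special_tokens = json.load(f)

# Entries may be plain strings or dicts with a "content" field, depending on the export.
print("bos:", special_tokens.get("bos_token"))
print("eos:", special_tokens.get("eos_token"))
```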
- `apply_chat_template` formats the prompt according to the LLAMA chat template, which is what the adapter was trained on.
+ `apply_chat_template` formats the prompt according to the LLAMA chat template.
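As a rough illustration of what that formatting produces (this follows the publicly documented Llama 3 Instruct template, not the runner's actual implementation; exact tokens may vary by model version):

```python
def apply_llama3_chat_template(user_prompt: str) -> str:
    # Wrap a single-turn user prompt in Llama-3-style header/eot tokens and
    # leave the assistant header open so generation continues from there.
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_prompt}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(apply_llama3_chat_template("Who were the winners of the Nobel Prize in Physics in 2025?"))
```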
Sample output:
```
@@ -202,8 +198,8 @@ cd ~/executorch-examples/program-data-separation/cpp/lora_example
DOWNLOADED_PATH=~/path/to/Llama-3.2-1B-Instruct/
./build/bin/executorch_program_data_separation \
--tokenizer_path="${DOWNLOADED_PATH}" \
- --model1="et.pte" \
- --model2="nobel.pte" \
+ --model1="executorch_lora.pte" \
+ --model2="nobel_lora.pte" \
--weights="foundation.ptd" \
--prompt="Help me get started with ExecuTorch in 3 steps" \