6 changes: 3 additions & 3 deletions program-data-separation/README.md
@@ -2,7 +2,7 @@

This directory provides an example of the Program Data Separation APIs in ExecuTorch.
1. Program data separation examples using a linear model with the portable operators and XNNPACK.
-2. LoRA inference example with a LoRA and non-LoRA model sharing foundation weights.
+2. LoRA inference example with multiple LoRA models sharing a single foundation weight file.

## Program Data Separation

@@ -16,7 +16,7 @@ PTD files are used to store data outside of the PTE file. Some use-cases:
For more information on the PTD data format, please see the [flat_tensor](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) directory.

## Linear example
-For a demo of the program-data separation APIs using a linear model, please see [program-data-separation/cpp/linear_example](linear_example/). This example generates and runs a program-data separated linear model, with weights and bias in a separate .ptd file.
+For a demo of the program-data separation APIs using a linear model, please see [program-data-separation/cpp/linear_example](cpp/linear_example/README.md). This example generates and runs a program-data separated linear model, with the program in a .pte file and the weights and bias in a separate .ptd file.
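
For orientation, the sketch below shows the general shape of an export flow that splits a linear model into a .pte program and a .ptd data file. It is a sketch only: `external_constants` and `write_tensor_data_to_file` are assumptions based on the flat_tensor docs, so check the linked example for the exact API in your ExecuTorch version.

```python
# Minimal sketch, not the example's actual export script.
# Assumed names: external_constants, write_tensor_data_to_file.
import torch
from executorch.exir import ExecutorchBackendConfig, to_edge


class LinearModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(3, 3)

    def forward(self, x):
        return self.linear(x)


exported = torch.export.export(LinearModule(), (torch.randn(1, 3),))
edge = to_edge(exported)

# Assumption: external_constants=True tags constant tensors (weights, bias)
# for placement outside the PTE.
et_program = edge.to_executorch(ExecutorchBackendConfig(external_constants=True))

# Program logic goes into the .pte file.
with open("linear.pte", "wb") as f:
    f.write(et_program.buffer)

# Assumption: this writes the externalized tensors into a .ptd file
# in the given directory.
et_program.write_tensor_data_to_file(".")
```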

## LoRA example
A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small in comparison to LLM foundation weights, on the order of KB-MB depending on the finetuning setup and model size.
@@ -27,4 +27,4 @@ To enable LoRA, we generate:

Multiple LoRA-adapted PTE files can share the same foundation weights, and adding a model adapted to a new task incurs minimal binary size and runtime memory overhead.

-Please take a look at [program-data-separation/cpp/lora_example](lora_example/) for a demo of the program-data separation APIs with LoRA. This example generates and runs a LoRA and a non-LoRA model that share foundation weights. At runtime, we see that memory usage does not double.
+Please take a look at [program-data-separation/cpp/lora_example](cpp/lora_example/README.md) for a demo of the program-data separation APIs with LoRA. This example shows how to generate and run multiple LoRA adapter PTEs with a shared foundation weight file.
13 changes: 6 additions & 7 deletions program-data-separation/cpp/linear_example/README.md
@@ -1,7 +1,6 @@
# ExecuTorch Program Data Separation Demo C++.

-This directory contains the C++ code to run the examples generated in [program-data-separation](../program-data-separation/README.md).
-
+This directory contains the C++ code to demo program-data separation on a linear model.

## Virtual environment setup.
Create and activate a Python virtual environment:
@@ -10,12 +9,12 @@ python3 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip
```
Or alternatively, [install conda on your machine](https://conda.io/projects/conda/en/latest/user-guide/install/index.html)
```bash
-conda create -yn executorch-ptd python=3.10.0 && conda activate executorch-ptd
+conda create -yn executorch python=3.10.0 && conda activate executorch
```

Install dependencies:
```bash
-pip install executorch==0.7.0
+pip install executorch==1.0.0
```

## Export the model/s.
@@ -37,7 +36,7 @@ Note:
- PTE: contains the program execution logic.
- PTD: contains the constant tensors used by the PTE.

-See [program-data-separation](../../program-data-separation/README.md) for instructions.
+See [program-data-separation](../../README.md) for instructions.

## Install runtime dependencies.
The ExecuTorch repository is configured as a git submodule at `~/executorch-examples/program-data-separation/cpp/executorch`. To initialize it:
@@ -53,15 +52,15 @@ cd ~/executorch-examples/program-data-separation/cpp/executorch
pip install -r requirements-dev.txt
```

-## Build the runtime.
+## Build and run
Build the executable:
```bash
cd ~/executorch-examples/program-data-separation/cpp/linear_example
chmod +x build_example.sh
./build_example.sh
```

-## Run the executable.
+Run the executable.
```
./build/bin/executorch_program_data_separation --model-path ../../models/linear.pte --data-path ../../models/linear.ptd

32 changes: 14 additions & 18 deletions program-data-separation/cpp/lora_example/README.md
@@ -12,11 +12,11 @@ Note:
- There are many ways to fine-tune LoRA adapters. We will go through a few examples to create a demo.

## Table of Contents
-- [Size Savings](#size-savings)
-- [Fine-tuning](#finetune-from-scratch-with-unsloth-and-llama)
-- [Installation](#install-executorch)
-- [Export models](#export-models)
-- [Run models](#install-runtime-dependencies)
+- [Size savings](#size-savings)
+- [Finetune lora adapters from scratch with unsloth and Llama](#finetune-from-scratch-with-unsloth-and-llama)
+- [Install executorch](#install-executorch)
+- [Export lora models](#export-models)
+- [Run lora models](#install-runtime-dependencies)
- [Demo video](#demo-video)

## Size savings
@@ -118,14 +118,10 @@ You can also run `~/executorch-examples/program-data-separation/export_lora.sh`.

Example files, from adapters trained on executorch/docs/source/ and recent Nobel Prize winners.
```bash
-# executorch docs trained adapter model.
--rw-r--r-- 1 lfq users 45555712 Oct 17 18:05 et.pte
-# foundation weight file
--rw-r--r-- 1 lfq users 5994013600 Oct 17 18:05 foundation.ptd
-# dummy lora model.
--rw-r--r-- 1 lfq users 27628928 Oct 17 14:31 llama_3_2_1B_lora.pte
-# Nobel prize winners trained adapter model.
--rw-r--r-- 1 lfq users 45555712 Oct 17 18:00 nobel.pte
+-rw-r--r-- 1 lfq users 45555712 Oct 17 18:05 executorch_lora.pte # executorch docs lora model.
+-rw-r--r-- 1 lfq users 5994013600 Oct 17 18:05 foundation.ptd # foundation weight file
+-rw-r--r-- 1 lfq users 27628928 Oct 17 14:31 llama_3_2_1B_lora.pte # dummy lora model.
+-rw-r--r-- 1 lfq users 45555712 Oct 17 18:00 nobel_lora.pte # Nobel prize winners lora model.
```

Notice the adapter PTE files are about the same size as the `adapter_model.safetensors`/`adapter_model.pt` files generated during training. The PTE contains the adapter weights (which are not shared) and the program.
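
To make the size savings concrete with the numbers above: each adapter PTE is about 45.6 MB against a roughly 6.0 GB foundation weight file, so an additional adapted model costs well under 1% of the shared weights. A quick back-of-the-envelope check:

```python
# File sizes from the listing above, in bytes.
foundation_ptd = 5_994_013_600   # foundation.ptd
adapter_pte = 45_555_712         # executorch_lora.pte / nobel_lora.pte

# Per-adapter overhead relative to the shared foundation weights.
print(f"adapter / foundation: {adapter_pte / foundation_ptd:.2%}")  # ~0.76%

# Two adapters sharing one .ptd vs. two fully self-contained models.
shared = foundation_ptd + 2 * adapter_pte
duplicated = 2 * (foundation_ptd + adapter_pte)
print(f"disk saved by sharing: {(duplicated - shared) / 2**30:.1f} GiB")  # ~5.6 GiB
```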
@@ -167,15 +163,15 @@ cd ~/executorch-examples/program-data-separation/cpp/lora_example
DOWNLOADED_PATH=~/path/to/Llama-3.2-1B-Instruct/
./build/bin/executorch_program_data_separation \
--tokenizer_path="${DOWNLOADED_PATH}" \
--model1="et.pte" \
--model2="nobel.pte" \
--model1="executorch_lora.pte" \
--model2="nobel_lora.pte" \
--weights="foundation.ptd" \
--prompt="Who were the winners of the Nobel Prize in Physics in 2025?" \
--apply_chat_template
```
Passing in the `DOWNLOADED_PATH` as the tokenizer directory invokes the HFTokenizer, which also parses the additional tokenizer files `tokenizer_config.json` and `special_tokens_map.json`. `special_tokens_map.json` tells us which bos/eos token to use, especially if there are multiple.

-`apply_chat_template` formats the prompt according to the LLAMA chat template, which is what the adapter was trained on.
+`apply_chat_template` formats the prompt according to the LLAMA chat template.
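
For reference, the Llama 3 instruct chat template wraps the prompt roughly as shown below. The helper function here is purely illustrative (it is not the runner's implementation), and the exact tokens come from the model's tokenizer files.

```python
# Illustration only: approximately what --apply_chat_template produces for a
# Llama-3-family instruct model. The function name is hypothetical.
def apply_llama3_chat_template(user_prompt: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_prompt}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


print(apply_llama3_chat_template(
    "Who were the winners of the Nobel Prize in Physics in 2025?"
))
```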

Sample output:
```
@@ -202,8 +198,8 @@ cd ~/executorch-examples/program-data-separation/cpp/lora_example
DOWNLOADED_PATH=~/path/to/Llama-3.2-1B-Instruct/
./build/bin/executorch_program_data_separation \
--tokenizer_path="${DOWNLOADED_PATH}" \
--model1="et.pte" \
--model2="nobel.pte" \
--model1="executorch_lora.pte" \
--model2="nobel_lora.pte" \
--weights="foundation.ptd" \
--prompt="Help me get started with ExecuTorch in 3 steps" \
--apply_chat_template