program-data-separation/README.md (3 additions, 3 deletions)
@@ -2,7 +2,7 @@
This directory provides an example of the Program Data Separation APIs in ExecuTorch.
1. Program data separation examples using a linear model with the portable operators and XNNPACK.
- 2. LoRA inference example with a LoRA and non-LoRA model sharing foundation weights.
+ 2. LoRA inference example with multiple LoRA models sharing a single foundation weight file.
## Program Data Separation
@@ -16,7 +16,7 @@ PTD files are used to store data outside of the PTE file. Some use-cases:
For more information on the PTD data format, please see the [flat_tensor](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) directory.
## Linear example
- For a demo of the program-data separation APIs using a linear model, please see [program-data-separation/cpp/linear_example](linear_example/). This example generates and runs a program-data separated linear model, with weights and bias in a separate .ptd file.
+ For a demo of the program-data separation APIs using a linear model, please see [program-data-separation/cpp/linear_example](cpp/linear_example/README.md). This example generates and runs a program-data separated linear model, with the program in a .pte file and the weights and bias in a separate .ptd file.
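For orientation, the export half of that flow looks roughly like the sketch below. This is a minimal sketch, not the example's actual script: the `external_constants` config flag and the `write_tensor_data_to_file` method are my assumptions about the ExecuTorch export API and may be named differently in your version.

```python
import torch
from executorch.exir import ExecutorchBackendConfig, to_edge

class LinearModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(3, 3)

    def forward(self, x):
        return self.linear(x)

model = LinearModule().eval()
example_inputs = (torch.randn(1, 3),)

# Lower to an ExecuTorch program. external_constants=True (assumed flag name)
# asks the emitter to keep weights out of the PTE so they land in a .ptd instead.
exported = torch.export.export(model, example_inputs)
et_program = to_edge(exported).to_executorch(
    ExecutorchBackendConfig(external_constants=True)
)

# The program goes into linear.pte; tensor data goes into a companion .ptd file
# (write_tensor_data_to_file is an assumed method name).
with open("linear.pte", "wb") as f:
    et_program.write_to_file(f)
et_program.write_tensor_data_to_file(".")
```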
## LoRA example
A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small compared to LLM foundation weights, on the order of KBs to MBs depending on the fine-tuning setup and model size.
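To make the size claim concrete, here is a small, self-contained PyTorch illustration (mine, not part of the example) of why adapters are so light: a rank-r adapter adds only `r * (in_features + out_features)` trainable parameters next to a frozen `in_features * out_features` base weight.

```python
import torch

class LoRALinear(torch.nn.Module):
    """Frozen base projection plus a low-rank update: y = W x + (alpha / r) * B(A x)."""

    def __init__(self, in_features, out_features, rank=16, alpha=32.0):
        super().__init__()
        self.base = torch.nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # foundation weight: frozen, shareable
        self.lora_a = torch.nn.Linear(in_features, rank, bias=False)   # A: in -> r
        self.lora_b = torch.nn.Linear(rank, out_features, bias=False)  # B: r -> out
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(2048, 2048, rank=16)
base_params = layer.base.weight.numel()                                    # 4,194,304
adapter_params = layer.lora_a.weight.numel() + layer.lora_b.weight.numel()  # 65,536
print(f"base: {base_params:,} params, adapter: {adapter_params:,} params")
```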
@@ -27,4 +27,4 @@ To enable LoRA, we generate:
Multiple LoRA-adapted PTE files can share the same foundation weights, so adding a model adapted to a new task incurs minimal binary-size and runtime-memory overhead.
- Please take a look at [program-data-separation/cpp/lora_example](lora_example/) for a demo of the program-data separation APIs with LoRA. This example generates and runs a LoRA and a non-LoRA model that share foundation weights. At runtime, we see that memory usage does not double.
+ Please take a look at [program-data-separation/cpp/lora_example](cpp/lora_example/README.md) for a demo of the program-data separation APIs with LoRA. This example shows how to generate and run multiple LoRA adapter PTEs with a shared foundation weight file.
program-data-separation/cpp/lora_example/README.md
-rw-r--r-- 1 lfq users 45555712 Oct 17 18:00 nobel_lora.pte # Nobel prize winners LoRA model.
```
Notice that the adapter PTE files are about the same size as the `adapter_model.safetensors`/`adapter_model.pt` files generated during training: each PTE contains the adapter weights (which are not shared) and the program.
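If you want to verify that locally, a quick size check is enough; the sketch below assumes the artifact names used in this example plus the adapter checkpoint from your fine-tuning run.

```python
import os

# Hypothetical paths: adjust to wherever your export and fine-tuning outputs live.
for path in ("nobel_lora.pte", "adapter_model.safetensors", "foundation.ptd"):
    if os.path.exists(path):
        print(f"{path}: {os.path.getsize(path) / 1e6:.1f} MB")
    else:
        print(f"{path}: not found")
```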
@@ -167,15 +163,15 @@ cd ~/executorch-examples/program-data-separation/cpp/lora_example
DOWNLOADED_PATH=~/path/to/Llama-3.2-1B-Instruct/
./build/bin/executorch_program_data_separation \
--tokenizer_path="${DOWNLOADED_PATH}" \
- --model1="et.pte" \
- --model2="nobel.pte" \
+ --model1="executorch_lora.pte" \
+ --model2="nobel_lora.pte" \
--weights="foundation.ptd" \
--prompt="Who were the winners of the Nobel Prize in Physics in 2025?" \
--apply_chat_template
```
Passing in the `DOWNLOADED_PATH` as the tokenizer directory will invoke the HFTokenizer, which parses the additional tokenizer files `tokenizer_config.json` and `special_tokens_map.json`. `special_tokens_map.json` tells us which BOS/EOS tokens to use, especially if there are multiple.
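For reference, the BOS/EOS lookup described here is just a read of the Hugging Face `special_tokens_map.json`; a minimal sketch (the download path is a placeholder) looks like:

```python
import json
import os

# Placeholder path: point this at your Llama-3.2-1B-Instruct download.
downloaded_path = os.path.expanduser("~/path/to/Llama-3.2-1B-Instruct")
with open(os.path.join(downloaded_path, "special_tokens_map.json")) as f:
    special_tokens = json.load(f)

# Entries may be plain strings or dicts with a "content" field, depending on the export.
print("bos:", special_tokens.get("bos_token"))
print("eos:", special_tokens.get("eos_token"))
```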
- `apply_chat_template` formats the prompt according to the LLAMA chat template, which is what the adapter was trained on.
+ `apply_chat_template` formats the prompt according to the LLAMA chat template.
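As a rough illustration of what that formatting produces (this follows the publicly documented Llama 3 Instruct template, not the runner's actual implementation; exact tokens may vary by model version):

```python
def apply_llama3_chat_template(user_prompt: str) -> str:
    # Wrap a single-turn user prompt in Llama-3-style header/eot tokens and
    # leave the assistant header open so generation continues from there.
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_prompt}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(apply_llama3_chat_template("Who were the winners of the Nobel Prize in Physics in 2025?"))
```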
Sample output:
```
@@ -202,8 +198,8 @@ cd ~/executorch-examples/program-data-separation/cpp/lora_example
DOWNLOADED_PATH=~/path/to/Llama-3.2-1B-Instruct/
./build/bin/executorch_program_data_separation \
--tokenizer_path="${DOWNLOADED_PATH}" \
- --model1="et.pte" \
- --model2="nobel.pte" \
+ --model1="executorch_lora.pte" \
+ --model2="nobel_lora.pte" \
--weights="foundation.ptd" \
--prompt="Help me get started with ExecuTorch in 3 steps" \