
Commit 3c8d290

Sanggyu Lee (glistening) authored and committed
[ggma] Add gyu (ggma yielding utility) tool
Implement `gyu` CLI tool to automate GGMA model package creation:
- Merge prefill.py and decode.py into unified export.py
- Create modular `gyu` tool structure:
  - gyu/init.py: Set up venv, install deps (CPU-only torch), clone TICO, extract o2o tools
  - gyu/import.py: Download complete model from HuggingFace
  - gyu/export.py: Run conversion pipeline and create .ggma package
  - gyu/common.py: Shared utilities and constants
  - gyu/clean.py: Remove building directory
  - gyu/gyu: Bash wrapper to dispatch commands

Documentation:
- Rename README.md → DEVELOPER.md (technical guide)
- Add USER.md (user-facing guide)
1 parent fd78f13 commit 3c8d290

File tree

17 files changed: +764 −269 lines changed

runtime/ggma/examples/generate_text/DEVELOPER.md

Lines changed: 130 additions & 0 deletions
# TinyLlama Text Generation Developer Guide

This document provides a detailed technical guide for generating, processing, and optimizing the TinyLlama text-generation model. For basic usage, see [USER.md](USER.md).

## Summary

1. Set up the environment and install dependencies.
2. Generate the initial `prefill` and `decode` Circle model files.
3. Run the pipeline to optimize, reshape, merge, and prune the models, producing a final `model.circle`.
4. Package `model.circle` with the tokenizer files into a GGMA package and run it with `ggma_run`.
## Prerequisites

### 1. Python virtual environment

```bash
$ cd runtime/ggma/examples/generate_text/
$ python3 -m venv _
$ source _/bin/activate
```
### 2. Prepare [gyu](tools/gyu/README.md) and o2o tools

Install dependencies and set up the `o2o` tools (similar to what `tools/gyu/init.py` does).

> **Note**: We install the CPU version of `torch` first because `gyu` depends on `TICO`, which by default pulls in the large NVIDIA (CUDA) build of `torch`. Installing the CPU version beforehand prevents this.

```bash
# 1. Install torch (CPU) and gyu requirements
$ pip install torch --index-url https://download.pytorch.org/whl/cpu
$ pip install -r tools/gyu/requirements.txt

# 2. Fetch o2o tools from PR #16233
$ git fetch origin pull/16233/head:pr-16233
$ git checkout pr-16233 -- tools/o2o
$ chmod +x tools/o2o/*.py

# 3. Add tools to PATH
$ export PATH=$PWD/tools/o2o:$PWD/tools/gyu:$PATH
```
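
Optionally, confirm that the CPU-only build of `torch` was installed. This check is an addition to the original setup, not a required step; wheels from the CPU index conventionally carry a `+cpu` version suffix:

```bash
# The printed version should end with "+cpu" (e.g. "2.x.y+cpu"),
# indicating the CPU-only build rather than the CUDA build
$ python -c "import torch; print(torch.__version__)"
```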

## Generating Model Files

### 1. Install model dependencies

```bash
$ pip install -r tinyllama/tinyllama.requirements
```

### 2. Create the prefill and decode Circle model files

```bash
$ python tinyllama/tinyllama.py --mode prefill  # Generates prefill.circle
$ python tinyllama/tinyllama.py --mode decode   # Generates decode_.circle
```
Verify the generated files:

```bash
$ ls -lh *.circle
-rw-rw-r-- 1 gyu gyu 18M Nov 14 14:09 decode_.circle
-rw-rw-r-- 1 gyu gyu 18M Nov 14 14:09 prefill.circle
```
### 3. Update `decode.circle`

Fuse attention and normalize the KV-cache inputs for the decode model:

```bash
$ fuse.attention.py < decode_.circle \
  | reshape.io.py input --by_shape [1,16,30,4] [1,16,32,4] \
  | transpose.io.kvcache.py > decode.circle
```
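
The same transformation can be run one stage at a time, which is handy when debugging a single tool. This is a sketch assuming each `o2o` tool reads a Circle model on stdin and writes one on stdout, as the piped form above implies; the intermediate file names are purely illustrative:

```bash
# Equivalent step-by-step form with named intermediates (illustrative)
$ fuse.attention.py < decode_.circle > decode_fused.circle
$ reshape.io.py input --by_shape [1,16,30,4] [1,16,32,4] \
    < decode_fused.circle > decode_reshaped.circle
$ transpose.io.kvcache.py < decode_reshaped.circle > decode.circle
```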

### 4. Merge prefill and decode circles

Merge the models, downcast the input IDs, and clean up unused data:

```bash
$ merge.circles.py prefill.circle decode.circle \
  | fuse.bmm_lhs_const.py \
  | downcast.input_ids.py \
  | gc.py > model.circle
```
Verify the final model files:

```bash
$ ls -l {decode,prefill,model}.circle
-rw-rw-r-- 1 gyu gyu 18594868 Nov 22 17:26 decode.circle
-rw-rw-r-- 1 gyu gyu 18642052 Nov 22 07:53 prefill.circle
-rw-rw-r-- 1 gyu gyu 18629520 Nov 22 17:28 model.circle
```
## Create a GGMA package

1. Create the package root directory and move `model.circle` there:

```bash
$ cd runtime/ggma/examples/generate_text
$ mkdir -p tinyllama   # -p: the directory already exists (it holds tinyllama.py)
$ mv model.circle tinyllama/
```

2. Copy the tokenizer files (replace `{your_snapshot}` with the actual snapshot hash):

```bash
$ cp -L ~/.cache/huggingface/hub/models--Maykeye--TinyLLama-v0/snapshots/{your_snapshot}/tokenizer.* tinyllama/
$ cp -L ~/.cache/huggingface/hub/models--Maykeye--TinyLLama-v0/snapshots/{your_snapshot}/config.json tinyllama/
```
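
To find the actual snapshot hash, list the snapshots directory in the HuggingFace cache; each downloaded revision appears as one hash-named subdirectory:

```bash
# Typically a single hash-named directory after the model download
$ ls ~/.cache/huggingface/hub/models--Maykeye--TinyLLama-v0/snapshots/
```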

The package should now contain:

```bash
$ tree tinyllama/
tinyllama/
├── config.json
├── model.circle
├── tokenizer.json
└── tokenizer.model
```

## Build and run `ggma_run`

```bash
$ make -j$(nproc)
$ make install
```

Check version:

```bash
$ Product/out/bin/ggma_run --version
ggma_run v0.1.0 (nnfw runtime: v1.31.0)
```

Run the model:

```bash
$ Product/out/bin/ggma_run tinyllama
prompt: Lily picked up a flower.
generated: { 1100, 7899, 289, 826, 351, 600, 2439, 288, 266, 3653, 31843, 1100, 7899, 289, 1261, 291, 5869, 291, 1261, 31843, 1100, 7899 }
detokenized: She liked to play with her friends in the park. She liked to run and jump and run. She liked
```
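
As a sanity check, the generated token IDs can be decoded against the tokenizer shipped in the package. This is a hypothetical check, not part of the original guide; it assumes the `transformers` library is installed and is run from `runtime/ggma/examples/generate_text`:

```bash
# Decode the first sentence of IDs printed by ggma_run with the packaged
# tokenizer; per the output above, this should print
# "She liked to play with her friends in the park."
$ python -c "from transformers import AutoTokenizer; \
tok = AutoTokenizer.from_pretrained('tinyllama'); \
print(tok.decode([1100, 7899, 289, 826, 351, 600, 2439, 288, 266, 3653, 31843]))"
```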

runtime/ggma/examples/generate_text/README.md

Lines changed: 0 additions & 125 deletions
This file was deleted.
runtime/ggma/examples/generate_text/USER.md

Lines changed: 108 additions & 0 deletions
# Text Generation User Guide

This guide shows how to create a GGMA package for text generation models using the `gyu` (GGMA Yielding Utility) tool.

We use TinyLlama as an example throughout this guide.

## Creating a GGMA package

NOTE: Start from the ONE repository root directory.

### 1. Initialize environment (one-time setup)

Add [gyu](../../../../tools/gyu/README.md) to PATH:

```bash
$ export PATH=$PWD/tools/gyu:$PATH
```

Then change to the TinyLlama example directory and run `gyu init`:

```bash
$ cd runtime/ggma/examples/generate_text/tinyllama
$ gyu init
```

The Python environment and `o2o` tools are now prepared:

```bash
$ ls -ld o2o venv
drwxrwxr-x 2 gyu gyu 4096 Nov 24 09:44 o2o
drwxrwxr-x 6 gyu gyu 4096 Nov 24 09:42 venv
```
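
If the subsequent `gyu` commands expect this environment to be active in your shell (an assumption; `gyu` may activate it internally), activate it manually first:

```bash
# Assumption: manual activation may be unnecessary if gyu manages the venv itself
$ source venv/bin/activate
```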

> **Note**: The `o2o` directory will be removed once [#13689](https://github.com/Samsung/ONE/pull/13689) is merged.

### 2. Import model from HuggingFace

```bash
$ gyu import Maykeye/TinyLLama-v0
```

The HuggingFace model is downloaded to `build/tinyllama-v0/`:

```
$ tree build
build
└── tinyllama-v0
    ├── backup
    ├── config.json
    ├── demo.py
    ├── generation_config.json
    ├── model.onnx
    ├── model.safetensors
    ├── pytorch_model.bin
    ├── README.md
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    ├── tokenizer.model
    ├── train.ipynb
    └── valid.py
```

### 3. Export to GGMA package

```bash
$ gyu export -s tinyllama.py
```

The GGMA package is generated in `build/out/`:

```
$ tree build/out
build/out/
├── config.json
├── model.circle
├── tokenizer.json
└── tokenizer.model
```

## Building GGMA and running a GGMA package

NOTE: Start from the ONE repository root directory.

### Build

```bash
$ make -j$(nproc)
$ make install
```

For detailed build instructions, see the [ONE Runtime Build Guide](https://github.com/Samsung/ONE/blob/master/docs/runtime/README.md).

Confirm that `ggma_run` is built and check its version:

```bash
$ Product/out/bin/ggma_run --version
ggma_run v0.1.0 (nnfw runtime: v1.31.0)
```

### Run

Run the GGMA package with the default prompt to see sample output:

```bash
$ Product/out/bin/ggma_run build/out
prompt: Lily picked up a flower.
generated: { 1100, 7899, 289, 826, 351, 600, 2439, 288, 266, 3653, 31843, 1100, 7899, 289, 1261, 291, 5869, 291, 1261, 31843, 1100, 7899 }
detokenized: She liked to play with her friends in the park. She liked to run and jump and run. She liked
```

For detailed run instructions, see the [ggma_run guide](https://github.com/Samsung/ONE/blob/master/runtime/tests/tools/ggma_run/README.md).

For developers who want to understand what happens under the hood, see [DEVELOPER.md](DEVELOPER.md).
