
Commit dda726c

[llama examples] Update README.md to refer directly to consolidated.00.pth, instead of the standard single-file checkpoint name (checkpoint.pth)
1 parent 8e2737c commit dda726c

File tree

1 file changed (+7, −7)

examples/models/llama/README.md

Lines changed: 7 additions & 7 deletions
@@ -164,7 +164,7 @@ Llama 3 8B performance was measured on the Samsung Galaxy S22, S24, and OnePlus
 ```
 # No quantization
 # Set these paths to point to the downloaded files
-LLAMA_CHECKPOINT=path/to/checkpoint.pth
+LLAMA_CHECKPOINT=path/to/consolidated.00.pth
 LLAMA_PARAMS=path/to/params.json

 python -m examples.models.llama.export_llama \
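For context, the hunk above feeds into the README's no-quantization export. A hedged sketch of the complete invocation follows; the `-X`, `-d`, and `--output_name` flags are recalled from the surrounding README rather than shown in this diff, so treat them as assumptions.

```
# Sketch of the full bf16/no-quantization export this hunk belongs to.
# Flags beyond --checkpoint/-p/-kv/--use_sdpa_with_kv_cache are assumptions.
LLAMA_CHECKPOINT=path/to/consolidated.00.pth
LLAMA_PARAMS=path/to/params.json

python -m examples.models.llama.export_llama \
  --checkpoint "$LLAMA_CHECKPOINT" \
  -p "$LLAMA_PARAMS" \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  -d bf16 \
  --output_name "llama3_kv_sdpa_xnn.pte"
```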
@@ -186,7 +186,7 @@ For convenience, an [exported ExecuTorch bf16 model](https://huggingface.co/exec
 ```
 # SpinQuant
 # Set these paths to point to the exported files
-LLAMA_QUANTIZED_CHECKPOINT=path/to/spinquant/checkpoint.pth
+LLAMA_QUANTIZED_CHECKPOINT=path/to/spinquant/consolidated.00.pth
 LLAMA_PARAMS=path/to/spinquant/params.json

 python -m examples.models.llama.export_llama \
@@ -215,7 +215,7 @@ For convenience, an [exported ExecuTorch SpinQuant model](https://huggingface.co
 ```
 # QAT+LoRA
 # Set these paths to point to the exported files
-LLAMA_QUANTIZED_CHECKPOINT=path/to/qlora/checkpoint.pth
+LLAMA_QUANTIZED_CHECKPOINT=path/to/qlora/consolidated.00.pth
 LLAMA_PARAMS=path/to/qlora/params.json

 python -m examples.models.llama.export_llama \
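Since the SpinQuant and QAT+LoRA hunks above only rename the checkpoint path, a stale `checkpoint.pth` value would surface as a confusing failure mid-export. An illustrative pre-flight check (not part of the README) that fails fast if either variable points at a missing file:

```
# Illustrative helper, not from the README: verify both paths exist
# before starting a long export run.
for f in "$LLAMA_QUANTIZED_CHECKPOINT" "$LLAMA_PARAMS"; do
  [ -f "$f" ] || { echo "missing file: $f" >&2; exit 1; }
done
```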
@@ -248,7 +248,7 @@ You can export and run the original Llama 3 8B instruct model.
 2. Export model and generate `.pte` file
 ```
 python -m examples.models.llama.export_llama \
---checkpoint <checkpoint.pth> \
+--checkpoint <consolidated.00.pth> \
 -p <params.json> \
 -kv \
 --use_sdpa_with_kv_cache \
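The hunk above truncates the export command mid-way. A hedged reconstruction of the rest is shown below; the quantization and output flags (`-X`, `-qmode`, `--group_size`, `-d`, `--output_name`) are assumptions recalled from this README section, not visible in the diff.

```
# Hedged reconstruction of the full Llama 3 8B export; flags after
# --use_sdpa_with_kv_cache are assumptions based on the surrounding README.
python -m examples.models.llama.export_llama \
  --checkpoint <consolidated.00.pth> \
  -p <params.json> \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  -qmode 8da4w \
  --group_size 128 \
  -d fp32 \
  --output_name "llama3_kv_sdpa_xnn_qe_4_32.pte"
```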
@@ -396,7 +396,7 @@ First export your model for lowbit quantization (step 2 above):

 ```
 # Set these paths to point to the downloaded files
-LLAMA_CHECKPOINT=path/to/checkpoint.pth
+LLAMA_CHECKPOINT=path/to/consolidated.00.pth
 LLAMA_PARAMS=path/to/params.json

 # Set low-bit quantization parameters
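The final context line introduces the low-bit quantization parameters that follow in the README. A hypothetical set of values is sketched below; the variable names and ranges are assumptions based on this lowbit section, not part of the diff.

```
# Hypothetical low-bit settings for illustration; names and ranges
# are assumptions from the lowbit section this hunk belongs to.
QLINEAR_BITWIDTH=3        # linear-layer weight bitwidth
QLINEAR_GROUP_SIZE=128    # quantization group size for linear layers
QEMBEDDING_BITWIDTH=4     # embedding-table bitwidth
QEMBEDDING_GROUP_SIZE=32  # embedding quantization group size
```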
@@ -476,7 +476,7 @@ We use [LM Eval](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate
 For base models, use the following example command to calculate its perplexity based on WikiText.
 ```
 python -m examples.models.llama.eval_llama \
--c <checkpoint.pth> \
+-c <consolidated.00.pth> \
 -p <params.json> \
 -t <tokenizer.model/bin> \
 -kv \
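The eval command above is cut off by the hunk boundary. A hedged completion follows; the `--tasks`, `--max_seq_length`, and `--limit` flags follow LM Eval conventions and are assumptions, not shown in this diff.

```
# Hedged completion of the WikiText perplexity command; the trailing
# flags are assumptions based on LM Eval conventions and this README.
python -m examples.models.llama.eval_llama \
  -c <consolidated.00.pth> \
  -p <params.json> \
  -t <tokenizer.model/bin> \
  -kv \
  --max_seq_length 2048 \
  --tasks wikitext \
  --limit 10
```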
@@ -489,7 +489,7 @@ python -m examples.models.llama.eval_llama \
 For instruct models, use the following example command to calculate its MMLU score.
 ```
 python -m examples.models.llama.eval_llama \
--c <checkpoint.pth> \
+-c <consolidated.00.pth> \
 -p <params.json> \
 -t <tokenizer.model/bin> \
 -kv \
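As with the WikiText hunk, the MMLU command is truncated here. A hedged sketch of the full variant; `--tasks mmlu` and the few-shot flag follow LM Eval conventions and are assumptions rather than diff content.

```
# Hedged sketch of the MMLU evaluation for instruct models; the task
# and few-shot flags are assumptions based on LM Eval conventions.
python -m examples.models.llama.eval_llama \
  -c <consolidated.00.pth> \
  -p <params.json> \
  -t <tokenizer.model/bin> \
  -kv \
  --tasks mmlu \
  --num_fewshot 5
```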
