@@ -164,7 +164,7 @@ Llama 3 8B performance was measured on the Samsung Galaxy S22, S24, and OnePlus
 ```
 # No quantization
 # Set these paths to point to the downloaded files
-LLAMA_CHECKPOINT=path/to/checkpoint.pth
+LLAMA_CHECKPOINT=path/to/consolidated.00.pth
 LLAMA_PARAMS=path/to/params.json
 
 python -m examples.models.llama.export_llama \
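The hunk only shows the first line of the export command, so for context here is a rough sketch of how the renamed variables are typically consumed for the unquantized (bf16) export. The `-d` and `--output_name` flags are assumptions added for illustration and may not match the exact flag set in the full README.

```
# Minimal sketch (not part of the diff): bf16 export with the renamed checkpoint.
# -d and --output_name are assumed flags; check `export_llama --help` if they differ.
LLAMA_CHECKPOINT=path/to/consolidated.00.pth
LLAMA_PARAMS=path/to/params.json

python -m examples.models.llama.export_llama \
  --checkpoint "${LLAMA_CHECKPOINT:?}" \
  -p "${LLAMA_PARAMS:?}" \
  -kv \
  -d bf16 \
  --output_name "llama3_2_bf16.pte"
```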
@@ -186,7 +186,7 @@ For convenience, an [exported ExecuTorch bf16 model](https://huggingface.co/exec
 ```
 # SpinQuant
 # Set these paths to point to the exported files
-LLAMA_QUANTIZED_CHECKPOINT=path/to/spinquant/checkpoint.pth
+LLAMA_QUANTIZED_CHECKPOINT=path/to/spinquant/consolidated.00.pth.pth
 LLAMA_PARAMS=path/to/spinquant/params.json
 
 python -m examples.models.llama.export_llama \
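As above, the hunk stops at the first line of the export command. A sketch of how a SpinQuant export might continue is below; the `--use_spin_quant native`, `-X`, and `--output_name` flags are assumptions for illustration rather than content taken from this diff.

```
# Sketch only: SpinQuant export using the renamed checkpoint path.
# --use_spin_quant, -X and --output_name are assumed flags, not part of the diff.
LLAMA_QUANTIZED_CHECKPOINT=path/to/spinquant/consolidated.00.pth.pth
LLAMA_PARAMS=path/to/spinquant/params.json

python -m examples.models.llama.export_llama \
  --checkpoint "${LLAMA_QUANTIZED_CHECKPOINT:?}" \
  -p "${LLAMA_PARAMS:?}" \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  --use_spin_quant native \
  --output_name "llama3_2_spinquant.pte"
```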
@@ -215,7 +215,7 @@ For convenience, an [exported ExecuTorch SpinQuant model](https://huggingface.co
 ```
 # QAT+LoRA
 # Set these paths to point to the exported files
-LLAMA_QUANTIZED_CHECKPOINT=path/to/qlora/checkpoint.pth
+LLAMA_QUANTIZED_CHECKPOINT=path/to/qlora/consolidated.00.pth.pth
 LLAMA_PARAMS=path/to/qlora/params.json
 
 python -m examples.models.llama.export_llama \
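Likewise for the QAT+LoRA variant. In the sketch below, `--use_qat` and `--use_lora 16` are assumed flag names (the LoRA rank value is only an example) and should be verified against the current `export_llama` arguments.

```
# Sketch only: QAT+LoRA export using the renamed checkpoint path.
# --use_qat and --use_lora are assumed flag names; 16 is an example LoRA rank.
LLAMA_QUANTIZED_CHECKPOINT=path/to/qlora/consolidated.00.pth.pth
LLAMA_PARAMS=path/to/qlora/params.json

python -m examples.models.llama.export_llama \
  --checkpoint "${LLAMA_QUANTIZED_CHECKPOINT:?}" \
  -p "${LLAMA_PARAMS:?}" \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  --use_qat \
  --use_lora 16 \
  --output_name "llama3_2_qat_lora.pte"
```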
@@ -248,7 +248,7 @@ You can export and run the original Llama 3 8B instruct model.
 2. Export model and generate `.pte` file
 ```
 python -m examples.models.llama.export_llama \
-  --checkpoint <consolidated.00.pth> \
+  --checkpoint <consolidated.00.pth.pth> \
   -p <params.json> \
   -kv \
   --use_sdpa_with_kv_cache \
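The command in this hunk is truncated after `--use_sdpa_with_kv_cache`. A plausible completion is sketched below; everything past that flag (`-X`, `-qmode 8da4w`, `--group_size`, `-d`, `--output_name`) is an assumption based on common `export_llama` usage, not part of the diff.

```
# Sketch only: one plausible full form of the truncated command above.
# Flags after --use_sdpa_with_kv_cache are assumptions, not from the diff.
python -m examples.models.llama.export_llama \
  --checkpoint <consolidated.00.pth.pth> \
  -p <params.json> \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  -qmode 8da4w \
  --group_size 128 \
  -d fp32 \
  --output_name "llama3_8b_instruct.pte"
```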
@@ -396,7 +396,7 @@ First export your model for lowbit quantization (step 2 above):
 
 ```
 # Set these paths to point to the downloaded files
-LLAMA_CHECKPOINT=path/to/checkpoint.pth
+LLAMA_CHECKPOINT=path/to/consolidated.00.pth.pth
 LLAMA_PARAMS=path/to/params.json
 
 # Set low-bit quantization parameters
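The hunk ends right at the `# Set low-bit quantization parameters` comment. For orientation, a hypothetical set of values is sketched below; the variable names (`QLINEAR_BITWIDTH`, `QLINEAR_GROUP_SIZE`, `QEMBEDDING_BITWIDTH`, `QEMBEDDING_GROUP_SIZE`) and their values are placeholders for whatever the full lowbit section actually defines.

```
# Hypothetical continuation: example low-bit settings.
# Names and values are placeholders, not taken from this diff.
LLAMA_CHECKPOINT=path/to/consolidated.00.pth.pth
LLAMA_PARAMS=path/to/params.json

# Set low-bit quantization parameters
QLINEAR_BITWIDTH=3         # e.g. linear weights quantized to 3 bits
QLINEAR_GROUP_SIZE=128
QEMBEDDING_BITWIDTH=4      # e.g. embedding table quantized to 4 bits
QEMBEDDING_GROUP_SIZE=32
```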
@@ -476,7 +476,7 @@ We use [LM Eval](https://github.com/EleutherAI/lm-evaluation-harness) to evaluat
 For base models, use the following example command to calculate its perplexity based on WikiText.
 ```
 python -m examples.models.llama.eval_llama \
-  -c <checkpoint.pth> \
+  -c <consolidated.00.pth.pth> \
   -p <params.json> \
   -t <tokenizer.model/bin> \
   -kv \
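This hunk also cuts off after `-kv`. A sketched completion of the WikiText perplexity run is below; `-d`, `--max_seq_length`, and `--limit` are assumptions about the remaining options and may differ from the actual `eval_llama` arguments.

```
# Sketch only: plausible completion of the perplexity command above.
# -d, --max_seq_length and --limit are assumed options, not from the diff.
python -m examples.models.llama.eval_llama \
  -c <consolidated.00.pth.pth> \
  -p <params.json> \
  -t <tokenizer.model/bin> \
  -kv \
  -d fp32 \
  --max_seq_length 2048 \
  --limit 1000
```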
@@ -489,7 +489,7 @@ python -m examples.models.llama.eval_llama \
 For instruct models, use the following example command to calculate its MMLU score.
 ```
 python -m examples.models.llama.eval_llama \
-  -c <checkpoint.pth> \
+  -c <consolidated.00.pth.pth> \
   -p <params.json> \
   -t <tokenizer.model/bin> \
   -kv \
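The MMLU hunk is truncated the same way. In the sketch below, the task-selection options (`--tasks mmlu`, `--num_fewshot 5`) follow the LM Eval harness conventions referenced in the README and are assumptions rather than diff content.

```
# Sketch only: plausible completion of the MMLU command above.
# --tasks and --num_fewshot are assumed options following LM Eval conventions.
python -m examples.models.llama.eval_llama \
  -c <consolidated.00.pth.pth> \
  -p <params.json> \
  -t <tokenizer.model/bin> \
  -kv \
  --tasks mmlu \
  --num_fewshot 5
```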