@@ -164,7 +164,7 @@ Llama 3 8B performance was measured on the Samsung Galaxy S22, S24, and OnePlus
 ```
 # No quantization
 # Set these paths to point to the downloaded files
-LLAMA_CHECKPOINT=path/to/checkpoint.pth
+LLAMA_CHECKPOINT=path/to/consolidated.00.pth
 LLAMA_PARAMS=path/to/params.json
 
 python -m examples.models.llama.export_llama \
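The hunk only shows the first line of the export command, so for context here is a rough sketch of how the renamed variables are typically consumed for the unquantized (bf16) export. The `-d` and `--output_name` flags are assumptions added for illustration and may not match the exact flag set in the full README.

```
# Minimal sketch (not part of the diff): bf16 export with the renamed checkpoint.
# -d and --output_name are assumed flags; check `export_llama --help` if they differ.
LLAMA_CHECKPOINT=path/to/consolidated.00.pth
LLAMA_PARAMS=path/to/params.json

python -m examples.models.llama.export_llama \
  --checkpoint "${LLAMA_CHECKPOINT:?}" \
  -p "${LLAMA_PARAMS:?}" \
  -kv \
  -d bf16 \
  --output_name "llama3_2_bf16.pte"
```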
@@ -186,7 +186,7 @@ For convenience, an [exported ExecuTorch bf16 model](https://huggingface.co/exec
 ```
 # SpinQuant
 # Set these paths to point to the exported files
-LLAMA_QUANTIZED_CHECKPOINT=path/to/spinquant/checkpoint.pth
+LLAMA_QUANTIZED_CHECKPOINT=path/to/spinquant/consolidated.00.pth.pth
 LLAMA_PARAMS=path/to/spinquant/params.json
 
 python -m examples.models.llama.export_llama \
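As above, the hunk stops at the first line of the export command. A sketch of how a SpinQuant export might continue is below; the `--use_spin_quant native`, `-X`, and `--output_name` flags are assumptions for illustration rather than content taken from this diff.

```
# Sketch only: SpinQuant export using the renamed checkpoint path.
# --use_spin_quant, -X and --output_name are assumed flags, not part of the diff.
LLAMA_QUANTIZED_CHECKPOINT=path/to/spinquant/consolidated.00.pth.pth
LLAMA_PARAMS=path/to/spinquant/params.json

python -m examples.models.llama.export_llama \
  --checkpoint "${LLAMA_QUANTIZED_CHECKPOINT:?}" \
  -p "${LLAMA_PARAMS:?}" \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  --use_spin_quant native \
  --output_name "llama3_2_spinquant.pte"
```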
@@ -215,7 +215,7 @@ For convenience, an [exported ExecuTorch SpinQuant model](https://huggingface.co
 ```
 # QAT+LoRA
 # Set these paths to point to the exported files
-LLAMA_QUANTIZED_CHECKPOINT=path/to/qlora/checkpoint.pth
+LLAMA_QUANTIZED_CHECKPOINT=path/to/qlora/consolidated.00.pth.pth
 LLAMA_PARAMS=path/to/qlora/params.json
 
 python -m examples.models.llama.export_llama \
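Likewise for the QAT+LoRA variant. In the sketch below, `--use_qat` and `--use_lora 16` are assumed flag names (the LoRA rank value is only an example) and should be verified against the current `export_llama` arguments.

```
# Sketch only: QAT+LoRA export using the renamed checkpoint path.
# --use_qat and --use_lora are assumed flag names; 16 is an example LoRA rank.
LLAMA_QUANTIZED_CHECKPOINT=path/to/qlora/consolidated.00.pth.pth
LLAMA_PARAMS=path/to/qlora/params.json

python -m examples.models.llama.export_llama \
  --checkpoint "${LLAMA_QUANTIZED_CHECKPOINT:?}" \
  -p "${LLAMA_PARAMS:?}" \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  --use_qat \
  --use_lora 16 \
  --output_name "llama3_2_qat_lora.pte"
```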
@@ -248,7 +248,7 @@ You can export and run the original Llama 3 8B instruct model.
 2. Export model and generate `.pte` file
 ```
 python -m examples.models.llama.export_llama \
-  --checkpoint <consolidated.00.pth> \
+  --checkpoint <consolidated.00.pth.pth> \
   -p <params.json> \
   -kv \
   --use_sdpa_with_kv_cache \
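The command in this hunk is truncated after `--use_sdpa_with_kv_cache`. A plausible completion is sketched below; everything past that flag (`-X`, `-qmode 8da4w`, `--group_size`, `-d`, `--output_name`) is an assumption based on common `export_llama` usage, not part of the diff.

```
# Sketch only: one plausible full form of the truncated command above.
# Flags after --use_sdpa_with_kv_cache are assumptions, not from the diff.
python -m examples.models.llama.export_llama \
  --checkpoint <consolidated.00.pth.pth> \
  -p <params.json> \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  -qmode 8da4w \
  --group_size 128 \
  -d fp32 \
  --output_name "llama3_8b_instruct.pte"
```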
@@ -396,7 +396,7 @@ First export your model for lowbit quantization (step 2 above):
 
 ```
 # Set these paths to point to the downloaded files
-LLAMA_CHECKPOINT=path/to/checkpoint.pth
+LLAMA_CHECKPOINT=path/to/consolidated.00.pth.pth
 LLAMA_PARAMS=path/to/params.json
 
 # Set low-bit quantization parameters
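The hunk ends right at the `# Set low-bit quantization parameters` comment. For orientation, a hypothetical set of values is sketched below; the variable names (`QLINEAR_BITWIDTH`, `QLINEAR_GROUP_SIZE`, `QEMBEDDING_BITWIDTH`, `QEMBEDDING_GROUP_SIZE`) and their values are placeholders for whatever the full lowbit section actually defines.

```
# Hypothetical continuation: example low-bit settings.
# Names and values are placeholders, not taken from this diff.
LLAMA_CHECKPOINT=path/to/consolidated.00.pth.pth
LLAMA_PARAMS=path/to/params.json

# Set low-bit quantization parameters
QLINEAR_BITWIDTH=3         # e.g. linear weights quantized to 3 bits
QLINEAR_GROUP_SIZE=128
QEMBEDDING_BITWIDTH=4      # e.g. embedding table quantized to 4 bits
QEMBEDDING_GROUP_SIZE=32
```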
@@ -476,7 +476,7 @@ We use [LM Eval](https://github.com/EleutherAI/lm-evaluation-harness) to evaluat
 For base models, use the following example command to calculate its perplexity based on WikiText.
 ```
 python -m examples.models.llama.eval_llama \
-  -c <checkpoint.pth> \
+  -c <consolidated.00.pth.pth> \
   -p <params.json> \
   -t <tokenizer.model/bin> \
   -kv \
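This hunk also cuts off after `-kv`. A sketched completion of the WikiText perplexity run is below; `-d`, `--max_seq_length`, and `--limit` are assumptions about the remaining options and may differ from the actual `eval_llama` arguments.

```
# Sketch only: plausible completion of the perplexity command above.
# -d, --max_seq_length and --limit are assumed options, not from the diff.
python -m examples.models.llama.eval_llama \
  -c <consolidated.00.pth.pth> \
  -p <params.json> \
  -t <tokenizer.model/bin> \
  -kv \
  -d fp32 \
  --max_seq_length 2048 \
  --limit 1000
```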
@@ -489,7 +489,7 @@ python -m examples.models.llama.eval_llama \
 For instruct models, use the following example command to calculate its MMLU score.
 ```
 python -m examples.models.llama.eval_llama \
-  -c <checkpoint.pth> \
+  -c <consolidated.00.pth.pth> \
   -p <params.json> \
   -t <tokenizer.model/bin> \
   -kv \
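The MMLU hunk is truncated the same way. In the sketch below, the task-selection options (`--tasks mmlu`, `--num_fewshot 5`) follow the LM Eval harness conventions referenced in the README and are assumptions rather than diff content.

```
# Sketch only: plausible completion of the MMLU command above.
# --tasks and --num_fewshot are assumed options following LM Eval conventions.
python -m examples.models.llama.eval_llama \
  -c <consolidated.00.pth.pth> \
  -p <params.json> \
  -t <tokenizer.model/bin> \
  -kv \
  --tasks mmlu \
  --num_fewshot 5
```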