Description
I export the .pte file with the latest ExecuTorch codebase. The resulting file size is:
-rw-rw-r-- 1 2.9G Apr 16 06:53 test.pte
while the float checkpoint size is:
-rw-rw-r-- 1 2.4G Oct 23 03:12 assets/models/Llama-3.2-1B/original/consolidated.00.pth
The export script is:
# Export the Llama model
function export_llama {
    model_path="$1"
    # 4-bit weight-only quantization
    python -m examples.models.llama.export_llama \
        -t "$model_path/original/tokenizer.model" \
        --checkpoint "$model_path/original/consolidated.00.pth" \
        -p "$model_path/original/params.json" \
        --disable_dynamic_shape \
        --qnn \
        --pt2e_quantize qnn_16a4w \
        --model llama3_2 \
        -d fp32 \
        --use_kv_cache \
        --num_sharding 1 \
        --soc_model SM8650 \
        --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
        -v \
        --output_name="test.pte"
}
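For reference, the function takes the model directory as its only argument; the path below is an assumption based on the checkpoint path shown above:

export_llama "assets/models/Llama-3.2-1B"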
Why is the .pte file larger than the float model?
When I build the .pte with the v0.4 ExecuTorch codebase and the same configuration, its size is normal:
-rw-rw-r-- 1 1.1G Apr 16 06:57 output.pte
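For context, a rough size estimate (assuming Llama-3.2-1B has roughly 1.2B parameters, consistent with the 2.4G checkpoint at 2 bytes per parameter): with 4-bit weight-only quantization the weight payload alone would be expected around 0.6-0.7 GB plus quantization scales, so the 2.9G .pte from the latest codebase is well above what the quantized weights should account for.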
cc @cccclai @winskuo-quic @shewu-quic @cbilgin @larryliu0820 @mergennachin @helunwencser @jackzhxng