🐞Describing the bug
I used the bisect_model() function to split a quantized model into 2 chunks. I tried with coremltools 7.1 and 7.0, with reference to this file:
prog = _load_prog_from_mlmodel(model)
# Compute the incision point by bisecting the program based on weights size
op_idx, first_chunk_weights_size, total_weights_size = _get_op_idx_split_location(
    prog)
main_block = prog.functions["main"]
print(f"First chunk size = {first_chunk_weights_size:.2f} MB")  # 152.67 MB
print(f"Second chunk size = {total_weights_size - first_chunk_weights_size:.2f} MB")  # 0.42 MB
print(f"index={op_idx}/{len(main_block.operations)}")  # 587/2720 (also noted as 587/3000)
prog_chunk1 = _make_first_chunk_prog(prog, op_idx)
prog_chunk2 = _make_second_chunk_prog(_load_prog_from_mlmodel(model), op_idx)
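For context, here is a minimal sketch of what "bisecting based on weights size" means in general: walk the ops in order, keep a running total of weight sizes, and cut at the first op where the running total passes half of the overall size. This is an illustration only, not the code in chunk_mlprogram.py and not a diagnosis of this issue; `find_split_index` and the sizes are made up.

```python
from typing import List, Tuple


def find_split_index(weight_sizes_mb: List[float]) -> Tuple[int, float, float]:
    """Return (index, first_chunk_size, total_size) for a half-size split.

    Hypothetical helper: stops at the first op whose cumulative weight size
    exceeds half of the total weight size.
    """
    total = sum(weight_sizes_mb)
    running = 0.0
    for idx, size in enumerate(weight_sizes_mb):
        running += size
        if running > total / 2:
            return idx, running, total
    # Only reached for an empty list or all-zero sizes.
    return len(weight_sizes_mb) - 1, running, total


# Made-up per-op sizes in MB; one entry is much larger than the rest,
# so the two halves come out very uneven.
sizes = [0.5, 0.5, 150.0, 0.25, 0.25]
idx, first, total = find_split_index(sizes)
print(f"index={idx}/{len(sizes)}")
print(f"First chunk size = {first:.2f} MB")
print(f"Second chunk size = {total - first:.2f} MB")
```

With sizes like these, a split by cumulative size cannot be balanced, because the cut can only fall between whole ops.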
System environment (please complete the following information):
coremltools version: 8.0b2
Here is the code to reproduce, using coremltools version 7.01. I know that with 8.0b2 the chunking has moved into coremltools, but I think it has the same issue when chunking a quantized or palettized model.
The model is a simple MobileNet that can be downloaded from the coremltools tutorial: https://apple.github.io/coremltools/docs-guides/source/opt-palettization-perf.html#:~:text=0.47-,MobileNetv2%2D1.0,-4%20bit
import coremltools as ct
from python_coreml_stable_diffusion.chunk_mlprogram import (
_load_prog_from_mlmodel,
_get_op_idx_split_location,
_make_second_chunk_prog,
_make_first_chunk_prog,
)
# link to get model:https://apple.github.io/coremltools/docs-guides/source/opt-palettization-perf.html#:~:text=0.47-,MobileNetv2%2D1.0,-4%20bit
model = ct.models.MLModel('MobileNetV2Alpha1ScalarPalettization4Bit.mlpackage')
prog = _load_prog_from_mlmodel(model)
# Load the MIL Program from MLModel
prog = _load_prog_from_mlmodel(model)
# Compute the incision point by bisecting the program based on weights size
op_idx, first_chunk_weights_size, total_weights_size = _get_op_idx_split_location(
prog)
main_block = prog.functions["main"]
incision_op = main_block.operations[op_idx]
print(f"op_idx = {op_idx}")
print(f"First chunk size = {first_chunk_weights_size:.2f} MB")
print(f"Second chunk size = {total_weights_size - first_chunk_weights_size:.2f} MB")
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Loading MLModel object into a MIL Program object (including the weights)..
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Program loaded in 0.1 seconds
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Loading MLModel object into a MIL Program object (including the weights)..
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Program loaded in 0.1 seconds
op_idx = 187
First chunk size = 1.68 MB
Second chunk size = 0.15 MB
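To put a number on the imbalance, the share of the reported weights that lands in the first chunk can be computed from the printed values. This is a small sketch using only the sizes quoted in this report; `first_chunk_share` is a hypothetical helper name.

```python
def first_chunk_share(first_mb: float, second_mb: float) -> float:
    """Fraction of the total reported weight size held by the first chunk."""
    return first_mb / (first_mb + second_mb)


# (first chunk MB, second chunk MB) pairs as printed in this report.
reported = {
    "quantized model": (152.67, 0.42),
    "MobileNetV2 4-bit palettized": (1.68, 0.15),
}

shares = {name: first_chunk_share(a, b) for name, (a, b) in reported.items()}
for name, share in shares.items():
    print(f"{name}: first chunk holds {share:.1%} of the reported weights")
```

An even bisection would give a value close to 50% for both models.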