Quantized Chunking #356

🐞 Describing the bug
I used the `bisect_model()` function to split a quantized model into 2 chunks. I also tried coremltools 7.1 and 7.0 with the chunking code from this file: https://github.com/apple/ml-stable-diffusion/blob/cf16df8207dfcba685a9391bad04f7402ea87b73/python_coreml_stable_diffusion/chunk_mlprogram.py#L123, but ran into the same issue: the two chunks come out heavily unbalanced by weight size.
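```python
from python_coreml_stable_diffusion.chunk_mlprogram import (
    _load_prog_from_mlmodel,
    _get_op_idx_split_location,
    _make_first_chunk_prog,
    _make_second_chunk_prog,
)

# `model` is the quantized ct.models.MLModel being split
prog = _load_prog_from_mlmodel(model)
main_block = prog.functions["main"]

# Compute the incision point by bisecting the program based on weights size
op_idx, first_chunk_weights_size, total_weights_size = _get_op_idx_split_location(
    prog)
print(f"First  chunk size = {first_chunk_weights_size:.2f} MB")  # 152.67 MB
print(f"Second chunk size = {total_weights_size - first_chunk_weights_size:.2f} MB")  # 0.42 MB
print(f"index = {op_idx}/{len(main_block.operations)}")  # 587/2720

# _make_first_chunk_prog modifies `prog` in place, so the second chunk
# is built from a freshly loaded copy of the program
prog_chunk1 = _make_first_chunk_prog(prog, op_idx)
prog_chunk2 = _make_second_chunk_prog(_load_prog_from_mlmodel(model), op_idx)
```

System environment:
coremltools version: 8.0b2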
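For context, the bisection step above (`_get_op_idx_split_location`) is meant to cut the op list where the cumulative weight size reaches about half of the total. Below is a simplified, standalone sketch of that idea. It is an assumption about the approach, not the actual implementation in `chunk_mlprogram.py`, and `split_index_by_weight` is a hypothetical helper that works on a plain list of per-op weight sizes instead of a MIL program.

```python
from typing import List, Tuple


def split_index_by_weight(weight_sizes_mb: List[float]) -> Tuple[int, float, float]:
    """Hypothetical helper: pick a split index from per-op weight sizes (MB).

    Returns (op_idx, first_chunk_mb, total_mb), where op_idx is the first op
    at which the cumulative weight size reaches half of the total.
    """
    if not weight_sizes_mb:
        raise ValueError("weight_sizes_mb must not be empty")
    total = sum(weight_sizes_mb)
    cumulative = 0.0
    for idx, size in enumerate(weight_sizes_mb):
        cumulative += size
        if cumulative >= total / 2:
            return idx, cumulative, total
    # Only reachable if some sizes are negative (or NaN).
    raise ValueError("weight sizes must be non-negative numbers")


# The cut lands where exactly half of the weight is on each side
print(split_index_by_weight([0.0, 4.0, 0.0, 2.0, 2.0]))  # (1, 4.0, 8.0)
# One dominant weight: the cut can only fall between ops, so the halves are uneven
print(split_index_by_weight([0.5, 10.0, 0.5]))  # (1, 10.5, 11.0)
```

Because a split can only fall at an op boundary, even a correct weight-size bisection cannot produce equal halves when a few ops hold most of the weights.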





Here is the code to reproduce it, using coremltools 7.01. I know that in 8.0b2 the chunking utility has moved into coremltools, but I think it has the same issue when chunking a quantized or palettized model.

The model is a simple MobileNetV2 (4-bit palettized) that can be downloaded from the coremltools [tutorial](https://apple.github.io/coremltools/docs-guides/source/opt-palettization-perf.html#:~:text=0.47-,MobileNetv2%2D1.0,-4%20bit).
```python
import coremltools as ct
from python_coreml_stable_diffusion.chunk_mlprogram import (
    _load_prog_from_mlmodel,
    _get_op_idx_split_location,
    _make_second_chunk_prog,
    _make_first_chunk_prog,
)

# Link to get the model: https://apple.github.io/coremltools/docs-guides/source/opt-palettization-perf.html#:~:text=0.47-,MobileNetv2%2D1.0,-4%20bit
model = ct.models.MLModel("MobileNetV2Alpha1ScalarPalettization4Bit.mlpackage")

# Load the MIL Program from the MLModel
prog = _load_prog_from_mlmodel(model)

# Compute the incision point by bisecting the program based on weights size
op_idx, first_chunk_weights_size, total_weights_size = _get_op_idx_split_location(
    prog)
main_block = prog.functions["main"]
incision_op = main_block.operations[op_idx]

print(f"op_idx = {op_idx}")
print(f"First  chunk size = {first_chunk_weights_size:.2f} MB")
print(f"Second chunk size = {total_weights_size - first_chunk_weights_size:.2f} MB")
```

Output:

```
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Loading MLModel object into a MIL Program object (including the weights)..
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Program loaded in 0.1 seconds
op_idx = 187
First  chunk size = 1.68 MB
Second chunk size = 0.15 MB
```
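To put the reported numbers in perspective, the short calculation below computes the share of the total weight size that ends up in the first chunk for both runs in this issue; an even bisection by weight size would put about 50% in each chunk. `first_chunk_share` is a hypothetical helper, and the inputs are only the sizes printed above.

```python
def first_chunk_share(first_mb: float, second_mb: float) -> float:
    """Fraction of the total weight size that ends up in the first chunk."""
    return first_mb / (first_mb + second_mb)


# (first chunk MB, second chunk MB) as printed in this issue
reported = {
    "quantized model (first snippet)": (152.67, 0.42),
    "MobileNetV2, 4-bit palettized": (1.68, 0.15),
}

for name, (first_mb, second_mb) in reported.items():
    share = first_chunk_share(first_mb, second_mb)
    print(f"{name}: first chunk holds {share:.1%} of the weights")
# quantized model (first snippet): first chunk holds 99.7% of the weights
# MobileNetV2, 4-bit palettized: first chunk holds 91.8% of the weights
```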
