
Prevent compilation every launch in python #203

Open

rbourgeat opened this issue Jul 7, 2023 · 3 comments


rbourgeat commented Jul 7, 2023

I'm looking to optimize my use of the ml-stable-diffusion Python pipeline.
When I use python_coreml_stable_diffusion.pipeline, the model is compiled on every launch and doesn't use the precompiled .mlmodelc files...

INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.

Why can only the Swift package use them?
Is there a solution?

atiorh (Collaborator) commented Jul 8, 2023

Hello @rbourgeat, we have a radar to improve this behavior for Core ML models running through coremltools in Python, but it is currently pending prioritization, so it won't be immediately available. I recommend:

  • Initializing the Python pipeline once in your own long-running script, so the compile-on-load overhead is amortized over the whole session.
  • Building on top of the Swift package, which is more feature-rich (image-to-image, ControlNet, etc.) than the Python pipeline (which only offers text-to-image today).
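The first suggestion amounts to keeping one long-lived process that loads the model once and then serves many prompts. A minimal sketch of that pattern, where `load_pipeline` and `generate` are hypothetical stand-ins for the repo's actual pipeline API (not its real function names):

```python
import sys
import time


def load_pipeline():
    # Hypothetical stand-in: in practice this would construct the Core ML
    # Stable Diffusion pipeline (the slow, compile-on-load step).
    time.sleep(0)  # placeholder for the one-time compile/load cost
    return object()


def generate(pipe, prompt):
    # Hypothetical stand-in for a single text-to-image call
    # (fast once the model is loaded).
    return f"image for: {prompt}"


def main():
    pipe = load_pipeline()  # pay the load cost exactly once per process
    print("ready", flush=True)
    for line in sys.stdin:  # each subsequent prompt reuses the loaded model
        prompt = line.strip()
        if prompt:
            print(generate(pipe, prompt), flush=True)


if __name__ == "__main__":
    main()
```

The point is simply that the expensive load happens once per process lifetime, so a prompt loop (or a small local server) amortizes it, whereas relaunching the script pays it every time.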

atiorh (Collaborator) commented Jul 8, 2023

On a separate note, the term "compile" is overloaded in this context. .mlpackage files are compiled into .mlmodelc files for deployment, which generally takes no more than a few seconds. The main overhead comes when the Core ML model is loaded for a particular compute engine: currently it takes ~1-2 minutes for the model to be compiled for the Neural Engine, and there is another radar to improve this overall compile time. When loading for GPU or CPU, the overhead should be no more than a few seconds.
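Given that the Neural Engine specialization dominates the load time, one workaround (at the cost of not using the Neural Engine for inference) is to request a CPU/GPU-only load through coremltools' `compute_units` option. A minimal sketch, assuming a local model path; verify the enum values against your coremltools version:

```python
def load_cpu_gpu(model_path):
    # Lazy import so this sketch parses without coremltools installed.
    import coremltools as ct

    # compute_units controls which engines Core ML specializes the model for.
    # The default (ct.ComputeUnit.ALL) includes the Neural Engine and triggers
    # the ~1-2 minute compile described above; CPU_AND_GPU skips that pass.
    return ct.models.MLModel(model_path,
                             compute_units=ct.ComputeUnit.CPU_AND_GPU)
```

Whether the slower GPU/CPU inference is worth the faster load depends on how many images you generate per launch.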

rbourgeat (Author) commented

Hello @atiorh! Thank you for your answer.

I tried initializing the pipeline directly in my code: compilation takes 7 minutes at each launch, and after that, generating an image is almost instantaneous (2-3 s). But I don't want to wait 7 minutes every time...

As for the Swift package, my app is in Python because I'm aiming for cross-platform, so it's not an option for me...

So in the meantime I found a solution: using the PyTorch MPS device with Diffusers from Hugging Face:

https://huggingface.co/docs/diffusers/optimization/mps

It doesn't use quantized models, unfortunately, but it's the best Python option on M1 in the meantime.
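Following the linked guide, the Diffusers-on-MPS route looks roughly like this; the model ID and options below are illustrative, not prescriptive:

```python
def make_mps_pipeline(model_id="runwayml/stable-diffusion-v1-5"):
    # Lazy imports so this sketch parses without torch/diffusers installed.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id)
    pipe = pipe.to("mps")            # Apple Silicon GPU backend
    pipe.enable_attention_slicing()  # reduces peak memory on Apple Silicon
    return pipe


def generate_image(pipe, prompt):
    # The first call on MPS is slower (one-time kernel compilation);
    # subsequent calls are fast.
    return pipe(prompt).images[0]
```

Unlike the Core ML path, the load here is just reading weights, so there is no multi-minute per-launch compile step.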

And thank you for all you do, can't wait to see the next updates!
