
Prevent compilation every launch in python #203

Open

rbourgeat opened this issue Jul 7, 2023 · 3 comments


rbourgeat commented Jul 7, 2023

I'm looking to optimize my use of the ml-stable-diffusion Python pipeline.
When I use python_coreml_stable_diffusion.pipeline, the model is compiled on every launch and doesn't use the precompiled .mlmodelc files...

INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.

Why can only the Swift package use them?
Is there a solution?

atiorh (Collaborator) commented Jul 8, 2023

Hello @rbourgeat, we have a radar to improve this behavior for Core ML models running through coremltools in Python, but it is currently pending prioritization, so it won't be immediately available. I recommend:

  • Initializing the Python pipeline once in your own long-running script, so the compile-on-load overhead is amortized over the whole session.
  • Building on top of the Swift package, which is more feature-rich (image-to-image, ControlNet, etc.) than the Python pipeline (which only offers text-to-image today).
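The first suggestion amounts to keeping one long-lived process that loads the model once and then serves many prompts. A minimal sketch of that pattern, where `load_pipeline` and `generate` are hypothetical stand-ins for the repo's actual pipeline API (not its real function names):

```python
import sys
import time


def load_pipeline():
    # Hypothetical stand-in: in practice this would construct the Core ML
    # Stable Diffusion pipeline (the slow, compile-on-load step).
    time.sleep(0)  # placeholder for the one-time compile/load cost
    return object()


def generate(pipe, prompt):
    # Hypothetical stand-in for a single text-to-image call
    # (fast once the model is loaded).
    return f"image for: {prompt}"


def main():
    pipe = load_pipeline()  # pay the load cost exactly once per process
    print("ready", flush=True)
    for line in sys.stdin:  # each subsequent prompt reuses the loaded model
        prompt = line.strip()
        if prompt:
            print(generate(pipe, prompt), flush=True)


if __name__ == "__main__":
    main()
```

The point is simply that the expensive load happens once per process lifetime, so a prompt loop (or a small local server) amortizes it, whereas relaunching the script pays it every time.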

atiorh (Collaborator) commented Jul 8, 2023

On a separate note, the term "compile" is overloaded in this context. .mlpackage files are compiled into .mlmodelc files for deployment, which generally takes no more than a few seconds. The main overhead comes when the Core ML model is loaded for a particular compute engine: currently it takes ~1-2 minutes for the model to be compiled for the Neural Engine, and there is another radar to improve this overall compile time. When loading for GPU or CPU, the overhead should be no more than a few seconds.
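Given that the Neural Engine specialization dominates the load time, one workaround (at the cost of not using the Neural Engine for inference) is to request a CPU/GPU-only load through coremltools' `compute_units` option. A minimal sketch, assuming a local model path; verify the enum values against your coremltools version:

```python
def load_cpu_gpu(model_path):
    # Lazy import so this sketch parses without coremltools installed.
    import coremltools as ct

    # compute_units controls which engines Core ML specializes the model for.
    # The default (ct.ComputeUnit.ALL) includes the Neural Engine and triggers
    # the ~1-2 minute compile described above; CPU_AND_GPU skips that pass.
    return ct.models.MLModel(model_path,
                             compute_units=ct.ComputeUnit.CPU_AND_GPU)
```

Whether the slower GPU/CPU inference is worth the faster load depends on how many images you generate per launch.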

rbourgeat (Author) commented

Hello @atiorh! Thank you for your answer.

I tried initializing the pipeline directly in my code: compilation takes 7 minutes at each launch, and after that, generating an image is almost instantaneous (2-3 s). But I don't want to wait 7 minutes every time...

As for the Swift package, my app is in Python because I'm aiming for cross-platform, so it's not an option for me...

So in the meantime I found a solution: using the PyTorch MPS device with Diffusers from Hugging Face:

https://huggingface.co/docs/diffusers/optimization/mps

It doesn't use quantized models, unfortunately, but it's the best Python option on M1 in the meantime.
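Following the linked guide, the Diffusers-on-MPS route looks roughly like this; the model ID and options below are illustrative, not prescriptive:

```python
def make_mps_pipeline(model_id="runwayml/stable-diffusion-v1-5"):
    # Lazy imports so this sketch parses without torch/diffusers installed.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id)
    pipe = pipe.to("mps")            # Apple Silicon GPU backend
    pipe.enable_attention_slicing()  # reduces peak memory on Apple Silicon
    return pipe


def generate_image(pipe, prompt):
    # The first call on MPS is slower (one-time kernel compilation);
    # subsequent calls are fast.
    return pipe(prompt).images[0]
```

Unlike the Core ML path, the load here is just reading weights, so there is no multi-minute per-launch compile step.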

And thank you for all you do, can't wait to see the next updates!
