
Image2image - swift #116
Merged: 15 commits, Feb 9, 2023

Conversation

@littleowl (Contributor) commented Jan 28, 2023

Adds image2image functionality.

This is only the Swift portion. A separate library that generates the model for the VAE Encoder exists here: #115

The original PR with both the Swift and Python libs can be found here: #73

Unfortunately, I could not complete the request for the Python simplification; I believe it is better not to do so. Apologies for the delay in splitting up the PR. There are not many changes from the previous PR apart from removing the Python code and rebasing the project.

On the Swift side, an Encoder class is added, along with various changes to the scheduler, pipeline, and CLI to support an input image and a strength parameter. CGImage creation from MLShapedArray is moved into its own file, together with a new function that creates an MLShapedArray from a CGImage (a rough sketch of that conversion is shown below). Image loading and preparation is currently handled and optimized with vImage.
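
A minimal sketch of such a CGImage to MLShapedArray conversion, using plain CoreGraphics instead of the vImage-optimized path this PR actually takes; the [-1, 1] normalization and the channel-first [1, 3, height, width] layout are assumptions based on typical VAE encoder inputs, not taken from the PR:

import CoreGraphics
import CoreML

/// Sketch: convert a CGImage into a [1, 3, height, width] MLShapedArray
/// with pixel values mapped from 0...255 to -1...1 (planar, channel-first).
func shapedArray(from image: CGImage) -> MLShapedArray<Float32> {
    let width = image.width, height = image.height
    var rgba = [UInt8](repeating: 0, count: width * height * 4)

    // Draw the image into a known RGBA8 buffer so the pixel format is predictable.
    rgba.withUnsafeMutableBytes { buffer in
        let context = CGContext(
            data: buffer.baseAddress, width: width, height: height,
            bitsPerComponent: 8, bytesPerRow: width * 4,
            space: CGColorSpaceCreateDeviceRGB(),
            bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)!
        context.draw(image, in: CGRect(x: 0, y: 0, width: width, height: height))
    }

    // De-interleave RGBA into planar R, G, B and normalize each channel.
    var scalars = [Float32](repeating: 0, count: 3 * width * height)
    for y in 0..<height {
        for x in 0..<width {
            let pixelIndex = (y * width + x) * 4
            for channel in 0..<3 {
                scalars[channel * width * height + y * width + x] =
                    Float32(rgba[pixelIndex + channel]) / 127.5 - 1.0
            }
        }
    }
    return MLShapedArray<Float32>(scalars: scalars, shape: [1, 3, height, width])
}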

This should work with both schedulers that we have so far. (Thanks @pcuenca for the fix that makes it work with DPMSolverMultistepScheduler.)


  • I agree to the terms outlined in CONTRIBUTING.md

@atiorh (Collaborator) commented Jan 28, 2023

Thank you @littleowl! I reviewed #115, and leaving the noise situation as-is should be fine; please see my comment there :) I believe the only implication of my review of #115 for this PR is that the model I/O names should be changed from camelCase to underscores, and that is it. Thank you again for your amazing contributions!

@alejandro-isaza (Collaborator) left a comment

Looks good, just a couple of small things.

@littleowl (Contributor, Author) commented

@atiorh, @alejandro-isaza - Thanks for your reviews! I believe that I resolved all the comments. Please let me know if there is anything more that I can do.

@littleowl requested review from alejandro-isaza and atiorh and removed the review request for msiracusa, alejandro-isaza, and atiorh on January 31, 2023 at 06:50
@alejandro-isaza (Collaborator) left a comment

The header comment needs to be fixed. Other than that just a couple more nits.

@littleowl (Contributor, Author) commented

@alejandro-isaza, thanks again for the review. The additional comments have been resolved.

@atiorh requested a review from msiracusa on January 31, 2023 at 19:01
In swift/StableDiffusion/pipeline/StableDiffusionPipeline+SampleInput.swift:

}

/// Image generation configuration
public struct Configuration: Hashable {
@alejandro-isaza (Collaborator) commented:

Sorry, one more thing. Let's rename this file StableDiffusionPipeline.Configuration.swift

@atiorh (Collaborator) commented Feb 1, 2023

@littleowl In addition to the file name change that @alejandro-isaza asked for, if you could just add a quick note in the Example CLI Usage section to indicate that an image argument is now supported, that would be great!

@atiorh (Collaborator) commented Feb 2, 2023

I just tested the CLI with different images, and it seems that we need to guard against or fix a few things:
  • When the --image file's resolution does not match the VAEEncoder input resolution, we get zsh: trace trap.
  • Loading a PNG image with the correct resolution works, but trying to load the same image encoded as JPEG fails without a sufficient explanation.

@atiorh (Collaborator) left a comment

@littleowl I just realized that we left scattered comments after the initial review, apologies! Here are the two remaining items before we can merge:

  • Rename swift/StableDiffusion/pipeline/StableDiffusionPipeline+SampleInput.swift to swift/StableDiffusion/pipeline/StableDiffusionPipeline.Configuration.swift
  • Add a specific error message when the input image's file format and/or resolution do not match expectations

@pj4533 commented Feb 6, 2023

I know you all have this awesome PR under control, but I just wanted to say that I successfully converted the v2 base model VAEEncoder using main (with the Python image2image PR), then converted the rest of v2 base using this PR, and am currently generating image2image results with my own Swift code. So from an end-user perspective: well done!

@saiedg commented Feb 8, 2023

@pj4533 well done! I'm so jealous! How can I try?

@littleowl (Contributor, Author) commented

Again, sorry for the delay @atiorh.
  • Added some error detection; it needs to happen right before prediction due to the dynamic model (see the sketch below).
  • Fixed a hard-coded instance of 512.
  • Renamed the file.
  • Fixed JPEG support by utilizing Cocoa.
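
The pre-prediction check described above could look roughly like the following; the type and function names here are illustrative placeholders and are not the PR's actual API:

import CoreGraphics

// Illustrative only: a guard run right before prediction that reports a clear
// error when the starting image does not match the encoder's expected resolution,
// instead of crashing with a trace trap.
enum StartingImageError: Error, CustomStringConvertible {
    case wrongSize(actual: (width: Int, height: Int), expected: (width: Int, height: Int))

    var description: String {
        switch self {
        case let .wrongSize(actual, expected):
            return "Starting image is \(actual.width)x\(actual.height), "
                 + "but the VAE encoder expects \(expected.width)x\(expected.height)."
        }
    }
}

func checkStartingImage(_ image: CGImage, expectedWidth: Int, expectedHeight: Int) throws {
    guard image.width == expectedWidth, image.height == expectedHeight else {
        throw StartingImageError.wrongSize(
            actual: (image.width, image.height),
            expected: (expectedWidth, expectedHeight))
    }
}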

@atiorh (Collaborator) commented Feb 9, 2023

Thanks for your hard work @littleowl, this is going to benefit many developers! I just tested your changes and they are working like a charm. Merging now.

@atiorh merged commit fa7bbdc into apple:main on Feb 9, 2023
@saiedg commented Feb 10, 2023

@littleowl congratulations on merging! I'm sooo excited to use it but don't know how. Can you please add instructions in the readme or make a YouTube video?

@ynagatomo commented

Here is a sample:
[screenshot: sample code and output, 2023-02-10]

@martinlexow commented
Thanks for your exceptional work @littleowl and thank you for sharing the example @ynagatomo!

Can you please explain what the strength parameter does in this context? From my observation it seems to just decrease the number of steps (roughly steps × strength), and it also isn't obvious to me why strength has to be less than 1.0 to trigger the imageToImage mode.
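
For context, this is how strength commonly behaves in diffusers-style image-to-image pipelines; it is an assumption about this implementation rather than a statement of it. strength chooses how far into the noise schedule the encoded input image is pushed, and only the remaining steps are denoised, which is why the effective step count looks like steps × strength, and why 1.0 would mean starting from pure noise (plain text-to-image). A minimal sketch of that interpretation, with made-up names:

// Hypothetical illustration of the usual strength semantics; names are illustrative.
let stepCount = 50
let strength: Float = 0.7

// Only the last `strength` fraction of the schedule is actually run.
let inferenceSteps = Int(Float(stepCount) * strength)   // 35 of 50 steps

// The input image's latents are noised up to the point where those 35 steps begin,
// so a lower strength keeps more of the original image and runs fewer steps.
// A strength of 1.0 would start from pure noise, i.e. ordinary text-to-image,
// which is why values below 1.0 are what engage the image-to-image path.
print("Running \(inferenceSteps) of \(stepCount) steps at strength \(strength)")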

@ynagatomo commented

An experiment on strength:
[image: i2i_1280, results at different strength values]
(code: https://github.com/ynagatomo/ImgGenSD2)
