Add Image to Image option #16
Conversation
@Xuzzo Again, thanks for the contribution. Very nice that you spotted and fixed the

Regarding this feature, do you have a good example of it working? I only tried it quickly with the schnell model for 4 steps, and did not get any special result from it (I basically got back the input image with minor variations). Maybe it might not work as well with the distilled

Oh great, this is a very nice example! I think we should add this to the readme.

Btw, I have looked at your PR, and will share some comments later today. Thanks for the patience :)
            self.inference_steps = list(range(num_inference_steps))
            self.guidance = guidance


    class ConfigImg2Img:
More of a personal preference, but I would prefer one file per config class.
my current position is we probably don't need multiple config classes that map 1 config to 1 CLI entry point.
        def __init__(
            self,
            num_inference_steps: int = 4,
            width: int = 1024,
With the img2img option, does it make sense to also provide a height and width? I am thinking that it will always be based on the input base image?
this depends on the training resolution of Flux; my understanding (from reading forums) is that snapping the output image to the training resolutions gets better results. Worth doing some evals here, or just following evals from similar projects upstream.
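A minimal sketch of what such snapping could look like. The multiple of 16 is my assumption (Flux-style models downsample by 8 in the VAE and pack latents 2x2), and the helper name is hypothetical, not part of the mflux codebase:

```python
def snap_dimension(value: int, multiple: int = 16) -> int:
    """Round a pixel dimension to the nearest supported multiple.

    Assumption: Flux-style models downsample by 8 in the VAE and
    pack latents 2x2, so width/height are typically multiples of 16.
    """
    if value <= 0:
        raise ValueError("dimension must be positive")
    # Integer nearest-multiple rounding, with a floor of one multiple.
    return max(multiple, ((value + multiple // 2) // multiple) * multiple)
```

For img2img, the input image's dimensions could be passed through such a helper instead of asking the user for a width and height.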
        ImageUtil.save_image(image, args.output)


    if __name__ == '__main__':
Regarding main_img2img.py: I think I prefer this option instead of unifying things too much at this point (not fully sure how often this feature will be used compared to text2img), so I would keep it a bit separate for now, as you have done.
after working on my own take in anthonywu#1, my current opinion is that the config classes should be mixins, instead of RuntimeConfig being the parent object of ModelConfig, etc.
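A rough sketch of the mixin idea. All class and field names here are hypothetical illustrations, not the actual mflux API:

```python
from dataclasses import dataclass

# Hypothetical sketch: each concern is a small dataclass, and a
# concrete config composes them, rather than one config class
# inheriting from another in a single parent chain.

@dataclass
class ModelConfigMixin:
    model_name: str = "schnell"

@dataclass
class RuntimeConfigMixin:
    num_inference_steps: int = 4
    guidance: float = 3.5

@dataclass
class Img2ImgConfig(ModelConfigMixin, RuntimeConfigMixin):
    init_image_strength: float = 0.3
```

Each CLI entry point could then declare exactly the mixins it needs, avoiding both a deep inheritance chain and a proliferation of near-identical config classes.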
            image_latents = self._pack_latents(image_latents, runtime_config.height, runtime_config.width)
            latents = runtime_config.sigmas[config.init_timestep] * noise + (1.0 - runtime_config.sigmas[config.init_timestep]) * image_latents

            return self._generate_from_latents(latents, prompt, runtime_config)
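The core of this img2img initialization is a linear interpolation between pure noise and the encoded input image at the starting sigma. A standalone NumPy sketch of just that blend (the shapes, helper name, and seeding are illustrative, not the mflux API):

```python
import numpy as np

def blend_latents(image_latents: np.ndarray, sigma_start: float, seed: int = 0) -> np.ndarray:
    """latents = sigma * noise + (1 - sigma) * image_latents.

    sigma_start near 1.0 -> mostly noise (large changes to the image);
    sigma_start near 0.0 -> mostly the input image (minor variations).
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(image_latents.shape).astype(image_latents.dtype)
    return sigma_start * noise + (1.0 - sigma_start) * image_latents
```

This also suggests why schnell with few steps mostly returned the input image earlier in the thread: starting late in the schedule means a low sigma, so the input term dominates the blend.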
This is also a bit more of a personal opinion, but at this point I think I would even prefer a separate flux_img2img.py file that holds the new class. Right now, I would also not subclass it, and accept some code duplication for the img2img case. I guess the thing I like is for the standard text2img workflow to be as straightforward as possible, and I also kind of like being able to view the whole generate_image method at once, without it delegating to another method like _generate_from_latents. This would mean some more code duplication right now, but I think it is OK at the moment.

Based on your img2img logic, I have made a branch on top of this one showing what this could look like. To be clear, since you have added the extra logic for img2img, I think we should continue on your current branch here, but make some changes similar to how it looks in my other branch. Would be nice to hear your opinions on this, and also let me know whether you have the time to do it; otherwise I can add some commits on top of your branch here :)
No problem for me. Feel free to push directly here
Any progress on this?
I have a draft PR that reconstructs this PR using some WIP refactoring code. Open to review: https://github.com/anthonywu/mflux/pull/1/files. The interface is ready. The image generation is buggy; I think I'm a "few lines of code" away from getting it to work, I just ran out of time today. The structure of the re-implementation is worth a preview, though, for anyone interested in helping expedite this.
Thanks to @Xuzzo's reference implementation here, I found a bug fix for my PR over at anthonywu#1.
mflux-generate --model dev --init-image cake.png --init-image-strength 0.25 --steps 12 --prompt "cake, blue frosting, bananas and cherries, blue flower pot with yellow sunflowers in the background, plated with gold leafs"
The above result was achieved with dev and 10+ steps. In contrast, I'm finding that schnell with a low number of steps like 4 isn't sufficient to alter the image meaningfully. Unsure whether image to image should be recommended for schnell.
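One plausible way an init-image-strength flag could map onto the step schedule, sketched below. This is purely illustrative, based on common img2img implementations, and not necessarily how mflux computes init_timestep:

```python
def strength_to_start_step(strength: float, num_steps: int) -> int:
    """Map init-image strength in [0, 1] to the denoising step to start from.

    Illustrative only: skip the first (1 - strength) fraction of the
    schedule, so higher strength leaves more denoising steps and
    makes bigger changes to the input image.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(num_steps - 1, round((1.0 - strength) * num_steps))
```

Under this mapping, strength 0.25 with 12 steps starts at step 9, leaving 3 denoising steps; with schnell's 4 steps only a single step would remain, which would be consistent with schnell barely altering the image.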
Thanks @anthonywu for moving this forward. I'm closing this PR and leaving all development to #77.