Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Image to Image option #16

Closed
wants to merge 9 commits into from
Closed

Conversation

Xuzzo
Copy link
Contributor

@Xuzzo Xuzzo commented Aug 21, 2024

Hi,
this PR:

  • introduces the possibility to modify another image through flux1
  • fixes a bug with width <-> height (they were swapped in config init)
  • adds main_img2img for generation. Note: not sure this is needed, we could probably unify cli and keep only one main

@filipstrand
Copy link
Owner

@Xuzzo Again, thanks for the contribution. Very nice that you spotted and fixed the height/width mistake.

Regarding this feature, do you have a good example of it working? I only tried it quickly with the schnell model for 4 steps, and did not get any special result from it (I basically got back the input image with minor variations). Maybe it might not work as well with the distilled schnell model??

@Xuzzo
Copy link
Contributor Author

Xuzzo commented Aug 24, 2024

Using the schnell model I found it is quite important to tune the strength parameter. For example:

Original image:
image

CLI:
python main_img2img.py --prompt "Luxury food photograph" --steps 10 --seed 2 --base-image image.png --strength 0.3

Result:
image(37)

I found the dev to be a bit less sensitive to strength.

@filipstrand
Copy link
Owner

Using the schnell model I found it is quite important to tune the strength parameter. For example:

Original image: image

CLI: python main_img2img.py --prompt "Luxury food photograph" --steps 10 --seed 2 --base-image image.png --strength 0.3

Result: image(37)

I found the dev to be a bit less sensitive to strength.

Oh great, this is a very nice example! I think we should add this to the readme.

@filipstrand
Copy link
Owner

Btw, I have looked at your PR, and will share some comments later today. Thanks for the patience :)

self.inference_steps = list(range(num_inference_steps))
self.guidance = guidance

class ConfigImg2Img:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of a personal preference, but I would prefer one file per config class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my current position is we probably don't need multiple config classes that map 1 config to 1 CLI entry point.

def __init__(
self,
num_inference_steps: int = 4,
width: int = 1024,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the img2img option, does it make sense to also provide a height and width? I am thinking that it will always be based on the input base image?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this depends on the training resolution of Flux, my understanding (from reading forums) is that snapping the output image to the training resolutions gets better results. Worth doing some evals here, or just follow evals from similar projects upstream.

ImageUtil.save_image(image, args.output)


if __name__ == '__main__':
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding main_img2img.py I think I prefer this option, instead of unifying things too much at this point (not fully sure how often this feature will be used compared to text2img), so I actually prefer to keep this a bit separate for now as you have done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after working on my own take in anthonywu#1 - my current opinion is the config classes should be mixins, instead of RuntimeConfig being parent object of ModelConfig, etc.

image_latents = self._pack_latents(image_latents, runtime_config.height, runtime_config.width)
latents = runtime_config.sigmas[config.init_timestep] * noise + (1.0 - runtime_config.sigmas[config.init_timestep]) * image_latents

return self._generate_from_latents(latents, prompt, runtime_config)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a bit more of a personal opinion, but I think I even would prefer a separate flux_img2img.py file that holds the new class at this point. Right now, I would also not even subclass it and deal with some code duplication for the img2img case.

I guess the thing I like is for the standard text2img workflow to be as straight forward as possible and I also kinda like being able to view the whole generate_image method at once, without it delegating to another method like _generate_from_latents. This would mean some more code duplication right now, but I think it is OK at the moment.

Based on your img2img logic, I have made a branch based on this one of what this could look like.

To be clear, since you have added the extra logic for img2img, I think we should continue on your current branch here, but make some changes similar to how it looks like in my other branch.

Would be nice to hear your opinions on this, and also let me know if you have the time or not to do it, otherwise I can add some commits on top of your branch here :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem for me. Feel free to push directly here

@iLoveBug
Copy link

iLoveBug commented Oct 8, 2024

any progress for this?

@anthonywu
Copy link
Contributor

👍🏼 to getting img2img implemented, though this change would need to be re-constructed on latest main changes. I propose doing this after #70 and #66 and maybe another refactoring where the Flux classes become sub-classes of a general FluxBase where shared implementation go.

@anthonywu
Copy link
Contributor

I have a draft PR that re-constructs this PR using some WIP-refactoring code.

Open to review: https://github.com/anthonywu/mflux/pull/1/files

Interface is ready. The image generation is buggy, I think I'm a "few lines of code" away from getting it to work, just ran out of time today. The structure of the re-implementation is worth a preview though for anyone interested in helping expedite this.

Copy link
Contributor

@anthonywu anthonywu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks to @Xuzzo's ref implementation here I found a bug fix for my PR over at anthonywu#1

mflux-generate --model dev --init-image cake.png --init-image-strength 0.25 --steps 12 --prompt "cake, blue frosting, bananas and cherries, blue flower pot with yellow sunflowers in the background, plated with gold leafs"

init image
image

output image
image

composite of stepwise
image

The above result was achieved with dev and 10+ steps.

In contrast, I'm finding that schnell with low number of steps like 4 isn't sufficient to alter the image meaningfully. Unsure whether image to image should be recommended for schnell

ImageUtil.save_image(image, args.output)


if __name__ == '__main__':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after working on my own take in anthonywu#1 - my current opinion is the config classes should be mixins, instead of RuntimeConfig being parent object of ModelConfig, etc.

self.inference_steps = list(range(num_inference_steps))
self.guidance = guidance

class ConfigImg2Img:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my current position is we probably don't need multiple config classes that map 1 config to 1 CLI entry point.

def __init__(
self,
num_inference_steps: int = 4,
width: int = 1024,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this depends on the training resolution of Flux, my understanding (from reading forums) is that snapping the output image to the training resolutions gets better results. Worth doing some evals here, or just follow evals from similar projects upstream.

@Xuzzo
Copy link
Contributor Author

Xuzzo commented Oct 16, 2024

Thanks @anthonywu for moving this forward. Im closing this PR and leaving all development to #77

@Xuzzo Xuzzo closed this Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants