
Add Timestep Sampling Function from SD3 Branch to SD (dev base) #1671

Open
wants to merge 3 commits into dev
Conversation

gesen2egee
Contributor

This PR introduces the timestep_sampling feature from the SD3 branch into the original SD model. The new timestep sampling options offer more concentrated probability distributions than the default uniform sampling, which helps the model focus on specific aspects of learning. The new options are selected via the --timestep_sampling argument.

  • Default (uniform): Keeps the original uniform timestep sampling, which distributes training evenly across all timesteps.
  • Sigmoid: Equivalent to shift when --discrete_flow_shift is left at its default value of 1.
  • Shift: Skews the timestep distribution, helping the model concentrate on style learning or on specific objects.
  • Flux Shift: Shifts the distribution based on the image size, similar to how FLUX behaves on larger images.

Additional Parameters:

  • --discrete_flow_shift: Defaults to 1, which leaves the sigmoid of the random normal sample unchanged (a balanced form). Shifting the distribution to the right helps the model focus more on style, while shifting it to the left aids in learning specific objects.
  • --sigmoid_scale: Adjusts the shape of the sigmoid function.

While flux_shift preserves FLUX's resolution-dependent shifting behavior, its effect may differ in SD due to differences in model architecture and training at fixed resolutions.
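
As a rough illustration only (not the exact code in this PR), the sketch below shows how these modes could map a normal sample onto an SD timestep. The flux_shift sequence-length formula and the final discretization to 0–999 are assumptions based on the SD3/FLUX-branch behavior described above.

```python
import math
import torch

def sample_timesteps(mode, batch_size, num_train_timesteps=1000,
                     discrete_flow_shift=1.0, sigmoid_scale=1.0,
                     height=1024, width=1024, device="cpu"):
    if mode == "uniform":
        # original SD behaviour: every discrete timestep equally likely
        return torch.randint(0, num_train_timesteps, (batch_size,), device=device)

    # sigmoid of a scaled normal sample concentrates t around 0.5
    t = torch.sigmoid(torch.randn(batch_size, device=device) * sigmoid_scale)

    if mode == "shift":
        # discrete_flow_shift > 1 skews t toward the noisier end,
        # < 1 toward the cleaner end; 1 leaves the sigmoid unchanged
        s = discrete_flow_shift
        t = (s * t) / (1 + (s - 1) * t)
    elif mode == "flux_shift":
        # resolution-dependent shift, mimicking FLUX on larger images;
        # the sequence-length and mu interpolation below are assumptions
        seq_len = (height // 16) * (width // 16)
        mu = 0.5 + (1.15 - 0.5) * (seq_len - 256) / (4096 - 256)
        t = math.exp(mu) / (math.exp(mu) + (1.0 / t - 1.0))

    # map continuous t in (0, 1) onto SD's discrete timestep indices
    return (t * num_train_timesteps).long().clamp(0, num_train_timesteps - 1)
```

With --discrete_flow_shift left at 1, the shift mode reduces to plain sigmoid sampling, which is why the two behave the same by default.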

It is recommended to use the --timestep_sampling sigmoid option combined with --soft_min_snr_gamma = 1 (from #1068 by rockerBOO) for better results, as these settings seem to significantly improve model performance.

It is suggested to merge this PR together with the soft min SNR gamma PR.
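
For reference, soft min SNR gamma is a smooth stand-in for the hard min(SNR, γ) loss-weight clip; one common harmonic formulation is sketched below. This is only an illustration and may differ from the exact implementation in #1068.

```python
import torch

def soft_min_snr_weight(snr: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    # Harmonic "soft minimum" of snr and gamma (an assumed formulation, not
    # necessarily the exact one in #1068): close to snr when snr << gamma,
    # close to gamma when snr >> gamma, with a smooth transition in between.
    return (snr * gamma) / (snr + gamma)
```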


recris commented Oct 4, 2024

Great job! But I wonder if we could improve the approach to adding additional sampling functions.

The space of sampling strategies is vast, and there are other valid functions beyond the ones added here.

If we keep adding more parameters every time a new approach shows up, maintenance will become more difficult. I've been thinking about this problem for some time, and I'd like to propose a more flexible mechanism.

The idea is to make timestep sampling pluggable in the same way the LR scheduler or optimizer is. We could have a generic interface (class), say TimestepSampler, and different functions would be implemented as subclasses. We would then choose the class to use via a configuration parameter (like we do for optimizers with optimizer_type and optimizer_args).

This way we could add other functions in the future without touching the existing code; it would also simplify the maintenance of private forks by reducing the need to deal with rebase conflicts (which is the main reason I am interested in this).
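
A rough sketch of what such a pluggable interface could look like; the class names, registry, and argument-passing scheme below are illustrative only, not a proposed final API:

```python
from abc import ABC, abstractmethod

import torch


class TimestepSampler(ABC):
    """Base class; concrete samplers are selected via a config parameter."""

    def __init__(self, num_train_timesteps: int, **sampler_args):
        self.num_train_timesteps = num_train_timesteps
        self.sampler_args = sampler_args

    @abstractmethod
    def sample(self, batch_size: int, device) -> torch.Tensor:
        """Return a batch of discrete timestep indices."""


class UniformTimestepSampler(TimestepSampler):
    def sample(self, batch_size, device):
        return torch.randint(0, self.num_train_timesteps, (batch_size,), device=device)


class SigmoidTimestepSampler(TimestepSampler):
    def sample(self, batch_size, device):
        scale = self.sampler_args.get("sigmoid_scale", 1.0)
        t = torch.sigmoid(torch.randn(batch_size, device=device) * scale)
        return (t * self.num_train_timesteps).long().clamp(0, self.num_train_timesteps - 1)


# Selection could then mirror optimizer_type / optimizer_args, e.g.
# --timestep_sampler_type "sigmoid" --timestep_sampler_args "sigmoid_scale=1.5"
TIMESTEP_SAMPLERS = {"uniform": UniformTimestepSampler, "sigmoid": SigmoidTimestepSampler}
```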

I leave it to @kohya-ss to comment on this as a long-term approach.

@FurkanGozukara

@gesen2egee thank you so much

For FLUX dev, I found that Model Prediction Type = raw and Timestep Sampling = sigmoid work well together.

What do you think about this?

Commits: Update train_util.py, Update train_util.py
kohya-ss (Owner) commented Oct 6, 2024

Thank you for updating the PR! This seems to be effective, but what recris said also makes sense. Please give me some time to consider it.


bghira commented Oct 16, 2024

sampling continuous timesteps for a discrete model seems like a problem
