Fix Lumina reversed timestep handling (#2201) and add "lognorm" sampling #2215
base: sd3
Conversation
… for lumina image v2 and add new timestep
Resolve the issue reported at kohya-ss#2201 and introduce a new timestep type called "lognorm".
…ed_timesteps Fix Lumina reversed timestep handling (kohya-ss#2201) and add "lognorm" sampling
Thank you. It looks good.
However, it seems that Diffusers calculates 1 - timestep just before calling DiT. One idea would be to unify the timestep calculation to that method. What do you think?
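For illustration, a minimal sketch of what unifying on that convention could look like (the function name and wiring are assumptions, not the PR's actual code): sample t in the usual flow-matching convention and invert it only at the DiT call site.

    import torch

    def get_noisy_input_and_timesteps(latents, noise, t):
        # usual flow-matching convention here: t = 1 -> pure noise, t = 0 -> clean image
        t_ = t.view(-1, 1, 1, 1)
        noisy_model_input = (1 - t_) * latents + t_ * noise
        # Lumina's DiT uses the reversed convention, so invert only here,
        # just before the forward pass (any extra scaling, e.g. * 1000, is omitted)
        dit_timesteps = 1.0 - t
        return noisy_model_input, dit_timesteps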
library/lumina_train_util.py (Outdated)
t = t.view(-1, 1, 1, 1)
noisy_model_input = (1 - t) * noise + t * latents
elif args.timestep_sampling == "lognorm":
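For context, a "lognorm"-style sampler usually means a logit-normal distribution (the sigmoid of a Gaussian), as used in SD3-family training. The sketch below is only an illustration of what such a branch might compute, not necessarily what the proposed code did:

    import torch

    def sample_lognorm_timesteps(batch_size, device, mean=0.0, std=1.0):
        # logit-normal sampling: sigmoid of a Gaussian, concentrated away from 0 and 1
        u = torch.randn(batch_size, device=device) * std + mean
        return torch.sigmoid(u)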
If you add a new timestep sampling method, it seems that you also need to add it to --timestep_sampling for add_lumina_train_arguments in lumina_train_util.py.
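A hypothetical sketch of the kind of change being requested (the actual default and existing choices in lumina_train_util.py may differ):

    import argparse

    def add_lumina_train_arguments(parser: argparse.ArgumentParser):
        parser.add_argument(
            "--timestep_sampling",
            type=str,
            default="nextdit_shift",  # assumed default
            choices=["uniform", "sigmoid", "nextdit_shift", "lognorm"],  # assumed existing choices plus the new one
            help="method for sampling training timesteps",
        )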
OK, I will add it to add_lumina_train_arguments in lumina_train_util in the next pull request.
+1. I was also trying to reverse all timestep samplings, but found that other functions outside the Lumina files rely on these timesteps, e.g. min_snr_gamma. Reversing it in one place makes things much easier; the other functions don't have to change.
Change the apply_model_prediction_type function to suit the new call_dit
…ed_timesteps Fix lumina image v2 reversed timesteps
I have made the adjustments as you requested. Also, the new timestep type I implemented turned out to be similar to sigmoid, so I deleted it and did not add it to this pull request. In addition, I have fixed the issue related to fine-tuning the Lumina model with multiple GPUs. Moreover, in the current Lumina fine-tuning code, the model's parameters are not updated when args.blockwise_fused_optimizers is enabled. At the moment I don't know how to fix this, so I have disabled this feature to prevent errors for users. Sorry for committing multiple times; I'm making the changes from my tablet.
Thank you for the update! I think it would be great to set the timestep to 1-t.
It seems that training is possible with the current code (without this PR) when using next_dit, but is it correct to understand that no changes to next_dit are necessary?
Edit: time_shift seems to need an update.
(
    DistributedDataParallelKwargs(find_unused_parameters=True)
),
What was the purpose of this addition? I would appreciate an explanation.
In my testing, a full fine-tune of the Lumina image model on multiple GPUs fails with the error "expected gradient for parameter … but none found"; adding this handles the problem so training runs normally on multi-GPU without the error.
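For context, this is roughly how such a kwargs handler is passed to Accelerate (a sketch of the standard accelerate API, not the repo's exact prepare code):

    from accelerate import Accelerator
    from accelerate.utils import DistributedDataParallelKwargs

    # tell DDP to tolerate parameters that receive no gradient in a step
    ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
    accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])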
Thank you for the explanation. This function is commonly called by all model training scripts, so any changes made here will require testing all models.
I think it might be a good idea to find out why Lumina needs this argument and solve that problem.
Maybe we can add a flag to enable this only when fine-tuning Lumina models. That could improve flexibility.
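A possible shape for such a flag (the option name and wiring are hypothetical, not taken from the repo):

    import argparse
    from accelerate import Accelerator
    from accelerate.utils import DistributedDataParallelKwargs

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--ddp_find_unused_parameters",
        action="store_true",
        help="pass find_unused_parameters=True to DDP (needed for Lumina multi-GPU full fine-tuning)",
    )
    args = parser.parse_args()

    # only pay the cost of unused-parameter detection when explicitly requested
    kwargs_handlers = []
    if args.ddp_find_unused_parameters:
        kwargs_handlers.append(DistributedDataParallelKwargs(find_unused_parameters=True))
    accelerator = Accelerator(kwargs_handlers=kwargs_handlers)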
library/lumina_train_util.py (Outdated)
t = time_shift(mu, 1.0, t)
- timesteps = t * 1000.0
+ timesteps = 1 - t * 1000.0
I also reversed the sampling of the ‘nextdit_shift’ timestep to synchronize it with the current code.
Is there any change needed here? https://github.com/duongve13112002/sd-scripts/blob/4d24b71c1647f674951f482857c12c74a5a46440/library/lumina_train_util.py#L480-L482
I think we don’t need to change anything here because another function calls this one. Changing the code in this function could potentially break the training pipeline, for example, in this function
https://github.com/duongve13112002/sd-scripts/blob/4d24b71c1647f674951f482857c12c74a5a46440/library/lumina_train_util.py#L507-L537
I think that get_schedule will not get the correct value unless time_shift is modified.
In this PR, the model input has been inverted to 1-t, so if you leave time_shift unmodified, the shift value will be inverted. In other words, the implementation of time_shift should be the same as in FLUX.1.
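For reference, the FLUX.1-style time_shift looks like the sketch below (following the FLUX reference implementation); whether the Lumina call sites should use exactly this form depends on which convention t is in at the point it is called:

    import math
    import torch

    def time_shift(mu: float, sigma: float, t: torch.Tensor) -> torch.Tensor:
        # t is expected in (0, 1], with larger t meaning more noise (FLUX convention);
        # mu > 0 shifts the schedule toward noisier timesteps
        return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)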
OK, I will change it right now.
Description
This PR fixes the issue where Lumina's reversed timesteps (using t=0 as noise and t=1 as image) were not properly handled in some functions. As a result, certain timestep sampling methods (other than nextdit_shift) did not work as expected, causing the model to fail to learn even after thousands of steps.
The fix ensures that timestep handling is consistent with Lumina’s reversed convention.
In addition, this PR introduces a new timestep type named lognorm.
Changes