
Conversation

duongve13112002

Description

This PR fixes the issue where Lumina's reversed timesteps (using t=0 as noise and t=1 as image) were not properly handled in some functions. As a result, certain timestep sampling methods (other than nextdit_shift) did not work as expected, causing the model to fail to learn even after thousands of steps.

The fix ensures that timestep handling is consistent with Lumina’s reversed convention.

In addition, this PR introduces a new timestep type named lognorm.

Changes

  • Fixed reversed timestep handling in lumina_train_util.py and related functions.
  • Adjusted affected methods so that they properly account for the t=0 noise / t=1 image convention (see the sketch after this list).
  • Added support for a new timestep type: lognorm.
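
For context, here is a minimal sketch of the reversed convention these fixes target; the variable names are illustrative and not the exact ones used in lumina_train_util.py.

```python
import torch

# Minimal sketch of Lumina's reversed flow-matching convention
# (t=0 -> pure noise, t=1 -> clean image). Names are illustrative only.
latents = torch.randn(2, 4, 32, 32)   # clean image latents
noise = torch.randn_like(latents)     # Gaussian noise
t = torch.rand(latents.shape[0])      # per-sample timesteps in [0, 1]

t = t.view(-1, 1, 1, 1)
# t=1 reproduces the image, t=0 reproduces the noise
noisy_model_input = (1 - t) * noise + t * latents
```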

Commits

… for lumina image v2 and add new timestep

Resolve the issue reported at kohya-ss#2201 and introduce a new timestep type called "lognorm".

…ed_timesteps

Fix Lumina reversed timestep handling (kohya-ss#2201) and add "lognorm" sampling
@kohya-ss (Owner) left a comment

Thank you. It looks good.
However, it seems that Diffusers calculates 1-timestep just before calling DiT. One idea would be to unify the timestep calculation to that method. What do you think?

https://github.com/huggingface/diffusers/blob/0a151115bbe493de74a4565e57352a0890e94777/src/diffusers/pipelines/lumina/pipeline_lumina.py#L846

t = t.view(-1, 1, 1, 1)
noisy_model_input = (1 - t) * noise + t * latents
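
For illustration, here is a sketch of one way to read this suggestion, assuming timesteps are kept in the usual [0, 1000] convention everywhere else and only flipped immediately before the model call; the helper name call_dit is hypothetical.

```python
# Hypothetical helper: keep timesteps in the standard convention throughout
# training, and flip to Lumina's reversed convention only at the DiT boundary,
# mirroring the Diffusers Lumina pipeline linked above.
def call_dit(dit, noisy_model_input, timesteps, **cond):
    reversed_t = 1.0 - timesteps / 1000.0  # assumes timesteps in [0, 1000]
    return dit(noisy_model_input, reversed_t, **cond)
```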

elif args.timestep_sampling == "lognorm":
kohya-ss (Owner)

If you add a new timestep sampling method, it seems that you also need to add it to the --timestep_sampling choices in add_lumina_train_arguments in lumina_train_util.py.
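
For reference, a minimal sketch of that registration; the existing choices and the default shown here are assumptions, not copied from lumina_train_util.py.

```python
import argparse

# Sketch only: in the real code the parser is passed into
# add_lumina_train_arguments; the choices/default here are assumptions.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--timestep_sampling",
    type=str,
    default="nextdit_shift",
    choices=["uniform", "sigmoid", "shift", "nextdit_shift", "lognorm"],
    help="Timestep sampling method; 'lognorm' is the new option",
)
```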

duongve13112002 (Author)

OK, I will add it to add_lumina_train_arguments in lumina_train_util in the next pull request.

@urlesistiana commented Sep 29, 2025

Diffusers calculates 1-timestep just before calling DiT.

+1. I was also trying to reverse all the timestep samplings, but found that other functions outside the Lumina files rely on these timesteps, e.g. min_snr_gamma. Reversing it in one place makes things much easier; we won't have to change the other functions.

lazy man forgot to open a pr, orz

@duongve13112002 (Author) commented Sep 29, 2025

I have made the adjustments as you requested. Also, the new timestep type I implemented is similar to sigmoid, so I deleted it and did not add it to this pull request. In addition, I have fixed the issue related to fine-tuning the Lumina model with multiple GPUs. Moreover, in the current Lumina fine-tuning code, when args.blockwise_fused_optimizers is enabled, the model's parameters are not updated. At the moment I don't know how to fix this, so I have disabled this feature to prevent errors for users. Sorry for committing multiple times; I'm making changes on my tablet.

@kohya-ss (Owner) left a comment

Thank you for the update! I think it would be great to set the timestep to 1-t.

It seems that training is possible with the current code (without PR) when using next_dit, but is it correct to understand that no changes to next_dit are necessary?

Edit: time_shift seems to need updating.

Comment on lines +5528 to +5530
(
DistributedDataParallelKwargs(find_unused_parameters=True)
),
kohya-ss (Owner)

What was the purpose of this addition? I would appreciate an explanation.

duongve13112002 (Author)

According to my testing, when fully fine-tuning the Lumina image model on multiple GPUs you get the error "expected gradient for parameter … but none found". Adding this handles the problem, so training runs normally on multi-GPU without errors.
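
For context, this is roughly how such a kwargs handler reaches DDP through Accelerate; a minimal sketch, not the exact call site in the shared training code.

```python
from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs

# Minimal sketch: find_unused_parameters=True makes DDP tolerate parameters
# that receive no gradient in a given step (one common cause of missing-
# gradient errors in multi-GPU fine-tuning), at the cost of an extra graph
# traversal on every step.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```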

kohya-ss (Owner)

Thank you for the explanation. This function is commonly called by all model training scripts, so any changes made here will require testing all models.

I think it might be a good idea to find out why Lumina needs this argument and solve that problem.

duongve13112002 (Author)

Maybe we can add a flag to enable this when fine-tuning Lumina models; that could improve flexibility.
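
A minimal sketch of such an opt-in flag; the flag name and wiring are hypothetical and not part of this PR.

```python
import argparse

from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs

# Hypothetical flag: only request find_unused_parameters when explicitly asked
# for (e.g. by the Lumina fine-tuning scripts), leaving other models untouched.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--ddp_find_unused_parameters",
    action="store_true",
    help="Pass find_unused_parameters=True to DDP (needed for some Lumina fine-tuning setups)",
)
args = parser.parse_args()

kwargs_handlers = []
if args.ddp_find_unused_parameters:
    kwargs_handlers.append(DistributedDataParallelKwargs(find_unused_parameters=True))
accelerator = Accelerator(kwargs_handlers=kwargs_handlers)
```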

t = time_shift(mu, 1.0, t)

timesteps = t * 1000.0
timesteps = 1 - t * 1000.0
duongve13112002 (Author)

I also reversed the sampling of the ‘nextdit_shift’ timestep to synchronize it with the current code.


duongve13112002 (Author)

I think we don’t need to change anything here because another function calls this one. Changing the code in this function could potentially break the training pipeline, for example, in this function
https://github.com/duongve13112002/sd-scripts/blob/4d24b71c1647f674951f482857c12c74a5a46440/library/lumina_train_util.py#L507-L537

kohya-ss (Owner)

I think that get_schedule will not return the correct values unless time_shift is modified.

In this PR, the model input has been inverted to 1-t, so if you leave time_shift unmodified, the shift value will be inverted. In other words, the implementation of time_shift should be the same as in FLUX.1.
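
For reference, the FLUX.1-style time_shift being referred to looks roughly like this; it is based on the publicly available FLUX sampling code, so verify the exact signature against the existing sd-scripts FLUX utilities before copying.

```python
import math
import torch

# Sketch of the FLUX.1-style time_shift (a monotone shift of t driven by mu);
# confirm against the existing FLUX implementation before reuse.
def time_shift(mu: float, sigma: float, t: torch.Tensor) -> torch.Tensor:
    return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)
```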

@duongve13112002 (Author) commented Oct 1, 2025

OK, I will change it right now.
