Handling mixed precision for dreambooth flux lora training #9565
Conversation
Thank you! Just a single comment.
@linoytsaban could you also give this a look?
- pipeline = pipeline.to(accelerator.device, dtype=torch_dtype)
+ pipeline = pipeline.to(accelerator.device)
Why are we doing this?
We should keep the pipeline's model-level components (such as the text encoders, VAE, etc.) in reduced precision, no?
The text encoders and VAE are already in reduced precision :)
As I described in the PR description, this line changes the dtype of the transformer.
For mixed-precision training with fp16, the transformer is upcast to fp32.
But this call casts it back to fp16, which leads to an fp16 unscale error during gradient clipping.
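For illustration, here is a minimal self-contained sketch of that failure mode; the tiny Linear model is a stand-in assumption for the upcast transformer, not the actual training code:

import torch

# Stand-in for the transformer whose trainable (LoRA) params were upcast to fp32.
model = torch.nn.Linear(4, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

# Simulate log_validation casting the whole pipeline (and thus the model) to fp16.
model.to(dtype=torch.float16)

with torch.autocast("cuda", dtype=torch.float16):
    loss = model(torch.randn(2, 4, device="cuda")).float().mean()
scaler.scale(loss).backward()
scaler.unscale_(optimizer)  # raises "Attempting to unscale FP16 gradients."

Keeping the trainable params in fp32, i.e. moving the pipeline with .to(accelerator.device) only, avoids this.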
Would something like this work?
#9549 (comment)
Thank you for the suggestion! But in this thread I was concerned with the unwanted switch from fp32 to fp16 after validation, not with the T5 computation :)
Ah okay. Can you provide an example command for us to verify this? Maybe @linoytsaban could give it a try?
@icsl-Jeon a friendly reminder :)
This can be reproduced with any of the launch commands in the README, e.g.:

accelerate launch ... --mixed_precision="fp16" ...

I checked the LoRA precision with:

# Print dtype and requires_grad for every LoRA parameter of the transformer.
for name, param in transformer.named_parameters():
    if 'lora' in name:
        print(f"Layer: {name}, dtype: {param.dtype}, requires_grad: {param.requires_grad}")

Hope this helps you reproduce!
We could avoid it by running inference in autocast, no? Here's an example:
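For reference, a rough sketch of what that could look like in the validation step, assuming the training script's pipeline, accelerator, weight_dtype, validation_prompt, and generator variables (an illustrative sketch, not the exact example that was linked):

import torch

# Move the pipeline without touching module dtypes, then run inference under
# autocast so reduced-precision compute is still used. The variable names here
# are assumptions borrowed from the training-script context.
pipeline = pipeline.to(accelerator.device)
with torch.autocast(accelerator.device.type, dtype=weight_dtype):
    image = pipeline(prompt=validation_prompt, generator=generator).images[0]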
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LGTM actually!
@linoytsaban could you also review?
@linoytsaban thank you in advance
Thanks @icsl-Jeon, LGTM!
Is there any action to be done for the merge?
Will merge this once the CI run is complete. Thanks a ton!
… dreambooth as well
* add latent caching + smol updates
* update license
* replace with free_memory
* add --upcast_before_saving to allow saving transformer weights in lower precision
* fix models to accumulate
* fix mixed precision issue as proposed in #9565
* smol update to readme
* style
* fix caching latents
* style
* add tests for latent caching
* style
* fix latent caching
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
… + small bug fix (#9646)
* make lora target modules configurable and change the default
* style
* make lora target modules configurable and change the default
* fix bug when using prodigy and training te
* fix mixed precision training as proposed in #9565 for full dreambooth as well
* add test and notes
* style
* address sayaks comments
* style
* fix test
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Thanks for your contributions!
Handling mixed precision and add unwrap
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
What does this PR do?
Hello 😄 Thank you for the awesome example!
Here, I want to make a PR with the changes that helped me train DreamBooth LoRA successfully. It addresses three things:
- dtype change of the transformer after log_validation (especially for fp16). For mixed-precision training, the original code upcasts the fp16 transformer to fp32. However, after switching the pipeline dtype in log_validation, the transformer dtype returns to fp16, which can lead to an fp16 unscaling error. I actually hit this problem when using the fp16 option. (For some reason, T5 yielded NaN output in bf16, which is why I came to use fp16.) See: train_dreambooth_lora_flux validation RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same #9476
- dtype of the text encoders for validation. For some reason, the two text encoders were in fp32 in the pipeline.
- unwrap to access the config field of the transformer (see the sketch below). See: train_dreambooth_lora_flux.py distributed bugs #9161 (comment)

Fixes # (issue)
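As a rough illustration of the unwrap point above, using accelerate's public unwrap_model API and assumed variable names from the training-script context (not necessarily the script's exact helper):

import torch

# After accelerator.prepare(), the transformer may be wrapped (e.g. by DDP),
# so config attributes should be read from the unwrapped module.
unwrapped = accelerator.unwrap_model(transformer)
guidance = None
if unwrapped.config.guidance_embeds:
    guidance = torch.full([1], args.guidance_scale, device=accelerator.device)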
Who can review?
@linoytsaban @sayakpaul
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.