
[HiDream LoRA] optimizations + small updates #11381


Merged: 31 commits merged into huggingface:main on Apr 24, 2025

Conversation

@linoytsaban (Collaborator) commented Apr 22, 2025

some memory optimizations:

  • add pre-computation of prompt embeddings when custom prompts are used as well (see the sketch after this list)
  • add pre-computation of the validation prompt embeddings as well
  • add --skip_final_inference - allows running validation while skipping the final loading of the pipeline with the LoRA weights, to reduce memory requirements
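
Conceptually, the pre-computation works like the sketch below. This is a generic, self-contained illustration using a single CLIP text encoder; the actual script uses HiDream's own text encoders and helper functions.

```python
# Generic sketch of the pre-computation idea (not the script's exact code):
# encode the prompts once with the frozen text encoder, keep only the
# embeddings, and free the encoder before training starts.
import gc

import torch
from transformers import CLIPTextModel, CLIPTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device)

prompts = ["a dog, yarn art style", "yoda, yarn art style"]  # instance + validation prompts
with torch.no_grad():
    tokens = tokenizer(prompts, padding="max_length", truncation=True, return_tensors="pt").to(device)
    prompt_embeds = text_encoder(**tokens).last_hidden_state.cpu()

# The text encoder is frozen, so it isn't needed once the embeddings are cached.
del text_encoder
gc.collect()
torch.cuda.empty_cache()
```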

other changes:

  • update the default trained layers
  • save the model card even if the model is not pushed to the Hub
  • remove the scheduler initialization from the code example - it's not necessary anymore (it's now in the base model's config); see the sketch after this list
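
For reference, the simplified inference example looks roughly like the sketch below: no scheduler is created explicitly because it's read from the base model's config. The LoRA repo id is a placeholder, and the separately loaded Llama encoder plus the tokenizer_4/text_encoder_4 argument names are assumptions about the HiDream pipeline, not taken from this PR.

```python
# Rough sketch only - the marked repo ids and argument names are assumptions.
import torch
from transformers import AutoTokenizer, LlamaForCausalLM
from diffusers import HiDreamImagePipeline

llama_repo = "meta-llama/Llama-3.1-8B-Instruct"  # assumed fourth text encoder
tokenizer_4 = AutoTokenizer.from_pretrained(llama_repo)
text_encoder_4 = LlamaForCausalLM.from_pretrained(llama_repo, torch_dtype=torch.bfloat16)

# No explicit scheduler initialization: the scheduler comes from the base model's config.
pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    tokenizer_4=tokenizer_4,
    text_encoder_4=text_encoder_4,
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights("your-username/hidream-yarn-art-lora")  # placeholder repo id
pipe.enable_model_cpu_offload()

image = pipe("yoda, yarn art style", num_inference_steps=50, guidance_scale=5.0).images[0]
image.save("yoda_yarn_art.png")
```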

todo:

  • update readme with better defaults

Yarn Art LoRA
(screenshot of sample outputs, 2025-04-23)

training config
import os
os.environ['MODEL_NAME'] = "HiDream-ai/HiDream-I1-Full"
os.environ['DATASET_NAME'] ="Norod78/Yarn-art-style"
os.environ['OUTPUT_DIR'] = "hidream-yarn-art-lora-v2-trainer"

!accelerate launch train_dreambooth_lora_hidream.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --dataset_name=$DATASET_NAME \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="bf16" \
  --lora_layers="to_k,to_q,to_v,to_out" \
  --instance_prompt="a dog, yarn art style" \
  --validation_prompt="yoda, yarn art style" \
  --caption_column="text" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --use_8bit_adam \
  --rank=16 \
  --learning_rate=1e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant_with_warmup" \
  --lr_warmup_steps=200 \
  --max_train_steps=1000 \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub

…sed as well

2. save model card even if model is not pushed to hub
2. remove scheduler initialization from code example - not necessary anymore (it's now in the base model's config)
3. add skip_final_inference - allows running with validation while skipping the final loading of the pipeline with the LoRA weights, to reduce memory requirements
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@linoytsaban (Collaborator, Author)

@sayakpaul now that I'm thinking about it, even when validation is enabled, since we're not optimizing the text encoders, can't we just pre-encode the validation prompt embeddings as well? and then we don't need to keep or load text encoders for validation at all and simply pass the embeddings to log_validation

@sayakpaul (Member)

> @sayakpaul now that I'm thinking about it, even when validation is enabled, since we're not optimizing the text encoders, can't we just pre-encode the validation prompt embeddings as well? and then we don't need to keep or load text encoders for validation at all and simply pass the embeddings to log_validation

We should. Let's do that.
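
The general pattern being discussed looks roughly like the snippet below. It uses a Stable Diffusion pipeline purely as a self-contained illustration of passing cached prompt_embeds instead of a prompt string; the HiDream script's log_validation and embedding helpers differ from this.

```python
# Illustration only: pre-encode the validation prompt, drop the text encoder,
# and run validation from the cached embeddings.
import gc

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

with torch.no_grad():
    prompt_embeds, negative_embeds = pipe.encode_prompt(
        "yoda, yarn art style",
        device="cuda",
        num_images_per_prompt=1,
        do_classifier_free_guidance=True,
    )

# The text encoder is frozen, so it can be dropped once the embeddings exist.
pipe.text_encoder = None
gc.collect()
torch.cuda.empty_cache()

image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds).images[0]
```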

@sayakpaul (Member) left a comment

Thanks for the optims. LMK if the comments make sense or if anything is unclear.

@linoytsaban (Collaborator, Author)

@bot /style

Contributor

Style fixes have been applied. View the workflow run here.

@sayakpaul (Member) left a comment

Thank you! Left some more comments. I think we can merge this today. Also would be good to see if we test this on a 40GB GPU.

@linoytsaban (Collaborator, Author) commented Apr 23, 2025

Thanks @sayakpaul!

I added prints using your snippet (here) - right before the caching & pre-computation of the prompt embeddings and right after deleting them.
Freeing the memory seems to work as expected, getting us to ~33 GB, but we hit ~60 GB earlier, when the text encoding pipeline is moved to the GPU (this is with resolution=1024).
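
The prints themselves are just torch's CUDA memory counters, roughly like this (a sketch, not necessarily the exact linked snippet):

```python
import torch

def print_cuda_memory(stage: str) -> None:
    # Same four numbers as in the logs below.
    gib = 1024 ** 3
    print(f"\n=== CUDA Memory Stats {stage} ===")
    print(f"Current allocated: {torch.cuda.memory_allocated() / gib:.2f} GB")
    print(f"Max allocated: {torch.cuda.max_memory_allocated() / gib:.2f} GB")
    print(f"Current reserved: {torch.cuda.memory_reserved() / gib:.2f} GB")
    print(f"Max reserved: {torch.cuda.max_memory_reserved() / gib:.2f} GB")

print_cuda_memory("before caching")
# ... cache the prompt embeddings, delete the text encoders ...
print_cuda_memory("after freeing")
```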
with offloading -

=== CUDA Memory Stats before caching ===
Current allocated: 32.24 GB
Max allocated: 32.24 GB
Current reserved: 58.29 GB
Max reserved: 58.29 GB

=== CUDA Memory Stats after caching ===
Current allocated: 58.41 GB
Max allocated: 58.41 GB
Current reserved: 60.84 GB
Max reserved: 60.84 GB

=== CUDA Memory Stats after freeing ===
Current allocated: 32.87 GB
Max allocated: 32.87 GB
Current reserved: 33.82 GB
Max reserved: 33.82 GB

without offloading & caching -

=== CUDA Memory Stats before caching ===
Current allocated: 57.77 GB
Max allocated: 57.77 GB
Current reserved: 58.50 GB
Max reserved: 58.50 GB

=== CUDA Memory Stats after caching ===
Current allocated: 58.35 GB
Max allocated: 58.35 GB
Current reserved: 59.11 GB
Max reserved: 59.11 GB

=== CUDA Memory Stats after freeing ===
Current allocated: 32.98 GB
Max allocated: 32.98 GB
Current reserved: 33.39 GB
Max reserved: 33.39 GB

…s only pre-encoded if custom prompts are provided, but should be pre-encoded either way)
@linoytsaban (Collaborator, Author)

@bot /style

Contributor

Style fixes have been applied. View the workflow run here.

@linoytsaban (Collaborator, Author)

@bot /style

Contributor

Style fixes have been applied. View the workflow run here.

@sayakpaul (Member) left a comment

Looks great! Thanks Linoy!

@@ -1140,7 +1131,7 @@ def main(args):
     if args.lora_layers is not None:
         target_modules = [layer.strip() for layer in args.lora_layers.split(",")]
     else:
-        target_modules = ["to_k", "to_q", "to_v", "to_out.0"]
+        target_modules = ["to_k", "to_q", "to_v", "to_out"]
@sayakpaul (Member)

Perhaps we can add a comment explaining that including to_out will target all the expert layers.
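
For illustration, the suggested comment could be as simple as the following (wording is mine, not from the PR):

```python
# "to_out" (rather than "to_out.0") also targets all the expert layers.
target_modules = ["to_k", "to_q", "to_v", "to_out"]
```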

@linoytsaban (Collaborator, Author)

@bot /style

@linoytsaban merged commit edd7880 into huggingface:main on Apr 24, 2025
9 checks passed
@linoytsaban deleted the hidream-followup branch on April 28, 2025, 12:14