Add cross attention type for Sana-Sprint training in diffusers. #11514
sayakpaul merged 12 commits into huggingface:main
Conversation
| elif cross_attention_type == "vanilla":
|     cross_attention_processor = SanaAttnProcessor3_0()
Can't we modify the SanaAttnProcessor2_0() class to handle the changes in SanaAttnProcessor3_0?
If we merge 2_0 and 3_0, we would then need a variable to check when to use the function here,
which would be similar to
cross_attention_type: str = "flash",
| guidance_embeds_scale: float = 0.1,
| qk_norm: Optional[str] = None,
| timestep_scale: float = 1.0,
| cross_attention_type: str = "flash",
This goes a bit against our design.
Then can we just separate it into two classes, and could you help with a better implementation?
Actually, the only difference is that F.scaled_dot_product_attention is not supported by torch's JVP (forward-mode autodiff). Therefore, during training we need to replace it with the vanilla attention implementation. Any good ideas on how to merge these two? @sayakpaul
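For context, here is a minimal sketch (not from the PR) of how one might check that incompatibility locally, assuming torch.func.jvp as the forward-mode entry point; the function names below are purely illustrative:

```python
import torch
import torch.nn.functional as F


def sdpa_attention(q, k, v):
    # Fused kernel; depending on the backend it may not have a forward-mode (JVP) rule.
    return F.scaled_dot_product_attention(q, k, v)


def vanilla_attention(q, k, v):
    # Explicit QK^T / sqrt(d) -> softmax -> V, built only from ops with JVP support.
    scores = q @ k.transpose(-1, -2) / (q.shape[-1] ** 0.5)
    return scores.softmax(dim=-1) @ v


q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq, head_dim)
tangent = torch.ones_like(q)

# The vanilla version composes from primitives with JVP rules, so this works.
torch.func.jvp(lambda x: vanilla_attention(x, k, v), (q,), (tangent,))

# The fused SDPA path may raise here, which is why training swaps in the vanilla processor.
try:
    torch.func.jvp(lambda x: sdpa_attention(x, k, v), (q,), (tangent,))
except Exception as err:
    print("SDPA not usable under jvp:", err)
```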
Ah I see. If that is the case, I think we should go through the attention processor mechanism, wherein we use something like set_attn_processor with the vanilla attention processor class.
If this is only needed for training, I think we should have the following methods added to the model class:
We can then just include the vanilla attention processor implementation in the training utility and do something like
model = SanaTransformer2DModel(...)
model.set_attn_processor(SanaVanillaAttnProcessor())

WDYT? @DN6 any suggestions?
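(A rough sketch of what such a vanilla processor could look like, assuming the diffusers Attention module exposes to_q, to_k, to_v, to_out, and heads the way existing processors use them; this is illustrative, not the implementation that landed in the PR.)

```python
import torch


class SanaVanillaAttnProcessor:
    """Sketch of a processor that avoids F.scaled_dot_product_attention for JVP compatibility."""

    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None):
        # Project with the attention module's existing linear layers.
        context = hidden_states if encoder_hidden_states is None else encoder_hidden_states
        query = attn.to_q(hidden_states)
        key = attn.to_k(context)
        value = attn.to_v(context)

        # Split heads: (batch, seq, inner_dim) -> (batch, heads, seq, head_dim).
        batch, seq_len, _ = query.shape
        head_dim = query.shape[-1] // attn.heads
        query = query.view(batch, -1, attn.heads, head_dim).transpose(1, 2)
        key = key.view(batch, -1, attn.heads, head_dim).transpose(1, 2)
        value = value.view(batch, -1, attn.heads, head_dim).transpose(1, 2)

        # Explicit scaled dot-product attention: QK^T / sqrt(d), softmax, then V.
        scores = torch.matmul(query, key.transpose(-1, -2)) / (head_dim ** 0.5)
        if attention_mask is not None:
            scores = scores + attention_mask
        hidden_states = torch.matmul(scores.softmax(dim=-1), value)

        # Merge heads and apply the output projection (and dropout).
        hidden_states = hidden_states.transpose(1, 2).reshape(batch, seq_len, attn.heads * head_dim)
        hidden_states = attn.to_out[0](hidden_states)
        hidden_states = attn.to_out[1](hidden_states)
        return hidden_states
```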
Oh this is cool and nifty IMO, thanks. I'll change the code.
| @@ -0,0 +1,1656 @@
| #!/usr/bin/env python
This is perfect! This is 100 percent the way to go here. We can include the attention processor here in a file (attention_processor.py) and use it from there in the training script.
Based on https://github.com/huggingface/diffusers/pull/11514/files#r2077921763.
Since we're using a folder for the training script, I wouldn't mind moving the dataloader out into a separate script and the utilities into another. But that's completely up to you.
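As a small illustration of that layout (the file name attention_processor.py comes from the suggestion above; the checkpoint path is hypothetical):

```python
# Hypothetical layout: the vanilla processor lives next to the training script
# and is imported from there instead of from diffusers itself.
from attention_processor import SanaVanillaAttnProcessor

from diffusers import SanaTransformer2DModel

model = SanaTransformer2DModel.from_pretrained(
    "path/to/SANA_Sprint_teacher", subfolder="transformer"  # hypothetical local path
)
model.set_attn_processor(SanaVanillaAttnProcessor())
```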
I don't mind it. Could you help with this one? :)
Yes, after the https://github.com/huggingface/diffusers/pull/11514/files#r2077921763 comments are addressed, I will help with that
…SanaAttnProcessor3_0` to `SanaVanillaAttnProcessor`
I have changed the code as recommended here: https://github.com/huggingface/diffusers/pull/11514/files#r2077921763. I hope it's what you mean. @sayakpaul

Tested locally after adding the SanaVanillaAttnProcessor imports; the changes work as expected. LGTM! @lawrence-cj @sayakpaul
| huggingface-cli download Efficient-Large-Model/SANA_Sprint_1.6B_1024px_teacher_diffusers --local-dir $your_local_path/SANA_Sprint_1.6B_1024px_teacher_diffusers

| python train_sana_sprint_diffusers.py \
This is perfect!
Do we want to move the dataset class into a separate file dataset.py? I am okay if we want to do that since it's already under research_projects.
Also, let's add a readme with instructions on how to acquire the dataset, etc. Currently, we're only using three shards I think.
I agree with you. Please help with this separated script!
@scxue, please help with the readme part!
README updated! @sayakpaul – Feel free to share any feedback or suggestions.
sayakpaul left a comment
Looking very nice. Some minor comments and we should be able to merge soon.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Has anyone successfully run this? I'm encountering various runtime errors, especially when using the log_validation function. The …
Thanks for pointing it out! It looks like the log_validation function has some bugs in its implementation. I will fix it with @lawrence-cj and @sayakpaul.
@PeiqinSun mistakes can happen, and there are ways to point them out. This is the point of making things openly available.
Thanks to @scxue and @sayakpaul for the prompt response. I look forward to your fixes.