
[WIP] PixArt-Sigma training pipeline #1341

Draft · wants to merge 18 commits into base: main

Conversation

@kabachuha (Contributor) commented May 19, 2024

Current state

  • Backbone (w/o external deps)
  • Model saving/loading (original format)
  • T5 support, T5 attention_mask carryover, loading in 4-bit
  • T5 text embedding and attention mask caching on disk (see the sketch after this list)
  • Wrap up PixArt as a NetworkTrainer for use in any training loop
  • Add PixArt blocks to the LoRA/etc. module listings
  • Combine T5 and the SDXL VAE in the checkpoint when saving, as recommended by FurkanGozukara
  • Model inference for sampling
  • Do test launches on base and LoRA/etc. training to check compatibility and debug
  • Test aspect-ratio conditioning
  • Diffusers-format save mode and other leftover TODOs
  • Set up a multimodal prompt enhancer based on ShareCaption/CogVLM2/LLaVA*/InternLM-XComposer2-4KHD/etc.
  • ControlNet-Transformer support for training and inference (waiting for PA-Sigma's repo code release)
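
For the T5 items above, here is a minimal sketch of what 4-bit loading plus on-disk caching of the embeddings and attention masks could look like. The HF repo id, token length, and cache layout are assumptions for illustration, not the code in this PR:

```python
import torch
from transformers import T5Tokenizer, T5EncoderModel, BitsAndBytesConfig

# Assumed repo id / subfolders (PixArt-Sigma diffusers layout); adjust to your setup.
REPO = "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"

tokenizer = T5Tokenizer.from_pretrained(REPO, subfolder="tokenizer")
text_encoder = T5EncoderModel.from_pretrained(
    REPO,
    subfolder="text_encoder",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
    ),
    device_map="auto",
)

def cache_caption(caption: str, cache_path: str, max_length: int = 300):
    """Encode a caption once and store its embeddings + attention mask on disk."""
    tokens = tokenizer(
        caption, max_length=max_length, padding="max_length",
        truncation=True, return_tensors="pt",
    ).to(text_encoder.device)
    with torch.no_grad():
        emb = text_encoder(tokens.input_ids, attention_mask=tokens.attention_mask)[0]
    # The attention mask is carried along so padded tokens can be masked out at train time.
    torch.save({"emb": emb.cpu(), "attention_mask": tokens.attention_mask.cpu()}, cache_path)
```

The actual caching in this repo may well use a different on-disk format (e.g. per-image files like the existing latent caches); the point here is only the 4-bit encoder load and the mask carryover.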

I'm very excited to work with PixArt and its great size-to-prompt-adherence ratio, in addition to the awesome LoRA techniques in this repo. If it goes well, it should start working in a couple of days.

Addresses #979

PixArt repo: https://github.com/PixArt-alpha/PixArt-sigma

I'm very likely going to edit this post quite often with updates, comments, and pictures.

@FurkanGozukara

Awesome! Will this have a single safetensors file? PixArt could be the future if SD3 never gets released.

@kabachuha (Contributor, Author)

Awesome! Will this have a single safetensors file?

Yes, of course: the diffusion transformer is saved in safetensors format if the safetensors option is specified.

As for embedding T5 and the SD VAE alongside it, that could be an option, but I'm not sure it's practical: the original PA format loads them from separate HF files, and existing workflows such as ComfyUI also load them from different subfolders (T5 and the SDXL VAE).
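
If the combined single-file option does get added, one possible (hypothetical) layout is to prefix each component's keys inside one safetensors file so a loader can split them back out. This is only a sketch of the idea, not the format this PR implements:

```python
import torch
from safetensors.torch import save_file

def save_combined_checkpoint(transformer, text_encoder, vae, path: str):
    """Pack the DiT, T5 text encoder, and SDXL VAE into one safetensors file.

    The key prefixes ("transformer.", "text_encoder.", "vae.") are an assumed
    convention so the components can be separated again on load.
    """
    state = {}
    for prefix, module in (("transformer.", transformer),
                           ("text_encoder.", text_encoder),
                           ("vae.", vae)):
        for name, tensor in module.state_dict().items():
            # clone() breaks storage sharing (e.g. tied T5 embeddings), which
            # safetensors would otherwise refuse to serialize.
            state[prefix + name] = (
                tensor.detach().to(torch.float16).cpu().clone().contiguous()
            )
    save_file(state, path, metadata={"format": "pt"})
```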

PixArt could be the future if SD3 never gets released

Yes, SD3's future is clouded. Additionally, its parameter count is much larger than PA-Sigma's, with comparable quality and prompt adherence judging from the API examples, meaning PA is more accessible and faster, enabling easier community fine-tuning. And regardless, we have PA now, and we can experiment with different things and techniques on Diffusion Transformers, which can come in handy when another transformer-based diffusion model is released.

@FurkanGozukara

@kabachuha can we make it include the text encoder and VAE as well, like single SDXL checkpoints, and load from that one file?

I plan to make a standalone Gradio app, and maybe auto1111 will add support later.

@kabachuha (Contributor, Author)

Very well, I see your point :)

@raulc0399

Hi @kabachuha, just an FYI:

the Sigma and Alpha models have scripts for HF LoRA training:
https://github.com/PixArt-alpha/PixArt-sigma/blob/master/train_scripts/train_pixart_lora_hf.py
https://github.com/PixArt-alpha/PixArt-alpha/blob/master/train_scripts/train_pixart_lora_hf.py

There are also PRs for text-encoder training in parallel.
If you want to integrate those, I can help with that.

@kabachuha (Contributor, Author)

Yes, I know. But this is a kohya repo, and its LoRA modules and training scripts have a different style, so those scripts need to be adapted to fit :)

If you'd like to help, that would be nice (I can add you to the fork so you can take over the PR, as I have a heavy load this week. DM me or comment here).

@raulc0399

@kabachuha yes, please add me; I will start next week.
Do you also have an overview of what has been done so far and what still needs to be done?

@kabachuha (Contributor, Author)

Added :) See the list above for the overview.

@raulc0399 commented May 31, 2024

@kabachuha so the still-open TODOs are the following. Looks like mostly testing?

  • Combine T5 and the SDXL VAE in the checkpoint when saving, as recommended by FurkanGozukara
  • Do test launches on base and LoRA/etc. training to check compatibility and debug
  • Test aspect-ratio conditioning
  • Diffusers-format save mode and other leftover TODOs
  • Set up a multimodal prompt enhancer based on ShareCaption/CogVLM2/LLaVA*/InternLM-XComposer2-4KHD/etc.

Do you have any more details on the "Combine T5 and SDXL VAE" item and the "other leftover TODOs"?

@AtomisteBX

Any news? SD3 is a disappointment, so maybe there should be more focus on training PixArt Sigma...

@DanPli commented Jul 4, 2024

"Combine T5 and SDXL vae in checkpoint when saving" Is not a good idea, if the T5/vae (mostly the T5 checkpoint) gets multiplied by this, people will just go use other training scripts that make alot smaller checkpoints, and rightfully so. Storage space reqs would explode with increasing amounts of custom networks trained or downloaded. You should have one T5 and a VAE separate for not only this reason, also to be able to modularly approach and substitute other future Text or Autoencoders and combine them freely with any finetune/network you have.
