[WIP] PixArt-Sigma training pipeline #1341
Conversation
Awesome! Will this have a single safetensors file? PixArt could be the future if SD3 never gets released.
Yes, of course: the diffusion transformer is saved in safetensors format if the safetensors option is specified. As for embedding T5 and the SD VAE alongside it, that may become an option, but I'm not sure it's practical, as the original PixArt format loads them from separate HF files, and existing workflows such as ComfyUI load them from different subfolders (T5 and the SDXL VAE) as well.
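For reference, a minimal sketch of the separated-component loading described above, using the diffusers layout (the repo id and subfolder names assume the official PixArt-alpha/PixArt-Sigma-XL-2-1024-MS checkpoint):

```python
# Sketch: load PixArt-Sigma components from their separate subfolders,
# as the original PixArt format and existing workflows do.
import torch
from diffusers import AutoencoderKL, Transformer2DModel
from transformers import T5EncoderModel, T5Tokenizer

repo = "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"

# The diffusion transformer itself -- the part the trainer saves
# as a standalone .safetensors file.
transformer = Transformer2DModel.from_pretrained(
    repo, subfolder="transformer", torch_dtype=torch.float16
)

# The T5 text encoder and the (SDXL) VAE live in their own subfolders
# and are shared across finetunes, so they are loaded separately.
tokenizer = T5Tokenizer.from_pretrained(repo, subfolder="tokenizer")
text_encoder = T5EncoderModel.from_pretrained(
    repo, subfolder="text_encoder", torch_dtype=torch.float16
)
vae = AutoencoderKL.from_pretrained(repo, subfolder="vae", torch_dtype=torch.float16)
```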
Yes, SD3's future is clouded. Additionally, its parameter count is much bigger than PixArt-Sigma's, with comparable quality and prompt adherence judging from the API examples, meaning PixArt is more accessible and faster, enabling easier community fine-tuning. And regardless, we have PixArt now, and we can experiment with different techniques on Diffusion Transformers that will come in handy when another transformer-based diffusion model is released.
@kabachuha can we make it include the text encoder and VAE as well, like single SDXL checkpoints, and load from that? I plan to make a standalone Gradio app, and maybe auto1111 adds support later.
Very well, I see your point :)
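For illustration, a hedged sketch of what such a single-file bundle could look like: flattening the three components into one safetensors file with key prefixes. The prefixes and helper names here are hypothetical, not an established format:

```python
# Hypothetical single-file bundling, in the spirit of single SDXL
# checkpoints: flatten all three components into one state dict.
from safetensors.torch import load_file, save_file


def save_bundled(transformer, text_encoder, vae, path):
    state = {}
    for prefix, module in (
        ("transformer.", transformer),
        ("text_encoder.", text_encoder),
        ("vae.", vae),
    ):
        for k, v in module.state_dict().items():
            # safetensors requires contiguous tensors
            state[prefix + k] = v.contiguous()
    save_file(state, path)


def load_bundled(path):
    # Split the flat dict back into per-component state dicts.
    state = load_file(path)
    parts = {"transformer": {}, "text_encoder": {}, "vae": {}}
    for k, v in state.items():
        prefix, _, rest = k.partition(".")
        parts[prefix][rest] = v
    return parts
```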
Hi @kabachuha, just an FYI: the Sigma and Alpha models have scripts for HF LoRA training. There are also PRs for text-encoder training in parallel.
Yes, I know. But this is a kohya repo, and its LoRA modules and training scripts have different styles, so they need to be adapted :) If you'd like to help, that would be nice (I can add you to the fork so you can take over the PR, as I have a heavy load this week. DM or comment here).
@kabachuha yes, please add me. I will start next week.
Added :) And see the list.
@kabachuha so the still-open TODOs are the following; it looks like mostly testing?
Do you have any more details on the "Combine T5 and SDXL vae" and "other leftover TODOs" items?
Any news? SD3 is a disappointment, so maybe there should be more focus on training PixArt-Sigma...
"Combine T5 and SDXL vae in checkpoint when saving" Is not a good idea, if the T5/vae (mostly the T5 checkpoint) gets multiplied by this, people will just go use other training scripts that make alot smaller checkpoints, and rightfully so. Storage space reqs would explode with increasing amounts of custom networks trained or downloaded. You should have one T5 and a VAE separate for not only this reason, also to be able to modularly approach and substitute other future Text or Autoencoders and combine them freely with any finetune/network you have. |
Current state
- ShareCaption/CogVLM2/LLaVA*/etc.
- InternLM-XComposer2-4KHD-based multimodal prompt enhancer (see the sketch below)

I'm very excited to work with PixArt and its great size/prompt-adherence ratio, in addition to the awesome LoRA techniques in this repo, so if it goes well it should start working in a couple of days.
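As a rough illustration of the prompt-enhancer item, a sketch that asks InternLM-XComposer2-4KHD to expand a terse user prompt into the detailed, caption-style prompts PixArt responds well to. The `.chat()` call follows the model card's remote-code interface; treat the exact signature (and text-only usage) as an assumption:

```python
# Hedged sketch of a prompt-enhancer step; not the final pipeline code.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "internlm/internlm-xcomposer2-4khd-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda().eval()

user_prompt = "a cat on a windowsill"
query = (
    "Rewrite this image prompt as one richly detailed paragraph, "
    f"keeping the subject unchanged: {user_prompt}"
)
with torch.no_grad():
    # chat() comes from the model's remote code; text-only use
    # (no image argument) is assumed here.
    enhanced, _ = model.chat(tokenizer, query=query, history=[], do_sample=False)
print(enhanced)
```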
Addresses #979
PixArt repo: https://github.com/PixArt-alpha/PixArt-sigma
I'm very likely going to edit this post quite often with updates, comments, and pictures.