support HunYuan DiT #1378
base: dev
For anyone who wants to try HunYuan but doesn't want to download the original 44GB of files: an 8GB version is available here.
Requirements for LoRA/LyCORIS training on HunYuan:
Currently, fp16 (mixed precision) causes NaN loss.
FP16 has been fixed.
* support hunyuan lora in CLIP text encoder and DiT blocks
* add hunyuan lora test script
* append lora blocks in target module
---------
Co-authored-by: leoriojhli <leoriojhli@tencent.com>

* support hunyuan lora in CLIP text encoder and DiT blocks
* add hunyuan lora test script
* append lora blocks in target module
* Support HunYuanDiT v1.1 and v1.2 lora
---------
Co-authored-by: leoriojhli <leoriojhli@tencent.com>

* add use_extra_cond for hy_train_network
* change model version
* Update hunyuan_train_network.py

* add use_extra_cond for hy_train_network
* change model version
* Update hunyuan_train_network.py
* Update hunyuan_train_network.py
* Update hunyuan_train.py
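The first two commits add LoRA to the CLIP text encoder and the DiT blocks by appending blocks to the target-module list. In sd-scripts' LoRA network, modules are selected when their class name is in such a list, and each LoRA is named after its dotted module path with the dots replaced by underscores. A minimal pure-Python sketch of that selection, assuming the `(name, class name, Linear children)` tuples stand in for walking `named_modules()`; `HunYuanDiTBlock` and `build_lora_names` are hypothetical names, not necessarily what the PR uses.

```python
from typing import List, Tuple

# Hypothetical target class name; the PR's actual block class may be named differently.
DIT_TARGET_MODULES = ["HunYuanDiTBlock"]


def build_lora_names(
    modules: List[Tuple[str, str, List[str]]],
    targets: List[str],
    prefix: str,
) -> List[str]:
    """Return LoRA names for the Linear children of every targeted module.

    Each tuple is (qualified module name, class name, names of Linear children),
    standing in for what iterating torch's named_modules() would yield.
    """
    names = []
    for name, class_name, linear_children in modules:
        if class_name not in targets:
            continue  # only modules whose class is listed get LoRA layers
        for child in linear_children:
            # Join with dots, then replace dots with underscores (sd-scripts style)
            names.append(f"{prefix}.{name}.{child}".replace(".", "_"))
    return names
```

For example, a DiT block at `blocks.0` with Linear children `attn1.qkv` and `mlp.fc1` yields `lora_unet_blocks_0_attn1_qkv` and `lora_unet_blocks_0_mlp_fc1` with the prefix `lora_unet`, while non-targeted modules are skipped.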
If HunyuanDiT also uses the SDXL VAE, does that mean the bucket latents prepared for SDXL can be reused?
Yes. kohya's latent caching only checks the size.
OK, thanks.
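Because the cache check compares only size, latents encoded by the same SDXL VAE at the same bucket resolution can be reused. A minimal sketch of a size-only check, assuming 4 latent channels and 8x spatial downsampling (both true of the SDXL VAE); the function names are hypothetical and are not sd-scripts' actual API.

```python
from typing import Sequence, Tuple

VAE_SCALE_FACTOR = 8  # SDXL VAE downsamples height and width by 8
LATENT_CHANNELS = 4   # SDXL VAE latent channel count


def expected_latent_shape(bucket_reso: Tuple[int, int]) -> Tuple[int, int, int]:
    """Map a (width, height) bucket resolution to a (C, H, W) latent shape."""
    width, height = bucket_reso
    return (LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


def cached_latent_is_reusable(cached_shape: Sequence[int], bucket_reso: Tuple[int, int]) -> bool:
    """Size-only check: any cached latent with the right shape is accepted."""
    return tuple(cached_shape) == expected_latent_shape(bucket_reso)
```

The flip side of a size-only check is that latents from a different VAE with the same latent shape would also be accepted silently, so reuse is only safe when the VAE really is the same.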
I was trying to run the code below:

    import os

    import torch
    from diffusers import AutoencoderKL
    from transformers import AutoTokenizer, BertModel

    # DiT_g_2 and MT5Embedder come from this PR's HunYuan model code
    # (the import path was not shown in the original comment).

    root = "/workspace/models/hunyuan/HunYuanDiT-V1.2-fp16-pruned/"

    # Build the DiT-g/2 denoiser and load the pruned fp16 weights
    denoiser, patch_size, head_dim = DiT_g_2(input_size=(128, 128))
    sd = torch.load(os.path.join(root, "denoiser/pytorch_model_module.pt"))
    denoiser.load_state_dict(sd)
    denoiser.half().cuda()
    denoiser.enable_gradient_checkpointing()

    # Text encoders (CLIP, loaded as a BertModel, and mT5) plus the VAE
    clip_tokenizer = AutoTokenizer.from_pretrained(os.path.join(root, "clip"))
    clip_encoder = BertModel.from_pretrained(os.path.join(root, "clip")).half().cuda()
    mt5_embedder = MT5Embedder(os.path.join(root, "mt5"), torch_dtype=torch.float16, max_length=256)
    vae = AutoencoderKL.from_pretrained(os.path.join(root, "vae")).half().cuda()

    # Parameter counts in millions
    print(sum(p.numel() for p in denoiser.parameters()) / 1e6)
    print(sum(p.numel() for p in mt5_embedder.parameters()) / 1e6)
    print(sum(p.numel() for p in clip_encoder.parameters()) / 1e6)
    print(sum(p.numel() for p in vae.parameters()) / 1e6)

but it failed with:
v1.2 removed style_embedder.weight; wait until they fix it.
I think the problem is that v1.0/1.1 need the --extra_cond argument to enable extra cond. Not sure whether you have implemented this in the train network script.
v1.2 with
You should disable extra cond for v1.2.
I mean, v1.2 is expected to set
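The thread establishes that v1.0/1.1 need extra cond enabled, that v1.2 should have it disabled, and that the v1.2 checkpoint no longer contains style_embedder.weight. A minimal sketch of deciding the flag instead of hard-coding it, assuming the presence of that key is a reliable marker for extra-cond checkpoints (an assumption, not something the PR states). Both helper names are hypothetical.

```python
from typing import Iterable, Optional

# Assumption: this key exists only in checkpoints that use extra cond (v1.0/v1.1).
EXTRA_COND_MARKER_KEY = "style_embedder.weight"


def use_extra_cond_for_version(version: str) -> bool:
    """Per the thread: enable extra cond for v1.0/v1.1, disable it for v1.2."""
    return version in ("1.0", "1.1")


def infer_use_extra_cond(state_dict_keys: Iterable[str], version: Optional[str] = None) -> bool:
    """Prefer an explicit model version; otherwise inspect the checkpoint's keys."""
    if version is not None:
        return use_extra_cond_for_version(version)
    return EXTRA_COND_MARKER_KEY in set(state_dict_keys)
```

With a loaded state dict `sd`, `infer_use_extra_cond(sd.keys())` would return False for the v1.2 weights, which matches the advice to disable extra cond there.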
I followed the instructions on their website for installing kohya-ss (sd-scripts) for HunYuan. Training seems to start fine, but it crashes when generating sample images. Any idea what the issue might be?
I'll try to look into it tomorrow, but I thought I'd post here in case someone else has seen this one. Edit: Or is sample generation during training just not supported yet? I've since found the separate inference script that's provided, and it does produce images:
Sample image generation during training hasn't been implemented for HY yet.
With HunyuanDiT v1.2 the loss is NaN. What might the issue be?
[WIP] This PR is a draft for contributors to check the progress and review the code.
This PR starts with my simple implementation of minimal inference, plus some modifications:
* … HunYuanDiT … to avoid the requirements of argparse
* … max_length_clip=77
* … hunyuan_test.py … But it can be seen as a minimal inference script.

Notes about loading model
The directory structure I used is:
Basically, download the files from the t2i folder of HunYuanDiT, then:
* put the contents of clip_text_encoder and tokenizer into clip
* put mt5 into mt5
* put model into denoiser
* put sdxl-vae-fp16-fix into vae
This spec can be changed if needed.
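Since the loading code assumes this layout, a quick pre-flight check can catch a misplaced folder before any weights are loaded. A minimal sketch, assuming only the four subfolder names listed in the notes; `missing_model_dirs` is a hypothetical helper, and the files inside each folder are not checked.

```python
from pathlib import Path
from typing import List

# Subfolders from the directory layout described in the PR notes.
EXPECTED_SUBDIRS = ("clip", "mt5", "denoiser", "vae")


def missing_model_dirs(root: str) -> List[str]:
    """Return the expected subfolders that are absent under `root`."""
    base = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (base / name).is_dir()]
```

Usage: `missing_model_dirs("/workspace/models/hunyuan/HunYuanDiT-V1.2-fp16-pruned/")` returns an empty list when all four folders are in place. If the spec changes, only `EXPECTED_SUBDIRS` needs updating.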
TODO List
* … (sdxl_train.py)
* … (sdxl_train_network.py)

Low Priority TODO List

Notification to contributors
* … create_network method from the imported network module will work correctly.
* … sdxl_train.py and sdxl_train_network.py and the dataset code carefully before starting development. It is very likely that we only need a few modifications to make things work. Try to avoid any full rework.
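The notification to contributors mentions that the `create_network` method from the imported network module should work correctly. sd-scripts' training scripts import the module named by `--network_module` with `importlib` and call its `create_network`. A minimal pre-flight sketch of that contract, assuming the parameter names `multiplier, network_dim, network_alpha, vae, text_encoder, unet` (taken from the existing LoRA path; verify against your checkout). `check_network_module` and the stub module are hypothetical, used only so the example runs without sd-scripts installed.

```python
import importlib
import inspect
import sys
import types
from typing import List

# Assumed parameter names of create_network in sd-scripts network modules.
EXPECTED_PARAMS = ("multiplier", "network_dim", "network_alpha", "vae", "text_encoder", "unet")


def check_network_module(module_name: str) -> List[str]:
    """Import a network module by name and list expected create_network parameters it lacks."""
    module = importlib.import_module(module_name)
    create_network = getattr(module, "create_network", None)
    if not callable(create_network):
        raise AttributeError(f"{module_name} does not define a callable create_network")
    params = inspect.signature(create_network).parameters
    return [name for name in EXPECTED_PARAMS if name not in params]


if __name__ == "__main__":
    # Register a stand-in module; import_module returns entries already in sys.modules.
    stub = types.ModuleType("stub_network")

    def create_network(multiplier, network_dim, network_alpha, vae, text_encoder, unet, **kwargs):
        return None

    stub.create_network = create_network
    sys.modules["stub_network"] = stub
    print(check_network_module("stub_network"))  # prints []
```

Checking the signature rather than calling the function keeps the test cheap: no models need to be loaded to discover that a module is missing a parameter the training script will pass.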