Description
When I tried using the code at https://github.com/THUDM/CogVideo/blob/main/tools/convert_weight_sat2hf.py with modified parameters (based on the comments in the code, since I wanted to convert the SAT version of Tora's i2v to the diffusers version), I only changed a few argument defaults:

- `--num_layers=42`
- `--num_attention_heads=48`
- `--use_rotary_positional_embeddings=True`
- `--scaling_factor=0.7`
- `--snr_shift_scale=1.0`
- `--i2v=True`
- `--version=1.0` (I also tried `--version=1.5`)
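For reference, the flags above correspond to an invocation along these lines. This is a sketch: the checkpoint-path flag names (`--transformer_ckpt`, `--vae_ckpt`, `--output_path`) and the paths themselves are placeholders assumed here, not copied from the actual script.

```shell
# Hypothetical invocation sketch of the SAT -> diffusers conversion script;
# path flag names and paths are placeholders, only the listed flags were changed.
python tools/convert_weight_sat2hf.py \
    --transformer_ckpt /path/to/tora_i2v_sat_checkpoint.pt \
    --vae_ckpt /path/to/vae_checkpoint.pt \
    --output_path ./tora-i2v-diffusers \
    --num_layers=42 \
    --num_attention_heads=48 \
    --use_rotary_positional_embeddings=True \
    --scaling_factor=0.7 \
    --snr_shift_scale=1.0 \
    --i2v=True \
    --version=1.0
```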
After running the code, I encountered the following errors:
- Size mismatch for `patch_embed.proj.weight`: copying a parameter with shape `torch.Size([3072, 32, 2, 2])` from checkpoint while current model expects shape `torch.Size([3072, 128])`.
- Size mismatch for `proj_out.weight`: copying a parameter with shape `torch.Size([64, 3072])` from checkpoint while current model expects `torch.Size([128, 3072])`.
- Size mismatch for `proj_out.bias`: copying a parameter with shape `torch.Size([64])` from checkpoint while current model expects `torch.Size([128])`.

A similar issue was also mentioned here: #26 (comment).
Could you provide suggestions on how to resolve this problem? Additionally, if converting i2v to the diffusers version is necessary for execution, then aside from the transformer checkpoint (`transformer_ckpt`), do the existing VAE checkpoint (`vae_ckpt`) and text encoder also require conversion using this script? Or does Alibaba plan to release an official diffusers-compatible version of i2v? Looking forward to your reply.
Indeed, the weight file for the I2V diffusers version is not available. However, the i2v_pipeline code is available, and diffusers-version/inference.py supports I2V. One needs to convert the SAT-version weights to the diffusers version; you can refer to https://github.com/THUDM/CogVideo/blob/main/tools/convert_weight_sat2hf.py in https://github.com/THUDM/CogVideo/#tools to do this.
Originally posted by @zenmequmingzia in #30