Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
def patchify(x, patch_size):
    # YiYi TODO: refactor this
    from einops import rearrange
```
Hi, I think einops might work with torch.compile on newer versions of torch: https://github.com/arogozhnikov/einops/wiki/Using-torch.compile-with-einops
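For anyone who would rather drop the einops dependency entirely, the same kind of patchify can be written with native reshape/permute calls. This is only a sketch under assumed shapes - a `(batch, channels, height, width)` input and square patches - not the actual Wan implementation:

```python
import torch

def patchify(x: torch.Tensor, patch_size: int) -> torch.Tensor:
    # Assumed layout: x is (batch, channels, height, width),
    # with height and width divisible by patch_size.
    b, c, h, w = x.shape
    p = patch_size
    # Split each spatial dim into (num_patches, patch) pairs.
    x = x.reshape(b, c, h // p, p, w // p, p)
    # Move patch contents last: (b, h//p, w//p, c, p, p).
    x = x.permute(0, 2, 4, 1, 3, 5)
    # Flatten into a sequence of patch tokens: (b, num_patches, c * p * p).
    return x.reshape(b, (h // p) * (w // p), c * p * p)
```

With `patch_size=1` this reduces to a plain channels-last flatten, which is an easy sanity check against the einops version.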
@yiyixuxu thanks for releasing this so quickly! We are having some issues trying to get 5B I2V to work. As far as I understand, 5B covers both T2V and I2V. I tried a naive hack of copying the model.index.json from the 14B I2V, but it didn't quite help.

@okaris 5B I2V is not supported yet - will look to add it today

@yiyixuxu thanks for the quick reply. Happy to contribute if you can point me in the right direction.
Co-authored-by: bagheera <59658056+bghira@users.noreply.github.com>
a-r-r-o-w left a comment:
Thanks YiYi! Just nits. Will add docs in follow-up as discussed. I think we should remove the changes to the test files here (Wan2.2 dual transformer should be tested separately instead of combining with Wan2.1 tests, such that both are fully tested).
```python
CACHE_T = 2
```

```python
class AvgDown3D(nn.Module):
```
Maybe prefix these classes with Wan to follow the same naming convention.
```python
        2.8251,
        1.9160,
    ],
```

```python
is_residual: bool = False,
```
LGTM for now, but ideally we should make a separate AutoencoderKLWan2_2, because the structure and internal blocks are different, and try to standardize on single-file implementations per model type, similar to transformers. All the if-branching makes things a little harder to reverse engineer and raises the barrier to entry for someone wanting to study the implementation, IMO.
```python
shift_msa, scale_msa, gate_msa, c_shift_msa, c_scale_msa, c_gate_msa = (
    self.scale_shift_table + temb.float()
).chunk(6, dim=1)
```

```python
if temb.ndim == 4:
```
Same comment as for the VAE: ideally this should live in a separate transformer implementation, transformer_wan_2_2.py, if we want to adopt single-file properly.
sounds good
I think the VAE can have its own class - feel free to refactor it if you prefer!

The transformer change is really minimal, and we could refactor further so there is only a single code path, i.e. we just always expand the timestep inputs to be 2D. (I did not have time to test that, so I kept the if/else here.)
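To make the single-code-path idea concrete, here is a rough sketch. The shapes and the helper name are assumptions for illustration, not the actual Wan tensor layout: lifting the lower-rank timestep embedding to the per-frame 4D layout up front removes the branch around the chunking.

```python
import torch

def get_modulation(scale_shift_table: torch.Tensor, temb: torch.Tensor):
    # Hypothetical single-path version of the branched block above.
    # Assumed shapes:
    #   scale_shift_table: (1, 6, dim)
    #   temb: (batch, 6, dim), or (batch, frames, 6, dim) for per-frame modulation
    if temb.ndim == 3:
        # Lift to (batch, 1, 6, dim) so both cases share one code path.
        temb = temb.unsqueeze(1)
    # scale_shift_table broadcasts against (batch, frames, 6, dim).
    params = (scale_shift_table + temb.float()).chunk(6, dim=2)
    # Each chunk: (batch, frames, 1, dim) -> (batch, frames, dim).
    return [p.squeeze(2) for p in params]
```

The non-per-frame case then just comes out with `frames == 1`, which downstream code can broadcast over instead of special-casing.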
Hello @yiyixuxu, I generated a video (https://github.com/user-attachments/assets/ce6ebaf1-8478-4c29-9170-57d5ae854a7d) using the code below and noticed a slight grainy texture. Is this expected behavior, and does it match the results you observed during your testing?

```python
import torch

dtype = torch.bfloat16
model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
height = 704
prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
output = pipe(
```
Hi, it does not seem to work. Although the checklist mentions multi-GPU support, I'm not sure if that applies to the diffusers version?
* support wan 2.2 i2v
* add t2v + vae2.2
* add conversion script for vae 2.2
* add
* add 5b t2v
* conversion script
* refactor out reearrange
* remove a copied from in skyreels
* Apply suggestions from code review

Co-authored-by: bagheera <59658056+bghira@users.noreply.github.com>

* Update src/diffusers/models/transformers/transformer_wan.py
* fix fast tests
* style

---------

Co-authored-by: bagheera <59658056+bghira@users.noreply.github.com>
Install from PR:
- TI2V (only text-to-video is supported for now, adding I2V soon)
- 14B T2V
- 14B I2V