Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
asomoza
left a comment
thanks, looks great, just a couple of comments that aren't blockers, just my opinion.
We will first need to install some additional dependencies.
```shell
maybe we should start telling the users what the additional dependencies are, with a link to them, so they feel more secure and understand what they are installing?
we can also just add a link to the PyPI page: https://pypi.org/project/ftfy/
Also, now that I see it, maybe this shouldn't be a required dependency but an optional one? I'll take a look later at how it's used.
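For reference, `ftfy` repairs mojibake (text that was decoded with the wrong encoding). A minimal sketch of what it does; the input string is illustrative, `fix_text` is ftfy's main entry point:

```python
# ftfy repairs text that was decoded with the wrong encoding ("mojibake").
# The broken string below is illustrative.
import ftfy

broken = "The Mona Lisa doesnâ€™t have eyebrows."
print(ftfy.fix_text(broken))  # -> "The Mona Lisa doesn't have eyebrows."
```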
docs/source/en/api/pipelines/wan.md (Outdated)
## Recommendations for Inference:
- Keep `AutoencoderKLWan` in `torch.float32` for better decoding quality.
- `num_frames` should be of the form `4 * k + 1`, for example `49` or `81`.
maybe we can be more clear here and write that k is the frames per second, or fps, in more common language?
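As a quick illustration of the `4 * k + 1` form (a hedged sketch, not from the PR; `k` is just a positive integer here):

```python
# num_frames must be of the form 4 * k + 1, e.g. 49 (k=12) or 81 (k=20).
def is_valid_num_frames(num_frames: int) -> bool:
    return num_frames >= 1 and num_frames % 4 == 1

print([n for n in (16, 48, 49, 81) if is_valid_num_frames(n)])  # -> [49, 81]
```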
#### Block Level Group Offloading
We can reduce our VRAM requirements by applying group offloading to the larger model components of the pipeline: the `WanTransformer3DModel` and `UMT5EncoderModel`. Group offloading will break up the individual modules of a model and offload/onload them onto your GPU as needed during inference. In this example, we'll apply `block_level` offloading, which will group the modules in a model into blocks of size `num_blocks_per_group` and offload/onload them to the GPU. Moving between CPU and GPU does add latency to the inference process. You can trade off between latency and memory savings by increasing or decreasing `num_blocks_per_group`.
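To make that concrete, here is a minimal sketch of block-level group offloading, assuming the `apply_group_offloading` signature imported in the snippet quoted further down; the model ID and `num_blocks_per_group` value are illustrative:

```python
import torch
from diffusers import WanTransformer3DModel
from diffusers.hooks.group_offloading import apply_group_offloading
from transformers import UMT5EncoderModel

onload_device = torch.device("cuda")
offload_device = torch.device("cpu")

# Model ID is illustrative; substitute the checkpoint you are using.
transformer = WanTransformer3DModel.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16
)
text_encoder = UMT5EncoderModel.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers", subfolder="text_encoder", torch_dtype=torch.bfloat16
)

# Group each model's modules into blocks of 4 and move each block on/off the
# GPU as needed. Larger num_blocks_per_group -> less latency, more VRAM.
for model in (transformer, text_encoder):
    apply_group_offloading(
        model,
        onload_device=onload_device,
        offload_device=offload_device,
        offload_type="block_level",
        num_blocks_per_group=4,
    )
```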
Thank you for this, super useful information. I've been struggling to get Wan i2v and group offloading working, and bnb as well. Are quantizations (e.g. bitsandbytes) supposed to work with Wan too?
from diffusers import AutoencoderKLWan, WanTransformer3DModel, WanImageToVideoPipeline
from diffusers.hooks.group_offloading import apply_group_offloading
from diffusers.utils import export_to_video, load_image
from transformers import UMT5EncoderModel, CLIPVisionMode
`CLIPVisionMode` is missing an "l", it should be `CLIPVisionModel`.
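With the typo fixed, the import line would read:

```python
from transformers import UMT5EncoderModel, CLIPVisionModel
```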
| "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in " | ||
| "the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." | ||
| ) | ||
| negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards |
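For completeness, a hedged sketch of how these prompts would typically feed into the pipeline; the model ID, input image URL, and generation parameters are illustrative assumptions, not from the PR:

```python
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Model ID is illustrative; substitute the checkpoint you are using.
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = load_image("https://example.com/astronaut.png")  # placeholder input image
frames = pipe(
    image=image,
    prompt=prompt,                    # the multi-line prompt quoted above
    negative_prompt=negative_prompt,  # the negative prompt quoted above
    num_frames=81,                    # of the form 4 * k + 1, per the docs
).frames[0]
export_to_video(frames, "output.mp4", fps=16)
```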
What does this PR do?
Based on feedback here: https://huggingface.slack.com/archives/C065E480NN9/p1742176300453069
Fixes # (issue)
Before submitting
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.