SkyReels Hunyuan T2V & I2V #10837

Merged: 11 commits into main, Feb 21, 2025
Conversation

@a-r-r-o-w (Member) commented Feb 20, 2025

Adds support for SkyReels-V1. Also fixes #10768 and supersedes #10567

Repo: https://github.com/SkyworkAI/SkyReels-V1

The authors used the diffusers model implementation and created the pipeline themselves, so all credit to them! This is mostly a quick copy-paste.

T2V:

import torch
import torch._dynamo.config
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

torch._dynamo.config.inline_inbuilt_nn_modules = True  # not required for inference; leftover from torch.compile benchmarking (see discussion below)

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer_model_id = "Skywork/SkyReels-V1-Hunyuan-T2V"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    transformer_model_id, torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.float16)
pipe.vae.enable_tiling()
pipe.to("cuda")

output = pipe(
    prompt="A cat walks on the grass, realistic",
    negative_prompt="Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion",
    height=544,
    width=960,
    num_frames=97,
    guidance_scale=1.0,
    true_cfg_scale=6.0,
    num_inference_steps=30,
).frames[0]
export_to_video(output, "output.mp4", fps=15)
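As a sanity check on the settings above, here is the rough arithmetic for the resulting video, assuming HunyuanVideo's usual VAE compression factors (4x temporal, 8x spatial) — an assumption based on the base model, not verified against the SkyReels checkpoints:

```python
# Rough arithmetic for the T2V settings above. The 4x/8x compression
# factors are an assumption based on the HunyuanVideo VAE.
num_frames, height, width, fps = 97, 544, 960, 15

latent_frames = (num_frames - 1) // 4 + 1   # 25
latent_height = height // 8                 # 68
latent_width = width // 8                   # 120
duration_s = num_frames / fps               # ~6.5 seconds of video

print(latent_frames, latent_height, latent_width, round(duration_s, 2))
```

This is why `num_frames` is chosen as a multiple of 4 plus 1: the frame count maps cleanly onto whole latent frames.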

I2V:

import torch
from diffusers import HunyuanSkyreelsImageToVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import load_image, export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer_model_id = "Skywork/SkyReels-V1-Hunyuan-I2V"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    transformer_model_id, torch_dtype=torch.bfloat16
)
pipe = HunyuanSkyreelsImageToVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()
pipe.to("cuda")

prompt = "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
negative_prompt = "Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion"
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"
)

output = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30
).frames[0]
export_to_video(output, "output_i2v.mp4", fps=15)
Sample outputs: output.mp4 (T2V) and output_i2v.mp4 (I2V).

cc @asomoza @Langdx @Howe2018

a-r-r-o-w and others added 8 commits February 20, 2025 00:18
Co-Authored-By: Langdx <82783347+Langdx@users.noreply.github.com>
Co-Authored-By: howe <howezhang2018@gmail.com>
@a-r-r-o-w a-r-r-o-w added the roadmap Add to current release roadmap label Feb 20, 2025
@a-r-r-o-w a-r-r-o-w requested a review from yiyixuxu February 20, 2025 02:11
@@ -325,7 +325,7 @@ def encode_prompt(
        )

-        if pooled_prompt_embeds is None:
+        if prompt_2 is None and pooled_prompt_embeds is None:
a-r-r-o-w (Member Author) commented:

pooled_prompt_embeds can never be None otherwise we'll error out in transformer. Change not required for integration but we should do it anyway (perhaps in separate PR if not here)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu (Collaborator) left a comment:

looks good! thank you!

@a-r-r-o-w a-r-r-o-w merged commit e3bc4aa into main Feb 21, 2025
14 of 15 checks passed
@a-r-r-o-w a-r-r-o-w deleted the integrations/skyreels-v1 branch February 21, 2025 01:18

tin2tin commented Feb 21, 2025

Pretty heavy models to run locally. I hope there will be some hints on how to run it on consumer hardware?

@nitinmukesh

Pretty heavy models to run locally. I hope there will be some hints on how to run it on consumer hardware?

int4 didn't run on 24GB?
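For context, some back-of-envelope VRAM math for the transformer weights alone, assuming a roughly 13B-parameter HunyuanVideo-class model (an assumption; activations, the VAE, and the text encoders add more on top):

```python
# Back-of-envelope VRAM estimate for the transformer weights alone.
# The 13B parameter count is an assumption (HunyuanVideo-class model);
# int4 overhead from quantization metadata is ignored.
params = 13e9

bf16_gb = params * 2 / 1024**3     # 2 bytes per parameter
int4_gb = params * 0.5 / 1024**3   # 0.5 bytes per parameter

print(f"bf16: ~{bf16_gb:.1f} GB, int4: ~{int4_gb:.1f} GB")
```

So bf16 weights alone already brush up against a 24 GB card, which is why quantization or offloading is needed for consumer hardware.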


tin2tin commented Feb 22, 2025

I had to bump the torch version to get the dynamo config working. After that, I got int4 working. Thank you @nitinmukesh for the int4 version of the transformer.

Is support for Skywork's SkyReels-A1 (motion transfer) being considered?

a-r-r-o-w (Member Author) commented:

Oh... the dynamo part is not required 🫠 I was just benchmarking torch.compile with the model and forgot to remove it when copy-pasting the minimal snippet.

Regarding SkyReels-A1, could you open an issue for tracking? We haven't planned for it yet due to other priorities, but an open issue would make it easier for interested community members to contribute!


a-r-r-o-w commented Feb 22, 2025

Also, ideally group offloading (possibly combined with layerwise upcasting) gives the lowest VRAM requirements without much generation-time overhead, I believe. Currently we're limited by CPU RAM, but we'll try to improve on that asap :)
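The group-offloading idea can be sketched with a toy model (illustrative only — this is not the diffusers API): only one group of layer weights is resident on the accelerator at a time, so peak device memory is bounded by the largest group rather than the whole model.

```python
# Toy model of group offloading (illustrative only, not the diffusers API).
# Memory is counted in abstract "units" rather than bytes.
def run_with_group_offload(layer_sizes, group_size):
    peak = 0
    for i in range(0, len(layer_sizes), group_size):
        group = layer_sizes[i:i + group_size]
        resident = sum(group)       # onload this group to the device
        peak = max(peak, resident)  # track peak device memory
        # ... run the group's forward pass, then offload it back to CPU ...
    return peak

layers = [4] * 40                                     # 40 layers, 4 units each
print(run_with_group_offload(layers, group_size=1))   # 4: leaf-by-leaf, lowest VRAM
print(run_with_group_offload(layers, group_size=8))   # 32: bigger groups, fewer transfers
print(sum(layers))                                    # 160: no offloading at all
```

The trade-off is transfer overhead: smaller groups mean less device memory but more CPU-to-device traffic, which is also why CPU RAM becomes the next bottleneck.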


tin2tin commented Feb 22, 2025

Lol, I didn't understand the last part. No worries. In general, if a new model is VRAM-intensive and the Diffusers team knows of ways to improve that, please include some hints in the model card.

Successfully merging this pull request may close these issues.

Hunyuan video not support negative prompt?
5 participants