Add stochastic sampling to FlowMatchEulerDiscreteScheduler #11369


Merged: 7 commits into main on Apr 22, 2025

Conversation

@apolinario (Collaborator) commented Apr 19, 2025

What does this PR do?

This PR adds stochastic sampling to FlowMatchEulerDiscreteScheduler based on Lightricks/LTX-Video@b1aeddd ltx_video/schedulers/rf.py, which was added with the release of 0.9.6-distilled. I decoupled the next and current sigma to try to get closer to the rf.py implementation of the stochastic sampling, but a second pair of eyes on this would be great.
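Conceptually, the stochastic branch replaces the deterministic Euler update with a denoise-then-renoise step. A minimal sketch of the idea (names are illustrative, not the exact diff; it assumes the flow-matching convention sample = (1 - sigma) * x0 + sigma * noise with the model predicting the velocity noise - x0):

import torch

def stochastic_step(sample, model_output, sigma, sigma_next, generator=None):
    # Recover the denoised estimate from the predicted velocity:
    # sample = (1 - sigma) * x0 + sigma * noise and v = noise - x0
    # imply x0 = sample - sigma * v.
    x0 = sample - sigma * model_output
    # Instead of the deterministic Euler step
    #   prev_sample = sample + (sigma_next - sigma) * model_output,
    # re-noise the estimate to the next sigma level with fresh noise.
    noise = torch.randn(
        sample.shape, generator=generator, device=sample.device, dtype=sample.dtype
    )
    return (1.0 - sigma_next) * x0 + sigma_next * noise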

To try it:

import torch
from diffusers import LTXVideoTransformer3DModel, FlowMatchEulerDiscreteScheduler, LTXPipeline
from diffusers.utils import export_to_video

transformer = LTXVideoTransformer3DModel.from_pretrained(
    "multimodalart/ltxv-2b-0.9.6-distilled",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    variant="bf16"
)

scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(
    "multimodalart/ltxv-2b-0.9.6-distilled",
    subfolder="scheduler"
)

pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video-0.9.5",
    transformer=transformer,
    scheduler=scheduler,  # add or remove the scheduler to see the difference
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

prompt = "A woman eating a burger"
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
generator = torch.Generator(device="cuda").manual_seed(42)
video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1216,
    height=704,
    num_frames=121,
    num_inference_steps=8,
    guidance_scale=1,
    generator=generator
).frames[0]

export_to_video(video, "distilled_scheduler.mp4", fps=24)

Who can review?

@yiyixuxu

@apolinario apolinario requested a review from yiyixuxu April 19, 2025 17:54

@apolinario (Collaborator, Author)

@bot /style

(Contributor)

Style fixes have been applied.

@yiyixuxu (Collaborator) left a comment

thanks @apolinario

dt = sigma_next - sigma

prev_sample = sample + dt * model_output
# Determine whether to use stochastic sampling for this step
use_stochastic = stochastic_sampling if stochastic_sampling is not None else self.config.stochastic_sampling
@yiyixuxu (Collaborator)

I think just having this in the config is enough, no?
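For context, with the flag in the config a user would enable it at load time, roughly like this (a sketch; stochastic_sampling is the config name referenced in the snippet above):

scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(
    "multimodalart/ltxv-2b-0.9.6-distilled",
    subfolder="scheduler",
    stochastic_sampling=True,  # config flag introduced by this PR
)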

@apolinario (Collaborator, Author)

@bot /style

(Contributor)

Style fixes have been applied.


current_sigma = per_token_sigmas[..., None]
next_sigma = lower_sigmas[..., None]
dt = next_sigma - current_sigma # Equivalent to sigma_next - sigma
@yiyixuxu (Collaborator)

@apolinario
here it seems to be reversed, no?
before:
dt = (per_token_sigmas - lower_sigmas)[..., None]

now:
dt = lower_sigmas - per_token_sigmas
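For intuition, with illustrative numbers (sigmas decrease as sampling proceeds), the two orderings differ by a sign and must match the direction of the update:

sigma, sigma_next = 0.8, 0.6      # current and next noise levels
dt = sigma_next - sigma           # -0.2, pairs with: prev = sample + dt * v
dt_flipped = sigma - sigma_next   # +0.2, would need:  prev = sample - dt * v
# Mixing the two conventions steps in the wrong direction.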

@apolinario (Collaborator, Author)

good catch!

@nitinmukesh

Quick question: should it be LTXPipeline or LTXConditionPipeline?
0.9.5 support was added in LTXConditionPipeline.

@yiyixuxu merged commit 6ab62c7 into main on Apr 22, 2025; 15 checks passed.
@yiyixuxu (Collaborator)

thanks @apolinario!

@yiyixuxu (Collaborator)

@nitinmukesh I think it's probably better in LTXConditionPipeline; can you try it out?

@nitinmukesh

@apolinario

Thank you for adding the sampling.

Could you please share a few of the sample outputs you created? I am not getting good results, so I want to compare in case something is wrong in my code.
#11359

@Ednaordinary (Contributor) commented May 8, 2025

Quick question: should it be LTXPipeline or LTXConditionPipeline? 0.9.5 support was added in LTXConditionPipeline.

LTXPipeline. The model here is 0.9.6-distilled (the only one that uses the stochastic sampling as of now). "0.9.5" is used because the 0.9.6 transformer and scheduler are inserted into it, which is fine because nothing else in the pipeline differs from 0.9.5, and there's currently nothing in the Lightricks 0.9.6 repos. 0.9.6-distilled is guidance-distilled, so it does not work in the condition pipeline, while 0.9.6 does.

@Ednaordinary (Contributor) commented May 8, 2025

Also, I find that increasing the scheduler's shift while using the distilled model helps boost coherence. This is in line with FastVideo (PCM distillation), which says to set the shift to 17. The 1.0 in https://huggingface.co/multimodalart/ltxv-2b-0.9.6-distilled/blob/main/scheduler/scheduler_config.json doesn't seem like a good default. I'm unsure what it is in the original LTX repo.
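For reference, the shift rescales the sigma schedule (a sketch of the standard flow-match time shift on the non-dynamic path):

def shifted_sigma(sigma: float, shift: float) -> float:
    # Larger shift pushes the schedule toward higher noise levels,
    # spending more of the step budget early in sampling.
    return shift * sigma / (1 + (shift - 1) * sigma)

# e.g. sigma = 0.5: shift=1.0 -> 0.50, shift=16.0 -> ~0.94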

Some different shifts:

0.25: 0.25.mp4
0.5: 0.5.mp4
1.0: 1.mp4
2.0: 2.mp4
4.0: 4.mp4
8.0: 8.mp4
16.0: 16.mp4
32: 32.mp4
64: 64.mp4

16 seems like a good default

(tested by adding pipe.scheduler._shift = 16.0 somewhere between pipe init and pipe call)
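A compact way to try this (the _shift attribute is taken from the comment above; overriding shift via from_pretrained kwargs should work too, since shift is part of the scheduler config):

from diffusers import FlowMatchEulerDiscreteScheduler

scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(
    "multimodalart/ltxv-2b-0.9.6-distilled",
    subfolder="scheduler",
    shift=16.0,  # override the config default of 1.0
)
# or, between pipeline init and the pipeline call:
# pipe.scheduler._shift = 16.0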

@nitinmukesh

Thank you @Ednaordinary

The information you provided is very helpful; I'm getting better results than before.

distilled_scheduler1.mp4

@nitinmukesh

249 frames:

distilled_scheduler2.mp4

@Ednaordinary (Contributor)

Looks great! What I've noticed so far is that the background is often repetitive in a weird way, as in your 249-frame example. Sometimes this can be solved by increasing the shift to an extremely large value (think in the 200s), but that also incurs everything else that comes with running the shift that high (eventually, everything just turns into blobs).

@nitinmukesh

Sure, I will try that, thank you. Next I'm going to try whether the distilled model supports image-to-video (LTXImageToVideoPipeline).
The speed of the model is out of this world: all of these were generated on 8 GB VRAM + 16 GB RAM, taking only 3 minutes for 249 frames.

@nitinmukesh commented May 10, 2025

I2V is working well.

distilled_scheduler6.mp4

Result from 0.9.1 using the same image:
newgenai79/sd-diffuser-webui#11

@nitinmukesh

@Ednaordinary

What do you suggest for 0.9.6 dev, LTXPipeline or the condition pipeline, in case you've tried it?

@Ednaordinary (Contributor)

0.9.6 should be the condition pipeline, I'm pretty sure.
