

candle-video


A Rust library for AI video generation built on the Candle ML framework. It provides high-performance, standalone video generation inference without Python runtime dependencies.



✨ What is this?

candle-video is a Rust-native implementation of video generation models, targeting deployment scenarios where startup time, binary size, and memory efficiency matter. It provides inference for state-of-the-art text-to-video models without requiring a Python runtime.

Supported Models

  • LTX-Video — Text-to-video generation using DiT (Diffusion Transformer) architecture
    • 2B and 13B parameter variants
    • Standard and distilled versions (0.9.5 – 0.9.8)
    • T5-XXL text encoder with GGUF quantization support
    • 3D VAE for video encoding/decoding
    • Flow Matching scheduler
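
For intuition about the last component, the sampler behind a Flow Matching scheduler boils down to an Euler update of the latents along a model-predicted velocity field. Below is a minimal, framework-agnostic sketch of that update; it is illustrative only and is not the crate's scheduler API.

```rust
/// One Euler step of a flow-matching sampler: the model predicts a velocity
/// field v(x, sigma) and the sample moves along it as sigma decreases toward 0.
fn flow_match_euler_step(sample: &mut [f32], velocity: &[f32], sigma: f32, sigma_next: f32) {
    let dt = sigma_next - sigma; // negative, because sigmas decrease over the sampling loop
    for (x, v) in sample.iter_mut().zip(velocity.iter()) {
        *x += dt * *v;
    }
}

fn main() {
    // Toy 4-element "latent" and a constant predicted velocity, just to show the update.
    let mut latent = vec![1.0_f32, -0.5, 0.25, 0.0];
    let velocity = vec![0.2_f32; 4];
    // Two consecutive sigmas from a hypothetical schedule (1.0 = pure noise, 0.0 = clean).
    flow_match_euler_step(&mut latent, &velocity, 1.0, 0.75);
    println!("{latent:?}");
}
```

In the actual pipeline, the velocity comes from the DiT transformer conditioned on the T5 text embeddings, and the sigmas come from the scheduler's noise schedule.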

🚀 Key Features

  • High Performance — Native Rust with GPU acceleration via CUDA/cuDNN
  • Memory Efficient — BF16 inference, VAE tiling/slicing, GGUF quantized text encoders
  • Flexible — Run on CPU or GPU, with optional Flash Attention v2
  • Standalone — No Python runtime required in production
  • Fast Startup — ~2 seconds vs ~15-30 seconds for Python/PyTorch

Hardware Acceleration

| Feature | Description |
|------------|-------------|
| flash-attn | Flash Attention v2 for efficient attention (default) |
| cudnn | cuDNN for faster convolutions (default) |
| mkl | Intel MKL for optimized CPU operations (x86_64) |
| accelerate | Apple Accelerate for Metal (macOS) |
| nccl | Multi-GPU support via NCCL |
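
Because the crate is built on Candle, application code typically selects the compute device with candle_core itself. The sketch below shows the usual CPU/CUDA fallback pattern; it is a general Candle idiom, not a candle-video-specific API.

```rust
use candle_core::Device;

/// Choose the compute device the way Candle applications usually do:
/// CUDA device 0 when available (and compiled in), otherwise fall back to CPU.
fn pick_device(force_cpu: bool) -> candle_core::Result<Device> {
    if force_cpu {
        Ok(Device::Cpu)
    } else {
        // Returns Device::Cpu when no CUDA device is available.
        Device::cuda_if_available(0)
    }
}

fn main() -> candle_core::Result<()> {
    let device = pick_device(false)?;
    println!("using CUDA: {}", device.is_cuda());
    Ok(())
}
```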

🎬 Demonstration

| Model | Video | Prompt |
|-------|-------|--------|
| LTX-Video-0.9.5 | Waves and Rocks | The waves crash against the jagged rocks of the shoreline, sending spray high into the air... |
| LTX-Video-0.9.8-2b-distilled | woman_with_blood | A woman with blood on her face and a white tank top looks down and to her right... |

More examples are available in the examples/ directory.


🖥️ System Requirements

Prerequisites

  • Rust 1.82+ (Edition 2024)
  • CUDA Toolkit 12.x (for GPU acceleration)
  • cuDNN 8.x/9.x (optional, for faster convolutions)
  • Hugging Face CLI (huggingface-cli) for downloading model weights

Approximate VRAM Requirements (512×768, 97 frames)

  • Full model: ~8-12GB
  • With VAE tiling: ~8GB
  • With GGUF-quantized T5: saves an additional ~8GB

🛠️ Installation & Setup

Add to your project

[dependencies]
candle-video = { git = "https://github.com/FerrisMind/candle-video" }

Build from source

# Clone the repository
git clone https://github.com/FerrisMind/candle-video.git
cd candle-video

# Default build (CUDA + cuDNN + Flash Attention)
cargo build --release

# CPU-only build
cargo build --release --no-default-features

# With specific features
cargo build --release --features "cudnn,flash-attn"

Model Weights

Download from oxide-lab/LTX-Video-0.9.8-2B-distilled:

huggingface-cli download oxide-lab/LTX-Video-0.9.8-2B-distilled --local-dir ./models/ltx-video

Note: This is the same model as the official Lightricks/LTX-Video release, but this repository bundles all the necessary files in one place, so you don't need to track them down individually.

Required files for diffusers model versions:

  • transformer/diffusion_pytorch_model.safetensors — DiT model
  • vae/diffusion_pytorch_model.safetensors — 3D VAE
  • text_encoder_gguf/t5-v1_1-xxl-encoder-Q5_K_M.gguf — Quantized T5
  • text_encoder_gguf/tokenizer.json — T5 tokenizer

Required files for official model versions:

  • ltxv-2b-0.9.8-distilled.safetensors — DiT + 3D VAE in single file
  • text_encoder_gguf/t5-v1_1-xxl-encoder-Q5_K_M.gguf — Quantized T5
  • text_encoder_gguf/tokenizer.json — T5 tokenizer

📖 How to Start Using

Running the Examples with Local Weights (Recommended)

For diffusers model versions:

cargo run --example ltx-video --release --features flash-attn,cudnn -- \
    --local-weights ./models/ltx-video \
    --ltxv-version 0.9.5 \
    --prompt "A cat playing with a ball of yarn" 

For official model versions:

cargo run --example ltx-video --release --features flash-attn,cudnn -- \
    --local-weights ./models/ltx-video-model \
    --unified-weights ./models/ltx-video-model.safetensors \
    --ltxv-version 0.9.8-2b-distilled \
    --prompt "A cat playing with a ball of yarn" 

Fast Preview (Lower Resolution)

cargo run --example ltx-video --release --features flash-attn,cudnn -- \
    --local-weights ./models/ltx-video-model \
    --unified-weights ./models/ltx-video-model.safetensors \
    --ltxv-version 0.9.8-2b-distilled \
    --prompt "A cat playing with a ball of yarn" \
    --height 256 --width 384 --num-frames 25 

Low VRAM Mode

cargo run --example ltx-video --release --features flash-attn,cudnn -- \
    --local-weights ./models/ltx-video \
    --prompt "A majestic eagle soaring over mountains" \
    --vae-tiling --vae-slicing

CLI Options

| Argument | Default | Description |
|----------|---------|-------------|
| --prompt | "A video of a cute cat..." | Text prompt for generation |
| --negative-prompt | "" | Negative prompt |
| --height | 512 | Video height (divisible by 32) |
| --width | 768 | Video width (divisible by 32) |
| --num-frames | 97 | Number of frames (should be 8n + 1) |
| --steps | (from version config) | Diffusion steps |
| --guidance-scale | (from version config) | Classifier-free guidance scale |
| --ltxv-version | "0.9.5" | Model version |
| --local-weights | (None) | Path to local weights |
| --output-dir | "output" | Directory to save results |
| --seed | random | Random seed for reproducibility |
| --vae-tiling | false | Enable VAE tiling for memory efficiency |
| --vae-slicing | false | Enable VAE batch slicing |
| --frames | false | Save individual PNG frames |
| --gif | false | Save as GIF animation |
| --cpu | false | Run on CPU instead of GPU |
| --use-bf16-t5 | false | Use BF16 T5 instead of GGUF quantized |
| --unified-weights | (None) | Path to unified safetensors file |
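
The resolution and frame-count constraints above (height/width divisible by 32, frames of the form 8n + 1) can be handled by snapping requested values down to the nearest valid ones. The helper below is purely illustrative and is not part of the example binary.

```rust
/// Snap a requested resolution and frame count to values the CLI accepts:
/// height and width divisible by 32, frame count of the form 8n + 1.
fn snap_to_valid(height: u32, width: u32, frames: u32) -> (u32, u32, u32) {
    let snap32 = |v: u32| (v.max(32) / 32) * 32;             // round down to a multiple of 32
    let snap_frames = |f: u32| ((f.max(1) - 1) / 8) * 8 + 1; // round down to 8n + 1
    (snap32(height), snap32(width), snap_frames(frames))
}

fn main() {
    // A 500x760 request at 100 frames becomes 480x736 at 97 frames.
    println!("{:?}", snap_to_valid(500, 760, 100));
}
```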

Supported Model Versions

| Version | Parameters | Steps | Guidance | Notes |
|---------|------------|-------|----------|-------|
| 0.9.5 | 2B | 40 | 3.0 | Standard model |
| 0.9.6-dev | 2B | 40 | 3.0 | Development version |
| 0.9.6-distilled | 2B | 8 | 1.0 | Fast inference |
| 0.9.8-2b-distilled | 2B | 7 | 1.0 | Latest distilled |
| 0.9.8-13b-dev | 13B | 30 | 8.0 | Large model |
| 0.9.8-13b-distilled | 13B | 7 | 1.0 | Large distilled |
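
If you drive generation programmatically, the per-version defaults from this table can be expressed as a simple lookup. This only mirrors the values above for illustration; the crate resolves them from its own version configs (configs.rs).

```rust
/// Default diffusion steps and guidance scale per model version, mirroring the
/// table above. Illustrative only; not the crate's config loader.
fn version_defaults(version: &str) -> Option<(u32, f32)> {
    match version {
        "0.9.5" | "0.9.6-dev" => Some((40, 3.0)),
        "0.9.6-distilled" => Some((8, 1.0)),
        "0.9.8-2b-distilled" | "0.9.8-13b-distilled" => Some((7, 1.0)),
        "0.9.8-13b-dev" => Some((30, 8.0)),
        _ => None,
    }
}

fn main() {
    assert_eq!(version_defaults("0.9.8-2b-distilled"), Some((7, 1.0)));
}
```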

Memory Optimization

For limited VRAM:

# VAE tiling - processes image in tiles
--vae-tiling

# VAE slicing - processes batches sequentially
--vae-slicing

# Lower resolution
--height 256 --width 384

# Fewer frames
--num-frames 25
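
Conceptually, --vae-tiling decodes the latent video in overlapping spatial tiles rather than all at once, which caps peak activation memory at the cost of a little extra compute. The sketch below shows how overlapping tile boundaries along one axis can be computed; it is illustrative only, not the crate's implementation.

```rust
/// Compute overlapping tile boundaries along one spatial axis: the idea behind
/// VAE tiling is to decode each (start, end) window separately and blend the overlaps.
fn tile_ranges(size: usize, tile: usize, overlap: usize) -> Vec<(usize, usize)> {
    let stride = tile.saturating_sub(overlap).max(1);
    let mut ranges = Vec::new();
    let mut start = 0;
    loop {
        let end = (start + tile).min(size);
        ranges.push((start, end));
        if end == size {
            break;
        }
        start += stride;
    }
    ranges
}

fn main() {
    // A 768-pixel-wide frame decoded in 256-wide tiles with 32 pixels of overlap.
    println!("{:?}", tile_ranges(768, 256, 32));
}
```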

Project Structure

candle-video/
├── src/
│   ├── lib.rs                    # Library entry point
│   └── models/
│       └── ltx_video/            # LTX-Video implementation
│           ├── ltx_transformer.rs    # DiT transformer
│           ├── vae.rs                # 3D VAE
│           ├── text_encoder.rs       # T5 text encoder
│           ├── quantized_t5_encoder.rs # GGUF T5 encoder
│           ├── scheduler.rs          # Flow matching scheduler
│           ├── t2v_pipeline.rs       # Text-to-video pipeline
│           ├── loader.rs             # Weight loading
│           └── configs.rs            # Model version configs
├── examples/
│   └── ltx-video/                # Main CLI example
├── tests/                        # Parity and unit tests
├── scripts/                      # Python reference generators
└── benches/                      # Performance benchmarks
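
To make the layout above concrete, here is a deliberately simplified sketch of how the pipeline stages compose in a text-to-video run. Every trait and type name is hypothetical and is not candle-video's public API; see t2v_pipeline.rs for the real wiring.

```rust
// Hypothetical sketch of how the modules above fit together at run time.
struct Embeddings(Vec<f32>);
struct Latents(Vec<f32>);
struct VideoFrames(Vec<Vec<u8>>);

trait TextEncoder {        // text_encoder.rs / quantized_t5_encoder.rs
    fn encode(&self, prompt: &str) -> Embeddings;
}
trait DitTransformer {     // ltx_transformer.rs
    fn predict_velocity(&self, latents: &Latents, cond: &Embeddings, sigma: f32) -> Latents;
}
trait FlowScheduler {      // scheduler.rs
    fn sigmas(&self, steps: usize) -> Vec<f32>;
    fn step(&self, latents: Latents, velocity: &Latents, sigma: f32, sigma_next: f32) -> Latents;
}
trait VideoVae {           // vae.rs
    fn decode(&self, latents: &Latents) -> VideoFrames;
}

/// Text-to-video, end to end: encode the prompt, run the denoising loop, decode frames.
fn generate<E, D, S, V>(
    prompt: &str,
    encoder: &E,
    transformer: &D,
    scheduler: &S,
    vae: &V,
    steps: usize,
    mut latents: Latents,
) -> VideoFrames
where
    E: TextEncoder,
    D: DitTransformer,
    S: FlowScheduler,
    V: VideoVae,
{
    let cond = encoder.encode(prompt);
    let sigmas = scheduler.sigmas(steps);
    for pair in sigmas.windows(2) {
        let velocity = transformer.predict_velocity(&latents, &cond, pair[0]);
        latents = scheduler.step(latents, &velocity, pair[0], pair[1]);
    }
    vae.decode(&latents)
}
```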

🙏 Acknowledgments


License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

Copyright 2025 FerrisMind
