Real-time video-to-video AI diffusion with StreamDiffusion. Transform videos using AI-powered style transfer with side-by-side comparison, multiple model options, and optional 1.58-bit quantization for faster inference.
- Video-to-Video Processing - Transform entire videos with AI diffusion models
- Side-by-Side Comparison - Synchronized playback of input and output
- Multiple Models - SD-Turbo, SD 1.5 + LCM, Hyper-SDXL, FLUX.2 Klein
- 1.58-bit Quantization - BitNet-style PTQ for faster inference and lower memory
- Real-time Preview - Watch generation progress with live frame updates
- Multi-Style Generation - Generate 5 artistic styles from a single video using LLaVA + FLUX
- Text-to-Video Generation - Generate videos from text prompts using MonarchRT / Wan2.1
- Upload MP4 from browser
- Server-side video library
- Webcam capture (experimental)
- Web UI with synchronized playback
- CLI for batch processing
- REST API for integration
- NVIDIA GPU with CUDA support (RTX 2060+ recommended, 8GB+ VRAM)
- Miniconda or Anaconda
- Node.js 18+ (for frontend)
- ffmpeg
# Clone repository
git clone https://github.com/jasperan/draw-realtime.git
cd draw-realtime
# Create conda environment
conda create -n streamdiffusion python=3.10 -y
conda activate streamdiffusion
# Install PyTorch with CUDA (choose your CUDA version)
# For CUDA 11.8:
pip install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu118
# For CUDA 12.1:
pip install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu121
# Install StreamDiffusion with TensorRT
pip install git+https://github.com/cumulo-autumn/StreamDiffusion.git@main#egg=streamdiffusion[tensorrt]
python -m streamdiffusion.tools.install-tensorrt
# Install project dependencies
pip install -r requirements.txt
# Build frontend
cd frontend && npm install && npm run build && cd ..

# Start the server
./start.sh

# Open http://localhost:7860

First Run:
- Models download automatically (~3GB for SD-Turbo)
- TensorRT engines compile on first use (5-10 minutes)
- Subsequent runs start instantly
| Model | FPS* | Quality | VRAM | Description |
|---|---|---|---|---|
| SD-Turbo | ~94 | Good | 4-5 GB | Default, single-step, fastest |
| SD-Turbo 1.58-bit | ~110+ | Good | 2-3 GB | Quantized, lower memory |
| SD 1.5 + LCM | ~37 | Higher | 5-6 GB | 4-step with LCM-LoRA |
| SD 1.5 + LCM 1.58-bit | ~45+ | Higher | 3-4 GB | Quantized, lower memory |
| Hyper-SDXL | ~20 | SDXL | 8 GB | 1-step SDXL quality |
| FLUX.2 Klein | ~8 | Highest | 10 GB | 4B parameter, best quality |
| MonarchRT Self-Forcing | 16* | Good | 8+ GB | Real-time autoregressive text-to-video |
| MonarchRT Wan2.1 | 0.3* | High | 8+ GB | Bidirectional text-to-video, 1.3B params |
*FPS measured on RTX 4090 (Self-Forcing) / A10 (Wan2.1)
Generate videos from text prompts using MonarchRT with Wan2.1 models. MonarchRT uses Monarch matrix attention for efficient Diffusion Transformers.
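MonarchRT's kernels are not reproduced here, but the core Monarch idea — replacing a dense weight matrix with two block-diagonal factors separated by a permutation — can be sketched in a few lines of PyTorch. The function and tensor names below are illustrative only, not MonarchRT code:

```python
import torch

def monarch_matvec(b1: torch.Tensor, b2: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Illustrative Monarch-style matvec: two block-diagonal factors
    separated by a transpose permutation. b1, b2: (m, m, m) stacks of
    m blocks of size m x m; x: flat vector of length m*m."""
    m = b1.shape[0]
    y = x.view(m, m)                           # split input into m blocks of size m
    y = torch.einsum("bij,bj->bi", b1, y)      # first block-diagonal factor
    y = y.t().contiguous()                     # permutation (transpose of the m x m grid)
    y = torch.einsum("bij,bj->bi", b2, y)      # second block-diagonal factor
    return y.t().contiguous().view(-1)         # undo permutation and flatten

m = 4
x = torch.randn(m * m)
b1, b2 = torch.randn(m, m, m), torch.randn(m, m, m)
print(monarch_matvec(b1, b2, x).shape)         # torch.Size([16])
```

The payoff is that a length-n projection costs roughly O(n^1.5) multiplies and parameters instead of O(n^2) for a dense matrix, which is what makes the Diffusion Transformer attention cheaper.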
Sample output (Wan2.1-T2V-1.3B, 21 frames, 832x480, 30 steps on A10):
Prompt: "A golden retriever running through a sunlit meadow with wildflowers, cinematic, beautiful lighting"
Web UI: Select "MonarchRT Wan2.1" from the model dropdown. The UI switches to text-to-video mode automatically.
CLI:
# Generate with default settings (21 frames, 832x480)
python cli.py generate "a cat sitting in a garden, cinematic"
# Specify model and frame count
python cli.py generate "ocean waves crashing on rocks" -m monarchrt-wan --frames 81
# Custom output path and seed
python cli.py generate "a futuristic city at night" -o output.mp4 --seed 42API:
curl -X POST http://localhost:7860/api/generate \
-H "Content-Type: application/json" \
-d '{"prompt": "a cat in a garden", "model": "monarchrt-wan", "num_frames": 21}'# Clone MonarchRT into the project
git clone https://github.com/Infini-AI-Lab/MonarchRT.git
cd MonarchRT && pip install -r requirements.txt && python setup.py develop && cd ..
# Download Wan2.1-T2V-1.3B model
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir MonarchRT/wan_models/Wan2.1-T2V-1.3B

Requires PyTorch >= 2.8.0, flash-attn, and a CUDA GPU with 8+ GB VRAM.
This project supports BitNet-style Post-Training Quantization (PTQ) to convert model weights to 1.58-bit ternary format ({-1, 0, +1}). This provides:
- ~8x smaller weights - Reduced memory bandwidth
- 15-25% faster inference - Simpler computations
- ~50% lower VRAM - Run on smaller GPUs
- Minimal quality loss - <15% LPIPS degradation
The quantization uses absmean scaling:
scale = mean(|W|) # Per-tensor scale factor
W_ternary = round(W/scale).clamp(-1, 1) # Ternarize to {-1, 0, +1}
Only the U-Net linear layers are quantized. VAE and text encoder remain in FP16 for quality.
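A minimal PyTorch sketch of the absmean ternarization described above (illustrative only; the project's actual implementation lives in app/quantization/):

```python
import torch

def absmean_quantize(w: torch.Tensor, eps: float = 1e-8):
    """Quantize a weight tensor to ternary {-1, 0, +1} with a per-tensor absmean scale."""
    scale = w.abs().mean().clamp(min=eps)          # per-tensor scale factor
    w_ternary = (w / scale).round().clamp(-1, 1)   # ternarize to {-1, 0, +1}
    return w_ternary.to(torch.int8), scale

def dequantize(w_ternary: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an FP16 approximation for use in a standard linear layer."""
    return w_ternary.to(torch.float16) * scale.to(torch.float16)

w = torch.randn(320, 320)                      # e.g. one U-Net linear weight
w_q, s = absmean_quantize(w)
print(w_q.unique())                            # tensor([-1, 0, 1], dtype=torch.int8)
print((dequantize(w_q, s) - w).abs().mean())   # average quantization error
```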
# Quantize SD-Turbo
python scripts/quantize_model.py --model sd-turbo
# Quantize SD 1.5 + LCM
python scripts/quantize_model.py --model sd15-lcm
# Quantize both
python scripts/quantize_model.py --model all
# Skip verification (faster)
python scripts/quantize_model.py --model sd-turbo --no-verify

Quantized models are saved to models/quantized/.
Web UI: Select "SD-Turbo 1.58-bit" or "SD 1.5 + LCM 1.58-bit" from the model dropdown.
CLI:
python cli.py input.mp4 -m sd-turbo-1.58bit -s anime-ghibli
python cli.py input.mp4 -m sd15-lcm-1.58bit -p "oil painting style"

API:
curl -X POST http://localhost:7860/api/process \
-F "video=@input.mp4" \
-F "model=sd-turbo-1.58bit" \
-F "prompt=cyberpunk neon city"Compare original vs quantized performance:
# Benchmark SD-Turbo
python scripts/benchmark.py --model sd-turbo --iterations 100
# Benchmark all models
python scripts/benchmark.py --all --iterations 50
# Quick benchmark (no quality metrics)
python scripts/benchmark.py --model sd-turbo --no-quality

# Style preset
python cli.py input.mp4 -s anime-ghibli
# Custom prompt
python cli.py input.mp4 -p "oil painting, vibrant colors"
# Specific model
python cli.py input.mp4 -m sd15-lcm -s fantasy
# Quantized model
python cli.py input.mp4 -m sd-turbo-1.58bit -s cyberpunk-neon
# Process all server videos
python cli.py --process-all -s watercolor
# Multi-style generation (LLaVA + FLUX)
python cli.py multistyle input.mp4
# List options
python cli.py --list-styles
python cli.py --list-models
python cli.py --list-videos

| Preset | Description |
|---|---|
| `anime-ghibli` | Studio Ghibli inspired, soft colors |
| `anime-cyberpunk` | Anime + cyberpunk, neon, Makoto Shinkai style |
| `cyberpunk-neon` | Cyberpunk city, neon lights, rain |
| `oil-painting` | Classical oil painting, rich colors |
| `watercolor` | Soft watercolor, flowing colors |
| `fantasy` | Magical fantasy art, ethereal |
| `dark-gothic` | Dark gothic, moody atmosphere |
| `comic-pop` | Comic book / pop art style |
| `photorealistic` | Ultra-detailed photorealistic |
| `impressionist` | Impressionist painting, Monet style |
| `pixel-art` | 16-bit retro pixel art |
| `sketch` | Pencil sketch, detailed linework |
Generate 5 artistic variations of a video automatically:
python cli.py multistyle input.mp4

This:
- Analyzes the video content using LLaVA (local vision model via Ollama)
- Generates descriptions of key frames
- Creates 5 style variations using FLUX.2 Klein:
- Oil painting
- Watercolor
- Impressionist
- Pop Art
- Ukiyo-e (Japanese woodblock)
- Produces a comparison grid video
Requirements: Ollama with llava and llama3.2 models installed.
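As a rough illustration of the first step, a key frame can be described through Ollama's HTTP API with the `llava` model. The endpoint and fields follow Ollama's `/api/generate` API; the frame path, prompt text, and helper name are hypothetical:

```python
import base64
import requests

def describe_frame(frame_path: str) -> str:
    """Ask a local LLaVA model (served by Ollama) to describe one key frame."""
    with open(frame_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llava",
            "prompt": "Describe the main subject and scene in one sentence.",
            "images": [image_b64],
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(describe_frame("frames/keyframe_000.jpg"))  # hypothetical frame path
```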
| Endpoint | Method | Description |
|---|---|---|
| `/api/settings` | GET | Get models, presets, configuration |
| `/api/videos` | GET | List server-side videos |
| `/api/upload` | POST | Upload a video file |
| `/api/process` | POST | Start video processing job |
| `/api/job/{id}` | GET | Get job status and progress |
| `/api/jobs` | GET | List all jobs |
| `/api/output/{file}` | GET | Download processed video |
| `/api/preview/{file}` | GET | Get real-time preview frame |
| `/api/generate` | POST | Start text-to-video generation (MonarchRT) |
| `/api/multistyle/process` | POST | Start multi-style generation |
| `/api/multistyle/job/{id}` | GET | Get multi-style job status |
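A hedged Python sketch of driving the API end to end (start a job, poll it, download the result). The endpoint paths come from the table above; the response field names (`job_id`, `status`, `progress`, `output`) are assumptions based on typical job APIs, so adjust them to the actual responses:

```python
import time
import requests

BASE = "http://localhost:7860"

# Start a processing job (mirrors the curl example above).
with open("input.mp4", "rb") as f:
    job = requests.post(
        f"{BASE}/api/process",
        files={"video": f},
        data={"model": "sd-turbo-1.58bit", "prompt": "cyberpunk neon city"},
    ).json()

# Poll /api/job/{id} until the job finishes. Field names here are assumptions.
job_id = job["job_id"]
while True:
    status = requests.get(f"{BASE}/api/job/{job_id}").json()
    print(status.get("status"), status.get("progress"))
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(2)

# Download the processed video once the job reports completion.
if status.get("status") == "completed":
    data = requests.get(f"{BASE}/api/output/{status['output']}").content
    with open("stylized.mp4", "wb") as f:
        f.write(data)
```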
draw-realtime/
├── app/ # Python backend
│ ├── main.py # FastAPI server
│ ├── pipeline.py # Model wrapper with switching
│ ├── config.py # Models, presets, configuration
│ ├── video_processor.py # Batch video processing
│ ├── monarchrt_pipeline.py # MonarchRT text-to-video wrapper
│ ├── multistyle.py # LLaVA + FLUX multi-style
│ └── quantization/ # 1.58-bit PTQ module
│ ├── bitlinear.py # BitLinear layer implementation
│ ├── quantize.py # Quantization functions
│ └── utils.py # Save/load utilities
├── scripts/
│ ├── quantize_model.py # One-time quantization script
│ └── benchmark.py # Performance comparison
├── frontend/ # Svelte web UI
│ ├── src/App.svelte # Main UI component
│ └── build/ # Production build
├── models/
│ └── quantized/ # Quantized model weights
├── videos/ # Server-side videos
├── uploads/ # User uploads
├── outputs/ # Processed videos
├── engines/ # TensorRT cached engines
├── MonarchRT/ # MonarchRT text-to-video (optional)
├── StreamDiffusion/ # StreamDiffusion library
├── cli.py # Command-line interface
├── requirements.txt
└── start.sh
Environment variables:
HOST=0.0.0.0 # Server bind address
PORT=7860 # Server port
VIDEOS_DIR=videos # Input videos directory
ENGINES_DIR=engines # TensorRT engines cache
DEBUG=true # Enable debug logging

Edit `app/config.py` for the following (an illustrative sketch follows this list):
- Default resolution (512x512)
- Acceleration backend (tensorrt/xformers)
- TinyVAE toggle
- Max queue size
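A hypothetical sketch of what those entries might look like; the variable names below are illustrative, not the actual contents of `app/config.py`:

```python
# app/config.py (illustrative names only)
WIDTH = 512                  # default processing resolution
HEIGHT = 512
ACCELERATION = "tensorrt"    # or "xformers" if TensorRT is unavailable
USE_TINY_VAE = True          # TinyVAE for faster decoding
MAX_QUEUE_SIZE = 4           # maximum number of queued processing jobs
```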
- Use TensorRT acceleration (default)
- Use SD-Turbo or SD-Turbo-1.58bit
- Process at 512x512 resolution
- Enable TinyVAE (default)
- Use 1.58-bit quantized models
- Use SD-Turbo (smallest model)
- Reduce resolution in config.py
- Process shorter clips
- Use FLUX.2 Klein or SD 1.5 + LCM
- Process at native resolution
- Use descriptive prompts
- Avoid quantized models for final output
System automatically falls back to xformers. Check CUDA version compatibility.
- Use 1.58-bit quantized models
- Reduce resolution in `app/config.py`
- Use SD-Turbo instead of larger models
- Process shorter video clips
Run the quantization script first:
python scripts/quantize_model.py --model sd-turbo

- Ensure ffmpeg is installed for H.264 encoding
- Try Chrome (best compatibility)
- Ensure Ollama is running with the `llava` and `llama3.2` models
- Check Ollama is accessible at localhost:11434
- StreamDiffusion - Real-time diffusion pipeline
- Stable Diffusion Turbo - Fast single-step model
- FLUX.2 Klein - High-quality 4B model
- LCM-LoRA - Latent consistency LoRA
- TensorRT - NVIDIA inference optimization
- BitNet - 1.58-bit quantization inspiration
- MonarchRT - Real-time video generation with Monarch attention
- Wan2.1 - Text-to-video diffusion model
- FastAPI - Python web framework
- Svelte - Frontend framework
- StreamDiffusion: Real-Time Interactive Generation
- BitNet: 1-bit LLMs
- The Era of 1-bit LLMs - 1.58-bit quantization
- FLUX 1.58-bit - Reference implementation
- MonarchRT: Real-Time Video Generation - Monarch matrix attention for DiTs
MIT License - See LICENSE for details.
- cumulo-autumn/StreamDiffusion
- Stability AI for SD-Turbo
- Black Forest Labs for FLUX
- Microsoft Research for BitNet
- The Hugging Face community
