An unofficial, streamlined, and highly optimized ComfyUI implementation of TeleStyle.
This node is specifically designed for Video Style Transfer using the Wan2.1-T2V architecture and TeleStyle custom weights. Unlike the original repository, this implementation strips away all heavy image-editing components (the Qwen weights) to focus purely on fast, high-quality video generation.
- GPU VRAM: 6GB minimum
- Disk Space: ~6GB for models and weights
**High Performance:**
- Acceleration: Built-in support for Flash Attention 2 and SageAttention for faster inference.
- Fast Mode: Optimized memory management with aggressive cache cleanup to prevent conflicts between CPU offloading and GPU processing.
**Simplified Workflow:** No need for complex external text encoding nodes. The model uses pre-computed stylistic embeddings (`prompt_embeds.pth`) for maximum efficiency.
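For reference, here is a minimal sketch (an assumption, not the node's actual code) of what "pre-computed embeddings" means in practice: the conditioning tensors are loaded straight from `prompt_embeds.pth` with `torch.load`, so no text-encoder node ever runs during inference.

```python
import torch

# Illustrative sketch only: the internal layout of prompt_embeds.pth is an
# assumption here. The point is that style conditioning is loaded from disk
# instead of being produced by a text encoder at inference time.
embeds_path = "ComfyUI/models/telestyle_models/weights/prompt_embeds.pth"
prompt_embeds = torch.load(embeds_path, map_location="cpu")

# The loaded tensors are passed to the sampler as conditioning directly.
print(type(prompt_embeds))
```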
Demo videos: `h-7.mp4` | `h-6.mp4` | `w-9.mp4` | `h-8.mp4`
1. Navigate to your ComfyUI custom nodes directory:
   ```bash
   cd ComfyUI/custom_nodes/
   ```
2. Clone this repository:
   ```bash
   git clone https://github.com/neurodanzelus-cmd/ComfyUI-TeleStyle.git
   ```
3. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

Note: For SageAttention support, you may need to install `sageattention` manually.
This node requires specific weights placed in the `ComfyUI/models/telestyle_models/` directory.
The weights are downloaded automatically on the first run.
Directory Structure:
```
ComfyUI/
└── models/
    └── telestyle_models/
        ├── weights/
        │   ├── dit.ckpt              # Main Video Transformer weights
        │   └── prompt_embeds.pth     # Pre-computed style embeddings
        └── Wan2.1-T2V-1.3B-Diffusers/
            ├── transformer_config.json
            ├── vae/
            │   ├── diffusion_pytorch_model.safetensors
            │   └── config.json
            └── scheduler/
                └── scheduler_config.json
```
Where to get weights:
https://huggingface.co/Danzelus/TeleStyle_comfy/tree/main
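If the automatic download fails (offline machine, proxy, etc.), you can fetch the weights yourself. Below is a minimal sketch using `huggingface_hub`; this helper script is not part of the node, and the target directory is assumed from the structure above:

```python
from huggingface_hub import snapshot_download

# Pull the full TeleStyle weight repository into the directory the node expects.
# Adjust local_dir if your ComfyUI installation lives elsewhere.
snapshot_download(
    repo_id="Danzelus/TeleStyle_comfy",
    local_dir="ComfyUI/models/telestyle_models",
)
```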
This node loads the necessary model components.
| Parameter | Description |
|---|---|
| `dtype` | Choose between `bf16` (best quality) and `fp16` |
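As a rough illustration of what the `dtype` switch controls (the loader's real code may differ): it decides the torch dtype the transformer weights are cast to, with `bf16` keeping the wider fp32 exponent range.

```python
import torch

# Illustrative mapping only, not the node's actual loader.
# bf16 keeps fp32's exponent range (more stable, hence "best quality"),
# fp16 is equally compact but overflows large activations sooner.
DTYPE_MAP = {"bf16": torch.bfloat16, "fp16": torch.float16}

def cast_model(model: torch.nn.Module, dtype_name: str) -> torch.nn.Module:
    return model.to(dtype=DTYPE_MAP[dtype_name])
```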
The main inference node.
| Parameter | Description |
|---|---|
| `model` | Connect the output from the Loader |
| `video_frames` | Input video batch (from Load Video or VHS_LoadVideo) |
| `style_image` | A reference image to guide the style transfer |
| `steps` | Inference steps (default: 12) |
| `cfg` | Guidance scale (default: 1) |
| `scheduler` | Choose your sampler (FlowMatchEuler, DPM++) |
| `fast_mode` | Keep `True` for speed. Set to `False` for low-VRAM offloading (slower) |
| `acceleration` | `default`: standard PyTorch attention. `flash_attn`: faster, requires compatible GPU. `sage_attn`: ultra-fast, requires `sageattention` library |
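For context, acceleration switches like this are usually resolved by checking whether the optional attention libraries are importable and falling back to standard attention otherwise. A hedged sketch of that idea (not the node's actual dispatch code):

```python
import importlib.util

def resolve_attention_backend(requested: str) -> str:
    """Illustrative fallback logic for an `acceleration`-style option.

    This is an assumption about typical behaviour, not the node's real code.
    """
    if requested == "sage_attn" and importlib.util.find_spec("sageattention"):
        return "sage_attn"   # pip install sageattention
    if requested == "flash_attn" and importlib.util.find_spec("flash_attn"):
        return "flash_attn"  # requires a compatible (Ampere or newer) GPU
    return "default"         # plain PyTorch attention always works

print(resolve_attention_backend("sage_attn"))
```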
- Initial release
- More samplers
- Consistency for very long videos
Guys, I’d really appreciate any support right now. I’m in a tough spot.
This project is an unofficial implementation based on the amazing work by the original authors. Please refer to their repository for the original research and model weights.