PangzeCheung/OmniTransfer

OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer

Pengze Zhang, Yanze Wu*, Mengtian Li, Xu Bai, Songtao Zhao,
Fulong Ye, Chong Mou, Xinghui Li, Zhuowei Chen, Qian He, Mingyuan Gao
*Corresponding author, project lead

Note: This work is purely academic and non-commercial. Demo reference images and videos are from the public domain or AI-generated. If you have copyright concerns, please contact us and we will remove the relevant content.

[Framework overview figure]
Built upon Wan 2.1, OmniTransfer unifies spatial appearance transfer (ID and style) and temporal video transfer (effect, motion, and camera movement) within a single framework, and generalizes strongly to unseen task combinations.

📃 Abstract

Videos convey richer information than images or text, capturing both spatial and temporal dynamics. However, most existing video customization methods rely on reference images or task-specific temporal priors, failing to fully exploit the rich spatio-temporal information inherent in videos, thereby limiting flexibility and generalization in video generation. To address these limitations, we propose OmniTransfer, a unified framework for spatio-temporal video transfer. It leverages multi-view information across frames to enhance appearance consistency and exploits temporal cues to enable fine-grained temporal control. To unify various video transfer tasks, OmniTransfer incorporates three key designs: Task-aware Positional Bias that adaptively leverages reference video information to improve temporal alignment or appearance consistency; Reference-decoupled Causal Learning separating reference and target branches to enable precise reference transfer while improving efficiency; and Task-adaptive Multimodal Alignment using multimodal semantic guidance to dynamically distinguish and tackle different tasks. Extensive experiments show that OmniTransfer outperforms existing methods in appearance (ID and style) and temporal transfer (camera movement and video effects), while matching pose-guided methods in motion transfer without using pose, establishing a new paradigm for flexible, high-fidelity video generation.

🔥 Latest News

🎬 Show Case

Effect Video Transfer

Zero-Shot Prompt-Free VFX Mastery: Replicate intricate visual effects from unseen videos directly onto your images with seamless temporal consistency.


Motion Video Transfer

Pose-Free Animation: Animate static images by directly injecting fluid, complex motion from unseen sources, without explicit pose extraction.


Camera Video Transfer

Trajectory-Free Camera Control: Mirror master-class cinematography from unseen clips onto static landscapes without explicit trajectory or parameter estimation.


ID Video Transfer

Dynamic Identity Anchoring: Synthesize consistent personas by distilling cross-temporal and multi-angle ID cues from reference videos.


Style Video Transfer

Temporal-Style Distillation: Generate consistent stylized videos by inheriting cross-frame aesthetic cues from reference clips.


X Transfer

Beyond Observed Boundaries: OmniTransfer generalizes to unseen scenarios, from multi-person motion synchronization to novel task combinations.


OmniTransfer with Seedance 1.0

OmniTransfer supports Seedance 1.0, enabling the delivery of more intricate and high-impact visual effects.


🪄 Framework


OmniTransfer comprises three key components:

1. Task-aware Positional Bias: exploits the model's inherent spatial and temporal context capabilities for diverse tasks.
2. Reference-decoupled Causal Learning: separates the reference and target branches for causal and efficient transfer.
3. Task-adaptive Multimodal Alignment: leverages an MLLM to unify and enhance semantic understanding across tasks.
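To make the first two ideas concrete, here is a minimal, purely illustrative NumPy sketch of one-way (reference-to-target) attention with a task-dependent additive positional bias. All names, shapes, and the bias schedule are our assumptions for illustration, not the actual OmniTransfer implementation; see the paper for the real design.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def reference_decoupled_attention(target, reference, task="appearance"):
    """Toy sketch (hypothetical): target tokens attend to reference and
    target tokens, but reference tokens issue no queries, so information
    flows one way, from reference to target.
    Shapes: target (Tt, d), reference (Tr, d)."""
    Tt, d = target.shape
    Tr = reference.shape[0]
    kv = np.concatenate([reference, target], axis=0)   # keys/values
    scores = target @ kv.T / np.sqrt(d)                # (Tt, Tr + Tt)

    # Task-aware positional bias (our assumption): temporal tasks favour
    # frame-aligned reference tokens, while appearance tasks leave the
    # reference part unbiased so every reference view contributes.
    bias = np.zeros((Tt, Tr + Tt))
    if task == "temporal":
        for i in range(Tt):
            for j in range(Tr):
                bias[i, j] = -abs(i - j)  # prefer aligned frames

    attn = softmax(scores + bias, axis=-1)
    return attn @ kv
```

Usage: calling the function with `task="temporal"` versus `task="appearance"` on the same inputs yields different mixtures of reference tokens, which is the intent of a task-aware bias in this toy setting.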

👍 Other Remarkable Video Works

We also invite you to explore our other awesome video works:

Video Style Transfer

DreamStyle: A Unified Framework for Video Stylization. [Project Page] [paper]

Video Face Swapping

DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer. [Project Page] [paper]

Video Insertion

OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models. [Project Page] [paper]

Video Editing

InstructX: Towards Unified Visual Editing With MLLM Guidance. [Project Page] [paper]

⭐ Citation

If you find OmniTransfer helpful, please ⭐ star the repo.

If you find this project useful for your research, please consider citing our paper.

BibTeX

@misc{zhang2026omnitransfer,
  title={OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer},
  author={Pengze Zhang and Yanze Wu and Mengtian Li and Xu Bai and Songtao Zhao and Fulong Ye and Chong Mou and Xinghui Li and Zhuowei Chen and Qian He and Mingyuan Gao},
  year={2026},
  eprint={2601.14250},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2601.14250},
}

❤️ Acknowledgement

We would like to thank Junjie Luo, Pengqi Tu, Qi Chen, Qichao Sun and Wanquan Feng for their insightful discussions and valuable data contributions.

📧 Contact

If you have any comments or questions regarding this open-source project, please open a new issue or contact us.
