Pengze Zhang, Yanze Wu*, Mengtian Li, Xu Bai, Songtao Zhao†,
Fulong Ye, Chong Mou, Xinghui Li, Zhuowei Chen, Qian He, Mingyuan Gao
* Corresponding author, † Project leader
Note: This work is purely academic and non-commercial. Demo reference images/videos are from the public domain or AI-generated. For copyright concerns, please contact us to have the relevant content removed.
Built upon Wan 2.1, OmniTransfer seamlessly unifies spatial appearance transfer (ID and style) and temporal video transfer tasks (effects, motion, and camera movement) within a single framework, and exhibits strong generalization across unseen task combinations.
Videos convey richer information than images or text, capturing both spatial appearance and temporal dynamics. However, most existing video customization methods rely on reference images or task-specific temporal priors and therefore fail to fully exploit the rich spatio-temporal information inherent in videos, limiting flexibility and generalization in video generation. To address these limitations, we propose OmniTransfer, a unified framework for spatio-temporal video transfer. It leverages multi-view information across frames to enhance appearance consistency and exploits temporal cues to enable fine-grained temporal control. To unify diverse video transfer tasks, OmniTransfer incorporates three key designs: Task-aware Positional Bias, which adaptively leverages reference video information to improve temporal alignment or appearance consistency; Reference-decoupled Causal Learning, which separates the reference and target branches to enable precise reference transfer while improving efficiency; and Task-adaptive Multimodal Alignment, which uses multimodal semantic guidance to dynamically distinguish and handle different tasks. Extensive experiments show that OmniTransfer outperforms existing methods in appearance transfer (ID and style) and temporal transfer (camera movement and video effects), while matching pose-guided methods in motion transfer without using pose, establishing a new paradigm for flexible, high-fidelity video generation.
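For readers who want a concrete picture of the first design, the snippet below is a minimal sketch of what a task-aware positional bias could look like: reference-video frames are given temporal positions that overlap the target timeline for temporal-transfer tasks (so the model can align frame-by-frame) and are offset beyond it for appearance tasks (so they act as out-of-timeline context). The function name, task labels, and indexing scheme are our own assumptions for illustration, not the released implementation.

```python
import torch

def reference_frame_positions(num_target_frames: int,
                              num_ref_frames: int,
                              task: str) -> torch.Tensor:
    """Assign temporal position indices to reference-video frames (hypothetical sketch)."""
    if task in {"motion", "camera", "effect"}:
        # Temporal transfer: share the target timeline for frame-wise alignment.
        return torch.arange(num_ref_frames)
    if task in {"id", "style"}:
        # Appearance transfer: shift the reference beyond the target range
        # so it is treated as auxiliary context rather than aligned frames.
        return num_target_frames + torch.arange(num_ref_frames)
    raise ValueError(f"unknown task: {task}")

print(reference_frame_positions(16, 16, "motion"))  # aligned: 0..15
print(reference_frame_positions(16, 16, "id"))      # offset:  16..31
```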
- Jan 7, 2026: We release the Project Page of OmniTransfer.
Zero-Shot Prompt-Free VFX Mastery: Replicate intricate visual effects from unseen videos directly onto your images with seamless temporal consistency.
Pose-Free Animation: Animate static images by directly injecting fluid, complex motion from unseen sources, without explicit pose extraction.
Trajectory-Free Camera Control: Mirror master-class cinematography from unseen clips onto static landscapes without explicit trajectory or parameter estimation.
Dynamic Identity Anchoring: Synthesize consistent personas by distilling cross-temporal and multi-angle ID cues from reference videos.
Temporal-Style Distillation: Generate consistent stylized videos by inheriting cross-frame aesthetic cues from reference clips.
Beyond Observed Boundaries: OmniTransfer generalizes to unprecedented scenarios, from multi-person motion synchronization to unseen task combinations.
OmniTransfer supports Seedance 1.0, enabling the delivery of more intricate and high-impact visual effects.
OmniTransfer comprises three key components: 1) Task-aware Positional Bias: exploits the model's inherent spatial and temporal context capabilities for diverse tasks. 2) Reference-decoupled Causal Learning: separates reference and target branches for causal and efficient transfer. 3) Task-adaptive Multimodal Alignment: leverages an MLLM to unify and enhance semantic understanding across tasks.
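As a rough illustration of the second component, the sketch below builds a block attention mask in which reference tokens attend only to themselves while target tokens attend to both streams, which is one way to realize a "decoupled, causal" reference branch that can be computed once and reused. The function name and token layout are hypothetical and only meant to convey the idea, not to reproduce the paper's implementation.

```python
import torch

def decoupled_attention_mask(n_ref: int, n_tgt: int) -> torch.Tensor:
    """Boolean mask of shape (n_ref + n_tgt, n_ref + n_tgt); True = may attend."""
    n = n_ref + n_tgt
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:n_ref, :n_ref] = True   # reference tokens attend only to reference tokens
    mask[n_ref:, :] = True        # target tokens attend to reference + target tokens
    return mask

# Example: 3 reference tokens, 4 target tokens.
print(decoupled_attention_mask(n_ref=3, n_tgt=4).int())
```

Because the reference rows never attend to target tokens, their features stay fixed across denoising steps under this scheme, which is what would allow the reference branch to be cached for efficiency.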
We also invite you to explore our other awesome video works:
DreamStyle: A Unified Framework for Video Stylization. [Project Page] [paper]
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer. [Project Page] [paper]
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models. [Project Page] [paper]
InstructX: Towards Unified Visual Editing With MLLM Guidance. [Project Page] [paper]
If you find OmniTransfer helpful, please ⭐ the repo.
If you find this project useful for your research, please consider citing our paper.
@misc{zhang2026omnitransfer,
title={OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer},
author={Pengze Zhang and Yanze Wu and Mengtian Li and Xu Bai and Songtao Zhao and Fulong Ye and Chong Mou and Xinghui Li and Zhuowei Chen and Qian He and Mingyuan Gao},
year={2026},
eprint={2601.14250},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2601.14250},
}

We would like to thank Junjie Luo, Pengqi Tu, Qi Chen, Qichao Sun and Wanquan Feng for their insightful discussions and valuable data contributions.
If you have any comments or questions regarding this open-source project, please open a new issue or contact us.
