Release code

EnVision-Research · Mar 31, 2024 · 91b94f5
1 parent 93e492c

Showing 41 changed files with 6,802 additions and 6,499 deletions.
diff --git a/.gitignore b/.gitignore
@@ -159,5 +159,5 @@ cython_debug/
 #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
 
-resources/
+scripts/
 outputs/
diff --git a/LICENSE b/LICENSE
diff --git a/README.md b/README.md
@@ -1 +1,104 @@
-# MotionInversion
+<!-- <p align="center">
+    <img src="./assets/readme/icon.png" width="250"/>
+</p>
+<div align="center">
+    <a href="https://github.com/hpcaitech/Open-Sora/stargazers"><img src="https://img.shields.io/github/stars/hpcaitech/Open-Sora?style=social"></a>
+    <a href="https://hpcaitech.github.io/Open-Sora/"><img src="https://img.shields.io/badge/Gallery-View-orange?logo=&amp"></a>
+    <a href="https://discord.gg/kZakZzrSUT"><img src="https://img.shields.io/badge/Discord-join-blueviolet?logo=discord&amp"></a>
+    <a href="https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-247ipg9fk-KRRYmUl~u2ll2637WRURVA"><img src="https://img.shields.io/badge/Slack-ColossalAI-blueviolet?logo=slack&amp"></a>
+    <a href="https://twitter.com/yangyou1991/status/1769411544083996787?s=61&t=jT0Dsx2d-MS5vS9rNM5e5g"><img src="https://img.shields.io/badge/Twitter-Discuss-blue?logo=twitter&amp"></a>
+    <a href="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png"><img src="https://img.shields.io/badge/微信-小助手加群-green?logo=wechat&amp"></a>
+    <a href="https://hpc-ai.com/blog/open-sora-v1.0"><img src="https://img.shields.io/badge/Open_Sora-Blog-blue"></a>
+</div> -->
+
+## Motion Inversion for Video Customization
+
+[Luozhou Wang](https://wileewang.github.io/), [Guibao Shen](), [Yixun Liang](https://yixunliang.github.io/), [Xin Tao](http://www.xtao.website/), Pengfei Wan, Di Zhang, [Yijun Li](https://yijunmaverick.github.io/), [Yingcong Chen](https://www.yingcong.me)
+
+HKUST(GZ), HKUST, Kuaishou Technology, Adobe Research.
+
+
+We present a novel approach to motion customization in video generation, addressing the lack of thorough exploration of motion representation in video generative models. Recognizing the unique challenges posed by video's spatiotemporal nature, our method introduces **Motion Embeddings**, a set of explicit, temporally coherent one-dimensional embeddings derived from a given video. These embeddings are designed to integrate seamlessly with the temporal transformer modules of video diffusion models, modulating self-attention computations across frames without compromising spatial integrity. Furthermore, we identify the **Temporal Discrepancy** in video generative models, which refers to variations in how different motion modules process temporal relationships between frames. We leverage this understanding to optimize the integration of our motion embeddings.
+
+
+<h4>Customize the motion of your video with fewer than 1M parameters in less than 10 minutes.</h4>
+
+
+## 📰 News
+
+* **[2024.03.31]** We have released the project page, arXiv paper, and training code.
+
+## 🚧 Todo List
+* [x] Release code for the UNet3D model (ZeroScope, ModelScope).
+<!-- * [ ] Release detailed guidance for training and inference.
+* [ ] Release Gradio demo. -->
+* [ ] Release code for the Sora-like model (Open-Sora, Latte).
+
+
+
+## Contents
+
+* [Installation](#installation)
+* [Training](#training)
+* [Inference](#inference)
+* [Acknowledgement](#acknowledgement)
+* [Citation](#citation)
+
+<!-- * [Motion Embeddings Hub](#motion-embeddings-hub) -->
+
+## Installation
+
+```bash
+# install torch
+pip install torch torchvision
+
+# install diffusers and transformers
+pip install diffusers==0.26.3 transformers==4.27.4
+```
+
+
+## Training
+
+To start training, first download the [ZeroScope](https://huggingface.co/cerspense/zeroscope_v2_576w) weights and specify their path in the config file. Then run the following command:
+
+```bash
+python train.py --config ./configs/train_config.yaml
+```
+
+Stay tuned for support for other models and for advanced usage!
+
+## Inference
+
+```bash
+python inference.py --config ./configs/inference_config.yaml
+```
+
+We will also provide a Gradio application in this repository.
+
+
+## Acknowledgement
+
+* [MotionDirector](https://github.com/showlab/MotionDirector): We follow their loss design and their techniques for reducing computational cost.
+* [ZeroScope](https://huggingface.co/cerspense/zeroscope_v2_576w): A pretrained video checkpoint used in our main paper.
+* [AnimateDiff](https://github.com/guoyww/animatediff/): A pretrained video checkpoint used in our main paper.
+* [Latte](https://github.com/Vchitect/Latte): A video generation model with a similar architecture to Sora.
+* [Open-Sora](https://github.com/hpcaitech/Open-Sora): A video generation model with a similar architecture to Sora.
+
+We are grateful for their exceptional work and generous contribution to the open-source community.
+
+## Citation
+
+<!-- ```bibtex
+@software{opensora,
+  author = {Zangwei Zheng and Xiangyu Peng and Yang You},
+  title = {Open-Sora: Democratizing Efficient Video Production for All},
+  month = {March},
+  year = {2024},
+  url = {https://github.com/hpcaitech/Open-Sora}
+}
+``` -->
+
+<!-- ## Star History
+
+[![Star History Chart](https://api.star-history.com/svg?repos=hpcaitech/Open-Sora&type=Date)](https://star-history.com/#hpcaitech/Open-Sora&Date) -->
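Note on the README abstract in the hunk above: the idea of Motion Embeddings can be illustrated with a toy example. The sketch below is not the code from this commit. It assumes one learnable vector per frame that is added to the tokens entering a temporal self-attention layer and is shared by every spatial position, which is one simple way to influence attention across frames without touching the spatial layout. The class names `MotionEmbedding` and `TemporalSelfAttention`, the tensor layout, and all sizes are hypothetical.

```python
import torch
from torch import nn


class MotionEmbedding(nn.Module):
    """One learnable vector per frame, shared by all spatial locations (illustrative)."""

    def __init__(self, num_frames: int, dim: int):
        super().__init__()
        # Zero init: the attention layer initially sees unmodified features.
        self.embed = nn.Parameter(torch.zeros(num_frames, dim))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch * height * width, num_frames, dim)
        return hidden + self.embed.unsqueeze(0)


class TemporalSelfAttention(nn.Module):
    """Toy temporal block: each pixel attends over its own sequence of frames."""

    def __init__(self, num_frames: int, dim: int, heads: int = 4):
        super().__init__()
        self.motion = MotionEmbedding(num_frames, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, video_tokens: torch.Tensor) -> torch.Tensor:
        # video_tokens: (batch, frames, height, width, dim)
        b, f, h, w, d = video_tokens.shape
        # Fold the spatial positions into the batch axis so attention runs over frames only.
        x = video_tokens.permute(0, 2, 3, 1, 4).reshape(b * h * w, f, d)
        x = self.motion(x)
        out, _ = self.attn(x, x, x)
        # Restore the (batch, frames, height, width, dim) layout.
        return out.reshape(b, h, w, f, d).permute(0, 3, 1, 2, 4)


torch.manual_seed(0)
block = TemporalSelfAttention(num_frames=16, dim=32)
tokens = torch.randn(2, 16, 4, 4, 32)
output = block(tokens)
num_motion_params = sum(p.numel() for p in block.motion.parameters())
```

In this toy setup the embedding holds 16 × 32 values, so training only `block.motion` while keeping the attention weights frozen would leave very few trainable parameters. Whether the released code wires the embeddings in this way is not shown in this part of the diff.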
diff --git a/dataset/__init__.py b/dataset/__init__.py
@@ -0,0 +1,5 @@
+from .cached_dataset import CachedDataset
+from .image_dataset import ImageDataset
+from .single_video_dataset import SingleVideoDataset
+from .video_folder_dataset import VideoFolderDataset
+from .video_json_dataset import VideoJsonDataset
diff --git a/dataset/cached_dataset.py b/dataset/cached_dataset.py
@@ -0,0 +1,17 @@
+from utils.dataset_utils import *  # supplies os, torch, and Dataset used below
+
+class CachedDataset(Dataset):
+    def __init__(self, cache_dir: str = ''):
+        self.cache_dir = cache_dir
+        self.cached_data_list = self.get_files_list()
+
+    def get_files_list(self):
+        tensors_list = [f"{self.cache_dir}/{x}" for x in os.listdir(self.cache_dir) if x.endswith('.pt')]
+        return sorted(tensors_list)
+
+    def __len__(self):
+        return len(self.cached_data_list)
+
+    def __getitem__(self, index):
+        cached_latent = torch.load(self.cached_data_list[index], map_location='cuda:0')
+        return cached_latent
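Note on `dataset/cached_dataset.py`: the class above lists every `.pt` file in `cache_dir`, sorts the paths, and loads each one onto `cuda:0`. The following is a minimal, self-contained sketch of the same pattern that can be tried on a machine without a GPU. `CachedLatentDataset`, its `map_location` parameter, the `build_demo_cache` helper, and the tensor shapes are hypothetical and are not part of this commit.

```python
import os
import tempfile

import torch
from torch.utils.data import Dataset


class CachedLatentDataset(Dataset):
    """Hypothetical variant of CachedDataset with a configurable target device."""

    def __init__(self, cache_dir: str, map_location: str = "cpu"):
        self.cache_dir = cache_dir
        self.map_location = map_location
        # Sort so that index order is deterministic across runs.
        self.files = sorted(
            os.path.join(cache_dir, name)
            for name in os.listdir(cache_dir)
            if name.endswith(".pt")
        )

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        return torch.load(self.files[index], map_location=self.map_location)


def build_demo_cache(cache_dir: str, num_items: int = 3) -> None:
    """Write a few dummy latent tensors; the shape is illustrative only."""
    for i in range(num_items):
        latent = torch.full((4, 8, 8), float(i))
        torch.save(latent, os.path.join(cache_dir, f"latent_{i:03d}.pt"))


with tempfile.TemporaryDirectory() as tmp:
    build_demo_cache(tmp)
    # A file without the .pt suffix is skipped by the filter.
    open(os.path.join(tmp, "notes.txt"), "w").close()
    dataset = CachedLatentDataset(tmp)
    num_cached = len(dataset)
    first_latent = dataset[0]
```

Passing `map_location="cuda:0"` would reproduce the behavior of the committed class. Exposing the device as a parameter is a design choice of this sketch, made so the example does not require a GPU.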