OpenVid-1M is a high-quality text-to-video dataset curated for research use, featuring videos with high aesthetics, clarity, and resolution. It can be used for direct training or as a quality-tuning complement to other video datasets, and it also supports other video generation tasks such as video super-resolution and frame interpolation.
We carefully curate 1 million high-quality video clips with expressive captions to advance text-to-video research, of which 0.4 million are in 1080P resolution (termed OpenVidHD-0.4M).
OpenVid-1M is cited, discussed, or used in several recent works, including the video diffusion models Goku, MarDini, Allegro, T2V-Turbo-V2, Pyramid Flow, and SnapGen-V; the autoregressive long video generation model ARLON; the visual understanding and generation model VILA-U; the 3D/4D generation models GenXD and DimensionX; the video VAE model IV-VAE; the frame interpolation model Framer; and the large multimodal model InternVL 2.5.
- [2025.02.28] 🤗 Thanks @Binglei, OpenVid-1M-mapping was developed to correlate the video names in the CSV files with their file paths in the unzipped files. It is particularly useful if you only need a portion of OpenVid-1M and prefer not to download the entire collection (see the partial-download sketch in the download section below).
- [2025.01.23] 🏆 OpenVid-1M is accepted by ICLR 2025!!!
- [2024.12.01] 🚀 The OpenVid-1M dataset was downloaded over 79,000 times on Hugging Face last month, placing it in the top 1% of all video datasets (as of Nov. 2024)!!
- [2024.07.01] 🔥 Our paper, code, model and OpenVid-1M dataset are released!
conda create -n openvid python=3.10
conda activate openvid
pip install torch torchvision
pip install packaging ninja
pip install flash-attn --no-build-isolation
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git
pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
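After installation, a quick sanity check can confirm that the CUDA-dependent packages import correctly. This is only a minimal sketch, not part of the official setup scripts:

```python
# Minimal environment sanity check (not part of the official setup scripts).
import torch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

# These imports should succeed if flash-attn, apex, and xformers installed correctly.
import flash_attn  # noqa: F401
import apex        # noqa: F401
import xformers    # noqa: F401
print("flash-attn, apex, and xformers imported successfully.")
```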
- Download OpenVid-1M dataset.
# Note: the full dataset is large, so this takes a long time.
python download_scripts/download_OpenVid.py
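If you only need part of the dataset, one option is to fetch individual archive parts directly from the Hugging Face Hub and use the OpenVid-1M-mapping files mentioned above to decide which parts you need. The snippet below is a hedged sketch: the repository id and part filenames are assumptions based on the dataset's Hugging Face layout, so verify them against the dataset page before use.

```python
# Sketch: download selected archive parts instead of the full dataset.
# Assumptions (verify on the dataset page): the dataset repo id is
# "nkp37/OpenVid-1M" and archives are split into files named like
# "OpenVid_part0.zip". Adjust to the names you actually see.
from huggingface_hub import hf_hub_download

wanted_parts = ["OpenVid_part0.zip", "OpenVid_part1.zip"]  # hypothetical filenames

for filename in wanted_parts:
    local_path = hf_hub_download(
        repo_id="nkp37/OpenVid-1M",   # assumed dataset repo id
        repo_type="dataset",
        filename=filename,
        local_dir="./dataset/download",
    )
    print("downloaded:", local_path)
```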
- Put the OpenVid-1M dataset in the `./dataset` folder.
dataset
└─ OpenVid-1M
   ├─ data
   │  └─ train
   │     ├─ OpenVid-1M.csv
   │     └─ OpenVidHD.csv
   └─ video
      ├─ ---_iRTHryQ_13_0to241.mp4
      ├─ ---agFLYkbY_7_0to303.mp4
      ├─ --0ETtekpw0_2_18to486.mp4
      └─ ...
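Once the files are in place, the metadata CSVs can be joined against the extracted videos. Below is a minimal sketch, assuming the CSV contains a video-filename column and a caption column; check the actual header names in OpenVid-1M.csv before relying on them.

```python
# Sketch: pair captions from the metadata CSV with local video paths.
# Column names "video" and "caption" are assumptions; inspect the CSV header first.
import os
import pandas as pd

meta = pd.read_csv("dataset/OpenVid-1M/data/train/OpenVid-1M.csv")
print(meta.columns.tolist())  # verify the real column names

video_dir = "dataset/OpenVid-1M/video"
meta["path"] = meta["video"].apply(lambda name: os.path.join(video_dir, name))

# Keep only rows whose video file was actually downloaded and extracted.
available = meta[meta["path"].apply(os.path.exists)]
print(f"{len(available)} of {len(meta)} clips available locally")
print(available[["path", "caption"]].head())
```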
| Model | Data | Pretrained Weight | Steps | Batch Size | URL |
|---|---|---|---|---|---|
| STDiT-16×1024×1024 | OpenVidHD-0.4M | STDiT-16×512×512 | 16k | 32×4 | 🔗 |
| STDiT-16×512×512 | OpenVid-1M | STDiT-16×256×256 | 20k | 32×8 | 🔗 |
| MVDiT-16×512×512 | OpenVid-1M | MVDiT-16×256×256 | 20k | 32×4 | 🔗 |
Our models' weights are partially initialized from PixArt-α.
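For context, partial initialization generally means loading only the pretrained parameters whose names and shapes match the new architecture. A minimal sketch of the idea, assuming standard PyTorch state dicts; this is not the repo's actual initialization code:

```python
# Sketch: partially initialize a model from a pretrained checkpoint.
# Illustrates the general technique, not the repo's exact procedure.
import torch

def partial_load(model: torch.nn.Module, ckpt_path: str) -> None:
    pretrained = torch.load(ckpt_path, map_location="cpu")
    # Some checkpoints nest the weights under a key such as "state_dict".
    if "state_dict" in pretrained:
        pretrained = pretrained["state_dict"]
    own = model.state_dict()
    # Keep only parameters whose name and shape match the target model.
    matched = {k: v for k, v in pretrained.items()
               if k in own and own[k].shape == v.shape}
    own.update(matched)
    model.load_state_dict(own)
    print(f"initialized {len(matched)}/{len(own)} tensors from {ckpt_path}")
```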
# MVDiT, 16x512x512
torchrun --standalone --nproc_per_node 1 scripts/inference.py --config configs/mvdit/inference/16x512x512.py --ckpt-path MVDiT-16x512x512.pt
# STDiT, 16x512x512
torchrun --standalone --nproc_per_node 1 scripts/inference.py --config configs/stdit/inference/16x512x512.py --ckpt-path STDiT-16x512x512.pt
# STDiT, 16x1024x1024
torchrun --standalone --nproc_per_node 1 scripts/inference.py --config configs/stdit/inference/16x1024x1024.py --ckpt-path STDiT-16x1024x1024.pt
# MVDiT, 16x256x256, 72k Steps
torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py --config configs/mvdit/train/16x256x256.py
# MVDiT, 16x512x512, 20k Steps
torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py --config configs/mvdit/train/16x512x512.py
# STDiT, 16x256x256, 72k Steps
torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py --config configs/stdit/train/16x256x256.py
# STDiT, 16x512x512, 20k Steps
torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py --config configs/stdit/train/16x512x512.py
# STDiT, 16x1024x1024, 16k Steps
torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py --config configs/stdit/train/16x1024x1024.py
Training order: 16x256x256 → 16x512x512 → 16x1024x1024, with each stage initialized from the previous stage's checkpoint as listed in the table above.
Part of the code is based upon Open-Sora. Thanks for their great work!
@article{nan2024openvid,
title={OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation},
author={Nan, Kepan and Xie, Rui and Zhou, Penghao and Fan, Tiehan and Yang, Zhenheng and Chen, Zhijie and Li, Xiang and Yang, Jian and Tai, Ying},
journal={arXiv preprint arXiv:2407.02371},
year={2024}
}