Stars
🔥🔥🔥 Latest Papers, Codes and Datasets on Vid-LLMs.
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
The official implementation of DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.
OmniTokenizer: one model and one weight for image-video joint tokenization.
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
[arXiv 2024] Follow-Your-Click: This repo is the official implementation of "Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts"
Open-Sora: Democratizing Efficient Video Production for All
[ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Latte: Latent Diffusion Transformer for Video Generation.
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
[IJCV 2024] LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
Character Animation (AnimateAnyone, Face Reenactment)
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
[CVPR 2024] PIA, your Personalized Image Animator. Animate your images with a text prompt, combined with Dreambooth, to achieve stunning videos.
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
Implementation of MagViT2 Tokenizer in Pytorch
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
[ECCV 2024] FreeInit: Bridging Initialization Gap in Video Diffusion Models
KandinskyVideo — multilingual end-to-end text2video latent diffusion model
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
✨ Hotshot-XL: State-of-the-art AI text-to-GIF model trained to work alongside Stable Diffusion XL
ykk648 / AnimateDiff-I2V
Forked from guoyww/AnimateDiff. AnimateDiff I2V version.
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
[ICCV 2023] StableVideo: Text-driven Consistency-aware Diffusion Video Editing