A curated list of state-of-the-art video generation models, research, and tools.
The field has evolved from experimental GANs to large Diffusion Transformer (DiT) models capable of generating cinematic-quality video from text or images.
| Model | Developer | Best For | Key Features |
|---|---|---|---|
| Runway Gen-3 Alpha | Runway | Professional Control | Industry-leading motion brush, director mode, and character consistency. |
| Luma Dream Machine | Luma AI | Cinematic Realism | High-speed generation, realistic physics, and complex camera movements. |
| Kling AI | Kuaishou | Long-form Video | Advertises videos up to 2 minutes at 1080p and 30 fps, with strong human motion. |
| OpenAI Sora | OpenAI | High Fidelity | 60-second clips with high physical consistency (limited public release). |
| Google Veo 3 | Google | Integration | Native 4K, integrated with Google Vids and Workspace. |
| Pika 1.5 | Pika Labs | Creative Effects | Specialized in "Pikaffects" (physics-defying creative transformations). |

| Model | Repository | Key Features | License |
|---|---|---|---|
| Wan2.1 | Wan-Video | Strong prompt adherence among open models (2025). Ships in 1.3B and 14B variants; the 1.3B text-to-video model runs in roughly 8 GB of VRAM. | Apache 2.0 |
| HunyuanVideo | Tencent | Cinematic quality, strong Image-to-Video (I2V) capabilities. | Tencent Hunyuan Community License |
| Mochi-1 | Genmo | High-fidelity motion (30fps) and strong physical realism. | Apache 2.0 |
| CogVideoX | Zhipu AI | Highly accessible; 2B/5B/v1.5 variants. | Apache 2.0 (code, 2B); CogVideoX License (5B) |
| SVD (Stable Video Diffusion) | Stability AI | A widely used baseline for Image-to-Video workflows. | Stability NC |
| LTX-Video | Lightricks | Optimized for real-time and efficient video generation. | Apache 2.0 |
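The parameter counts and VRAM figures in the table can be related with some rough arithmetic. The sketch below estimates the memory taken by the model weights alone as parameter count times bytes per parameter. The function name `weight_memory_gib` and the dtype table are illustrative assumptions, and the estimate deliberately ignores activations, the text encoder, and the VAE, so real peak usage will be higher.

```python
# Bytes needed to store one parameter in common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Rough size of the weights alone, in GiB (hypothetical helper).

    Excludes activations, text encoder, and VAE, so treat the result
    as a lower bound on the VRAM a model needs.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    # Nominal parameter counts taken from the variant names in the table.
    for name, params in [("CogVideoX-2B", 2e9), ("CogVideoX-5B", 5e9)]:
        gib = weight_memory_gib(params, "bf16")
        print(f"{name}: ~{gib:.1f} GiB of weights in bf16")
```

Offloading and quantization change the picture considerably, so a number like this is best used for a first sanity check before downloading a checkpoint.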
Many local video generation workflows use node-based interfaces, which expose each pipeline stage as a configurable node.
- ComfyUI: The de facto standard for advanced video generation workflows.
- ComfyUI-VideoHelperSuite: Essential nodes for video I/O.
- AnimateDiff-Evolved: Improved AnimateDiff integration for ComfyUI.
- Stable Diffusion WebUI (A1111/Forge): Popular for stylized video via AnimateDiff and ControlNet.
- Diffusers: Hugging Face's library for running these models in Python.
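As a rough illustration of the Diffusers route, the sketch below generates a short clip with CogVideoX-2B. The model ID, frame count, step count, and guidance scale are assumptions chosen for illustration, not recommended settings, and `generate_clip` and `clip_seconds` are hypothetical helper names. The heavy imports live inside the function so the module can be loaded on a machine without `torch` or `diffusers` installed.

```python
from pathlib import Path

# Illustrative defaults (assumptions); tune them for your hardware.
MODEL_ID = "THUDM/CogVideoX-2b"
NUM_FRAMES = 49
FPS = 8


def clip_seconds(num_frames: int = NUM_FRAMES, fps: int = FPS) -> float:
    """Length in seconds of a clip with the given frame count and frame rate."""
    return num_frames / fps


def generate_clip(prompt: str, out_path: str = "output.mp4", seed: int = 0) -> Path:
    """Generate one short clip from a text prompt and write it to out_path."""
    # Lazy imports: these packages are large and optional.
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    pipe = CogVideoXPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    # Keep submodules on the CPU until needed, trading speed for lower peak VRAM.
    pipe.enable_model_cpu_offload()

    result = pipe(
        prompt=prompt,
        num_frames=NUM_FRAMES,
        num_inference_steps=50,
        guidance_scale=6.0,
        generator=torch.Generator().manual_seed(seed),  # fixed seed for repeatability
    )
    # frames[0] holds the frames of the first (and only) generated video.
    export_to_video(result.frames[0], out_path, fps=FPS)
    return Path(out_path)
```

Calling `generate_clip("A paper boat drifting down a rainy street")` downloads the weights on first use and expects a CUDA GPU plus the `accelerate` package for CPU offload. With the defaults above, `clip_seconds()` shows the result is a clip of about six seconds.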
- Sora: Video generation models as world simulators (OpenAI, 2024)
- HunyuanVideo: A Systematic Framework For Large Video Generative Models (Tencent, 2024)
- CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer (Zhipu AI, 2024)
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (Stability AI, 2023)
- Scalable Diffusion Models with Transformers (DiT) (Peebles & Xie, 2023)
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning (Guo et al., 2023)
- Panda-70M: 70M high-quality video-text pairs.
- WebVid-10M: A large-scale dataset of short videos with captions.
- HD-VILA-100M: High-resolution video-language dataset.
- Reddit: r/StableDiffusion, r/SoraAI
- Discord: Runway, Luma AI, Pika, and ComfyUI official servers.
The models below represent the "Early Era" of video generation using GANs and VAEs.
| Samples | Code | Paper |
|---|---|---|
| Memoji | -- | -- |
| VideoGAN | Code | Tinyvideo |
| Adversarial Video Gen | Code | arXiv:1511.05440 |
| Improved VideoGAN | Code | arXiv:1711.11453 |