Stars
Aether: Geometric-Aware Unified World Modeling
[CVPR 2025] UniK3D: Universal Camera Monocular 3D Estimation
[CVPR 2025] VGGT: Visual Geometry Grounded Transformer
Official PyTorch implementation for "FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis".
Code for "Multi-view Reconstruction via SfM-guided Monocular Depth Estimation". CVPR 2025
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
[ARXIV'25] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
[CVPR 2025] Official code repository for "Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning"
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
Official implementation of TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
Simulating the Real World: Survey & Resources, which contains our survey "Simulating the Real World: A Unified Survey of Multimodal Generative Models" and Awesome-Text2X-Resources. Watch this repos…
[CVPR 2025] A Hierarchical Movie Level Dataset for Long Video Generation
Physical laws underpin all existence, and harnessing them for generative modeling opens boundless possibilities for advancing science and shaping the future!
[CVPR 2025] GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
[CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
[ECCV2024] BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling
[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
[CVPR 2025] MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
Wan: Open and Advanced Large-Scale Video Generative Models
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
These scripts are used to download the RealEstate10K dataset.
Original implementation of "Radiant Foam: Real-Time Differentiable Ray Tracing"
The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
[CVPR 2025] VideoWorld is a simple generative model that learns purely from unlabeled videos—much like how babies learn by observing their environment.
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models