SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
Updated Dec 5, 2025 - Python
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
Video Chain of Thought: code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench shows that fine-tuned video models consistently outperform strong VLMs on long-horizon spatial planning tasks.
[ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos
🎥 Generate videos with advanced multimodal reasoning to enhance understanding and interaction, pushing the boundaries of video content creation.