SpatialVID: A Large-Scale Video Dataset with Spatial Annotations (Python, updated Dec 5, 2025)
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
Video Chain of Thought: code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
🔥 An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks.
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Latest advances in (RL-based) multimodal reasoning and generation with multimodal large language models.
We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning. Our VR-Bench shows that fine-tuned video models consistently outperform strong VLMs on long-horizon spatial planning tasks.
[ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos
🎥 Generate videos with advanced multimodal reasoning to enhance understanding and interaction, pushing the boundaries of video content creation.