[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Updated Aug 7, 2025 - Python
[CVPR 2024 Highlight] GenAD: Generalized Predictive Model for Autonomous Driving
[SIGGRAPH2025] Official repo for the paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control"
Generic PyTorch dataset implementation to load and augment VIDEOS for deep learning training loops.
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
[NeurIPS 2025] The official repository of "Sekai: A Video Dataset towards World Exploration"
[ACM MM 2025] HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
Surveillance Perspective Human Action Recognition Dataset: 7759 videos from 14 action classes, aggregated from multiple sources, all cropped spatio-temporally and filmed from a surveillance-camera-like position.
Tools for loading video datasets and applying transforms to video in PyTorch. You can load video files directly, without preprocessing.
[NeurIPS'23] The official implementation of the paper "Bitstream-corrupted Video Recovery: A Novel Benchmark Dataset and Method"
Official Code for VideoLT: Large-scale Long-tailed Video Recognition (ICCV 2021)
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
[AAAI 2023] AVCAffe: A Large Scale Audio-Visual Dataset of Cognitive Load and Affect for Remote Work
Official This-Is-My Dataset published in CVPR 2023
Improving Transfer Learning with a Dual Image and Video Transformer for Multi-label Movie Trailer Genre Classification
This repository contains code for extracting images and masks from a video segmentation dataset using the OpenCV library in Python.
This annotation tool is built to clean and create video datasets.
This repository contains the implementation of Enhanced Random Binary Multilevel Attention Network (ERBMA-Net), a novel framework for facial depression recognition. ERBMA-Net addresses key limitations in existing methods by introducing random binary convolutional filters for enhanced adaptability and multilevel attention mechanisms.