[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Updated Sep 23, 2024 - Python
Papers, code and datasets about deep learning and multi-modal learning for video analysis
[CVPR 2024 Highlight] GenAD: Generalized Predictive Model for Autonomous Driving & Foundation Models in Autonomous System
Generic PyTorch dataset implementation to load and augment VIDEOS for deep learning training loops.
Awesome papers & datasets specifically focused on long-term videos.
A dataset of 500,000 multimodal short videos with baseline models (TensorFlow 2.0).
Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
SoccerAct10 is a dataset of 10 different soccer actions, built from YouTube videos.
Surveillance Perspective Human Action Recognition Dataset: 7759 videos from 14 action classes, aggregated from multiple sources, all cropped spatio-temporally and filmed from a surveillance-camera-like position.
Tools for loading video datasets and applying video transforms in PyTorch. Video files can be loaded directly, without preprocessing.
🌱 Starter kit for working with the EPIC-KITCHENS-55 dataset for action recognition or anticipation
Official Code for VideoLT: Large-scale Long-tailed Video Recognition (ICCV 2021)
Official repository for the paper "Bitstream-corrupted Video Recovery: A Novel Benchmark Dataset and Method", accepted to the NeurIPS 2023 Datasets and Benchmarks Track
Keras 3 Implementation of Video Swin Transformers for 3D Video Modeling
Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
[AAAI 2023] AVCAffe: A Large Scale Audio-Visual Dataset of Cognitive Load and Affect for Remote Work
Official This-Is-My Dataset published in CVPR 2023
[NeurIPS'22] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Improving Transfer Learning with a Dual Image and Video Transformer for Multi-label Movie Trailer Genre Classification
Synthetically Generated Surveillance Perspective Human Action Recognition Dataset: 6901 videos from 10 action classes, generated by a 3D simulation, all cropped spatio-temporally and filmed from a surveillance-camera-like position.