[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
This repository collects various multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also gathers useful tutorials and tools in these related domains.
Research and materials on hardware implementations of transformer models
The easiest way to fine-tune HuggingFace video classification models
[NeurIPS 2021 Spotlight] Official implementation of Long Short-Term Transformer for Online Action Detection
[NeurIPS 2022 Spotlight] VideoMAE for Action Detection
Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".
A demo project showing how to convert an MP4 video to MOV format using the Shotstack Ingest API.