In this repo, we provide two things:
- Pre-extracted feature vectors obtained using Twelve Labs' video foundation model
- PyTorch evaluation code to evaluate and utilize the embeddings

We hope that (1) the published embeddings will help achieve high performance on various downstream tasks and prove valuable for research, and (2) the evaluation source code will serve as a solid baseline for researchers and developers studying video foundation models.

Please refer to our technical report for further details of the evaluation pipeline.
All results will be saved in the `./results` directory.
- Linear Probing
- Kinetics-400
- Something-Something-v2
- Moments-in-Time
- Diving 48
- K-Nearest-Neighbor
- Kinetics-400
- Something-Something-v2
- Moments-in-Time
- Diving 48
- Temporal Action Localization
- ActivityNet v1.3
- THUMOS14
- Temporal Action Segmentation
- 50Salads
- Breakfast
- GTEA
- Embedding Visualization
- Kinetics-400
- Something-Something-v2
- Moments-in-Time
- Diving 48
- Some of the benchmark folders are organized according to how they sample frames (`uniform` or `multi-clip`). If you enter the top-level folder of the dataset, or the directory corresponding to each sampling scheme, you will arrive at the location of the `train` or `train`/`val` folder. This directory is the `--embeddings_dir` for each downstream task (see the layout sketch below).
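  One possible layout, for illustration only (the dataset and sampling names below are assumptions; check the actual folders in this repo):

  ```
  kinetics-400/                  # dataset folder
  └── multi-clip/                # sampling scheme: `uniform` or `multi-clip`
      ├── train/
      │   ├── <video_id>.json
      │   ├── <video_id>_c.npy
      │   └── <video_id>_v.npy
      └── val/
  ```

  In this sketch, `kinetics-400/multi-clip/` would be the `--embeddings_dir`.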
- There are three files corresponding to each video:
  - `[video_id].json`: this JSON file contains the label corresponding to the video, as well as metadata about the duration, the number of frames, and the start and end times of each subclip. Exceptionally, the labels for temporal action segmentation come from external files rather than from this JSON file.
  - `[video_id]_c.npy`: this file contains embedding vectors for each subclip of the video, in the form (number of subclips) × (dimension).
  - `[video_id]_v.npy`: this file contains one embedding vector that represents the entire video. It is the same as `[video_id]_c.npy` for uniform sampling, or when only one clip is defined for the entire video.
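As a quick sanity check, the three files for a video can be loaded with the standard `json` module and NumPy. This is a minimal sketch, not part of the evaluation code: the video ID is a placeholder, and the JSON key names depend on the actual schema, so inspect a real file rather than relying on the names printed here.

```python
import json

import numpy as np

video_id = "abc123"  # placeholder; use an actual video ID from an embeddings folder

# Label and metadata (duration, number of frames, subclip start/end times).
# The exact key names are not assumed here; we just list whatever is present.
with open(f"{video_id}.json") as f:
    meta = json.load(f)
print("metadata keys:", sorted(meta.keys()))

# Per-subclip embeddings: shape (number of subclips, embedding dimension).
clip_embeddings = np.load(f"{video_id}_c.npy")
print("clip embeddings:", clip_embeddings.shape)

# One embedding vector for the entire video. For uniform sampling, or when
# only one clip is defined for the whole video, this matches the clip file.
video_embedding = np.load(f"{video_id}_v.npy")
print("video embedding:", video_embedding.shape)
```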
If you find this project helpful, please feel free to leave a star and cite our paper:
```bibtex
@inproceedings{twelvelabs2024twlv,
  title={TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models},
  author={Twelve Labs},
  year={2024}
}
```