This is the official implementation of MUSE, built on the CLIP4Clip model.
# PyTorch version
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
# From CLIP4Clip
pip install ftfy regex tqdm
pip install opencv-python boto3 requests pandas
Install causal_conv1d and mamba_ssm by following the installation instructions from Vim.
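Once the packages above are installed, a quick import check can catch a broken environment before training starts. The snippet below is a minimal sketch, not part of this repository: the module list is an assumption that maps each package to its usual import name (for example, `opencv-python` is imported as `cv2`), and `find_missing` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the packages installed above;
# adjust the list if your environment differs.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio",
    "ftfy", "regex", "tqdm",
    "cv2", "boto3", "requests", "pandas",
    "causal_conv1d", "mamba_ssm",
]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that each module can be located; it does not import them, so it will not detect a CUDA or compiler mismatch in causal_conv1d or mamba_ssm.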
For MSRVTT
The official data and video links can be found in link.
For convenience, you can also download the splits and captions with:
wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msrvtt_data.zip
The raw videos are also shared by the Frozen in Time project:
wget https://www.robots.ox.ac.uk/~maxbain/frozen-in-time/data/MSRVTT.zip
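After both archives are downloaded, they need to be unpacked before training. The following is a rough sketch of one way to extract them and sanity-check the result by counting files per extension. The `data` target directory and the `extract_and_summarize` helper are hypothetical; the layout expected by `train_msrvtt.sh` is not specified here, so adjust the paths to match the script.

```python
import zipfile
from collections import Counter
from pathlib import Path


def extract_and_summarize(archive_path, target_dir):
    """Extract a zip archive and count the extracted files by extension."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return Counter(p.suffix.lower() for p in target.rglob("*") if p.is_file())


if __name__ == "__main__":
    # "data" is a hypothetical target directory, not one mandated by the repo.
    for name in ("msrvtt_data.zip", "MSRVTT.zip"):
        if Path(name).exists():
            counts = extract_and_summarize(name, Path("data") / Path(name).stem)
            print(name, dict(counts))
        else:
            print(name, "not found; download it first")
```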
bash train_msrvtt.sh