This repository contains the official source code for our paper:
Improving Cross-Modal Retrieval with Set of Diverse Embeddings
Dongwon Kim, Namyup Kim, and Suha Kwak
POSTECH CSE
CVPR (Highlight), Vancouver, 2023.
Parts of our code are adapted from the following repositories:
- https://github.com/yalesong/pvse
- https://github.com/fartashf/vsepp
- https://github.com/lucidrains/perceiver-pytorch
For now, the provided training script supports only the Faster R-CNN + bi-GRU setting on the COCO dataset. We use the dataset preparation scripts from https://github.com/kuanghuei/SCAN#download-data. Place the precomp folder under ./data/coco_butd, and the vocab file under ./vocab.
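The resulting layout can be sketched as follows; the paths are the ones named above, and the comments describe what is assumed to end up in each directory after running the SCAN download scripts:

```shell
# Create the directories the training script expects (paths from this README):
mkdir -p data/coco_butd/precomp vocab
# data/coco_butd/precomp/  <- precomputed Faster R-CNN features from the SCAN download
# vocab/                   <- vocabulary file from the SCAN download
ls -d data/coco_butd/precomp vocab
```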
You can install the requirements using conda:
conda create --name <env> --file requirements.txt
To train and evaluate on COCO, run
sh train_eval_coco.sh
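Since the script assumes the data layout above, a small preflight check can catch a missing download before a long run. This is a hypothetical helper, not part of the repository; the paths it checks are the ones stated in this README:

```shell
# Hypothetical preflight check: verify each required path exists before training.
check() {
  for p in "$@"; do
    [ -e "$p" ] || { echo "missing: $p"; return 1; }
  done
  echo "ok"
}
# Usage: check train_eval_coco.sh data/coco_butd/precomp vocab && sh train_eval_coco.sh
```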