This repository contains the official source code for our paper:
Improving Cross-Modal Retrieval with Diverse Set of Embeddings
Dongwon Kim, Namyup Kim, and Suha Kwak
POSTECH CSE
CVPR (Highlight), Vancouver, 2023.
Parts of our code are adapted from the following repositories:
- https://github.com/yalesong/pvse
- https://github.com/fartashf/vsepp
- https://github.com/lucidrains/perceiver-pytorch
For now, the provided training script covers only the Faster R-CNN + bi-GRU experimental setting on the COCO dataset. We use the dataset preparation scripts from https://github.com/kuanghuei/SCAN#download-data. Place the precomp folder and id_mapping.json under ./data/coco_butd, and the vocab file under ./vocab.
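As a sketch, the directory layout described above can be prepared as follows (the exact file names inside precomp come from the SCAN download and are not listed here; the comments below only restate the paths given above):

```shell
# Create the directories the training script expects
# (paths taken from the instructions above).
mkdir -p data/coco_butd vocab

# After downloading the SCAN data, the layout should look like:
#   data/coco_butd/precomp/        <- precomputed Faster R-CNN features
#   data/coco_butd/id_mapping.json
#   vocab/                          <- vocabulary file
ls data
```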
You can install the requirements using conda:

conda create --name <env> --file requirements.txt
conda activate <env>
Then run the training and evaluation script:

sh train_eval_coco.sh