The images for CC-Neg come from the ImageLabels split of CC-3M, which we prepare and provide here. Find the compressed file ccneg_images.zip in this directory, download it, and extract the images. Verify that the ccneg_images folder then has the following structure:
ccneg_images
|___ cc3m_subset_images_extracted_final
     |___ image1.jpg
     |___ image2.jpg
     ...
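The layout check can be automated with a small script. The following is a minimal sketch, not part of the repository: the function name `check_ccneg_images` is hypothetical, and the only things taken from the text above are the folder name cc3m_subset_images_extracted_final and the .jpg extension. It searches recursively, so it does not depend on exactly how deep the images sit.

```python
from pathlib import Path


def check_ccneg_images(root: str) -> int:
    """Return the number of .jpg files found under `root`.

    Raises FileNotFoundError if the expected subfolder is missing.
    `check_ccneg_images` is a hypothetical helper, not a repository function.
    """
    root_path = Path(root)
    subset = root_path / "cc3m_subset_images_extracted_final"
    if not subset.is_dir():
        raise FileNotFoundError(f"expected folder not found: {subset}")
    # Search recursively so the count does not depend on nesting depth.
    return sum(1 for _ in root_path.rglob("*.jpg"))
```

Calling `check_ccneg_images("ccneg_images")` from the directory that holds the extracted archive should return a non-zero count if extraction succeeded.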
The annotations, which contain the true caption and the negated (false) caption for each image in CC-Neg, can be downloaded from here. This file, named ccneg_preprocessed.pt, must be downloaded into this directory. The helper for using distractor images during fine-tuning, named distractor_image_mapping.pt, is provided here.
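Before wiring the files into training code, it can help to inspect what they contain. The sketch below assumes the .pt files can be read with `torch.load`; the internal layout of ccneg_preprocessed.pt and distractor_image_mapping.pt is not documented here, so the helper only reports the top-level type and size. The names `describe` and `load_and_describe` are hypothetical.

```python
from typing import Any


def describe(obj: Any) -> str:
    """Summarize a loaded object without assuming its layout."""
    if isinstance(obj, dict):
        keys = list(obj.keys())[:5]
        return f"dict with {len(obj)} keys, first keys: {keys}"
    if isinstance(obj, (list, tuple)):
        return f"{type(obj).__name__} with {len(obj)} items"
    return type(obj).__name__


def load_and_describe(path: str) -> str:
    """Load a .pt file on the CPU and summarize its top-level object."""
    # Imported here so `describe` stays usable without PyTorch installed.
    import torch

    # map_location="cpu" avoids needing a GPU just to inspect the file.
    payload = torch.load(path, map_location="cpu")
    return describe(payload)
```

For example, `load_and_describe("ccneg_preprocessed.pt")` prints a one-line summary. Recent PyTorch releases load with `weights_only=True` by default, which may reject files that store arbitrary Python objects; if that happens, passing `weights_only=False` is an option, but only for files from a trusted source.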
This directory is specified in the configs in src/configs/__init__.py, which are accessed by the code in the src/data folder. There, src/data/evaluation_datasets.py and src/data/finetuning_datasets.py use the configs to load the datasets. To fine-tune CLIP into CoN-CLIP, we use MS-COCO along with CC-Neg. The root folders for both datasets must be specified in src/configs/__init__.py. Be sure to check this before running any code that uses CC-Neg.
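A quick sanity check on the configured roots can catch path mistakes before a long run starts. This is a minimal sketch under stated assumptions: the variable names used in src/configs/__init__.py are not shown here, so the helper takes a plain mapping from a label to a path, and both the function name `missing_roots` and the labels in the usage comment are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def missing_roots(roots: Dict[str, str]) -> List[str]:
    """Return the labels whose configured path is not an existing directory."""
    return [name for name, path in roots.items() if not Path(path).is_dir()]


# Hypothetical usage; substitute the values set in src/configs/__init__.py:
# problems = missing_roots({"ccneg": "/path/to/ccneg", "coco": "/path/to/coco"})
# if problems:
#     raise SystemExit(f"Dataset roots not found: {problems}")
```

Returning the list of missing labels, rather than raising on the first one, reports every misconfigured root in a single pass.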