This repository contains the code for our NeurIPS 2024 paper *Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models*. [Paper]
We tested our codebase with PyTorch 2.1.1 and CUDA 12.1. Please install the PyTorch and CUDA versions that match your computational resources, then install the remaining required packages by running `pip install -r requirements.txt`. Finally, install the info-nce-pytorch package following https://github.com/RElbers/info-nce-pytorch.
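The setup steps above can be consolidated into a short install sketch. This is an illustrative fragment, not part of the repository: the environment name `dpe` and the Python version are assumptions, and the PyTorch index URL should be adjusted to your CUDA version.

```shell
# Hypothetical setup sketch; adjust versions to your hardware.
conda create -n dpe python=3.10 -y
conda activate dpe
# PyTorch 2.1.1 built for CUDA 12.1 (change the index URL for other CUDA versions)
pip install torch==2.1.1 torchvision --index-url https://download.pytorch.org/whl/cu121
# Remaining dependencies listed in the repository
pip install -r requirements.txt
# InfoNCE loss package from https://github.com/RElbers/info-nce-pytorch
pip install info-nce-pytorch
```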
To set up all required datasets, please follow the guidance in DATASETS.md, which covers the installation steps for both benchmarks.
To run the code, you can execute one of the following four bash scripts:
- ResNet-50: Run DPE on the OOD Benchmark using the ResNet-50 model: `bash ./scripts/run_ood_benchmark_rn50.sh`
- ViT-B/16: Run DPE on the OOD Benchmark using the ViT-B/16 model: `bash ./scripts/run_ood_benchmark_vit.sh`
- ResNet-50: Run DPE on the Cross-Domain Benchmark using the ResNet-50 model: `bash ./scripts/run_cd_benchmark_rn50.sh`
- ViT-B/16: Run DPE on the Cross-Domain Benchmark using the ViT-B/16 model: `bash ./scripts/run_cd_benchmark_vit.sh`
In each bash script, you can modify the following arguments: (1) `--datasets` to specify the datasets, (2) `--backbone` to specify the backbone model (`RN50` or `ViT-B/16`), and (3) `--coop` to enable the prompts learned by CoOp. We use wandb to track the results; if you wish to deactivate this feature, simply omit the `--wandb-log` argument.
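For illustration, a direct invocation with these arguments might look like the sketch below. The entry-point script name (`main.py`) and the dataset string are assumptions; the actual values are defined inside the provided bash scripts, which should be treated as the source of truth.

```shell
# Hypothetical invocation; only the flags (--datasets, --backbone, --coop,
# --wandb-log) are documented above -- script name and dataset IDs are assumed.
python main.py \
    --datasets imagenet \
    --backbone ViT-B/16 \
    --coop \
    --wandb-log
```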
Our codebase is adapted from Tip-Adapter, CLIP, TDA, TPT, and CuPL. We thank the authors for releasing their code!
If you have any questions, please contact us at cezhang@cs.cmu.edu.
If you find this code useful, please consider citing our work:
```bibtex
@article{zhang2024dual,
  title={Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models},
  author={Zhang, Ce and Stepputtis, Simon and Sycara, Katia and Xie, Yaqi},
  journal={arXiv preprint arXiv:2410.12790},
  year={2024}
}
```