This is the official repository for our paper Meta Prompting, which has been accepted for publication at ECCV 2024.
In this paper, we present Meta-Prompting for Visual Recognition (MPVR), a method to effectively take humans out of the loop and completely automate the prompt generation process for zero-shot recognition. Taking as input only minimal information about the target task, in the form of its short natural language description, and a list of associated class labels, MPVR automatically produces a diverse set of category-specific prompts resulting in a strong zero-shot classifier. MPVR generalizes effectively across various popular zero-shot image recognition benchmarks belonging to widely different domains when tested with multiple LLMs and VLMs. For example, MPVR obtains a zero-shot recognition improvement over CLIP by up to 19.8% and 18.2% (5.0% and 4.5% on average over 20 datasets) leveraging GPT and Mixtral LLMs, respectively.
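For intuition, the sketch below (not the repository's code) illustrates the standard prompt-ensembling recipe that such LLM-generated, category-specific prompts feed into: each class gets one classifier weight, obtained by averaging the CLIP text embeddings of all its prompts. It uses the OpenAI `clip` package, and the class names and prompts are made up for illustration.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Toy stand-in for the LLM-generated, category-specific prompts
# (MPVR produces these automatically, at much larger scale per dataset).
prompts_per_class = {
    "forest": [
        "a satellite photo of a dense forest seen from above",
        "an aerial view of woodland with a closed tree canopy",
    ],
    "highway": [
        "a satellite photo of a multi-lane highway cutting through fields",
        "an aerial image of a long asphalt road with lane markings",
    ],
}

# Build one classifier weight per class by averaging the normalized
# text embeddings of all its prompts.
with torch.no_grad():
    class_weights = []
    for class_name, prompts in prompts_per_class.items():
        tokens = clip.tokenize(prompts).to(device)
        text_feats = model.encode_text(tokens).float()
        text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
        mean_feat = text_feats.mean(dim=0)
        class_weights.append(mean_feat / mean_feat.norm())
    classifier = torch.stack(class_weights, dim=1)  # (embed_dim, num_classes)

# At inference time, normalized image features are compared against the
# classifier: logits = image_features @ classifier; the argmax gives the class.
```

The evaluation scripts below apply this idea at scale, using the generated prompts shipped with the repository.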
Our code is built upon the official CoOp codebase. As a first step, install the `dassl` library (located under `Meta-Prompting/`) in your environment by following the instructions here.
To install all other dependencies, run the following command after activating your environment:
pip install -r requirements.txt
Under `Meta-Prompting/`, first create an empty data folder:
mkdir data
Then download and structure your datasets according to the instructions provided in the official CoOp repository. Most of the datasets are already implemented in their codebase. For the other datasets, download them from their official sources and structure them in the same way as the existing datasets in the CoOp codebase. For convenience, we provide the download links for the remaining datasets here.
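Once downloaded, each dataset is expected to live in its own sub-directory under `data/`. A rough sanity check might look like the sketch below; it assumes the folder names match the dataset identifiers used in the commands further down, while the authoritative naming and inner structure come from the CoOp instructions.

```python
from pathlib import Path

# Assumed folder names (taken from the dataset identifiers used in the
# evaluation commands); the CoOp instructions define the exact layout.
EXPECTED = [
    "eurosat", "imagenet_r", "oxford_flowers", "imagenet_sketch", "dtd",
    "fgvc_aircraft", "food101", "k400", "caltech101", "places365", "cubs",
    "imagenet", "stanford_cars", "sun397", "imagenetv2", "cifar10",
    "cifar100", "oxford_pets", "ucf101", "resisc",
]

data_root = Path("data")
missing = [name for name in EXPECTED if not (data_root / name).is_dir()]
print("All dataset folders present." if not missing else "Missing dataset folders:")
for name in missing:
    print(f"  - {name}")
```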
The 2.5M category-level VLM prompts for all 20 datasets are provided in the `Meta-Prompting/descriptions` directory. To generate the VLM prompts yourself, run the per-dataset scripts in the `Meta-Prompting/generate` directory.
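A heavily simplified sketch of what such a generation step can look like is shown below; it assumes the OpenAI Python client and a hypothetical meta prompt, while the actual meta prompts, models, and parsing logic are defined by the per-dataset scripts in `Meta-Prompting/generate`.

```python
# Illustrative sketch only -- not the repository's generation code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

task_description = "satellite photos of land use and land cover"  # e.g. EuroSAT
class_names = ["forest", "highway", "river"]

prompts_per_class = {}
for name in class_names:
    # Hypothetical meta prompt combining the task description and a class label.
    meta_prompt = (
        f"The task is zero-shot classification of {task_description}. "
        f"Write 5 short, diverse image descriptions (one per line) that a "
        f"photo of the category '{name}' could match."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": meta_prompt}],
        temperature=0.7,
    )
    lines = response.choices[0].message.content.strip().splitlines()
    prompts_per_class[name] = [l.strip("-*• ").strip() for l in lines if l.strip()]

print(prompts_per_class)
```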
In the following, we provide instructions to obtain the baseline and MPVR results on all 20 datasets for every model used in our paper.
- To get the baseline results (with the default `a photo of a {}` template) for all 20 datasets, run the following command:
bash scripts/zero_shot.sh s_temp none clip_b32 eurosat imagenet_r \
oxford_flowers imagenet_sketch dtd fgvc_aircraft food101 k400 caltech101 \
places365 cubs imagenet stanford_cars sun397 imagenetv2 cifar10 cifar100 \
oxford_pets ucf101 resisc
- To get the baseline results (with dataset-specific templates) for 20 datasets, run the following command:
bash scripts/zero_shot.sh ds_temp none clip_b32 eurosat imagenet_r \
oxford_flowers imagenet_sketch dtd fgvc_aircraft food101 k400 caltech101 \
places365 cubs imagenet stanford_cars sun397 imagenetv2 cifar10 cifar100 \
oxford_pets ucf101 resisc
- To get the MPVR results (with GPT prompts) for 20 datasets, run the following command:
bash scripts/zero_shot.sh mpvr gpt clip_b32 eurosat imagenet_r \
oxford_flowers imagenet_sketch dtd fgvc_aircraft food101 k400 caltech101 \
places365 cubs imagenet stanford_cars sun397 imagenetv2 cifar10 cifar100 \
oxford_pets ucf101 resisc
- To get the MPVR results (with Mixtral prompts) for 20 datasets, run the following command:
bash scripts/zero_shot.sh mpvr mixtral clip_b32 eurosat imagenet_r \
oxford_flowers imagenet_sketch dtd fgvc_aircraft food101 k400 caltech101 \
places365 cubs imagenet stanford_cars sun397 imagenetv2 cifar10 cifar100 \
oxford_pets ucf101 resisc
In the above commands, replace `clip_b32` with the desired model from the following list:
- `clip_b32` (OpenAI)
- `clip_b16` (OpenAI)
- `clip_l14` (OpenAI)
- `metaclip_b32` (MetaCLIP)
- `metaclip_b16` (MetaCLIP)
- `metaclip_l14` (MetaCLIP)
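For example, assuming the script accepts any subset of the dataset names, the following would evaluate MPVR with GPT prompts using the `metaclip_b16` backbone on three of the datasets:
bash scripts/zero_shot.sh mpvr gpt metaclip_b16 eurosat dtd oxford_flowers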
@inproceedings{mirza2024mpvr,
author = {Mirza, M. Jehanzeb and Karlinsky, Leonid and Lin, Wei and Doveh, Sivan and
Micorek, Jakub and Kozinski, Mateusz and Kuehne, Hilde and Possegger, Horst},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
title = {{Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs}},
year = {2024}
}