Recognizing out-of-distribution (OOD) samples is critical for machine learning systems deployed in the open world. The vast majority of OOD detection methods are driven by a single modality (e.g., either vision or language), leaving the rich information in multi-modal representations untapped. Inspired by the recent success of vision-language pre-training, this paper enriches the landscape of OOD detection from a single-modal to a multi-modal regime. Particularly, we propose Maximum Concept Matching (MCM), a simple yet effective zero-shot OOD detection method based on aligning visual features with textual concepts. We contribute in-depth analysis and theoretical insights to understand the effectiveness of MCM. Extensive experiments demonstrate that our proposed MCM achieves superior performance on a wide variety of real-world tasks. MCM with vision-language features outperforms a common baseline with pure visual features on a hard OOD task with semantically similar classes by 56.60% (FPR95).
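To make the method concrete, below is a minimal sketch of the zero-shot MCM score, assuming the OpenAI `clip` package; the class names, prompt template, temperature, and file name are illustrative, not the repository's exact implementation.

```python
# Minimal MCM sketch (illustrative; see eval_ood_detection.py for the real implementation).
import torch
import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/16")
model.eval()

class_names = ["dog", "cat", "bird"]  # in-distribution concepts (example)
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names])
image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(prompts)

# Cosine similarity between the image feature and each textual concept
img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
sims = (img_feat @ txt_feat.T).squeeze(0)

# MCM: the maximum of the temperature-scaled softmax over concept similarities.
# A low MCM score flags the input as OOD.
tau = 1.0  # temperature (an assumption; see the paper for the tuned value)
mcm_score = torch.softmax(sims / tau, dim=-1).max().item()
```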
```bash
conda create -n clip-ood python=3.7 -y
conda activate clip-ood
# Install the GPU version of PyTorch; please verify your own CUDA toolkit version
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
# Install dependencies
pip install -r requirements.txt
```
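After installation, a quick sanity check (optional, not part of the repository) confirms the pinned build and GPU visibility:

```python
# Optional environment check: verify the pinned PyTorch build sees the GPU.
import torch
print(torch.__version__)           # expect 1.8.0
print(torch.cuda.is_available())   # expect True if the CUDA toolkit matches
```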
For complete information, refer to Appendix B.3 of the paper. The default dataset location is `./datasets/`, which can be changed in `settings.yaml`.
Please download the full ImageNet dataset from the link; the other datasets are downloaded automatically when the experiments run.
We use the large-scale OOD datasets curated by Huang et al. (2021). Please follow the instructions in that repository to download the cleaned datasets, in which overlaps with ImageNet have been removed.
The overall file structure:

```
CLIP_OOD
|-- datasets
    |-- ImageNet
    |-- CUB-200
    |-- Food-101
    |-- iNaturalist
    ...
```
The main entry point for running OOD detection experiments is `eval_ood_detection.py`. Here is the list of arguments:
- `--name`: A unique ID for the experiment; can be any string.
- `--seed`: Random seed for the experiments. (We used 4.)
- `--gpu`: The indices of the GPUs to use, e.g. `--gpu=0 1 2`.
- `--in_dataset`: The in-distribution dataset. Accepts: `CIFAR-10`, `CIFAR-100`, `ImageNet`, `ImageNet10`, `ImageNet20`, `ImageNet100`, `bird200`, `car196`, `flower102`, `food101`, `pet37`.
- `-b`, `--batch_size`: Mini-batch size: 1 for the `nouns` score, 75 for `odin_logits`, 512 for all other scores (CLIP).
- `--epoch`: Number of epochs to run when training a linear probe.
- `--model`: The model architecture to extract features with. Accepts: `CLIP`, `CLIP-Linear`, `ViT`, `ViT-Linear`. (`-Linear` denotes the linear-probe version of the model.)
- `--CLIP_variant`: Specifies the pretrained CLIP encoder to use. Accepts: `ViT-B/32`, `ViT-B/16`, `RN50x4`, `ViT-L/14`.
- `--classifier_ckpt`: Specifies the linear-probe classifier checkpoint to load.
- `--score`: The OOD detection score (illustrative definitions are sketched after this list). Accepts any of the following:
  - `MCM`: Maximum Concept Matching, our main method; corresponds to Tables 1 and 2 in our paper.
  - `Maha`: Mahalanobis score; corresponds to Figure 5 in the paper. The first run generates the class-wise means and precision matrices used in the calculation.
  - `energy`: Energy-based score; corresponds to Table 6 in our paper.
  - `max-logit`: Cosine similarity without softmax.
  - `entropy`, `var`, `scaled`: Respectively, the negative entropy of the softmax-scaled cosine similarities, the variance of the cosine similarities, and the scaled difference between the largest and second-largest cosine similarities; correspond to Table 7 in our paper.
  - `MSP`: Maximum Softmax Probability, a classic baseline score.
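For intuition, the sketch below gives illustrative definitions of these scores in terms of the cosine similarities between an image feature and the concept features. The exact temperature and scaling in the repository may differ, and `MSP` is omitted since it is computed from a classifier's softmax output rather than from CLIP similarities.

```python
# Illustrative score definitions over cosine similarities `sims` between one
# image feature and each concept feature (scaling/temperature are assumptions).
import torch

def ood_scores(sims: torch.Tensor, tau: float = 1.0) -> dict:
    probs = torch.softmax(sims / tau, dim=-1)
    top2 = sims.topk(2).values
    return {
        "MCM": probs.max().item(),                      # max softmax-scaled similarity
        "max-logit": sims.max().item(),                 # cosine similarity, no softmax
        "energy": (tau * torch.logsumexp(sims / tau, dim=-1)).item(),
        "entropy": (probs * probs.log()).sum().item(),  # negative entropy
        "var": sims.var().item(),
        "scaled": ((top2[0] - top2[1]) / tau).item(),   # gap between top-2 similarities
    }

# Example with three in-distribution concepts:
print(ood_scores(torch.tensor([0.31, 0.24, 0.22])))
```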
The results are stored as CSV files in the folder `./results/`.
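For instance, a results file can be inspected with pandas (the file name below is hypothetical; it depends on the `--name` of the experiment):

```python
# Hypothetical example: load a results CSV produced by an experiment run.
import pandas as pd

df = pd.read_csv("./results/my_experiment.csv")  # file name is an assumption
print(df.head())
```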
[TODO]
Here are the commands to reproduce the numerical results of our paper. Note that we ran our experiments on a single RTX 2080 GPU.
```bash
# MCM with various in-distribution datasets
python eval_ood_detection.py \
    --in_dataset={ImageNet10, ImageNet20, ImageNet100, bird200, car196, flower102, food101, pet37} \
    --out_dataset=iNat SUN Places DTD \
    --model=CLIP --CLIP_variant=ViT-B/16 \
    --score=MCM \
    --batch_size=512
```
```bash
# Zero-shot MCM on ImageNet
python eval_ood_detection.py \
    --in_dataset=ImageNet --model=CLIP --CLIP_variant={ViT-B/16, ViT-L/14} \
    --score=MCM \
    --batch_size=512
```
```bash
# Baselines from Fort et al.: Mahalanobis and MSP with ViT features
python eval_ood_detection.py \
    --in_dataset=ImageNet --model=ViT --CLIP_variant={ViT-B/16, ViT-L/14} \
    --score={Maha, MSP} \
    --batch_size=512
```
```bash
# Hard OOD tasks with semantically similar classes and spurious OOD
python eval_ood_detection.py \
    --in_dataset={ImageNet10, ImageNet20, Waterbirds} \
    --out_dataset={ImageNet20, ImageNet10, Waterbirds-Spurious-OOD} \
    --model=CLIP --CLIP_variant=ViT-B/16 \
    --score={MSP, Maha, MCM}
```