Commit: clean up codebase
Yifei Ming committed Dec 1, 2022
1 parent 17b15a5 commit de3e05d
Showing 4 changed files with 65 additions and 85 deletions.
4 changes: 2 additions & 2 deletions .gitignore
@@ -30,8 +30,8 @@ figures/
 img_templates/
 img_templates/all_feat/
 #results
-datasets/
-datasets_inst/
+datasets
+datasets_inst
 results/
 linear_probe_logs/
 train_results/
146 changes: 63 additions & 83 deletions README.md
# Delving into Out-of-Distribution Detection with Vision-Language Representations

This codebase provides a PyTorch implementation for the NeurIPS 2022 paper *Delving into Out-of-Distribution Detection with Vision-Language Representations*.

### Abstract

Recognizing out-of-distribution (OOD) samples is critical for machine learning systems deployed in the open world. The vast majority of OOD detection methods are driven by a single modality (e.g., either vision or language), leaving the rich information in multi-modal representations untapped. Inspired by the recent success of vision-language pre-training, this paper enriches the landscape of OOD detection from a single-modal to a multi-modal regime. Particularly, we propose Maximum Concept Matching (MCM), a simple yet effective zero-shot OOD detection method based on aligning visual features with textual concepts. We contribute in-depth analysis and theoretical insights to understand the effectiveness of MCM. Extensive experiments demonstrate that MCM achieves superior performance on a wide variety of real-world tasks. MCM with vision-language features outperforms a common baseline with pure visual features on a hard OOD task with semantically similar classes by 13.1% (AUROC).

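For intuition, MCM scores a test input by the maximum softmax probability over temperature-scaled cosine similarities between the image feature and the text embeddings of the in-distribution concept (class-name) prompts; the notation below is ours rather than quoted verbatim from the paper. With image encoder $f$, text encoder $g$, concept prompts $t_1, \dots, t_K$, and temperature $\tau$:

$$
S_{\mathrm{MCM}}(x) = \max_{i \in \{1,\dots,K\}} \frac{\exp\left(\cos\left(f(x), g(t_i)\right)/\tau\right)}{\sum_{j=1}^{K} \exp\left(\cos\left(f(x), g(t_j)\right)/\tau\right)}
$$

Inputs with a low maximum concept-matching score are flagged as OOD.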
### Illustration

![Arch_figure](figures/Arch_figure.png)



# Setup

## Required Packages

Our experiments are conducted on Ubuntu Linux 20.04 with Python 3.8 and PyTorch 1.10. In addition, the following packages are required (an example environment setup is sketched after the list):

- [transformers](https://huggingface.co/docs/transformers/installation)
- scipy
- matplotlib
- seaborn

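A minimal environment setup is sketched below; the environment name, exact package versions, and CUDA toolkit version are illustrative assumptions and should be adapted to your machine.

```sh
# Sketch of an environment setup; names and versions are illustrative assumptions.
conda create -n mcm python=3.8 -y
conda activate mcm

# Install PyTorch 1.10 with torchvision; choose the CUDA toolkit that matches your driver.
conda install pytorch==1.10.0 torchvision==0.11.0 cudatoolkit=11.3 -c pytorch -c conda-forge

# Install the remaining dependencies listed above.
pip install transformers scipy matplotlib seaborn
```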
## Checkpoints

We use the publicly available checkpoints from Hugging Face where the ViT model is pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k. For example, the checkpoint for ViT-B is available [here](https://huggingface.co/google/vit-base-patch16-224).

For CLIP models, our reported results are based on checkpoints provided by Hugging Face for [CLIP-B](https://huggingface.co/openai/clip-vit-base-patch16) and [CLIP-L](https://huggingface.co/openai/clip-vit-large-patch14). Similar results can be obtained with the checkpoints released in the [OpenAI CLIP](https://github.com/openai/CLIP) codebase.


# Data Preparation

For complete information, refer to Appendix B in the paper. The default dataset location is `./datasets/`, which can be changed in `settings.yaml`.

## In-distribution Datasets

We consider the following (in-distribution) datasets:

- [`CUB-200`](http://www.vision.caltech.edu/datasets/cub_200_2011/), [`Stanford-Cars`](http://ai.stanford.edu/~jkrause/cars/car_dataset.html), [`Food-101`](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/), [`Oxford-Pet`](https://www.robots.ox.ac.uk/~vgg/data/pets/)
- `ImageNet-1k`, `ImageNet-10`, `ImageNet-20`, `ImageNet-100`

The ImageNet-1k dataset (ILSVRC-2012) can be downloaded [here](https://image-net.org/challenges/LSVRC/2012/index.php#). ImageNet-10, ImageNet-20, and ImageNet-100 can be generated from the class names and IDs provided in `data/ImageNet10/ImageNet-10-classlist.csv`, `data/ImageNet20/ImageNet-20-classlist.csv`, and `data/ImageNet100/class_list.txt`, respectively; an illustrative construction is sketched below. The other datasets will be downloaded automatically.

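As an illustration only, one way to materialize ImageNet-10 from a full ImageNet-1k copy is to symlink the listed synset folders; the class-list column layout assumed below is hypothetical, so adjust the field index (and the validation-set layout) to match the actual files.

```sh
# Hypothetical sketch: build ImageNet-10 by symlinking synset folders from the full ImageNet-1k copy.
# Assumes the second CSV column holds the WordNet ID (e.g., n01440764) and that val/ is organized by synset.
mkdir -p datasets/ImageNet10/val
for wnid in $(tail -n +2 data/ImageNet10/ImageNet-10-classlist.csv | cut -d',' -f2); do
  ln -s "$(pwd)/datasets/ImageNet/val/${wnid}" "datasets/ImageNet10/val/${wnid}"
done
```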
## Out-of-Distribution Datasets


We use the large-scale OOD datasets [iNaturalist](https://arxiv.org/abs/1707.06642), [SUN](https://vision.princeton.edu/projects/2010/SUN/), [Places](https://arxiv.org/abs/1610.02055), and [Texture](https://arxiv.org/abs/1311.3618) curated by [Huang et al. 2021](https://arxiv.org/abs/2105.01879). Please follow the instructions in this [repository](https://github.com/deeplearning-wisc/large_scale_ood#out-of-distribution-dataset) to download the subsampled datasets, in which classes that semantically overlap with ImageNet-1k have been removed.

The overall file structure:

```
MCM
|-- datasets
    |-- ImageNet
    |-- ImageNet10
    |-- ImageNet20
    |-- CUB-200
    |-- Food-101
    |-- iNaturalist
    ...
```

# Quick Start

The main script for evaluating OOD detection performance is `eval_ood_detection.py`. Its arguments are listed below, and a sample invocation follows.

- `--name`: A unique ID for the experiment; it can be any string
- `--score`: The OOD detection score, which accepts any of the following:
  - `MCM`: Maximum Concept Matching score
  - `energy`: The [Energy score](https://proceedings.neurips.cc/paper/2020/hash/f5496252609c43eb8a3d147ab9b9c006-Abstract.html)
  - `max-logit`: Max Logit score (i.e., cosine similarity without softmax)
  - `entropy`: Negative entropy of softmax-scaled cosine similarities
  - `var`: Variance of cosine similarities
- `--seed`: A random seed for the experiments
- `--gpu`: The index of the GPU to use. For example, `--gpu=0`
- `--in_dataset`: The in-distribution dataset
  - Accepts: `ImageNet`, `ImageNet10`, `ImageNet20`, `ImageNet100`, `bird200`, `car196`, `flower102`, `food101`, `pet37`
- `-b`, `--batch_size`: Mini-batch size
- `--CLIP_ckpt`: Specifies the pretrained CLIP encoder to use
  - Accepts: `ViT-B/32`, `ViT-B/16`, `ViT-L/14`

The OOD detection results will be generated and stored in `results/in_dataset/score/CLIP_ckpt/name/`.

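For reference, a direct invocation might look like the following; the argument values are illustrative choices rather than prescribed defaults.

```sh
# Example: zero-shot MCM on ImageNet-1k with CLIP ViT-B/16 (values are illustrative).
python eval_ood_detection.py \
    --name=eval_ood \
    --seed=4 \
    --gpu=0 \
    --in_dataset=ImageNet \
    --CLIP_ckpt=ViT-B/16 \
    --score=MCM \
    -b 512
```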

We provide bash scripts to help reproduce the numerical results of our paper and facilitate future research. For example, to evaluate the performance of the MCM score on ImageNet-1k with the experiment name `eval_ood`:

```sh
sh scripts/eval_mcm.sh eval_ood ImageNet MCM
```

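The same pattern should carry over to the other in-distribution datasets, assuming the script's positional arguments are the experiment name, the in-distribution dataset, and the score (as in the example above). For instance:

```sh
# Assumed positional arguments: <experiment name> <in-distribution dataset> <score>.
sh scripts/eval_mcm.sh eval_ood ImageNet10 MCM
```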

### Citation

If you find our work useful, please consider citing our paper:

```
@inproceedings{ming2022delving,
  title={Delving into Out-of-Distribution Detection with Vision-Language Representations},
  author={Ming, Yifei and Cai, Ziyang and Gu, Jiuxiang and Sun, Yiyou and Li, Wei and Li, Yixuan},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}
```
File renamed without changes.
File renamed without changes.
