ℹ️ The original code publication can be accessed under the version tag v.0.1.0. The instructions here describe how to reproduce the results with the current benchmark version.
For installation and general usage, please follow the FD-Shifts README.
If you use this benchmark, please cite the paper:

```bibtex
@inproceedings{
jaeger2023a,
title={A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification},
author={Paul F Jaeger and Carsten Tim L{\"u}th and Lukas Klein and Till J. Bungert},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=YnkGMIh0gvX}
}
```
For the predefined experiments, we expect the data to be in the following folder structure relative to the directory set in `$DATASET_ROOT_DIR`:
```
<$DATASET_ROOT_DIR>
├── breeds
│   └── ILSVRC ⇒ ../imagenet/ILSVRC
├── imagenet
│   └── ILSVRC
├── cifar10
├── cifar100
├── corrupt_cifar10
├── corrupt_cifar100
├── svhn
├── tinyimagenet
├── tinyimagenet_resize
├── wilds_animals
│   └── iwildcam_v2.0
└── wilds_camelyon
    └── camelyon17_v1.0
```
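The `breeds` entry is a relative symlink into the ImageNet folder, so BREEDS reuses the ImageNet images without duplicating them. A minimal sketch of creating that link with standard shell tools (the `$HOME/datasets` fallback is only an illustrative assumption, not a path the benchmark prescribes):

```shell
# Assumes $DATASET_ROOT_DIR is set; falls back to an example path otherwise.
DATASET_ROOT_DIR="${DATASET_ROOT_DIR:-$HOME/datasets}"

# Create the ImageNet folder and the breeds folder that links into it.
mkdir -p "$DATASET_ROOT_DIR/imagenet/ILSVRC" "$DATASET_ROOT_DIR/breeds"

# BREEDS reuses the ImageNet images via a relative symlink.
# -n avoids descending into an existing link; -f replaces it if present.
ln -sfn ../imagenet/ILSVRC "$DATASET_ROOT_DIR/breeds/ILSVRC"
```

Because the link target is relative (`../imagenet/ILSVRC`), the whole dataset root can be moved without the link breaking.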
For information on where to download these datasets and how to prepare them, please check out the dataset documentation.
To get a list of the fully qualified names of all experiments in the paper, use

```shell
fd-shifts list-experiments --custom-filter=iclr2023
```
To reproduce all results of the paper, run the three stages in order:

```shell
fd-shifts launch --mode=train --custom-filter=iclr2023
fd-shifts launch --mode=test --custom-filter=iclr2023
fd-shifts launch --mode=analysis --custom-filter=iclr2023
```
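The three launch commands differ only in `--mode`, so they can be written as a single loop over the stages. The sketch below prints each command via `echo` so the sequence can be sanity-checked without launching any jobs; drop the `echo` for a real run:

```shell
# Print the launch command for each stage, in pipeline order.
# Drop `echo` to actually submit the runs.
for mode in train test analysis; do
    echo fd-shifts launch --mode="$mode" --custom-filter=iclr2023
done
```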
All pretrained model weights used for the benchmark can be found on Zenodo under the following links:
- iWildCam-2020-Wilds
- iWildCam-2020-Wilds (OpenSet Training)
- BREEDS-ENTITY-13
- CAMELYON-17-Wilds
- CIFAR-100
- CIFAR-100 (superclasses)
- CIFAR-10
- SVHN
- SVHN (OpenSet Training)
To aggregate the analysis results into the final report, run

```shell
fd-shifts report
```