Skin Deep Unlearning: Artefact and Instrument Debiasing in the Context of Melanoma Classification [PDF]
"Convolutional Neural Networks have demonstrated dermatologist-level performance in the classification of melanoma and other skin lesions, but prediction irregularities due to bias are an issue that should be addressed before widespread deployment is possible. In this work, we robustly remove bias and spurious variation from an automated melanoma classification pipeline using two leading bias 'unlearning' techniques: 'Learning Not to Learn' [1] (LNTL) and 'Turning a Blind Eye' [2] (TABE), as well as an additional hybrid of the two (CLGR) . We show that the biases introduced by surgical markings and rulers presented in previous studies can be reasonably mitigated using these bias removal methods. We also demonstrate the generalisation benefits of 'unlearning' spurious variation relating to the imaging instrument used to capture lesion images. The novel contributions of this work include a comparison of different debiasing techniques for artefact bias removal and the concept of instrument bias 'unlearning' for domain generalisation in melanoma detection. Our experimental results provide evidence that the effect of each of the aforementioned biases are notably reduced, with different debiasing techniques excelling at different tasks."
[Bevan and Atapour-Abarghouei, 2021]
Examples of artefacts seen in ISIC 2020 data. Top row shows images with surgical markings present, bottom row shows images with rulers present.
'Learning Not to Learn' architecture (left) and 'Turning a Blind Eye' architecture (right). Feature extractor, f, is implemented as a convolutional architecture such as ResNeXt or EfficientNet in this work. 'fc' denotes a fully connected layer.
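For readers unfamiliar with the two mechanisms: LNTL unlearns the bias signal by passing the auxiliary head's gradient through a gradient reversal layer (GRL), while TABE applies a confusion loss that pushes the auxiliary classifier towards a uniform prediction over the bias classes; CLGR combines the two. Below is a minimal PyTorch sketch of these components (`GradReverse` and `confusion_loss` are illustrative names, not the repository's actual API):

```python
# Minimal sketches of the two debiasing mechanisms (illustrative names,
# not the repository's actual API).
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer (GRL): identity on the forward pass,
    negated and scaled gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversing the gradient makes the feature extractor *hurt* the
        # auxiliary (bias) head, stripping bias information from the features.
        return -ctx.lambd * grad_output, None

def confusion_loss(aux_logits):
    """TABE-style confusion loss: cross-entropy against a uniform
    distribution over bias classes, minimised w.r.t. the feature extractor."""
    log_probs = F.log_softmax(aux_logits, dim=1)
    return -log_probs.mean()
```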
Python 3.9.6
CUDA Version 11.3
Nvidia Driver Version: 465.31
PyTorch 1.8.1
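To verify a local setup against these versions, a quick check (illustrative; not part of the repository):

```python
# Sanity-check the local environment against the versions listed above.
import torch

print(torch.__version__)          # expect 1.8.1
print(torch.version.cuda)         # CUDA toolkit PyTorch was built against
                                  # (may differ from the system CUDA 11.3)
print(torch.cuda.is_available())  # True if the Nvidia driver is set up correctly
```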
A free account must be created to download The Interactive Atlas of Dermoscopy, available at https://derm.cs.sfu.ca/Download.html. Place the `release_v0.zip` file into the `data/raw_images` directory (see below), from which it will be processed by the `download.py` script. The other datasets will be downloaded and processed automatically by the same script.
```
Melanoma-Bias
└───Data
│   └───csv
│   |   │ asan.csv
│   |   │ atlas.csv
│   |   │ ...
│   |
|   └───images
|   |
|   └───raw_images
|   |   release_v0.zip
|   ...
```
Run `download.py` to download, crop and resize the ISIC, ASAN, MClass clinical, MClass dermoscopic and Fitzpatrick17k datasets. Have patience, as it may take around an hour to complete. The 256x256 resized images are automatically placed into `data/images` as shown below. The manually downloaded Atlas data (`data/raw_images/release_v0.zip`) will also be processed by this script. Note that this script clears the `data/images` directory before populating it, so if you want to put other images in there, do so after running the `download.py` script.
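For reference, the resize step amounts to something like the following (a simplified sketch of what `download.py` does, not its actual implementation; the glob pattern and centre-crop strategy are assumptions):

```python
# Simplified sketch of the crop-and-resize step performed by download.py
# (illustrative only; the real script also downloads the datasets).
from pathlib import Path
from PIL import Image

SRC = Path("data/raw_images")          # assumed source of extracted images
DST = Path("data/images/atlas_256")
DST.mkdir(parents=True, exist_ok=True)

for img_path in SRC.glob("*.jpg"):
    img = Image.open(img_path).convert("RGB")
    # Centre-crop to a square, then resize to 256x256 (assumed strategy).
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img.resize((256, 256), Image.LANCZOS).save(DST / img_path.name)
```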
NOTE: The surgical markings/rulers test set from Heidelberg University [3] is not publicly available.
The data directory should now look as follows:
```
Melanoma-Bias
└───Data
│   └───csv
│   |   │ asan.csv
│   |   │ atlas.csv
│   |   │ ...
│   |
|   └───images
|   |   asan_256
|   |   atlas_256
|   |   isic_19_train_256
|   |   isic_20_train_256
|   |   MClassC_256
|   |   MClassD_256
|   ...
```
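A quick, illustrative way to confirm the layout before training (not part of the repository):

```python
# Verify the expected image directories exist after running download.py.
from pathlib import Path

expected = ["asan_256", "atlas_256", "isic_19_train_256",
            "isic_20_train_256", "MClassC_256", "MClassD_256"]
images = Path("data/images")
missing = [d for d in expected if not (images / d).is_dir()]
print("missing:", missing or "none")
```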
If you do wish to manually download the datasets, they are available at the following links:
ISIC 2020 data: https://www.kaggle.com/cdeotte/jpeg-melanoma-256x256
ISIC 2019/2018/2017 data: https://www.kaggle.com/cdeotte/jpeg-isic2019-256x256
Interactive Atlas of Dermoscopy: https://derm.cs.sfu.ca/Welcome.html
ASAN Test set: https://figshare.com/articles/code/Caffemodel_files_and_Python_Examples/5406223
MClassC/MClassD: https://skinclass.de/mclass/
Training commands for the main experiments from the paper are below. Please see `arguments.py` for the full range of arguments if you wish to devise alternative experiments. Test results (plots, logs and weights) will autosave into the `results` directory, in subdirectories specific to the test number. Please contact me if you require trained weights for any model in the paper.
Some useful arguments to tweak the below commands:
- Adjust `--CUDA_VISIBLE_DEVICES` and `--num-workers` to suit the available GPUs and CPU cores respectively on your machine.
- To run in debug mode, add `--DEBUG` (limits epochs to 3 batches).
- To change the random seed (default 0), use the `--seed` argument.
- To run on different architectures, use the `--arch` argument to choose from `resnext101`, `enet`, `resnet101`, `densenet` or `inception` (default `resnext101`).
- Add `--cv` to perform cross-validation.
- Add `--test-only` if you wish to load weights and run testing only (loads the weights of whatever `--test-no` argument is passed).
Instrument debiasing for domain generalisation:
```
Baseline: python train.py --test-no 9 --n-epochs 4 --CUDA_VISIBLE_DEVICES 0,1
LNTL:     python train.py --test-no 10 --n-epochs 4 --debias-config LNTL --GRL --instrument --CUDA_VISIBLE_DEVICES 0,1 --num-aux 8
TABE:     python train.py --test-no 11 --n-epochs 4 --debias-config TABE --instrument --CUDA_VISIBLE_DEVICES 0,1 --num-aux 8
CLGR:     python train.py --test-no 12 --n-epochs 4 --debias-config TABE --GRL --instrument --CUDA_VISIBLE_DEVICES 0,1 --num-aux 8
```
Surgical marking bias removal (REQUIRES PRIVATE HEIDELBERG UNIVERSITY DATASET):
```
Baseline: python train.py --test-no 1 --arch enet --enet-type efficientnet_b3 --n-epochs 15 --marked --CUDA_VISIBLE_DEVICES 0,1 --skew --heid-test_marked
LNTL:     python train.py --test-no 2 --arch enet --enet-type efficientnet_b3 --n-epochs 15 --debias-config LNTL --GRL --marked --CUDA_VISIBLE_DEVICES 0,1 --skew --heid-test_marked
TABE:     python train.py --test-no 3 --arch enet --enet-type efficientnet_b3 --n-epochs 15 --debias-config TABE --marked --CUDA_VISIBLE_DEVICES 0,1 --skew --heid-test_marked
CLGR:     python train.py --test-no 4 --arch enet --enet-type efficientnet_b3 --n-epochs 15 --debias-config TABE --GRL --marked --CUDA_VISIBLE_DEVICES 0,1 --skew --heid-test_marked
```
Ruler bias removal (REQUIRES PRIVATE HEIDELBERG UNIVERSITY DATASET):
```
Baseline: python train.py --test-no 5 --arch enet --enet-type efficientnet_b3 --n-epochs 15 --rulers --CUDA_VISIBLE_DEVICES 0,1 --skew --heid-test_rulers
LNTL:     python train.py --test-no 6 --arch enet --enet-type efficientnet_b3 --n-epochs 15 --debias-config LNTL --GRL --rulers --CUDA_VISIBLE_DEVICES 0,1 --skew --heid-test_rulers
TABE:     python train.py --test-no 7 --arch enet --enet-type efficientnet_b3 --n-epochs 15 --debias-config TABE --rulers --CUDA_VISIBLE_DEVICES 0,1 --skew --heid-test_rulers
CLGR:     python train.py --test-no 8 --arch enet --enet-type efficientnet_b3 --n-epochs 15 --debias-config TABE --GRL --rulers --CUDA_VISIBLE_DEVICES 0,1 --skew --heid-test_rulers
```
Double headers (removing instrument and surgical marking bias):
```
TABE (instrument) + TABE (marks): python train.py --test-no 21 --n-epochs 4 --debias-config doubleTABE --instrument --CUDA_VISIBLE_DEVICES 0,1 --num-aux 8 --lr-class 0.0003
CLGR (instrument) + CLGR (marks): python train.py --test-no 22 --n-epochs 4 --debias-config doubleTABE --GRL --instrument --CUDA_VISIBLE_DEVICES 0,1 --num-aux 8 --lr-class 0.0003
LNTL (instrument) + CLGR (marks): python train.py --test-no 23 --n-epochs 4 --debias-config both --GRL --instrument --CUDA_VISIBLE_DEVICES 0,1 --num-aux 8 --lr-class 0.0003
CLGR (instrument) + LNTL (marks): python train.py --test-no 24 --n-epochs 4 --debias-config both --GRL --instrument --CUDA_VISIBLE_DEVICES 0,1 --num-aux2 8 --switch-heads --lr-class 0.0003
LNTL (instrument) + TABE (marks): python train.py --test-no 25 --n-epochs 4 --debias-config both --instrument --CUDA_VISIBLE_DEVICES 0,1 --num-aux 8 --lr-class 0.0003
TABE (instrument) + LNTL (marks): python train.py --test-no 26 --n-epochs 4 --debias-config both --instrument --CUDA_VISIBLE_DEVICES 0,1 --num-aux2 8 --switch-heads --lr-class 0.0003
LNTL (instrument) + LNTL (marks): python train.py --test-no 27 --n-epochs 4 --debias-config doubleLNTL --instrument --CUDA_VISIBLE_DEVICES 0,1 --num-aux 8 --lr-class 0.0003
```
Double headers (removing instrument and ruler bias):
```
TABE (instrument) + TABE (rulers): python train.py --test-no 21 --n-epochs 4 --debias-config doubleTABE --instrument --rulers --CUDA_VISIBLE_DEVICES 0,1 --num-aux 8 --lr-class 0.0003
CLGR (instrument) + CLGR (rulers): python train.py --test-no 22 --n-epochs 4 --debias-config doubleTABE --GRL --instrument --rulers --CUDA_VISIBLE_DEVICES 0,1 --num-aux 8 --lr-class 0.0003
LNTL (instrument) + CLGR (rulers): python train.py --test-no 23 --n-epochs 4 --debias-config both --GRL --instrument --rulers --CUDA_VISIBLE_DEVICES 0,1 --num-aux 8 --lr-class 0.0003
CLGR (instrument) + LNTL (rulers): python train.py --test-no 24 --n-epochs 4 --debias-config both --GRL --instrument --rulers --CUDA_VISIBLE_DEVICES 0,1 --num-aux2 8 --switch-heads --lr-class 0.0003
LNTL (instrument) + TABE (rulers): python train.py --test-no 25 --n-epochs 4 --debias-config both --instrument --rulers --CUDA_VISIBLE_DEVICES 0,1 --num-aux 8 --lr-class 0.0003
TABE (instrument) + LNTL (rulers): python train.py --test-no 26 --n-epochs 4 --debias-config both --instrument --rulers --CUDA_VISIBLE_DEVICES 0,1 --num-aux2 8 --switch-heads --lr-class 0.0003
LNTL (instrument) + LNTL (rulers): python train.py --test-no 27 --n-epochs 4 --debias-config doubleLNTL --instrument --rulers --CUDA_VISIBLE_DEVICES 0,1 --num-aux 8 --lr-class 0.0003
```
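Conceptually, each double-header run attaches two auxiliary heads to the shared feature extractor, one per bias variable, with each head trained via either a GRL (LNTL) or a confusion loss (TABE) according to `--debias-config`. A minimal sketch (class and argument names are assumptions, mirroring the earlier GRL/confusion-loss sketch):

```python
# Sketch of a double-header model: a shared trunk with one melanoma head
# and two auxiliary bias heads (illustrative; names are assumptions).
import torch.nn as nn

class DoubleHeader(nn.Module):
    def __init__(self, backbone, feat_dim, num_instruments=8, num_artefact_classes=2):
        super().__init__()
        self.backbone = backbone                      # e.g. a ResNeXt trunk
        self.class_head = nn.Linear(feat_dim, 2)      # melanoma vs. benign
        self.instrument_head = nn.Linear(feat_dim, num_instruments)    # cf. --num-aux 8
        self.artefact_head = nn.Linear(feat_dim, num_artefact_classes) # marks/rulers

    def forward(self, x):
        feat = self.backbone(x)
        # Each auxiliary head is debiased with a GRL (LNTL) or a confusion
        # loss (TABE), matching the --debias-config flag for that run.
        return self.class_head(feat), self.instrument_head(feat), self.artefact_head(feat)
```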
Skin Deep Unlearning: Artefact and Instrument Debiasing in the Context of Melanoma Classification (P. Bevan, A. Atapour-Abarghouei) [pdf]
```
@InProceedings{pmlr-v162-bevan22a,
  title     = {Skin Deep Unlearning: Artefact and Instrument Debiasing in the Context of Melanoma Classification},
  author    = {Bevan, Peter and Atapour-Abarghouei, Amir},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {1874--1892},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/bevan22a/bevan22a.pdf},
  url       = {https://proceedings.mlr.press/v162/bevan22a.html},
}
```