Merge branch 'master' of github.com:thulas/dac-label-noise
Merge branch 'master' of github.com:thulas/dac-label-noise
Conflicts:
	README.md
thulas committed Dec 5, 2019
2 parents 03f9ec6 + 365ad50 commit 1d550f6
Showing 5 changed files with 59 additions and 10 deletions.
PyTorch implementation of the deep abstaining classifier (DAC) from the ICML 2019 paper:

**Combating Label Noise in Deep Learning Using Abstention**, Sunil Thulasidasan, Tanmoy Bhattacharya, Jeff Bilmes, Gopinath Chennupati, Jamaludin Mohd-Yusof

The DAC uses an abstention loss function for identifying both arbitrary and systematic label noise while training deep neural networks.
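The abstention loss augments the k class outputs with an extra abstention class. A minimal pure-Python sketch of one per-sample formulation from the paper (abstention-weighted cross-entropy over the renormalized class probabilities, plus an α-weighted penalty for abstaining; variable names are illustrative, not the repository's API):

```python
import math

def dac_loss(probs, true_idx, alpha):
    """Per-sample abstention loss (sketch). `probs` is a probability
    vector over k classes plus one extra abstention class (the last
    entry); `alpha` controls how strongly abstention is penalized."""
    p_abstain = probs[-1]
    # Cross-entropy over the renormalized non-abstain distribution,
    # down-weighted by the mass not given to abstention...
    ce_term = (1.0 - p_abstain) * -math.log(probs[true_idx] / (1.0 - p_abstain))
    # ...plus a penalty that discourages abstaining on every sample.
    abstain_term = alpha * -math.log(1.0 - p_abstain)
    return ce_term + abstain_term
```

With `p_abstain = 0` the loss reduces to standard cross-entropy on the true class.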

## Identifying Systematic Label Noise

The DAC can be used to learn features or corrupting transformations that are associated with unreliable labels. As an example, in the "random monkeys" experiment, all the monkey images in the train-set have their labels randomized. During prediction, the DAC abstains on most of the monkey images in the test-set.
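The label corruption in this experiment amounts to a simple transformation of the label array. A hypothetical sketch (the repository ships prebuilt label files such as `train_y_downshifted_random_monkeys.bin`, so the helper below is illustrative only):

```python
import random

def randomize_class_labels(labels, target_class, num_classes, seed=0):
    """Replace every label of `target_class` with a uniformly random
    class; all other labels are left untouched. Illustrative helper,
    not the repository's actual data-prep code."""
    rng = random.Random(seed)
    return [rng.randrange(num_classes) if y == target_class else y
            for y in labels]
```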

<p float="left">
<img src="imgs/monkey_tile.png" width="300" >
<img src="imgs/rand_monk_expt_dac_monk_dist.png" width="300">
</p>

In another experiment, we blur a subset (20%) of the images in the training set and randomize their labels. The DAC learns to abstain from predicting on the blurred images at test time.


<p float="left">
<img src="imgs/blurred_sample_tile_4x4.png" width="250" />
<img src="imgs/blurred_expt_dac_blurred_val_pred_dist.png" width="300" />
<img src="imgs/blurred_expt_dac_vs_dnn_val_acc2.png" width="300" />
</p>


To re-run the random monkeys experiment described in the paper, run the following (with the STL-10 data available at `<path-to-stl10-data>`):


`python train_dac.py --datadir <path-to-stl10-data> --dataset stl10-c --train_y train_y_downshifted_random_monkeys.bin --test_y test_y_downshifted_random_monkeys.bin --nesterov --net_type vggnet -use-gpu --epochs 200 --loss_fn dac_loss --learn_epochs 20 --seed 0`

In the above experiment, the best abstention occurs around epoch 75.


## Identifying Arbitrary Label Noise

The DAC can also be used to identify arbitrary label noise where there might not be an underlying corrupting feature or transformation, but classes get mislabeled with a certain probability.

### Training Protocol

- Use DAC to identify label noise
- Eliminate training samples on which the DAC abstains
- Retrain on cleaner set using regular cross-entropy loss
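The filtering step in the protocol above can be sketched as follows (hypothetical helper; `abstain_label` is the index of the extra abstention class):

```python
def filter_abstained(samples, dac_predictions, abstain_label):
    """Step 2 of the protocol: keep only the samples the trained DAC
    did not abstain on; the survivors are then retrained with regular
    cross-entropy. Illustrative sketch, not the repository's API."""
    return [s for s, pred in zip(samples, dac_predictions)
            if pred != abstain_label]
```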

The DAC gives state-of-the-art results in label-noise experiments.


<p float="left">
<img src="imgs/cifar_10_60_ln.png" width="300" />
<img src="imgs/cifar_100_60_ln.png" width="300" />
<img src="imgs/cifar_10_80_ln.png" width="300" />
<img src="imgs/webvision.png" width="300" />
</p>

[GCE: Generalized Cross-Entropy Loss (Zhang et al., NIPS '18); Forward (Patrini et al., CVPR '17); MentorNet (Jiang et al., ICML '18)]

More results are in our ICML 2019 paper.

### Tested with:

- Python 2.7
- PyTorch 1.0.1

### Citation
```
@InProceedings{pmlr-v97-thulasidasan19a,
title = {Combating Label Noise in Deep Learning using Abstention},
author = {Thulasidasan, Sunil and Bhattacharya, Tanmoy and Bilmes, Jeff and Chennupati, Gopinath and Mohd-Yusof, Jamal},
booktitle = {Proceedings of the 36th International Conference on Machine Learning},
pages = {6234--6243},
year = {2019},
editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
volume = {97},
series = {Proceedings of Machine Learning Research},
address = {Long Beach, California, USA},
month = {09--15 Jun},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v97/thulasidasan19a/thulasidasan19a.pdf},
}
```

This is open source software available under the BSD Clear license.

Binary files added: imgs/cifar_100_60_ln.png, imgs/cifar_10_60_ln.png, imgs/cifar_10_80_ln.png, imgs/webvision.png
