This is the implementation for the paper Towards Automated Over-Sampling for Imbalanced Classification. We propose AutoSMOTE, an automated over-sampling algorithm for imbalanced classification. It jointly optimizes different levels of decisions with deep hierarchical reinforcement learning. Please refer to the paper for more details.
📢 Do you want to learn more about oversampling or data augmentation? Please check out our data-centric AI survey and data-centric AI resources!
If you find this project helpful, please cite:
@inproceedings{zha2022automated,
title={Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning},
author={Daochen Zha and Kwei-Herng Lai and Qiaoyu Tan and Sirui Ding and Na Zou and Xia Hu},
booktitle={CIKM},
year={2022},
}
Make sure that you have Python 3.6+ installed. Install with
pip3 install -r requirements.txt
pip3 install -e .
You don't need to manually download datasets. Just pass the dataset name, and it will be downloaded automatically.
Train on the Mozilla4 dataset with an undersampling ratio of 100 and SVM as the base classifier:
python3 train.py
You can run AutoSMOTE under different configurations. Some important arguments are listed below.
--dataset: which dataset to use
--clf: which base classifier to use
--metric: which metric to use
--device: which device to train on. Training uses the GPU by default; pass cpu to train on the CPU
--total_steps: search budget
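The arguments above can be combined in a single call to train.py. The following is a minimal sketch of assembling such a command from Python. `build_train_command` is a hypothetical helper for illustration and is not part of this repository. The only argument value confirmed above is `cpu` for `--device`; the step count of 500 is an arbitrary placeholder, and the accepted values for `--dataset`, `--clf`, and `--metric` should be checked against train.py.

```python
import shlex


def build_train_command(dataset=None, clf=None, metric=None,
                        device=None, total_steps=None):
    """Assemble a train.py command line (hypothetical helper).

    Options left as None are omitted, so train.py uses its own defaults.
    """
    cmd = ["python3", "train.py"]
    options = [
        ("--dataset", dataset),
        ("--clf", clf),
        ("--metric", metric),
        ("--device", device),
        ("--total_steps", total_steps),
    ]
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# Train on the CPU with a placeholder search budget of 500 steps.
cmd = build_train_command(device="cpu", total_steps=500)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, or the printed line can be pasted into a shell.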