Active learning

Introduction

The main idea behind active learning is that instead of labelling all the data we have, we should label only the samples that offer the best model improvement. However, without labelling them, we have no exact measurement of how much particular datapoints would improve the model. We have to rely on the properties of our model's performance on unlabelled data to choose the most promising ones to label and include in the training set.

For the active learning experiments, the first 1000 images of the first batch file were used as the already labelled set, and another 5000 images were selected as 'new labelings' with different techniques. Finally, new models were trained on the original labelled data together with the newly labelled data, and the methods were compared to each other.

The techniques used in these experiments for picking new data to label were the following (for further information refer to this site; a sketch of the scoring functions follows the list):

  • random selection
  • least certainty
  • least certainty margin
  • biggest entropy
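A minimal sketch of how these scores can be computed from a model's softmax outputs (plain NumPy; the variable names and the dummy data are assumptions for illustration, not the repository's actual implementation):

import numpy as np

def least_certainty(probs):
    # Most uncertain = lowest probability of the predicted (top) class.
    return 1.0 - probs.max(axis=1)

def certainty_margin(probs):
    # Most uncertain = smallest gap between the two most probable classes.
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def entropy(probs):
    # Most uncertain = highest entropy of the predicted class distribution.
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=1000)  # dummy stand-in for real (N, num_classes) predictions
n_pick = 500

picked_random = rng.choice(len(probs), n_pick, replace=False)
picked_least_certain = np.argsort(least_certainty(probs))[::-1][:n_pick]  # highest uncertainty first
picked_margin = np.argsort(certainty_margin(probs))[:n_pick]              # smallest margin first
picked_entropy = np.argsort(entropy(probs))[::-1][:n_pick]                # biggest entropy first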

Requirements

The required packages can be found in config/env_files/active_learning_env.yml. Dependencies can be installed by running:

conda env create -f config/env_files/active_learning_env.yml

Configuration

The experiments are run according to configurations. The config files can be found in config/config_files. Configurations can be based on each other (base_config key): the code will use the parameters of the specified base config, and only the newly specified parameters will override them.
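For illustration, a config derived from base.yaml that only changes the data parts could look like this (base_config and data_parts match the examples in this README; 'entropy' is an illustrative data part name):

base_config: 'base'
data:
  params:
    data_parts: ['labelled', 'entropy']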

The base config file is base.yaml. An HPO example can be found in base_hpo.yaml, which is based on base.yaml and does hyperparameter optimization only on the specified parameters.

Every experiment has its own config file and HPO config file.

Arguments

The code should be run with arguments:

--id_tag specifies the name under the config where the results will be saved
--config specifies the config name to use (e.g. --config "base" for config/config_files/base.yaml)
--mode can be 'train', 'val' or 'hpo'
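A full invocation combining all three arguments (the id_tag value is illustrative):

python run.py --config base --mode train --id_tag "run1"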

Required data

I used CIFAR-10 for the experiments, which can be downloaded from here. After extracting it, the path of the data folder should be specified inside the config file like:

data:
  params:
    dataset_path: '/home/data/cifar-10-batches-py'

To set which data parts to use (the full training set, the labelled set, the newly picked ones or their combinations), state the data_parts value accordingly in the config file. The usable data parts correspond to the files in data/data_parts/{data_part}.

An example of using the labelled dataset and 5000 randomly picked images from the unlabelled dataset:

data:
  params:
    data_parts: ['labelled', 'random']

Saving and loading experiment

The save folder for the experiment outputs can be set in the config file like:

id: "base"
env:
  result_dir: 'results'

All the experiment outputs will be saved under the given results dir, {result_dir}/{config_id}/{id_tag arg}:

  1. tensorboard files
  2. train and val metric CSVs
  3. the best model
  4. confusion matrices and per-class metrics
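With the config above and --id_tag "run1", this gives a layout along these lines (apart from model_best.pth.tar, which the Eval section below relies on, the file names here are illustrative):

results/base/run1/
  events.out.tfevents.*    # tensorboard logs
  train_metrics.csv
  val_metrics.csv
  model_best.pth.tar       # the best model
  confusion_matrix.csv     # plus per-class metric files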

If the result dir already exists and contains a model file, then the experiment will automatically resume (either resuming the training or using the trained model for inference).

Usage

Training

To train the model use:

python run.py --config base --mode train

Eval

For eval, the results dir ({result_dir}/{config_id}/{id_tag arg}) should contain a model as model_best.pth.tar. During eval, inference is run on the validation files and the metrics are calculated.

python run.py --config base --mode val

Save predictions for an eval

To save the predictions for an eval (for example, saving the predictions on unlabelled data), an eval should be run with:

  1. stating the result dir of the model to use for the prediction
  2. setting save_preds: true under the env key in the config file (see the sketch after this list)
  3. setting the id_tag argument empty: --id_tag ""
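A minimal sketch of the relevant config keys (the result_dir value is illustrative; it should point at the directory that already contains the trained model):

env:
  result_dir: 'results'
  save_preds: true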

An example of the required config file can be found in config/config_files/save_unlabelled_pred_from_labelled.yaml.

python run.py --config save_unlabelled_pred_from_labelled --mode val --id_tag ""

HPO

For HPO use:

python run.py --config base_hpo --mode hpo

Picking files for active learning

To use different active learning methods to pick new images to label, run one of the scripts in the active_learning folder. Before running, overwrite pred_path with the path of your training run's prediction file (refer to: Save predictions for an eval). These scripts generate the corresponding files in data/data_parts/{active_learning_name}, containing the data picked with the particular active learning method. A rough sketch of what such a script does is shown below.
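The sketch assumes the predictions were saved as an (N, num_classes) NumPy array of softmax outputs; the pred_path value and the output format are assumptions, so check the actual scripts for the real formats:

import numpy as np

pred_path = 'results/base/preds.npy'  # assumption: path of your run's saved predictions
probs = np.load(pred_path)

# Pick the 5000 highest-entropy (most uncertain) samples to label next.
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
picked_indices = np.argsort(entropy)[::-1][:5000]

np.save('data/data_parts/entropy.npy', picked_indices)  # assumption: the real file format may differ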

For further information about using the picked data refer to Required data.
