MoleculeNet SMILES BERT Mixup

This repository contains an implementation of the mixup strategy for text classification. The implementation is primarily based on the paper Augmenting Data with Mixup for Sentence Classification: An Empirical Study, although there are some differences.

Three variants of mixup are considered for text classification:

Embedding mixup: Texts are mixed immediately after the word embedding layer
Hidden/Encoder mixup: Mixup is done prior to the last fully connected layer
Sentence mixup: Mixup is done before softmax
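
All three variants share the same interpolation step and differ only in which representation is mixed. The snippet below is a minimal sketch of that general mixup step in plain Python, not code taken from this repository: the function name `mixup`, the default `alpha=1.0`, and the toy vectors are assumptions made for illustration. It draws a mixing coefficient from a Beta(alpha, alpha) distribution and applies it to both the representations and the one-hot labels.

```python
import random


def mixup(x_a, x_b, y_a, y_b, alpha=1.0, rng=random):
    """Linearly interpolate two examples and their one-hot labels.

    x_a, x_b: equal-length lists of floats (e.g. embeddings or encoder outputs).
    y_a, y_b: one-hot label lists of equal length.
    alpha: Beta distribution parameter (hypothetical default, not from the repo).
    """
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Toy 4-dimensional representations standing in for real model activations.
x_mix, y_mix, lam = mixup(
    [1.0, 0.0, 2.0, 0.5],
    [0.0, 1.0, 0.0, 0.5],
    [1.0, 0.0],
    [0.0, 1.0],
)
print(lam, x_mix, y_mix)
```

Because the labels are mixed with the same coefficient as the inputs, the mixed label still sums to 1 and can be used as a soft target.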

Run Supervised Training with Late Mixup Augmentation

from tqdm import tqdm

SAMPLES_PER_CLASS = [50, 100, 150, 200, 250]
N_AUGMENT = [0, 2, 4, 8, 16]
DATASETS = ['bace', 'bbbp']
METHODS = ['embed', 'encoder', 'sent']
OUTPUT_FILE = 'eval_result_mixup_augment_v1.csv'
N_TRIALS = 20
EPOCHS = 20

for method in METHODS:
    for dataset in DATASETS:
        for sample in SAMPLES_PER_CLASS:
            for n_augment in N_AUGMENT:
                for i in tqdm(range(N_TRIALS)):
                    !python bert_mixup/late_mixup/train_bert.py --dataset-name={dataset} --epoch={EPOCHS} \
                    --batch-size=16 --model-name-or-path=shahrukhx01/muv2x-simcse-smole-bert \
                    --samples-per-class={sample} --eval-after={EPOCHS} --method={method} \
                    --out-file={OUTPUT_FILE} --n-augment={n_augment}
                    !cat {OUTPUT_FILE}
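
The `!python` and `!cat` lines are IPython shell magics, so the loop above only runs inside a notebook. As a rough sketch of how the same grid could be driven from a plain Python script, the block below builds one argument list per run using only the flags shown above. The helper names `build_command`, `iter_commands`, and `run_all` are hypothetical and not part of this repository, and the sketch assumes it is launched from the repository root.

```python
import itertools
import subprocess
import sys

SAMPLES_PER_CLASS = [50, 100, 150, 200, 250]
N_AUGMENT = [0, 2, 4, 8, 16]
DATASETS = ["bace", "bbbp"]
METHODS = ["embed", "encoder", "sent"]
OUTPUT_FILE = "eval_result_mixup_augment_v1.csv"
N_TRIALS = 20
EPOCHS = 20


def build_command(method, dataset, sample, n_augment):
    """Return the argument list for one late-mixup training run."""
    return [
        sys.executable,
        "bert_mixup/late_mixup/train_bert.py",
        f"--dataset-name={dataset}",
        f"--epoch={EPOCHS}",
        "--batch-size=16",
        "--model-name-or-path=shahrukhx01/muv2x-simcse-smole-bert",
        f"--samples-per-class={sample}",
        f"--eval-after={EPOCHS}",
        f"--method={method}",
        f"--out-file={OUTPUT_FILE}",
        f"--n-augment={n_augment}",
    ]


def iter_commands():
    """Yield one command per (method, dataset, sample, n_augment, trial)."""
    grid = itertools.product(METHODS, DATASETS, SAMPLES_PER_CLASS, N_AUGMENT)
    for method, dataset, sample, n_augment in grid:
        for _ in range(N_TRIALS):
            yield build_command(method, dataset, sample, n_augment)


def run_all():
    """Run every configuration in sequence; stops at the first failing run."""
    for cmd in iter_commands():
        subprocess.run(cmd, check=True)


commands = list(iter_commands())
print(len(commands))  # 3 methods * 2 datasets * 5 sample sizes * 5 augment levels * 20 trials = 3000
```

Calling `run_all()` starts the training runs. Listing the commands first makes the size of the grid visible (3000 runs with the settings above) before any compute is spent.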

Run Supervised Training with Early Mixup Augmentation

from tqdm import tqdm

SAMPLES_PER_CLASS = [50, 100, 150, 200, 250]
N_AUGMENT = [2, 4, 8, 16, 32]
DATASETS = ['bace', 'bbbp']
OUTPUT_FILE = '/nethome/skhan/moleculenet-smiles-bert-mixup/eval_result_early_mixup.csv'
N_TRIALS = 20
EPOCHS = 100


for dataset in DATASETS:
    for sample in SAMPLES_PER_CLASS:
        for n_augment in N_AUGMENT:
            for i in tqdm(range(N_TRIALS)):
                !python bert_mixup/early_mixup/main.py --dataset-name={dataset} --epoch={EPOCHS} \
                --batch-size=16 --model-name-or-path=shahrukhx01/muv2x-simcse-smole-bert \
                --samples-per-class={sample} --eval-after={EPOCHS} \
                --out-file={OUTPUT_FILE} --n-augment={n_augment}
                !cat {OUTPUT_FILE}

Acknowledgement:

The code in this repository is mainly adapted from the repo "xashru/mixup-text".

