Enumeration-aware Molecular Transformers for Representation Learning

Overview

We introduce a suite of neural language modelling tools for pre-training and fine-tuning SMILES-based molecular language models. Furthermore, we provide recipes for fine-tuning these language models in low-data settings using semi-supervised learning.

1. Enumeration-aware Molecular Transformers

Introduces contrastive learning alongside multi-task regression and masked language modelling as pre-training objectives to inject enumeration knowledge into pre-trained molecular language models.
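For context, SMILES enumeration means generating several different but equally valid SMILES strings for the same molecule. The following minimal sketch uses RDKit (an assumed dependency here for illustration, not necessarily the exact tooling used in this repository) to produce such enumerations:

```python
from rdkit import Chem

def enumerate_smiles(smiles: str, n: int = 5) -> list:
    """Return up to n distinct randomized (enumerated) SMILES for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    variants = set()
    # Randomized SMILES can repeat, so oversample until n distinct strings are found.
    for _ in range(n * 10):
        variants.add(Chem.MolToSmiles(mol, canonical=False, doRandom=True))
        if len(variants) >= n:
            break
    return sorted(variants)

print(enumerate_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```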

a. Molecular Domain Adaptation (Contrastive Encoder-based)

i. Architecture

(Figure: Smole BERT architecture diagram)

ii. Contrastive Learning

(Figure: contrastive learning objective)
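The contrastive objective pulls embeddings of different enumerations of the same molecule together while pushing apart embeddings of different molecules. A minimal InfoNCE-style sketch in PyTorch is shown below; the batch construction and random stand-in embeddings are assumptions for illustration, not the exact training code of this repository:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_canonical: torch.Tensor,
                     z_enumerated: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE loss: row i of z_canonical should match row i of z_enumerated."""
    z1 = F.normalize(z_canonical, dim=-1)
    z2 = F.normalize(z_enumerated, dim=-1)
    logits = z1 @ z2.T / temperature                      # (batch, batch) similarities
    labels = torch.arange(z1.size(0), device=z1.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Example with random embeddings standing in for encoder outputs.
loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```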

b. Canonicalization Encoder-decoder (Denoising Encoder-decoder)

(Figure: canonicalization encoder-decoder objective)
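In this variant, the encoder-decoder is trained to map an enumerated (randomized) SMILES back to its canonical form, a denoising-style sequence-to-sequence objective. A hedged sketch with Hugging Face transformers follows; the checkpoint name is a generic placeholder and its tokenizer is not SMILES-specific, so this only illustrates the training signal:

```python
from transformers import BartForConditionalGeneration, BartTokenizerFast

# Placeholder checkpoint; a SMILES-aware tokenizer/model would be used in practice.
tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

enumerated = "O=C(O)c1ccccc1OC(C)=O"   # a randomized SMILES for aspirin
canonical = "CC(=O)Oc1ccccc1C(=O)O"    # its canonical form (the reconstruction target)

batch = tokenizer([enumerated], text_target=[canonical],
                  return_tensors="pt", padding=True)
loss = model(**batch).loss             # cross-entropy against the canonical SMILES
loss.backward()
```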

Code

You can reproduce the experiments as follows:

Install dependencies

```bash
pip install -r requirements.txt
```

1. Pre-training the molecular transformers

The detailed steps for pre-training the encoder-based architectures with MLM and MTR objectives, as well as the seq2seq BART model with denoising objectives, are outlined here.
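To illustrate what the MLM part of pre-training involves, here is a minimal Hugging Face Trainer sketch; the tokenizer path, corpus file, model size, and hyperparameters are placeholders, and the linked instructions describe the actual configuration:

```python
from datasets import load_dataset
from transformers import (BertConfig, BertForMaskedLM, DataCollatorForLanguageModeling,
                          PreTrainedTokenizerFast, Trainer, TrainingArguments)

# Placeholder paths; the repo's own SMILES tokenizer and pre-training corpus apply here.
tokenizer = PreTrainedTokenizerFast.from_pretrained("path/to/smiles-tokenizer")
dataset = load_dataset("text", data_files={"train": "smiles_corpus.txt"})["train"]
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True, max_length=128),
                      batched=True, remove_columns=["text"])

model = BertForMaskedLM(BertConfig(vocab_size=tokenizer.vocab_size))
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm-pretrain", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```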

2. Domain Adaptation with Contrastive Learning and Multitask Learning

To reproduce the domain adaptation step from our work, please follow the guidelines here.
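For the multitask-regression side of domain adaptation, the regression targets are typically physicochemical descriptors computed per molecule. A small RDKit sketch of building such targets is shown below; the descriptor set is illustrative, not the exact one used in this work:

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

def descriptor_targets(smiles: str) -> list:
    """Compute a few physicochemical descriptors to serve as regression targets."""
    mol = Chem.MolFromSmiles(smiles)
    return [
        Descriptors.MolWt(mol),          # molecular weight
        Descriptors.MolLogP(mol),        # lipophilicity
        Descriptors.TPSA(mol),           # topological polar surface area
        Descriptors.NumHDonors(mol),     # hydrogen-bond donors
        Descriptors.NumHAcceptors(mol),  # hydrogen-bond acceptors
    ]

print(descriptor_targets("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```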

3. Finetuning

Finally, fine-tuning the domain-adapted molecular language models on downstream tasks is explained in the accompanying notebook, which can be found here.
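Conceptually, fine-tuning uses the domain-adapted encoder either end-to-end or as a feature extractor for downstream property prediction. A minimal feature-extraction sketch is given below; the checkpoint path, pooling choice, and toy labels are placeholders, and the notebook documents the actual workflow:

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

# Placeholder path; the domain-adapted model from step 2 would be loaded here.
tokenizer = AutoTokenizer.from_pretrained("path/to/domain-adapted-model")
encoder = AutoModel.from_pretrained("path/to/domain-adapted-model").eval()

def embed(smiles_list):
    """Mean-pool the encoder's last hidden states into one vector per SMILES."""
    batch = tokenizer(smiles_list, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state       # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Toy downstream task: binary property labels for a handful of molecules.
train_smiles = ["CCO", "CC(=O)Oc1ccccc1C(=O)O", "c1ccccc1", "CCN(CC)CC"]
train_labels = [0, 1, 0, 1]
clf = LogisticRegression().fit(embed(train_smiles), train_labels)
print(clf.predict(embed(["CCOC(=O)C"])))
```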

Acknowledgements

Code base adapted from:

About

Molecule Transformers is a collection of recipes for pre-training and fine-tuning molecular transformer language models, including BART, BERT, etc. The full thesis is available at https://moleculetransformers.github.io/thesis_cs_msc_Khan_Shahrukh.pdf.
