Regularized Molecular Conformation Fields

Paper Link

Implementation of RMCF(NeurIPS2022), method for molecular conformation generation, by Lihao Wang, Yi Zhou, Yiqun Wang, Xiaoqing Zheng, Xuanjing Huang, Hao Zhou. This repository contains all code to reproduce a model. If you have any question, feel free to open an issue or reach out to us: wanglh19@fudan.edu.cn.

Requirements

torch (=1.12.0)
rdkit (>=2022.9.3)
networkx (>=2.6.3)
transformers (=4.9.1)
scikit-learn (>=1.0.2)
scikit-learn-extra (>=0.2.0)
torch-scatter (>=2.1.0)
torch-sparse (>=0.6.16)
torch-cluster (>=1.6.0)
torch-spline-conv (>=1.2.1)
torch-geometric (>=2.2.0)
debugpy (>=1.6.5)
fairscale (>=0.4.6)
einops (>=0.6.0)

pip install -r requirements.txt

Dataset

The offical raw GEOM dataset is avaiable [here]. Please download and unzip the dataset, such that you have the path rdkit_folder/

Preprocessing

1. Generate Vocabulary(Optional)

python3 data_gen/step0_gen_vocab.py

We provide generated vocabularies in geom-drugs/

2. Process Molecules

python3 data_gen/step1_process_mol.py

3. Split Dataset

python3 data_gen/step2_split_data.py

The training set is cut into 8 (8 GPUs) parts by default.

Training

You can train the model with the following commands:

python3 train.py \
    --train-prefix  $data_path/train \
    --valid-prefix  $data_path/valid \
    --test-prefix   $data_path/test \
    --model-dir $model_dir \
    --seg-vocab-path $data_path/hit.pkl \
    --vocab-path $data_path/vocab.pkl \
    --model-name rmcf \
    --mpnn-steps 3 \
    --batch-size 256 \
    --learning-rate 5e-4

or

bash scripts/train.sh

Evaluation & Generation

the COV and MAT scores on the GEOM-Drugs testset can be calculated using the following commands:

python3 generate.py \
    --test-prefix  $data_path/test \
    --model-dir $model_dir \
    --seg-vocab-path $data_path/hit.pkl \
    --vocab-path $data_path/vocab.pkl \
    --sampling-strategy random \
    --cov-thres 1.25 \
    --model-name rmcf \
    --mpnn-steps 3 \
    --batch-size 256

or

bash scripts/test.sh

If want to get predicted molecules, please add an argument --return-mols

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
data		data
data_gen		data_gen
geom-drugs		geom-drugs
img		img
models		models
processor		processor
scripts		scripts
segmentor		segmentor
torch_random_fields		torch_random_fields
trainer		trainer
utils		utils
.gitignore		.gitignore
README.md		README.md
callback.py		callback.py
requirements.txt		requirements.txt
test.py		test.py
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Regularized Molecular Conformation Fields

Paper Link

Requirements

Dataset

Preprocessing

1. Generate Vocabulary(Optional)

2. Process Molecules

3. Split Dataset

Training

Evaluation & Generation

Examples

About

Access has been restricted

Releases

Packages

Languages

leowang1217/RMCF

Folders and files

Latest commit

History

Repository files navigation

Regularized Molecular Conformation Fields

Paper Link

Requirements

Dataset

Preprocessing

1. Generate Vocabulary(Optional)

2. Process Molecules

3. Split Dataset

Training

Evaluation & Generation

Examples

About

Resources

Access has been restricted

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages