Tartarus: Practical and Realistic Benchmarks for Inverse Molecular Design

This repository contains the code and results for the paper Tartarus, an open-source collection of benchmarks for evaluating generative models.

Total installation time: ~15-20 minutes.

Installing XTB and CREST

The organic photovoltaics and emitter design tasks require XTB, a program package of semi-empirical quantum mechanical methods, and CREST, a utility of xtb used to sample molecular conformers.

The binaries are provided here. Place them in your home directory, then source the software using

export XTBHOME=${HOME}/xtb
export PATH=${PATH}:${XTBHOME}/bin
export XTBPATH=${XTBHOME}/share/xtb:${XTBHOME}:${HOME}
export MANPATH=${MANPATH}:${XTBHOME}/share/man
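After exporting the variables above, a quick check that the executables resolve can save a failed run later. The following is a minimal sketch, assuming the binaries are named `xtb` and `crest` (the helper name `find_missing_tools` is hypothetical and not part of Tartarus):

```python
import shutil


def find_missing_tools(tools=("xtb", "crest")):
    """Return the names of executables that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in tools if shutil.which(name) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("xtb and crest are both on PATH.")
```

If either tool is reported as missing, re-check that `XTBHOME` points at the directory where the binaries were unpacked.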

Installing SMINA

The task of designing molecules that dock to proteins requires SMINA, a method for calculating docking scores of ligands onto solved structures (proteins). The binary file is already included in the repository, at tartarus/docking_structures/smina.static.
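Because the binary ships inside the repository, it may lose its executable bit depending on how the repository was obtained. The sketch below checks for that, assuming it is run from the repository root; the function name `smina_status` is hypothetical and only the path comes from the text above.

```python
import os
from pathlib import Path


def smina_status(repo_root="."):
    """Report whether the bundled SMINA binary exists and is executable."""
    binary = Path(repo_root) / "tartarus" / "docking_structures" / "smina.static"
    if not binary.is_file():
        return "missing"
    if not os.access(binary, os.X_OK):
        return "not executable"
    return "ok"


print("smina.static:", smina_status())
```

A result of "not executable" can usually be resolved with `chmod +x` on the file.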

Packages required

Use Python >= 3.8. We recommend using a conda environment to install:

rdkit >= 2021.03.3
xtb-python >= 20.1
openbabel == 3.1.1

Required packages:

numpy >= 1.22.3
pandas >= 1.4.3
torch == 1.12.0
pyscf == 2.0.1
morfeus-ml >= 0.7.1
geometric == 0.9.7.2
pyberny == 0.6.3
loguru == 0.6.0
geodesic-interpolate == 1.0.0
(pip install -i https://test.pypi.org/simple/ geodesic-interpolate)
polanyi == 0.0.1
(pip install git+https://github.com/kjelljorner/polanyi)
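To compare an existing environment against the list above, the installed versions can be read from package metadata. This is a minimal sketch that only reports versions and does not enforce the pins; it assumes the distribution names match the names listed above, and `installed_versions` is a hypothetical helper. Packages installed through conda (rdkit, xtb-python, openbabel) are left out because their registered names may differ.

```python
from importlib import metadata

# Distribution names taken from the pip-installable list above.
REQUIRED = [
    "numpy", "pandas", "torch", "pyscf", "morfeus-ml", "geometric",
    "pyberny", "loguru", "geodesic-interpolate", "polanyi",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```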

Datasets

All datasets are found in the datasets directory. The arrows indicate the goal (↑ = maximization, ↓ = minimization).

Task	Dataset name	# of smiles	Columns in file
Designing OPV	`hce.csv`	24,953	PCE_PCBM -SAS (↑)	PCE_PCDTBT -SAS (↑)
Designing emitters	`gdb13.csv`	403,947	Singlet-triplet gap (↓)	Oscillator strength (↑)	Multi-objective (↑)
Designing drugs	`docking.csv`	152,296	1SYH (↓)	6Y2F (↓)	4LDE (↓)
Designing chemical reaction substrates	`reactivity.csv`	60,828	Activation energy ΔE^‡ (↓)	Reaction energy ΔE_r (↓)	ΔE^‡ + ΔE_r (↓)	- ΔE^‡ + ΔE_r (↓)
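Since some objectives are maximized and others minimized, it can help to convert everything to a single "larger is better" convention before ranking molecules. The sketch below illustrates this with made-up toy values; the dictionary keys are illustrative labels, not necessarily the exact CSV column headers, and both helper functions are hypothetical.

```python
# Goal per objective, copied from the arrows in the table above.
# Keys are illustrative labels, not necessarily the real column headers.
GOALS = {
    "singlet_triplet_gap": "min",
    "oscillator_strength": "max",
}

# Toy records with made-up numbers, used only to demonstrate the ranking.
TOY_RECORDS = [
    {"smiles": "C", "singlet_triplet_gap": 0.5, "oscillator_strength": 0.9},
    {"smiles": "CC", "singlet_triplet_gap": 0.2, "oscillator_strength": 0.1},
]


def as_maximization(value, goal):
    """Flip the sign of minimization objectives so larger is always better."""
    if goal == "max":
        return value
    if goal == "min":
        return -value
    raise ValueError(f"unknown goal: {goal!r}")


def best_record(records, key, goal):
    """Return the record with the best value of `key` under the given goal."""
    return max(records, key=lambda rec: as_maximization(rec[key], goal))


for key, goal in GOALS.items():
    print(key, "->", best_record(TOY_RECORDS, key, goal)["smiles"])
```

With a loaded pandas DataFrame, `data.to_dict("records")` produces a list of dictionaries in the shape this sketch expects.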

Getting started

Below are some examples of how to load the datasets and use the fitness functions. For more details, you can also look at example.py.

Designing organic photovoltaics

To use the evaluation function, load either the full xtb calculation from the pce module, or use the surrogate model, with pretrained weights.

import pandas as pd
data = pd.read_csv('./datasets/hce.csv')   # or ./datasets/unbiased_hce.csv
smiles = data['smiles'].tolist()
smi = smiles[0]

## use full xtb calculation in pce module
from tartarus import pce
dipm, gap, lumo, combined, pce_pcbm_sas, pce_pcdtbt_sas = pce.get_properties(smi)

## use pretrained surrogate model
dipm, gap, lumo, combined = pce.get_surrogate_properties(smi)
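When scoring many molecules, one failing calculation should not abort the whole run. The sketch below wraps any fitness function in a loop that records NaN values on failure. It assumes the fitness function may raise an exception for some inputs, which is not confirmed by this README; `evaluate_batch` and `toy_fitness` are hypothetical names, and `toy_fitness` is a stand-in so the example runs without Tartarus installed.

```python
import math


def evaluate_batch(smiles_list, fitness_fn, n_outputs):
    """Evaluate fitness_fn on each SMILES; failed evaluations become NaN tuples."""
    results = {}
    for smi in smiles_list:
        try:
            results[smi] = tuple(fitness_fn(smi))
        except Exception:
            # Broad catch is deliberate: any failure is recorded, not propagated.
            results[smi] = (math.nan,) * n_outputs
    return results


def toy_fitness(smi):
    """Stand-in for a real fitness function; returns two made-up values."""
    if not smi:
        raise ValueError("empty SMILES")
    return len(smi), 0.0


print(evaluate_batch(["CC", ""], toy_fitness, n_outputs=2))
```

With Tartarus installed, the same helper could be called as `evaluate_batch(smiles[:10], pce.get_surrogate_properties, n_outputs=4)`, matching the four values returned by the surrogate model above.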

Designing Organic Emitters

Load the objective functions from the tadf module. All 3 fitness functions are returned for each smiles.

import pandas as pd
data = pd.read_csv('./datasets/gdb13.csv')  
smiles = data['smiles'].tolist()
smi = smiles[0]

## use full xtb calculation in tadf module
from tartarus import tadf
st, osc, combined = tadf.get_properties(smi)

Design of drug molecules

Load the docking module. There are separate functions for each of the proteins, as shown below.

import pandas as pd
data = pd.read_csv('./datasets/docking.csv')  
smiles = data['smiles'].tolist()
smi = smiles[0]

## calculate the docking score for each protein
from tartarus import docking
score_1syh = docking.get_1syh_score(smi)
score_6y2f = docking.get_6y2f_score(smi)
score_4lde = docking.get_4lde_score(smi)

Design of Chemical Reaction Substrates

Load the reactivity module. All 4 fitness functions are returned for each smiles.

import pandas as pd
data = pd.read_csv('./datasets/reactivity.csv')  
smiles = data['smiles'].tolist()
smi = smiles[0]

## calculate activation and reaction energies
from tartarus import reactivity
Ea, Er, sum_Ea_Er, diff_Ea_Er = reactivity.get_properties(smi)

Results

Our results for running the corresponding benchmarks can be found here:

Design of Protein Ligands: https://drive.google.com/file/d/1d_4mg1Eb7HrUJ2L7A8kFtld-TmPmOKlJ/view?usp=sharing
Design of Chemical Reaction Substrates: https://drive.google.com/file/d/1fCnFxSUITg4qSlOuwFolvQPUQA31Qaii/view?usp=sharing
Designing organic photovoltaics (photovoltaic conversion efficiency): https://drive.google.com/file/d/1w6oOBGjDC4Enh492jLQ7A3Xc1XbHXiIt/view?usp=sharing
Designing Organic Emitters: https://drive.google.com/file/d/1l8weYg835HDGvOoRbOcHUnvLjiyQi_Ms/view?usp=sharing
Designing organic photovoltaics (Explore): https://drive.google.com/file/d/1-J99iXfBx0_aG1BqEEXPh7q0kovBFD0L/view?usp=sharing
Designing organic photovoltaics (Surrogate, exploit): https://drive.google.com/file/d/1EV7ST9_F4DBnQpxhd6VaaJWP5r9ygr0c/view?usp=sharing
Designing organic photovoltaics (Exploit): https://drive.google.com/file/d/1Yh_8E3jRf6X230CvlRlPtk2qPQIkC5hB/view?usp=sharing

Questions, problems?

Open a GitHub issue 😄 and please be as clear and descriptive as possible. You are also welcome to reach out directly by email: akshat98[AT]stanford[DOT]edu, robert[DOT]pollice[AT]gmail[DOT]com

License

Apache License 2.0
