Accepted to IJCAI 2021
We use Pytorch for ML framework, RDKit for cheminformatics tool, and DGL for easy implementation of graph neural networks. We highly recommend to use Anaconda for managing package dependencies.
conda create -n RetCL python=3.7
conda activate RetCL
conda install pytorch=1.5.0 cudatoolkit=10.1 -c pytorch
conda install rdkit=2019.09 -c rdkit
conda install dgl=0.4.3 dgllife=0.2.1 -c dglteampython preprocess.py --dataset uspto_50k --datadir data/uspto_50k/
python preprocess.py --dataset uspto_50k --datadir data/uspto_50k_modified/
python preprocess.py --dataset uspto_candidates --datadir data/uspto_candidates/You can download the USPTO-full dataset from the GLN repository.
We provide pretrained models in checkpoints/.
You can obtain the same results reported in Table 1 using the following scripts:
python evaluate.py --num-layers 5 --use-sum --best 50 --beam 50 --ckpt checkpoints/uspto50k_unknown.pth
python evaluate.py --num-layers 5 --use-sum --best 50 --beam 50 --ckpt checkpoints/uspto50k_given.pth --use-labelTo obtain the results in Table 2, replace --best 50 --beam 50 with --best 200 --beam 200.
For the generalization experiment, run the following script:
python evaluate.py --num-layers 5 --use-sum --best 50 --beam 50 --ckpt checkpoints/uspto50k_modified_unknown.pth --datadir data/uspto_50k_modified/ --classwiseYou can train our RetCL framework using train.py. For example, one can learn with 4 nearest neighbors and sum pooling using the following script:
python train.py --use-sum --num-neighbors 4 --logdir logs/uspto_50k_unknown_N4