Exploring latent graph structures has not garnered much attention in the graph generative research field. Yet, exploiting the latent space is as crucial as working on the data space for discrete data such as graphs. However, previous methods either failed to preserve the permutation symmetry of graphs or lacked an effective approach to model the latent space appropriately. To mitigate these issues, we propose a simple yet effective discrete latent graph diffusion generative model. Our model, namely GLAD, not only overcomes the drawbacks of existing latent approaches, but also alleviates inherent issues present in diffusion methods applied on the graph space. We validate our generative model on molecular benchmark datasets, on which it demonstrates competitive performance compared with state-of-the-art baselines.
GLAD is built upon Python 3.10.1 and PyTorch 1.12.1. To install additional packages, run the following command:
```
pip install -r requirements.txt
```
And RDKit for molecule graphs:
```
conda install -c conda-forge rdkit=2020.09.1.0
```
We follow the GDSS repo [Link] to set up the dataset benchmarks.
We benchmark GLAD on three generic graph datasets (Ego-small, Community_small, ENZYMES) and two molecular graph datasets (QM9, ZINC250k).
To generate the generic datasets, run the following command:
```
python data/data_generators.py --dataset ${dataset_name}
```
To preprocess the molecular graph datasets for training models, run the following command:
```
python data/preprocess.py --dataset ${dataset_name}
python data/preprocess_for_nspdk.py --dataset ${dataset_name}
```
For the evaluation of generic graph generation tasks, run the following command to compile the ORCA program (see http://www.biolab.si/supp/orca/orca.html):
```
cd src/metric/orca
g++ -O2 -std=c++11 -o orca orca.cpp
```
We provide GLAD's hyperparameters in the `config` folder.
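The first training stage below fits a finite scalar quantization (FSQ) autoencoder. As a rough illustration of the FSQ idea — not GLAD's actual implementation; `fsq_quantize` and its `levels` argument are hypothetical names for this sketch — each latent dimension is bounded and then rounded to a small fixed set of integer levels, yielding a discrete latent code without a learned codebook:

```python
import math

def fsq_quantize(z, levels):
    """Finite scalar quantization (illustrative sketch).

    Bounds each latent dimension with tanh, scales it to span
    `levels[i]` integer grid points, and rounds to the nearest one.
    """
    out = []
    for zi, num_levels in zip(z, levels):
        half = (num_levels - 1) / 2          # grid spans [-half, +half]
        out.append(round(math.tanh(zi) * half))
    return out

# A 3-dim latent vector quantized with 5, 5, and 3 levels per dim:
codes = fsq_quantize([0.0, 2.5, -3.0], levels=[5, 5, 3])  # -> [0, 2, -1]
```

Every latent vector thus maps to one of `prod(levels)` discrete codes, which is the discrete latent space the second-stage diffusion bridges operate on.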
In the first stage, train the finite scalar quantization autoencoder:
```
sh run -d ${dataset} -t base -e exp -n ${dataset}_base
```
where:
- `dataset`: data type (in `config/data`)
- `${dataset}_base`: autoencoder base config (in `config/exp/{dataset}_base`)
Example:
```
sh run -d qm9 -t base -e exp -n qm9_base
```
In the second stage, train the discrete latent graph diffusion bridges:
```
sh run -d ${dataset} -t bridge -e exp -n ${dataset}_bridge
```
where:
- `dataset`: data type (in `config/data`)
- `${dataset}_bridge`: diffusion bridge config (in `config/exp/{dataset}_bridge`)
Example:
```
sh run -d qm9 -t bridge -e exp -n qm9_bridge
```
We provide code that calculates the mean and standard deviation of different metrics on generic graphs (15 sampling runs) and molecule graphs (3 sampling runs).
```
sh run -d ${dataset} -t sample -e exp -n ${dataset}_bridge
```
Example:
```
sh run -d qm9 -t sample -e exp -n qm9_bridge
```
Download our model weights:
```
sh download.sh
```
Please cite our work if you find our paper and the released code useful in your research. Thank you!
```bibtex
@inproceedings{
  nguyen2024glad,
  title={{GLAD}: Improving Latent Graph Generative Modeling with Simple Quantization},
  author={Van Khoa Nguyen and Yoann Boget and Frantzeska Lavda and Alexandros Kalousis},
  booktitle={ICML 2024 Workshop on Structured Probabilistic Inference {\&} Generative Modeling},
  year={2024},
  url={https://openreview.net/forum?id=aY1gdSolIv}
}
```
