Code for Reinforced Neighborhood Selection Guided Multi-Relational Graph Neural Networks.
Hao Peng, Ruitong Zhang, Yingtong Dou, Renyu Yang, Jingyi Zhang, Philip S. Yu.
The repository is organized as follows:
- `data/`: dataset folder
    - `YelpChi.zip`: data of the Yelp dataset;
    - `Amazon.zip`: data of the Amazon dataset;
    - `Mimic.zip`: data of the Mimic dataset;
- `log/`: log folder
- `model/`: model folder
    - `graphsage.py`: model code for the vanilla GraphSAGE model;
    - `layers.py`: RioGNN layer implementations;
    - `model.py`: RioGNN model implementations;
- `RL/`: RL folder
    - `actor_critic.py`: RL algorithm, Actor-Critic;
    - `rl_model.py`: RioGNN RL Forest implementations;
- `utils/`: functions folder
    - `data_process.py`: transfers sparse matrices to adjacency lists;
    - `utils.py`: utility functions for data I/O and model evaluation;
- `train.py`: training and testing all models
We build different multi-relational graphs for experiments in two task scenarios and three datasets:
| Dataset | Task | Nodes | Relations |
|---|---|---|---|
| Yelp | Fraud Detection | 45,954 | rur, rtr, rsr, homo |
| Amazon | Fraud Detection | 11,944 | upu, usu, uvu, homo |
| MIMIC-III | Diabetes Diagnosis | 28,522 | vav, vdv, vpv, vmv, homo |
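A minimal sketch of how a `homo` graph can be derived from the single relations, assuming `homo` is the union of all relation edges (our reading of the table, not something this README states) and that each relation is a square `scipy.sparse` matrix over the same nodes. `build_homo` and the toy matrices are hypothetical:

```python
import numpy as np
import scipy.sparse as sp


def build_homo(relations):
    """Hypothetical helper: merge single-relation adjacency matrices.

    Assumes every matrix is square and covers the same node set. An edge
    exists in the result if it exists in at least one relation.
    """
    n = relations[0].shape[0]
    merged = sp.csr_matrix((n, n), dtype=np.float64)
    for adj in relations:
        if adj.shape != (n, n):
            raise ValueError("all relations must share the same node set")
        merged = merged + sp.csr_matrix(adj, dtype=np.float64)
    # Drop any explicit zeros, then keep edge presence only (no multiplicities).
    merged.eliminate_zeros()
    merged.data[:] = 1.0
    return merged


# Toy example: 4 nodes, two relations that share the edge 0-1.
r1 = sp.csr_matrix(np.array([[0, 1, 0, 0],
                             [1, 0, 0, 0],
                             [0, 0, 0, 0],
                             [0, 0, 0, 0]]))
r2 = sp.csr_matrix(np.array([[0, 1, 0, 0],
                             [1, 0, 1, 0],
                             [0, 1, 0, 0],
                             [0, 0, 0, 0]]))
homo = build_homo([r1, r2])
print(homo.toarray().astype(int))
```

The shared edge 0-1 appears once in the merged graph because the values are binarized after summing.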
To run RioGNN on your own datasets, you need to prepare the following data:
- Multiple single-relation graphs with the same nodes, where each graph is stored in `scipy.sparse` matrix format; you can use `sparse_to_adjlist()` in `utils.py` to transfer the sparse matrices into the adjacency lists used by RioGNN;
- A numpy array with node labels. Currently, RioGNN only supports binary classification;
- A node feature matrix stored in `scipy.sparse` matrix format.
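The requirements above can be sketched as follows. This is an illustration only: `to_adjlist` and `check_inputs` are hypothetical helpers, not the repository's `sparse_to_adjlist()`, which may differ (for example in whether it adds self-loops or writes its result to disk). Labels are assumed to be encoded as 0/1.

```python
from collections import defaultdict

import numpy as np
import scipy.sparse as sp


def to_adjlist(adj, add_self_loops=True):
    """Hypothetical converter: scipy.sparse adjacency -> {node: set(neighbors)}.

    Edges are treated as undirected; self-loops are optional.
    """
    adj = sp.csr_matrix(adj, dtype=np.float64)
    if add_self_loops:
        adj = adj + sp.eye(adj.shape[0], format="csr")
    adj_lists = defaultdict(set)
    rows, cols = adj.nonzero()  # indices of nonzero entries only
    for u, v in zip(rows.tolist(), cols.tolist()):
        adj_lists[u].add(v)
        adj_lists[v].add(u)
    return adj_lists


def check_inputs(relations, labels, features):
    """Hypothetical sanity checks for the three inputs listed above."""
    n = features.shape[0]
    if any(adj.shape != (n, n) for adj in relations):
        raise ValueError("every relation must be an n x n matrix over the same nodes")
    if labels.shape != (n,):
        raise ValueError("expected exactly one label per node")
    # Assumption: binary labels are encoded as 0/1.
    if not set(np.unique(labels).tolist()) <= {0, 1}:
        raise ValueError("only binary labels are supported")


# Toy data: 3 nodes, one relation, 2-dimensional sparse features.
rel = sp.csr_matrix(np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]]))
labels = np.array([0, 1, 0])
features = sp.csr_matrix(np.eye(3, 2))
check_inputs([rel], labels, features)
adj_lists = to_adjlist(rel)
print(dict(adj_lists))
```

Adding self-loops lets each node keep its own features during neighbor aggregation; pass `add_self_loops=False` to get the raw neighborhoods.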
You can download the project and run the program as follows:
1. The dataset folder `data/` only contains the two fraud datasets; please download the Mimic dataset (~700MB) from one of the following links:
Google Drive: https://drive.google.com/file/d/1WvYtNSHcvSQr8fzI9ykpgjMBSPwCTW0h/view?usp=sharing
Baidu Cloud: https://pan.baidu.com/s/1iyaOqnkyYGqo1Mdwt4QYnQ Password: vbwn
* Note that all datasets need to be unzipped in the `data/` folder first.
2. Install the requirements, preprocess the data, and train the model:
    - `pip3 install -r requirements.txt`
    - `python data_process.py`
    - `python train.py`
* To run the code, you need Python 3.6 or later.
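The unzip step can also be scripted. A minimal sketch using Python's standard `zipfile` module, assuming the archives sit directly in `data/` as in the repository layout; `unzip_datasets` is a hypothetical helper, not part of the repository:

```python
import zipfile
from pathlib import Path


def unzip_datasets(data_dir="data"):
    """Hypothetical helper: extract every *.zip archive in data_dir into data_dir.

    Returns the names of all extracted members.
    """
    data_dir = Path(data_dir)
    extracted = []
    for archive in sorted(data_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
            extracted.extend(zf.namelist())
    return extracted


if __name__ == "__main__":
    # Run from the repository root before preprocessing the data.
    print(unzip_datasets("data"))
```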
- Our model supports both CPU and GPU modes; you can switch between them with the `--use_cuda` and `--device` parameters.
- Set `--data` to `yelp`, `amazon`, or `mimic` to choose the dataset.
- `--num_epochs` sets the maximum number of training epochs. Note that the model stops early once reinforcement learning has explored all depths.
- The default value of `--ALAPHA` is `10`, which means that the precision of the reinforcement learning tree is refined by a factor of 10 at each depth: 0.1, 0.01, 0.001, etc. To run experiments with other tree widths and depths, adjust this parameter.
* For other dataset and parameter settings, please refer to the argument parser in `train.py`.
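The `--ALAPHA` schedule can be illustrated with a short sketch: the step size at depth d is 1/ALAPHA^d, and each depth splits the interval chosen at the previous depth into ALAPHA candidates. This is purely an illustration of the 0.1, 0.01, 0.001 progression under that assumption; `step_size`, `refine`, and the fixed picks are hypothetical, and the actual action selection lives in RioGNN's RL forest (`rl_model.py`).

```python
from fractions import Fraction


def step_size(depth, alapha=10):
    """Precision at a given depth (depth starts at 1): 1/alapha**depth."""
    return Fraction(1, alapha ** depth)


def refine(lower, depth, alapha=10):
    """Hypothetical refinement: split [lower, lower + alapha * step) into
    `alapha` candidates spaced by the step size of this depth."""
    step = step_size(depth, alapha)
    return [lower + k * step for k in range(alapha)]


# Walk three depths, pretending the agent picks a fixed index at each one.
chosen = Fraction(0)
for depth, pick in zip(range(1, 4), [3, 4, 7]):
    options = refine(chosen, depth)
    chosen = options[pick]
    print(depth, float(step_size(depth)), float(chosen))
```

`Fraction` keeps the candidates exact, so repeated refinement does not accumulate floating-point error; a larger `alapha` gives a wider tree with finer steps per depth.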
Our preliminary work, CAmouflage-REsistant Graph Neural Network (CARE-GNN), is a GNN-based fraud detector that operates on a multi-relation graph and is equipped with three modules that enhance its performance against camouflaged fraudsters.
If you use our code, please cite the paper below:
@article{peng2021reinforced,
title={Reinforced Neighborhood Selection Guided Multi-Relational Graph Neural Networks},
author={Peng, Hao and Zhang, Ruitong and Dou, Yingtong and Yang, Renyu and Zhang, Jingyi and Yu, Philip S.},
journal={ACM Transactions on Information Systems (TOIS)},
year={2021}
}