This is the source code repository for RAND (Reinforcement Neighborhood Selection for Unsupervised Graph Anomaly Detection), published at ICDM 2023.
Unsupervised graph anomaly detection is crucial for various practical applications as it aims to identify anomalies in a graph that exhibit rare patterns deviating significantly from the majority of nodes. Recent advancements have utilized Graph Neural Networks (GNNs) to learn high-quality node representations for anomaly detection by aggregating information from neighborhoods. However, the presence of anomalies may render the observed neighborhood unreliable and result in misleading information aggregation for node representation learning. Selecting the proper neighborhood is critical for graph anomaly detection but also challenging due to the absence of anomaly-oriented guidance and the interdependence with representation learning. To address these issues, we utilize the advantages of reinforcement learning in adaptively learning in complex environments and propose a novel method that incorporates Reinforcement neighborhood selection for unsupervised graph ANomaly Detection (RAND). RAND begins by enriching the candidate neighbor pool of the given central node with multiple types of indirect neighbors. Next, RAND designs a tailored reinforcement anomaly evaluation module to assess the reliability and reward of considering the given neighbor. Finally, RAND selects the most reliable subset of neighbors based on these rewards and introduces an anomaly-aware aggregator to amplify messages from reliable neighbors while diminishing messages from unreliable ones.
python >= 3.8
scikit-learn == 1.3.0
numpy == 1.23.1
pandas == 1.5.3
torch == 1.11.0
dgl == 0.8.1
dglgo == 0.0.1
The experiments are conducted on a Linux system with NVIDIA GeForce RTX 3090 (24 GB) GPUs.
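The pinned versions listed above can be compared against the active environment before running anything. The following is a minimal sketch rather than part of the repository: the `check_pins` helper and the `PINNED` table are hypothetical names, and the script assumes each package is installed under exactly the distribution name given in the list (a CUDA build of DGL, for instance, may be installed under a different name and would then be reported as missing).

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements list above.
PINNED = {
    "scikit-learn": "1.3.0",
    "numpy": "1.23.1",
    "pandas": "1.5.3",
    "torch": "1.11.0",
    "dgl": "0.8.1",
    "dglgo": "0.0.1",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None, status)} for each pin."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (expected, None, "missing")
            continue
        # Ignore local build tags such as "+cu113" when comparing versions.
        status = "ok" if installed.split("+")[0] == expected else "mismatch"
        report[name] = (expected, installed, status)
    return report


if __name__ == "__main__":
    for name, (expected, installed, status) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed} ({status})")
```

A "mismatch" line does not necessarily mean the code will fail; it only flags a difference from the versions the experiments were run with.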
We provide the datasets in the data folder for direct use, except for the Reddit dataset, which is too large. Because the datasets exceed Git's file size limit, the preprocessed datasets are stored on Google Drive at the following link: https://drive.google.com/drive/folders/1TIBfnHBnw6ibIpnQ7i2FC_xRo0uMHnsN?usp=sharing
After the data.zip file is downloaded from the link, you should unzip it and place the unzipped datasets into the /data folder.
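The unzip step can also be done from Python. This is a minimal sketch, assuming data.zip sits in the repository root and that the archive holds the dataset files at its top level; `extract_datasets` is a hypothetical helper, not a function from this repository. If the archive already contains a top-level data directory, pass the repository root as `dest` instead so the files do not end up in data/data.

```python
import zipfile
from pathlib import Path


def extract_datasets(zip_path="data.zip", dest="data"):
    """Unpack the downloaded archive into `dest` and return the sorted member names."""
    dest_dir = Path(dest)
    # Create the target folder if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        names = archive.namelist()
    return sorted(names)
```

Calling `extract_datasets()` from the repository root and printing the returned names is a quick way to confirm which dataset files were actually unpacked.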
We provide all the hyperparameters needed to reproduce our experimental results in the /config folder. Before running the code, unzip the downloaded datasets into the /data folder and change into the repository root.
Because the hyperparameters are already included in the JSON file under the /config folder, you can run the following Python command directly:
python main.py --dataset [dataset name] --runs [overall running times] --device [GPU device ID] RAND
where the contents of each [] should be replaced according to your needs.
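To run the command above for several datasets in a row, the argument list can be assembled programmatically. The sketch below is an illustration under stated assumptions: `build_command` and `run_all` are hypothetical helpers, the dataset names passed in are placeholders to be replaced with names that exist in the /data folder, and the trailing `RAND` token is reproduced exactly as it appears in the documented command. With `dry_run=True` (the default) the commands are only printed.

```python
import subprocess
import sys


def build_command(dataset, runs, device):
    """Assemble the argument list for one invocation of main.py."""
    return [
        sys.executable, "main.py",
        "--dataset", str(dataset),
        "--runs", str(runs),
        "--device", str(device),
        "RAND",  # trailing token kept exactly as in the documented command
    ]


def run_all(datasets, runs, device, dry_run=True):
    """Print (or execute) one command per dataset and return the command lists."""
    commands = [build_command(d, runs, device) for d in datasets]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Must be called from the repository root so main.py is found.
            subprocess.run(cmd, check=True)
    return commands
```

Passing `dry_run=False` executes each command in turn and stops at the first one that exits with an error.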
To test RAND on your own dataset, place the dataset in the /data folder and configure the corresponding hyperparameters in the JSON file under the /config folder before running.
@inproceedings{bei2023reinforcement,
  title={Reinforcement Neighborhood Selection for Unsupervised Graph Anomaly Detection},
  author={Bei, Yuanchen and Zhou, Sheng and Tan, Qiaoyu and Xu, Hao and Chen, Hao and Li, Zhao and Bu, Jiajun},
  booktitle={2023 IEEE International Conference on Data Mining (ICDM)},
  pages={11--20},
  year={2023},
  organization={IEEE}
}