Introduction: DGFraud is a Graph Neural Network (GNN)-based toolbox for fraud detection. It integrates implementations and comparisons of state-of-the-art GNN-based fraud detection models. It also includes several utility functions for graph preprocessing, graph sampling, and performance evaluation. An introduction to the implemented models can be found in the model tables below.
We welcome contributions that add new fraud detectors or extend the features of the toolbox. Some of the planned features are listed in the TODO list.
If you find this repo useful, please cite the paper below:
@inproceedings{liu2020alleviating,
title={Alleviating the Inconsistency Problem of Applying Graph Neural Network to Fraud Detection},
author={Liu, Zhiwei and Dou, Yingtong and Yu, Philip S. and Deng, Yutong and Peng, Hao},
booktitle={Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
year={2020}
}
Useful Resources
- Graph-based Fraud Detection Paper List
- Awesome Fraud Detection Papers
- Attack and Defense Papers on Graph Data
- PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)
- PyODD: An End-to-end Outlier Detection System
- DGL: Deep Graph Library
- Outlier Detection DataSets (ODDS)
Installation
git clone https://github.com/safe-graph/DGFraud.git
cd DGFraud
python setup.py install
Requirements
- tensorflow>=1.14.0,<2.0
- numpy>=1.16.4
- scipy>=1.2.0
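Before running the examples, you may want to confirm that the installed packages fall within these bounds. The sketch below is a hypothetical helper (`satisfies` and `check_requirements` are names introduced here, not part of DGFraud); it assumes Python 3.8+ for `importlib.metadata` and only handles simple numeric versions such as `1.14.0`.

```python
# Hypothetical helper, not part of DGFraud: check installed versions against
# the bounds listed above (e.g. tensorflow>=1.14.0,<2.0).
from importlib.metadata import PackageNotFoundError, version as installed_version
from typing import Dict, Optional, Tuple

REQUIREMENTS = {
    "tensorflow": ("1.14.0", "2.0"),
    "numpy": ("1.16.4", None),
    "scipy": ("1.2.0", None),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.14.0' into (1, 14, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def compare(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Return -1, 0 or 1, padding the shorter tuple with zeros."""
    n = max(len(a), len(b))
    a = a + (0,) * (n - len(a))
    b = b + (0,) * (n - len(b))
    return (a > b) - (a < b)


def satisfies(text: str, minimum: str, below: Optional[str] = None) -> bool:
    """True if minimum <= version, and version < below when an upper bound is set."""
    v = parse_version(text)
    if compare(v, parse_version(minimum)) < 0:
        return False
    return below is None or compare(v, parse_version(below)) < 0


def check_requirements(reqs=REQUIREMENTS) -> Dict[str, bool]:
    """Map each package name to whether its installed version is acceptable."""
    report = {}
    for name, (low, high) in reqs.items():
        try:
            report[name] = satisfies(installed_version(name), low, high)
        except PackageNotFoundError:
            report[name] = False  # not installed under this distribution name
    return report
```

A `False` for tensorflow may only mean it was installed under a different distribution name (for example a GPU build), so treat the report as a quick hint rather than a definitive check.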
This section introduces how to run the code from the command line or an IDE, how to fine-tune the models, how the code is structured and what each directory contains, how to load graphs, and how to evaluate the models.
For example, you can run Player2Vec from its directory under algorithms/ with:
python Player2vec_main.py
You can also specify model parameters on the command line when running the code.
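Command-line parameters in Python scripts are typically handled with `argparse`. The sketch below is hypothetical: the flag names (`--epochs`, `--lr`, `--hidden`, `--seed`) and defaults are illustrative and may not match the arguments actually defined in each `*_main.py` script, so check the script you run.

```python
# Hypothetical sketch: these flag names and defaults are illustrative and may
# differ from the arguments defined in DGFraud's *_main.py scripts.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train a fraud detector (sketch)")
    parser.add_argument("--epochs", type=int, default=30, help="number of training epochs")
    parser.add_argument("--lr", type=float, default=0.001, help="learning rate")
    parser.add_argument("--hidden", type=int, default=64, help="hidden units per layer")
    parser.add_argument("--seed", type=int, default=123, help="random seed")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))  # e.g. feed these values into the model constructor
```

With such a parser, a call like `python my_main.py --epochs 100 --lr 0.01` overrides only the given values and keeps the defaults for the rest.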
Have a look at the load_data_dblp() function in utils/utils.py for an example.
To use your own data, you need to provide:
- adjacency matrices or adjlists (for SpamGCN);
- a feature matrix;
- a label matrix.
Then split the feature matrix and the label matrix into training and testing data.
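The following sketch shows one way to assemble these inputs from an edge list with NumPy. `prepare_own_data` is a hypothetical helper introduced here for illustration; it assumes a single homogeneous, undirected graph, integer class labels and a random node-level split, so heterogeneous models that expect several adjacency matrices, and DGFraud's own loaders, may need a different layout.

```python
# Hypothetical helper (not a DGFraud API): build an adjacency matrix, an
# adjlist, a one-hot label matrix and a random train/test split.
import numpy as np


def prepare_own_data(edges, features, labels, num_classes, test_ratio=0.2, seed=0):
    num_nodes = features.shape[0]

    # Dense adjacency for clarity; large graphs should use scipy.sparse instead.
    adj = np.zeros((num_nodes, num_nodes), dtype=np.float32)
    for u, v in edges:
        adj[u, v] = 1.0
        adj[v, u] = 1.0  # assume an undirected graph

    # Adjacency list: node -> sorted neighbour ids.
    adjlist = {i: sorted(np.nonzero(adj[i])[0].tolist()) for i in range(num_nodes)}

    # One-hot label matrix of shape (num_nodes, num_classes).
    label_matrix = np.eye(num_classes, dtype=np.float32)[labels]

    # Random split over node indices (RandomState works with numpy>=1.16.4).
    order = np.random.RandomState(seed).permutation(num_nodes)
    num_test = int(round(num_nodes * test_ratio))
    test_idx, train_idx = order[:num_test], order[num_test:]

    return {
        "adj": adj,
        "adjlist": adjlist,
        "x_train": features[train_idx], "y_train": label_matrix[train_idx],
        "x_test": features[test_idx], "y_test": label_matrix[test_idx],
        "train_idx": train_idx, "test_idx": test_idx,
    }


# Toy usage: 5 nodes, 8 features, 2 classes (0 = benign, 1 = fraud).
edges = [(0, 1), (1, 2), (3, 4)]
features = np.random.RandomState(0).rand(5, 8).astype(np.float32)
labels = np.array([0, 1, 0, 0, 1])
data = prepare_own_data(edges, features, labels, num_classes=2, test_ratio=0.4)
```

For class-imbalanced fraud labels, a stratified split that keeps the fraud ratio equal in both parts is usually preferable to the plain random split used here.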
You can specify a dataset as follows:
python xx_main.py --dataset your_dataset
or by editing the dataset setting in xx_main.py directly.
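One common way to wire a `--dataset` flag to your own loading code is a name-to-loader registry. In the sketch below, `DATASET_LOADERS`, `load_toy_dataset`, `load_your_dataset` and `load_dataset` are hypothetical placeholders, not functions from DGFraud; the placeholder loaders would be replaced with code that returns your adjacency, feature and label matrices.

```python
# Hypothetical sketch: map the --dataset value to a loader function.
# All loader names here are placeholders, not DGFraud APIs.
import argparse


def load_toy_dataset():
    """Placeholder loader; replace with code that reads your own files."""
    return {"name": "toy", "num_nodes": 3}


def load_your_dataset():
    """Placeholder for your_dataset; should return adjacency, features and labels."""
    return {"name": "your_dataset", "num_nodes": 0}


DATASET_LOADERS = {
    "toy": load_toy_dataset,
    "your_dataset": load_your_dataset,
}


def load_dataset(name):
    if name not in DATASET_LOADERS:
        raise ValueError("unknown dataset %r; choose one of %s" % (name, sorted(DATASET_LOADERS)))
    return DATASET_LOADERS[name]()


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", default="toy", choices=sorted(DATASET_LOADERS))
    return parser.parse_args(argv)
```

Adding a dataset then means writing one loader and registering it in the dictionary, instead of editing the training code itself.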
The repository is organised as follows:
- algorithms/ contains the implemented models and the corresponding example code;
- base_models/ contains the basic models (GCN);
- dataset/ contains the necessary dataset files;
- utils/ contains code for:
  - loading and splitting the data (data_loader.py);
  - various utilities (utils.py);
  - preprocessing raw data (process_dzdp.py and process_yelp.py);
  - computing the NDCG score and ranking precision score (cal_ndcg.py).
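To make the two ranking metrics concrete, here is a sketch of NDCG@k and precision@k for binary fraud labels (1 = fraud) ranked by predicted fraud score. It follows the textbook definitions, where DCG@k sums each relevance divided by log2(position + 1), and is not necessarily identical to what utils/cal_ndcg.py computes; the function names are introduced here for illustration.

```python
# Textbook NDCG@k and precision@k for binary labels (1 = fraud).
# Illustrative only; utils/cal_ndcg.py may differ in details.
import math


def rank_by_score(scores, labels):
    """Return labels reordered by descending predicted fraud score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def dcg_at_k(relevances, k):
    # 0-based position i is discounted by log2(i + 2), i.e. log2(rank + 1).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores, labels, k):
    ranked = rank_by_score(scores, labels)
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def precision_at_k(scores, labels, k):
    """Fraction of the top-k ranked items that are actually fraud."""
    return sum(rank_by_score(scores, labels)[:k]) / k


scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(ndcg_at_k(scores, labels, 3), precision_at_k(scores, labels, 2))
```

Both metrics reward placing fraudsters near the top of the ranking, which matches how investigators typically review only the highest-scored accounts.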
Implemented Models
Model | Paper | Venue | Reference |
---|---|---|---|
GraphConsis | Alleviating the Inconsistency Problem of Applying Graph Neural Network to Fraud Detection | SIGIR 2020 | BibTex |
SemiGNN | A Semi-supervised Graph Attentive Network for Financial Fraud Detection | ICDM 2019 | BibTex |
Player2Vec | Key Player Identification in Underground Forums over Attributed Heterogeneous Information Network Embedding Framework | CIKM 2019 | BibTex |
GAS | Spam Review Detection with Graph Convolutional Networks | CIKM 2019 | BibTex |
FdGars | FdGars: Fraudster Detection via Graph Convolutional Networks in Online App Review System | WWW 2019 | BibTex |
GeniePath | GeniePath: Graph Neural Networks with Adaptive Receptive Paths | AAAI 2019 | BibTex |
GEM | Heterogeneous Graph Neural Networks for Malicious Account Detection | CIKM 2018 | BibTex |
Model Comparison
Model | Application | Graph Type | Base Model |
---|---|---|---|
GraphConsis | Opinion Fraud | Homogeneous | GraphSAGE |
SemiGNN | Financial Fraud | Heterogeneous | GAT, LINE, DeepWalk |
Player2Vec | Cyber Criminal | Heterogeneous | GAT, GCN |
GAS | Opinion Fraud | Heterogeneous | GCN, GAT |
FdGars | Opinion Fraud | Homogeneous | GCN |
GeniePath | Financial Fraud | Homogeneous | GAT |
GEM | Financial Fraud | Heterogeneous | GCN |
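GCN is the most common base model in this table. As an illustration, here is a NumPy sketch of one GCN propagation step, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), following Kipf and Welling's formulation. It is a simplified stand-in that assumes a dense, symmetric, non-negative adjacency matrix, not the TensorFlow implementation in base_models/.

```python
# Minimal NumPy sketch of one GCN layer (Kipf & Welling formulation).
# Illustrative only; not the TensorFlow code in base_models/.
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense symmetric adjacency matrix A."""
    a_hat = adj + np.eye(adj.shape[0], dtype=adj.dtype)  # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)  # deg >= 1 because of the self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weight, activation=lambda x: np.maximum(x, 0.0)):
    """Aggregate neighbour features, then apply a linear map and ReLU."""
    return activation(adj_norm @ features @ weight)


# Toy usage: a 3-node path graph 0-1-2 with one-hot features.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=np.float64)
hidden = gcn_layer(normalize_adjacency(adj), np.eye(3), np.ones((3, 2)))
```

The symmetric normalization keeps feature magnitudes stable across nodes with very different degrees, which matters in fraud graphs where a few hub accounts can have many more neighbours than the rest.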
TODO List
- GraphConsis Implementation
- Add preprocessed Yelp datasets
- The memory-efficient implementation of SemiGNN
- The log loss for GEM model
- Time-based sampling for GEM
- Add sampling methods
- Benchmarking SOTA models
- Scalable implementation
- TensorFlow 2.0+ implementation
- Pytorch version
How to Contribute
You are welcome to contribute to this open-source toolbox. Detailed instructions will be released soon. In the meantime, you can open an issue or send an email to ytongdou@gmail.com with any enquiries.