
DyHNet: Learning Dynamic Heterogeneous Network Representations

1. Directory structure:

.
|   README.md
|   environment.yml
|
|--- dataset
|--- model
|--- output
|--- dependencies
|   |-- littleballoffur: module for graph sampling
|   |-- prepare_data: module for data preprocessing
|
|--- DyHNet
|   DyHNet.py: main pipeline object
|   main.py: main file
|   |-- config
|   |   dblp.json
|   |   dblp_four_area.json
|   |   imdb.json
|   |   yelp.json
|   |-- src
|   |   datasets.py: data module for training
|   |   model.py: model module for training
|   |   trainer.py: trainer module for training
|   |   inference.py: inference agent
|   |   evaluation: evaluation module
|   |   utils.py: utils functions

2. Installation

2.1 Libraries

To install all necessary libraries, please run:

conda env create -f environment.yml

If the PyTorch and CUDA versions in environment.yml are not compatible with your machine, remove the related libraries from the .yml file, then install PyTorch and PyTorch Geometric separately. If you want to create an environment without using the existing file, please refer to the installation.md file.

2.2 PyTorch

Please follow the PyTorch installation instructions on the official PyTorch website.

2.3 Torch Geometric

pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-cluster -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-spline-conv -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-geometric

where ${TORCH} and ${CUDA} are the installed PyTorch and CUDA versions (for example, 1.12.0 and cu113).
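To check which wheel index the commands above will hit, you can compose the URL from your own version pair. The version strings below are only example assumptions; substitute the ones reported by your installation:

```python
# Example values; replace with the output of:
#   python -c "import torch; print(torch.__version__, torch.version.cuda)"
torch_version = "1.12.0"  # assumed PyTorch version
cuda = "cu113"            # assumed CUDA tag; use "cpu" for a CPU-only build

# Build the PyG wheel index URL used by the pip commands above
wheel_index = f"https://data.pyg.org/whl/torch-{torch_version}+{cuda}.html"
print(wheel_index)  # -> https://data.pyg.org/whl/torch-1.12.0+cu113.html
```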

3. Model Architecture

[Model architecture figure]

4. Experimental results

4.1 Link prediction

IMDB

| Model | AUC | F1 (k=1) | Recall (k=1) | Precision (k=1) | F1 (k=2) | Recall (k=2) | Precision (k=2) |
|---|---|---|---|---|---|---|---|
| DynAE | 0.5156 | 0.3705 | 0.3344 | 0.4154 | 0.5141 | 0.6709 | 0.4167 |
| DynAERNN | 0.5014 | 0.3647 | 0.3291 | 0.4089 | 0.5100 | 0.6656 | 0.4134 |
| DynGEM | 0.5829 | 0.4367 | 0.3941 | 0.4896 | 0.5494 | 0.7170 | 0.4453 |
| DySAT | 0.5087 | 0.3717 | 0.3354 | 0.4167 | 0.5052 | 0.6593 | 0.4095 |
| VGRNN | 0.5534 | 0.3949 | 0.3564 | 0.4427 | 0.5084 | 0.6635 | 0.4121 |
| EvolveGCN | 0.5586 | 0.3717 | 0.3354 | 0.4167 | 0.5133 | 0.6698 | 0.4160 |
| CTGCN-C | 0.6169 | 0.4448 | 0.4015 | 0.4987 | 0.5590 | 0.7296 | 0.4531 |
| DHNE | 0.5102 | 0.3577 | 0.3229 | 0.4010 | 0.5116 | 0.6677 | 0.4147 |
| DyHATR | 0.5216 | 0.3438 | 0.3103 | 0.3854 | 0.4956 | 0.6468 | 0.4017 |
| DyHNet | 0.6588 | 0.5029 | 0.4539 | 0.5638 | 0.5831 | 0.7610 | 0.4727 |
| % over the best | 6.79% | 13.05% | 13.05% | 13.05% | 4.31% | 4.31% | 4.31% |

AMiner

| Model | AUC | F1 (k=1) | Recall (k=1) | Precision (k=1) | F1 (k=2) | Recall (k=2) | Precision (k=2) |
|---|---|---|---|---|---|---|---|
| DynAE | 0.5337 | 0.1848 | 0.1568 | 0.2250 | 0.2708 | 0.3241 | 0.2325 |
| DynAERNN | 0.5883 | 0.1972 | 0.1673 | 0.2401 | 0.2708 | 0.3241 | 0.2325 |
| DynGEM | 0.5294 | 0.2609 | 0.2213 | 0.3176 | 0.3786 | 0.4532 | 0.3251 |
| DySAT | 0.5138 | 0.2624 | 0.2227 | 0.3195 | 0.3368 | 0.4032 | 0.2892 |
| VGRNN | 0.5817 | 0.2686 | 0.2279 | 0.3270 | 0.3060 | 0.3663 | 0.2628 |
| EvolveGCN | 0.5982 | 0.2034 | 0.1726 | 0.2476 | 0.2829 | 0.3386 | 0.2429 |
| CTGCN-C | 0.5511 | 0.2360 | 0.2003 | 0.2873 | 0.3247 | 0.3887 | 0.2788 |
| DHNE | 0.5048 | 0.2407 | 0.2042 | 0.2930 | 0.3434 | 0.4111 | 0.2949 |
| DyHATR | 0.5111 | 0.2866 | 0.2431 | 0.3491 | 0.3788 | 0.4531 | 0.3254 |
| DyHNet | 0.5742 | 0.3307 | 0.2806 | 0.4026 | 0.4260 | 0.5099 | 0.3658 |
| % over the best | -4.02% | 15.40% | 15.46% | 15.32% | 12.46% | 12.50% | 12.41% |

DBLP

| Model | AUC | F1 (k=1) | Recall (k=1) | Precision (k=1) | F1 (k=2) | Recall (k=2) | Precision (k=2) |
|---|---|---|---|---|---|---|---|
| DynAE | 0.6481 | 0.0333 | 0.0328 | 0.0338 | 0.2602 | 0.3829 | 0.1971 |
| DynAERNN | 0.6606 | 0.2968 | 0.2925 | 0.3012 | 0.2052 | 0.3019 | 0.1554 |
| DynGEM | 0.6492 | 0.0136 | 0.0134 | 0.0138 | 0.2056 | 0.3025 | 0.1558 |
| DySAT | 0.5000 | 0.2499 | 0.2463 | 0.2536 | 0.2543 | 0.3742 | 0.1926 |
| EvolveGCN | 0.5239 | 0.0095 | 0.0094 | 0.0096 | 0.0287 | 0.0422 | 0.0217 |
| CTGCN-C | 0.5994 | 0.1759 | 0.1734 | 0.1785 | 0.2025 | 0.2979 | 0.1533 |
| DHNE | 0.5057 | 0.2146 | 0.2115 | 0.2178 | 0.2739 | 0.4029 | 0.2074 |
| DyHATR | 0.5178 | 0.3552 | 0.3501 | 0.3604 | 0.4404 | 0.6479 | 0.3336 |
| DyHNet | 0.6824 | 0.3701 | 0.3648 | 0.3756 | 0.4609 | 0.6780 | 0.3491 |
| % over the best | 3.29% | 4.21% | 4.21% | 4.21% | 4.65% | 4.65% | 4.65% |

Yelp

| Model | AUC | F1 (k=1) | Recall (k=1) | Precision (k=1) | F1 (k=2) | Recall (k=2) | Precision (k=2) |
|---|---|---|---|---|---|---|---|
| DynAE | 0.5221 | 0.0091 | 0.0206 | 0.0058 | 0.0095 | 0.0382 | 0.0054 |
| DynAERNN | 0.5128 | 0.0011 | 0.0025 | 0.0007 | 0.0021 | 0.0083 | 0.0012 |
| DynGEM | 0.6849 | 0.0220 | 0.0499 | 0.0141 | 0.0220 | 0.0886 | 0.0125 |
| DySAT | 0.5024 | 0.0048 | 0.0108 | 0.0030 | 0.0043 | 0.0172 | 0.0024 |
| EvolveGCN | 0.5393 | 0.0174 | 0.0394 | 0.0112 | 0.0152 | 0.0613 | 0.0087 |
| CTGCN-C | 0.6366 | 0.0137 | 0.0311 | 0.0088 | 0.0134 | 0.0539 | 0.0076 |
| DHNE | 0.5027 | 0.0027 | 0.0062 | 0.0017 | 0.0033 | 0.0132 | 0.0019 |
| DyHATR | 0.5124 | 0.0052 | 0.0117 | 0.0033 | 0.0058 | 0.0234 | 0.0033 |
| DyHNet | 0.6715 | 0.0235 | 0.0534 | 0.0151 | 0.0247 | 0.0986 | 0.0141 |
| % over the best | -1.96% | 7.02% | 7.06% | 7.01% | 12.26% | 11.23% | 12.40% |
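The exact evaluation protocol lives in the evaluation module under DyHNet/src, but the top-k metrics reported above follow the standard definitions. A generic sketch, not the repository's own implementation:

```python
def precision_recall_f1_at_k(ranked, relevant, k):
    """Top-k metrics for a ranked list of predicted links.

    ranked:   candidate ids sorted by predicted score, best first
    relevant: ids of the true positive links
    """
    hits = len(set(ranked[:k]) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# One hit in the top 2, out of 2 relevant links:
print(precision_recall_f1_at_k([3, 1, 2], [1, 2], k=2))  # -> (0.5, 0.5, 0.5)
```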

4.2 Node classification

| Datasets | Metrics | Train % | DynAE | DynAERNN | DynGEM | DySAT | EvolveGCN | CTGCN-C | DHNE | DyHATR | DyHNet | % over the best |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yelp | AUC | 20% | 0.5319 | 0.5040 | 0.5756 | 0.4947 | 0.6273 | 0.6368 | 0.5001 | 0.6037 | 0.6965 | 9.38% |
| | | 40% | 0.5498 | 0.5109 | 0.6040 | 0.5076 | 0.6245 | 0.6690 | 0.4986 | 0.6201 | 0.7021 | 4.94% |
| | | 60% | 0.5503 | 0.5089 | 0.6043 | 0.5043 | 0.6304 | 0.6697 | 0.4890 | 0.6142 | 0.6963 | 3.97% |
| | | 80% | 0.5643 | 0.5205 | 0.6097 | 0.4970 | 0.6311 | 0.6673 | 0.4898 | 0.6225 | 0.7147 | 7.11% |
| | Macro F1 | 20% | 0.1741 | 0.2845 | 0.3455 | 0.2589 | 0.1741 | 0.4414 | 0.1759 | 0.3983 | 0.4871 | 10.37% |
| | | 40% | 0.1791 | 0.2215 | 0.4088 | 0.2672 | 0.1791 | 0.4854 | 0.1791 | 0.4056 | 0.5035 | 3.74% |
| | | 60% | 0.1771 | 0.1771 | 0.3996 | 0.2783 | 0.1771 | 0.4671 | 0.1771 | 0.4043 | 0.4917 | 5.25% |
| | | 80% | 0.1817 | 0.2878 | 0.4054 | 0.2657 | 0.1738 | 0.4579 | 0.1817 | 0.4194 | 0.5123 | 11.87% |
| | Micro F1 | 20% | 0.3534 | 0.3687 | 0.3956 | 0.3442 | 0.3534 | 0.4351 | 0.3544 | 0.4082 | 0.4882 | 12.21% |
| | | 40% | 0.3673 | 0.3704 | 0.4224 | 0.3587 | 0.3673 | 0.4774 | 0.3673 | 0.4088 | 0.5090 | 6.61% |
| | | 60% | 0.3618 | 0.3618 | 0.4128 | 0.3673 | 0.3618 | 0.4592 | 0.3618 | 0.4091 | 0.4944 | 7.68% |
| | | 80% | 0.3748 | 0.3748 | 0.4119 | 0.3636 | 0.3525 | 0.4527 | 0.3748 | 0.4193 | 0.5121 | 13.11% |
| DBLP four area | AUC | 20% | 0.5828 | 0.5880 | 0.6263 | 0.5030 | 0.5056 | 0.7152 | 0.4895 | 0.6301 | 0.7304 | 2.14% |
| | | 40% | 0.5860 | 0.5919 | 0.6347 | 0.5078 | 0.5002 | 0.7387 | 0.4924 | 0.6393 | 0.7565 | 2.42% |
| | | 60% | 0.5875 | 0.5942 | 0.6375 | 0.5130 | 0.5193 | 0.7481 | 0.5072 | 0.6410 | 0.7797 | 4.22% |
| | | 80% | 0.5799 | 0.5906 | 0.6294 | 0.4996 | 0.5044 | 0.7529 | 0.5067 | 0.6366 | 0.8098 | 7.55% |
| | Macro F1 | 20% | 0.2348 | 0.2348 | 0.2966 | 0.1849 | 0.1362 | 0.4341 | 0.1023 | 0.2736 | 0.4488 | 3.39% |
| | | 40% | 0.2547 | 0.2661 | 0.3482 | 0.1925 | 0.1357 | 0.4693 | 0.1009 | 0.2913 | 0.5107 | 8.82% |
| | | 60% | 0.2499 | 0.2499 | 0.3400 | 0.1874 | 0.1348 | 0.4780 | 0.1704 | 0.2734 | 0.5362 | 12.16% |
| | | 80% | 0.2322 | 0.2322 | 0.3214 | 0.1793 | 0.1427 | 0.4895 | 0.1252 | 0.2528 | 0.5785 | 18.16% |
| | Micro F1 | 20% | 0.3012 | 0.3012 | 0.3218 | 0.2907 | 0.2849 | 0.4469 | 0.2571 | 0.3348 | 0.4632 | 3.65% |
| | | 40% | 0.3194 | 0.3175 | 0.3614 | 0.2965 | 0.2895 | 0.4842 | 0.2527 | 0.3561 | 0.5212 | 7.63% |
| | | 60% | 0.3165 | 0.3165 | 0.3510 | 0.2931 | 0.2888 | 0.4951 | 0.2802 | 0.3393 | 0.5468 | 10.45% |
| | | 80% | 0.2945 | 0.2945 | 0.3321 | 0.2878 | 0.3149 | 0.5031 | 0.2270 | 0.3161 | 0.5916 | 17.60% |

5. Experimental replication

5.1 Dataset

To use your own dataset, you need to prepare:

  • A config file in JSON format, named data_name.json, placed in DyHNet/config

  • Four input files placed under the folder dataset/data_name:

    1. node_types.csv: format of each row node_id (int), node_type (int), node_type_name (str)

      node_id,node_type,node_type_name
      0,0,author
      1,0,author
      2,0,author
      3,0,author
      4,0,author
      
    2. temporal_edge_list.txt: format of each row source_node_id (int), target_node_id (int), time_id (int)

      1840 1 6
      1840 2 6
      1840 3 6
      1841 4 4
      1841 5 4
      
    3. temporal_subgraphs.pth: each row has the format subgraph_ids (node ids joined by -), time_id, label, tab-separated. The label column can be empty if you have no label for the subgraph; in our experiments we use it as metadata when constructing input file 4 (data.pkl).

      1883-90-105-12693-12812-13117-13235-13273-13682-14027-14158-14241-14387-14517	0	uai	
      1884-105-121-12736-12827-13072-13329-14517	0	uai	
      1909-182-183-12636-12640-12749-12776-12782-12807-13039-13040-13124-13676-14308-14410-14489-14519	0	cikm	
      1930-242-243-13072-13228-13702-14073-14089-14311-14519	0	cikm	
      1972-346-347-12578-12693-12893-13437-13473-13595-13740-14421-14523	0	colt	
      
    4. data.pkl: a dictionary used to build the train/val/test dataloaders

      data = {0: {
      'node_id': 800,
      'subgraph_idx': {0: [1, 2], 1: [4, 10], 2: [], 3: [8], 4: [99, 100, 101], 5: [7]},
      'label': 'kdd',
      'dataset': 'train',
      'time_id': 3,
      },
      }
      
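The four input files above can be generated programmatically. The sketch below writes minimal versions of files 1, 2, and 4 for a hypothetical dataset called my_data; the values are the examples from above, and the serialization of temporal_subgraphs.pth is left to the prepare_data module, since its exact format is not shown here:

```python
import pickle
from pathlib import Path

# "my_data" is a placeholder dataset name; replace with your data_name.
out = Path("dataset/my_data")
out.mkdir(parents=True, exist_ok=True)

# 1. node_types.csv -- node_id,node_type,node_type_name
with open(out / "node_types.csv", "w") as f:
    f.write("node_id,node_type,node_type_name\n")
    for node_id, node_type, name in [(0, 0, "author"), (1, 1, "paper")]:
        f.write(f"{node_id},{node_type},{name}\n")

# 2. temporal_edge_list.txt -- source_node_id target_node_id time_id
with open(out / "temporal_edge_list.txt", "w") as f:
    for src, dst, time_id in [(1840, 1, 6), (1840, 2, 6)]:
        f.write(f"{src} {dst} {time_id}\n")

# 3. temporal_subgraphs.pth -- produced by the prepare_data module
#    (exact serialization not shown in this README, so it is omitted here).

# 4. data.pkl -- dict keyed by sample index, matching the example above
data = {
    0: {
        "node_id": 800,
        "subgraph_idx": {0: [1, 2], 1: [4, 10], 2: [], 3: [8],
                         4: [99, 100, 101], 5: [7]},
        "label": "kdd",
        "dataset": "train",
        "time_id": 3,
    }
}
with open(out / "data.pkl", "wb") as f:
    pickle.dump(data, f)
```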

5.2 Usage

  • General usage:
from DyHNet.DyHNet import *
config_path = './DyHNet/config/imdb.json'
dyhnet = DyHNet(config_path=config_path)

# Run full pipeline
dyhnet.run_pipeline()

# Run single step
## Preprocess data
dyhnet.preprocess_data()

## Initialize data, model, trainer
data_module, model_module, trainer = dyhnet.initialize()

## Train
dyhnet.train(data_module, model_module, trainer)

## Test with all checkpoints
checkpoint_paths = dyhnet.get_checkpoint_paths()
for checkpoint_path in checkpoint_paths:
    dyhnet.test(data_module, model_module, trainer, checkpoint_path)

## Infer with the last checkpoint
checkpoint_paths = dyhnet.get_checkpoint_paths()
dyhnet.generate_embedding(data_module, model_module, checkpoint_path=checkpoint_paths[-1])
  • Detailed usage:
    • For preprocessing data: refer to DyHNet/prepare_data.py
    • For training model: refer to DyHNet/train.py
    • For generating graph embedding: refer to DyHNet/infer.py
    • For evaluating model performance: refer to DyHNet/eval.py
