```
.
| README.md
| environment.yml
|
|--- dataset
|--- model
|--- output
|--- dependencies
|    |-- littleballoffur: module for graph sampling
|    |-- prepare_data: module for data preprocessing
|
|--- DyHNet
|    | DyHNet.py: main pipeline object
|    | main.py: main file
|    |-- config
|    |    | dblp.json
|    |    | dblp_four_area.json
|    |    | imdb.json
|    |    | yelp.json
|    |-- src
|    |    | datasets.py: data module for training
|    |    | model.py: model module for training
|    |    | trainer.py: trainer module for training
|    |    | inference.py: inference agent
|    |    | evaluation: evaluation module
|    |    | utils.py: utils functions
```
To install all necessary libraries, please run:

```shell
conda env create -f environment.yml
```
In case the versions of PyTorch and CUDA are not compatible on your machine, remove all related libraries from the `environment.yml` file, then install PyTorch and PyTorch Geometric separately. If you want to create an environment without using the existing file, please refer to the installation.md file. Please follow the PyTorch installation instructions in this link.
```shell
pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-cluster -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-spline-conv -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-geometric
```
where `${TORCH}` and `${CUDA}` are the versions of PyTorch and CUDA installed on your machine.
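For example, assuming PyTorch 1.12.0 built against CUDA 11.3 (an assumed setup — substitute the versions reported by your own installation), the variables expand as follows:

```shell
# Assumed setup: PyTorch 1.12.0 + CUDA 11.3. Check yours with:
#   python -c "import torch; print(torch.__version__, torch.version.cuda)"
TORCH=1.12.0
CUDA=cu113
# The wheel index URL that the pip commands above resolve to:
echo "https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html"
```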
IMDB
Model | AUC | F1 (k=1) | Recall (k=1) | Precision (k=1) | F1 (k=2) | Recall (k=2) | Precision (k=2) |
---|---|---|---|---|---|---|---|
DynAE | 0.5156 | 0.3705 | 0.3344 | 0.4154 | 0.5141 | 0.6709 | 0.4167 |
DynAERNN | 0.5014 | 0.3647 | 0.3291 | 0.4089 | 0.5100 | 0.6656 | 0.4134 |
DynGEM | 0.5829 | 0.4367 | 0.3941 | 0.4896 | 0.5494 | 0.7170 | 0.4453 |
DySAT | 0.5087 | 0.3717 | 0.3354 | 0.4167 | 0.5052 | 0.6593 | 0.4095 |
VGRNN | 0.5534 | 0.3949 | 0.3564 | 0.4427 | 0.5084 | 0.6635 | 0.4121 |
EvolveGCN | 0.5586 | 0.3717 | 0.3354 | 0.4167 | 0.5133 | 0.6698 | 0.4160 |
CTGCN-C | 0.6169 | 0.4448 | 0.4015 | 0.4987 | 0.5590 | 0.7296 | 0.4531 |
DHNE | 0.5102 | 0.3577 | 0.3229 | 0.4010 | 0.5116 | 0.6677 | 0.4147 |
DyHATR | 0.5216 | 0.3438 | 0.3103 | 0.3854 | 0.4956 | 0.6468 | 0.4017 |
DyHNet | 0.6588 | 0.5029 | 0.4539 | 0.5638 | 0.5831 | 0.7610 | 0.4727 |
% over the best | 6.79% | 13.05% | 13.05% | 13.05% | 4.31% | 4.31% | 4.31% |
AMiner
Model | AUC | F1 (k=1) | Recall (k=1) | Precision (k=1) | F1 (k=2) | Recall (k=2) | Precision (k=2) |
---|---|---|---|---|---|---|---|
DynAE | 0.5337 | 0.1848 | 0.1568 | 0.2250 | 0.2708 | 0.3241 | 0.2325 |
DynAERNN | 0.5883 | 0.1972 | 0.1673 | 0.2401 | 0.2708 | 0.3241 | 0.2325 |
DynGEM | 0.5294 | 0.2609 | 0.2213 | 0.3176 | 0.3786 | 0.4532 | 0.3251 |
DySAT | 0.5138 | 0.2624 | 0.2227 | 0.3195 | 0.3368 | 0.4032 | 0.2892 |
VGRNN | 0.5817 | 0.2686 | 0.2279 | 0.3270 | 0.3060 | 0.3663 | 0.2628 |
EvolveGCN | 0.5982 | 0.2034 | 0.1726 | 0.2476 | 0.2829 | 0.3386 | 0.2429 |
CTGCN-C | 0.5511 | 0.2360 | 0.2003 | 0.2873 | 0.3247 | 0.3887 | 0.2788 |
DHNE | 0.5048 | 0.2407 | 0.2042 | 0.2930 | 0.3434 | 0.4111 | 0.2949 |
DyHATR | 0.5111 | 0.2866 | 0.2431 | 0.3491 | 0.3788 | 0.4531 | 0.3254 |
DyHNet | 0.5742 | 0.3307 | 0.2806 | 0.4026 | 0.4260 | 0.5099 | 0.3658 |
% over the best | -4.02% | 15.40% | 15.46% | 15.32% | 12.46% | 12.50% | 12.41% |
DBLP
Model | AUC | F1 (k=1) | Recall (k=1) | Precision (k=1) | F1 (k=2) | Recall (k=2) | Precision (k=2) |
---|---|---|---|---|---|---|---|
DynAE | 0.6481 | 0.0333 | 0.0328 | 0.0338 | 0.2602 | 0.3829 | 0.1971 |
DynAERNN | 0.6606 | 0.2968 | 0.2925 | 0.3012 | 0.2052 | 0.3019 | 0.1554 |
DynGEM | 0.6492 | 0.0136 | 0.0134 | 0.0138 | 0.2056 | 0.3025 | 0.1558 |
DySAT | 0.5000 | 0.2499 | 0.2463 | 0.2536 | 0.2543 | 0.3742 | 0.1926 |
EvolveGCN | 0.5239 | 0.0095 | 0.0094 | 0.0096 | 0.0287 | 0.0422 | 0.0217 |
CTGCN-C | 0.5994 | 0.1759 | 0.1734 | 0.1785 | 0.2025 | 0.2979 | 0.1533 |
DHNE | 0.5057 | 0.2146 | 0.2115 | 0.2178 | 0.2739 | 0.4029 | 0.2074 |
DyHATR | 0.5178 | 0.3552 | 0.3501 | 0.3604 | 0.4404 | 0.6479 | 0.3336 |
DyHNet | 0.6824 | 0.3701 | 0.3648 | 0.3756 | 0.4609 | 0.6780 | 0.3491 |
% over the best | 3.29% | 4.21% | 4.21% | 4.21% | 4.65% | 4.65% | 4.65% |
Yelp
Model | AUC | F1 (k=1) | Recall (k=1) | Precision (k=1) | F1 (k=2) | Recall (k=2) | Precision (k=2) |
---|---|---|---|---|---|---|---|
DynAE | 0.5221 | 0.0091 | 0.0206 | 0.0058 | 0.0095 | 0.0382 | 0.0054 |
DynAERNN | 0.5128 | 0.0011 | 0.0025 | 0.0007 | 0.0021 | 0.0083 | 0.0012 |
DynGEM | 0.6849 | 0.0220 | 0.0499 | 0.0141 | 0.0220 | 0.0886 | 0.0125 |
DySAT | 0.5024 | 0.0048 | 0.0108 | 0.0030 | 0.0043 | 0.0172 | 0.0024 |
EvolveGCN | 0.5393 | 0.0174 | 0.0394 | 0.0112 | 0.0152 | 0.0613 | 0.0087 |
CTGCN-C | 0.6366 | 0.0137 | 0.0311 | 0.0088 | 0.0134 | 0.0539 | 0.0076 |
DHNE | 0.5027 | 0.0027 | 0.0062 | 0.0017 | 0.0033 | 0.0132 | 0.0019 |
DyHATR | 0.5124 | 0.0052 | 0.0117 | 0.0033 | 0.0058 | 0.0234 | 0.0033 |
DyHNet | 0.6715 | 0.0235 | 0.0534 | 0.0151 | 0.0247 | 0.0986 | 0.0141 |
% over the best | -1.96% | 7.02% | 7.06% | 7.01% | 12.26% | 11.23% | 12.40% |
| Datasets | Metrics | Train % | DynAE | DynAERNN | DynGEM | DySAT | EvolveGCN | CTGCN-C | DHNE | DyHATR | DyHNet | % over the best |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yelp | AUC | 20% | 0.5319 | 0.5040 | 0.5756 | 0.4947 | 0.6273 | 0.6368 | 0.5001 | 0.6037 | 0.6965 | 9.38% |
| | | 40% | 0.5498 | 0.5109 | 0.6040 | 0.5076 | 0.6245 | 0.6690 | 0.4986 | 0.6201 | 0.7021 | 4.94% |
| | | 60% | 0.5503 | 0.5089 | 0.6043 | 0.5043 | 0.6304 | 0.6697 | 0.4890 | 0.6142 | 0.6963 | 3.97% |
| | | 80% | 0.5643 | 0.5205 | 0.6097 | 0.4970 | 0.6311 | 0.6673 | 0.4898 | 0.6225 | 0.7147 | 7.11% |
| | Macro F1 | 20% | 0.1741 | 0.2845 | 0.3455 | 0.2589 | 0.1741 | 0.4414 | 0.1759 | 0.3983 | 0.4871 | 10.37% |
| | | 40% | 0.1791 | 0.2215 | 0.4088 | 0.2672 | 0.1791 | 0.4854 | 0.1791 | 0.4056 | 0.5035 | 3.74% |
| | | 60% | 0.1771 | 0.1771 | 0.3996 | 0.2783 | 0.1771 | 0.4671 | 0.1771 | 0.4043 | 0.4917 | 5.25% |
| | | 80% | 0.1817 | 0.2878 | 0.4054 | 0.2657 | 0.1738 | 0.4579 | 0.1817 | 0.4194 | 0.5123 | 11.87% |
| | Micro F1 | 20% | 0.3534 | 0.3687 | 0.3956 | 0.3442 | 0.3534 | 0.4351 | 0.3544 | 0.4082 | 0.4882 | 12.21% |
| | | 40% | 0.3673 | 0.3704 | 0.4224 | 0.3587 | 0.3673 | 0.4774 | 0.3673 | 0.4088 | 0.5090 | 6.61% |
| | | 60% | 0.3618 | 0.3618 | 0.4128 | 0.3673 | 0.3618 | 0.4592 | 0.3618 | 0.4091 | 0.4944 | 7.68% |
| | | 80% | 0.3748 | 0.3748 | 0.4119 | 0.3636 | 0.3525 | 0.4527 | 0.3748 | 0.4193 | 0.5121 | 13.11% |
| DBLP four area | AUC | 20% | 0.5828 | 0.5880 | 0.6263 | 0.5030 | 0.5056 | 0.7152 | 0.4895 | 0.6301 | 0.7304 | 2.14% |
| | | 40% | 0.5860 | 0.5919 | 0.6347 | 0.5078 | 0.5002 | 0.7387 | 0.4924 | 0.6393 | 0.7565 | 2.42% |
| | | 60% | 0.5875 | 0.5942 | 0.6375 | 0.5130 | 0.5193 | 0.7481 | 0.5072 | 0.6410 | 0.7797 | 4.22% |
| | | 80% | 0.5799 | 0.5906 | 0.6294 | 0.4996 | 0.5044 | 0.7529 | 0.5067 | 0.6366 | 0.8098 | 7.55% |
| | Macro F1 | 20% | 0.2348 | 0.2348 | 0.2966 | 0.1849 | 0.1362 | 0.4341 | 0.1023 | 0.2736 | 0.4488 | 3.39% |
| | | 40% | 0.2547 | 0.2661 | 0.3482 | 0.1925 | 0.1357 | 0.4693 | 0.1009 | 0.2913 | 0.5107 | 8.82% |
| | | 60% | 0.2499 | 0.2499 | 0.3400 | 0.1874 | 0.1348 | 0.4780 | 0.1704 | 0.2734 | 0.5362 | 12.16% |
| | | 80% | 0.2322 | 0.2322 | 0.3214 | 0.1793 | 0.1427 | 0.4895 | 0.1252 | 0.2528 | 0.5785 | 18.16% |
| | Micro F1 | 20% | 0.3012 | 0.3012 | 0.3218 | 0.2907 | 0.2849 | 0.4469 | 0.2571 | 0.3348 | 0.4632 | 3.65% |
| | | 40% | 0.3194 | 0.3175 | 0.3614 | 0.2965 | 0.2895 | 0.4842 | 0.2527 | 0.3561 | 0.5212 | 7.63% |
| | | 60% | 0.3165 | 0.3165 | 0.3510 | 0.2931 | 0.2888 | 0.4951 | 0.2802 | 0.3393 | 0.5468 | 10.45% |
| | | 80% | 0.2945 | 0.2945 | 0.3321 | 0.2878 | 0.3149 | 0.5031 | 0.2270 | 0.3161 | 0.5916 | 17.60% |
To use your own dataset, you need to prepare:

- One config file in JSON format named `data_name.json`, placed in `DyHNet/config`
- Four input files placed under the folder `dataset/data_name`:
  - `node_types.csv`: each row has the format `node_id (int), node_type (int), node_type_name (str)`

    ```
    node_id,node_type,node_type_name
    0,0,author
    1,0,author
    2,0,author
    3,0,author
    4,0,author
    ```

  - `temporal_edge_list.txt`: each row has the format `source_node_id (int), target_node_id (int), time_id (int)`

    ```
    1840 1 6
    1840 2 6
    1840 3 6
    1841 4 4
    1841 5 4
    ```

  - `temporal_subgraphs.pth`: each row has the format `subgraph_ids, time_id, label`. The label column can be null if you don't have a label for the subgraph; in our experiments, we use this column as metadata for constructing the input data (`data.pkl`).

    ```
    1883-90-105-12693-12812-13117-13235-13273-13682-14027-14158-14241-14387-14517 0 uai
    1884-105-121-12736-12827-13072-13329-14517 0 uai
    1909-182-183-12636-12640-12749-12776-12782-12807-13039-13040-13124-13676-14308-14410-14489-14519 0 cikm
    1930-242-243-13072-13228-13702-14073-14089-14311-14519 0 cikm
    1972-346-347-12578-12693-12893-13437-13473-13595-13740-14421-14523 0 colt
    ```

  - `data.pkl`: a dictionary for the train/val/test dataloaders

    ```python
    data = {
        0: {
            'node_id': 800,
            'subgraph_idx': {0: [1, 2], 1: [4, 10], 2: [], 3: [8], 4: [99, 100, 101], 5: [7]},
            'label': 'kdd',
            'dataset': 'train',
            'time_id': 3,
        },
    }
    ```
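The `data.pkl` layout described above can be built and round-tripped as a minimal sketch, assuming standard `pickle` serialization (the real file is produced by DyHNet's preprocessing step, so treat the exact fields as illustrative):

```python
import pickle

# One record in the documented data.pkl layout: a node, the subgraph
# indices it appears in at each time step, its label, split, and time id.
data = {
    0: {
        'node_id': 800,
        # time_id -> list of subgraph indices containing this node
        'subgraph_idx': {0: [1, 2], 1: [4, 10], 2: [], 3: [8], 4: [99, 100, 101], 5: [7]},
        'label': 'kdd',
        'dataset': 'train',  # one of 'train' / 'val' / 'test'
        'time_id': 3,
    },
}

# Serialize the dictionary to data.pkl ...
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)

# ... and load it back, as a dataloader would.
with open('data.pkl', 'rb') as f:
    loaded = pickle.load(f)
```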
- General usage:

```python
from DyHNet.DyHNet import *

config_path = './DyHNet/config/imdb.json'
dyhnet = DyHNet(config_path=config_path)

# Run full pipeline
dyhnet.run_pipeline()

# Run single steps
## Preprocess data
dyhnet.preprocess_data()

## Initialize data, model, trainer
data_module, model_module, trainer = dyhnet.initialize()

## Train
dyhnet.train(data_module, model_module, trainer)

## Test with all checkpoints
checkpoint_paths = dyhnet.get_checkpoint_paths()
for checkpoint_path in checkpoint_paths:
    dyhnet.test(data_module, model_module, trainer, checkpoint_path)

## Infer with the last checkpoint
checkpoint_paths = dyhnet.get_checkpoint_paths()
dyhnet.generate_embedding(data_module, model_module, checkpoint_path=checkpoint_paths[-1])
```
- Detailed usage:
  - For preprocessing data: refer to `DyHNet/prepare_data.py`
  - For training the model: refer to `DyHNet/train.py`
  - For generating graph embeddings: refer to `DyHNet/infer.py`
  - For evaluating model performance: refer to `DyHNet/eval.py`