```
.
| README.md
| environment.yml
|
|--- dataset
|--- model
|--- output
|--- dependencies
|    |-- littleballoffur: module for graph sampling
|    |-- prepare_data: module for data preprocessing
|
|--- DyHNet
|    | DyHNet.py: main pipeline object
|    | main.py: main file
|    |-- config
|    |    | dblp.json
|    |    | dblp_four_area.json
|    |    | imdb.json
|    |    | yelp.json
|    |-- src
|    |    | datasets.py: data module for training
|    |    | model.py: model module for training
|    |    | trainer.py: trainer module for training
|    |    | inference.py: inference agent
|    |    | evaluation: evaluation module
|    |    | utils.py: utils functions
```
To install all necessary libraries, please run:

```shell
conda env create -f environment.yml
```
In case the versions of PyTorch and CUDA are not compatible on your machine, remove all related libraries from the `environment.yml` file, then install PyTorch and PyTorch Geometric separately. If you want to create an environment without using the existing file, please refer to the installation.md file. Please follow the PyTorch installation instructions in this link.
```shell
pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-cluster -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-spline-conv -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-geometric
```
where `${TORCH}` and `${CUDA}` are the versions of PyTorch and CUDA installed on your machine.
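For example, assuming PyTorch 1.12.0 built against CUDA 11.3 (an assumed setup — substitute the versions reported by your own installation), the variables expand as follows:

```shell
# Assumed setup: PyTorch 1.12.0 + CUDA 11.3. Check yours with:
#   python -c "import torch; print(torch.__version__, torch.version.cuda)"
TORCH=1.12.0
CUDA=cu113
# The wheel index URL that the pip commands above resolve to:
echo "https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html"
```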
IMDB
Model | AUC | F1 (k=1) | Recall (k=1) | Precision (k=1) | F1 (k=2) | Recall (k=2) | Precision (k=2) |
---|---|---|---|---|---|---|---|
DynAE | 0.5156 | 0.3705 | 0.3344 | 0.4154 | 0.5141 | 0.6709 | 0.4167 |
DynAERNN | 0.5014 | 0.3647 | 0.3291 | 0.4089 | 0.5100 | 0.6656 | 0.4134 |
DynGEM | 0.5829 | 0.4367 | 0.3941 | 0.4896 | 0.5494 | 0.7170 | 0.4453 |
DySAT | 0.5087 | 0.3717 | 0.3354 | 0.4167 | 0.5052 | 0.6593 | 0.4095 |
VGRNN | 0.5534 | 0.3949 | 0.3564 | 0.4427 | 0.5084 | 0.6635 | 0.4121 |
EvolveGCN | 0.5586 | 0.3717 | 0.3354 | 0.4167 | 0.5133 | 0.6698 | 0.4160 |
CTGCN-C | 0.6169 | 0.4448 | 0.4015 | 0.4987 | 0.5590 | 0.7296 | 0.4531 |
DHNE | 0.5102 | 0.3577 | 0.3229 | 0.4010 | 0.5116 | 0.6677 | 0.4147 |
DyHATR | 0.5216 | 0.3438 | 0.3103 | 0.3854 | 0.4956 | 0.6468 | 0.4017 |
DyHNet | 0.6588 | 0.5029 | 0.4539 | 0.5638 | 0.5831 | 0.7610 | 0.4727 |
% over the best | 6.79% | 13.05% | 13.05% | 13.05% | 4.31% | 4.31% | 4.31% |
AMiner
Model | AUC | F1 (k=1) | Recall (k=1) | Precision (k=1) | F1 (k=2) | Recall (k=2) | Precision (k=2) |
---|---|---|---|---|---|---|---|
DynAE | 0.5337 | 0.1848 | 0.1568 | 0.2250 | 0.2708 | 0.3241 | 0.2325 |
DynAERNN | 0.5883 | 0.1972 | 0.1673 | 0.2401 | 0.2708 | 0.3241 | 0.2325 |
DynGEM | 0.5294 | 0.2609 | 0.2213 | 0.3176 | 0.3786 | 0.4532 | 0.3251 |
DySAT | 0.5138 | 0.2624 | 0.2227 | 0.3195 | 0.3368 | 0.4032 | 0.2892 |
VGRNN | 0.5817 | 0.2686 | 0.2279 | 0.3270 | 0.3060 | 0.3663 | 0.2628 |
EvolveGCN | 0.5982 | 0.2034 | 0.1726 | 0.2476 | 0.2829 | 0.3386 | 0.2429 |
CTGCN-C | 0.5511 | 0.2360 | 0.2003 | 0.2873 | 0.3247 | 0.3887 | 0.2788 |
DHNE | 0.5048 | 0.2407 | 0.2042 | 0.2930 | 0.3434 | 0.4111 | 0.2949 |
DyHATR | 0.5111 | 0.2866 | 0.2431 | 0.3491 | 0.3788 | 0.4531 | 0.3254 |
DyHNet | 0.5742 | 0.3307 | 0.2806 | 0.4026 | 0.4260 | 0.5099 | 0.3658 |
% over the best | -4.02% | 15.40% | 15.46% | 15.32% | 12.46% | 12.50% | 12.41% |
DBLP
Model | AUC | F1 (k=1) | Recall (k=1) | Precision (k=1) | F1 (k=2) | Recall (k=2) | Precision (k=2) |
---|---|---|---|---|---|---|---|
DynAE | 0.6481 | 0.0333 | 0.0328 | 0.0338 | 0.2602 | 0.3829 | 0.1971 |
DynAERNN | 0.6606 | 0.2968 | 0.2925 | 0.3012 | 0.2052 | 0.3019 | 0.1554 |
DynGEM | 0.6492 | 0.0136 | 0.0134 | 0.0138 | 0.2056 | 0.3025 | 0.1558 |
DySAT | 0.5000 | 0.2499 | 0.2463 | 0.2536 | 0.2543 | 0.3742 | 0.1926 |
EvolveGCN | 0.5239 | 0.0095 | 0.0094 | 0.0096 | 0.0287 | 0.0422 | 0.0217 |
CTGCN-C | 0.5994 | 0.1759 | 0.1734 | 0.1785 | 0.2025 | 0.2979 | 0.1533 |
DHNE | 0.5057 | 0.2146 | 0.2115 | 0.2178 | 0.2739 | 0.4029 | 0.2074 |
DyHATR | 0.5178 | 0.3552 | 0.3501 | 0.3604 | 0.4404 | 0.6479 | 0.3336 |
DyHNet | 0.6824 | 0.3701 | 0.3648 | 0.3756 | 0.4609 | 0.6780 | 0.3491 |
% over the best | 3.29% | 4.21% | 4.21% | 4.21% | 4.65% | 4.65% | 4.65% |
Yelp
Model | AUC | F1 (k=1) | Recall (k=1) | Precision (k=1) | F1 (k=2) | Recall (k=2) | Precision (k=2) |
---|---|---|---|---|---|---|---|
DynAE | 0.5221 | 0.0091 | 0.0206 | 0.0058 | 0.0095 | 0.0382 | 0.0054 |
DynAERNN | 0.5128 | 0.0011 | 0.0025 | 0.0007 | 0.0021 | 0.0083 | 0.0012 |
DynGEM | 0.6849 | 0.0220 | 0.0499 | 0.0141 | 0.0220 | 0.0886 | 0.0125 |
DySAT | 0.5024 | 0.0048 | 0.0108 | 0.0030 | 0.0043 | 0.0172 | 0.0024 |
EvolveGCN | 0.5393 | 0.0174 | 0.0394 | 0.0112 | 0.0152 | 0.0613 | 0.0087 |
CTGCN-C | 0.6366 | 0.0137 | 0.0311 | 0.0088 | 0.0134 | 0.0539 | 0.0076 |
DHNE | 0.5027 | 0.0027 | 0.0062 | 0.0017 | 0.0033 | 0.0132 | 0.0019 |
DyHATR | 0.5124 | 0.0052 | 0.0117 | 0.0033 | 0.0058 | 0.0234 | 0.0033 |
DyHNet | 0.6715 | 0.0235 | 0.0534 | 0.0151 | 0.0247 | 0.0986 | 0.0141 |
% over the best | -1.96% | 7.02% | 7.06% | 7.01% | 12.26% | 11.23% | 12.40% |
| Datasets | Metrics | Train % | DynAE | DynAERNN | DynGEM | DySAT | EvolveGCN | CTGCN-C | DHNE | DyHATR | DyHNet | % over the best |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yelp | AUC | 20% | 0.5319 | 0.5040 | 0.5756 | 0.4947 | 0.6273 | 0.6368 | 0.5001 | 0.6037 | 0.6965 | 9.38% |
| | | 40% | 0.5498 | 0.5109 | 0.6040 | 0.5076 | 0.6245 | 0.6690 | 0.4986 | 0.6201 | 0.7021 | 4.94% |
| | | 60% | 0.5503 | 0.5089 | 0.6043 | 0.5043 | 0.6304 | 0.6697 | 0.4890 | 0.6142 | 0.6963 | 3.97% |
| | | 80% | 0.5643 | 0.5205 | 0.6097 | 0.4970 | 0.6311 | 0.6673 | 0.4898 | 0.6225 | 0.7147 | 7.11% |
| | Macro F1 | 20% | 0.1741 | 0.2845 | 0.3455 | 0.2589 | 0.1741 | 0.4414 | 0.1759 | 0.3983 | 0.4871 | 10.37% |
| | | 40% | 0.1791 | 0.2215 | 0.4088 | 0.2672 | 0.1791 | 0.4854 | 0.1791 | 0.4056 | 0.5035 | 3.74% |
| | | 60% | 0.1771 | 0.1771 | 0.3996 | 0.2783 | 0.1771 | 0.4671 | 0.1771 | 0.4043 | 0.4917 | 5.25% |
| | | 80% | 0.1817 | 0.2878 | 0.4054 | 0.2657 | 0.1738 | 0.4579 | 0.1817 | 0.4194 | 0.5123 | 11.87% |
| | Micro F1 | 20% | 0.3534 | 0.3687 | 0.3956 | 0.3442 | 0.3534 | 0.4351 | 0.3544 | 0.4082 | 0.4882 | 12.21% |
| | | 40% | 0.3673 | 0.3704 | 0.4224 | 0.3587 | 0.3673 | 0.4774 | 0.3673 | 0.4088 | 0.5090 | 6.61% |
| | | 60% | 0.3618 | 0.3618 | 0.4128 | 0.3673 | 0.3618 | 0.4592 | 0.3618 | 0.4091 | 0.4944 | 7.68% |
| | | 80% | 0.3748 | 0.3748 | 0.4119 | 0.3636 | 0.3525 | 0.4527 | 0.3748 | 0.4193 | 0.5121 | 13.11% |
| DBLP four area | AUC | 20% | 0.5828 | 0.5880 | 0.6263 | 0.5030 | 0.5056 | 0.7152 | 0.4895 | 0.6301 | 0.7304 | 2.14% |
| | | 40% | 0.5860 | 0.5919 | 0.6347 | 0.5078 | 0.5002 | 0.7387 | 0.4924 | 0.6393 | 0.7565 | 2.42% |
| | | 60% | 0.5875 | 0.5942 | 0.6375 | 0.5130 | 0.5193 | 0.7481 | 0.5072 | 0.6410 | 0.7797 | 4.22% |
| | | 80% | 0.5799 | 0.5906 | 0.6294 | 0.4996 | 0.5044 | 0.7529 | 0.5067 | 0.6366 | 0.8098 | 7.55% |
| | Macro F1 | 20% | 0.2348 | 0.2348 | 0.2966 | 0.1849 | 0.1362 | 0.4341 | 0.1023 | 0.2736 | 0.4488 | 3.39% |
| | | 40% | 0.2547 | 0.2661 | 0.3482 | 0.1925 | 0.1357 | 0.4693 | 0.1009 | 0.2913 | 0.5107 | 8.82% |
| | | 60% | 0.2499 | 0.2499 | 0.3400 | 0.1874 | 0.1348 | 0.4780 | 0.1704 | 0.2734 | 0.5362 | 12.16% |
| | | 80% | 0.2322 | 0.2322 | 0.3214 | 0.1793 | 0.1427 | 0.4895 | 0.1252 | 0.2528 | 0.5785 | 18.16% |
| | Micro F1 | 20% | 0.3012 | 0.3012 | 0.3218 | 0.2907 | 0.2849 | 0.4469 | 0.2571 | 0.3348 | 0.4632 | 3.65% |
| | | 40% | 0.3194 | 0.3175 | 0.3614 | 0.2965 | 0.2895 | 0.4842 | 0.2527 | 0.3561 | 0.5212 | 7.63% |
| | | 60% | 0.3165 | 0.3165 | 0.3510 | 0.2931 | 0.2888 | 0.4951 | 0.2802 | 0.3393 | 0.5468 | 10.45% |
| | | 80% | 0.2945 | 0.2945 | 0.3321 | 0.2878 | 0.3149 | 0.5031 | 0.2270 | 0.3161 | 0.5916 | 17.60% |
To use your own dataset, you need to prepare:

- One config file in JSON format named `data_name.json`, placed in `DyHNet/config`
- Four input files placed under the folder `dataset/data_name`:
  - `node_types.csv`: each row has the format `node_id (int), node_type (int), node_type_name (str)`

    ```
    node_id,node_type,node_type_name
    0,0,author
    1,0,author
    2,0,author
    3,0,author
    4,0,author
    ```

  - `temporal_edge_list.txt`: each row has the format `source_node_id (int), target_node_id (int), time_id (int)`

    ```
    1840 1 6
    1840 2 6
    1840 3 6
    1841 4 4
    1841 5 4
    ```

  - `temporal_subgraphs.pth`: each row has the format `subgraph_ids, time_id, label`. The label column can be null if you don't have a label for the subgraph; in our experiments, we use this column as metadata for constructing the input data (`data.pkl`).

    ```
    1883-90-105-12693-12812-13117-13235-13273-13682-14027-14158-14241-14387-14517 0 uai
    1884-105-121-12736-12827-13072-13329-14517 0 uai
    1909-182-183-12636-12640-12749-12776-12782-12807-13039-13040-13124-13676-14308-14410-14489-14519 0 cikm
    1930-242-243-13072-13228-13702-14073-14089-14311-14519 0 cikm
    1972-346-347-12578-12693-12893-13437-13473-13595-13740-14421-14523 0 colt
    ```

  - `data.pkl`: a dictionary for the train/val/test dataloaders

    ```python
    data = {
        0: {
            'node_id': 800,
            'subgraph_idx': {0: [1, 2], 1: [4, 10], 2: [], 3: [8], 4: [99, 100, 101], 5: [7]},
            'label': 'kdd',
            'dataset': 'train',
            'time_id': 3,
        },
    }
    ```
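The `data.pkl` layout described above can be built and round-tripped as a minimal sketch, assuming standard `pickle` serialization (the real file is produced by DyHNet's preprocessing step, so treat the exact fields as illustrative):

```python
import pickle

# One record in the documented data.pkl layout: a node, the subgraph
# indices it appears in at each time step, its label, split, and time id.
data = {
    0: {
        'node_id': 800,
        # time_id -> list of subgraph indices containing this node
        'subgraph_idx': {0: [1, 2], 1: [4, 10], 2: [], 3: [8], 4: [99, 100, 101], 5: [7]},
        'label': 'kdd',
        'dataset': 'train',  # one of 'train' / 'val' / 'test'
        'time_id': 3,
    },
}

# Serialize the dictionary to data.pkl ...
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)

# ... and load it back, as a dataloader would.
with open('data.pkl', 'rb') as f:
    loaded = pickle.load(f)
```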
- General usage:

```python
from DyHNet.DyHNet import *

config_path = './DyHNet/config/imdb.json'
dyhnet = DyHNet(config_path=config_path)

# Run full pipeline
dyhnet.run_pipeline()

# Run single steps
## Preprocess data
dyhnet.preprocess_data()

## Initialize data, model, trainer
data_module, model_module, trainer = dyhnet.initialize()

## Train
dyhnet.train(data_module, model_module, trainer)

## Test with all checkpoints
checkpoint_paths = dyhnet.get_checkpoint_paths()
for checkpoint_path in checkpoint_paths:
    dyhnet.test(data_module, model_module, trainer, checkpoint_path)

## Infer with the last checkpoint
checkpoint_paths = dyhnet.get_checkpoint_paths()
dyhnet.generate_embedding(data_module, model_module, checkpoint_path=checkpoint_paths[-1])
```
- Detailed usage:
  - For preprocessing data: refer to `DyHNet/prepare_data.py`
  - For training the model: refer to `DyHNet/train.py`
  - For generating graph embeddings: refer to `DyHNet/infer.py`
  - For evaluating model performance: refer to `DyHNet/eval.py`