Add partitioning for distributed training #7502
add partition part support for distributed training
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
fix bug for put_edge_id
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
… test case Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
add example for graph partition; change ogb dataset to fakedataset in…
Codecov Report
@@ Coverage Diff @@
## master #7502 +/- ##
==========================================
- Coverage 91.74% 91.45% -0.29%
==========================================
Files 450 451 +1
Lines 25161 25270 +109
==========================================
+ Hits 23084 23111 +27
- Misses 2077 2159 +82
... and 17 files with indirect coverage changes
Hi @rusty1s, I added a unit test for graph partitioning, but it returns an error.
@kaixuanliu You can place the pytorch_geometric/test/utils/test_scatter.py Lines 25 to 27 in 0f0e0da
…ion in example Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
add CHANGELOG; fix unit test error; add train_idx and test_idx partit…
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
for more information, see https://pre-commit.ci
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
adapt to new implementation of LocalFeatureStore
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
delete partition example temporarily
Thank you. I cleaned this up a bit. In particular, Partitioner no longer stores LocalFeatureStore and LocalGraphStore instances, as it is bad practice to pickle arbitrary Python objects. Instead, it now saves Python dictionaries, so please construct LocalFeatureStore and LocalGraphStore instances from these dictionaries when loading the data from disk.
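The idea of saving plain dictionaries and rebuilding store objects at load time can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions: `TinyFeatureStore`, `save_partition`, and `load_partition` are hypothetical names, and JSON stands in for whatever serialization the partitioner actually uses. It is not the LocalFeatureStore API.

```python
import json
import tempfile
from pathlib import Path


class TinyFeatureStore:
    """Hypothetical stand-in for a feature store (not PyG's LocalFeatureStore)."""

    def __init__(self, global_id, feats):
        self.global_id = list(global_id)
        self.feats = {name: list(values) for name, values in feats.items()}

    @classmethod
    def from_dict(cls, data):
        # Rebuild the store from the plain dictionary read from disk.
        return cls(data["global_id"], data["feats"])

    def to_dict(self):
        # Only built-in types are written, so loading needs no custom class.
        return {"global_id": self.global_id, "feats": self.feats}


def save_partition(store, path):
    # Persist the dictionary form, never the object itself.
    Path(path).write_text(json.dumps(store.to_dict()))


def load_partition(path):
    # Read the dictionary back and construct a fresh store from it.
    return TinyFeatureStore.from_dict(json.loads(Path(path).read_text()))


with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "node_feats.json"
    original = TinyFeatureStore([3, 7], {"x": [[0.1, 0.2], [0.3, 0.4]]})
    save_partition(original, file_path)
    restored = load_partition(file_path)
```

Because the file holds only built-in types, it stays readable even if the store class is later renamed or restructured, which is the main drawback of pickling the object directly.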
Otherwise, looks good. I would be in favor of just adding the homogeneous code path though. I am not really 100% confident the heterogeneous code path is correct.
Thanks Matthias! Replacing LocalFeatureStore/LocalGraphStore with Python dictionaries is better practice, as it does not require a customized data structure. For heterogeneous graph partitioning, we have checked the partition output for the ogbn-mag dataset, and we will further validate its correctness during later development of heterogeneous graph distributed training.
This code is one part of the overall distributed training work for PyG.
This class (partitioner.py) will implement
The partition folders are laid out as below:
output_dir/
|-- META.json
|-- node_map.pt
|-- edge_map.pt
|-- part0/
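A minimal sketch of how a consumer might walk this layout, using only the standard library. The helper name `list_partitions` and the `num_parts` key written into META.json are assumptions for illustration; the PR text does not show the file's actual contents, and the .pt files are created empty here rather than loaded.

```python
import json
import tempfile
from pathlib import Path


def list_partitions(output_dir):
    """Return (meta, partition folder names sorted by index) for the layout above."""
    root = Path(output_dir)
    meta = json.loads((root / "META.json").read_text())
    parts = sorted(
        (p.name for p in root.iterdir() if p.is_dir() and p.name.startswith("part")),
        key=lambda name: int(name[len("part"):]),  # numeric order: part2 before part10
    )
    return meta, parts


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a toy directory mirroring the tree; file contents are placeholders.
    (root / "META.json").write_text(json.dumps({"num_parts": 2}))  # hypothetical key
    (root / "node_map.pt").touch()
    (root / "edge_map.pt").touch()
    for i in range(2):
        (root / f"part{i}").mkdir()
    meta, parts = list_partitions(root)
```

Sorting by the numeric suffix rather than by string keeps the folders in partition order once there are ten or more of them.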
We also provide two example scripts under the example/distributed folder that generate homogeneous and heterogeneous graph partitions based on ogbn-products and ogbn-mag, respectively.
A unit test under the /test folder verifies the partitioning algorithm using FakeDataset/FakeHeteroDataset.
Please let us know if you have any comments. Thanks.