-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing Subgraph Methods #6613
Fixing Subgraph Methods #6613
Conversation
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching and fixing this.
torch_geometric/data/data.py
Outdated
if (subset[:-1] > subset[1:]).any(): | ||
node_idx = torch.argsort(subset) | ||
edge_index = node_idx[edge_index] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would simply calling torch.unique
at the start of the function resolve this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would fix the bug, too.
The solution here has the additional utility of allowing a user to change the order of vertices to a new order of their choice. If this is not considered useful for these methods, we could also simply switch to always sorting all integer lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can change the order outside of this function if needed. So lets switch to sorting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I have switched to using torch.unique at the beginning of both methods. This also addresses my question regarding handling duplicate indices.
Perhaps we should add a reorder_nodes
method as a utility function? But this is a separate PR I guess.
Codecov Report
@@ Coverage Diff @@
## master #6613 +/- ##
=======================================
Coverage 87.55% 87.56%
=======================================
Files 422 422
Lines 22879 22882 +3
=======================================
+ Hits 20032 20036 +4
+ Misses 2847 2846 -1
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
This pull request addresses a bug in both
Data.subgraph
andHeteroData.subgraph
, which yield incorrect results when an unsorted list of integers is used to select a subgraph. In this case, the attributes of nodes are reordered according to the given list of integers, but the ids used inedge_index
are not. Therefore, the node features will be distributed incorrectly in the output subgraph.Note that
Data.subgraph
does use theutils.subgraph
method with therelabel_nodes=True
option to compute the reduced edge list. However, this is insufficient to ensure a correct output, as this only shifts node ids downward to ensure consecutive ids. It has no effect on the actual order of nodes, which causes this problem.Consider the following example:
With torch_geometric==2.3.0 (current nightly) we obtain the following output:
Note that in the original graph each edge has a source and destination with the same node label. In the subgraph, the remaining edge has a source with label 0 and a destination with label 1. This is therefore not a valid subgraph.
I have addressed this bug by re-mapping the node ids in
edge_index
to their correct new index if the input is indeed an unsorted integer list.HeteroData.subgraph
had the same issue with an analog solution. I also extended the tests to cover these issues.Beyond this, I was wondering what the behaviour of
Data.subgraph
should be if the argument is a list of integers with duplicate entries. Should the method raise an Exception, ignore duplicates or create duplicate vertices? I have not implemented anything in this regard yet, but I do think this should be specified.