[Distributed] Partition MovieLens dataset #8815
8652f1c to b0f1d7e
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #8815      +/-   ##
==========================================
- Coverage   89.87%   89.39%   -0.49%
==========================================
  Files         479      479
  Lines       31136    31145       +9
==========================================
- Hits        27984    27842     -142
- Misses       3152     3303     +151

View full report in Codecov by Sentry.
2cf2cd5 to 6a5cacc
train_data, val_data, test_data = T.RandomLinkSplit(
    num_val=0.1,
    num_test=0.1,
    neg_sampling_ratio=0.0,
    edge_types=[edge_type],
    rev_edge_types=[('movie', 'rev_rates', 'user')],
)(dataset[0])
Looks like we drop the adjustment of message passing here computed in train_data, val_data and test_data. This is not necessarily a blocker (since I don't see a good way to fix this for now), but something you should be aware of. Currently, you would leak information.
Sorry, I didn't get that. What do you mean by "we drop the adjustment of message passing here computed in train_data, val_data and test_data"?
The edge_index is different across different splits for link prediction tasks (in order to not leak information).
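To make the point concrete, here is a toy sketch in plain Python (not the PyG API) of how a random link split can give each split its own message-passing edges. The function name random_link_split, the dictionary layout, and the 50 synthetic ratings are hypothetical and only for illustration. The convention assumed here is that validation reuses the training edges for message passing and test uses training plus validation edges, so the edges being predicted are never visible during message passing.

```python
import random


def random_link_split(edges, num_val=0.1, num_test=0.1, seed=0):
    """Toy split returning (message_passing_edges, supervision_edges) per split.

    Illustrative only: a simplified stand-in, not the PyG implementation.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)

    n_val = int(len(shuffled) * num_val)
    n_test = int(len(shuffled) * num_test)
    n_train = len(shuffled) - n_val - n_test

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]

    return {
        # Training: the same edges are used for message passing and supervision.
        "train": (train, train),
        # Validation: predict held-out val edges, seeing only training edges.
        "val": (train, val),
        # Test: predict held-out test edges, seeing training and val edges.
        "test": (train + val, test),
    }


# Hypothetical (user, movie) rating edges.
edges = [(u, m) for u in range(10) for m in range(5)]
splits = random_link_split(edges)
for name, (mp_edges, sup_edges) in splits.items():
    print(name, len(mp_edges), len(sup_edges))
```

If a single shared edge_index containing all edges is used for every split instead, the val and test supervision edges are part of the graph the model passes messages over, which is the leak described above.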
I see. What about the solution used in the temporal_link_pred example (temporal_link_pred.py #L27)? Maybe we can use that one instead?
Yeah, using temporal sampling is usually a good option to resolve this.
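As a rough illustration of why temporal sampling avoids the leak, here is a toy sketch in plain Python rather than the PyG sampler. The helper visible_edges and the timestamped ratings are hypothetical. The strict "earlier than" comparison is an assumption made for this sketch; the thread does not say which comparison the referenced example uses.

```python
def visible_edges(edges, query_time):
    """Return the (user, movie) edges usable for message passing at query_time.

    Only edges with a timestamp strictly earlier than query_time are kept,
    so the edge being predicted can never appear in its own context.
    """
    return [(u, m) for (u, m, t) in edges if t < query_time]


# Hypothetical (user, movie, timestamp) ratings.
ratings = [(0, 10, 1), (0, 11, 3), (1, 10, 2), (1, 12, 5)]

for user, movie, t in ratings:
    context = visible_edges(ratings, t)
    # The target edge itself is excluded from the message-passing context.
    assert (user, movie) not in context
    print((user, movie, t), "->", context)
```

With this kind of time constraint a single shared graph can serve all splits, because each prediction only sees edges from its own past.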
Changes made: