This dataset is a large social network of GitHub developers, collected from the public API in June 2019. Vertex features were extracted from each developer's location, starred repositories, employer, and e-mail address. The task is link prediction: predicting whether a pair of GitHub developers will form a mutual follower relationship in the future.
- Nodes are developers who have starred at least 10 repositories.
- Each node has a binary label: web developer or machine learning developer.
- Edges are mutual follower relationships between GitHub developers.
- Edges are then split into two sets: one for training node embeddings and one for training the link prediction model.
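
As a rough illustration of the edge handling described above, the following sketch uses only the Python standard library. The function names, the toy `raw_pairs` list, the 50/50 split fraction, and the negative sampling step are all assumptions made for this example; they are not taken from the notebook. Negative (unconnected) pairs are included because a link classifier usually needs examples of both classes.

```python
import random


def build_undirected_edges(pairs):
    """Normalise (u, v) pairs into a set of undirected edges, dropping self-loops."""
    edges = set()
    for u, v in pairs:
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return edges


def split_edges(edges, fraction, seed=0):
    """Shuffle the edges and cut them into two disjoint sets."""
    rng = random.Random(seed)
    shuffled = sorted(edges)  # sort first so the shuffle is reproducible
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return set(shuffled[:cut]), set(shuffled[cut:])


def sample_negative_edges(nodes, edges, k, seed=0):
    """Sample k unconnected node pairs; assumes every edge endpoint is in `nodes`."""
    rng = random.Random(seed)
    node_list = sorted(nodes)
    max_negatives = len(node_list) * (len(node_list) - 1) // 2 - len(edges)
    if k > max_negatives:
        raise ValueError("not enough unconnected pairs to sample from")
    negatives = set()
    while len(negatives) < k:
        u, v = rng.sample(node_list, 2)
        pair = (min(u, v), max(u, v))
        if pair not in edges:
            negatives.add(pair)
    return negatives


# Hypothetical toy data: (1, 0) duplicates (0, 1) and (3, 3) is a self-loop.
raw_pairs = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 3), (0, 4)]
edges = build_undirected_edges(raw_pairs)

# One set for training embeddings, one for training the link classifier.
embedding_edges, classifier_edges = split_edges(edges, fraction=0.5)

nodes = {n for edge in edges for n in edge}
negatives = sample_negative_edges(nodes, edges, k=len(classifier_edges))
```

Storing each edge as a sorted tuple means a mutual follower relationship is counted once, regardless of the order in which the two developers appear.
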
link prediction.ipynb demonstrates the implementation of the algorithms in the following steps:
1. Split graph into training graph and test graph.
2. Split edges into sets for training link embeddings and the link prediction model.
3. Calculate and save link embeddings for the whole graph.
4. Reduce dimension and visualize link embeddings on a 2-D scale.
5. Train link prediction classifier.
6. Evaluate the classifier on the test data.
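
Steps 3, 5, and 6 above could be sketched roughly as follows, again with only the Python standard library. This is not the notebook's code. The Hadamard (element-wise product) operator for link embeddings, the hand-written logistic regression, the accuracy metric, and the toy two-dimensional node embeddings are all assumptions chosen to keep the example small. The notebook may use different operators, models, and metrics. Step 4 (dimensionality reduction and plotting) is omitted.

```python
import math


def hadamard(emb, u, v):
    """Link embedding as the element-wise product of two node embeddings."""
    return [a * b for a, b in zip(emb[u], emb[v])]


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and bias with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - target  # derivative of the log loss w.r.t. z
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(w, b, x):
    """Return 1 (link) if the decision value is non-negative, else 0."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else 0


# Hypothetical node embeddings: nodes 0-2 and nodes 3-5 form two groups.
node_emb = {
    0: [1.0, 0.5], 1: [0.9, 0.4], 2: [1.1, 0.6],
    3: [-1.0, 0.5], 4: [-0.9, 0.6], 5: [-1.1, 0.4],
}

# Hypothetical labelled pairs: 1 = linked, 0 = not linked.
train_pairs = [((0, 1), 1), ((3, 4), 1), ((0, 3), 0), ((1, 4), 0)]
test_pairs = [((1, 2), 1), ((4, 5), 1), ((2, 5), 0), ((2, 3), 0)]

# Step 3: compute link embeddings for each pair.
X_train = [hadamard(node_emb, u, v) for (u, v), _ in train_pairs]
y_train = [label for _, label in train_pairs]
X_test = [hadamard(node_emb, u, v) for (u, v), _ in test_pairs]
y_test = [label for _, label in test_pairs]

# Step 5: train the link prediction classifier.
w, b = train_logistic_regression(X_train, y_train)

# Step 6: evaluate on held-out pairs.
predictions = [predict(w, b, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
```

The Hadamard operator is symmetric in its two arguments, which suits an undirected graph where (u, v) and (v, u) denote the same edge.
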