This is a friend recommendation system of the kind used on social media platforms (e.g. Facebook, Instagram, Twitter) to suggest friends or new connections based on common interests, workplace, mutual friends, etc., using graph mining techniques. We are given a social graph, i.e. a graph structure in which nodes are individuals on a social media platform and a directed edge (or 'link') indicates that one person 'follows' the other, or that the two are 'friends'. The task is to predict new edges to offer as 'friend suggestions'.
Given a directed social graph, the goal is to predict missing links so that they can be recommended to users (link prediction in a graph).
The data is taken from Facebook's recruiting challenge on Kaggle. It contains two columns, source and destination, with one row for each edge in the graph.
- Data columns (total 2 columns):
- source_node int64
- destination_node int64
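
To make the two-column layout concrete, here is a minimal sketch of reading such an edge list and building follower/followee lookups. The inline sample data and the helper names `load_edges` and `build_adjacency` are assumptions for illustration, not part of the original project.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the same two-column layout; the real file is much larger.
SAMPLE_CSV = """source_node,destination_node
1,2
2,1
1,3
3,4
4,1
"""


def load_edges(handle):
    """Read (source, destination) integer pairs from a CSV file-like object."""
    reader = csv.DictReader(handle)
    return [(int(row["source_node"]), int(row["destination_node"])) for row in reader]


def build_adjacency(edges):
    """Return (followees, followers): whom each node follows, and who follows it."""
    followees = defaultdict(set)
    followers = defaultdict(set)
    for src, dst in edges:
        followees[src].add(dst)
        followers[dst].add(src)
    return followees, followers


edges = load_edges(io.StringIO(SAMPLE_CSV))
followees, followers = build_adjacency(edges)
print(len(edges), "edges")
print("node 1 follows", sorted(followees[1]), "and is followed by", sorted(followers[1]))
```

For a real file, the same `load_edges` function could be passed an open file handle instead of the in-memory sample.
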
Training samples of good and bad links were generated from the given directed graph. For each link, features were computed such as the number of followers, whether the user is followed back, PageRank, Katz score, Adamic-Adar index, SVD features of the adjacency matrix, and some weight features. A machine learning model was then trained on these features to predict links.
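
The sampling and feature steps can be sketched roughly as follows. This is a simplified illustration on a toy graph, not the project's actual code: negatives are drawn as random non-edges, the Adamic-Adar variant (common followees, weighted by their follower counts) is one possible choice for a directed graph, and Katz, SVD and weight features are left out. All function names here are hypothetical.

```python
import math
import random

# Toy directed graph (hypothetical data): an edge (u, v) means u follows v.
EDGES = [(1, 2), (2, 1), (1, 3), (3, 4), (4, 1), (2, 3), (5, 1), (5, 3)]
NODES = sorted({n for edge in EDGES for n in edge})
EDGE_SET = set(EDGES)

followees = {n: set() for n in NODES}
followers = {n: set() for n in NODES}
for u, v in EDGES:
    followees[u].add(v)
    followers[v].add(u)


def sample_negative_links(count, seed=0):
    """Draw distinct ordered pairs (u, v), u != v, that are not edges in the graph."""
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < count:
        u, v = rng.sample(NODES, 2)
        if (u, v) not in EDGE_SET:
            negatives.add((u, v))
    return sorted(negatives)


def adamic_adar(u, v):
    """Sum of 1 / log(#followers of z) over accounts z followed by both u and v."""
    common = followees[u] & followees[v]
    # Every common followee has at least two followers (u and v), so log(...) > 0.
    return sum(1.0 / math.log(len(followers[z])) for z in common)


def pagerank(damping=0.85, iterations=50):
    """Plain power-iteration PageRank over the toy graph."""
    n = len(NODES)
    rank = {node: 1.0 / n for node in NODES}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in NODES if not followees[node])
        new_rank = {}
        for node in NODES:
            incoming = sum(rank[src] / len(followees[src]) for src in followers[node])
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


RANK = pagerank()


def link_features(u, v):
    """Feature dictionary for the candidate link u -> v."""
    return {
        "num_followers_u": len(followers[u]),
        "num_followers_v": len(followers[v]),
        "follows_back": int((v, u) in EDGE_SET),
        "adamic_adar": adamic_adar(u, v),
        "pagerank_u": RANK[u],
        "pagerank_v": RANK[v],
    }


negatives = sample_negative_links(len(EDGES))
# Label 1 for existing links, 0 for sampled non-links.
rows = [(link_features(u, v), 1) for u, v in EDGES]
rows += [(link_features(u, v), 0) for u, v in negatives]
print(link_features(1, 2))
```

In a real pipeline the features would normally be computed on a training graph that excludes the links being predicted, so that the label does not leak into the features.
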
- No low-latency requirement.
- The predicted probability is useful for recommending the highest-probability links.
- Both precision and recall are important, so F1 score is a good choice.
- Confusion matrix
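
As a small worked example of these points, the sketch below computes a confusion matrix and F1 score from thresholded probabilities and ranks candidate links by predicted probability. The labels, probabilities, candidate pairs and the 0.5 threshold are made-up values for illustration only.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0/1."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    _, fp, fn, tp = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def top_k_suggestions(candidates, probabilities, k):
    """Return the k candidate links with the highest predicted probability."""
    ranked = sorted(zip(candidates, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical candidate links for user 1, with true labels and model probabilities.
candidates = [(1, 7), (1, 8), (1, 9), (1, 10), (1, 11), (1, 12)]
y_true = [1, 1, 1, 0, 0, 0]
probabilities = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
y_pred = [int(p >= 0.5) for p in probabilities]

print("confusion matrix (tn, fp, fn, tp):", confusion_matrix(y_true, y_pred))
print("F1:", round(f1_score(y_true, y_pred), 3))
print("top 2 suggestions:", top_k_suggestions(candidates, probabilities, 2))
```

Here precision and recall are both 2/3, so the F1 score is also 2/3; the ranking step only needs the probabilities, not the thresholded labels.
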
Decision-tree-based approaches proved quite effective for this problem, and since the number of constructed features is not too large, bagging and boosting could easily be employed for high precision and easy training.
Here are the details and performance metrics of the classifiers used:
Model | No. of Base Learners | Max Depth of Base Learners | Training F1-score | Testing F1-score |
---|---|---|---|---|
Random Forest | 121 | 14 | 0.964 | 0.921 |
XGBoost | 109 | 10 | 0.992 | 0.926 |
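
For reference, a Random Forest with the settings from the table could be set up along these lines with scikit-learn. The data here is synthetic (`make_classification` standing in for the real link features), so the printed scores will not match the table. The XGBoost row would correspond to something like `xgboost.XGBClassifier(n_estimators=109, max_depth=10)`, which is left out so the sketch depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the link feature matrix (assumption: 10 numeric features).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 121 base learners with max depth 14, as listed in the table.
model = RandomForestClassifier(n_estimators=121, max_depth=14, n_jobs=-1, random_state=0)
model.fit(X_train, y_train)

train_f1 = f1_score(y_train, model.predict(X_train))
test_f1 = f1_score(y_test, model.predict(X_test))

# Probability of the positive class, usable for ranking friend suggestions.
link_probability = model.predict_proba(X_test)[:, 1]

print(f"train F1: {train_f1:.3f}, test F1: {test_f1:.3f}")
```

Comparing the training and testing F1 scores, as the table does, gives a quick check on how much the model overfits.
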