Social Network Analysis with Yelp Academic Dataset
-
Dependencies
-
Check out
# Cloned from https://github.com/kevinmao/yelp-sna
cd ~
git clone https://github.com/kevinmao/yelp-sna
cd yelp-sna
-
Download the Yelp Dataset Challenge data
# download from http://www.yelp.com/dataset_challenge
-
Unzip Yelp data
make xunzip
-
Transform Yelp data from JSON to TSV format
make transform
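This README does not list which fields the `transform` target keeps. The following is a minimal sketch that assumes the dataset ships one JSON object per line and that the review columns of interest are `user_id`, `business_id`, `stars` and `date`. The field list and the helper name `json_lines_to_tsv` are hypothetical.

```python
import csv
import io
import json

# Hypothetical field selection; the real `transform` target may keep other columns.
REVIEW_FIELDS = ["user_id", "business_id", "stars", "date"]

def json_lines_to_tsv(lines, fields=REVIEW_FIELDS):
    """Convert JSON-lines records into TSV text, one row per record."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Missing fields become empty cells instead of raising KeyError.
        writer.writerow([record.get(f, "") for f in fields])
    return out.getvalue()

sample = [
    '{"user_id": "u1", "business_id": "b1", "stars": 5, "date": "2012-01-03"}',
    '{"user_id": "u2", "business_id": "b1", "stars": 3, "date": "2012-02-10"}',
]
print(json_lines_to_tsv(sample), end="")
```

Using `csv.writer` instead of joining strings by hand means that any tab or newline inside a field gets quoted rather than corrupting the row layout.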
-
Split Yelp reviews into training and test sets
make split_review
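The README does not say how `split_review` divides the reviews. In link prediction, a common choice is a temporal split: older reviews form the training graph and newer ones are held out as the links to predict. The sketch below assumes that approach with a hypothetical cutoff date. `split_by_date` is an illustrative name and is not taken from the project.

```python
from datetime import date

def split_by_date(reviews, cutoff):
    """Split (user, business, date) triples: reviews before `cutoff` go to train, the rest to test."""
    train, test = [], []
    for user, business, day in reviews:
        (train if day < cutoff else test).append((user, business, day))
    return train, test

reviews = [
    ("u1", "b1", date(2011, 5, 1)),
    ("u1", "b2", date(2012, 7, 9)),
    ("u2", "b1", date(2010, 3, 15)),
]
train, test = split_by_date(reviews, cutoff=date(2012, 1, 1))
print(len(train), len(test))  # 2 1
```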
-
Map users, businesses and the user-user graph to integer IDs
make user_keys
make business_keys
make user_user
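Graph tools generally work best with dense integer node IDs instead of Yelp's string keys. This is a minimal sketch of the idea. It assumes IDs are assigned in first-seen order and that friendships arrive as pairs of user keys. The helper `assign_ids` and the sample keys are hypothetical, and the actual targets may order or store IDs differently.

```python
def assign_ids(keys):
    """Map each distinct string key to a dense integer ID in first-seen order."""
    ids = {}
    for key in keys:
        if key not in ids:
            ids[key] = len(ids)
    return ids

user_ids = assign_ids(["u9", "u3", "u9", "u7"])
business_ids = assign_ids(["b2", "b5"])
print(user_ids)  # {'u9': 0, 'u3': 1, 'u7': 2}

# Rewrite a user-user friendship edge list with the integer IDs.
friendships = [("u9", "u3"), ("u3", "u7")]
user_user = [(user_ids[a], user_ids[b]) for a, b in friendships]
print(user_user)  # [(0, 1), (1, 2)]
```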
-
Create user-business review data
make ub_review
-
Extract the maximum weakly connected component (MaxWcc)
make get_maxwcc
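To find the largest weakly connected component, you ignore edge direction and keep the biggest set of mutually reachable nodes. The sketch below does this with union-find. It is an illustration and not the project's implementation. It assumes users and businesses are tagged as `("u", id)` and `("b", id)` so the two node types cannot collide.

```python
def max_wcc(edges):
    """Return the node set of the largest weakly connected component.

    Edge direction is ignored, which is what "weakly" connected means.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Union the endpoints of every edge.
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Group nodes by their root and return the largest group.
    components = {}
    for node in list(parent):
        components.setdefault(find(node), set()).add(node)
    return max(components.values(), key=len) if components else set()

edges = [(("u", 0), ("b", 0)), (("u", 1), ("b", 0)), (("u", 2), ("b", 1))]
print(sorted(max_wcc(edges)))  # [('b', 0), ('u', 0), ('u', 1)]
```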
-
Create core datasets
make core_review
make core_review_sample # sampling
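The README does not define the "core" dataset. One plausible reading is a bipartite k-core: repeatedly drop users and businesses that have fewer than `k` reviews until nothing else changes. A random user sample can then be drawn from that core. The sketch below assumes both steps. `bipartite_core`, `sample_users` and the user-based sampling scheme are hypothetical.

```python
import random

def bipartite_core(edges, k):
    """Repeatedly drop (user, business) edges whose user or business has fewer than k edges."""
    edges = set(edges)
    while True:
        user_deg, biz_deg = {}, {}
        for u, b in edges:
            user_deg[u] = user_deg.get(u, 0) + 1
            biz_deg[b] = biz_deg.get(b, 0) + 1
        kept = {(u, b) for u, b in edges if user_deg[u] >= k and biz_deg[b] >= k}
        if kept == edges:  # fixed point reached
            return kept
        edges = kept

def sample_users(edges, n, seed=0):
    """Keep only the edges of n randomly chosen users (hypothetical sampling scheme)."""
    users = sorted({u for u, _ in edges})
    chosen = set(random.Random(seed).sample(users, min(n, len(users))))
    return {(u, b) for u, b in edges if u in chosen}

edges = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)}
core = bipartite_core(edges, k=2)
print(sorted(core))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(len(sample_users(core, n=1)))  # 2
```

The pruning has to loop because removing one low-degree user can push a business below `k`, which can then push another user below `k`.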
-
Create similarity scores
make ub_similarity
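The README does not name the similarity measures behind `ub_similarity`. In a bipartite user-business graph, a user and a business never share a neighbor, so common-neighbor style scores have to count paths of length 3 instead. The sketch below shows two standard link-prediction scores adapted to that setting. It is illustrative only: preferential attachment, and the number of paths user → business → user → business. The function names are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Return user->businesses and business->users adjacency sets."""
    by_user, by_biz = defaultdict(set), defaultdict(set)
    for u, b in edges:
        by_user[u].add(b)
        by_biz[b].add(u)
    return by_user, by_biz

def preferential_attachment(u, b, by_user, by_biz):
    """deg(u) * deg(b): active users tend to link to popular businesses."""
    return len(by_user.get(u, ())) * len(by_biz.get(b, ()))

def path3_count(u, b, by_user, by_biz):
    """Number of paths u - b' - u' - b, where u reviewed b' and u' reviewed both b' and b."""
    count = 0
    for b2 in by_user.get(u, ()):
        for u2 in by_biz[b2]:
            if u2 != u and b in by_user[u2]:
                count += 1
    return count

by_user, by_biz = build_adjacency([(0, 0), (1, 0), (1, 1), (2, 0), (2, 1)])
print(preferential_attachment(0, 1, by_user, by_biz), path3_count(0, 1, by_user, by_biz))  # 2 2
```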
-
Create top_n candidates on Hadoop
make top_n
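This README does not show the actual Hadoop job. Selecting the top n per user does fit the MapReduce pattern, though: the mapper keys each scored pair by user, the shuffle groups the pairs, and the reducer keeps the n best. The sketch below simulates those phases locally. It assumes `(user, business, score)` records, and all names in it are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def mapper(records):
    """Emit (user, (score, business)) for each scored candidate pair."""
    for user, business, score in records:
        yield user, (score, business)

def reducer(user, values, n):
    """Keep the n highest-scoring businesses for one user."""
    return user, [b for score, b in heapq.nlargest(n, values)]

def top_n(records, n):
    # Sorting by key stands in for Hadoop's shuffle-and-sort phase.
    pairs = sorted(mapper(records), key=itemgetter(0))
    return dict(
        reducer(user, (v for _, v in group), n)
        for user, group in groupby(pairs, key=itemgetter(0))
    )

records = [(0, "b1", 0.2), (0, "b2", 0.9), (0, "b3", 0.5), (1, "b1", 0.7)]
print(top_n(records, 2))  # {0: ['b2', 'b3'], 1: ['b1']}
```

`heapq.nlargest` keeps only n items in memory per user, which matters when a reducer receives millions of candidates for one key.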
-
Link prediction
make link_cand_summary
make predicted_topn
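The plotting step reports precision, precision@n and recall, but the README does not give their exact definitions. Below is a sketch of one common convention, macro-averaged over users that have held-out links. It divides precision@n by n even when fewer than n predictions exist. The function name and data layout are assumptions.

```python
def precision_recall_at_n(predicted, actual, n):
    """Average precision@n and recall over users that have held-out links.

    predicted: user -> ranked list of businesses
    actual:    user -> set of businesses reviewed in the test period
    """
    precisions, recalls = [], []
    for user, truth in actual.items():
        if not truth:
            continue
        top = predicted.get(user, [])[:n]
        hits = len(set(top) & truth)
        precisions.append(hits / n)
        recalls.append(hits / len(truth))
    if not precisions:
        return 0.0, 0.0
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)

predicted = {0: ["b2", "b3"], 1: ["b1"]}
actual = {0: {"b3"}, 1: {"b1", "b4"}}
print(precision_recall_at_n(predicted, actual, n=2))  # (0.5, 0.75)
```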
-
Create matrix factorization training data, then train and predict
make create_mf_data
make mf_train
make mf_predict
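The README does not say which factorization model or tool sits behind `mf_train` and `mf_predict`. As a stand-in, here is plain stochastic gradient descent on squared error with L2 regularization, a common baseline. It assumes `(user, item, rating)` triples with integer IDs. The rank, learning rate, regularization and epoch count are arbitrary values chosen for illustration.

```python
import random

def predict(P, Q, u, i):
    """Predicted score for user u and item i: the dot product of their factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Factorize observed (user, item, rating) triples as P[u] . Q[i] using SGD."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on (r - p.q)^2 + reg * (|p|^2 + |q|^2)
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(P, Q, ratings):
    """Root mean squared error over the given triples."""
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5

ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = train_mf(ratings, n_users=3, n_items=3)
print(round(rmse(P, Q, ratings), 3))
```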
-
Link prediction for matrix factorization
make top_n_mf
make mf_predicted_tp
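The target name `mf_predicted_tp` suggests that it counts true positives among the factorization-based predictions, but the README does not confirm this. The sketch below makes that assumption. It ranks every business the user has not already reviewed by the dot product of the learned factors, then counts how many of the top n appear in the held-out set. Both function names are hypothetical.

```python
def mf_top_n(P, Q, user, seen, n):
    """Rank items the user has not interacted with by the dot product P[user] . Q[item]."""
    scores = []
    for item, q in enumerate(Q):
        if item in seen:
            continue  # do not recommend links that already exist in the training data
        scores.append((sum(a * b for a, b in zip(P[user], q)), item))
    scores.sort(reverse=True)
    return [item for _, item in scores[:n]]

def true_positives(top, held_out):
    """Count predicted links that appear in the held-out test set."""
    return len(set(top) & set(held_out))

P = [[1.0, 0.0]]
Q = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.0]]
top = mf_top_n(P, Q, user=0, seen={0}, n=2)
print(top, true_positives(top, held_out={1, 2}))  # [3, 2] 1
```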
-
Combine link prediction results
make precision_comb
make combine
-
Create plots
# degree distribution
make degree_dist
# graph statistical info
make graph_info
# precision, precision@n and recall
make octave_plot
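The degree distribution plot typically shows how many nodes have each degree, usually on log-log axes. This sketch computes those counts from an undirected edge list. It illustrates the idea only and is not the project's `degree_dist` code, which feeds its plots through Octave.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {degree: number of nodes with that degree} for an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(sorted(Counter(degree.values()).items()))

print(degree_distribution([(0, 1), (1, 2), (1, 3)]))  # {1: 3, 3: 1}
```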