Project Description

Datasets

I was given a sub-dataset, ub_sample_data.csv, drawn from the Yelp review dataset. It contains (user, business) pairs taken from Yelp reviews.

Tasks

1.1 Graph Construction

I constructed a social network graph, assuming that each node is uniquely labeled and that edges are undirected and unweighted. Each node represents a user, and there is an edge between two nodes if the number of businesses both users have reviewed is greater than or equal to the filter threshold.
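The edge rule above can be sketched in plain Python as follows. This is a minimal illustration and not the code in task1.py or task2.py: the function name `build_graph`, the in-memory list of pairs, and the choice to drop users that end up with no edges are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(pairs, threshold):
    """Return (nodes, edges) for the user graph.

    pairs: iterable of (user_id, business_id) tuples.
    threshold: minimum number of co-reviewed businesses for an edge.
    """
    # Collect the set of businesses each user has reviewed.
    businesses_by_user = defaultdict(set)
    for user, business in pairs:
        businesses_by_user[user].add(business)

    edges = set()
    for u, v in combinations(sorted(businesses_by_user), 2):
        common = businesses_by_user[u] & businesses_by_user[v]
        if len(common) >= threshold:
            # Stored once with u < v, because edges are undirected.
            edges.add((u, v))

    # Assumption: only users with at least one edge become nodes.
    nodes = {user for edge in edges for user in edge}
    return nodes, edges


# Tiny hypothetical sample, not taken from ub_sample_data.csv.
sample = [
    ("u1", "b1"), ("u1", "b2"),
    ("u2", "b1"), ("u2", "b2"),
    ("u3", "b1"),
]
nodes, edges = build_graph(sample, threshold=2)
```

With a threshold of 2, only u1 and u2 share enough businesses to be connected. The all-pairs loop is quadratic in the number of users, so a real run on Spark would likely distribute this comparison.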

1.2 Task 1: Community Detection Based on GraphFrames

I explored the Spark GraphFrames library to detect communities in the network graph. I used the Label Propagation Algorithm (LPA) provided by the library to detect communities.
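A rough sketch of how LPA might be called through GraphFrames is shown here. It is not the actual task1.py. The helper names `group_by_label` and `detect_communities`, the `max_iter=5` default, the decision to add every edge in both directions, and the ordering of the output are assumptions. `detect_communities` needs a running SparkSession with the graphframes package available.

```python
from collections import defaultdict


def group_by_label(labelled):
    """Turn (user_id, label) pairs into a list of sorted communities."""
    members = defaultdict(list)
    for user, label in labelled:
        members[label].append(user)
    communities = [sorted(users) for users in members.values()]
    # Assumed ordering: by community size, then lexicographically.
    return sorted(communities, key=lambda c: (len(c), c))


def detect_communities(spark, nodes, edges, max_iter=5):
    """Run GraphFrames LPA on the graph and return the communities."""
    # Imported lazily so group_by_label can be used without Spark.
    from graphframes import GraphFrame

    # GraphFrames expects an "id" column for vertices
    # and "src"/"dst" columns for edges.
    vertices_df = spark.createDataFrame([(n,) for n in nodes], ["id"])
    # Assumption: represent each undirected edge as two directed edges.
    both = [(a, b) for a, b in edges] + [(b, a) for a, b in edges]
    edges_df = spark.createDataFrame(both, ["src", "dst"])

    graph = GraphFrame(vertices_df, edges_df)
    result = graph.labelPropagation(maxIter=max_iter)
    labelled = [(row["id"], row["label"])
                for row in result.select("id", "label").collect()]
    return group_by_label(labelled)
```

Users that end up with the same label form one community. LPA is not deterministic in general, so two runs may produce different groupings.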

1.3 Execution Detail

I used version 0.6.0 of GraphFrames for Python and followed the provided instructions to install the package.

1.4 Output Result

I saved the detected communities to a txt file, following the specified format.

2.1 Task 2: Community Detection Based on the Girvan-Newman Algorithm

I implemented the Girvan-Newman algorithm to detect communities in the network graph using Spark RDD and standard Python or Scala libraries.

2.3 Betweenness Calculation

I calculated the betweenness of each edge in the original graph and saved the result in a txt file, following the specified format.
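For illustration, edge betweenness in the Girvan-Newman sense can be computed with one breadth-first search per node. The sketch below is single-machine Python, not the Spark RDD code in task2.py. The function name `edge_betweenness` and the adjacency-dict input are assumptions.

```python
from collections import defaultdict, deque


def edge_betweenness(adjacency):
    """adjacency: dict mapping each node to the set of its neighbours."""
    betweenness = defaultdict(float)
    for root in adjacency:
        # BFS from root, counting shortest paths to every reachable node.
        dist = {root: 0}
        paths = defaultdict(float)
        paths[root] = 1.0
        parents = defaultdict(list)
        order = []
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in adjacency[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    paths[nb] += paths[node]
                    parents[nb].append(node)

        # Walk back from the deepest nodes. Each node starts with credit 1
        # and passes it to its parents in proportion to their path counts.
        credit = {node: 1.0 for node in order}
        for node in reversed(order):
            for parent in parents[node]:
                share = credit[node] * paths[parent] / paths[node]
                betweenness[tuple(sorted((parent, node)))] += share
                credit[parent] += share

    # Every shortest path was counted from both of its endpoints.
    return {edge: value / 2 for edge, value in betweenness.items()}


# Hypothetical three-node path graph: a - b - c.
path_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = edge_betweenness(path_graph)
```

In the path graph, each edge lies on two shortest paths (for example, a-b carries the paths a to b and a to c), so both edges score 2.0.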

2.4 Community Detection

Following the Girvan-Newman algorithm, I repeatedly removed the edges with the highest betweenness and kept the partition that reached the highest modularity over all splits. I saved the resulting communities in a txt file, following the specified format.
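The removal loop and the modularity check can be sketched as below. This is a single-machine illustration under stated assumptions, not the task2.py implementation: all function names are hypothetical, all edges tied for the highest betweenness are removed together, and modularity is computed with the adjacency and degrees of the original graph, Q = (1/2m) Σ (A_ij − k_i k_j / 2m) over pairs i, j in the same community.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Girvan-Newman edge betweenness via one BFS per node."""
    scores = defaultdict(float)
    for root in adj:
        dist, paths = {root: 0}, defaultdict(float)
        paths[root] = 1.0
        parents, order, queue = defaultdict(list), [], deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    paths[nb] += paths[node]
                    parents[nb].append(node)
        credit = {node: 1.0 for node in order}
        for node in reversed(order):
            for parent in parents[node]:
                share = credit[node] * paths[parent] / paths[node]
                scores[tuple(sorted((parent, node)))] += share
                credit[parent] += share
    return {edge: value / 2 for edge, value in scores.items()}


def components(adj):
    """Connected components of the current graph, as a list of sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        result.append(comp)
    return result


def modularity(communities, original_adj, m):
    """Modularity using A_ij and degrees k_i from the original graph."""
    total = 0.0
    for comm in communities:
        for i in comm:
            for j in comm:
                a_ij = 1.0 if j in original_adj[i] else 0.0
                k_i, k_j = len(original_adj[i]), len(original_adj[j])
                total += a_ij - k_i * k_j / (2.0 * m)
    return total / (2.0 * m)


def girvan_newman(original_adj):
    """Return (best_communities, best_modularity) over all splits."""
    adj = {node: set(nbs) for node, nbs in original_adj.items()}
    m = sum(len(nbs) for nbs in original_adj.values()) // 2
    if m == 0:
        return components(adj), 0.0
    best_q, best = float("-inf"), []
    while True:
        comms = components(adj)
        q = modularity(comms, original_adj, m)
        if q > best_q:
            best_q, best = q, comms
        scores = edge_betweenness(adj)
        if not scores:  # no edges left to remove
            return best, best_q
        top = max(scores.values())
        for (u, v), value in scores.items():
            if abs(value - top) < 1e-9:  # remove all tied edges
                adj[u].discard(v)
                adj[v].discard(u)


# Hypothetical example: two triangles joined by the bridge c - d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
best, best_q = girvan_newman(graph)
```

On this example, the bridge has the highest betweenness and is removed first, which leaves the two triangles as the highest-modularity partition. Betweenness is recomputed after every removal, since removing an edge changes the shortest paths in what remains.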

