A Python-based project that explores data processing, dataset cleaning, and recommendation algorithms using a simulated social network dataset.
This project demonstrates how structured user data can be analyzed to generate friend recommendations and page suggestions similar to features used by modern social platforms.
The system works with JSON datasets and applies basic data science workflows and graph-based reasoning to analyze relationships between users.
Social media platforms rely heavily on data analysis and recommendation systems to improve user experience.
This project recreates a simplified version of those mechanisms using Python.
The system processes a dataset containing:
- Users
- Friend relationships
- Liked pages
Using this data, the project performs multiple operations including:
- Data loading
- Dataset cleaning
- User network analysis
- Friend recommendation using mutual connections
- Page recommendation based on shared interests
The goal is to demonstrate how basic recommendation systems can be implemented using Python and structured datasets.
The project reads user and page data from JSON files and converts it into Python data structures for processing.
This step ensures that the dataset can be efficiently accessed and manipulated during analysis.
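
A minimal sketch of the loading step follows. The schema is an assumption, since the JSON layout is not documented here: a top-level `users` list whose entries carry `id`, `name`, `friends`, and `liked_pages`, plus a `pages` list. The helper names `load_dataset` and `index_users` are hypothetical.

```python
import json
from pathlib import Path


def load_dataset(path):
    """Read a JSON file and return it as plain Python dicts and lists."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def index_users(dataset):
    """Map each user id to its record (assumed schema: users have an "id" key)."""
    return {user["id"]: user for user in dataset.get("users", [])}
```

Indexing users by id gives constant-time lookups later, which is one way to read "efficiently accessed" above.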
Real-world datasets often contain inconsistencies or redundant data.
The cleaning module improves the dataset by:
- Removing users with missing or empty names
- Eliminating duplicate friend connections
- Filtering out users with no activity
- Removing duplicate page entries
After processing, the cleaned dataset is saved for further analysis.
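
The four cleaning rules could be sketched roughly as below. This is not the contents of data_cleaning.py. The field names (`users`, `pages`, `name`, `friends`, `liked_pages`, `id`) are assumed, and "inactive" is interpreted here as a user with no friends and no liked pages.

```python
import json


def clean_dataset(dataset):
    """Return a cleaned copy of the dataset; the input is left unmodified."""
    cleaned_users = []
    for user in dataset.get("users", []):
        name = (user.get("name") or "").strip()
        if not name:
            continue  # drop users with a missing or empty name
        # dict.fromkeys removes duplicates while keeping first-seen order
        friends = list(dict.fromkeys(user.get("friends", [])))
        if not friends and not user.get("liked_pages"):
            continue  # assumption: "inactive" = no friends and no liked pages
        cleaned_users.append({**user, "friends": friends})

    # keep only the first entry seen for each page id
    unique_pages = {}
    for page in dataset.get("pages", []):
        unique_pages.setdefault(page["id"], page)

    return {"users": cleaned_users, "pages": list(unique_pages.values())}


def save_dataset(dataset, path):
    """Write the dataset back to disk as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(dataset, handle, indent=2)
```

Note that this sketch does not prune friend ids that point at removed users; whether the real module does so is not stated.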
The project includes a module that displays the dataset in a readable format, showing:
- User names and IDs
- Friend connections
- Pages liked by each user
- Available pages in the dataset
This helps verify the dataset and understand the network structure.
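
As an illustration only, a readable listing could be produced along these lines. The function name `describe_dataset` and the record fields are assumptions, and the actual output format of data.py may differ.

```python
def describe_dataset(dataset):
    """Build a list of human-readable lines summarising users and pages."""
    lines = []
    for user in dataset.get("users", []):
        lines.append(f"{user['name']} (ID: {user['id']})")
        lines.append(f"  Friends: {user.get('friends', [])}")
        lines.append(f"  Liked pages: {user.get('liked_pages', [])}")
    lines.append("Pages:")
    for page in dataset.get("pages", []):
        lines.append(f"  {page['id']}: {page['name']}")
    return lines
```

Printing the result with `print("\n".join(describe_dataset(data)))` gives a quick visual check of the network before running any recommendation step.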
This module implements a simplified "People You May Know" algorithm.
The recommendation logic works by:
- Identifying a user's direct friends
- Finding friends-of-friends
- Counting the number of mutual connections
- Ranking potential recommendations based on shared friends
This approach is commonly used in graph-based social network analysis.
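
The steps above can be sketched as follows. This is an illustrative version, not necessarily the code in friends_recommendation.py. It assumes a hypothetical `friends_of` dictionary that maps each user id to an iterable of friend ids.

```python
from collections import Counter


def recommend_friends(friends_of, user_id):
    """Rank non-friends by how many friends they share with user_id."""
    direct = set(friends_of.get(user_id, ()))
    mutual_counts = Counter()
    for friend in direct:
        # set() guards against duplicate entries inflating the count
        for candidate in set(friends_of.get(friend, ())):
            if candidate != user_id and candidate not in direct:
                mutual_counts[candidate] += 1
    # most mutual friends first; ties broken by smaller id for stable output
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked]


friends_of = {1: [2, 3], 2: [1, 4, 5], 3: [1, 4], 4: [2, 3], 5: [2]}
print(recommend_friends(friends_of, 1))  # [4, 5]
```

User 4 ranks ahead of user 5 because 4 is reachable through both of user 1's friends, while 5 is reachable through only one.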
The project also suggests pages a user may like based on shared interests with other users.
The algorithm works by:
- Identifying pages liked by the target user
- Finding other users with overlapping interests
- Suggesting pages those users like but the target user has not interacted with
This represents a simplified form of collaborative filtering, a common recommendation strategy.
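
One possible sketch of this logic is shown below. It assumes a hypothetical `likes_of` dictionary mapping user ids to liked page ids, and it scores each candidate page by the summed overlap of the users who like it. That scoring rule is an assumption; the project may rank differently.

```python
from collections import Counter


def recommend_pages(likes_of, user_id):
    """Suggest pages liked by users who share at least one page with user_id."""
    liked = set(likes_of.get(user_id, ()))
    scores = Counter()
    for other, pages in likes_of.items():
        if other == user_id:
            continue
        other_pages = set(pages)
        overlap = len(liked & other_pages)
        if overlap == 0:
            continue  # no shared interests, so this user contributes nothing
        for page in other_pages - liked:
            scores[page] += overlap  # weight by how similar the other user is
    # highest score first; ties broken by smaller page id
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, _ in ranked]


likes_of = {1: [101, 102], 2: [101, 102, 103], 3: [102, 104], 4: [105]}
print(recommend_pages(likes_of, 1))  # [103, 104]
```

Page 105 is never suggested because user 4 shares no pages with user 1.
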
| File | Description |
|---|---|
| data.json | Original dataset containing users and page information |
| Data_set.json | Dataset used for recommendation algorithms |
| cleaned_data.json | Output dataset after data cleaning |
| data.py | Loads and prints dataset contents |
| data_cleaning.py | Cleans and prepares dataset |
| friends_recommendation.py | Generates friend suggestions based on mutual friends |
| pages_recommendation.py | Recommends pages based on shared user interests |
| Generate_data.py | Generates a larger dataset |
| social_network_large.json | Contains the data generated by Generate_data.py |
- Python 3
- JSON datasets
- Python Standard Library
Key Python concepts used:
- Dictionaries
- Lists
- Sets
- File handling
- Sorting algorithms
- Data processing
Friend recommendation algorithm:
- Build a dictionary mapping users to their friends
- Identify the target user's direct friends
- Explore friends-of-friends
- Count mutual connections
- Rank suggested users by mutual friend count
Page recommendation algorithm:
- Identify pages liked by the target user
- Compare interests with other users
- Measure overlap in liked pages
- Recommend new pages liked by similar users
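
Both step lists start from per-user lookups, so here is a hedged sketch of how those dictionaries might be built from the loaded dataset. The function name `build_lookups` and the field names are hypothetical, and treating friendship as mutual (adding both directions) is an assumption about the data.

```python
def build_lookups(dataset):
    """Return (friends_of, likes_of): user id -> set of friend ids / page ids."""
    friends_of = {}
    likes_of = {}
    for user in dataset.get("users", []):
        uid = user["id"]
        friends_of.setdefault(uid, set())  # users with no friends still get an entry
        likes_of[uid] = set(user.get("liked_pages", []))
        for friend in user.get("friends", []):
            if friend == uid:
                continue  # ignore self-links
            # assumption: friendship is mutual, so record both directions
            friends_of[uid].add(friend)
            friends_of.setdefault(friend, set()).add(uid)
    return friends_of, likes_of
```

Sets make the later "count mutual connections" and "measure overlap" steps simple intersections and membership tests.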
Example friend recommendation:
People You May Know for User 17: [12, 9, 4]
Example page recommendation:
Pages You Might Like: [5, 8, 11]
This project focuses on building practical understanding of:
- Data preprocessing
- JSON dataset handling
- Social network relationships
- Graph-inspired recommendation logic
- Algorithmic problem solving
- Python data structures
Possible extensions to make the system more advanced:
- Network visualization using NetworkX
- Graph analysis (centrality, clustering)
- Recommendation ranking improvements
- Machine learning based recommendations
- Web interface for user interaction
- Larger and more realistic datasets
Author: Eamon
License: MIT