This project analyzes user engagement on two social platforms: Farcaster, a decentralized social network, and Reddit, a well-established community-driven forum. Using data from Reddit's API and from Dune Analytics (for Farcaster), it compares user interaction, content popularity, and overall growth on the two platforms.
- Reddit: Utilized the Reddit API to fetch detailed metrics surrounding subreddit engagement, including post interactions and overall community engagement stats.
- Farcaster: Executed targeted queries via Dune Analytics, focusing on extracting expansive datasets related to Farcaster's channel activities and user engagement metrics.
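As a rough illustration of the Reddit side, the sketch below fetches hot posts for one subreddit with PRAW and averages their scores and comment counts. It is not taken from the repository's `analyze_subreddits.py`; the function names, the user agent string, and the choice of "hot" posts are assumptions made for the example.

```python
from statistics import mean
from types import SimpleNamespace


def summarize_posts(posts):
    """Aggregate score and comment counts for an iterable of post-like objects."""
    posts = list(posts)
    if not posts:
        return {"posts": 0, "avg_score": 0.0, "avg_comments": 0.0}
    return {
        "posts": len(posts),
        "avg_score": mean(p.score for p in posts),
        "avg_comments": mean(p.num_comments for p in posts),
    }


def fetch_subreddit_summary(name, client_id, client_secret, limit=100):
    """Fetch hot posts for one subreddit via PRAW and summarize them."""
    import praw  # imported lazily so summarize_posts works without PRAW installed

    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent="engagement-analysis-sketch",  # hypothetical user agent
    )
    subreddit = reddit.subreddit(name)
    summary = summarize_posts(subreddit.hot(limit=limit))
    summary["subscribers"] = subreddit.subscribers
    return summary


# Offline check of the aggregation using stand-in posts (made-up numbers).
sample = [
    SimpleNamespace(score=10, num_comments=4),
    SimpleNamespace(score=30, num_comments=6),
]
print(summarize_posts(sample))
```

Keeping the aggregation separate from the network call means the metric logic can be checked without Reddit credentials.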
A multi-faceted approach was adopted for data preparation, ensuring a robust foundation for the ensuing analysis. Key steps included:
- Data Cleansing:
  - Extracting and refining channel and subreddit names from embedded HTML tags.
  - Standardizing column names across datasets to maintain consistency.
- Data Enrichment:
  - Deriving key engagement metrics, including ratios and growth indices.
  - Aggregating statistics to capture overall and average engagement insights.
- Data Merging:
  - Integrating Farcaster and Reddit data on comparable metrics (e.g., channels vs. subreddits) to facilitate a direct comparative analysis while maintaining data integrity.
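The preparation steps above can be sketched with pandas roughly as follows. The column names, the sample values, and the definition of the engagement ratio (activity divided by audience size) are all assumptions for illustration; the project's actual schema and metrics may differ. The two tables are stacked with `pd.concat` on a shared set of columns rather than joined on a key.

```python
import re

import pandas as pd

TAG_RE = re.compile(r"<[^>]+>")


def strip_html(value):
    """Remove HTML tags and surrounding whitespace from a cell value."""
    return TAG_RE.sub("", str(value)).strip()


def standardize_columns(df):
    """Lower-case column names and replace non-word runs with underscores."""
    out = df.copy()
    out.columns = [
        re.sub(r"\W+", "_", c.strip().lower()).strip("_") for c in out.columns
    ]
    return out


def prepare(df, name_col, activity_col, audience_col, platform):
    """Clean one platform's table and map it onto a shared schema."""
    out = standardize_columns(df).rename(
        columns={name_col: "community", activity_col: "activity", audience_col: "audience"}
    )
    out["community"] = out["community"].map(strip_html)
    out["engagement_ratio"] = out["activity"] / out["audience"]
    out["platform"] = platform
    return out[["platform", "community", "activity", "audience", "engagement_ratio"]]


# Made-up sample rows standing in for the retrieved data.
farcaster = pd.DataFrame({
    "Channel Name": ['<a href="/channel/dev">dev</a>', '<a href="/channel/art">art</a>'],
    "Total Casts": [200, 50],
    "Followers": [1000, 500],
})
reddit = pd.DataFrame({
    "Subreddit Name": ["<b>python</b>", "<b>art</b>"],
    "Total Posts": [400, 90],
    "Subscribers": [8000, 3000],
})

combined = pd.concat(
    [
        prepare(farcaster, "channel_name", "total_casts", "followers", "farcaster"),
        prepare(reddit, "subreddit_name", "total_posts", "subscribers", "reddit"),
    ],
    ignore_index=True,
)

# Average engagement ratio per platform.
summary = combined.groupby("platform")["engagement_ratio"].mean()
print(combined)
print(summary)
```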
Interactive Python scripting via Jupyter Notebooks was leveraged to:
- Identify Engagement Trends: Isolating highly active communities by scrutinizing engagement trends over selected periods.
- Highlight Growth Dynamics: Ranking Farcaster channels by growth, considering channel age and activity metrics.
- Draw Comparative Insights: Directly contrasting Farcaster channels against Reddit subreddits to uncover unique patterns and behaviors.
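One simple way to rank channels by growth while accounting for channel age is to normalize follower counts by age in days. This is only a sketch of that idea: the column names and sample numbers are hypothetical, and the notebook may use a different growth index.

```python
import pandas as pd

# Made-up channel data; "age_days" is the number of days since channel creation.
channels = pd.DataFrame({
    "channel": ["alpha", "beta", "gamma"],
    "followers": [1200, 300, 900],
    "age_days": [120, 10, 300],
})

# Followers gained per day of existence; clip avoids division by zero
# for channels created today.
channels["followers_per_day"] = channels["followers"] / channels["age_days"].clip(lower=1)

ranked = channels.sort_values("followers_per_day", ascending=False).reset_index(drop=True)
print(ranked[["channel", "followers_per_day"]])
```

With this normalization a young channel with few followers can outrank an older, larger one, which is the point of taking age into account.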
- Python: For comprehensive data manipulation and analysis.
  - Key Libraries: `pandas` for data analysis, `matplotlib` for visualization, `PRAW` for fetching Reddit data.
- Dune Analytics: Accessed for Farcaster data insights.
- Jupyter Notebook: Utilized for conducting and documenting the analysis process.
- Python 3.8 or later
- Jupyter Notebook or JupyterLab
- Clone the repository: `git clone https://github.com/ash-rk/RedditxFarcaster.git`
- Install Python packages: `pip install pandas praw matplotlib`
- Set up credentials in a `.secret` file:
  - `REDDIT_CLIENT_ID='your_client_id'`
  - `REDDIT_CLIENT_SECRET='your_client_secret'`
  - `DUNE_API_KEY='your_dune_api_key'` (if applicable)
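How the project's scripts read the `.secret` file is not shown here, so the following is only a minimal sketch of one way to parse `KEY='value'` lines into a dictionary. The `load_secrets` helper is hypothetical, and it assumes values never contain a `#` character.

```python
from pathlib import Path


def load_secrets(path=".secret"):
    """Parse KEY='value' lines from a file into a dict (hypothetical helper)."""
    secrets = {}
    for raw_line in Path(path).read_text().splitlines():
        # Drop trailing comments; assumes '#' never appears inside a value.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Remove surrounding whitespace and single or double quotes.
        secrets[key.strip()] = value.strip().strip("'\"")
    return secrets
```

A script could then call `load_secrets()` from the folder that contains `.secret` and look up `REDDIT_CLIENT_ID` and the other keys in the returned dictionary.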
Follow these steps to execute the analysis:
1. Reddit Data Extraction
   - Navigate to the `Reddit` folder and run the script for extracting Reddit data: `python3 analyze_subreddits.py`
2. Farcaster Data Extraction
   - Move to the `Farcaster` folder and execute the script to gather Farcaster channel and user metrics: `python3 fc_channel_query.py`
3. Data Cleaning
   - Still within the `Farcaster` folder, run the following script to clean the gathered Farcaster data: `python3 dune_result_analysis.py`
4. Farcaster Follower Data
   - Still within the `Farcaster` folder, run the following script to gather Farcaster follower data: `python3 scrape.py`
5. Analysis Notebook
   - Open the `Farcaster vs Reddit Analysis.ipynb` notebook located in the main directory and run all cells to perform the analysis.
Note: The output files generated from steps 1 to 3 are stored in the `data_retrieved` folder for easy access.
This project is licensed under the MIT License.