
📌 Reddit Topic Modelling

📖 Overview

This project performs topic modeling on Reddit posts using BERTopic. It retrieves and processes data from Reddit, applies topic modeling, and visualizes key topics discussed within a subreddit. 🚀

🔧 Features

  • ✅ Data Retrieval: Extracts posts and comments from Reddit 📥
  • ✅ Text Preprocessing: Cleans and prepares text for analysis 🛠️
  • ✅ Topic Modelling: Uses BERTopic to identify key discussion topics 🎯
  • ✅ Visualization: Generates visual representations of the extracted topics 📊
  • ✅ Document Labelling: Generates a well-labelled JSON file from your documents file, assigning each document to its topic (example at the end)
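The Text Preprocessing feature is not described in detail here, so the cleaning rules below are assumptions and not necessarily what main.ipynb does. This minimal Python sketch shows typical cleanup for Reddit text. It strips URLs, r/ and u/ mentions, and common markdown symbols, collapses whitespace, and drops `[deleted]`/`[removed]` placeholders. The names `clean_text` and `clean_corpus` are hypothetical.

```python
import re

# Hypothetical cleaning rules; the project's own preprocessing may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"/?\b[ru]/\w+")      # subreddit (r/...) and user (u/...) mentions
MARKDOWN_RE = re.compile(r"[*_~`>#\[\]()]")   # common Reddit markdown symbols
WHITESPACE_RE = re.compile(r"\s+")
PLACEHOLDERS = ("[deleted]", "[removed]")     # Reddit's stand-ins for removed content


def clean_text(text: str) -> str:
    """Return a lowercased, markup-free version of one post or comment."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = MARKDOWN_RE.sub(" ", text)
    text = WHITESPACE_RE.sub(" ", text)
    return text.strip().lower()


def clean_corpus(texts):
    """Clean every text, skipping placeholders and texts that end up empty."""
    cleaned = []
    for text in texts:
        if text.strip() in PLACEHOLDERS:
            continue
        result = clean_text(text)
        if result:
            cleaned.append(result)
    return cleaned
```

For example, `clean_text("Check **this** out: https://example.com r/python")` returns `"check this out:"`. The sketch deliberately skips stemming and stopword removal. Embedding-based models such as BERTopic generally work well on lightly cleaned text.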

📂 Project Structure

📁 Reddit Project
│-- 📜 README.md
│-- 📜 Data_retrieval.ipynb  # Fetches Reddit data via the Reddit API
│-- 📜 main.ipynb            # Preprocessing & Topic Modeling
│-- 📁 data                  # Stores raw & processed data
│-- 📁 models                # Trained topic models
│-- 📁 visualizations        # Graphs and topic word clouds

🛠️ Setup & Installation

1️⃣ Clone the Repository

git clone https://github.com/Nikhil-1705/reddit_topic_modelling.git
cd reddit_topic_modelling

2️⃣ Create a Virtual Environment

python -m venv venv
source venv/bin/activate  # Mac/Linux
venv\Scripts\activate     # Windows

3️⃣ Install Dependencies

pip install -r requirements.txt

🚀 How to Run the Project

  1. Run the Data Retrieval notebook 📥
    jupyter notebook Data_retrieval.ipynb
  2. Run the Topic Modeling notebook 🧠
    jupyter notebook main.ipynb
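After main.ipynb assigns topics, the Document Labelling output groups each document under its topic. The project's JSON layout is not documented here, so the sketch below assumes a simple mapping from topic label to the list of documents in that topic. It expects one topic id per document, which matches the shape of the `topics` list returned by BERTopic's `fit_transform` (where -1 marks outliers). The functions `label_documents` and `write_labelled_json` and the `topic_names` mapping are hypothetical.

```python
import json
from collections import defaultdict


def label_documents(docs, topics, topic_names):
    """Group documents by topic label.

    docs: list of document strings.
    topics: one topic id per document, in the same order as docs
            (BERTopic uses -1 for outlier documents).
    topic_names: hypothetical mapping of topic id -> readable label;
                 ids missing from it fall back to "topic_<id>".
    """
    if len(docs) != len(topics):
        raise ValueError("docs and topics must have the same length")
    grouped = defaultdict(list)
    for doc, topic in zip(docs, topics):
        label = topic_names.get(topic, f"topic_{topic}")
        grouped[label].append(doc)
    return dict(grouped)


def write_labelled_json(docs, topics, topic_names, path):
    """Write the grouped documents to a UTF-8 JSON file (assumed layout)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(label_documents(docs, topics, topic_names), f, indent=2, ensure_ascii=False)
```

For example, `label_documents(["gpu prices", "new gpu", "hello"], [0, 0, -1], {0: "0_gpu_prices", -1: "outliers"})` returns `{"0_gpu_prices": ["gpu prices", "new gpu"], "outliers": ["hello"]}`.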

📊 Output Examples

  • Word Clouds showing top words in topics 🌥️
  • Bar Charts ranking most common topics 📊
  • Tables listing top keywords per topic 📜
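The bar charts and keyword tables both rest on simple aggregations over the model's results. This sketch is not the project's plotting code. It ranks topics by document count, excluding BERTopic's -1 outlier topic by default, and formats a plain-text keyword table from `(word, score)` lists such as those returned by BERTopic's `get_topic()`. The function names are hypothetical.

```python
from collections import Counter


def rank_topics(topics, exclude_outliers=True):
    """Return (topic_id, document_count) pairs, most common first.

    Ties keep the order in which topics first appear.
    """
    counts = Counter(t for t in topics if not (exclude_outliers and t == -1))
    return counts.most_common()


def keyword_table(topic_words, top_n=5):
    """Format a plain-text table of the top keywords per topic.

    topic_words: mapping of topic id -> list of (word, score) tuples,
    e.g. collected by calling BERTopic's get_topic() for each topic id.
    """
    lines = [f"{'Topic':>5} | Keywords"]
    for topic_id in sorted(topic_words):
        words = [word for word, _ in topic_words[topic_id][:top_n]]
        lines.append(f"{topic_id:>5} | {', '.join(words)}")
    return "\n".join(lines)
```

The ranked pairs from `rank_topics` can feed any bar-chart library directly. Excluding -1 keeps the outlier bucket from dominating the chart.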

📝 Contributing

Contributions are welcome! Feel free to fork the repository, create a new branch, and submit a pull request. 🤝

📜 License

This project is licensed under the MIT License. 🔓


📩 Developed by Nikhil Bhandari

Example output images: newplot, newplot1, newplot, image
