🧠 NLP Business Case: Automated Customer Reviews

📌 Overview

This project builds a product review analysis and recommendation system powered by Natural Language Processing (NLP) and Generative AI. It automates the end-to-end process of classifying customer sentiment, clustering products, and summarizing key insights, helping businesses make data-driven decisions and improve product visibility.

🚀 Key Features

✅ Sentiment Classification: Categorize reviews into Positive, Neutral, or Negative.
📦 Product Clustering: Group products into 4–6 meaningful categories using unsupervised learning.
🧠 Generative AI Summarization: Summarize reviews and generate product highlights using Large Language Models (LLMs).
📇 Product Card Generation: Create structured product summaries that can be integrated into a user interface.

🧩 Project Steps

🔹 Part 1: Data Preprocessing & Sentiment Label Classification

🎯 Objective

Clean and prepare the Amazon review data, then classify each review into Positive, Neutral, or Negative categories to gain valuable insights into customer sentiment.

📚 Dataset

We used Amazon product review data from Kaggle:

Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products.csv
Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products_May19.csv

These were merged to create a more diverse and robust dataset.

⚙️ Workflow Summary

📍 Notebooks: data_exploration_and_acquisition.ipynb, data_preprocessing.ipynb

Load and explore both datasets
Combine and clean the review data
Apply sentiment labeling using rule-based or lexicon-based methods (e.g., VADER)
Save final dataset as cleaned_amazon_reviews_final.csv
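
As a rough illustration of the labeling step, the sketch below maps a VADER-style compound score (a float in [-1, 1]) to one of the three sentiment labels. The ±0.05 cutoffs are the conventional VADER thresholds, not values confirmed from the notebooks, and the function name and sample scores are hypothetical.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER-style compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Hypothetical (review text, compound score) pairs. In practice the score
# would come from a lexicon-based analyzer such as VADER.
scored_reviews = [
    ("Love this tablet, great value", 0.86),
    ("It works.", 0.0),
    ("Stopped charging after a week", -0.42),
]

labels = [label_from_compound(score) for _, score in scored_reviews]
print(labels)
```

Keeping the threshold as a parameter makes it easy to widen or narrow the Neutral band if the label distribution turns out to be too skewed.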

🔹 Part 2: Sentiment Classification with DistilBERT

🎯 Objective

Use a pretrained transformer model (DistilBERT) to classify Amazon reviews into sentiment categories more accurately than traditional models.

📂 Dataset

cleaned_amazon_reviews_final.csv with:
- full_review (cleaned review text)
- sentiment (labeled sentiment)

⚙️ Workflow Summary

📍 Notebook: Classification_Model_1.ipynb

Load and encode sentiment labels
Tokenize using Hugging Face’s DistilBERT tokenizer
Fine-tune DistilBertForSequenceClassification on labeled reviews
Evaluate model using accuracy and confusion matrix

✅ Achieved 95.29% accuracy
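
To make the evaluation step concrete, here is a minimal pure-Python sketch of accuracy and a confusion matrix over the three sentiment labels. The notebook may well use scikit-learn's equivalents instead; the label order and the toy predictions below are assumptions for illustration only and are unrelated to the reported 95.29% figure.

```python
from collections import Counter

# Assumed label order for the rows and columns of the matrix.
LABELS = ["Negative", "Neutral", "Positive"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Toy data, not taken from the project.
y_true = ["Positive", "Positive", "Neutral", "Negative", "Positive"]
y_pred = ["Positive", "Neutral", "Neutral", "Negative", "Positive"]

acc = accuracy(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred)
print(acc)
for label, row in zip(LABELS, matrix):
    print(label, row)
```

Reading the matrix row by row shows which classes get confused with each other, which a single accuracy number hides when the classes are imbalanced.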

🔹 Part 3: Product Clustering & GPT Summarization

🎯 Objective

Organize products into clusters and generate summaries and recommendations using LLMs like GPT.

🛠️ Workflow Overview

Clustering
- Dimensionality reduction with UMAP
- Clustering with KMeans (11 clusters)
- Relabel the 11 clusters into 4 broader categories
- Assign descriptive cluster_labels using GPT-based interpretation
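
The relabeling step above can be sketched as a simple lookup from fine-grained cluster ids to broader categories. The mapping here is entirely hypothetical: which of the 11 KMeans clusters belongs to which category is not stated in this README, and the "Other" category is a placeholder.

```python
# Hypothetical mapping from 11 KMeans cluster ids to 4 broader categories.
CLUSTER_TO_CATEGORY = {
    0: "Fire Tablets", 1: "Fire Tablets", 2: "Fire Tablets",
    3: "Kindle", 4: "Kindle", 5: "Kindle",
    6: "Accessories", 7: "Accessories", 8: "Accessories",
    9: "Other", 10: "Other",
}


def relabel(cluster_ids, mapping=CLUSTER_TO_CATEGORY):
    """Replace each fine-grained cluster id with its broader category."""
    return [mapping[c] for c in cluster_ids]


# Example cluster ids as they might come out of KMeans.
kmeans_ids = [0, 4, 10, 7, 2]
categories = relabel(kmeans_ids)
print(categories)
```
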
Product-Level Aggregation
- Group reviews by name and brand
- Compute average rating, sentiment trends, and gather review samples
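
A minimal sketch of the aggregation step, using only the standard library (the notebooks most likely use pandas for this). The field names and sample reviews are assumptions, and sentiment counts stand in for the "sentiment trends" mentioned above.

```python
from collections import Counter, defaultdict

# Hypothetical review records; field names are assumptions.
reviews = [
    {"name": "Fire HD 8", "brand": "Amazon", "rating": 5,
     "sentiment": "Positive", "text": "Great screen"},
    {"name": "Fire HD 8", "brand": "Amazon", "rating": 3,
     "sentiment": "Neutral", "text": "Okay for the price"},
    {"name": "Kindle Paperwhite", "brand": "Amazon", "rating": 4,
     "sentiment": "Positive", "text": "Easy on the eyes"},
]


def aggregate(reviews, max_samples=2):
    """Group reviews by (name, brand) and compute per-product statistics."""
    groups = defaultdict(list)
    for r in reviews:
        groups[(r["name"], r["brand"])].append(r)

    products = []
    for (name, brand), rows in groups.items():
        products.append({
            "name": name,
            "brand": brand,
            "avg_rating": sum(r["rating"] for r in rows) / len(rows),
            "sentiment_counts": dict(Counter(r["sentiment"] for r in rows)),
            "sample_reviews": [r["text"] for r in rows[:max_samples]],
        })
    return products


products = aggregate(reviews)
for p in products:
    print(p)
```
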
GPT-Based Summary Generation
- Identify top and worst products per category
- Output structured summaries in JSON
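
One way the top and worst products per category could be selected before prompting the LLM is sketched below. Ranking by average rating is an assumption, since the README does not state the selection criterion, and the GPT call itself is left out. The product names and JSON layout are placeholders.

```python
import json
from collections import defaultdict

# Placeholder products; in the project these would come from the
# product-level aggregation step.
products = [
    {"name": "Product A", "category": "Fire Tablets", "avg_rating": 4.6},
    {"name": "Product B", "category": "Fire Tablets", "avg_rating": 3.1},
    {"name": "Product C", "category": "Kindle", "avg_rating": 4.8},
    {"name": "Product D", "category": "Kindle", "avg_rating": 4.2},
]


def top_and_worst(products):
    """Pick the highest- and lowest-rated product in each category."""
    by_category = defaultdict(list)
    for p in products:
        by_category[p["category"]].append(p)

    summary = {}
    for category, items in by_category.items():
        ranked = sorted(items, key=lambda p: p["avg_rating"], reverse=True)
        summary[category] = {
            "top": ranked[0]["name"],
            "worst": ranked[-1]["name"],
        }
    return summary


summary = top_and_worst(products)
print(json.dumps(summary, indent=2))
```
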
Product Card Generation
- Final outputs rendered as Product Cards with:
  - Product name, brand, rating
  - Review highlights and representative image
  - Category and source URL
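
The card fields listed above could be modeled as a small data structure that serializes to a dictionary for a front end. This is only a sketch: the class name, field names, and example values are hypothetical and not taken from the repository.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ProductCard:
    """Hypothetical container for the fields shown on a product card."""
    name: str
    brand: str
    rating: float
    category: str
    source_url: str
    image_url: str
    highlights: List[str] = field(default_factory=list)


# Example values are placeholders, not real project output.
card = ProductCard(
    name="Example Tablet",
    brand="ExampleBrand",
    rating=4.5,
    category="Fire Tablets",
    source_url="https://example.com/product",
    image_url="https://example.com/product.jpg",
    highlights=["Bright display", "Long battery life"],
)

card_dict = asdict(card)  # plain dict, ready for JSON or an HTML template
print(card_dict)
```
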

🛠️ Tech Stack

Python (pandas, numpy, sklearn, matplotlib, seaborn)
Hugging Face Transformers (DistilBERT)
UMAP, KMeans
OpenAI GPT-4 API (for summarization)
Jupyter Notebooks

📈 Results

✅ 95.29% accuracy on sentiment classification using DistilBERT.
🔍 Products grouped into coherent clusters such as Kindle, Fire Tablets, Accessories.
💬 GPT-generated summaries provide actionable insights and highlight standout products.
