This project explores sentiment analysis of Amazon beauty product reviews by fine-tuning the RoBERTa model. As a baseline comparison, a rule-based approach was applied to understand the difference in performance between traditional sentiment analysis methods and deep learning models.
- Classify product reviews into Positive, Negative, or Neutral sentiments.
- Use rule-based technique (VADER) as a baseline to understand the performance gap between rule-based and deep learning approaches.
- Compare how well RoBERTa's sentiment predictions align with the product ratings associated with the reviews.
- Visualise the correlation between predictions and product ratings.
The dataset is sourced from Amazon Reviews 2023. It contains customer reviews and their corresponding product ratings (0-5) for beauty products.
- Preprocessing: Basic text cleaning like lowercasing, removing special characters, and tokenization to prepare the text data.
- Rule-Based Approach (VADER): VADER (Valence Aware Dictionary and sEntiment Reasoner) is a simple but effective sentiment analysis tool that uses pre-defined word lists with sentiment scores. It’s fast but often struggles with context and sarcasm.
- Deep Learning Approach (RoBERTa): I fine-tuned the RoBERTa transformer model on the dataset to classify the sentiments more accurately.
- Comparison: The sentiment classifications from both methods were compared with the product ratings to understand which method better reflects customer satisfaction.
- Visualisation: Plots were generated to visualise how well each model’s predictions aligned with the numerical ratings.
I wanted to see how much of an improvement deep learning models like RoBERTa offer over traditional methods like VADER — especially when applied to product reviews, where sentiment isn't always straightforward.
- Python
- Transformers
- Scikit-learn
- Matplotlib, Seaborn
- Pandas
- NLTK (for VADER)
- The RoBERTa model consistently provided more accurate sentiment classifications compared to VADER.
- Visualisations showed that RoBERTa's predictions aligned more closely with user ratings, especially in distinguishing neutral and mixed reviews.
Archit Rawat
This project is licensed under the MIT License.