
Sentiment_analysis

Sentiment analysis on Amazon food reviews using Vader, Bert, and Roberta.

We also briefly discuss the pros and cons of each approach.

Data overview

[screenshot: preview of the dataset]

The dataset has 50,000+ rows with 10 different features. In this project we mainly use "Score", which is the star rating of the review, and "Text", which is the actual review body.
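A minimal loading sketch, assuming the data is the Kaggle Amazon Fine Food Reviews CSV (the `Reviews.csv` filename is an assumption, not the repo's actual code):

```python
import pandas as pd

# Assumed filename; the Kaggle "Amazon Fine Food Reviews" dataset ships as Reviews.csv.
df = pd.read_csv("Reviews.csv")

# The two columns this project relies on:
#   "Score" -- the 1-5 star rating
#   "Text"  -- the review body
df = df[["Score", "Text"]]
print(df.shape)
print(df["Score"].value_counts())  # shows how skewed the star ratings are
```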

[screenshot: distribution of review stars]

Notice that the reviews are unbalanced: most of them are highly positive.

**NOTE: for the purposes of this project, we use the star ratings as references to verify the accuracy of our models.**

1. Vader

Firstly, we use Vader, which takes a bag-of-n-grams approach.
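A minimal sketch of scoring a review with Vader via NLTK (the repo's exact code isn't shown in this README, so treat this as illustrative):

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the Vader lexicon
sia = SentimentIntensityAnalyzer()

# polarity_scores returns a dict with 'neg', 'neu', 'pos' components
# and a 'compound' score in [-1, 1].
print(sia.polarity_scores("This oatmeal is delicious and arrived quickly!"))
```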

Sample Result

[screenshot: sample Vader results]

Results Visualization

[screenshot: Vader results visualization]

Drawback

[screenshot: example of an inaccurate Vader result]

In the above example we see that the result is not accurate at all. This is because Vader uses a bag-of-n-grams approach: it scores tokens individually, without considering their sequence or order, which is important in human language for sentiment analysis.
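A rough illustration of this limitation, using a hypothetical review (not from the dataset): shuffling the words destroys the meaning, but Vader's scores change far less than the meaning does, because it mostly scores tokens individually (its negation and booster heuristics can shift the numbers a little):

```python
import random
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()

review = "the taste was a huge disappointment and the box arrived crushed"
words = review.split()
random.shuffle(words)
shuffled = " ".join(words)

print(sia.polarity_scores(review))
print(sia.polarity_scores(shuffled))  # often nearly identical, despite the gibberish word order
```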

**For Bert and Roberta, we only use the first 1,000 rows of the dataset.**

2. Bert

Secondly, we use Bert, a bidirectional transformer pretrained using a combination of a masked language modeling objective and next-sentence prediction on a large corpus comprising the Toronto Book Corpus and Wikipedia.
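A minimal sketch, assuming the star-rating checkpoint `nlptown/bert-base-multilingual-uncased-sentiment` from Hugging Face (the repo's exact model isn't named in this README); it predicts 1-5 stars directly, which is what the comparison below needs:

```python
from transformers import pipeline

# Assumed checkpoint: a Bert model fine-tuned to predict 1-5 star ratings.
bert_pipe = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
)

result = bert_pipe("Great taffy at a great price. Very soft and chewy.")
print(result)  # e.g. [{'label': '5 stars', 'score': ...}]

# The label starts with the digit, so the star count can be parsed out directly.
sentiment_score = int(result[0]["label"][0])
```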

Sample Result

To verify our results, we compare the model output ('sentiment score') with the user-given review stars ('Score').

[screenshot: model output vs. star rating]

We define 'aligned' to be 1 if the model output is within ±1 of the user's star rating.
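Expressed as a sketch in pandas (the `sentiment_score` column name is a placeholder for the model output):

```python
import pandas as pd

# Hypothetical frame: "Score" is the user's stars, "sentiment_score" the model's prediction.
df = pd.DataFrame({"Score": [5, 1, 3, 4], "sentiment_score": [4, 3, 3, 4]})

# aligned = 1 when the model lands within +/-1 star of the user rating
df["aligned"] = ((df["sentiment_score"] - df["Score"]).abs() <= 1).astype(int)
print(df["aligned"].mean())  # the accuracy reported below is this fraction
```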

[screenshots: alignment computation and accuracy]

The accuracy is about 92.43%.

3. Roberta

Lastly, we use Roberta, which builds on Bert and modifies key hyperparameters, removing the next-sentence pretraining objective and training with much larger mini-batches and learning rates.
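A minimal sketch, assuming the `cardiffnlp/twitter-roberta-base-sentiment` checkpoint (again, the repo's exact model isn't named here); it outputs the negative/neutral/positive scores discussed below:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cardiffnlp/twitter-roberta-base-sentiment"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

encoded = tokenizer("This taffy was way too hard to chew.", return_tensors="pt")
with torch.no_grad():
    logits = model(**encoded).logits
probs = torch.softmax(logits, dim=-1)[0]

# Label order for this checkpoint: 0 = negative, 1 = neutral, 2 = positive.
print({"neg": probs[0].item(), "neu": probs[1].item(), "pos": probs[2].item()})
```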

Sample Result

[screenshot: sample Roberta results]

Results Visualization

[screenshot: Roberta results visualization]

Like Vader, the positive and negative sentiment scores behave as we expected (the more stars, the more positive the review). However, unlike Vader's neutral scores, which are evenly distributed, Roberta's neutral scores follow a roughly normal distribution. This makes sense: 3 out of 5 stars means a neutral rating, and the more extreme the star count, the less neutral the review.
