Sentiment analysis on Amazon food reviews using VADER, BERT, and RoBERTa.
We also briefly discuss the pros and cons of each approach.
The dataset has 50,000+ rows with 10 different features. In this project we mainly use "Score", the review's star rating, and "Text", the actual review text.
Notice that the reviews are unbalanced: most of them are highly positive.
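A minimal sketch of loading the data with pandas (the file name `Reviews.csv` is an assumption based on the Kaggle "Amazon Fine Food Reviews" dataset; adjust the path to your copy):

```python
import pandas as pd

# "Reviews.csv" is assumed from the Kaggle version of the dataset.
df = pd.read_csv("Reviews.csv")
df = df[["Score", "Text"]]  # the two features the project relies on

# The class imbalance is visible directly in the star counts.
print(df["Score"].value_counts().sort_index())
```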
**NOTE: for the purpose of this project, we use the stars as references to verify the accuracy of our models.**
Firstly, we use VADER, which takes a bag-of-n-grams approach.
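A minimal sketch of scoring the reviews with NLTK's VADER implementation (assuming the DataFrame `df` from above):

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER lexicon
sia = SentimentIntensityAnalyzer()

# polarity_scores returns neg/neu/pos proportions plus a compound score in [-1, 1].
df["vader_compound"] = df["Text"].apply(
    lambda text: sia.polarity_scores(text)["compound"]
)
```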
In the above example we see that the result is not accurate at all. This is because VADER uses a bag-of-n-grams approach: it scores tokens individually without considering their sequence/order, and word order matters in human language, which limits its usefulness for sentiment analysis.
**For BERT and RoBERTa, we only use the first 1,000 rows of the dataset.**
Secondly, we use BERT, a bidirectional transformer pretrained with a combination of masked language modeling and next-sentence prediction objectives on a large corpus comprising the Toronto Book Corpus and Wikipedia.
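One way to get star-rating predictions out of BERT is a checkpoint fine-tuned on review data; the sketch below uses the public `nlptown/bert-base-multilingual-uncased-sentiment` model, which predicts 1-5 stars (an assumption for illustration; the notebook's actual checkpoint may differ):

```python
from transformers import pipeline

bert_clf = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
)

result = bert_clf("This product arrived broken and tasted awful.", truncation=True)[0]
# result["label"] is one of "1 star" ... "5 stars"; take the leading digit.
predicted_stars = int(result["label"][0])
```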
To verify our results, we compare the model output ('sentiment score') with the user's review stars ('Score').
We define a review to be aligned (aligned = 1) if the model's predicted rating is within ±1 of the user's star rating.
The accuracy is about 92.43%.
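A minimal sketch of this alignment check in pandas (the `pred_stars` column holding the model's predicted star ratings is a hypothetical name):

```python
subset = df.head(1000).copy()  # BERT/RoBERTa are run on the first 1,000 rows

# aligned = 1 when the prediction is within +-1 star of the user's rating.
subset["aligned"] = ((subset["pred_stars"] - subset["Score"]).abs() <= 1).astype(int)
print(f"Accuracy: {subset['aligned'].mean():.2%}")
```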
Lastly, we use RoBERTa, which builds on BERT and modifies key hyperparameters: it removes the next-sentence prediction pretraining objective and trains with much larger mini-batches and learning rates.
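A sketch of getting negative/neutral/positive probabilities from a RoBERTa checkpoint; `cardiffnlp/twitter-roberta-base-sentiment` is assumed here for illustration, and the notebook may use a different fine-tuned model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def roberta_scores(text: str) -> dict:
    """Return negative/neutral/positive probabilities for one review."""
    encoded = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**encoded).logits[0]
    probs = torch.softmax(logits, dim=-1)
    return {"neg": probs[0].item(), "neu": probs[1].item(), "pos": probs[2].item()}

print(roberta_scores("Great taste, will definitely buy again!"))
```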
Like VADER's, the positive and negative sentiment scores behave as we expected (the more stars, the more positive the review). However, unlike VADER's neutral scores, which are evenly distributed, RoBERTa's neutral scores follow a roughly normal distribution. This makes sense: 3 out of 5 stars is a neutral rating, and the more extreme the star rating, the less neutral the review.