A sentiment analysis model trained on a Kaggle GPU using the Sentiment140 dataset of 1.6 million tweets.
**Deployed on my personal Docker Hub repository:** Click here
**Kaggle Notebook link:** Kaggle notebook
- Train/test split: 90% / 10%
- Size: 1.6M samples
- Link: Dataset
- Model type: Sequential RNN for binary classification
- Optimizer: Adam
- Loss function: Binary cross-entropy
- Output: sentiment score in [0, 1]
- Threshold (fine-tuned): score >= 0.625 → "Positive", score < 0.625 → "Negative"
- Best validation accuracy: 83%
- F1-score: 0.8340
- Version: 4
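A minimal Keras sketch of a model matching the description above (Sequential, recurrent, sigmoid output, Adam, binary cross-entropy), followed by the threshold step that turns the score into a label. The vocabulary size, sequence length, embedding width, and the choice of an LSTM layer are all hypothetical; the README only says "RNN". TensorFlow is imported inside `build_model` so the thresholding helper runs without it.

```python
# Hypothetical hyperparameters: the README does not state them.
VOCAB_SIZE = 20_000   # assumed tokenizer vocabulary size
MAX_LEN = 50          # assumed padded sequence length (tokens per tweet)
EMBED_DIM = 100       # assumed embedding width
THRESHOLD = 0.625     # fine-tuned decision threshold from the README


def build_model():
    """Sequential RNN for binary sentiment classification (sketch)."""
    import tensorflow as tf  # lazy import: only needed to build the model

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_LEN,)),
        tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        tf.keras.layers.LSTM(64),  # assumed recurrent layer type and size
        tf.keras.layers.Dense(1, activation="sigmoid"),  # score in [0, 1]
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


def label_from_score(score, threshold=THRESHOLD):
    """Map a sigmoid score to one of the two labels used in this README."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return "Positive" if score >= threshold else "Negative"


print(label_from_score(0.70))  # Positive
print(label_from_score(0.60))  # Negative
```

Raising the threshold above 0.5 trades some positive recall for positive precision; the README reports 0.625 as the fine-tuned value.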
| Metric | Negative | Positive |
|---|---|---|
| Precision | 0.84 | 0.82 |
| Recall | 0.82 | 0.84 |
| F1-score | 0.83 | 0.83 |
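scikit-learn's `classification_report` produces these per-class numbers. The plain-Python sketch below recomputes them by hand to show what each row means: each class is scored one-vs-rest, and F1 is the harmonic mean of precision and recall. The last line cross-checks the Negative column of the table.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_scores(y_true, y_pred, target):
    """Precision, recall and F1 for `target`, treating every other label as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1(precision, recall)


# Cross-check the table: Negative has precision 0.84 and recall 0.82.
print(round(f1(0.84, 0.82), 2))  # 0.83
```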
- Training epochs: 50 planned; early stopping (patience = 10) ended training after 22 epochs
- Training environment: Kaggle GPU
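In Keras, patience-based stopping comes from `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)`; which metric was monitored is an assumption, since the README does not name it. The plain-Python simulation below mirrors that patience rule and shows why a run that stops at epoch 22 with patience 10 had its last counted improvement at epoch 12.

```python
def epochs_run(val_losses, patience=10, min_delta=0.0):
    """Return how many epochs run before patience-based early stopping fires.

    Simplified model of Keras's EarlyStopping: the monitored value (assumed
    to be validation loss) must drop by more than `min_delta` to count as an
    improvement; after `patience` consecutive epochs without one, training stops.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)  # never triggered: all planned epochs ran


# Hypothetical curve: loss improves for 12 epochs, then plateaus.
losses = [1.0 - 0.01 * i for i in range(12)] + [0.95] * 38  # 50 planned epochs
print(epochs_run(losses, patience=10))  # 22
```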
**There's also a command-line script that converts .h5 models to the TensorFlow SavedModel format:** here
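The conversion itself is short. The sketch below is not the linked script; it assumes the tf.keras (Keras 2) API of TF 2.6 to 2.15, where `tf.keras.models.load_model` reads the .h5 file and `tf.saved_model.save` writes the SavedModel directory. With Keras 3 (TF 2.16+), `model.export(output_dir)` is the documented route to a SavedModel. TensorFlow is imported lazily so the argument parser can be exercised without it.

```python
import argparse


def build_parser():
    """Command-line interface: two positional paths."""
    parser = argparse.ArgumentParser(
        description="Convert a Keras .h5 model to the TensorFlow SavedModel format."
    )
    parser.add_argument("h5_path", help="path to the input .h5 file")
    parser.add_argument("output_dir", help="directory to write the SavedModel to")
    return parser


def convert(h5_path, output_dir):
    """Load an .h5 model and export it as a SavedModel directory."""
    import tensorflow as tf  # lazy import

    # compile=False: only the architecture and weights are needed for export.
    model = tf.keras.models.load_model(h5_path, compile=False)
    tf.saved_model.save(model, output_dir)


def main(argv=None):
    """Parse `argv` (defaults to sys.argv[1:]) and run the conversion."""
    args = build_parser().parse_args(argv)
    convert(args.h5_path, args.output_dir)
```

Saved under a hypothetical name such as `convert_h5.py`, with `main()` called from an `if __name__ == "__main__":` guard, it would run as `python convert_h5.py model.h5 saved_model_dir`.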
- Collected using the Twitter API
- Scripts for searching and saving 100 × n tweets containing a keyword: Tweets about Messi & Tweets about Ronaldo
NOTE: Running these scripts requires a Twitter developer account and a bearer token, either stored in a text file whose path is set manually in the code or exported as an environment variable.
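The sketch below illustrates both points of the note: loading the bearer token from a file or an environment variable, and collecting 100 × n tweets by paging through results 100 at a time. It is not the author's script. It assumes the Twitter API v2 recent-search endpoint (`/2/tweets/search/recent`, `max_results` up to 100, pagination via `meta.next_token`), and the environment variable name `BEARER_TOKEN` is hypothetical. API access tiers have changed over time, so the endpoint may require a paid plan.

```python
import json
import os
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"  # assumed endpoint (API v2)


def load_bearer_token(path=None, env_var="BEARER_TOKEN"):
    """Read the token from a text file if `path` is given, else from `env_var`.

    `env_var` is a hypothetical default; use whatever name your shell exports.
    """
    if path is not None:
        with open(path, encoding="utf-8") as f:
            return f.read().strip()
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"no token file given and ${env_var} is not set")
    return token


def search_tweets(keyword, token, pages):
    """Collect up to 100 * `pages` recent tweets containing `keyword`."""
    tweets, next_token = [], None
    for _ in range(pages):
        params = {"query": keyword, "max_results": 100}
        if next_token:
            params["next_token"] = next_token
        request = urllib.request.Request(
            f"{SEARCH_URL}?{urllib.parse.urlencode(params)}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request) as response:
            body = json.load(response)
        tweets.extend(body.get("data", []))
        next_token = body.get("meta", {}).get("next_token")
        if not next_token:  # no further pages
            break
    return tweets
```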
- Deep learning framework: TensorFlow 2.6 or higher
- Data visualization: Pandas, Seaborn, Matplotlib
- Regular expressions: re
- NLP library: NLTK
- Train/test splitting, classification_report: scikit-learn
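The `re` module is typically used here for tweet cleaning before tokenization. The sketch below is one plausible version, not the author's pipeline: the specific patterns (URLs, @mentions, non-letter characters) are assumptions. It also maps Sentiment140's polarity column (0 = negative, 4 = positive) to the 0/1 targets that a sigmoid output trained with binary cross-entropy expects. NLTK steps such as stopword removal or stemming are left out to keep the example dependency-free.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                 # @user handles
NON_LETTER_RE = re.compile(r"[^a-z\s]")          # digits, punctuation, '#', emoji
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # before NON_LETTER_RE, which would split URLs
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return SPACES_RE.sub(" ", text).strip()


def binary_label(polarity):
    """Map Sentiment140 polarity (0 = negative, 4 = positive) to 0 / 1."""
    if polarity not in (0, 4):
        raise ValueError(f"unexpected polarity {polarity}")
    return 1 if polarity == 4 else 0


print(clean_tweet("@user Loving this!! http://t.co/xyz #happy"))  # loving this happy
```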