
A sentiment analysis model trained on a Kaggle GPU with 1.6M examples, then used to run inference on 220k tweets about Messi and draw insights from the results.


Justsecret123/Twitter-sentiment-analysis


Twitter-sentiment-analysis

A sentiment analysis model trained on a Kaggle GPU using the Sentiment140 dataset (1.6 million tweets).

Deployed on my personal Docker Hub repository: Click here

Kaggle Notebook link: Kaggle notebook

Dataset (Sentiment140+GloVe)

  • Train/test split : 90% / 10%
  • Size : 1.6M samples
  • Link : Dataset
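
Sentiment140 encodes polarity as 0 (negative) and 4 (positive), so labels need remapping to 0/1 before training a binary classifier. The project splits data with scikit-learn (e.g. `train_test_split(X, y, test_size=0.1)`); the sketch below is a dependency-free equivalent of the 90% / 10% split. The `(polarity, text)` row layout and the seed of 42 are hypothetical choices, not taken from the training notebook.

```python
import random

def remap_label(polarity: int) -> int:
    """Map a Sentiment140 polarity (0 = negative, 4 = positive) to a 0/1 label."""
    if polarity == 0:
        return 0
    if polarity == 4:
        return 1
    raise ValueError(f"unexpected polarity: {polarity}")

def split_90_10(rows, seed=42):
    """Shuffle a copy of the rows and return (train, test) with a 90% / 10% ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy rows standing in for the real CSV: (polarity, text)
rows = [(4 if i % 2 else 0, f"tweet {i}") for i in range(1000)]
data = [(text, remap_label(polarity)) for polarity, text in rows]
train, test = split_90_10(data)
print(len(train), len(test))  # 900 100
```

On the full 1.6M-sample dataset the same rule gives 1,440,000 training and 160,000 test samples.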

Model

  • Model type : Sequential, RNN, Binary classification
  • Optimizer : Adam
  • Loss function : Binary cross entropy
  • Output : sentiment score in [0, 1]
  • Threshold (fine-tuned) : score >= 0.625 → "Positive", score < 0.625 → "Negative"
  • Best validation accuracy : 83%
  • F1-score : 0.8340
  • Version : 4
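
The model outputs a sigmoid score, and the 0.625 threshold turns it into a label. The README does not say how the threshold was fine-tuned; one common approach, assumed here, is to sweep candidate thresholds on held-out data and keep the one with the best accuracy (F1 would be an equally plausible criterion). The validation scores below are hypothetical.

```python
def to_label(score: float, threshold: float = 0.625) -> str:
    """Map a sigmoid sentiment score in [0, 1] to a label with the README's threshold."""
    return "Positive" if score >= threshold else "Negative"

def tune_threshold(scores, labels, candidates):
    """Return (threshold, accuracy) for the candidate with the best accuracy.

    labels are 1 (positive) / 0 (negative); ties keep the earliest candidate.
    """
    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

print(to_label(0.70))  # Positive
print(to_label(0.60))  # Negative

# Hypothetical validation scores/labels, not real model outputs
scores = [0.10, 0.40, 0.55, 0.61, 0.30, 0.64, 0.80, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
candidates = [i / 1000 for i in range(300, 901, 25)]  # 0.300, 0.325, ..., 0.900
print(tune_threshold(scores, labels, candidates))  # (0.625, 0.875)
```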

Metric    | Negative | Positive
----------|----------|---------
Precision | 0.84     | 0.82
Recall    | 0.82     | 0.84
F1-score  | 0.83     | 0.83
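
Each F1 value in the table is the harmonic mean of that class's precision and recall; for the negative class, 2 · 0.84 · 0.82 / (0.84 + 0.82) ≈ 0.83. The project computes these metrics with scikit-learn's `classification_report`; the stdlib sketch below re-derives them for illustration, using hypothetical toy predictions.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def per_class_report(y_true, y_pred, classes=(0, 1)):
    """Per-class precision, recall and F1, similar in spirit to classification_report."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1(precision, recall)}
    return report

# The table's negative-class row
print(round(f1(0.84, 0.82), 2))  # 0.83

# Hypothetical toy predictions (0 = negative, 1 = positive)
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]
print(per_class_report(y_true, y_pred))
```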

Training

  • Training epochs : 50 planned; early stopping (patience = 10) ended training after 22 epochs
  • Training environment : Kaggle GPU
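
With a patience of 10, training halts once 10 consecutive epochs fail to improve the monitored metric, so under that rule (with Keras defaults such as `min_delta=0`) stopping at epoch 22 suggests the best epoch was epoch 12. In Keras this is typically set up with `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)`; the monitored metric and whether `restore_best_weights` was used are assumptions, since the README does not state them. The stdlib sketch below replays the rule on a hypothetical loss curve.

```python
def early_stopping(val_losses, patience=10, min_delta=0.0):
    """Replay patience-based early stopping; returns (epochs_run, best_epoch), 1-indexed."""
    best = float("inf")
    best_epoch = 0
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical curve: improves for 12 epochs, then plateaus (50 epochs planned)
losses = [1.0 - 0.02 * i for i in range(12)] + [0.9] * 38
print(early_stopping(losses, patience=10))  # (22, 12)
```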

Architecture

[Image: model architecture]

Inferences (with the TensorFlow Serving REST API)

[Image: inference example]
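
TensorFlow Serving exposes a REST predict endpoint at `POST http://<host>:8501/v1/models/<name>[/versions/<n>]:predict` (8501 is its default REST port), taking a JSON body such as `{"instances": [...]}` and answering with `{"predictions": [...]}`. The sketch below builds such a request and parses the reply with the standard library. The model name `sentiment` and the input format (already tokenized, padded integer sequences) are hypothetical; the real inputs depend on the exported model's serving signature.

```python
import json
import urllib.request

def build_predict_request(host, model_name, instances, port=8501, version=None):
    """Build a TensorFlow Serving REST :predict request using the "instances" row format."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"
    url = f"http://{host}:{port}{path}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def parse_predictions(raw_body):
    """Extract the "predictions" list from a TensorFlow Serving JSON response."""
    return json.loads(raw_body)["predictions"]

# Hypothetical model name and padded token-ID input
req = build_predict_request("localhost", "sentiment", [[12, 45, 7, 0, 0]])
print(req.full_url)  # http://localhost:8501/v1/models/sentiment:predict

# Sending it requires a running server:
# with urllib.request.urlopen(req) as resp:
#     scores = parse_predictions(resp.read())
```

Each returned score can then be mapped to "Positive" or "Negative" with the 0.625 threshold.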

Some results using Power BI + Python

Positive tweets

[Image: positive tweets]

Negative tweets

[Image: negative tweets]

Data by country (when available)

[Image: tweets by country]
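
Country-level figures only cover tweets that carry a location. A minimal Python sketch of that pre-aggregation step, computing the share of positive tweets per country and skipping tweets without one, might look like this; the `country` and `label` field names are hypothetical, and the actual charts were built with Power BI.

```python
from collections import defaultdict

def positive_share_by_country(tweets):
    """Return {country: share of positive tweets}, skipping tweets without a country."""
    counts = defaultdict(lambda: [0, 0])  # country -> [positives, total]
    for tweet in tweets:
        country = tweet.get("country")
        if not country:  # location is only available for some tweets
            continue
        counts[country][1] += 1
        if tweet["label"] == "Positive":
            counts[country][0] += 1
    return {c: pos / total for c, (pos, total) in counts.items()}

# Hypothetical labeled tweets
tweets = [
    {"country": "AR", "label": "Positive"},
    {"country": "AR", "label": "Negative"},
    {"country": None, "label": "Positive"},
    {"label": "Negative"},
    {"country": "FR", "label": "Positive"},
]
print(positive_share_by_country(tweets))  # {'AR': 0.5, 'FR': 1.0}
```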

Useful scripts and notebooks

Notebooks

Training notebook

How inferences were made on our dataset

Data cleaning notebook

Data exploration notebook

Scripts

Link to the TensorFlow Serving script

There's also a useful command-line script that converts .h5 models to the TF SavedModel format: here

[Image: script arguments]
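
A minimal sketch of such a converter, assuming TensorFlow 2.x with Keras 2 (as in TF 2.6): it loads the .h5 file with `tf.keras.models.load_model` and writes a SavedModel with `tf.saved_model.save`, into the `<base>/<name>/<version>/` layout TensorFlow Serving expects. The argument names and defaults are hypothetical and may differ from the repository's script. With Keras 3 (TF 2.16+), `model.export(export_dir)` may be needed instead.

```python
import argparse
import os

def export_dir_for(base_dir: str, model_name: str, version: int) -> str:
    """TF Serving loads models from <base_dir>/<model_name>/<integer version>/."""
    return os.path.join(base_dir, model_name, str(int(version)))

def convert(h5_path: str, export_dir: str) -> None:
    """Load a Keras .h5 model and save it in the TF SavedModel format."""
    import tensorflow as tf  # lazy import: argument parsing works without TensorFlow
    model = tf.keras.models.load_model(h5_path)
    tf.saved_model.save(model, export_dir)

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Convert a .h5 model to a TF SavedModel.")
    parser.add_argument("h5_path", help="path to the .h5 model file")
    parser.add_argument("--base-dir", default="models", help="root folder served by TF Serving")
    parser.add_argument("--name", default="sentiment", help="model name used in the REST URL")
    parser.add_argument("--version", type=int, default=1, help="numeric model version")
    return parser.parse_args(argv)

def main(argv=None) -> None:
    args = parse_args(argv)
    convert(args.h5_path, export_dir_for(args.base_dir, args.name, args.version))
```

To use it as a command-line runner, add an entry point such as `if __name__ == "__main__": main()` and call it like `python convert.py model.h5 --version 4` (the file name `convert.py` is hypothetical).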

Data collection (tweets about Messi and Ronaldo)

NOTE: Running these scripts requires a Twitter developer account and a bearer_token, either stored in a text file whose path is set manually in the code or exported as an environment variable.
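
The token lookup described in the note can be sketched as "environment variable first, text file second". The request below targets the Twitter API v2 recent-search endpoint with an `Authorization: Bearer <token>` header and is built but not sent. The variable name `BEARER_TOKEN` and the query are hypothetical, and API access rules have changed over time, so treat the endpoint as an assumption.

```python
import os
import urllib.parse
import urllib.request

def load_bearer_token(env_var="BEARER_TOKEN", token_path=None):
    """Return the bearer token from an environment variable, else from a text file."""
    token = os.environ.get(env_var)
    if token:
        return token.strip()
    if token_path is None:
        raise RuntimeError(f"Set {env_var} or pass token_path")
    with open(token_path, encoding="utf-8") as f:
        return f.read().strip()

def build_search_request(token, query, max_results=10):
    """Build (but do not send) a Twitter API v2 recent-search request."""
    params = urllib.parse.urlencode({"query": query, "max_results": max_results})
    url = f"https://api.twitter.com/2/tweets/search/recent?{params}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```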

Libraries

  • Deep Learning Framework : TensorFlow 2.6 or higher
  • Data visualization : Pandas, Seaborn, Matplotlib
  • Regular expressions : re
  • NLP library : NLTK
  • Train/test splitting, classification_report : Scikit-learn
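
The `re` module is listed for text cleaning. A minimal tweet-cleaning sketch follows; the exact steps in the data cleaning notebook may differ, so the choices here (lowercasing; dropping URLs, @mentions, '#' signs and non-letters) are assumptions. NLTK steps such as stopword removal are not shown.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")  # keep the word, drop the '#'
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # note: also drops accented letters and digits
SPACES_RE = re.compile(r"\s+")

def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, '#' signs and non-letters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return SPACES_RE.sub(" ", text).strip()

print(clean_tweet("@fan Messi is the GOAT!! #Messi https://t.co/xyz"))  # messi is the goat messi
```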