A supervised machine learning approach to recognizing the emotion (sentiment) expressed in a tweet. The 'Untitled.ipynb' file is a Jupyter notebook containing test code. The remaining .py files contain the code that their names describe.
A complete explanation of the approach is available here: https://medium.com/@ankushraut/artificial-neural-network-for-text-classification-b7aa5994d985
Problem Statement
-> Given a dataset mapping tweets to the associated emotions, an emotion recognizer algorithm needs to be created.
-> Libraries - Natural Language Toolkit (NLTK) and scikit-learn
Pre-processing
-> Removal of symbols and other unwanted patterns with regular expressions, using the 're' library
-> Lemmatization (lexicon normalization): each word is reduced to its lemma using WordNetLemmatizer from NLTK
-> Removal of elongated letter runs, e.g. 'noooo' gets converted to 'no'
-> (Optional) Removal of stop-words. This decreased the F1-score as well as the overall accuracy
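The pre-processing steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `clean_tweet`, the exact regular expressions, and the rule "collapse any run of three or more identical letters to one" are assumptions. The `lemmatize` argument is a hypothetical hook; `WordNetLemmatizer().lemmatize` from NLTK could be passed in once the WordNet data has been downloaded.

```python
import re

def clean_tweet(text, lemmatize=None):
    """Normalize one tweet (illustrative sketch, rules are assumptions)."""
    text = text.lower()
    # Replace everything that is not a letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of 3+ identical letters to one letter: 'noooo' -> 'no'.
    text = re.sub(r"([a-z])\1{2,}", r"\1", text)
    tokens = text.split()
    # Optional lemmatization, e.g. lemmatize=WordNetLemmatizer().lemmatize
    if lemmatize is not None:
        tokens = [lemmatize(tok) for tok in tokens]
    return " ".join(tokens)

print(clean_tweet("Noooo!!! #sad"))
```

Double letters such as the 'oo' in 'good' are left alone by this rule, but a word like 'coool' would become 'col', so a real pipeline may need a gentler rule.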
A look at the data before and after pre-processing
before
after
Vectorization
-> Term frequency - inverse document frequency (TfidfVectorizer) used to convert the tweets to vectors (for SVM and Naive Bayes)
-> Bag-of-words representation used as the input for the neural network with sigmoid layers
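A small sketch of the two representations, assuming scikit-learn's `TfidfVectorizer` and `CountVectorizer` with default settings. The use of `CountVectorizer` for the bag of words is an assumption (the repository may build it differently), and the three tweets are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy tweets, invented for illustration only.
tweets = ["so happy today", "so sad today", "happy happy day"]

# TF-IDF: sparse matrix, one row per tweet, one column per vocabulary word.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(tweets)

# Bag of words: raw word counts, converted to a dense array here.
bow = CountVectorizer()
X_bow = bow.fit_transform(tweets).toarray()

print(sorted(bow.vocabulary_))  # vocabulary in column order
print(X_bow)
```

With default settings the TF-IDF rows are L2-normalized, while the bag-of-words rows keep the raw counts.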
Model - 1
-> Support Vector Machine with a linear kernel - finds hyperplanes that separate the classes.
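As a rough illustration, a linear-kernel SVM on TF-IDF features could be wired up with scikit-learn as below. The pipeline layout, the default hyperparameters, the name `svm_model`, and the toy tweets and labels are all assumptions made for this sketch.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy data, invented for illustration only.
tweets = ["happy great day", "happy lovely surprise",
          "sad terrible news", "sad awful loss"]
emotions = ["joy", "joy", "sadness", "sadness"]

# TF-IDF features followed by an SVM with a linear kernel.
svm_model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
svm_model.fit(tweets, emotions)

# Words not seen during training ('feeling', 'such', 'times') are ignored.
svm_pred = svm_model.predict(["feeling happy", "such sad times"])
print(svm_pred)
```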
Model - 2
-> Naive Bayes classifier - naively assumes that the words of a tweet are independent of one another given the emotion class.
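A comparable sketch for Naive Bayes follows. The README does not say which variant is used, so `MultinomialNB` is an assumption here, as are the name `nb_model` and the toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data, invented for illustration only.
tweets = ["happy great day", "happy lovely surprise",
          "sad terrible news", "sad awful loss"]
emotions = ["joy", "joy", "sadness", "sadness"]

# Each word contributes to the class score independently of the others.
nb_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
nb_model.fit(tweets, emotions)

nb_pred = nb_model.predict(["feeling happy"])
nb_proba = nb_model.predict_proba(["feeling happy"])  # one column per class
print(nb_pred, nb_proba)
```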
Model - 3
-> Artificial Neural Network - 3-layer neural network with sigmoid activation and gradient descent optimization
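The idea can be sketched in NumPy as below. "3 layer" is read here as input, one hidden layer, and output, which is an assumption, as are the hidden size, learning rate, squared-error loss, missing bias terms, and the toy bag-of-words matrix. The function names `train` and `predict` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """Full-batch gradient descent on a squared-error loss (no bias terms)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], hidden))
    W2 = rng.normal(0.0, 0.5, size=(hidden, Y.shape[1]))
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1)            # hidden layer activations
        out = sigmoid(H @ W2)          # output layer activations
        err = out - Y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagation; constant factors are absorbed into lr.
        d_out = err * out * (1.0 - out)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out)
        W1 -= lr * (X.T @ d_hid)
    return W1, W2, losses

def predict(X, W1, W2):
    return sigmoid(sigmoid(X @ W1) @ W2)

# Toy bag of words over the columns [happy, love, sad, cry].
X = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]], dtype=float)
# One-hot labels over [joy, sadness].
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

W1, W2, losses = train(X, Y)
print(losses[0], losses[-1])
print(predict(X, W1, W2).argmax(axis=1))
```

The predicted emotion for a tweet is the output unit with the largest activation.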