This is a Natural Language Processing (NLP) project, one of the application areas of artificial intelligence. It analyzes the sentiment of Twitter comments and classifies text messages (SMS) received on a phone as unsolicited (spam) or not. The models were later integrated into a web application with a simple, easy-to-understand graphical interface.
Client
- HTML
- CSS
- JavaScript
Server
- Python - Flask
Database
- MySQL
Features
- Prediction of the sentiment of the given sentences
- Classification of SMS as spam or ham
- Creating a new dataset from user sentences
- Recording messages sent by users in the database
- Language switcher written in vanilla JavaScript
- Searching for a specific word in datasets
Two datasets were used in the project. The first, Sentiment140, is used for sentiment analysis. It consists of 1.6 million tweets labelled as "positive" or "negative". The second, the SMS Spam Collection Dataset, is used for SMS classification. It contains almost 5.6k English SMS messages and is likewise labelled with two classes (spam and ham). The ham class contains about 4.8k messages, and the remaining roughly 0.7k are spam.
This section covers preprocessing and model training. The Sentiment140 tweets were cleaned of elements such as "@" mentions, "http" links, and digits (0-9), and stop words were removed. A Word2vec model was then trained on the resulting tokens. The texts were padded to a maximum length of 300 with pad_sequences. After the embedding layer was created, a vanilla LSTM model was built. The final accuracy of the model is 79.10%. The model architecture can be seen in the figure below.
Model Architecture (Image by Author)
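The preprocessing steps described above can be sketched roughly as follows. This is a minimal, standard-library illustration and not the project's actual code: the stop-word list, the helper names (`clean_tweet`, `build_vocab`, `pad`), and the choice to left-pad with zeros and to strip every non-letter character are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list; the project's real list is not given here.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "at", "i", "this"}

MAX_LEN = 300  # maximum sequence length stated in the text


def clean_tweet(text):
    """Lowercase, strip @mentions, http links and digits, then drop stop words."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)      # @mentions
    text = re.sub(r"http\S+", " ", text)   # links starting with http
    text = re.sub(r"[^a-z\s]", " ", text)  # digits and any other non-letter (assumption)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_vocab(token_lists):
    """Map each token to a positive integer id; 0 is reserved for padding."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def pad(ids, max_len=MAX_LEN):
    """Left-pad with zeros, or keep only the last max_len ids, so all lengths match."""
    ids = ids[-max_len:]
    return [0] * (max_len - len(ids)) + ids


# Toy input, invented for illustration.
tweets = ["@bob I love this phone http://t.co/x 100%", "the battery is bad"]
tokens = [clean_tweet(t) for t in tweets]
vocab = build_vocab(tokens)
padded = [pad([vocab[t] for t in toks]) for toks in tokens]
```

Each row of `padded` has length 300 and could be fed to an embedding layer. In the project itself the token vectors come from Word2vec, which this sketch leaves out.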
The spam classifier was trained on the Spam dataset with the Multinomial Naive Bayes algorithm, a Bayesian learning approach popular in Natural Language Processing (NLP).
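To illustrate how Multinomial Naive Bayes scores a message, here is a small from-scratch sketch with add-one (Laplace) smoothing. It is for illustration only: nothing here says which implementation the project uses, and the class name `TinyMultinomialNB` and the four training messages are invented for this example.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = doc.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        words = [w for w in doc.lower().split() if w in self.vocab]
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior of the class
            for w in words:
                # Smoothed log likelihood of the word given the class.
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data, invented for illustration.
train = [
    ("win a free prize now", "spam"),
    ("free entry claim your prize", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("call me when you get home", "ham"),
]
model = TinyMultinomialNB().fit([t for t, _ in train], [l for _, l in train])
spam_guess = model.predict("claim your free prize")
ham_guess = model.predict("are you home for lunch")
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a word that is unseen in one class from driving that class's probability to zero.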
The web application consists of five pages, which can be seen in the GIF above: Home, Project, About, Contact, and Dataset.
Users can submit opinions, suggestions, or problems about the project by filling out the form on the Contact page. Some of the information in the form is recorded in the database.
SQL statement that creates the contact table in the MySQL database where this data is saved:
CREATE TABLE contact (
id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(30) NOT NULL,
email VARCHAR(30) NOT NULL,
company_name VARCHAR(50),
message VARCHAR(200),
reg_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
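A submission from the Contact form could be written to this table along the following lines. This is a hedged sketch, not the project's code: the function name `save_contact` and the `RecordingCursor` stand-in are hypothetical, and the `%s` placeholder style assumes a MySQL driver such as PyMySQL or mysql-connector-python.

```python
INSERT_CONTACT = (
    "INSERT INTO contact (name, email, company_name, message) "
    "VALUES (%s, %s, %s, %s)"
)


def save_contact(cursor, name, email, company_name=None, message=None):
    """Insert one Contact-form submission using a parameterized query."""
    # id and reg_date are filled in by MySQL (AUTO_INCREMENT / DEFAULT CURRENT_TIMESTAMP).
    cursor.execute(INSERT_CONTACT, (name, email, company_name, message))


class RecordingCursor:
    """Stand-in for a real DB-API cursor so the sketch runs without a database."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=()):
        self.calls.append((sql, params))


cursor = RecordingCursor()
save_contact(cursor, "Ada", "ada@example.com", message="Nice project!")
```

Passing the values as parameters instead of formatting them into the SQL string lets the driver handle escaping, which protects against SQL injection. With a real connection, the insert typically also needs a commit before it becomes visible.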
The web app supports two languages, English and Turkish. The language switcher is written in vanilla JavaScript and is open to further development.
1. Clone this repository.
git clone https://github.com/MelihGulum/Sentiment-Analysis-and-Spam-Classification.git
2. Install the project dependencies.
pip install -r requirements.txt
3. Run the project.
flask --app app.py --debug run