Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral. Understanding people's emotions is essential for businesses, since customers can express their thoughts and feelings more openly than ever before. By automatically analysing customer feedback, from survey responses to social media conversations, brands can listen attentively to their customers and tailor products and services to their needs.
1. IDE - PyCharm
2. BERT - pre-trained model
3. GPU - P4000
4. Google Colab - text analysis
5. Flask - REST API
6. Postman - API testing
7. TensorFlow Hub - source of the pre-trained BERT model and its tokenizer
🔑 Prerequisites: All the dependencies and required libraries are listed in the file requirements.txt
Python 3.6
The IMDB dataset contains 50K movie reviews for natural language processing or text analytics. It is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets: 25,000 highly polar movie reviews for training and 25,000 for testing. The task is to predict whether each review is positive or negative using classification or deep-learning algorithms.
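The binary labelling described above can be sketched as follows. The review texts, the `LABEL_MAP` name, and the list-of-pairs layout are illustrative assumptions, not the actual CSV format of the IMDB dataset:

```python
# Hypothetical sketch: map the dataset's "positive"/"negative" sentiment
# strings to the 0/1 targets a binary classifier expects.
LABEL_MAP = {"positive": 1, "negative": 0}

def encode_labels(reviews):
    """Convert (text, sentiment) pairs into (text, 0/1) pairs."""
    return [(text, LABEL_MAP[sentiment]) for text, sentiment in reviews]

sample = [
    ("A brilliant, moving film.", "positive"),
    ("Two hours I will never get back.", "negative"),
]
print(encode_labels(sample))
```

With 25,000 reviews per split and a balanced positive/negative ratio, this binary target is what both the training and evaluation code consume.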
For the dataset, please click here
- Clone the repo
git clone https://github.com/KrishArul26/Sentiments-Classifier-using-BERT.git
- Change your directory to the cloned repo
cd Sentiments-Classifier-using-BERT
- Create a Python 3.6 virtual environment named 'bert' and activate it
pip install virtualenv
virtualenv bert
bert\Scripts\activate
- Now, run the following command in your Terminal/Command Prompt to install the required libraries
pip install -r requirements.txt
Type the following command:
python app.py
After that, you will see the running IP address; copy and paste it into your browser, enter or upload your text, then click the Predict button.
This section describes the project directory and explains each Python file in it.
The picture below illustrates the complete folder structure of this project. This folder keeps the model that has been trained on the dataset using the BERT architecture.
The following image illustrates the file train_model_bert.py. It performs the necessary text cleanup, such as removing punctuation and numbers, and creates tokenizers from the TensorFlow Hub BERT model. The tokenized sequences are padded to the specified length. Finally, the BERT model is trained on the training dataset.
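The cleanup and padding steps can be sketched with the standard library. The exact regexes, the maximum length, and the pad id of 0 are assumptions for illustration; in the repo, the real token ids come from the TensorFlow Hub BERT tokenizer, not from this sketch:

```python
import re

def clean_text(text):
    """Lowercase the text and strip punctuation and digits,
    mirroring the cleanup train_model_bert.py performs before tokenization
    (the regexes here are an assumption, not the repo's exact rules)."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)      # drop punctuation and numbers
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace

def pad_ids(token_ids, max_len, pad_id=0):
    """Truncate or right-pad a token-id sequence to a fixed length,
    so every example in a batch has the same shape."""
    return (token_ids + [pad_id] * max_len)[:max_len]

print(clean_text("Loved it!!! 10/10 would watch again."))
print(pad_ids([101, 2293, 2009, 102], max_len=8))
```

Fixed-length padded sequences are required because BERT processes batches of uniform shape; the pad positions are masked out during attention.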
The picture below illustrates the prediction file. After the BERT model has been trained, this file processes the test data in the same way as the training data and predicts labels for it.
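The final step of prediction, turning the model's raw output into a sentiment label, can be sketched like this. The single-logit output and the 0.5 threshold are common conventions assumed here, not details taken from the repo:

```python
import math

def sigmoid(x):
    """Squash a raw logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def to_sentiment(logit, threshold=0.5):
    """Map a single model logit to a sentiment label.
    The 0.5 threshold is an assumed default, not the repo's exact value."""
    return "positive" if sigmoid(logit) >= threshold else "negative"

print(to_sentiment(2.3))   # confident positive logit
print(to_sentiment(-1.7))  # confident negative logit
```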
The picture below illustrates the index.html file, which creates the web front end.
The following image illustrates main.py. After the BERT model is evaluated, this file creates the REST API. It uses the Flask framework to receive requests from the client, passes them to the prediction file, and delivers the response through the web browser.
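The request/response flow of main.py can be sketched framework-agnostically with the standard library. The JSON field names, the `handle_predict` helper, and the keyword-matching stub predictor below are all hypothetical; in the repo, a Flask route would receive the request and the trained BERT model would do the actual prediction:

```python
import json

def predict_stub(text):
    """Stand-in for the real BERT predictor (assumption: keyword match only)."""
    return "positive" if "good" in text.lower() else "negative"

def handle_predict(request_body):
    """Parse a JSON request like {"text": "..."}, run the predictor,
    and build the JSON response the API route would send back."""
    payload = json.loads(request_body)
    sentiment = predict_stub(payload["text"])
    return json.dumps({"text": payload["text"], "sentiment": sentiment})

print(handle_predict('{"text": "A really good movie"}'))
```

Separating the request handling from the predictor like this is also what makes the endpoint easy to exercise from Postman: the client only ever sees the JSON contract, never the model.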
The pre-trained BERT model is one of the best attention-based models for NLP tasks. BERT works well for sentiment analysis on text, but its model weights are somewhat larger than those of other NLP models.