Sentiment Analysis Project

This project focuses on sentiment analysis of movie reviews using various machine learning models. It includes data preprocessing, model training, hyperparameter tuning, and a web interface for users to input their reviews and receive sentiment predictions.

Overview

The goal of this project is to classify movie reviews as positive or negative. Five models are trained and evaluated: Naive Bayes, Support Vector Machine (SVM), Random Forest, Logistic Regression, and Gradient Boosting. The best-performing models are saved, and a Voting Classifier ensemble is built from them to improve performance.

Dataset

The dataset used in this project is a subset of the IMDB movie reviews dataset, which includes both positive and negative reviews. The data is cleaned, tokenized, lemmatized, and vectorized using methods like Bag of Words, TF-IDF, and Word2Vec.

Dataset: IMDB Dataset of Movie Reviews

Installation

To run this project locally, follow these steps:

Clone the repository:

```
git clone https://github.com/your-username/sentiment-analysis.git
cd sentiment-analysis
```

Create and activate a virtual environment (optional but recommended):

```
python3 -m venv env
source env/bin/activate  # On Windows, use `env\Scripts\activate`
```

Install the required packages:
```
pip install -r requirements.txt
```

Download NLTK data:

```
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
```

Project Structure

The project is organized as follows:

```
sentiment-analysis/
│
├── saved_models/            # Directory containing the trained machine learning models
├── IMDB.csv                 # Dataset file with movie reviews for sentiment analysis
├── sentimentAnalysis.ipynb  # Jupyter Notebook with the complete sentiment analysis code
├── README.md                # Project overview and instructions
├── app.py                   # Flask application script for the web interface
├── templates/               # Directory containing HTML templates for the web app
└── static/                  # Directory containing static files like CSS for the web app
```

Models Used

The following models were trained and evaluated:

Naive Bayes
Support Vector Machine (SVM)
Logistic Regression
Random Forest
Gradient Boosting

Additionally, a Voting Classifier (Ensemble Model) was created by combining the predictions of the above models.
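
The ensemble idea can be sketched with scikit-learn's `VotingClassifier`. This is a minimal illustration, not the notebook's actual code: the toy reviews, the TF-IDF features, the choice of hard voting, and all hyperparameters are assumptions, and only three of the five models are included to keep it short.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for the IMDB reviews (1 = positive, 0 = negative).
train_texts = [
    "a wonderful, moving film with great acting",
    "brilliant story and a great cast",
    "I loved every minute of it",
    "terrible plot and awful acting",
    "boring, predictable and far too long",
    "I hated this movie",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Hard voting: each base model casts one vote and the majority label wins.
ensemble = VotingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="hard",
)

# Vectorizer and ensemble share one pipeline so raw text can be passed in.
model = make_pipeline(TfidfVectorizer(), ensemble)
model.fit(train_texts, train_labels)

predictions = model.predict(["great acting and a wonderful story", "awful and boring"])
print(predictions)
```

Hard voting is used here because it only needs class labels from each model. Soft voting would average predicted probabilities instead, which requires every base model to expose `predict_proba`.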

Usage

Data Preprocessing

The dataset is preprocessed using techniques like:

Text cleaning: Removing HTML tags, punctuation, and stopwords.
Tokenization: Splitting text into individual words.
Lemmatization: Reducing words to their base forms.
Vectorization: Converting text into numerical features using Bag of Words, TF-IDF, and Word2Vec.
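
The steps above can be sketched with the Python standard library alone. This is a simplified illustration under stated assumptions, not the notebook's code: the stopword set is a tiny hypothetical sample (NLTK's English list is much longer), the function names are invented for this example, and lemmatization is left out because it depends on the WordNet data downloaded during installation (with NLTK it would typically be done with `WordNetLemmatizer` between tokenization and vectorization).

```python
import re
from collections import Counter
from html import unescape

# Tiny illustrative stopword list; NLTK's English list is much longer.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "of", "to"}


def clean_text(text: str) -> str:
    """Strip HTML tags, lowercase, and replace non-letters with spaces."""
    text = unescape(text)                          # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)           # remove tags such as <br />
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # remove punctuation and digits
    return re.sub(r"\s+", " ", text).strip()       # collapse repeated whitespace


def tokenize(text: str) -> list[str]:
    """Split cleaned text on whitespace and drop stopwords."""
    return [tok for tok in text.split() if tok not in STOPWORDS]


def bag_of_words(tokens: list[str]) -> Counter:
    """Count how often each token occurs (a Bag of Words for one review)."""
    return Counter(tokens)


review = "This movie was GREAT!<br />The acting was great, the plot was thin."
tokens = tokenize(clean_text(review))
features = bag_of_words(tokens)
print(tokens)             # ['movie', 'great', 'acting', 'great', 'plot', 'thin']
print(features["great"])  # 2
```

TF-IDF and Word2Vec build on the same token lists: TF-IDF reweights these counts by how rare a word is across reviews, while Word2Vec learns dense vectors from word contexts.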

Model Evaluation

The performance of each model is assessed on the test set using metrics such as accuracy, precision, recall, and F1-score.
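
To make the four metrics concrete, the sketch below computes them by hand from true and predicted labels. The labels are hypothetical and the helper name `binary_metrics` is invented for this example. In practice `sklearn.metrics` offers `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` for the same purpose.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("at least one label pair is required")

    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these labels there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics come out to 0.75.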

Web Application

A Flask web app is available for users to input their reviews and view the sentiment predictions from each model.

Running the Web App

To run the web application:

Navigate to the project directory.
Start the Flask app by running the following command:
```
python app.py
```
Open a browser and go to http://127.0.0.1:5000/.
Enter a review in the text box and submit it to see the sentiment predictions.
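
For orientation, a Flask app of this kind might look roughly like the sketch below. It is not the contents of `app.py`: the `/predict` route, the inline template, and the keyword-based `predict_sentiment` placeholder are all assumptions made for illustration. The real app would presumably load models from `saved_models/` and render pages from `templates/`.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline template to keep the sketch in one file; the project keeps its
# HTML under templates/ instead.
PAGE = """
<form method="post" action="/predict">
  <textarea name="review"></textarea>
  <button type="submit">Predict</button>
</form>
{% if sentiment %}<p>Sentiment: {{ sentiment }}</p>{% endif %}
"""


def predict_sentiment(review: str) -> str:
    """Hypothetical placeholder standing in for a trained model."""
    return "positive" if "good" in review.lower() else "negative"


@app.route("/")
def index():
    # Show the empty form.
    return render_template_string(PAGE, sentiment=None)


@app.route("/predict", methods=["POST"])
def predict():
    # Read the submitted review and render the page with its prediction.
    review = request.form.get("review", "")
    return render_template_string(PAGE, sentiment=predict_sentiment(review))
```

Adding `app.run()` under an `if __name__ == "__main__":` guard would let this sketch be started with `python`, in the same way as the project's `app.py`.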

Results

The project includes the following visualizations:

Word Cloud: Displays common words in the dataset.
Bar Plot: Shows the most frequent words.
Confusion Matrices: Visualizes model performance.
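
The word-frequency data behind a bar plot like this can be computed with `collections.Counter`. The tokenized reviews below are hypothetical, and the plotting step itself is omitted. The resulting words and counts could be passed to a plotting library such as matplotlib.

```python
from collections import Counter

# Hypothetical, already-tokenized reviews.
tokenized_reviews = [
    ["great", "movie", "great", "cast"],
    ["boring", "movie"],
    ["great", "plot"],
]

# Flatten all reviews and count every token.
word_counts = Counter(tok for review in tokenized_reviews for tok in review)

# Keep the two most frequent words as (word, count) pairs.
top_words = word_counts.most_common(2)
print(top_words)  # [('great', 3), ('movie', 2)]
```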

The ensemble model achieved the highest accuracy and is used as the default model in the web application.

Chirag6525/sentiment-analysis
