Movie Review Sentiment Prediction

This project is a sentiment analysis task on movie reviews. It was implemented as part of a term-break assignment and aims to classify the sentiment of movie reviews using machine learning models.

Dataset

The dataset comprises:

train.csv: Training data with labeled sentiments.
test.csv: Test data with unlabeled reviews.
movies.csv: Metadata about the movies, which may be used to enrich the review data.
sample.csv: A sample submission format for Kaggle.
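The following is a minimal sketch of loading one of these files and running the inspection steps listed under Workflow: null counts via .isnull() and summary statistics via .describe(). The column names movieid, reviewText, and sentiment are hypothetical placeholders, because the real schema is not documented here. An in-memory CSV stands in for train.csv so the snippet runs on its own.

```python
import io

import pandas as pd


def inspect_frame(df: pd.DataFrame) -> dict:
    """Return the shape, per-column null counts, and summary statistics of a DataFrame."""
    return {
        "shape": df.shape,
        "nulls": df.isnull().sum(),             # number of missing values per column
        "summary": df.describe(include="all"),  # statistics for numeric and text columns
    }


# Stand-in for pd.read_csv("train.csv"); the column names are hypothetical.
csv_text = """movieid,reviewText,sentiment
m1,Great film,POSITIVE
m2,,NEGATIVE
m3,Dull and slow,NEGATIVE
"""
train = pd.read_csv(io.StringIO(csv_text))  # the empty review field is read as NaN
report = inspect_frame(train)
print(report["shape"])
print(report["nulls"])
print(report["summary"])
```

With the real data, pass pd.read_csv("train.csv") instead of the in-memory buffer.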

Objective

Predict the sentiment of each movie review, framed as a classification problem. The task involves:

Data preprocessing and feature engineering
Exploratory Data Analysis (EDA)
Model training and evaluation
Generating predictions for test data
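Framing sentiment prediction as classification means mapping each text label to an integer class. Here is a minimal sketch with scikit-learn's LabelEncoder, together with the normalized class frequencies that the EDA step visualizes. The labels POSITIVE and NEGATIVE are hypothetical; the actual label values in train.csv may differ.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sentiment labels standing in for the target column of train.csv.
sentiment = pd.Series(["POSITIVE", "NEGATIVE", "POSITIVE", "POSITIVE"])

# Class balance: the share of each label, worth checking before choosing metrics.
frequencies = sentiment.value_counts(normalize=True)

# LabelEncoder sorts the classes and maps them to 0..n_classes-1.
encoder = LabelEncoder()
y = encoder.fit_transform(sentiment)

print(frequencies)
print(list(encoder.classes_), y.tolist())

# Predicted integer classes can be mapped back to the original labels for a submission.
print(encoder.inverse_transform([1, 0]).tolist())
```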

Technologies Used

Python
Pandas, NumPy
Matplotlib, Seaborn
Scikit-learn

Workflow

Import Libraries
- All necessary data science libraries are imported: numpy, pandas, sklearn, matplotlib, seaborn.
Load and Inspect Data
- Data is read from CSV files using Pandas.
- Checks for null values with .isnull() and computes summary statistics with .describe().
Exploratory Data Analysis (EDA)
- Visualization of sentiment frequencies.
- Possibly investigates the distribution of sentiments and reviews.
Preprocessing
- Categorical encoding using LabelEncoder, OneHotEncoder, and OrdinalEncoder.
- Scaling with StandardScaler and MinMaxScaler.
Model Training
- Models used: SGDClassifier, RidgeClassifier, LogisticRegression.
- Uses RandomizedSearchCV for hyperparameter tuning and cross_val_predict for cross-validated evaluation.
- Evaluated with precision, recall, a confusion matrix, and a classification report.
Prediction
- Predictions are made on the test dataset using the trained model.
- Output prepared in submission format.
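The preprocessing, tuning, evaluation, and prediction steps above can be tied together in a single scikit-learn Pipeline. This is a minimal sketch under stated assumptions, not the notebook's actual code. The feature columns genre and runtime, the synthetic data, the search range for C, and the submission columns id and sentiment are all hypothetical. LogisticRegression stands in for any of the three listed classifiers; SGDClassifier or RidgeClassifier would drop into the same slot with a matching search space.

```python
import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import RandomizedSearchCV, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 200

# Hypothetical training features: one categorical and one numeric column.
X = pd.DataFrame({
    "genre": rng.choice(["drama", "comedy", "horror"], size=n),
    "runtime": rng.normal(110, 20, size=n),
})
# Synthetic binary target tied to genre, with about 10% of labels flipped as noise.
y = np.where(X["genre"] == "comedy", 1, 0)
flip = rng.random(n) < 0.1
y = np.where(flip, 1 - y, y)

# One-hot encode the categorical column and standardize the numeric one.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["genre"]),
    ("num", StandardScaler(), ["runtime"]),
])
pipeline = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Randomized search over the inverse regularization strength C, scored by F1.
search = RandomizedSearchCV(
    pipeline,
    param_distributions={"clf__C": loguniform(1e-3, 1e2)},
    n_iter=10,
    cv=5,
    scoring="f1",
    random_state=0,
)
search.fit(X, y)

# Out-of-fold predictions from the tuned pipeline feed the confusion matrix and report.
# Tuning used the same data, so this estimate is somewhat optimistic.
y_pred = cross_val_predict(search.best_estimator_, X, y, cv=5)
print(search.best_params_)
print(confusion_matrix(y, y_pred))
print(classification_report(y, y_pred))

# Predict on new rows and build a submission; copy the real column names from sample.csv.
X_test = pd.DataFrame({"genre": ["comedy", "drama"], "runtime": [95.0, 130.0]})
submission = pd.DataFrame({"id": range(len(X_test)), "sentiment": search.predict(X_test)})
submission_csv = submission.to_csv(index=False)  # returns a string when no path is given
print(submission_csv)
```

Keeping the encoders and scaler inside the pipeline means they are refit within each cross-validation fold, so statistics from held-out folds do not leak into training.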

Results and Metrics

The notebook reports precision, recall, and a confusion matrix to evaluate model performance.
ConfusionMatrixDisplay and precision-recall curves (computed with precision_recall_curve) are used for visual performance analysis.
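precision_recall_curve returns the precision and recall at every decision threshold, which is what a precision-recall plot draws. The following minimal sketch uses those arrays to pick a threshold for a target precision. The labels and scores are hand-made assumptions in place of real model output; with the classifiers listed above, scores would typically come from cross_val_predict(..., method="decision_function"). To draw the plots, scikit-learn's PrecisionRecallDisplay.from_predictions and ConfusionMatrixDisplay.from_predictions accept the same labels together with scores or predictions.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical true labels and decision scores from a binary classifier.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([-2.0, -1.0, -0.5, 0.2, 0.4, 1.0, 1.5, 2.5])

# precisions and recalls have one more entry than thresholds (the final point has no threshold).
precisions, recalls, thresholds = precision_recall_curve(y_true, scores)

# Lowest threshold at which precision reaches the target.
target = 0.75
meets_target = precisions[:-1] >= target
if not meets_target.any():
    raise ValueError("no threshold reaches the target precision")
idx = np.argmax(meets_target)  # first True, i.e. the lowest qualifying threshold
threshold = thresholds[idx]

# Apply the chosen threshold to turn scores into class predictions.
y_pred = (scores >= threshold).astype(int)
print(threshold, precisions[idx], recalls[idx])
print(y_pred)
```

Raising the threshold generally trades recall for precision, so the target should reflect which error type matters more for the task.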

How to Run

Clone the repository or download the notebook.
Ensure you have the required datasets (train.csv, test.csv, etc.) in the correct folder structure.
Install dependencies:

pip install numpy pandas scikit-learn matplotlib seaborn

Run the notebook using Jupyter or any IDE that supports .ipynb.
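The layout under Folder Structure keeps the CSV files next to the notebook. Assuming the notebook expects them in its working directory, a quick pre-flight check can confirm they are present before running it. Here is a minimal standard-library sketch; the helper name missing_datasets is hypothetical, and the file list is taken from the Dataset section.

```python
from pathlib import Path

# Files named in the Dataset section of this README.
REQUIRED_FILES = ("train.csv", "test.csv", "movies.csv", "sample.csv")


def missing_datasets(folder: str = ".") -> list[str]:
    """Return the required CSV files that are not present in `folder` (hypothetical helper)."""
    base = Path(folder)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All dataset files found.")
```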

Folder Structure

.
├── train.csv
├── test.csv
├── movies.csv
├── sample.csv
├── 21f3000953-notebook-t22023.ipynb
└── README.md

Author

Name: Shreya Garg
Assignment: Term Break 1 — Sentiment Prediction on Movie Reviews
Platform: Kaggle

Notes

This notebook uses traditional ML models rather than deep learning architectures such as LSTMs or Transformers.
Preprocessing relies on label encoding and standard scikit-learn encoders and scalers.
The approach could be improved by adding text features such as TF-IDF vectors or word embeddings.
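The TF-IDF improvement suggested above could look like the following minimal sketch, which is not part of the notebook. TfidfVectorizer turns raw review text into sparse unigram and bigram weights, and a linear classifier is trained on them. The toy reviews and labels are hypothetical; in practice the input would be the review text column of train.csv.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews with mirrored positive and negative wording.
reviews = [
    "wonderful film with great acting",
    "great story and wonderful cast",
    "terrible film with awful acting",
    "awful story and terrible cast",
]
labels = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigram and bigram TF-IDF features
    LogisticRegression(max_iter=1000),
)
model.fit(reviews, labels)

predictions = model.predict(["wonderful great film", "awful terrible story"])
print(predictions)
```

Because the vectorizer sits inside the pipeline, the same fitted vocabulary and IDF weights are applied automatically to test.csv reviews at prediction time.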
