This is a simple machine learning project that classifies movie reviews as positive or negative using natural language processing (NLP). It’s built for beginners to understand the complete ML workflow — from data loading and preprocessing to model training, evaluation, and prediction — all in one Python script.
- Uses the NLTK Movie Reviews dataset (fetched once with `nltk.download('movie_reviews')`; no manual download needed)
- Converts text to numerical features using TF-IDF vectorization
- Trains a Logistic Regression classifier to predict sentiment
- Displays accuracy and classification report
- Allows custom review predictions directly from the terminal
- Python 3
- NLTK for dataset & tokenization
- Scikit-learn for ML model & evaluation
- NumPy / Pandas / Matplotlib (optional, for future visualization)
- Load and shuffle labeled movie reviews from NLTK
- Convert raw text into TF-IDF vectors
- Train a Logistic Regression model on 80% of data
- Evaluate accuracy on the remaining 20%
- Predict sentiment for new review text inputs
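
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `main.py`: the function names (`load_reviews`, `train_and_evaluate`, `predict_sentiment`), the fixed seed, the English stop-word filter, and `max_iter=1000` are all assumptions made for the example. It also fits the TF-IDF vocabulary on the training split only, which may differ from the order the script uses.

```python
import random

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split


def load_reviews(seed=42):
    """Load and shuffle (text, label) pairs from the NLTK movie_reviews corpus."""
    # Imported here so the rest of the sketch works without NLTK installed.
    import nltk
    from nltk.corpus import movie_reviews

    nltk.download("movie_reviews", quiet=True)  # one-time fetch of the corpus
    docs = [
        (movie_reviews.raw(fileid), category)
        for category in movie_reviews.categories()  # 'neg' and 'pos'
        for fileid in movie_reviews.fileids(category)
    ]
    random.Random(seed).shuffle(docs)
    texts, labels = zip(*docs)
    return list(texts), list(labels)


def train_and_evaluate(texts, labels, seed=42):
    """Split 80/20, vectorize with TF-IDF, train, and score on the held-out 20%."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    # Assumption: vocabulary is learned from the training split only.
    vectorizer = TfidfVectorizer(stop_words="english")
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train_vec, y_train)

    predictions = model.predict(X_test_vec)
    accuracy = accuracy_score(y_test, predictions)
    report = classification_report(y_test, predictions)
    return vectorizer, model, accuracy, report


def predict_sentiment(review, vectorizer, model):
    """Return the predicted label for a single new review string."""
    return model.predict(vectorizer.transform([review]))[0]
```

With NLTK installed, calling `train_and_evaluate(*load_reviews())` would run the sketch on the full corpus, and `predict_sentiment("some review text", vectorizer, model)` would label a new review as `pos` or `neg`.
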
Run the script with `py main.py`.

Expected accuracy is roughly 80–85% on the NLTK Movie Reviews dataset, a solid baseline for a sentiment classifier.