This is a simple SMS spam classifier that predicts whether a given text message is spam or ham (not spam). It demonstrates a full machine learning pipeline, from preprocessing to deployment.
- Source: UCI SMS Spam Collection Dataset (via Kaggle)
- The dataset is imbalanced, with a higher number of ham messages compared to spam.
- Because of this imbalance, prediction performance is reasonable but not highly optimized.
- pandas, numpy – data manipulation and analysis
- nltk – text preprocessing (tokenization, stopword removal, stemming)
- sklearn – model training, evaluation, and vectorization (TF-IDF)
- matplotlib, seaborn – data visualization
- flask – a simple web interface for user interaction
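
The preprocessing steps named above (tokenization, stopword removal, stemming) might look roughly like the sketch below. It is not the project's actual code: the `preprocess` function name, the regex tokenizer, and the small hard-coded stopword set are assumptions made so the example runs without downloading NLTK corpora.

```python
import re

from nltk.stem import PorterStemmer

# Small illustrative stopword set (an assumption for this sketch); the project
# may instead use nltk.corpus.stopwords, which needs a one-time download.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "for", "of", "and", "in", "you", "your"}

stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, and stem a single SMS message."""
    # Tokenize with a simple regex (stand-in for an NLTK tokenizer).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Remove stopwords, then reduce each remaining token to its stem.
    kept = [t for t in tokens if t not in STOPWORDS]
    return " ".join(stemmer.stem(t) for t in kept)


print(preprocess("The prize is waiting for YOU"))
```

The cleaned string can then be passed to the TF-IDF vectorizer in place of the raw message.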
- After testing various models, Multinomial Naive Bayes was selected for its simplicity and effectiveness in text classification tasks.
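
As a rough illustration of the chosen model, the sketch below chains TF-IDF vectorization and Multinomial Naive Bayes in a scikit-learn pipeline. The four training messages are made up for the example and are not taken from the dataset; the variable names are likewise assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy messages invented for this sketch (not from the SMS Spam Collection).
texts = [
    "win a free prize now",
    "free cash claim now",
    "are we meeting for lunch",
    "call me when you get home",
]
labels = ["spam", "spam", "ham", "ham"]

# TF-IDF turns each message into a weighted term vector;
# MultinomialNB then models the term weights per class.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)

prediction = model.predict(["claim your free prize"])[0]
print(prediction)
```

Wrapping both steps in one pipeline means the same vectorizer vocabulary is reused at prediction time, which avoids a common mismatch between training and serving.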
- A basic Flask web app was created so users can enter an SMS message and receive a real-time prediction.
- This improves usability and demonstrates how a machine learning model can be integrated into a web interface.
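
A minimal sketch of such a Flask app is shown below. It is an assumption about how the pieces could fit together, not the repository's actual code: the route, the inline HTML template, and the tiny model trained on made-up messages are all placeholders for the real trained classifier and templates.

```python
from flask import Flask, render_template_string, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder model trained on invented messages; a real app would load
# the classifier trained on the full dataset instead.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(
    ["win a free prize now", "free cash claim now",
     "are we meeting for lunch", "call me when you get home"],
    ["spam", "spam", "ham", "ham"],
)

# Inline template (hypothetical) with a text box and the result, if any.
PAGE = """
<form method="post">
  <textarea name="message"></textarea>
  <button type="submit">Classify</button>
</form>
{% if prediction %}<p>Prediction: {{ prediction }}</p>{% endif %}
"""

app = Flask(__name__)


@app.route("/", methods=["GET", "POST"])
def index():
    prediction = None
    if request.method == "POST":
        # Read the submitted SMS text and classify it.
        message = request.form.get("message", "")
        prediction = str(model.predict([message])[0])
    return render_template_string(PAGE, prediction=prediction)
```

If this were saved as, say, `app.py`, it could be started locally with `flask --app app run`.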
- The classifier performs reasonably well but may be improved by addressing the data imbalance (e.g., using SMOTE or class weighting).
- Future improvements could include better evaluation metrics, model tuning, and UI enhancements.
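
One way the class-weighting idea could be tried is sketched below. scikit-learn's `MultinomialNB` has no `class_weight` parameter, so this example passes balanced per-sample weights to `fit` instead; that substitution, the toy messages, and the variable names are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.utils.class_weight import compute_sample_weight

# Invented, deliberately imbalanced toy data: six ham messages, two spam.
texts = [
    "are we meeting for lunch",
    "call me when you get home",
    "see you at the gym later",
    "can you pick up some milk",
    "running late, start without me",
    "happy birthday, have a great day",
    "win a free prize now",
    "free cash claim now",
]
labels = ["ham"] * 6 + ["spam"] * 2

# "balanced" weights each sample by n_samples / (n_classes * class_count),
# so the rarer spam messages count more during training.
weights = compute_sample_weight("balanced", labels)

weighted_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
# Route the weights to the classifier step of the pipeline.
weighted_model.fit(texts, labels, multinomialnb__sample_weight=weights)

print(dict(zip(labels, weights.round(2))))
```

Whether this actually helps on the real dataset would need to be checked with per-class metrics such as precision and recall on a held-out split.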
- Ali Shayan – GitHub Profile
Feel free to fork the repo, suggest improvements, or report issues!