This application features a GUI for classifying user-entered text as spam or ham using a Naive Bayes machine learning algorithm. The model is trained on two Kaggle datasets with a combined total of 15560 records: https://www.kaggle.com/datasets/zubairmustafa/spam-and-ham-classification-balanced-dataset and https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset
- Model Training: Load dataset, preprocess text data, split dataset, train a Naive Bayes classifier, evaluate the model, and save the trained model and vectorizer.
- Model Loading: Load the pre-trained model and vectorizer from disk.
- Text Classification: Classify user input text as spam or ham using the trained model.
- Graphical User Interface (GUI): A Tkinter-based GUI for entering text and displaying classification results.
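To make the classification feature concrete, here is a minimal sketch of how TF-IDF features and a Naive Bayes model can label a message. The toy messages, the choice of `MultinomialNB`, and the `classify` helper are assumptions for illustration; the project itself trains on the Kaggle datasets and may use different settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy messages for illustration only; the real project trains on the Kaggle CSVs.
messages = [
    "Win a free prize now, click here",
    "Free cash offer, claim your prize today",
    "Are we still meeting for lunch tomorrow?",
    "Can you send me the notes from class?",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each message into a TF-IDF weighted bag-of-words vector.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
features = vectorizer.fit_transform(messages)

# Assumption: multinomial Naive Bayes, a common choice for word-frequency features.
model = MultinomialNB()
model.fit(features, labels)


def classify(text):
    """Return (predicted label, estimated spam probability) for one message."""
    vector = vectorizer.transform([text])
    label = str(model.predict(vector)[0])
    # predict_proba columns follow the order of model.classes_.
    spam_index = list(model.classes_).index("spam")
    spam_probability = float(model.predict_proba(vector)[0][spam_index])
    return label, spam_probability
```

Calling `classify("Claim your free prize now")` returns the label together with the model's estimated spam probability, which is a reminder that the output is a statistical guess rather than a certainty.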
- Python 3.x
- pandas
- scikit-learn
- joblib
- tkinter
Training the Model:
- Ensure your datasets (spam_and_ham_classification.csv and spam1.csv) are in the same directory as the script.
- Uncomment the train_model() call in the main execution block if the model is not already trained and saved.
- Run the script to train the model and save it to disk.
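The training steps above can be sketched roughly as follows. This is not the project's actual `train_model()`: the signature, the `text` and `label` column names, the 80/20 split, and the use of `MultinomialNB` are all assumptions made for illustration, so adjust them to match the real CSV files.

```python
import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB


def train_model(csv_paths, model_path="spam_classifier_model.pkl",
                vectorizer_path="tfidf_vectorizer.pkl"):
    """Train on the given CSV files, save the artifacts, return test accuracy."""
    # Assumption: every CSV has a "text" column and a "label" column.
    frames = [pd.read_csv(path) for path in csv_paths]
    data = pd.concat(frames, ignore_index=True).dropna(subset=["text", "label"])

    # Hold out 20% of the rows for evaluation, keeping the class balance.
    x_train, x_test, y_train, y_test = train_test_split(
        data["text"], data["label"], test_size=0.2,
        random_state=42, stratify=data["label"],
    )

    # Fit the vectorizer on the training text only to avoid leaking test data.
    vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
    model = MultinomialNB()
    model.fit(vectorizer.fit_transform(x_train), y_train)

    accuracy = accuracy_score(y_test, model.predict(vectorizer.transform(x_test)))

    # Persist both objects; the vectorizer is needed again at prediction time.
    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vectorizer_path)
    return accuracy
```

Under these assumptions, `train_model(["spam_and_ham_classification.csv", "spam1.csv"])` would write both .pkl files to the current directory and return the held-out accuracy.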
Running the Application:
- Ensure the model and vectorizer are saved in the current directory (spam_classifier_model.pkl and tfidf_vectorizer.pkl).
- Run the script to launch the GUI.
- Enter your message in the text box and click "Submit" to classify the message as spam or ham.
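As a rough illustration of what happens when the application runs, the sketch below loads the two saved files, classifies a message, and wires that up to a small Tkinter window. The function names `load_artifacts`, `classify_text`, and `launch_gui`, as well as the widget layout, are hypothetical and may differ from what spam_classifier.py actually does. It also assumes the model was trained on string labels; a model trained on 0/1 labels would display those instead.

```python
import joblib


def load_artifacts(model_path="spam_classifier_model.pkl",
                   vectorizer_path="tfidf_vectorizer.pkl"):
    """Load the saved classifier and TF-IDF vectorizer from disk."""
    return joblib.load(model_path), joblib.load(vectorizer_path)


def classify_text(text, model, vectorizer):
    """Return the predicted label for a single message as a string."""
    return str(model.predict(vectorizer.transform([text]))[0])


def launch_gui(model, vectorizer):
    """Open a small Tkinter window with a text box and a Submit button."""
    # Imported here so the helpers above also work on machines without a display.
    import tkinter as tk

    root = tk.Tk()
    root.title("Spam or Ham")
    text_box = tk.Text(root, height=6, width=50)
    text_box.pack(padx=10, pady=10)
    result = tk.Label(root, text="Enter a message and click Submit")
    result.pack(pady=5)

    def on_submit():
        # "1.0" is the first character; "end-1c" drops Tk's trailing newline.
        message = text_box.get("1.0", "end-1c").strip()
        if message:
            prediction = classify_text(message, model, vectorizer)
            result.config(text="Prediction: " + prediction)

    tk.Button(root, text="Submit", command=on_submit).pack(pady=10)
    root.mainloop()
```

With both .pkl files present, `launch_gui(*load_artifacts())` would open the window; `classify_text` can also be called on its own for quick checks without the GUI.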
- Output from a real spam message I received.
- Output from a real text message I received.
- spam_classifier.py: Main script containing functions for model training, loading, text classification, and GUI creation.
- spam_and_ham_classification.csv & spam1.csv: (Required for training) Datasets containing labeled messages for spam and ham classification.
- spam_classifier_model.ipynb: Jupyter notebook containing the same model training and testing, with a detailed explanation of each step.
Feel free to open issues or submit pull requests with improvements or bug fixes. Contributions are welcome!
This project is not 100% accurate!