Welcome to our collaborative Spam Detection System, written in Python! The system combines classification and regression algorithms to label messages as spam or ham. Our three-person development team worked with Visual Studio Code, Jupyter Notebook, and Kaggle. The dataset used to train and test the models comes from Kaggle and consists of labeled emails.
- emails.csv: This file contains the dataset used for training and testing the spam detection model. It includes a comprehensive collection of labeled emails, distinguishing between spam and ham messages.
- main.py: The main Python script responsible for implementing the spam detection system. This script encompasses data preprocessing, model training using both classification and regression algorithms, and the generation of predictions.
- naive_model.pkl: A serialized pre-trained model stored using the pickle library. This model, trained on the Kaggle dataset, enables users to make quick predictions without retraining.
- template.html: An HTML template file for the user interface of the spam detection project. This interface provides users with a straightforward platform to input messages and receive instant predictions regarding spam or ham classification.
- data_analysis.ipynb: A Jupyter Notebook used for exploratory data analysis (EDA). The notebook provides insights into the dataset, aiding the development team in understanding and preprocessing the data effectively.
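A typical first step in exploratory data analysis is checking how many messages carry each label. The sketch below shows one way to do that with the standard library only. It is not taken from data_analysis.ipynb: the column name `label` and the helper names `read_rows` and `label_distribution` are assumptions made for illustration, so adjust them to match the actual header of emails.csv.

```python
import csv
from collections import Counter


def read_rows(path, encoding="utf-8"):
    """Read a CSV file into a list of dicts, one per row, keyed by column name."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))


def label_distribution(rows, label_column="label"):
    """Count how many rows carry each label.

    `label_column` is a hypothetical column name; check the real CSV header.
    """
    return Counter(row[label_column] for row in rows)


# Example usage (assumes emails.csv is in the working directory):
# rows = read_rows("emails.csv")
# print(label_distribution(rows))
```

A strongly skewed label count is worth knowing about early, because it affects how accuracy figures for the trained models should be read.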
- Classification and Regression Algorithms: The project offers a diverse set of algorithms for spam detection, allowing users to choose the approach that best fits their preferences and requirements.
- Data Cleaning and Preprocessing: The dataset undergoes meticulous cleaning and preprocessing to enhance the model's accuracy and reliability. The Jupyter Notebook (`data_analysis.ipynb`) includes detailed steps of the preprocessing workflow.
- User-Friendly Interface: The HTML template (`template.html`) provides an intuitive interface for users to interact with the system, simplifying the process of inputting messages and receiving predictions.
- Collaborative Development: The development team utilized Visual Studio Code for efficient and collaborative coding. The inclusion of a Jupyter Notebook promotes collaborative exploratory data analysis.
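To make the preprocessing and classification features above concrete, here is a minimal sketch of a text classifier. It is an illustration, not the code in main.py: this README does not name the algorithms used, and the choice of multinomial Naive Bayes is an assumption suggested only by the file name naive_model.pkl. The `tokenize` helper, the `NaiveBayes` class, and the four training messages are all made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Preprocess a message: lowercase it and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing strength
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of messages
        self.vocab = set()

    def fit(self, messages, labels):
        for text, label in zip(messages, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, messages):
        return [self._predict_one(text) for text in messages]

    def _predict_one(self, text):
        # Tokens never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Work in log space: log prior plus the sum of log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * vocab_size
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch (not rows from emails.csv).
messages = [
    "win a free prize now",
    "free cash offer click now",
    "are we meeting for lunch today",
    "please review the project report",
]
labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(messages, labels)
print(model.predict(["claim your free prize", "lunch meeting today"]))
```

Working with log probabilities avoids numeric underflow on long messages, and the smoothing term keeps a word that was never seen in one class from forcing that class's probability to zero.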
- Visual Studio Code: The primary development environment, chosen for its collaborative features and efficiency.
- Jupyter Notebook: Used for in-depth exploratory data analysis, fostering collaboration and insights into the dataset.
- Kaggle: The dataset was sourced from Kaggle, a valuable resource for diverse and well-labeled data.
- Clone the repository to your local machine.
- Use Visual Studio Code for further development or modification.
- Run `main.py` to preprocess the data and train the models.
- Use `naive_model.pkl` for quick predictions without retraining.
- Open the HTML template (`template.html`) for an interactive spam detection interface.
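The quick-prediction step above could look roughly like the sketch below. It assumes that naive_model.pkl unpickles to an object with a scikit-learn style `predict` method that accepts raw message text. If the model was saved without its text vectorizer, the message has to be transformed the same way as during training before calling `predict`. `load_model` and `classify` are hypothetical helper names, not functions from main.py.

```python
import pickle


def load_model(path="naive_model.pkl"):
    """Load a pickled model from disk.

    Only unpickle files you trust: loading a pickle can execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def classify(model, message):
    """Return the model's label for a single message.

    Assumes `model.predict` takes a list of raw strings (an assumption about
    how the model was saved) and returns one label per string.
    """
    return model.predict([message])[0]


# Example usage (requires naive_model.pkl in the working directory):
# model = load_model()
# print(classify(model, "Congratulations, you won a free prize!"))
```

If the environment that loads the file uses a different library version than the one that created it, unpickling can fail or behave differently, so it helps to keep the two environments aligned.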
Feel free to contribute, raise issues, or provide feedback! We welcome collaboration and improvements to enhance the effectiveness of our spam detection system.