Email Classification for Abusive Content Detection

This project focuses on building a classification model to distinguish between abusive and non-abusive emails. With the increasing prevalence of online harassment and offensive communication, automated systems for identifying abusive content have become essential.

Introduction

The goal is a machine-learning model that accurately labels each email as abusive or non-abusive. Email providers, social media platforms, and other online communication services can use such a model to filter out harmful content and keep their users safer.

Dataset

The dataset used for training and evaluation comprises a diverse collection of emails labeled as abusive or non-abusive. The dataset has been preprocessed to remove personally identifiable information and sensitive content.
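As a rough illustration of what working with such a labeled dataset might look like, the sketch below loads a small in-memory CSV with Pandas and counts the examples per class. The column names `text` and `label`, the label strings, the sample rows, and the helper `load_emails` are all assumptions made for this example; the project's real schema may differ.

```python
import io

import pandas as pd

# Made-up stand-in for the project's dataset.
# The column names "text" and "label" are hypothetical.
SAMPLE_CSV = io.StringIO(
    "text,label\n"
    "You are a disgrace and everyone hates you,abusive\n"
    "Please review the attached invoice,non-abusive\n"
    "Meeting moved to 3 pm tomorrow,non-abusive\n"
)


def load_emails(source):
    """Read labeled emails, drop incomplete rows, and trim whitespace."""
    frame = pd.read_csv(source)
    frame = frame.dropna(subset=["text", "label"])
    frame = frame.assign(text=frame["text"].str.strip())
    return frame.reset_index(drop=True)


emails = load_emails(SAMPLE_CSV)
# Checking class balance early helps when choosing evaluation metrics later.
class_counts = emails["label"].value_counts().to_dict()
print(class_counts)
```

Checking the class counts first is useful because a heavily imbalanced dataset can make a high accuracy figure misleading on its own.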

Approach

The classification model is built using natural language processing (NLP) techniques and machine learning algorithms. The process involves:

Data Preprocessing: Cleaning and tokenizing the text data, removing stopwords, and performing other necessary preprocessing steps.
Feature Engineering: Extracting relevant features from the text data, such as TF-IDF vectors or word embeddings.
Model Selection: Evaluating various classification algorithms such as Naive Bayes, SVM, and neural networks to determine the most effective approach.
Training and Evaluation: Training the selected model on the labeled dataset and evaluating its performance using metrics such as accuracy, precision, recall, and F1-score.
Deployment: Integrating the trained model into an application or service for real-time classification of incoming emails.
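
The steps above, up to evaluation, can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the project's actual code: the toy emails, the 0/1 labels, and the function name `build_and_evaluate` are made up for the example, and scikit-learn's built-in English stop-word list stands in for whatever stop-word removal the project performs.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, made-up data for illustration only (1 = abusive, 0 = non-abusive).
TEXTS = [
    "You are a pathetic worthless idiot",
    "Nobody likes you, stupid loser",
    "Shut up you disgusting moron",
    "I hate you, you useless fool",
    "Please find the quarterly report attached",
    "The meeting is moved to Tuesday morning",
    "Thanks for sending the invoice yesterday",
    "Lunch is scheduled for noon on Friday",
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def build_and_evaluate(classifier, texts=TEXTS, labels=LABELS, seed=0):
    """Train a TF-IDF + classifier pipeline and score it on a held-out split."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    # Preprocessing and feature extraction: lowercase, tokenize,
    # drop English stop words, and weight the remaining terms by TF-IDF.
    model = make_pipeline(
        TfidfVectorizer(lowercase=True, stop_words="english"),
        classifier,
    )
    model.fit(x_train, y_train)
    predicted = model.predict(x_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, predicted, average="binary", zero_division=0
    )
    scores = {
        "accuracy": accuracy_score(y_test, predicted),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
    return model, scores


if __name__ == "__main__":
    # Model selection: compare candidate classifiers on the same split.
    for name, clf in [("Naive Bayes", MultinomialNB()), ("Linear SVM", LinearSVC())]:
        _, scores = build_and_evaluate(clf)
        print(name, scores)
```

Wrapping the vectorizer and the classifier in one pipeline means the TF-IDF vocabulary is learned from the training split only, which avoids leaking test data into feature extraction. With a dataset this small the printed scores say nothing about real performance.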

Dependencies

Python 3.x
scikit-learn
NLTK
Pandas
NumPy

Results

The models and feature configurations scored as follows on the test dataset:

Passive Aggressive Classifier: 99.56%
Naive Bayes: 97.10%
TF-IDF: 99.61%
TF-IDF (bigrams): 99.71%
TF-IDF (trigrams): 99.71%
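
The three TF-IDF results presumably differ in the n-gram range passed to the vectorizer, although the project does not say so explicitly. Assuming that, the sketch below shows how the `ngram_range` parameter of scikit-learn's `TfidfVectorizer` changes the features that are extracted. The two sample sentences and the helper `vocabulary_for` are hypothetical, and whether the project used cumulative ranges such as (1, 2) or pure ranges such as (2, 2) is not known.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two made-up sentences with no words in common.
SAMPLE = [
    "you are worthless and stupid",
    "please find the report attached",
]


def vocabulary_for(ngram_range, texts=SAMPLE):
    """Return the sorted feature names learned for a given n-gram range."""
    vectorizer = TfidfVectorizer(ngram_range=ngram_range)
    vectorizer.fit(texts)
    return sorted(vectorizer.vocabulary_)


unigrams = vocabulary_for((1, 1))           # single words only
up_to_bigrams = vocabulary_for((1, 2))      # words plus adjacent word pairs
up_to_trigrams = vocabulary_for((1, 3))     # words, pairs, and triples

print(len(unigrams), len(up_to_bigrams), len(up_to_trigrams))
```

Longer n-grams let the model see short phrases rather than isolated words, at the cost of a larger and sparser feature space.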

Contributing

Contributions to this project are welcome. If you have any suggestions for improvements or would like to report issues, please submit a pull request or open an issue on GitHub.

License

This project is licensed under the MIT License - see the LICENSE file for details.

rajveersinghcse/Email-Classification
