
πŸš€ Toxic Terminator: AI-Powered Toxicity Detection πŸ›‘οΈ

License: MIT · Python 3.8+ · Scikit-Learn

"Purifying Digital Spaces One Tweet at a Time" πŸ”βœ¨

πŸ“‹ Table of Contents

  1. πŸ“Œ Project Overview
  2. πŸ“Š Dataset Information
  3. 🧹 Data Preprocessing
  4. βš™οΈ Feature Extraction
  5. πŸ€– Model Training
  6. πŸ“ˆ Model Evaluation
  7. πŸ’» Installation
  8. 🚦 Usage
  9. πŸš€ Future Enhancements
  10. 🀝 Contributing
  11. πŸ“œ License
  12. πŸ™ Acknowledgements
  13. πŸš€ Deployment Instructions

πŸ“Œ Project Overview

Toxic Terminator is an ML-powered shield against online toxicity πŸ›‘οΈ. Our solution helps platforms:

βœ… Automatically flag harmful content
βœ… Improve community moderation
βœ… Protect user mental health
βœ… Maintain positive digital environments


πŸ“Š Dataset Information

πŸ”— Source

  • Kaggle Twitter Toxicity Dataset: https://www.kaggle.com/datasets/ashwiniyer176/toxic-tweets-dataset

πŸ“¦ Dataset Structure

Column     | Type   | Description            | Example
-----------|--------|------------------------|------------------------------
Unnamed: 0 | int64  | Index column (removed) | 0
Toxicity   | int64  | Binary label (0/1)     | 1 (Toxic)
tweet      | object | Tweet text content     | "@user This is offensive..."
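
As a quick illustration, the raw CSV can be loaded and the index column dropped as follows. This is a minimal sketch: the file path is illustrative, so point it at your local copy of the Kaggle download.

import pandas as pd

# Path is illustrative -- adjust to wherever you saved the Kaggle CSV
df = pd.read_csv("data/toxic_tweets.csv")

# Drop the redundant index column described in the table above
df = df.drop(columns=["Unnamed: 0"])
print(df.shape, df.columns.tolist())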

πŸ“Š Class Distribution

print(df['Toxicity'].value_counts(normalize=True))
0    57.4% 🟒 (Non-Toxic)
1    42.6% πŸ”΄ (Toxic)

🧹 Data Preprocessing

πŸ”„ Cleaning Pipeline

  1. πŸ—‘οΈ Remove index column
  2. 🧼 Handle missing values
  3. βœ‚οΈ Text normalization:
    • Remove @mentions
    • Strip URLs
    • Eliminate special characters
    • Convert to lowercase
    • Remove stopwords

βš™οΈ Preprocessing Example

Input:
@user Check this link: http://example.com!!! #toxic

Output:
check link toxic
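
One way to implement the cleaning pipeline above is the sketch below, using re and NLTK stopwords. It is illustrative only; the actual notebook may differ in details such as tokenization or its stopword list.

import re
from nltk.corpus import stopwords  # requires a one-time nltk.download("stopwords")

stop_words = set(stopwords.words("english"))

def clean_tweet(text: str) -> str:
    text = re.sub(r"@\w+", " ", text)               # remove @mentions
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # strip URLs
    text = re.sub(r"[^a-zA-Z\s]", " ", text)        # eliminate special characters
    text = text.lower()                             # convert to lowercase
    tokens = [w for w in text.split() if w not in stop_words]  # remove stopwords
    return " ".join(tokens)

print(clean_tweet("@user Check this link: http://example.com!!! #toxic"))
# -> "check link toxic"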


βš™οΈ Feature Extraction

TF-IDF Vectorization Settings

TfidfVectorizer(
    max_features=10000,       # 🎯 Top 10k terms
    ngram_range=(1, 2),       # πŸ”  Uni+Bigrams
    stop_words=stop_words     # 🚫 Filter common words
)
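
In context, the vectorizer is fit on the training split only and then reused on the test split. A self-contained sketch (the sample texts are illustrative, and "english" stands in for the notebook's own stop_words list):

from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = ["check link toxic", "great day everyone"]  # illustrative cleaned tweets
test_texts = ["toxic link"]

vectorizer = TfidfVectorizer(
    max_features=10000,
    ngram_range=(1, 2),
    stop_words="english",   # the notebook passes its own stop_words list here
)
X_train = vectorizer.fit_transform(train_texts)  # learn the vocabulary on training data only
X_test = vectorizer.transform(test_texts)        # reuse that vocabulary for the test set
print(X_train.shape, X_test.shape)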

Feature Matrix

Dimension     | Training Shape | Test Shape
--------------|----------------|---------------
TF-IDF Matrix | (45396, 10000) | (11349, 10000)

πŸ€– Model Training

Model Architecture

graph LR
A[Raw Text] --> B(TF-IDF Features)
B --> C{MultinomialNB}
C --> D[Toxicity Prediction]

πŸ‹οΈ Training Parameters

  • Algorithm: Multinomial Naive Bayes
  • Train Size: 45,396 samples (80%)
  • Test Size: 11,349 samples (20%)
  • Serialized As: toxicity_model.pkt
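
Putting those parameters together looks roughly like the sketch below. The matrix X and labels y are random stand-ins for the real TF-IDF features and Toxicity column, and random_state=42 is illustrative.

import os
import pickle
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Stand-ins for the real TF-IDF matrix and Toxicity labels built earlier
X = csr_matrix(np.random.rand(200, 50))
y = np.random.randint(0, 2, size=200)

# 80/20 split, matching the sample counts listed above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = MultinomialNB()
model.fit(X_train, y_train)

# Serialize using the repository's .pkt naming
os.makedirs("models", exist_ok=True)
with open("models/toxicity_model.pkt", "wb") as f:
    pickle.dump(model, f)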

πŸ“ˆ Model Evaluation

πŸ“Š Performance Metrics

Metric    | Score  | Visual
----------|--------|------------------------------
Accuracy  | 95.2%  | 🟒🟒🟒🟒🟒🟒🟒🟒🟒🟒
Precision | 92.7%  | πŸ”΅πŸ”΅πŸ”΅πŸ”΅πŸ”΅πŸ”΅πŸ”΅πŸ”΅πŸ”΅
Recall    | 91.3%  | 🟑🟑🟑🟑🟑🟑🟑🟑
F1 Score  | 92.0%  | 🟣🟣🟣🟣🟣🟣🟣🟣🟣
ROC AUC   | 0.9719 | πŸ“ˆ

πŸ” Confusion Matrix

          | Predicted 🟒 | Predicted πŸ”΄
----------|--------------|-------------
Actual 🟒 | 9,823        | 526
Actual πŸ”΄ | 465          | 535
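
These scores map directly onto scikit-learn's standard metric functions. A sketch, continuing from the training snippet above (so model, X_test, and y_test are assumed to already exist):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the toxic class

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 Score :", f1_score(y_test, y_pred))
print("ROC AUC  :", roc_auc_score(y_test, y_prob))
print(confusion_matrix(y_test, y_pred))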

πŸ’» Installation

Quick Start

# 1. Clone repository
git clone https://github.com/yxshee/toxic-terminator.git

# 2. Install dependencies
pip install -r requirements.txt

# 3. Run training (the model is built in a Jupyter notebook)
jupyter nbconvert --to notebook --execute notebooks/model.ipynb

🐳 Docker Setup

FROM python:3.8-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
EXPOSE 8501
# app.py is a Streamlit app (see Deployment Instructions), so serve it with streamlit
CMD ["streamlit", "run", "app.py", "--server.address=0.0.0.0"]
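
With that Dockerfile in place, the image can be built and run like so (the toxic-terminator tag is illustrative):

docker build -t toxic-terminator .
docker run -p 8501:8501 toxic-terminator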

🚦 Usage

Real-Time Prediction

from toxic_detector import ToxicityClassifier

detector = ToxicityClassifier()
tweet = "@user You're completely worthless!"
result = detector.classify(tweet)

print(f"πŸ” Result: {result['label']} (Confidence: {result['probability']:.2%})")

Output:
πŸ” Result: Toxic (Confidence: 98.72%)


πŸš€ Future Enhancements

  • 🌐 Multilingual Support
  • 🧠 BERT/Transformer Integration
  • ⚑ Real-Time API
  • πŸ“± Mobile Integration
  • πŸ”„ Active Learning Pipeline

🀝 Contributing

First Time Contributing? πŸŽ‰ Here's How:

  1. 🌟 Star the Repository
  2. 🍴 Fork the Project
  3. 🌿 Create a Feature Branch
  4. πŸ’» Commit Changes
  5. πŸ”„ Push to Branch
  6. 🎯 Open Pull Request

πŸ“œ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgements

Organization | Contribution
-------------|------------------
Kaggle       | Dataset Provision
Scikit-learn | ML Framework
Python       | Core Language

πŸš€ Deployment Instructions

To deploy the project on Streamlit:

  1. Install the required dependencies:
    pip install -r requirements.txt
  2. Ensure that the model files (models/tf_idf.pkt and models/toxicity_model.pkt) are in the project directory.
  3. Launch the app with Streamlit:
    streamlit run app.py
  4. Open the URL provided by Streamlit (usually http://localhost:8501) in your browser.
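
The repository ships its own app.py; for orientation only, a minimal Streamlit front end over the pickled artifacts might look like this illustrative sketch (not the actual file):

import pickle
import streamlit as st

@st.cache_resource  # load the artifacts once per session
def load_artifacts():
    with open("models/tf_idf.pkt", "rb") as f:
        vectorizer = pickle.load(f)
    with open("models/toxicity_model.pkt", "rb") as f:
        model = pickle.load(f)
    return vectorizer, model

vectorizer, model = load_artifacts()

st.title("Toxic Terminator πŸ›‘οΈ")
text = st.text_area("Enter a tweet to check:")
if st.button("Classify") and text:
    prob = model.predict_proba(vectorizer.transform([text]))[0, 1]
    label = "Toxic" if prob >= 0.5 else "Non-Toxic"
    st.write(f"πŸ” Result: {label} (Confidence: {prob:.2%})")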

Made with ❀️ by YXSHEE | πŸ›‘οΈ Keep Conversations Clean!
