"Purifying Digital Spaces One Tweet at a Time" πβ¨
- Project Overview
- Dataset Information
- Data Preprocessing
- Feature Extraction
- Model Training
- Model Evaluation
- Installation
- Usage
- Future Enhancements
- Contributing
- License
- Acknowledgements
- Deployment Instructions
Toxic Terminator is an ML-powered shield against online toxicity. Our solution helps platforms:
- Automatically flag harmful content
- Improve community moderation
- Protect user mental health
- Maintain positive digital environments
- Kaggle Toxic Tweets Dataset: https://www.kaggle.com/datasets/ashwiniyer176/toxic-tweets-dataset
| Column | Type | Description | Example |
|---|---|---|---|
| `Unnamed: 0` | int64 | Index column (removed) | 0 |
| `Toxicity` | int64 | Binary label (0/1) | 1 (Toxic) |
| `tweet` | object | Tweet text content | "@user This is offensive..." |
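For orientation, here is a minimal sketch of loading the dataset into the `df` used in the snippets below; the CSV filename and `data/` path are assumptions and may differ from the files shipped with the Kaggle dataset.

```python
import pandas as pd

# Load the Kaggle CSV (path and filename are assumptions; adjust to your download)
df = pd.read_csv("data/FinalBalancedDataset.csv")

# Drop the leftover index column described in the table above
df = df.drop(columns=["Unnamed: 0"])

print(df.shape)   # (n_rows, 2): Toxicity + tweet
print(df.head())
```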
print(df['Toxicity'].value_counts(normalize=True))

0    57.4%  (Non-Toxic)
1    42.6%  (Toxic)
- Remove index column
- Handle missing values
- Text normalization:
  - Remove @mentions
  - Strip URLs
  - Eliminate special characters
  - Convert to lowercase
  - Remove stopwords
Input:
@user Check this link: http://example.com!!! #toxic
Output:
check link toxic
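A minimal sketch of the normalization steps above, assuming NLTK's English stopword list (the repository's actual cleaning code may differ in detail):

```python
import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
stop_words = set(stopwords.words("english"))

def clean_tweet(text: str) -> str:
    text = re.sub(r"@\w+", " ", text)               # remove @mentions
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # strip URLs
    text = re.sub(r"[^a-zA-Z\s]", " ", text)        # eliminate special characters
    text = text.lower()                             # convert to lowercase
    tokens = [w for w in text.split() if w not in stop_words]  # remove stopwords
    return " ".join(tokens)

print(clean_tweet("@user Check this link: http://example.com!!! #toxic"))
# -> check link toxic
```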
TfidfVectorizer(
    max_features=10000,     # keep the top 10k terms
    ngram_range=(1, 2),     # unigrams + bigrams
    stop_words=stop_words   # filter common words
)
| Dimension | Training Shape | Test Shape |
|---|---|---|
| TF-IDF Matrix | (45396, 10000) | (11349, 10000) |
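Continuing from the loading and cleaning sketches above, the shapes in the table come from fitting the vectorizer on the training split only and reusing it on the test split; the split parameters here are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# 80/20 split of cleaned tweets and labels
X_train_text, X_test_text, y_train, y_test = train_test_split(
    df["tweet"].apply(clean_tweet), df["Toxicity"],
    test_size=0.2, random_state=42
)

vectorizer = TfidfVectorizer(max_features=10000, ngram_range=(1, 2),
                             stop_words=list(stop_words))
X_train = vectorizer.fit_transform(X_train_text)  # (45396, 10000)
X_test = vectorizer.transform(X_test_text)        # (11349, 10000)
```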
graph LR
A[Raw Text] --> B(TF-IDF Features)
B --> C{MultinomialNB}
C --> D[Toxicity Prediction]
- Algorithm: Multinomial Naive Bayes
- Train Size: 45,396 samples (80%)
- Test Size: 11,349 samples (20%)
- Serialized As: `toxicity_model.pkt`
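Training and serialization, continuing from the feature matrices above; a sketch using scikit-learn defaults (the `models/*.pkt` filenames match those referenced in the deployment section):

```python
import pickle
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(X_train, y_train)

# Persist the fitted vectorizer and model for inference/deployment
with open("models/tf_idf.pkt", "wb") as f:
    pickle.dump(vectorizer, f)
with open("models/toxicity_model.pkt", "wb") as f:
    pickle.dump(model, f)
```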
| Metric | Score |
|---|---|
| Accuracy | 95.2% |
| Precision | 92.7% |
| Recall | 91.3% |
| F1 Score | 92.0% |
| ROC AUC | 0.9719 |
|  | Predicted Non-Toxic | Predicted Toxic |
|---|---|---|
| Actual Non-Toxic | 9,823 | 526 |
| Actual Toxic | 465 | 535 |
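The metrics and confusion matrix above can be computed with scikit-learn's standard scorers; a minimal sketch, continuing from the trained model (exact numbers depend on the split and preprocessing):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]  # probability of the toxic class

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 Score :", f1_score(y_test, y_pred))
print("ROC AUC  :", roc_auc_score(y_test, y_proba))
print(confusion_matrix(y_test, y_pred))
```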
# 1. Clone the repository
git clone https://github.com/yxshee/toxic-terminator.git
cd toxic-terminator

# 2. Install dependencies
pip install -r requirements.txt

# 3. Execute the training notebook
jupyter nbconvert --to notebook --execute notebooks/model.ipynb
FROM python:3.8-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "app.py"]
from toxic_detector import ToxicityClassifier
detector = ToxicityClassifier()
tweet = "@user You're completely worthless!"
result = detector.classify(tweet)
print(f"π Result: {result['label']} (Confidence: {result['probability']:.2%})")
Output:
Result: Toxic (Confidence: 98.72%)
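The `ToxicityClassifier` interface shown above could be a thin wrapper around the two serialized artifacts; the sketch below is illustrative only and not necessarily the repository's actual implementation:

```python
import pickle

class ToxicityClassifier:
    """Illustrative wrapper around the pickled vectorizer and model."""

    def __init__(self, vec_path="models/tf_idf.pkt",
                 model_path="models/toxicity_model.pkt"):
        with open(vec_path, "rb") as f:
            self.vectorizer = pickle.load(f)
        with open(model_path, "rb") as f:
            self.model = pickle.load(f)

    def classify(self, text: str) -> dict:
        features = self.vectorizer.transform([text])
        proba = float(self.model.predict_proba(features)[0, 1])
        return {"label": "Toxic" if proba >= 0.5 else "Non-Toxic",
                "probability": proba}
```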
- Multilingual Support
- BERT/Transformer Integration
- Real-Time API
- Mobile Integration
- Active Learning Pipeline
First Time Contributing? Here's How:

- Star the Repository
- Fork the Project
- Create a Feature Branch
- Commit Your Changes
- Push to the Branch
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
| Organization | Contribution |
|---|---|
| Kaggle | Dataset Provision |
| scikit-learn | ML Framework |
| Python | Core Language |
To deploy the project on Streamlit:
- Install the required dependencies:
  pip install -r requirements.txt
- Ensure that the model files (models/tf_idf.pkt and models/toxicity_model.pkt) are in the project directory.
- Launch the app with Streamlit:
  streamlit run app.py
- Open the URL provided by Streamlit (usually http://localhost:8501) in your browser.
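If `app.py` is not already present in your checkout, a minimal Streamlit front end along these lines works with the two pickled artifacts (a sketch, not necessarily the repository's actual `app.py`):

```python
import pickle
import streamlit as st

@st.cache_resource
def load_artifacts():
    # Load the serialized TF-IDF vectorizer and Naive Bayes model
    with open("models/tf_idf.pkt", "rb") as f:
        vectorizer = pickle.load(f)
    with open("models/toxicity_model.pkt", "rb") as f:
        model = pickle.load(f)
    return vectorizer, model

vectorizer, model = load_artifacts()

st.title("Toxic Terminator")
tweet = st.text_area("Paste a tweet to analyze")

if st.button("Classify") and tweet.strip():
    proba = model.predict_proba(vectorizer.transform([tweet]))[0, 1]
    label = "Toxic" if proba >= 0.5 else "Non-Toxic"
    st.write(f"Result: {label} (Confidence: {proba:.2%})")
```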