Toxic Comment Classification

Overview

This project compares machine learning models for toxic comment classification: traditional algorithms, state-of-the-art transformer-based models, and an attention-based BiLSTM-CNN architecture. The goal is to identify the most effective model for detecting online toxicity and categorizing it into multiple classes.

Features

Multi-Class Classification: Classifies comments into categories such as toxic, severe toxic, obscene, threat, insult, and identity hate.

Model Comparisons: Evaluates traditional models alongside advanced transformer-based models.

Traditional Models: Logistic Regression, Linear SVC, Multinomial Naive Bayes.

Advanced Models: BERT, RoBERTa, DistilBERT.

Performance Metrics: Accuracy, precision, recall, and F1-score.

Preprocessing: Implements text cleaning, tokenization, and TF-IDF vectorization.

Our Approach: Uses attention layers in the BiLSTM-CNN to focus on the most informative parts of the input text.
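The README does not say how the attention layer is computed or which framework the model uses. As a rough illustration only, the sketch below shows one common form, additive attention pooling, in plain NumPy: each token's hidden state receives a score, the scores are normalized with a softmax, and the weighted sum becomes the comment representation. The function name `attention_pool`, the parameter names, and all sizes are assumptions made for this example and are not taken from the project's code.

```python
import numpy as np


def attention_pool(hidden_states, w, b, v):
    """Additive attention pooling over a sequence of hidden states.

    hidden_states: (T, d) array, one row per token (e.g. BiLSTM outputs).
    w: (d, a) projection matrix, b: (a,) bias, v: (a,) context vector.
    Returns (context, weights): a (d,) summary vector and (T,) weights
    that sum to 1.
    """
    scores = np.tanh(hidden_states @ w + b) @ v       # (T,) one score per token
    scores = scores - scores.max()                    # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()   # (T,) attention weights
    context = weights @ hidden_states                 # (d,) weighted sum of states
    return context, weights


# Hypothetical sizes: 6 tokens, 8 hidden units, 4 attention units.
rng = np.random.default_rng(0)
T, d, a = 6, 8, 4
hidden = rng.normal(size=(T, d))   # stand-in for BiLSTM outputs
w = rng.normal(size=(d, a))        # random, untrained parameters
b = np.zeros(a)
v = rng.normal(size=a)

context, weights = attention_pool(hidden, w, b, v)
print("attention weights:", weights.round(3))
print("context shape:", context.shape)
```

In a trained model, tokens with larger weights contribute more to the final representation, which is the sense in which attention lets the classifier focus on particular parts of a comment. Here the parameters are random, so the weights carry no meaning beyond showing the mechanics.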

Results

The Attention-Based BiLSTM-CNN model achieved the highest accuracy, 98.6%, and outperformed both the traditional and the transformer-based models on all reported metrics.
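To make the comparison concrete, here is a minimal sketch of how the three traditional baselines could be trained on TF-IDF features and scored with accuracy, precision, recall, and F1-score using scikit-learn. It is not the project's evaluation script. The toy comments and labels are invented for illustration, the task is reduced to a single binary toxic/non-toxic label rather than the six categories, and the helper name `evaluate` is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data (1 = toxic, 0 = not toxic); not the project's dataset.
train_texts = [
    "you are an idiot and a loser",
    "I will hurt you, you worthless fool",
    "shut up you stupid moron",
    "thanks for the helpful explanation",
    "great article, I learned a lot",
    "could you share the source for this claim",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["what a stupid idiot", "thanks, this was a great read"]
test_labels = [1, 0]

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVC": LinearSVC(),
    "Multinomial Naive Bayes": MultinomialNB(),
}


def evaluate(model):
    """Fit a TF-IDF + classifier pipeline and return the four metrics."""
    # TfidfVectorizer lowercases and tokenizes the raw text by default.
    pipeline = make_pipeline(TfidfVectorizer(), model)
    pipeline.fit(train_texts, train_labels)
    predictions = pipeline.predict(test_texts)
    precision, recall, f1, _ = precision_recall_fscore_support(
        test_labels, predictions, average="binary", zero_division=0
    )
    return {
        "accuracy": accuracy_score(test_labels, predictions),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


scores = {name: evaluate(model) for name, model in models.items()}
for name, metrics in scores.items():
    print(name, {key: round(float(value), 3) for key, value in metrics.items()})
```

Reporting precision, recall, and F1-score next to accuracy is useful when toxic comments are a small minority, because a model can reach high accuracy by mostly predicting the majority class.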

zxnb01/ToxicCommentClassification
