This is a hate speech detection model built with Count Vectorizer and an XGBoost classifier. It reaches an accuracy of up to 0.9471 on a 70:30 train-test split and can be used to predict whether a tweet is hate or non-hate.
- The dataset is built from Twitter data and is used to research hate-speech detection. Each text is classified as hate speech, offensive language, or neither. Due to the nature of the study, it’s important to note that this dataset contains text that can be considered racist, sexist, homophobic, or generally offensive.
Link for dataset: https://www.kaggle.com/mrmorj/hate-speech-and-offensive-language-dataset
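
The dataset has three classes, while the model predicts hate versus non-hate. This README does not say how the three classes were reduced to two, so the snippet below is only one possible mapping. It assumes the class codes are 0 = hate speech, 1 = offensive language, 2 = neither, and that the label sits in a column named `class`. Check both assumptions against the downloaded CSV. The helper name `to_binary_label` is hypothetical.

```python
# Assumed label encoding (not stated in this README):
# 0 = hate speech, 1 = offensive language, 2 = neither.
HATE_CLASS = 0


def to_binary_label(class_id):
    """Return 1 for hate speech and 0 for every other class."""
    return 1 if int(class_id) == HATE_CLASS else 0


# Placeholder rows standing in for records read from the CSV,
# where every value arrives as a string.
rows = [{"class": "0"}, {"class": "1"}, {"class": "2"}]
binary_labels = [to_binary_label(row["class"]) for row in rows]
```

Grouping offensive language with "non-hate" is a design choice. Treating it as hate instead would change the class balance and probably the reported accuracy.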
Python
NLP
Porter Stemmer
Count Vectorizer
XGBoost Classifier
Random Forest Classifier
Decision Tree
Support Vector Machine
Logistic Regression
K Nearest Neighbours
Gaussian Naive Bayes Classifier
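
The classifiers listed above can be compared on the same Count Vectorizer features and the same 70:30 split. The following is a minimal sketch using scikit-learn, not the repository's actual training script. The function name `evaluate_classifier`, the seed, and the tiny placeholder corpus are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_classifier(texts, labels, classifier, test_size=0.3, seed=42):
    """Fit CountVectorizer + classifier on a 70:30 split; return test accuracy."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    vectorizer = CountVectorizer()
    # Learn the vocabulary from training text only, so nothing from the
    # held-out tweets leaks into the features.
    train_counts = vectorizer.fit_transform(x_train)
    test_counts = vectorizer.transform(x_test)
    classifier.fit(train_counts, y_train)
    return accuracy_score(y_test, classifier.predict(test_counts))


# Hypothetical placeholder corpus; replace with the tweets and labels
# loaded from the dataset.
texts = ["awful nasty message"] * 10 + ["friendly kind message"] * 10
labels = [1] * 10 + [0] * 10

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
scores = {
    name: evaluate_classifier(texts, labels, clf)
    for name, clf in candidates.items()
}
```

If the `xgboost` package is installed, an `XGBClassifier()` instance can be added to `candidates` in the same way, since it follows the scikit-learn `fit`/`predict` interface. Gaussian Naive Bayes expects dense input, so the sparse count matrix would need converting with `.toarray()` before fitting it.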
If you like this repo, please don't forget to give a ⭐.