My attempt at the Kaggle Toxic Comment Classification Challenge
I built a model that estimates the probability of a comment belonging to each of the six toxicity classes. I used XGBoost after generating feature vectors from GloVe and Google News Word2Vec embeddings.
The model achieved an overall AUC of 0.82.
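The README does not spell out how the word vectors are combined into a single feature vector per comment; a common choice is to average the embeddings of a comment's tokens. The sketch below assumes that approach, and `comment_to_vector` / `embeddings` are illustrative names rather than identifiers from this repo.

```python
# Minimal sketch: turn a comment into a fixed-length feature vector by
# averaging pretrained word embeddings. The averaging step is an assumption,
# not necessarily what this repo does; `embeddings` is any mapping from
# token -> 300-d numpy vector (GloVe or Google News Word2Vec).
import numpy as np

def comment_to_vector(comment, embeddings, dim=300):
    """Average the embeddings of all known tokens in a comment."""
    tokens = comment.lower().split()
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)            # no known words -> zero vector
    return np.mean(vectors, axis=0)     # element-wise mean over tokens

# Example (placeholder data):
# X = np.vstack([comment_to_vector(c, embeddings) for c in train_comments])
```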
Resources needed:
- Download the data from the Kaggle competition page here
- Download the GloVe word vectors here; choose the 840B-token, 300d model
- Download the Google News Word2Vec vectors here (a sketch of loading both embedding files follows this list)
- To use the Keras model built in `example_to_clarify.py`, you need to download the 20 Newsgroups dataset
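Once the two embedding files are downloaded, they can be loaded roughly as below. This is a hedged sketch: the file names are the standard distribution names and are assumptions about where you saved them, not paths from this repo.

```python
# Load the pretrained embeddings. File names are assumptions (the standard
# download names), not paths used by this repository.
import numpy as np
from gensim.models import KeyedVectors

# Google News vectors ship in binary word2vec format.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# GloVe ships as plain text: one token followed by 300 floats per line.
glove = {}
with open("glove.840B.300d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        word = " ".join(parts[:-300])   # some 840B tokens contain spaces
        glove[word] = np.asarray(parts[-300:], dtype="float32")
```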
Note:
The `final_try.py` file is an implementation of the XGBoost algorithm on the same data
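As a rough illustration of that step, the sketch below trains one binary XGBoost classifier per toxicity label and reports the mean column-wise AUC (the competition metric). The random arrays only stand in for the real comment feature vectors and label columns of the Kaggle training data; nothing here is taken from `final_try.py`.

```python
# Hedged sketch: one-vs-rest XGBoost over the six toxicity labels,
# scored with mean column-wise AUC. Random data stands in for the real
# comment feature vectors and the label columns of train.csv.
import numpy as np
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

rng = np.random.default_rng(0)
X_train, X_val = rng.normal(size=(800, 300)), rng.normal(size=(200, 300))  # stand-in features
y_train = {l: rng.integers(0, 2, size=800) for l in LABELS}                # stand-in labels
y_val = {l: rng.integers(0, 2, size=200) for l in LABELS}

aucs = []
for label in LABELS:
    clf = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
    clf.fit(X_train, y_train[label])
    probs = clf.predict_proba(X_val)[:, 1]          # P(comment has this label)
    aucs.append(roc_auc_score(y_val[label], probs))

print("mean column-wise AUC:", np.mean(aucs))
```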
To Do:
- You can definitely do much more hyperparameter optimization, especially for the LSTM model. For example, try playing around with `max_features`, `max_len`, the `Dropout` rate, the size of the `Dense` layer, etc. (see the sketch after this list)
- You can try different feature engineering and normalization techniques for the text data
- In general, try playing around with parameters like `batch_size`, `num_epochs`, and `learning_rate`
- Try different optimizers, such as `Adagrad`, `Adadelta`, or `SGD`
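For orientation, here is a minimal sketch of where those knobs sit in a Keras LSTM text classifier. The values are illustrative defaults, not the ones used in `example_to_clarify.py`, and `texts` / `labels` are placeholders for the comments and their six 0/1 toxicity labels.

```python
# Hedged sketch of a Keras LSTM classifier, showing where the hyperparameters
# listed above appear. All values and placeholder data are illustrative.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import AUC

max_features = 20000      # vocabulary size kept by the tokenizer
max_len = 100             # comments padded/truncated to this many tokens
dropout_rate = 0.2        # the Dropout rate mentioned above
dense_size = 64           # size of the Dense layer
batch_size = 32
num_epochs = 2
learning_rate = 1e-3

texts = ["you are great", "you are awful"] * 50          # placeholder comments
labels = np.random.randint(0, 2, size=(len(texts), 6))   # placeholder 6-label targets

tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=max_len)

model = Sequential([
    Embedding(max_features, 128),
    LSTM(64),
    Dropout(dropout_rate),
    Dense(dense_size, activation="relu"),
    Dense(6, activation="sigmoid"),   # one sigmoid output per toxicity class
])
# Swap Adam for Adagrad, Adadelta or SGD to try the optimizers listed above.
model.compile(optimizer=Adam(learning_rate=learning_rate),
              loss="binary_crossentropy", metrics=[AUC()])
model.fit(X, labels, batch_size=batch_size, epochs=num_epochs, validation_split=0.1)
```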