seroetr/Disaster-Tweets-with-NLP: Disaster Tweets Classification by Machine Learning, a current Kaggle competition.

The train.csv and test.csv files are available at https://www.kaggle.com/competitions/nlp-getting-started/data.
The columns in the `train.csv` dataset are:
- id
- text
- location
- keyword
- target
You will predict whether a tweet is about a real disaster (1) or not (0).
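As a minimal sketch, assuming pandas and the column names listed above, the training file could be split into text-side features and the 0/1 target like this. The helper name `split_features_target` and the choice to fill missing `keyword`/`location` values with an empty string are illustrative assumptions, not taken from the notebook.

```python
import pandas as pd


def split_features_target(train: pd.DataFrame):
    """Return (text, keyword, location) features and the 0/1 target.

    Hypothetical helper: assumes the train.csv column names listed above.
    Missing keyword/location values are filled with "" (an illustrative choice).
    """
    expected = {"id", "text", "location", "keyword", "target"}
    missing = expected - set(train.columns)
    if missing:
        raise ValueError(f"train.csv is missing columns: {sorted(missing)}")

    features = train[["text", "keyword", "location"]].fillna("")
    target = train["target"].astype(int)  # 1 = real disaster, 0 = not
    return features, target


# Usage with the real file (the local path is an assumption):
# train = pd.read_csv("train.csv")
# X, y = split_features_target(train)
# print(y.value_counts())
```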
LightGBM, XGBoost, Random Forest, and CatBoost classifiers are used to predict which tweets describe real disasters.
The hyperparameters of each model are tuned with RandomizedSearchCV.
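The tuning step could look roughly like the following minimal sketch, assuming scikit-learn's `Pipeline`, `TfidfVectorizer`, and `RandomizedSearchCV` wrapped around a `RandomForestClassifier`. The function name `tune_random_forest` and the search space are hypothetical; the notebook's actual parameter ranges are not stated in this README.

```python
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline


def tune_random_forest(texts, labels, n_iter=5, cv=3, random_state=42):
    """Randomized hyperparameter search over a TF-IDF + random forest pipeline.

    Hypothetical helper: the search space below is an example only,
    not the ranges used in the notebook.
    """
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", RandomForestClassifier(random_state=random_state)),
    ])
    # Keys follow Pipeline's "<step>__<parameter>" naming convention.
    param_distributions = {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__n_estimators": randint(50, 300),
        "clf__max_depth": [None, 20, 50],
        "clf__min_samples_split": randint(2, 10),
    }
    search = RandomizedSearchCV(
        pipeline,
        param_distributions=param_distributions,
        n_iter=n_iter,       # number of sampled parameter settings
        scoring="accuracy",  # same metric as the results table
        cv=cv,               # stratified k-fold for classifiers
        random_state=random_state,
    )
    search.fit(texts, labels)
    return search
```

After fitting, `search.best_params_` holds the sampled setting with the best mean cross-validated accuracy, and `search.predict` uses the pipeline refit on all training data with those parameters.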
The Jupyter notebook (`Disaster Tweets Classifications by Machine Learning.ipynb`) also contains commented-out code that combines other features with the TF-IDF matrix using `hstack`, in case you want to use them.
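A sketch of that idea using `scipy.sparse.hstack`: the TF-IDF matrix and a small dense block of extra features are joined column-wise. The four extra features (character count, word count, `#` count, and a URL flag) and the name `tfidf_with_extra_features` are hypothetical examples; the notebook's commented-out code may combine different features.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer


def tfidf_with_extra_features(texts):
    """Stack a TF-IDF matrix with a few hand-made numeric features.

    Hypothetical example features: character count, word count,
    hashtag count, and a 0/1 flag for "http" appearing in the tweet.
    """
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(texts)  # sparse, shape (n_samples, n_terms)

    extra = np.array(
        [[len(t), len(t.split()), t.count("#"), int("http" in t)] for t in texts],
        dtype=float,
    )
    # Join column-wise; convert to CSR so estimators and row/column indexing work.
    combined = hstack([tfidf, csr_matrix(extra)]).tocsr()
    return combined, vectorizer
```

The extra columns are left unscaled here. Tree-based models such as RandomForestClassifier are insensitive to feature scale, but a linear model would usually need them standardized first.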

| Model | Accuracy |
| --- | --- |
| LGBMClassifier | 0.7634 |
| CatBoostClassifier | 0.7706 |
| XGBClassifier | 0.7648 |
| RandomForestClassifier | 0.7873 |

RandomForestClassifier achieved the highest accuracy of the four models, so it is used to make predictions on the test data.
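The selection and prediction step might be sketched as follows. The accuracy values are copied from the table; `make_submission` is a hypothetical helper that assumes an already-fitted estimator accepting raw tweet text (for example a TF-IDF + RandomForestClassifier pipeline), and the `id`/`target` output layout is assumed to match the competition's sample submission file.

```python
import numpy as np
import pandas as pd

# Accuracy values copied from the results table.
ACCURACY = {
    "LGBMClassifier": 0.7634,
    "CatBoostClassifier": 0.7706,
    "XGBClassifier": 0.7648,
    "RandomForestClassifier": 0.7873,
}


def best_model_name(scores):
    """Return the name of the model with the highest accuracy."""
    return max(scores, key=scores.get)


def make_submission(fitted_model, test):
    """Pair each test.csv id with a predicted 0/1 target.

    Hypothetical helper: assumes `fitted_model` is already fitted and accepts
    raw tweet text; the id/target layout is an assumed submission format.
    """
    predictions = np.asarray(fitted_model.predict(test["text"])).astype(int)
    return pd.DataFrame({"id": test["id"].to_numpy(), "target": predictions})


# Usage (file names and `fitted_pipeline` are assumptions):
# test = pd.read_csv("test.csv")
# make_submission(fitted_pipeline, test).to_csv("submission.csv", index=False)
```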
