In the wake of natural disasters, emergency-response organizations are inundated with tweets, text messages, and service‑desk tickets.
Manually triaging these requests costs precious minutes that could save lives.
This repo delivers an end‑to‑end pipeline—from raw CSV files to a production‑ready Flask app—that automatically classifies each incoming message into 36 humanitarian‑response categories (e.g., water, search and rescue, medical help), ensuring they reach the right teams fast.
The system combines standard NLP preprocessing (tokenization, lemmatization, TF‑IDF) with a grid‑searched multi‑output RandomForest/AdaBoost ensemble.
Performance is tracked with accuracy, precision, recall, and F1, and the trained model is exposed through a simple web UI so field operators can paste a message and instantly see which relief teams should respond.
```
Raw CSV ─► ETL Pipeline ─► SQLite DB ─► ML Pipeline ─► Pickled Model ─► Flask App
```
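As a sketch of the preprocessing step mentioned above, a tokenizer along these lines (the function name `tokenize` and the exact normalization rules are illustrative, not copied from the repo) would lower-case, strip punctuation, and lemmatize each message before TF‑IDF:

```python
import re

from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time setup: nltk.download("punkt"); nltk.download("wordnet")

def tokenize(text):
    """Normalize, tokenize, and lemmatize a raw disaster message."""
    # Lower-case and replace everything that isn't alphanumeric with a space
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())
    # Split into word tokens and reduce each token to its lemma
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(token) for token in word_tokenize(text)]
```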
| Notebook | Purpose |
|---|---|
| `ETL_Pipeline_Preparation.ipynb` | Step‑by‑step exploration of the data‑cleaning workflow implemented in `process_data.py`. |
| `ML_Pipeline_Preparation.ipynb` | Interactive training, tuning, and evaluation of the multi‑output classifier (mirrors `train_classifier.py`). |
Tip: No Jupyter? GitHub renders the notebooks automatically.
`process_data.py`:
- Reads the message and category CSVs.
- Merges the datasets, fixes inconsistencies, and removes duplicates.
- One‑hot‑encodes the 36 category columns.
- Saves the clean result to `DisasterResponse.db` (SQLite).
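Condensed, that ETL flow looks roughly like this (the `id` join key, the `;`-separated `category-0/1` encoding, and the `messages` table name are assumptions about the standard disaster-response dataset, not verified against the script):

```python
import pandas as pd
from sqlalchemy import create_engine

# Load and merge the two CSVs on their shared id column
messages = pd.read_csv("data/disaster_messages.csv")
categories = pd.read_csv("data/disaster_categories.csv")
df = messages.merge(categories, on="id")

# Split the single "categories" string into 36 one-hot columns,
# e.g. "related-1;request-0;..." -> related=1, request=0, ...
cats = df["categories"].str.split(";", expand=True)
cats.columns = [c.split("-")[0] for c in cats.iloc[0]]
cats = cats.apply(lambda col: col.str[-1].astype(int))

# Recombine, drop duplicates, and persist to SQLite
df = pd.concat([df.drop(columns="categories"), cats], axis=1).drop_duplicates()
engine = create_engine("sqlite:///data/DisasterResponse.db")
df.to_sql("messages", engine, index=False, if_exists="replace")
```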
`train_classifier.py`:
- Splits the data and builds a `Pipeline` (TF‑IDF → classifier).
- Runs `GridSearchCV` over key hyper‑parameters (`n_estimators`, `max_depth`, etc.).
- Prints a classification report per label and saves the best model to `classifier.pkl`.
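A minimal sketch of that training flow, under the same assumptions as above (the parameter grid is illustrative, and `tokenize` is the function sketched earlier; see the script for the tuned values):

```python
import pandas as pd
from sqlalchemy import create_engine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Load the cleaned data produced by process_data.py
engine = create_engine("sqlite:///data/DisasterResponse.db")
df = pd.read_sql_table("messages", engine)  # table name is an assumption
X = df["message"]
Y = df.iloc[:, 4:]  # the 36 binary category columns
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

# TF-IDF features feeding one RandomForest per output label
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(tokenizer=tokenize)),  # tokenize() as sketched above
    ("clf", MultiOutputClassifier(RandomForestClassifier())),
])

# Small illustrative grid over key hyper-parameters
param_grid = {
    "clf__estimator__n_estimators": [50, 100],
    "clf__estimator__max_depth": [None, 10],
}
model = GridSearchCV(pipeline, param_grid, cv=3)
model.fit(X_train, Y_train)
```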
Test‑set highlights:
- Accuracy: 94.4 %
- Weighted F1: 0.86
- Solid recall on frequent labels (`related`, `request`); room for improvement on rare ones (`refugees`, `shops`).
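Numbers like these come from evaluating each of the 36 outputs separately; a sketch of that evaluation, continuing from the variables above:

```python
from sklearn.metrics import classification_report, f1_score

Y_pred = model.predict(X_test)

# One precision/recall/F1 report per category column
for i, col in enumerate(Y_test.columns):
    print(f"--- {col} ---")
    print(classification_report(Y_test[col], Y_pred[:, i], zero_division=0))

# Single weighted-F1 summary across all 36 labels
print("Weighted F1:", f1_score(Y_test, Y_pred, average="weighted", zero_division=0))
```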
1. Clone the repo

   ```bash
   git clone https://github.com/Soriano-R/disaster-response-pipeline.git
   cd disaster-response-pipeline
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Build the database

   ```bash
   python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
   ```

4. Train the model

   ```bash
   python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
   ```

5. Run the web app

   ```bash
   cd app
   python run.py
   ```

   Then open http://localhost:3001 in your browser.
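For orientation, the core of `run.py` is a route that loads `classifier.pkl` once at startup and classifies whatever the operator pastes; a minimal sketch (the route name, template variables, and table name are assumptions, not copied from the app):

```python
import pickle

import pandas as pd
from flask import Flask, render_template, request
from sqlalchemy import create_engine

app = Flask(__name__)

# Load label names and the trained model once at startup.
# Note: the custom tokenize() used in the pipeline must be importable here,
# or pickle.load() will fail.
engine = create_engine("sqlite:///../data/DisasterResponse.db")
df = pd.read_sql_table("messages", engine)
with open("../models/classifier.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/go")
def go():
    """Classify the pasted message and render one flag per category."""
    query = request.args.get("query", "")
    preds = model.predict([query])[0]
    results = dict(zip(df.columns[4:], preds))  # category name -> 0/1
    return render_template("go.html", query=query, classification_result=results)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3001)
```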
```
├── app/
│   ├── run.py
│   └── templates/
│       ├── master.html
│       └── go.html
│
├── data/
│   ├── disaster_messages.csv
│   ├── disaster_categories.csv
│   ├── process_data.py
│   └── DisasterResponse.db      ← generated
│
├── models/
│   ├── train_classifier.py
│   └── classifier.pkl           ← generated
│
├── resources/
│   ├── Disaster_Response_Application_Interface.png
│   ├── Disaster_Response_Classification_Result.png
│   ├── ETL_Pipeline_Preparation.ipynb
│   ├── ML_Pipeline_Preparation.ipynb
│   ├── ETL_Pipeline_Preparation.html
│   └── ML_Pipeline_Preparation.html
│
├── requirements.txt
└── README.md
```
This project is released under the MIT License. See LICENSE for details.

