This project is part of the Data Science Nanodegree Program by Udacity. It involves building a Natural Language Processing (NLP) model that categorizes messages from real-life disaster events in real time. The dataset consists of pre-labelled tweets and messages.
The project is divided into the following key sections:
- Processing data: Building an ETL pipeline to extract, clean, and store the data in a SQLite database.
- Building a machine learning pipeline: Training a classifier to categorize text messages into various categories.
- Running a web app: Displaying model results in real time.
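
The ETL stage listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `process_data.py`: it uses only the standard library (`csv`, `sqlite3`) so it runs without extra dependencies, whereas the project itself relies on Pandas and SQLAlchemy. The column names (`id`, `message`, `categories`), the `name-flag` category encoding, the sample rows, and the table name `messages` are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins for disaster_messages.csv and disaster_categories.csv.
# Column names and the "name-flag" category encoding are assumptions.
MESSAGES_CSV = """id,message
1,We need water and food
2,Roads are blocked near the bridge
2,Roads are blocked near the bridge
"""
CATEGORIES_CSV = """id,categories
1,water-1;food-1;roads-0
2,water-0;food-0;roads-1
2,water-0;food-0;roads-1
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_categories(encoded):
    """Turn 'water-1;food-0' into {'water': 1, 'food': 0}."""
    pairs = (item.rsplit("-", 1) for item in encoded.split(";"))
    return {name: int(flag) for name, flag in pairs}


def run_etl(messages_text, categories_text, conn):
    # Extract: keying by id also drops duplicate rows.
    messages = {r["id"]: r["message"] for r in read_rows(messages_text)}
    categories = {
        r["id"]: parse_categories(r["categories"])
        for r in read_rows(categories_text)
    }

    # Transform: one integer column per category name.
    names = sorted({n for flags in categories.values() for n in flags})
    columns = ", ".join(f'"{n}" INTEGER' for n in names)

    # Load: write the merged rows into a SQLite table (hypothetical name).
    conn.execute(
        f"CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT, {columns})"
    )
    for msg_id, message in messages.items():
        flags = categories.get(msg_id, {})
        values = [int(msg_id), message] + [flags.get(n, 0) for n in names]
        placeholders = ", ".join("?" for _ in values)
        conn.execute(f"INSERT INTO messages VALUES ({placeholders})", values)
    conn.commit()
    return names


conn = sqlite3.connect(":memory:")
category_names = run_etl(MESSAGES_CSV, CATEGORIES_CSV, conn)
rows = conn.execute("SELECT * FROM messages ORDER BY id").fetchall()
```

The in-memory database keeps the example side-effect free; pointing `sqlite3.connect` at a file path would persist the table instead.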
- Python 3.5+
- NumPy, SciPy, Pandas, Scikit-Learn
- NLTK
- SQLAlchemy
- Pickle
- Flask, Plotly
Clone the git repository:
- Root Directory
  - data: Contains data files and data processing scripts.
    - disaster_categories.csv: Categories data file.
    - disaster_messages.csv: Messages data file.
    - DisasterResponse.db: SQLite database produced by the ETL pipeline.
    - process_data.py: ETL pipeline script to clean and process data.
  - models: Contains machine learning model scripts.
    - train_classifier.py: Script to train the classifier and save the model.
    - classifier.pkl: Trained classifier saved as a pickle file.
  - screenshots: Contains screenshots of the web app.
    - intro.png: Introduction screenshot.
    - sample_input.png: Sample input screenshot.
    - sample_output.png: Sample output screenshot.
    - main_page.png: Main page screenshot.
    - process_data.png: Process data screenshot.
    - train_classifier_data.png: Train classifier screenshot.
  - app: Contains the web application files.
    - run.py: Script to run the web app.
    - templates: HTML templates for the web app.
      - master.html: Main page template.
      - go.html: Classification result template.
  - README.md: Project README file.
- Run the ETL pipeline to clean the data and store the processed data in the database:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- Run the ML pipeline to load data from the database, train the classifier, and save the classifier as a pickle file:
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
- Run the web app from the app directory:
python run.py
Access the web app at:
- http://127.0.0.1:3001 (from the same machine)
- http://192.168.29.170:3001 (example address on the local network; yours will differ)
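
The ML pipeline step in the instructions above trains a classifier and saves it as a pickle file. A minimal sketch of that idea with Scikit-Learn might look like the following. It is not the code in `train_classifier.py`: the choice of `TfidfVectorizer` with `MultiOutputClassifier(LogisticRegression())`, the toy messages, and the two category columns are assumptions made for illustration, and the model is pickled to memory rather than to `models/classifier.pkl`.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy training data (hypothetical). Each message can belong to several
# categories at once, so the target has one 0/1 column per category.
messages = [
    "we need water and food",
    "people are trapped, send medical help",
    "no water for three days",
    "injured children need a doctor",
    "food supplies are running out",
    "medical team required urgently",
]
category_names = ["water_food", "medical"]  # assumed category columns
labels = [
    [1, 0],
    [0, 1],
    [1, 0],
    [0, 1],
    [1, 0],
    [0, 1],
]

# Text features followed by one binary classifier per category column.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(LogisticRegression())),
])
model.fit(messages, labels)

# Serialize and restore the fitted pipeline, as the training step does
# when it writes the pickle file that the web app later loads.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

# One row per input message, one column per category.
prediction = restored.predict(["we have no food"])
```

Keeping the vectorizer and classifier in a single `Pipeline` means the pickle contains everything needed to classify raw text, so the web app only has to load one object.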
M S Mohan Kumar
This project is licensed under the MIT License.
- Udacity for providing an excellent Data Science Nanodegree Program.