Disaster Response Pipeline Project


Table of Contents

  1. Description
  2. Instructions
    1. Dependencies
    2. Installation
    3. Executing Program
    4. Additional Material
  3. Important Files
  4. Authors
  5. License
  6. Acknowledgements

Description

This project is part of the Data Science Nanodegree Program by Udacity, in collaboration with Figure Eight. The dataset contains pre-labelled tweets and messages from real-life disaster events. The aim of the project is to build a Natural Language Processing (NLP) model that categorizes messages in real time.

The project is divided into the following sections:

  1. Data Processing: an ETL pipeline that extracts data from the source, cleans it, and saves it in a proper database structure
  2. Machine Learning Pipeline: trains a model able to classify text messages into categories
  3. Web App: shows model results in real time

Instructions

Dependencies

  • Python 3.5+
  • Machine Learning Libraries: NumPy, SciPy, Pandas, Scikit-Learn
  • Natural Language Processing Libraries: NLTK
  • SQLite Database Libraries: SQLAlchemy
  • Model Loading and Saving Libraries: Pickle, Joblib
  • Web App and Data Visualization: Flask, Plotly
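
The repository does not mention a requirements file, so installing the dependencies with pip is an assumption; a typical command would be:

  pip install numpy scipy pandas scikit-learn nltk sqlalchemy flask plotly joblib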

Installation

Clone the git repository:

git clone https://github.com/alirezakfz/Disaster_Response_Pipelines.git

Executing Program

  1. Run the following commands in the project's root directory to set up the database and model:

    • To run the ETL pipeline that cleans the data and stores it in the database:

      python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db

    • To run the ML pipeline that trains the classifier and saves it:

      python models/train_classifier.py data/DisasterResponse.db models/classifier.gzip

  2. Run the following command in the app's directory to launch the web app:

      python run.py

  3. Go to http://0.0.0.0:3001/
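
Once trained, the model can also be used directly for real-time classification outside the web app. Here is a minimal sketch, assuming the model was saved with joblib (as the .gzip extension of the training command suggests); the example message is made up:

  # Minimal sketch: load the trained model and classify one message.
  # Assumes the model was saved with joblib; the message is illustrative.
  import joblib

  model = joblib.load("models/classifier.gzip")
  prediction = model.predict(["We need water and medical supplies"])[0]
  print(prediction)  # one 0/1 flag per category column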

Additional Material

In the Notebook_Workspace folder you can find two Jupyter notebooks that will help you understand how the model works step by step:

  1. ETL Preparation Notebook: learn everything about the implemented ETL pipeline
  2. ML Pipeline Preparation Notebook: look at the Machine Learning Pipeline developed with NLTK and Scikit-Learn

You can use the ML Pipeline Preparation Notebook to re-train the model or tune it through the dedicated Grid Search section.
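
For orientation, a minimal sketch of such an NLTK + Scikit-Learn pipeline with a Grid Search step is shown below; the tokenizer, the estimator choice, and the grid values are illustrative assumptions, not the notebook's exact code:

  # Illustrative NLTK + scikit-learn pipeline with Grid Search; the
  # tokenizer, estimator choice, and grid values are assumptions.
  import nltk
  from nltk.stem import WordNetLemmatizer
  from nltk.tokenize import word_tokenize
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
  from sklearn.model_selection import GridSearchCV
  from sklearn.multioutput import MultiOutputClassifier
  from sklearn.pipeline import Pipeline

  nltk.download("punkt")
  nltk.download("wordnet")

  def tokenize(text):
      # Lowercase, tokenize, and lemmatize a raw message.
      lemmatizer = WordNetLemmatizer()
      return [lemmatizer.lemmatize(token.lower().strip())
              for token in word_tokenize(text)]

  pipeline = Pipeline([
      ("vect", CountVectorizer(tokenizer=tokenize)),
      ("tfidf", TfidfTransformer()),
      ("clf", MultiOutputClassifier(RandomForestClassifier())),
  ])

  # Grid Search over a small, illustrative parameter grid.
  parameters = {"clf__estimator__n_estimators": [50, 100]}
  cv = GridSearchCV(pipeline, param_grid=parameters, cv=3)
  # cv.fit(X_train, Y_train)  # X: message text, Y: one 0/1 column per category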

Important Files

app/templates/*: HTML templates for the web app

data/process_data.py: the Extract, Transform, Load (ETL) pipeline used for data cleaning, feature extraction, and storing the data in a SQLite database (a sketch follows this list)

models/train_classifier.py: a machine learning pipeline that loads the data, trains a model, and saves the trained model (models/classifier.gzip in the commands above) for later use

run.py: launches the Flask web app used to classify disaster messages
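
As a rough sketch of the ETL steps in data/process_data.py: the file paths match the run commands above, but the merge key, the semicolon-separated categories format, and the table name are assumptions based on the Figure Eight dataset layout, not the script's exact code:

  # Sketch of the ETL flow: extract the CSVs, clean the categories, load
  # into SQLite. Merge key, string format, and table name are assumptions.
  import pandas as pd
  from sqlalchemy import create_engine

  messages = pd.read_csv("data/disaster_messages.csv")
  categories = pd.read_csv("data/disaster_categories.csv")
  df = messages.merge(categories, on="id")

  # Expand the semicolon-separated "name-0/1" string into one column per category.
  cats = df["categories"].str.split(";", expand=True)
  cats.columns = [value.split("-")[0] for value in cats.iloc[0]]
  for column in cats.columns:
      cats[column] = cats[column].str[-1].astype(int)

  df = pd.concat([df.drop(columns="categories"), cats], axis=1).drop_duplicates()

  # Save the cleaned table to the SQLite database via SQLAlchemy.
  engine = create_engine("sqlite:///data/DisasterResponse.db")
  df.to_sql("DisasterResponse", engine, index=False, if_exists="replace")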

Authors

Alireza Khaksari

License

License: GPL v2

Acknowledgements

  • Udacity for providing an amazing Data Science Nanodegree Program
  • Figure Eight for providing the relevant dataset to train the model
