
Data Pipeline

made-with-python


Introduction: Three Experiments with Big Data

In this project, we develop a data pipeline to ingest, process, and store data so that it can be accessed through several different services.
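At a high level, the ingest → process → store flow can be sketched in plain Python. The stage names and the record shape below are illustrative only, not this project's actual code:

```python
# Minimal ingest -> process -> store sketch (illustrative only).

def ingest(rows):
    """Ingest raw records, e.g. lines pulled from a source bucket."""
    return [r.strip() for r in rows if r.strip()]

def process(records):
    """Parse each record into a structured dict, dropping malformed rows."""
    out = []
    for rec in records:
        parts = rec.split(",")
        if len(parts) == 2:
            out.append({"event": parts[0], "magnitude": float(parts[1])})
    return out

def store(events, sink):
    """Append processed events to a sink (a list stands in for a table)."""
    sink.extend(events)
    return len(events)

sink = []
raw = ["hail,2.5", "", "tornado,4.0", "badline"]
stored = store(process(ingest(raw)), sink)
print(stored)  # -> 2
```

In the real pipeline, the sink would be an S3 bucket, a BigQuery table, or a Snowflake stage rather than an in-memory list.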

Data Explanation

SEVIR: The Storm EVent ImagRy (SEVIR) dataset is a collection of temporally and spatially aligned images containing weather events captured by satellite and radar.

The dataset contains thousands of samples of 4-hour events captured by one or more of these weather sensors. The loop below shows one such event:

(Animation: SEVIR sample event)

Storm Events Database: The database currently contains data from January 1950 to November 2020, as entered by NOAA's National Weather Service (NWS). The data are available on the Registry of Open Data on AWS (see the dataset and the website).
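Storm Events records are distributed as CSV files, so they can be inspected with the standard library before wiring up the full pipeline. The column names and rows below are a made-up miniature, not the database's exact schema:

```python
import csv
import io

# Illustrative CSV snippet; real Storm Events files have many more columns.
sample = """EVENT_ID,STATE,EVENT_TYPE,BEGIN_DATE
10001,TEXAS,Hail,1950-01-03
10002,OHIO,Tornado,1950-01-03
10003,TEXAS,Flood,1950-02-11
"""

reader = csv.DictReader(io.StringIO(sample))
events = list(reader)

# Simple aggregation: count events per state.
counts = {}
for e in events:
    counts[e["STATE"]] = counts.get(e["STATE"], 0) + 1

print(counts)  # {'TEXAS': 2, 'OHIO': 1}
```

The same aggregation is what services such as Athena or BigQuery perform at scale once the files are loaded.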


Setup

  • Python 3.7+
  • Python IDE or code editor
  • Amazon S3 buckets
  • AWS Glue
  • Amazon Athena
  • Amazon QuickSight
  • Google Cloud Storage buckets
  • Google Dataflow
  • Google BigQuery
  • Google Data Studio
  • Snowflake
  • SQLAlchemy
  • Apache Superset
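The list above includes SQLAlchemy for loading processed records into a SQL backend such as Snowflake. As a dependency-free illustration of that staging step, here is a sketch using the standard-library sqlite3 module; the table name and columns are invented for this example:

```python
import sqlite3

# An in-memory database stands in for the warehouse connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE storm_events (event_id INTEGER, state TEXT, event_type TEXT)"
)

rows = [
    (10001, "TEXAS", "Hail"),
    (10002, "OHIO", "Tornado"),
]
conn.executemany("INSERT INTO storm_events VALUES (?, ?, ?)", rows)
conn.commit()

# Query back to verify the load.
count = conn.execute("SELECT COUNT(*) FROM storm_events").fetchone()[0]
print(count)  # -> 2
```

With SQLAlchemy, the same insert-then-verify pattern applies; only the connection URL and dialect change.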

Clone

Clone this repo to your local machine with `git clone https://github.com/goyal07nidhi/Data-Pipeline.git`

Folder Contents

Refer to the README.md inside each directory for setup instructions.

  • ✅ AWS S3: AWS
  • ✅ GCP - Dataflow, Datalab: GCP
  • ✅ SNOWFLAKE: SNOWFLAKE

Team Members:

  1. Nidhi Goyal
  2. Kanika Damodarsingh Negi
  3. Rishvita Reddy Bhumireddy