Crypto_Market_Streaming_Data_pipeline

The project aims to establish a streaming data pipeline for retrieving real-time cryptocurrency market data from market APIs, with current support for the Binance API. It leverages a widely adopted set of tools and frameworks, including Kafka, Spark, Cassandra, and Airflow.
Introduction

This project serves as a comprehensive guide to building an end-to-end data engineering pipeline. It covers each stage from data ingestion to processing and finally to storage, utilizing a robust tech stack that includes Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. Everything is containerized using Docker for ease of deployment and scalability.

Project architecture

[Architecture diagram]

  • Data Source: We use the Binance API to get the average price for six popular cryptocurrencies (see the sketches after this list).
  • Apache Airflow: Responsible for orchestrating the pipeline and storing fetched data in a PostgreSQL database.
  • Apache Kafka and Zookeeper: Used for streaming data from PostgreSQL to the processing engine.
  • Control Center and Schema Registry: Help with monitoring and schema management of our Kafka streams.
  • Apache Spark: For data processing with its master and worker nodes.
  • Cassandra: Where the processed data will be stored.
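
As a rough illustration of the ingestion step, the sketch below polls Binance's public /api/v3/avgPrice endpoint with plain Python and requests. It is not the project's actual Airflow task, and the symbol list is an assumption, since the README does not name the six currencies being tracked.

    import requests

    # Illustrative symbols only; the repository does not list the exact six
    # currencies it tracks.
    SYMBOLS = ["BTCUSDT", "ETHUSDT", "BNBUSDT", "XRPUSDT", "ADAUSDT", "SOLUSDT"]

    def fetch_average_prices():
        """Fetch the current average price for each symbol from the Binance REST API."""
        prices = {}
        for symbol in SYMBOLS:
            resp = requests.get(
                "https://api.binance.com/api/v3/avgPrice",
                params={"symbol": symbol},
                timeout=10,
            )
            resp.raise_for_status()
            prices[symbol] = float(resp.json()["price"])
        return prices

    if __name__ == "__main__":
        print(fetch_average_prices())

On the processing side, a minimal Spark Structured Streaming job could read the price records from Kafka and append each micro-batch to Cassandra through the Spark-Cassandra connector. This is only a sketch: the topic, keyspace, table, hostnames, and message schema below are assumptions for illustration; the real values live in the repository's Spark script and docker-compose configuration, and the Kafka and Cassandra connector packages must be available on Spark's classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

    # Hostnames, topic, keyspace, and table names below are hypothetical.
    spark = (
        SparkSession.builder
        .appName("crypto_market_stream")
        .config("spark.cassandra.connection.host", "cassandra")
        .getOrCreate()
    )

    # Assumed shape of the price messages produced upstream.
    schema = StructType([
        StructField("symbol", StringType()),
        StructField("price", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "crypto_prices")
        .load()
    )

    # Kafka values arrive as bytes; cast to string and parse the JSON payload.
    parsed = (
        raw.select(from_json(col("value").cast("string"), schema).alias("data"))
        .select("data.*")
    )

    def write_to_cassandra(batch_df, batch_id):
        # Append each micro-batch to Cassandra via the Spark-Cassandra connector.
        (
            batch_df.write
            .format("org.apache.spark.sql.cassandra")
            .options(keyspace="crypto", table="avg_prices")
            .mode("append")
            .save()
        )

    query = parsed.writeStream.foreachBatch(write_to_cassandra).start()
    query.awaitTermination()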

Technologies

  • Apache Airflow
  • Python
  • Apache Kafka
  • Apache Zookeeper
  • Apache Spark
  • Cassandra
  • PostgreSQL
  • Docker

Usage

  1. Clone the GitHub repository
    git clone https://github.com/nhattan040102/Crypto_Market_Streaming_Data_pipeline.git
  2. Navigate to the project directory
    cd Crypto_Market_Streaming_Data_pipeline
  3. Build the Airflow image from the Dockerfile
    docker build -t my_airflow_img .
  4. Run Docker Compose to start the services (a verification sketch follows these steps)
    docker compose up -d --build
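
After the containers are up, one quick way to check that processed records are landing in Cassandra is to query it from Python with the cassandra-driver package. This is a hedged sketch: the keyspace and table names are assumptions that must be adjusted to the project's actual Cassandra schema, and port 9042 has to be exposed by the docker-compose setup for a connection from the host to work.

    from cassandra.cluster import Cluster

    # Keyspace and table names are hypothetical; match them to the project's schema.
    cluster = Cluster(["localhost"], port=9042)
    session = cluster.connect("crypto")

    for row in session.execute("SELECT * FROM avg_prices LIMIT 10"):
        print(row)

    cluster.shutdown()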
