NBA ETL Project

Overview

This NBA ETL (Extract, Transform, Load) project automates the collection, processing, and storage of NBA data for analysis. It demonstrates data-source integration, efficient processing, and a scalable pipeline architecture, making it a practical foundation for sports analytics and data engineering work.

Key Objectives:

  • Automate real-time NBA data collection.
  • Enable data-driven decision-making with clean datasets.
  • Demonstrate data engineering practices such as workflow orchestration, containerization, and data warehousing.

Features

  • Data Extraction: Collects NBA data, including team rosters, player profiles, draft history, and game box scores.
  • Data Transformation: Cleanses and standardizes raw data for consistency and downstream analysis.
  • Data Loading: Loads processed data into a database for easy access and analysis.
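The three stages above can be sketched as a minimal pipeline. This is an illustrative sketch rather than the project's actual code: the field names, the sample box-score records, and the use of in-memory SQLite as a stand-in for the real database are all assumptions.

```python
import sqlite3

def extract():
    # Stand-in for an API call; real extraction would hit an NBA stats endpoint.
    return [
        {"PLAYER_NAME": "  Jayson Tatum ", "PTS": "30", "TEAM": "BOS"},
        {"PLAYER_NAME": "Luka Doncic", "PTS": "32", "TEAM": "DAL"},
    ]

def transform(rows):
    # Cleanse and standardize: trim whitespace, cast points to integers,
    # and normalize keys to lowercase.
    return [
        {"player": r["PLAYER_NAME"].strip(), "pts": int(r["PTS"]), "team": r["TEAM"]}
        for r in rows
    ]

def load(rows, conn):
    # Load the cleaned rows into a relational table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS box_scores (player TEXT, pts INTEGER, team TEXT)"
    )
    conn.executemany("INSERT INTO box_scores VALUES (:player, :pts, :team)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
load(transform(extract()), conn)
```

In the real pipeline each stage would live in its own script so Airflow can schedule and retry them independently.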

Project Structure

  • DAGs: Contains Airflow DAGs for scheduling ETL processes.
  • Scripts: Python scripts for extracting and processing datasets.
  • Docker: Docker configurations for reproducible environments.
  • Database: Schema definitions and SQL scripts for setting up the data warehouse.
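The pieces fit together through Airflow: the DAGs run the extract, transform, and load scripts in dependency order. That ordering can be sketched with the standard library's `graphlib` (the task names here are illustrative, not the project's actual DAG tasks):

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on,
# mirroring extract >> transform >> load in an Airflow DAG.
dag = {
    "transform": {"extract"},
    "load": {"transform"},
}

# static_order() yields tasks in a valid execution order.
order = list(TopologicalSorter(dag).static_order())
print(order)  # extract runs first, load last
```

Airflow resolves the same kind of dependency graph, but adds scheduling, retries, and the web UI on top.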

Installation and Setup

Prerequisites:

  • Python 3.x
  • Docker
  • PostgreSQL (or another relational database)

Steps:

  1. Clone the repository:

    git clone https://github.com/joshc3453/NBA_ETL_Project.git
    cd NBA_ETL_Project
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Set up Docker containers:

    docker-compose up
    
  4. Initialize the database:

    • Run SQL scripts to create necessary tables and schemas.
    • Alternatively, use Docker to set up the database automatically.
  5. Run the ETL pipeline:

    • Trigger the Airflow DAGs to start the ETL process.
    • Monitor workflows via the Airflow web interface.
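Step 4 can be sketched as follows. The table names and columns are illustrative assumptions (the project's SQL scripts define the real schema), and in-memory SQLite stands in for PostgreSQL so the sketch is self-contained:

```python
import sqlite3

# Illustrative schema only; the project's SQL scripts define the actual tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS teams (
    team_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    city      TEXT
);
CREATE TABLE IF NOT EXISTS players (
    player_id INTEGER PRIMARY KEY,
    team_id   INTEGER REFERENCES teams(team_id),
    name      TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL
conn.executescript(SCHEMA)

# Verify the tables were created.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

Against PostgreSQL the same idea applies: run the schema scripts once before the first DAG run, or let the Docker setup apply them automatically.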
