Build a data warehouse from scratch, including full load, daily incremental load, design schema, SCD Type 1 and 2.
Updated Feb 1, 2023 - Python
This project simulates a real-world enterprise data migration and modernization strategy. It extracts transactional data from a simulated "on-premise" environment (hosted on AWS EC2), performs heavy distributed processing on a Hadoop/Spark cluster, and ultimately serves the data through a cloud-native, serverless architecture to optimize costs.
A data warehousing project for retail sales using dimensional modeling best practices with SCD Type 2 on AWS Redshift. It uses AWS Lambda, Glue Workflows, and Python Shell jobs to create and automate an ELT pipeline in which batch data arriving in S3 is loaded into Redshift and transformed to meet requirements.
End-to-end Ride Sharing Data Engineering project using PySpark, Delta Lake, Databricks Structured Streaming, dbt, SCD Type 2 Snapshots, and Dimensional Modeling.
End-to-end data lakehouse on Azure Databricks — Medallion Architecture, Star Schema, SCD Type 2, CI/CD, Data Quality Framework
End-to-end data engineering project using AWS S3, Snowflake, and dbt to implement Medallion Architecture with SCD Type 1 & Type 2 logic on Walmart sales data, followed by analytical visualizations using Seaborn and Plotly.
Enterprise-grade Microsoft Fabric Lakehouse project implementing Medallion architecture (Bronze, Silver, Gold) for financial transactio
A focused dbt Core project demonstrating proficiency in the dbt workflow, from raw source transformation to a final analytics-ready table. Features snapshots, tests, macros, and models.
End-to-end Azure stock market analytics pipeline - ADF ForEach fetches daily OHLCV via Alpha Vantage API for MSFT/AAPL/GOOGL, implements SCD Type 2 with Delta MERGE, Delta time travel, incremental loading and PySpark broadcast variables.
End-to-end AWS data engineering pipeline using SQL Server, AWS Glue, S3 Delta Lake, Streamlit, CloudWatch observability, and GitHub Actions CI/CD.
Employee wellbeing data warehouse — Snowflake star schema · SCD Type 2 · 1.4M+ rows · 16 tables
Databricks Lakehouse | Medallion Architecture | SCD Type 2 | Auto Loader | Unity Catalog | PySpark | Workflows | Genie AI/BI
E-Commerce Data Warehouse with SCD & PySpark
Enterprise security data warehouse with 3-layer architecture processing 10M+ records daily from 15+ security services. Snowflake | Python | Streamlit | 98% Automation | 46K Annual Savings
E-commerce analytics pipeline using Snowflake and dbt: staging, intermediate, marts, SCD Type 2 snapshots
Databricks Lakehouse | CRM + ERP Integration | Medallion Architecture | SCD Type 2 | Star Schema | Power BI Direct Lake
End-to-End Retail BI Solution: Automated ETL pipeline transforming raw transaction data into a Star Schema Data Warehouse using SQL Server, Python, and SCD Type 2 logic.
Production-style Slowly Changing Dimension (SCD Type 2) pipeline built with Snowflake, dbt, and AWS S3. Demonstrates secure S3 ingestion, layered bronze/silver/gold modeling, dbt snapshots for historical tracking, and analytics-ready views identifying active vs historical records.
A production-grade Modern Data Stack (MDS) implementation featuring automated ELT, SCD Type 2 history tracking, and CI/CD quality guardrails using Dagster, dbt Core, DuckDB, and Soda.
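Most of the projects listed above rely on SCD Type 2, which keeps history by closing out the current dimension row and adding a new one whenever a tracked attribute changes (SCD Type 1 would overwrite the value in place instead). The following is a minimal in-memory sketch of that idea in plain Python, not the method of any listed repository. The names `DimRow`, `apply_scd2`, `customer_id`, `city`, `valid_from`, `valid_to`, and `is_current` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a customer in a hypothetical dimension table."""
    customer_id: int
    city: str                       # the tracked attribute
    valid_from: date
    valid_to: Optional[date] = None  # None means "still open"
    is_current: bool = True


def apply_scd2(dim: List[DimRow], incoming: Dict[int, str], load_date: date) -> List[DimRow]:
    """Return a new dimension with SCD Type 2 history applied.

    `incoming` maps customer_id to the latest city seen in the source.
    """
    current = {row.customer_id: row for row in dim if row.is_current}
    result: List[DimRow] = []

    # Step 1: expire current rows whose tracked attribute changed.
    for row in dim:
        new_city = incoming.get(row.customer_id)
        if row.is_current and new_city is not None and new_city != row.city:
            result.append(replace(row, valid_to=load_date, is_current=False))
        else:
            result.append(row)

    # Step 2: add a fresh current row for new or changed keys.
    for customer_id, city in incoming.items():
        old = current.get(customer_id)
        if old is None or old.city != city:
            result.append(DimRow(customer_id, city, valid_from=load_date))

    return result


if __name__ == "__main__":
    dim = [
        DimRow(1, "Oslo", date(2023, 1, 1)),
        DimRow(2, "Rome", date(2023, 1, 1)),
    ]
    updated = apply_scd2(dim, {1: "Bergen", 2: "Rome", 3: "Lima"}, date(2023, 2, 1))
    for row in updated:
        print(row)
```

In a warehouse the same two steps are usually expressed as a single MERGE statement or a dbt snapshot; the sketch only shows the row-versioning logic itself.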