Welcome to the Data Warehouse and Analytics Project repository! 🚀
This project demonstrates how to build a complete data warehouse and analytics solution using SQL (MySQL Workbench), following the Medallion Architecture (Bronze, Silver, Gold) and applying ETL best practices.
This project follows the Medallion Architecture with three layers:
-
Bronze Layer: Stores raw data ingested from CSV source files (ERP and CRM systems) into the staging area as-is.
-
Silver Layer: Cleansed and standardized data. Includes removing duplicates, handling nulls, applying transformations, and building normalized tables.
-
Gold Layer: Business-ready, analytical models in a Star Schema with dimension and fact tables. This layer supports reporting and dashboards.
This project showcases:
- Data Architecture: Designing a data warehouse using the Medallion approach.
- ETL Pipelines: Loading and transforming raw CSV data into the data warehouse.
- Data Modeling: Creating fact and dimension tables for analytics.
- Analytics & Reporting: Querying business metrics such as sales, product performance, and customer insights.
This repository is perfect for demonstrating skills in:
- SQL Development (MySQL)
- Data Engineering & ETL
- Data Modeling (Star Schema)
- Analytics & Reporting
- Data Architecture Design
- MySQL Workbench: For database design, ETL scripts, and transformations.
- CSV Datasets: ERP and CRM sales data used as raw inputs.
- Draw.io (diagrams): For architecture and data modeling diagrams.
- Git & GitHub: Version control and project hosting.
-
Data Sources: Two systems:
- ERP System (Product, Sales details, Location data)
- CRM System (Customer information)
-
ETL:
- Load data into Bronze Layer (raw tables).
- Cleanse, standardize, and integrate data in Silver Layer.
- Create Star Schema models (Fact & Dimensions) in Gold Layer.
-
Data Quality:
- Remove duplicates, fill missing values, and unify reference data.
data-warehouse-project/
│
├── datasets/ # Raw datasets (ERP and CRM data)
│
├── docs/ # Documentation & diagrams
│ ├──DW - data_architecture.png # Data warehouse architecture diagram
│ ├── DW -data_integration.png
│ ├── ETL2.png # ETL data flow diagram
│ ├── connecting tables.png )
│
│
├── scripts/ # SQL scripts for each layer
│ ├── bronze/ # Raw load scripts
│ ├── silver/ # Cleansing & transformation scripts
│ ├── gold/ # Analytical model (dimension & fact tables)
│ └── gold_views.sql # Final views for gold layer
│
├── tests/ # Data quality & validation scripts
│
├── README.md # Project documentation (this file)
The Gold Layer consists of:
gold_dim_customers
– Customer dimension (demographics, location, etc.)gold_dim_products
– Product dimension (category, cost, product line)gold_fact_sales
– Fact table storing sales transactions
Using the Gold Layer, you can answer questions like:
- Who are our top customers by revenue?
- Which products and categories perform best?
- What are the monthly sales trends?
- Which countries or regions contribute the most sales?
This project is licensed under the MIT License. You are free to use, modify, and share it with attribution.