Skip to content

πŸ“š An End-to-End Advanced SQL Project covering Data Warehousing, ETL Pipeline (Bronze β†’ Silver β†’ Gold), Star Schema Modeling, EDA, and Advanced SQL Analytics. Built using PostgreSQL, this project simulates a real-world Data Engineering + Data Analytics workflow using raw ERP & CRM data to generate production-ready customer and product insights.

License

Notifications You must be signed in to change notification settings

Harsh-Belekar/Advance-SQL-Project-Data-Warehousing-EDA-Advanced-Data-Analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

4 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ“š Advance SQL Project

Data Warehousing + EDA + Advanced Data Analytics

A complete, end-to-end SQL-driven Data Warehouse & Analytics Project, covering:

  • Data Modeling & ETL Pipeline (Bronze β†’ Silver β†’ Gold)
  • Data Cleaning, Standardization & Transformation
  • EDA + Advanced SQL Analytics
  • Customer & Product Reporting
  • Business Insights & KPI Generation

This project simulates a real industry-grade data engineering + business analytics workflow, starting from raw ERP & CRM datasets and delivering production-ready insights using SQL.

Tool Tool Process Process Feature Feature Domain Type Status


πŸ” Summary

This project builds a complete SQL Data Warehouse (Bronze β†’ Silver β†’ Gold) and performs advanced EDA + analytics to generate customer and product insights for a retail business.


πŸ“š Table of Contents


🧩 Business Problem

πŸͺ Retail Company Issue:
Data scattered across ERP & CRM β†’ no unified reporting β†’ poor insights β†’ inconsistent decisions.

You built a scalable Data Warehouse + Analytics System to solve:

  • Scattered data
  • Inconsistent formats
  • No single source of truth
  • No customer or product performance tracking
  • No advanced reporting

The result is a clean, scalable, analytics-ready warehouse.


πŸš€ Project Overview

This project demonstrates how to build and analyze a data warehouse environment using SQL.
It includes:

βœ… 1. Data Warehouse Development

  • Data ingestion (Bronze layer)
  • Data cleaning & harmonization (Silver layer)
  • Business modeling & fact/dimension tables (Gold layer)
  • Stored procedures for ETL
  • Data quality testing
  • Documentation (data model, flow diagrams, architecture)

βœ… 2. EDA + Advanced SQL Data Analysis

Using advanced SQL analytics techniques:

  • Ranking
  • Segmentation
  • Cumulative metrics
  • Change-over-time analysis
  • Performance metrics
  • Customer & Product reports
  • KPI calculations
  • Exploratory Data Analysis insights

βœ… 3. Advanced Reporting

SQL-based dashboards & reports:

  • Customer Analytics Report
  • Product Performance Report

πŸ—οΈ Project Architecture & Diagrams

πŸ“Œ Overall Architecture

Architecture

πŸ•Έ Mesh Architecture Layers

Mesh Architecture

πŸ”— Data Integration Workflow

Data Integration

πŸ”„ Data Flow Diagram

Data Flow

🧩 Star Schema Data Model

Data Model


πŸ—οΈ Tech Stack

Layer Tools Used
Data Warehouse PostgreSQL / SQL
Data Modeling Star Schema, Dimensional Modeling
ETL Pipeline SQL Stored Procedures
EDA & Analytics SQL (Window functions, Aggregations, CTEs)
Documentation Markdown, PNG diagrams

🧠 Key Skills Demonstrated

  • Advanced SQL (Window Functions, CTEs, Ranking Functions)
  • Data Warehousing (Bronze–Silver–Gold architecture)
  • ETL Pipeline Development
  • Fact & Dimension Modeling
  • Data Cleaning & Standardization
  • Analytical Reporting & KPI Design
  • Data Architecture Documentation

πŸ—‚οΈ Project Folder Structure

Advance-SQL-Project-Data-Warehousing-EDA-Advanced-Data-Analytics/
β”‚
│── πŸ“„ README.md                               ← Main Project Documentation
│── πŸ“‘ LICENSE                                  ← License for Project
β”‚
β”œβ”€β”€ 🧱 Data Warehouse/
β”‚   β”‚
β”‚   β”œβ”€β”€ scripts/                                ← ETL Scripts for Bronze β†’ Silver β†’ Gold
β”‚   β”‚   β”œβ”€β”€ bronze/
β”‚   β”‚   β”‚   β”œβ”€β”€ ddl_bronze.sql                  ← Create Bronze Layer Tables
β”‚   β”‚   β”‚   └── proc_load_bronze.sql            ← Load Raw ERP + CRM Data into Bronze
β”‚   β”‚   β”‚
β”‚   β”‚   β”œβ”€β”€ silver/
β”‚   β”‚   β”‚   β”œβ”€β”€ ddl_silver.sql                  ← Create Cleaned Silver Layer Tables
β”‚   β”‚   β”‚   └── proc_load_silver.sql            ← Transform Bronze β†’ Silver
β”‚   β”‚   β”‚
β”‚   β”‚   └── gold/
β”‚   β”‚       └── ddl_gold.sql                    ← Create Final Fact & Dimensions (DW)
β”‚   β”‚
β”‚   β”œβ”€β”€ tests/                                  ← Data Quality & Validation Scripts
β”‚   β”‚   β”œβ”€β”€ quality_checks_gold.sql             ← Gold Layer Validation Tests
β”‚   β”‚   └── quality_checks_silver.sql           ← Silver Layer Validation Tests
β”‚   β”‚
β”‚   β”œβ”€β”€ docs/                                   ← Architecture, Models & Pipeline Diagrams
β”‚   β”‚   β”œβ”€β”€ Analysing Source System.png         ← Source System Exploration
β”‚   β”‚   β”œβ”€β”€ data_architecture.png               ← Full Data Architecture Overview
β”‚   β”‚   β”œβ”€β”€ data_catalog.md                     ← Documentation for All Tables & Columns
β”‚   β”‚   β”œβ”€β”€ data_flow.png                       ← End-to-End Data Flow Diagram
β”‚   β”‚   β”œβ”€β”€ data_integration.png                ← ERP + CRM Integration Overview
β”‚   β”‚   β”œβ”€β”€ data_layers.pdf                     ← Bronze, Silver, Gold Explanation
β”‚   β”‚   β”œβ”€β”€ data_model.png                      ← Data Warehouse Star Schema
β”‚   β”‚   β”œβ”€β”€ ETL.png                             ← ETL Pipeline Overview
β”‚   β”‚   β”œβ”€β”€ Mesh_Architecture_Layers.png        ← Data Mesh Architecture Layers
β”‚   β”‚   └── naming_conventions.md               ← Standards for Naming Tables & Columns
β”‚   β”‚
β”‚   └── row_dataset/                            ← Raw ERP & CRM Source System Data
β”‚       β”œβ”€β”€ source_erp/
β”‚       β”‚   β”œβ”€β”€ CUST_AZ12.csv                   ← ERP Customer Data
β”‚       β”‚   β”œβ”€β”€ LOC_A101.csv                    ← ERP Location Data
β”‚       β”‚   └── PX_CAT_G1V2.csv                 ← ERP Product/Category Data
β”‚       β”‚
β”‚       └── source_crm/
β”‚           β”œβ”€β”€ cust_info.csv                   ← CRM Customer Info
β”‚           β”œβ”€β”€ prd_info.csv                    ← CRM Product Info
β”‚           └── sales_details.csv               ← CRM Sales Transactions
β”‚
└── πŸ“Š EDA + Advanced Data Analysis/
    β”‚
    β”œβ”€β”€ Data Analysis .png                       ← EDA Output Summary Diagram
    β”‚
    β”œβ”€β”€ scripts/                                 ← All SQL Scripts for Analysis
    β”‚   β”œβ”€β”€ 00_init_database.sql                 ← Initialize Analysis Schema
    β”‚   β”œβ”€β”€ 01_database_exploration.sql          ← Explore Tables & Metadata
    β”‚   β”œβ”€β”€ 02_dimensions_exploration.sql        ← Explore Dimension Tables
    β”‚   β”œβ”€β”€ 03_date_range_exploration.sql        ← Explore Date Ranges
    β”‚   β”œβ”€β”€ 04_measures_exploration.sql          ← Explore Key Business Metrics
    β”‚   β”œβ”€β”€ 05_magnitude_analysis.sql            ← Magnitude-Level Analysis
    β”‚   β”œβ”€β”€ 06_ranking_analysis.sql              ← Ranking & Ordering Analysis
    β”‚   β”œβ”€β”€ 07_change_over_time_analysis.sql     ← Trend + Time-Based Analysis
    β”‚   β”œβ”€β”€ 08_cumulative_analysis.sql           ← Running Totals & Rolling Sums
    β”‚   β”œβ”€β”€ 09_performance_analysis.sql          ← Performance & KPI Insights
    β”‚   β”œβ”€β”€ 10_part_to_whole_analysis.sql        ← Proportional Contribution Analysis
    β”‚   β”œβ”€β”€ 11_data_segmentation.sql             ← Customer & Product Segmentation
    β”‚   β”œβ”€β”€ 12_report_customers.sql              ← Generate Customer Report (Gold Layer)
    β”‚   └── 12_report_products.sql               ← Generate Product Report (Gold Layer)
    β”‚
    └── dataset/                                 ← Output Reports from Gold Layer
        β”œβ”€β”€ gold.dim_customers.csv               ← Cleaned Customer Dimension
        β”œβ”€β”€ gold.dim_products.csv                ← Cleaned Product Dimension
        β”œβ”€β”€ gold.fact_sales.csv                  ← Cleaned Fact Sales Table
        β”œβ”€β”€ gold.report_customers.csv            ← Final Customer Analytics Report
        └── gold.report_products.csv             ← Final Product Analytics Report

πŸ› οΈ Key Features

🟩 Data Warehouse (Bronze β†’ Silver β†’ Gold)

  • Raw data ingestion
  • Data profiling
  • Standardization & validation
  • Star schema modeling
  • Automated ETL procedures
  • Data quality tests
  • Complete documentation

🟦 EDA + Advanced SQL Analytics

  • Dimension exploration
  • Measures analysis
  • Ranking, segmentation
  • Time-series & cumulative trends
  • KPI calculations (Recency, AOV, Monthly Spend, etc.)
  • Customer & Product performance reports

πŸ“Š Reports Generated

πŸ“˜ Customer Report

Includes:

  • Customer segments (VIP, Regular, New)
  • Recency
  • Lifespan
  • Total orders, products, quantity, sales
  • Avg order value
  • Avg monthly spend

πŸ“™ Product Report

Includes:

  • Product segments (High Performer / Mid Range / Low Performer)
  • Recency
  • Lifespan
  • Unique customers
  • Avg selling price
  • Monthly revenue

🧬 Data Architecture Flow

RAW (ERP + CRM)
      ↓
BRONZE β†’ Clean storage
      ↓
SILVER β†’ Harmonized & Enriched Data
      ↓
GOLD β†’ Final Fact + Dimension Tables
      ↓
ANALYTICS β†’ Reports, KPIs, Dashboards

πŸ“ Important Files

πŸ”Ή Data Warehouse Scripts

bronze/
    ddl_bronze.sql
    proc_load_bronze.sql
silver/
    ddl_silver.sql
    proc_load_silver.sql
gold/
    ddl_gold.sql

πŸ”Ή EDA SQL Scripts

00_init_database.sql  
01_database_exploration.sql  
02_dimensions_exploration.sql  
...  
12_report_customers.sql  
12_report_products.sql 

πŸ“ˆ Key Outcomes

  • Built a fully functional SQL Data Warehouse
  • Designed & implemented ETL pipelines
  • Performed advanced SQL analytics
  • Designed star schema (Fact + Dimensions)
  • Developed customer & product analytical reports
  • Demonstrated real-world Data Engineer + Analyst workflow

πŸŽ“ What I Will Learn

  • How to design a Data Warehouse from scratch
  • How to build ETL pipelines (Bronze β†’ Silver β†’ Gold)
  • How to clean & transform raw data
  • How to write advanced SQL analytical scripts
  • How to generate customer & product insights using SQL
  • How to document a real-world data engineering project

πŸ“₯ Clone This Repository

git clone https://github.com/Harsh-Belekar/Advance-SQL-Project-Data-Warehousing-EDA-Advanced-Data-Analytics.git
cd Advance-SQL-Project-Data-Warehousing-EDA-Advanced-Data-Analytics

🏁 How to Run This Project

1. Initialize the Database

00_init_database.sql

2. Load Bronze Layer

proc_load_bronze.sql

3. Load Silver Layer

proc_load_silver.sql

4. Create Gold Layer

ddl_gold.sql

5. Run Analysis Scripts (00 β†’ 12)

EDA + Advanced Data Analysis/scripts/

⭐ Project Highlights (for Resume / Portfolio)

  • Real-world Data Engineering + Analytics workflow
  • End-to-end SQL project (Data Warehousing + EDA + Advanced Data Analysis)
  • Realistic ETL + Data Modeling experience
  • Clean architecture & documentation
  • Retail analytics insights
  • Strong Analytics + Business Insights generation
  • Showcases SQL expertise at scale

πŸ“‘ License

MIT License β€” see LICENSE file.


⚠️ Dataset Disclaimer

All datasets used are dummy, synthetic, or public, intended only for learning and portfolio demonstration.
No real customer or company data is used.


πŸ§‘β€πŸ’» Author

πŸ‘€ Harsh Belekar
πŸ“ Data Analyst | Python Developer | SQL | Power BI | Excel | Data Visualization
πŸ“¬ LinkedIn | πŸ”—GitHub

πŸ“§ harshbelekar74@gmail.com


⭐ If you found this project helpful, feel free to star the repo and connect with me for collaboration!

About

πŸ“š An End-to-End Advanced SQL Project covering Data Warehousing, ETL Pipeline (Bronze β†’ Silver β†’ Gold), Star Schema Modeling, EDA, and Advanced SQL Analytics. Built using PostgreSQL, this project simulates a real-world Data Engineering + Data Analytics workflow using raw ERP & CRM data to generate production-ready customer and product insights.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published