A complete, end-to-end SQL-driven Data Warehouse & Analytics Project, covering:
- Data Modeling & ETL Pipeline (Bronze β Silver β Gold)
- Data Cleaning, Standardization & Transformation
- EDA + Advanced SQL Analytics
- Customer & Product Reporting
- Business Insights & KPI Generation
This project simulates a real industry-grade data engineering + business analytics workflow, starting from raw ERP & CRM datasets and delivering production-ready insights using SQL.
This project builds a complete SQL Data Warehouse (Bronze β Silver β Gold) and performs advanced EDA + analytics to generate customer and product insights for a retail business.
- π§© Business Problem
- π Project Overview
- ποΈ Project Architecture & Diagrams
- ποΈ Tech Stack
- π§ Key Skills Demonstrated
- ποΈ Project Folder Structure
- π οΈ Key Features
- π Reports Generated
- 𧬠Data Architecture Flow
- π Important Files
- π Key Outcomes
- π What I Will Learn
- π₯ Clone This Repository
- π How to Run This Project
- β Project Highlights (for Resume / Portfolio)
- π License
β οΈ Dataset Disclaimer- π§βπ» Author
πͺ Retail Company Issue:
Data scattered across ERP & CRM β no unified reporting β poor insights β inconsistent decisions.
You built a scalable Data Warehouse + Analytics System to solve:
- Scattered data
- Inconsistent formats
- No single source of truth
- No customer or product performance tracking
- No advanced reporting
The result is a clean, scalable, analytics-ready warehouse.
This project demonstrates how to build and analyze a data warehouse environment using SQL.
It includes:
- Data ingestion (Bronze layer)
- Data cleaning & harmonization (Silver layer)
- Business modeling & fact/dimension tables (Gold layer)
- Stored procedures for ETL
- Data quality testing
- Documentation (data model, flow diagrams, architecture)
Using advanced SQL analytics techniques:
- Ranking
- Segmentation
- Cumulative metrics
- Change-over-time analysis
- Performance metrics
- Customer & Product reports
- KPI calculations
- Exploratory Data Analysis insights
SQL-based dashboards & reports:
- Customer Analytics Report
- Product Performance Report
| Layer | Tools Used |
|---|---|
| Data Warehouse | PostgreSQL / SQL |
| Data Modeling | Star Schema, Dimensional Modeling |
| ETL Pipeline | SQL Stored Procedures |
| EDA & Analytics | SQL (Window functions, Aggregations, CTEs) |
| Documentation | Markdown, PNG diagrams |
- Advanced SQL (Window Functions, CTEs, Ranking Functions)
- Data Warehousing (BronzeβSilverβGold architecture)
- ETL Pipeline Development
- Fact & Dimension Modeling
- Data Cleaning & Standardization
- Analytical Reporting & KPI Design
- Data Architecture Documentation
Advance-SQL-Project-Data-Warehousing-EDA-Advanced-Data-Analytics/
β
βββ π README.md β Main Project Documentation
βββ π LICENSE β License for Project
β
βββ π§± Data Warehouse/
β β
β βββ scripts/ β ETL Scripts for Bronze β Silver β Gold
β β βββ bronze/
β β β βββ ddl_bronze.sql β Create Bronze Layer Tables
β β β βββ proc_load_bronze.sql β Load Raw ERP + CRM Data into Bronze
β β β
β β βββ silver/
β β β βββ ddl_silver.sql β Create Cleaned Silver Layer Tables
β β β βββ proc_load_silver.sql β Transform Bronze β Silver
β β β
β β βββ gold/
β β βββ ddl_gold.sql β Create Final Fact & Dimensions (DW)
β β
β βββ tests/ β Data Quality & Validation Scripts
β β βββ quality_checks_gold.sql β Gold Layer Validation Tests
β β βββ quality_checks_silver.sql β Silver Layer Validation Tests
β β
β βββ docs/ β Architecture, Models & Pipeline Diagrams
β β βββ Analysing Source System.png β Source System Exploration
β β βββ data_architecture.png β Full Data Architecture Overview
β β βββ data_catalog.md β Documentation for All Tables & Columns
β β βββ data_flow.png β End-to-End Data Flow Diagram
β β βββ data_integration.png β ERP + CRM Integration Overview
β β βββ data_layers.pdf β Bronze, Silver, Gold Explanation
β β βββ data_model.png β Data Warehouse Star Schema
β β βββ ETL.png β ETL Pipeline Overview
β β βββ Mesh_Architecture_Layers.png β Data Mesh Architecture Layers
β β βββ naming_conventions.md β Standards for Naming Tables & Columns
β β
β βββ row_dataset/ β Raw ERP & CRM Source System Data
β βββ source_erp/
β β βββ CUST_AZ12.csv β ERP Customer Data
β β βββ LOC_A101.csv β ERP Location Data
β β βββ PX_CAT_G1V2.csv β ERP Product/Category Data
β β
β βββ source_crm/
β βββ cust_info.csv β CRM Customer Info
β βββ prd_info.csv β CRM Product Info
β βββ sales_details.csv β CRM Sales Transactions
β
βββ π EDA + Advanced Data Analysis/
β
βββ Data Analysis .png β EDA Output Summary Diagram
β
βββ scripts/ β All SQL Scripts for Analysis
β βββ 00_init_database.sql β Initialize Analysis Schema
β βββ 01_database_exploration.sql β Explore Tables & Metadata
β βββ 02_dimensions_exploration.sql β Explore Dimension Tables
β βββ 03_date_range_exploration.sql β Explore Date Ranges
β βββ 04_measures_exploration.sql β Explore Key Business Metrics
β βββ 05_magnitude_analysis.sql β Magnitude-Level Analysis
β βββ 06_ranking_analysis.sql β Ranking & Ordering Analysis
β βββ 07_change_over_time_analysis.sql β Trend + Time-Based Analysis
β βββ 08_cumulative_analysis.sql β Running Totals & Rolling Sums
β βββ 09_performance_analysis.sql β Performance & KPI Insights
β βββ 10_part_to_whole_analysis.sql β Proportional Contribution Analysis
β βββ 11_data_segmentation.sql β Customer & Product Segmentation
β βββ 12_report_customers.sql β Generate Customer Report (Gold Layer)
β βββ 12_report_products.sql β Generate Product Report (Gold Layer)
β
βββ dataset/ β Output Reports from Gold Layer
βββ gold.dim_customers.csv β Cleaned Customer Dimension
βββ gold.dim_products.csv β Cleaned Product Dimension
βββ gold.fact_sales.csv β Cleaned Fact Sales Table
βββ gold.report_customers.csv β Final Customer Analytics Report
βββ gold.report_products.csv β Final Product Analytics Report
- Raw data ingestion
- Data profiling
- Standardization & validation
- Star schema modeling
- Automated ETL procedures
- Data quality tests
- Complete documentation
- Dimension exploration
- Measures analysis
- Ranking, segmentation
- Time-series & cumulative trends
- KPI calculations (Recency, AOV, Monthly Spend, etc.)
- Customer & Product performance reports
Includes:
- Customer segments (VIP, Regular, New)
- Recency
- Lifespan
- Total orders, products, quantity, sales
- Avg order value
- Avg monthly spend
Includes:
- Product segments (High Performer / Mid Range / Low Performer)
- Recency
- Lifespan
- Unique customers
- Avg selling price
- Monthly revenue
RAW (ERP + CRM)
β
BRONZE β Clean storage
β
SILVER β Harmonized & Enriched Data
β
GOLD β Final Fact + Dimension Tables
β
ANALYTICS β Reports, KPIs, Dashboards
bronze/
ddl_bronze.sql
proc_load_bronze.sql
silver/
ddl_silver.sql
proc_load_silver.sql
gold/
ddl_gold.sql
00_init_database.sql
01_database_exploration.sql
02_dimensions_exploration.sql
...
12_report_customers.sql
12_report_products.sql
- Built a fully functional SQL Data Warehouse
- Designed & implemented ETL pipelines
- Performed advanced SQL analytics
- Designed star schema (Fact + Dimensions)
- Developed customer & product analytical reports
- Demonstrated real-world Data Engineer + Analyst workflow
- How to design a Data Warehouse from scratch
- How to build ETL pipelines (Bronze β Silver β Gold)
- How to clean & transform raw data
- How to write advanced SQL analytical scripts
- How to generate customer & product insights using SQL
- How to document a real-world data engineering project
git clone https://github.com/Harsh-Belekar/Advance-SQL-Project-Data-Warehousing-EDA-Advanced-Data-Analytics.git
cd Advance-SQL-Project-Data-Warehousing-EDA-Advanced-Data-Analytics
00_init_database.sql
proc_load_bronze.sql
proc_load_silver.sql
ddl_gold.sql
EDA + Advanced Data Analysis/scripts/
- Real-world Data Engineering + Analytics workflow
- End-to-end SQL project (Data Warehousing + EDA + Advanced Data Analysis)
- Realistic ETL + Data Modeling experience
- Clean architecture & documentation
- Retail analytics insights
- Strong Analytics + Business Insights generation
- Showcases SQL expertise at scale
MIT License β see LICENSE file.
All datasets used are dummy, synthetic, or public, intended only for learning and portfolio demonstration.
No real customer or company data is used.
π€ Harsh Belekar
π Data Analyst | Python Developer | SQL | Power BI | Excel | Data Visualization
π¬ LinkedIn | πGitHub
β If you found this project helpful, feel free to star the repo and connect with me for collaboration!




