A comprehensive data warehouse project implementing the medallion architecture pattern with Bronze, Silver, and Gold layers using SQL Server.
This project implements a modern enterprise data warehouse architecture with three distinct layers.

**Data Sources**
- ERP System: CSV file-based data source
- CRM System: CSV file-based data source

**Bronze Layer**
- Object Type: Tables
- Load: Batch Processing, Full Load, Truncate & Insert
- Transformations: None (raw data, loaded as-is)
- Data Model: None (As-Is)
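The Bronze truncate-and-insert pattern can be sketched in T-SQL as follows. This is an illustrative sketch only: the `bronze.crm_cust_info` table name and file path are assumptions, and the project's actual DDL and load logic live in `Scripts/02_Bronze_Layer_Tables.sql` and `Scripts/03_Bronze_Layer_Data_Load.sql`.

```sql
-- Sketch of a full-load, truncate-and-insert Bronze ingestion.
-- Object names and the file path are illustrative assumptions.
TRUNCATE TABLE bronze.crm_cust_info;   -- full load: clear the table first

BULK INSERT bronze.crm_cust_info
FROM '/var/opt/mssql/data/cust_info.csv'
WITH (
    FIRSTROW = 2,            -- skip the CSV header row
    FIELDTERMINATOR = ',',   -- comma-separated source file
    TABLOCK                  -- table lock for a faster bulk load
);
```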
**Silver Layer**
- Object Type: Tables
- Load: Batch Processing, Full Load, Truncate & Insert
- Transformations:
  - Data Cleansing
  - Data Standardization
  - Data Normalization
  - Derived Columns
  - Data Enrichment
- Data Model: None (As-Is)
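A Silver-layer transformation of this kind might look like the sketch below, combining cleansing, standardization, and type conversion in one pass from Bronze. All table and column names here are assumptions for illustration, not the project's actual schema.

```sql
-- Illustrative Silver transformation: cleanse and standardize Bronze data.
-- Table and column names are assumed, not taken from the project scripts.
TRUNCATE TABLE silver.crm_cust_info;

INSERT INTO silver.crm_cust_info (cst_id, cst_firstname, cst_gndr, cst_create_date)
SELECT
    cst_id,
    TRIM(cst_firstname),                 -- cleansing: strip stray whitespace
    CASE UPPER(TRIM(cst_gndr))           -- standardization: code -> readable label
        WHEN 'M' THEN 'Male'
        WHEN 'F' THEN 'Female'
        ELSE 'Unknown'
    END,
    CAST(cst_create_date AS DATE)        -- type conversion and formatting
FROM bronze.crm_cust_info
WHERE cst_id IS NOT NULL;                -- basic data quality filter
```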
**Gold Layer**
- Object Type: Views
- Load: No Load (Logical Layer Only)
- Transformations:
  - Data Integration
  - Aggregations
  - Business Logic
- Data Model:
  - Star Schema
  - Flat Tables
  - Aggregated Tables
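Because the Gold layer is views only, a star-schema dimension can be expressed as a logical integration over Silver tables, along these lines. Every object name below is an assumption chosen to mirror the source file names; the real definitions would live in the Gold scripts.

```sql
-- Illustrative Gold view: integrate Silver CRM and ERP data into a
-- customer dimension. All object names are assumptions.
CREATE VIEW gold.dim_customers AS
SELECT
    ROW_NUMBER() OVER (ORDER BY ci.cst_id) AS customer_key,  -- surrogate key
    ci.cst_id        AS customer_id,
    ci.cst_firstname AS first_name,
    la.cntry         AS country,     -- enrichment from ERP location data
    ca.bdate         AS birth_date   -- enrichment from ERP demographics
FROM silver.crm_cust_info ci
LEFT JOIN silver.erp_cust_az12 ca ON ci.cst_id = ca.cid
LEFT JOIN silver.erp_loc_a101  la ON ci.cst_id = la.cid;
```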
**Consumption Layer**
- Ad-Hoc Analysis: Interactive data exploration
- BI & Dashboards: Business intelligence and reporting
- Machine Learning Models: Advanced analytics and ML workloads
CSV Sources → Bronze Layer → Silver Layer → Gold Layer → Consumption Layer
The data flows left to right through the architecture:
- Source systems (ERP/CRM CSV files) feed into Bronze
- Bronze data is cleansed and standardized in Silver
- Silver data is transformed with business logic in Gold
- Gold layer serves BI, ML, and Ad-hoc analysis needs
```
SQL-DataWareHouse-Project/
├── Scripts/
│   ├── 01_Database_Initialization.sql   # Creates database and schemas
│   ├── 02_Bronze_Layer_Tables.sql       # Creates all Bronze layer tables
│   └── 03_Bronze_Layer_Data_Load.sql    # Loads CSV data into Bronze tables
├── Datasets/
│   ├── source_crm/                      # CRM system CSV files
│   │   ├── cust_info.csv                # Customer information
│   │   ├── prd_info.csv                 # Product information
│   │   └── sales_details.csv            # Sales transaction details
│   └── source_erp/                      # ERP system CSV files
│       ├── CUST_AZ12.csv                # Customer demographics
│       ├── LOC_A101.csv                 # Customer location data
│       └── PX_CAT_G1V2.csv              # Product categories
├── Docs/                                # Documentation and architecture diagrams
├── Tests/                               # Test scripts and validation queries
└── README.md                            # This file
```
- SQL Server 2019 or later (Docker container recommended)
- SQL Server Management Studio (SSMS) or Azure Data Studio
- Docker (if using containerized SQL Server)
1. Clone the repository

   ```bash
   git clone https://github.com/AnsariUsaid/SQL-DataWareHouse.git
   cd SQL-DataWareHouse
   ```

2. Set up SQL Server (Docker recommended)

   ```bash
   docker run -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=YourPassword123" \
     -p 1433:1433 --name sqlserver \
     -d mcr.microsoft.com/mssql/server:2022-latest
   ```

3. Initialize the database
   - Execute `Scripts/01_Database_Initialization.sql` (creates the `DataWareHous` database and the Bronze, Silver, and Gold schemas)

4. Create Bronze layer tables
   - Execute `Scripts/02_Bronze_Layer_Tables.sql` (creates 6 tables for the CRM and ERP source systems)

5. Load data into the Bronze layer
   - Copy the CSV files into the Docker container:

     ```bash
     docker cp Datasets/source_crm/cust_info.csv sqlserver:/var/opt/mssql/data/
     docker cp Datasets/source_crm/prd_info.csv sqlserver:/var/opt/mssql/data/
     docker cp Datasets/source_crm/sales_details.csv sqlserver:/var/opt/mssql/data/
     docker cp Datasets/source_erp/CUST_AZ12.csv sqlserver:/var/opt/mssql/data/
     docker cp Datasets/source_erp/LOC_A101.csv sqlserver:/var/opt/mssql/data/
     docker cp Datasets/source_erp/PX_CAT_G1V2.csv sqlserver:/var/opt/mssql/data/
     ```

   - Execute `Scripts/03_Bronze_Layer_Data_Load.sql`
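After the Bronze load completes, a quick row-count check (the kind of validation query the `Tests/` folder is intended to hold) can confirm each table received data. The table names below are assumptions based on the source file names:

```sql
-- Hypothetical sanity check: row counts per Bronze table after the load.
SELECT 'bronze.crm_cust_info'     AS table_name, COUNT(*) AS row_count FROM bronze.crm_cust_info
UNION ALL
SELECT 'bronze.crm_prd_info',                    COUNT(*)              FROM bronze.crm_prd_info
UNION ALL
SELECT 'bronze.crm_sales_details',               COUNT(*)              FROM bronze.crm_sales_details;
```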
- Medallion Architecture: Progressive data refinement through Bronze → Silver → Gold
- Scalability: Designed to handle enterprise-scale data volumes
- Maintainability: Clear separation of concerns across layers
- Performance: Optimized for analytical workloads
- Data Quality: Built-in data validation and cleansing processes
**Bronze Layer**
- Raw data ingestion with no transformations
- Maintains data lineage and audit trail
- Supports data recovery and reprocessing

**Silver Layer**
- Data quality improvements and standardization
- Business rule enforcement
- Data type conversions and formatting

**Gold Layer**
- Business-ready datasets for analytics
- Optimized for query performance
- Implements dimensional modeling concepts
- Follow the established naming conventions
- Document all schema changes
- Test scripts before committing
- Update this README for architectural changes
This project is for educational and demonstration purposes.
Note: This is an active development project. The architecture and implementation details may evolve as the project progresses.