Modern Data Warehouse - SQL Server Implementation

A comprehensive data warehouse project implementing the medallion architecture pattern with Bronze, Silver, and Gold layers using SQL Server.

Architecture Overview

This project implements a modern enterprise data warehouse architecture with three distinct layers:

1. Source Systems Layer

ERP System: CSV file-based data source
CRM System: CSV file-based data source

2. Data Warehouse Layer (SQL Server)

A. Bronze Layer (Raw Zone)

Object Type: Tables
Load: Batch Processing, Full Load, Truncate & Insert
Transformations: None (Raw Data, No Transformations)
Data Model: None (As-Is)

B. Silver Layer (Cleansed & Standardized Zone)

Object Type: Tables
Load: Batch Processing, Full Load, Truncate & Insert
Transformations:
- Data Cleansing
- Data Standardization
- Data Normalization
- Derived Columns
- Data Enrichment
Data Model: None (As-Is)

C. Gold Layer (Analytics Layer)

Object Type: Views
Load: No Load (Logical Layer Only)
Transformations:
- Data Integration
- Aggregations
- Business Logic
Data Model:
- Star Schema
- Flat Tables
- Aggregated Tables

3. Consumption Layer

Ad-Hoc Analysis: Interactive data exploration
BI & Dashboards: Business intelligence and reporting
Machine Learning Models: Advanced analytics and ML workloads

Data Flow

CSV Sources → Bronze Layer → Silver Layer → Gold Layer → Consumption Layer

The data flows left to right through the architecture:

Source systems (ERP/CRM CSV files) feed into Bronze
Bronze data is cleansed and standardized in Silver
Silver data is transformed with business logic in Gold
Gold layer serves BI, ML, and Ad-hoc analysis needs

Project Structure

SQL-DataWareHouse-Project/
├── Scripts/
│   ├── 01_Database_Initialization.sql      # Creates database and schemas
│   ├── 02_Bronze_Layer_Tables.sql          # Creates all Bronze layer tables
│   └── 03_Bronze_Layer_Data_Load.sql       # Loads CSV data into Bronze tables
├── Datasets/
│   ├── source_crm/                         # CRM system CSV files
│   │   ├── cust_info.csv                   # Customer information
│   │   ├── prd_info.csv                    # Product information
│   │   └── sales_details.csv               # Sales transaction details
│   └── source_erp/                         # ERP system CSV files
│       ├── CUST_AZ12.csv                   # Customer demographics
│       ├── LOC_A101.csv                    # Customer location data
│       └── PX_CAT_G1V2.csv                 # Product categories
├── Docs/              # Documentation and architecture diagrams
├── Tests/             # Test scripts and validation queries
└── README.md          # This file

Getting Started

Prerequisites

SQL Server 2019 or later (Docker container recommended)
SQL Server Management Studio (SSMS) or Azure Data Studio
Docker (if using containerized SQL Server)

Installation Steps

Clone the repository

git clone https://github.com/AnsariUsaid/SQL-DataWareHouse.git
cd SQL-DataWareHouse

Setup SQL Server (Docker recommended)

docker run -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=YourPassword123" \
   -p 1433:1433 --name sqlserver \
   -d mcr.microsoft.com/mssql/server:2022-latest

Initialize the database
- Execute Scripts/01_Database_Initialization.sql
- Creates database 'DataWareHous' and Bronze, Silver, Gold schemas
Create Bronze layer tables
- Execute Scripts/02_Bronze_Layer_Tables.sql
- Creates 6 tables for CRM and ERP source systems

Load data into Bronze layer

Copy CSV files to Docker container:

docker cp Datasets/source_crm/cust_info.csv sqlserver:/var/opt/mssql/data/
docker cp Datasets/source_crm/prd_info.csv sqlserver:/var/opt/mssql/data/
docker cp Datasets/source_crm/sales_details.csv sqlserver:/var/opt/mssql/data/
docker cp Datasets/source_erp/CUST_AZ12.csv sqlserver:/var/opt/mssql/data/
docker cp Datasets/source_erp/LOC_A101.csv sqlserver:/var/opt/mssql/data/
docker cp Datasets/source_erp/PX_CAT_G1V2.csv sqlserver:/var/opt/mssql/data/

Execute Scripts/03_Bronze_Layer_Data_Load.sql

Design Principles

Medallion Architecture: Progressive data refinement through Bronze → Silver → Gold
Scalability: Designed to handle enterprise-scale data volumes
Maintainability: Clear separation of concerns across layers
Performance: Optimized for analytical workloads
Data Quality: Built-in data validation and cleansing processes

Layer Responsibilities

Bronze Layer

Raw data ingestion with no transformations
Maintains data lineage and audit trail
Supports data recovery and reprocessing

Silver Layer

Data quality improvements and standardization
Business rule enforcement
Data type conversions and formatting

Gold Layer

Business-ready datasets for analytics
Optimized for query performance
Implements dimensional modeling concepts

Contributing

Follow the established naming conventions
Document all schema changes
Test scripts before committing
Update this README for architectural changes

License

This project is for educational and demonstration purposes.

Note: This is an active development project. The architecture and implementation details may evolve as the project progresses.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Modern Data Warehouse - SQL Server Implementation

Architecture Overview

1. Source Systems Layer

2. Data Warehouse Layer (SQL Server)

A. Bronze Layer (Raw Zone)

B. Silver Layer (Cleansed & Standardized Zone)

C. Gold Layer (Analytics Layer)

3. Consumption Layer

Data Flow

Project Structure

Getting Started

Prerequisites

Installation Steps

Design Principles

Layer Responsibilities

Bronze Layer

Silver Layer

Gold Layer

Contributing

License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
Datasets		Datasets
Docs		Docs
Scripts		Scripts
.DS_Store		.DS_Store
README.md		README.md

AnsariUsaid/SQL-DataWareHouse

Folders and files

Latest commit

History

Repository files navigation

Modern Data Warehouse - SQL Server Implementation

Architecture Overview

1. Source Systems Layer

2. Data Warehouse Layer (SQL Server)

A. Bronze Layer (Raw Zone)

B. Silver Layer (Cleansed & Standardized Zone)

C. Gold Layer (Analytics Layer)

3. Consumption Layer

Data Flow

Project Structure

Getting Started

Prerequisites

Installation Steps

Design Principles

Layer Responsibilities

Bronze Layer

Silver Layer

Gold Layer

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages