Skip to content

rochiekop/data-tranformation-sql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

1 Commit
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🏬 Small Business Staging & Data Warehouse Schema

This project contains SQL scripts for managing and transforming data for a small business website. It focuses on capturing, staging, and warehousing key business data related to product categories, subcategories, products, customers, orders, and order details.

The solution is designed in two layers:

  1. Staging Layer: Temporarily holds raw transactional data.
  2. Data Warehouse Layer: Organizes data into a dimensional model for analytics and reporting.

πŸ“ Contents

1. init_staging.sql

Initializes the staging schema, which includes:

  • Table structures for categories, subcategories, products, customers, orders, and order_details
  • Sample or initial data for simulation/testing purposes

This script is meant to reflect the operational data as it might be captured from a small e-commerce website.


2. wh_schema.sql

Defines the data warehouse schema, based on a star/snowflake dimensional model. It includes:

  • Dimension Tables: dim_product, dim_category, dim_customer, dim_date, etc.
  • Fact Tables: fact_sales, fact_orders, etc.

This schema is optimized for analytics, BI reporting, and time-series queries.


3. wh_populating-data.sql

Performs the initial data load from the staging area into the data warehouse. It includes:

  • Data transformations
  • Surrogate key mapping
  • Lookup and join logic to build dimension and fact tables
  • Ensures referential integrity across DW tables

4. wh_trigger.sql

Contains trigger functions and logic to:

  • Detect and handle new records or updates in the staging tables
  • Ensure that the data warehouse stays up-to-date with minimal delay
  • Automate incremental loading to reduce the need for full reloads

This is essential for near real-time syncing between staging and warehouse layers.


🧱 Database Features

  • βœ… Supports separation between operational (staging) and analytical (warehouse) environments
  • πŸ”„ Designed for incremental data updates via triggers
  • πŸ“Š Enables simplified reporting and BI through a structured dimensional model
  • πŸ§ͺ Easy-to-test and extend schema for growing business requirements

πŸš€ Getting Started

  1. Run init_staging.sql to create and populate the staging environment.
  2. Run wh_schema.sql to set up the data warehouse structure.
  3. Run wh_populating-data.sql to load the initial dataset into the warehouse.
  4. Run wh_trigger.sql to enable automatic updates from staging to warehouse.

πŸ›  Requirements

  • PostgreSQL (version 12+ recommended)
  • A database client (e.g., pgAdmin, DBeaver, or psql CLI)

πŸ“¬ Contributing

Feel free to submit issues, suggest improvements, or fork this repo if you'd like to add features like:

  • CDC mechanisms
  • Data quality checks
  • Integration with ETL tools (e.g., Airflow, dbt)

About

SQL scripts for managing and transforming data for a small business website

Topics

Resources

Stars

Watchers

Forks