This project contains SQL scripts for managing and transforming data for a small business website. It focuses on capturing, staging, and warehousing key business data related to product categories, subcategories, products, customers, orders, and order details.
The solution is designed in two layers:
- Staging Layer: Temporarily holds raw transactional data.
- Data Warehouse Layer: Organizes data into a dimensional model for analytics and reporting.
`init_staging.sql` initializes the staging schema, which includes:
- Table structures for `categories`, `subcategories`, `products`, `customers`, `orders`, and `order_details`
- Sample or initial data for simulation/testing purposes
This script is meant to reflect the operational data as it might be captured from a small e-commerce website.
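As a rough sketch, the staging DDL might look like the following (the schema name, column names, and sample rows here are illustrative assumptions, not the actual script):

```sql
-- Illustrative staging tables; column names and types are assumptions
CREATE SCHEMA IF NOT EXISTS staging;

CREATE TABLE staging.categories (
    category_id   SERIAL PRIMARY KEY,
    category_name TEXT NOT NULL
);

CREATE TABLE staging.products (
    product_id   SERIAL PRIMARY KEY,
    product_name TEXT NOT NULL,
    category_id  INT REFERENCES staging.categories (category_id),
    unit_price   NUMERIC(10, 2)
);

-- Sample data for simulation/testing
INSERT INTO staging.categories (category_name)
VALUES ('Electronics'), ('Books');
```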
`wh_schema.sql` defines the data warehouse schema, based on a star/snowflake dimensional model. It includes:
- Dimension tables: `dim_product`, `dim_category`, `dim_customer`, `dim_date`, etc.
- Fact tables: `fact_sales`, `fact_orders`, etc.
This schema is optimized for analytics, BI reporting, and time-series queries.
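A minimal sketch of what a dimension and a fact table in such a model might look like (surrogate-key design and column names are assumptions for illustration):

```sql
-- Illustrative star-schema tables; names and columns are assumptions
CREATE SCHEMA IF NOT EXISTS dw;

CREATE TABLE dw.dim_product (
    product_key   SERIAL PRIMARY KEY,  -- surrogate key
    product_id    INT NOT NULL UNIQUE, -- natural key from staging
    product_name  TEXT NOT NULL,
    category_name TEXT
);

CREATE TABLE dw.dim_date (
    date_key  INT PRIMARY KEY,         -- e.g. 20240131
    full_date DATE NOT NULL,
    year INT, quarter INT, month INT, day INT
);

CREATE TABLE dw.fact_sales (
    product_key  INT NOT NULL REFERENCES dw.dim_product (product_key),
    date_key     INT NOT NULL REFERENCES dw.dim_date (date_key),
    quantity     INT NOT NULL,
    sales_amount NUMERIC(12, 2) NOT NULL
);
```

Facts reference dimensions only through surrogate keys, which keeps time-series and drill-down queries fast and insulates the warehouse from changes to natural keys.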
`wh_populating-data.sql` performs the initial data load from the staging area into the data warehouse. It includes:
- Data transformations
- Surrogate key mapping
- Lookup and join logic to build dimension and fact tables
- Referential integrity checks across DW tables
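The surrogate-key mapping typically takes the form of `INSERT ... SELECT` statements that first fill the dimensions, then join staging rows back to them to build the facts. A hedged sketch, assuming illustrative staging and warehouse column names:

```sql
-- Sketch of an initial load; table and column names are assumptions

-- 1) Load the product dimension from staging
INSERT INTO dw.dim_product (product_id, product_name, category_name)
SELECT p.product_id, p.product_name, c.category_name
FROM staging.products p
LEFT JOIN staging.categories c USING (category_id);

-- 2) Build fact rows by looking up surrogate keys in the dimensions
--    (assumes dw.dim_date already holds rows for these order dates)
INSERT INTO dw.fact_sales (product_key, date_key, quantity, sales_amount)
SELECT dp.product_key,
       to_char(o.order_date, 'YYYYMMDD')::int,
       od.quantity,
       od.quantity * od.unit_price
FROM staging.order_details od
JOIN staging.orders o  USING (order_id)
JOIN dw.dim_product dp ON dp.product_id = od.product_id;
```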
`wh_trigger.sql` contains trigger functions and logic to:
- Detect and handle new records or updates in the staging tables
- Ensure that the data warehouse stays up-to-date with minimal delay
- Automate incremental loading to reduce the need for full reloads
This is essential for near real-time syncing between staging and warehouse layers.
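In PostgreSQL this pattern is usually a PL/pgSQL trigger function plus an `AFTER INSERT OR UPDATE` trigger on the staging table. A minimal sketch, with assumed table and column names (the upsert requires a unique constraint on the dimension's natural key):

```sql
-- Sketch of an incremental-sync trigger; names are assumptions
CREATE OR REPLACE FUNCTION staging.sync_product() RETURNS trigger AS $$
BEGIN
    -- Upsert the changed staging row into the warehouse dimension
    -- (assumes a UNIQUE constraint on dw.dim_product.product_id)
    INSERT INTO dw.dim_product (product_id, product_name)
    VALUES (NEW.product_id, NEW.product_name)
    ON CONFLICT (product_id) DO UPDATE
        SET product_name = EXCLUDED.product_name;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_product
AFTER INSERT OR UPDATE ON staging.products
FOR EACH ROW EXECUTE FUNCTION staging.sync_product();
```

Because each trigger touches only the row that changed, the warehouse stays current without full reloads.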
- Supports separation between operational (staging) and analytical (warehouse) environments
- Designed for incremental data updates via triggers
- Enables simplified reporting and BI through a structured dimensional model
- Easy-to-test, extensible schema for growing business requirements
1. Run `init_staging.sql` to create and populate the staging environment.
2. Run `wh_schema.sql` to set up the data warehouse structure.
3. Run `wh_populating-data.sql` to load the initial dataset into the warehouse.
4. Run `wh_trigger.sql` to enable automatic updates from staging to warehouse.
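With the `psql` CLI, the four scripts can be executed in order like so (the database name is a placeholder; adjust host and user flags for your setup):

```shell
# Run the scripts in order against your database (name is a placeholder)
psql -d my_business_db -f init_staging.sql
psql -d my_business_db -f wh_schema.sql
psql -d my_business_db -f wh_populating-data.sql
psql -d my_business_db -f wh_trigger.sql
```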
- PostgreSQL (version 12+ recommended)
- A database client (e.g., pgAdmin, DBeaver, or psql CLI)
Feel free to submit issues, suggest improvements, or fork this repo if you'd like to add features like:
- CDC mechanisms
- Data quality checks
- Integration with ETL tools (e.g., Airflow, dbt)