Create a data lake on AWS S3 to store dimensional tables after processing data with Spark on an AWS EMR cluster
Updated Oct 10, 2019 - Python
Build a data warehouse from scratch, including full load, daily incremental load, schema design, and SCD Types 1 and 2.
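For readers unfamiliar with slowly changing dimensions, here is a minimal sketch of SCD Type 2 logic in plain Python; the table shape and field names (`customer_id`, `city`, `valid_from`, `is_current`) are invented for illustration, not taken from the project above:

```python
from datetime import date

def apply_scd2(dimension, incoming, today):
    """Apply SCD Type 2: expire the current row for a changed key
    and append a new versioned row, preserving history."""
    for row in incoming:
        current = next(
            (r for r in dimension
             if r["customer_id"] == row["customer_id"] and r["is_current"]),
            None,
        )
        if current is None:
            # Brand-new key: insert as the current version.
            dimension.append({**row, "valid_from": today,
                              "valid_to": None, "is_current": True})
        elif current["city"] != row["city"]:
            # Attribute changed: close the old version, open a new one.
            # (SCD Type 1 would instead overwrite the row in place.)
            current["valid_to"] = today
            current["is_current"] = False
            dimension.append({**row, "valid_from": today,
                              "valid_to": None, "is_current": True})
    return dimension

dim = apply_scd2([], [{"customer_id": 1, "city": "Oslo"}],
                 today=date(2024, 1, 1))
dim = apply_scd2(dim, [{"customer_id": 1, "city": "Bergen"}],
                 today=date(2024, 6, 1))
# dim now holds two versions: the expired Oslo row and the current Bergen row
```

The key design choice in Type 2 is that the dimension table grows over time: every historical attribute value stays queryable via its validity interval.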
A Flask application that converts an informational model of a decision problem into a snowflaked star schema
Building Data Warehouse and ETL pipelines using Amazon S3 and Redshift
Simple scripts for data cleaning, ETL transformations, and data reorganisation
Batch and streaming data pipelines for Formula 1 racing data from multiple sources and APIs, built on Databricks with PySpark; the data is modeled into a star schema for analysis in Power BI.
Transformed raw HR data into a star schema using GCP and Cloud SQL, wrote SQL queries for business reporting, and analyzed trends such as age vs. income, performance, and hiring by gender. Visualized insights in Tableau for data-driven HR decisions. Tools: Google Cloud SQL (Postgres), GCP, Tableau.
Open-source Supply Chain analytics on Microsoft Fabric: a scalable Bronze-Silver-Gold pipeline with automated CSV ingestion, Delta Lake transforms, semantic modeling (DAX & RLS) and interactive Power BI reports. Join to enhance pipelines, refine models, and build next-gen supply-chain insights!
Model a star schema from raw normalized Olympic Games data using dbt, Postgres, Airflow, and Docker
Data Modeling with Apache Cassandra
ETL pipeline that extracts and transforms student athlete academic performance data, then populates a data warehouse using a star schema dimensional model.
ETL Pipeline that Scrapes, Cleans, and Loads Book Data into PostgreSQL, then builds a Star-Schema Data Warehouse for Optimized Analysis.
University lab exercises with processing big data.
Udacity project: implementing an ETL process on a PostgreSQL DB to create a star schema data model
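As a general illustration of the star-schema pattern these projects share, here is a minimal sketch using SQLite from the Python standard library; the sales fact and its dimensions are invented for the example and do not come from any project listed here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables hold descriptive attributes keyed by surrogate keys;
# the fact table holds numeric measures plus one foreign key per dimension.
cur.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
""")

cur.execute("INSERT INTO dim_date VALUES (20240101, 2024, 1)")
cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(20240101, 1, 9.5), (20240101, 2, 20.0)])

# A typical star-schema query: join the fact to a dimension and aggregate.
cur.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
""")
result = cur.fetchall()
print(result)  # [('Hardware', 29.5)]
```

Because every join goes directly from the fact table to a single dimension, queries stay simple and predictable, which is the main reason analytics warehouses favor this shape.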
Creating a data warehouse using AWS Redshift.
All-in-one slice-and-dice module
A full-stack data engineering portfolio project with ingestion, batch processing, star schema modeling, orchestration, and analytics dashboard.
An Airflow + dbt project that models e-commerce data using DuckDB and a mini star schema.
End-to-end sales pipeline: CSV → Parquet → star schema → RDS Postgres. Orchestrated with Airflow; infrastructure via Terraform on AWS.
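The CSV-to-star-schema step in pipelines like this can be sketched in plain Python; the column names and in-memory "tables" below are assumptions for illustration, whereas a real pipeline would write Parquet files and load Postgres:

```python
import csv
import io

# A tiny stand-in for a raw sales CSV file.
raw = io.StringIO(
    "order_id,customer,product,amount\n"
    "1,Alice,Widget,10.0\n"
    "2,Bob,Widget,15.0\n"
    "3,Alice,Gadget,20.0\n"
)

# Build dimension tables by assigning surrogate keys to distinct values.
customers, products, facts = {}, {}, []
for row in csv.DictReader(raw):
    cust_key = customers.setdefault(row["customer"], len(customers) + 1)
    prod_key = products.setdefault(row["product"], len(products) + 1)
    # Fact rows keep only the measures and surrogate keys.
    facts.append((int(row["order_id"]), cust_key, prod_key,
                  float(row["amount"])))

# customers == {'Alice': 1, 'Bob': 2}; facts reference dimensions by key
```

The same split (distinct descriptive values out to dimensions, keys and measures into the fact) is what the Spark or dbt transform stage performs at scale in the projects above.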
This project builds a real-time food delivery analytics pipeline using AWS Kinesis, PySpark, Redshift, and QuickSight, with automated deployments via CodeBuild.