A declarative, metadata-driven ETL framework built with PySpark on Microsoft Fabric, targeting Lakehouse and/or data warehouse architectures. It automates end-to-end data processing based on configurations defined in YAML files.
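
To illustrate the configuration-driven approach, the sketch below shows what a single table definition might look like in YAML. Every key name here (`entity`, `source`, `refresh`, `write`) is a hypothetical placeholder for this example, not the framework's actual schema.

```yaml
# Hypothetical table configuration; key names are illustrative only.
entity: sales_orders
layer: silver                   # medallion layer: bronze | silver | gold
source:
  type: file
  format: csv
  path: Files/landing/sales_orders/
refresh:
  strategy: incremental         # full | incremental | backfill
  watermark_column: modified_at
write:
  mode: merge                   # append | merge | overwrite
  keys: [order_id]
```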
## Key Features
- Declarative ETL: data engineers define the desired end state of the transformed data, and the framework automatically generates the code that brings the data to that state (see the configuration sketch above)
- Aligns with the medallion architecture, with customizable bronze, silver, and gold layers
- Supports data ingestion from a variety of sources (tables, JDBC) and flat-file formats (CSV, text, JSON, ORC, Parquet, Excel, XML, and DBF)
- Supports multiple data refresh strategies: full, incremental, and backfill
- Supports two incremental refresh options, CDF and timestamp, with automatic fallback to a backfill or full refresh if recent, not-yet-synced CDF data has been purged (see the first sketch after this list)
- Supports multiple data write options: append, merge, and overwrite
- Supports multiple kinds of transformation code: SQL, Python, and notebook
- Supports SCD Type 2 dimensions (see the second sketch after this list)
- Built-in micro-batch support for both data ingestion and data transformation
- Built-in rule engine for data validation and correctness checks (see the second sketch after this list)
- Built-in data lineage and table dependency tracking
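
A hedged illustration of how the incremental refresh options and the CDF fallback behavior described above might be expressed in configuration. The key names (`option`, `fallback`, and so on) are assumptions made for this example, not the framework's documented schema.

```yaml
# Hypothetical incremental-refresh configuration; key names are illustrative.
refresh:
  strategy: incremental
  option: cdf                       # cdf | timestamp
  cdf:
    starting_version: last_synced   # resume from the last processed CDF version
  # If CDF history has been purged past the last synced version, fall back
  # to a heavier refresh instead of failing the pipeline run.
  fallback: backfill                # backfill | full
```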

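Similarly, a sketch of what an SCD Type 2 dimension with attached validation rules could look like, assuming hypothetical `scd` and `rules` sections; none of these keys or rule expressions come from the framework itself.

```yaml
# Hypothetical SCD2 dimension with validation rules; illustrative schema.
entity: dim_customer
layer: gold
scd:
  type: 2
  business_keys: [customer_id]
  track_columns: [name, segment, country]  # a change opens a new version row
  effective_from: valid_from
  effective_to: valid_to
  current_flag: is_current
rules:
  - name: customer_id_not_null
    expression: customer_id IS NOT NULL
    on_failure: quarantine          # quarantine | fail | warn
```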