julianwucn/ms_fabric_mdd

MS_Fabric_MDD

A declarative, metadata-driven ETL framework built with PySpark on a Lakehouse and/or data warehouse architecture in Microsoft Fabric. It automates end-to-end data processing based on configurations defined in YAML files.
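
To illustrate the general idea of metadata-driven code generation, here is a minimal sketch in plain Python. It is not taken from this repository: the spec field names (`target`, `source`, `write_mode`, `keys`, `columns`) and the `generate_sql` helper are hypothetical, and the dict merely stands in for what a YAML file might hold once parsed.

```python
# Hypothetical table spec: these field names are illustrative only and are
# NOT the framework's actual YAML schema.
table_spec = {
    "target": "silver.customer",
    "source": "bronze.customer",
    "write_mode": "merge",  # one of: append, merge, overwrite
    "keys": ["customer_id"],
    "columns": ["customer_id", "name", "city"],
}


def generate_sql(spec: dict) -> str:
    """Turn a declarative spec into a Spark SQL statement (sketch only)."""
    cols = ", ".join(spec["columns"])
    select = f"SELECT {cols} FROM {spec['source']}"
    mode = spec["write_mode"]
    if mode == "append":
        return f"INSERT INTO {spec['target']} {select}"
    if mode == "overwrite":
        return f"INSERT OVERWRITE TABLE {spec['target']} {select}"
    if mode == "merge":
        # Join target (t) and source (s) on every declared key column.
        on = " AND ".join(f"t.{k} = s.{k}" for k in spec["keys"])
        return (
            f"MERGE INTO {spec['target']} AS t USING ({select}) AS s ON {on} "
            "WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *"
        )
    raise ValueError(f"unsupported write_mode: {mode}")


print(generate_sql(table_spec))
```

The engineer only edits the spec; switching `write_mode` from `merge` to `append` changes the generated statement without touching any transformation code.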

Key Features

  • Declarative ETL: data engineers define the desired end state of the transformed data, and the framework automatically generates the code to transform the data into that end state
  • Aligns with the medallion architecture, with customizable bronze, silver and gold layers
  • Supports data ingestion from different sources and file formats: csv, text, json, orc, parquet, table, jdbc, excel, xml and dbf
  • Supports multiple data refresh strategies: full, incremental and backfill
  • Supports multiple incremental data refresh options: CDF (change data feed) and timestamp, with automatic fallback to a backfill or full data refresh if recent, not-yet-synced CDF data has been purged
  • Supports multiple data write options: append, merge and overwrite
  • Supports multiple transformation code types: sql, python and notebook
  • Supports SCD type 2 dimensions
  • Built-in micro-batches support for both data ingestion and data transformation
  • Built-in rule engine to support data validation and data correctness
  • Built-in data lineage and table dependency tracking
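
The incremental-refresh fallback described in the list above could work roughly as follows. This is a sketch under stated assumptions, not the framework's implementation: the function `choose_refresh`, its parameters, and the rule for picking backfill over full (whether a timestamp column is available) are all hypothetical.

```python
from typing import Optional


def choose_refresh(last_synced_version: Optional[int],
                   earliest_cdf_version: int,
                   has_timestamp_column: bool) -> str:
    """Pick a refresh strategy for one table (hypothetical decision rule).

    last_synced_version:  last source table version already processed,
                          or None if the table has never been synced.
    earliest_cdf_version: oldest version whose change data is still retained.
    """
    if last_synced_version is None:
        # Nothing loaded yet, so there is no increment to compute.
        return "full"
    if last_synced_version + 1 >= earliest_cdf_version:
        # Every change since the last sync is still available.
        return "incremental_cdf"
    # Some not-yet-synced change data was purged: fall back.
    # Assumption: backfill needs a timestamp column to bound the reload.
    if has_timestamp_column:
        return "backfill"
    return "full"


print(choose_refresh(10, 5, True))    # changes 11+ still retained
print(choose_refresh(10, 15, True))   # versions 11-14 purged
print(choose_refresh(10, 15, False))  # purged, no timestamp to backfill on
```

Keeping the decision in one pure function makes the fallback easy to test in isolation from Spark.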

Medallion Architecture (image)

Sample Architecture (image)

Metadata Driven Framework (two images)
