Databricks framework to validate the data quality of PySpark DataFrames and tables
Updated Feb 9, 2026 - Python
Automated migrations to Unity Catalog
A production-ready PySpark project template with medallion architecture, Python packaging, unit tests, integration tests, CI/CD automation, Databricks Asset Bundles, and DQX data quality framework.
Production-ready support ticket classification using Unity Catalog AI Functions, Vector Search, and RAG. Features 6-phase workflow, knowledge base integration, and Streamlit dashboard.
Real Estate ELT pipeline using Databricks Asset Bundles on GCP. Ingests, transforms, and analyzes property data via Delta Live Tables. Follows medallion architecture (Bronze/Silver/Gold), modular Python design, CI/CD automation with GitHub Actions, and full unit and integration test coverage.
End-to-end Databricks Lakehouse pipeline using Auto Loader, Delta Lake, Unity Catalog, Bronze–Silver–Gold, and business marts (Daily Sales, Top Categories, Customer LTV).
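Gold-layer business marts like the Daily Sales mart above reduce to simple group-by aggregations over cleansed Silver data. As a rough, Spark-free illustration of the idea, a minimal sketch in plain Python (the `orders` records and field names are hypothetical, not taken from the repository):

```python
from collections import defaultdict

# Spark-free sketch of a "Daily Sales" Gold-layer mart:
# sum order amounts per day. Field names are hypothetical.
orders = [
    {"order_date": "2024-01-01", "amount": 10.0},
    {"order_date": "2024-01-01", "amount": 5.5},
    {"order_date": "2024-01-02", "amount": 7.0},
]

daily_sales = defaultdict(float)
for order in orders:
    daily_sales[order["order_date"]] += order["amount"]

print(dict(daily_sales))  # {'2024-01-01': 15.5, '2024-01-02': 7.0}
```

In an actual Lakehouse pipeline the same aggregation would be a `groupBy("order_date").sum("amount")` over a Silver Delta table rather than an in-memory loop.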
This is a Web API to connect to your Lakehouse with Unity
263 tools for Databricks via MCP. SDK-first, covers Unity Catalog, SQL, Compute, Jobs, Serving, Vector Search, Apps, Lakebase, and more.
Enterprise-grade transit analytics platform built on Databricks. Implements Medallion Architecture (Bronze-Silver-Gold) with Delta Live Tables, Unity Catalog governance, and 20+ data quality rules. Demonstrates production-ready data engineering patterns including Liquid Clustering, Change Data Feed, and automated pipeline orchestration.
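Data quality rules like the 20+ mentioned above are typically declared as Delta Live Tables expectations that pass or quarantine each row. As a rough, Spark-free illustration of the underlying pattern, a minimal sketch in plain Python (all names — `check_not_null`, `check_range`, the sample records — are hypothetical, not the repository's actual rules):

```python
# Spark-free sketch of declarative row-level data quality rules,
# illustrating the pattern behind DLT expectations.

def check_not_null(field):
    """Rule: the given field must be present and non-null."""
    return lambda row: row.get(field) is not None

def check_range(field, lo, hi):
    """Rule: the given numeric field must fall within [lo, hi]."""
    return lambda row: row.get(field) is not None and lo <= row[field] <= hi

def apply_rules(rows, rules):
    """Split rows into (passed, quarantined) by evaluating every rule."""
    passed, quarantined = [], []
    for row in rows:
        if all(rule(row) for rule in rules):
            passed.append(row)
        else:
            quarantined.append(row)
    return passed, quarantined

rules = [check_not_null("route_id"), check_range("delay_min", 0, 240)]
records = [
    {"route_id": "R1", "delay_min": 5},     # passes both rules
    {"route_id": None, "delay_min": 3},     # fails the not-null rule
    {"route_id": "R2", "delay_min": 999},   # fails the range rule
]
good, bad = apply_rules(records, rules)
print(len(good), len(bad))  # 1 2
```

In DLT the same intent is expressed declaratively, e.g. with `@dlt.expect_or_drop(...)` decorators on table definitions, so the engine tracks rule metrics automatically.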
End-to-end Azure Data Engineering project using ADF for incremental ingestion, Databricks (DLT) for Medallion Architecture, and Delta Lake for CDC (SCD Type 1). Managed via Databricks Asset Bundles (DABs) for professional CI/CD. Focuses on real-time streaming, scalability, and Star Schema modeling.
Databricks provides a unified, open platform for all your data. It empowers data scientists, data engineers, and data analysts with a simple, collaborative environment to run interactive and scheduled data analysis workloads.
Production-grade utilities for Delta Lake table management and optimization
MLS 2.0 - Qobrix to RESO Data Dictionary ETL pipeline on Databricks
Azure DE mini-project: ADF + ADLS Gen2 + Databricks to ingest public CMS hospital registry data and build Bronze/Silver/Gold Delta layers using Managed Identity only (no secrets). Outputs Gold aggregates by state, rating, and hospital type.
A comprehensive collection of tutorials and best practices for integrating Azure Databricks with enterprise Azure services. Learn through hands-on notebooks and Infrastructure-as-Code examples.
Automated metadata extraction and population tool for AEMO (Australian Energy Market Operator) electricity market data tables in Databricks Unity Catalog.