Hands-on labs for building data pipelines with Delta Live Tables and the medallion architecture
Delta Live Tables | Expectations | Streaming | Medallion Architecture | CDC
FREE Databricks Edition: All examples and labs in this course work with the Databricks Community Edition (free). No paid account required.
- Overview
- Quick Start
- Usage
- Capstone Project
- Course Outline
- Project Structure
- Resources
- Contributing
- License
## Overview

Learn to build production data pipelines using Databricks. This course covers:
- Delta Live Tables — Declarative ETL pipelines with SQL and Python
- Data Quality — Expectations for validation, cleaning, and failure handling
- Streaming — Auto Loader for incremental file processing
- Medallion Architecture — Bronze, Silver, and Gold layers for organized data
- Change Data Capture — SCD Type 1 and Type 2 with `apply_changes()`
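To give a feel for the pieces above, here is a minimal sketch of a DLT table with a data-quality expectation in Python. The table and column names (`bronze_ratings`, `name`, `rating`) are placeholders for illustration, not files from this repository:

```python
import dlt
from pyspark.sql.functions import col

# Rows that fail the expectation are dropped; dlt.expect() would only record
# the violation, and dlt.expect_or_fail() would stop the update entirely.
@dlt.table(comment="Illustrative silver table with a data-quality expectation.")
@dlt.expect_or_drop("valid_rating", "rating IS NOT NULL AND rating BETWEEN 0 AND 100")
def silver_ratings():
    # Reads another table assumed to be defined in the same pipeline.
    return dlt.read("bronze_ratings").select(col("name"), col("rating").cast("int"))
```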
## Quick Start

- Sign up for the Databricks Community Edition (free)
- Clone this repository:

  `git clone https://github.com/alfredodeza/databricks-data-engineering.git`
- Upload example files from `examples/` to your Databricks workspace
- Create a Delta Live Tables pipeline and point it to the uploaded file
- Run the pipeline
Important: All example files reference `/Volumes/workspace/default/` for data paths. You must update these paths to match your own workspace volume configuration.
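For example, a bronze table that loads a CSV from a workspace volume might look like the sketch below; the volume and file name are hypothetical, so substitute the path you created in your own catalog and schema:

```python
import dlt

# Hypothetical path: replace with your own /Volumes/<catalog>/<schema>/<volume>/... location.
DATA_PATH = "/Volumes/workspace/default/my_volume/wine_ratings.csv"

@dlt.table(comment="Raw CSV ingested from a workspace volume (illustrative).")
def bronze_wine_ratings():
    # `spark` is provided by the DLT runtime; adjust reader options to your data.
    return spark.read.option("header", "true").option("inferSchema", "true").csv(DATA_PATH)
```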
## Usage

Upload pipeline files to your Databricks workspace and create DLT pipelines:
| Example | Files | Description |
|---|---|---|
| DLT Basics | `examples/dlt-basics/my_transformation.sql` | SQL Bronze-Silver-Gold pipeline |
| Streaming | `examples/streaming/streaming_transformation.py` | Batch vs streaming with Auto Loader |
| Simple Pipeline | `examples/simple-pipeline/*.py` | Wine ratings with full medallion layers |
| Inventory System | `examples/wine-pricing-inventory/*.py` | End-to-end pipeline with CDC |
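The streaming example contrasts a one-time batch read with incremental ingestion via Auto Loader. A typical Auto Loader pattern in a DLT pipeline looks roughly like the following; the landing folder and file format are assumptions to adapt to your workspace:

```python
import dlt

# Hypothetical landing folder inside a workspace volume.
SOURCE_PATH = "/Volumes/workspace/default/my_volume/incoming/"

@dlt.table(comment="Streaming bronze table that picks up new files incrementally (illustrative).")
def bronze_stream():
    return (
        spark.readStream.format("cloudFiles")    # Auto Loader
        .option("cloudFiles.format", "csv")      # format of the incoming files
        .option("header", "true")
        .load(SOURCE_PATH)
    )
```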
The hands-on labs build on these examples:

| Lab | Topic | Examples |
|---|---|---|
| Lab 1 | DLT Foundations | `dlt-basics/` |
| Lab 2 | Data Quality with Expectations | `dlt-basics/`, `simple-pipeline/` |
| Lab 3 | Streaming with DLT | `streaming/` |
| Lab 4 | Bronze Layer Fundamentals | `simple-pipeline/` |
| Lab 5 | Silver and Gold Layers | `simple-pipeline/`, `wine-pricing-inventory/` |
| Lab 6 | End-to-End Application | `wine-pricing-inventory/` |
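Lab 6 centers on CDC. The general shape of an `apply_changes()` call for an SCD Type 2 target is sketched below; the table names, key, and sequencing column are placeholders rather than the repository's actual schema:

```python
import dlt
from pyspark.sql.functions import col

# The streaming target must be declared before apply_changes() can populate it.
dlt.create_streaming_table("silver_inventory_scd2")

dlt.apply_changes(
    target="silver_inventory_scd2",      # table declared above
    source="bronze_inventory_changes",   # assumed CDC feed defined elsewhere in the pipeline
    keys=["item_id"],                    # business key identifying each record
    sequence_by=col("updated_at"),       # ordering column for resolving out-of-order events
    stored_as_scd_type=2,                # 2 keeps full history; 1 overwrites in place
)
```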
## Capstone Project

After completing all labs, build your own production-style pipeline: choose a dataset and implement a complete medallion architecture with expectations, streaming, and CDC.
## Course Outline

- DLT Foundations — Creating pipelines with SQL and Python
- Data Quality with Expectations — Validation and constraint handling
- Streaming with DLT — Auto Loader and incremental processing
- Bronze Layer — Raw data ingestion from volumes
- Silver Layer — Data cleaning and normalization
- Gold Layer — Business logic and aggregations
- Wine Pricing Inventory — Complete pipeline with CDC
- Capstone Project — Build your own pipeline
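To make the layer split concrete, a gold table is often just an aggregation over a silver table, as in this illustrative sketch (the table and column names are made up):

```python
import dlt
from pyspark.sql.functions import avg, count

@dlt.table(comment="Illustrative gold table: business-level aggregates over the silver layer.")
def gold_ratings_by_region():
    return (
        dlt.read("silver_ratings")        # assumed silver table in the same pipeline
        .groupBy("region")
        .agg(avg("rating").alias("avg_rating"), count("*").alias("wine_count"))
    )
```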
See the full course outline for detailed lesson breakdowns.
## Project Structure

    databricks-data-engineering/
    ├── examples/
    │   ├── dlt-basics/               # SQL-based DLT pipeline
    │   ├── streaming/                # Batch vs streaming ingestion
    │   ├── simple-pipeline/          # Wine ratings Bronze-Silver-Gold
    │   └── wine-pricing-inventory/   # End-to-end pipeline with CDC
    ├── labs/                         # Hands-on lab instructions
    ├── docs/                         # Course outline and capstone
    └── tmp/                          # Original pipeline files (reference)
## Resources

- Databricks Documentation
- Delta Live Tables Guide
- Databricks Community Edition (Free)
- Delta Lake Documentation
Related Courses:
## Contributing

See CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch
- Submit a pull request
## License

Apache License 2.0 — see LICENSE for details.
Made with care by Pragmatic AI Labs