Data Engineering with Databricks

Hands-on labs for building data pipelines with Delta Live Tables and the medallion architecture
Delta Live Tables | Expectations | Streaming | Medallion Architecture | CDC

FREE Databricks Edition: All examples and labs in this course work with the Databricks Community Edition (free). No paid account required.

Overview

Learn to build production data pipelines using Databricks. This course covers:

Delta Live Tables — Declarative ETL pipelines with SQL and Python
Data Quality — Expectations for validation, cleaning, and failure handling
Streaming — Auto Loader for incremental file processing
Medallion Architecture — Bronze, Silver, and Gold layers for organized data
Change Data Capture — SCD Type 1 and Type 2 with apply_changes()
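
To make "incremental file processing" concrete, here is a minimal plain-Python sketch of the underlying idea: remember which files were already ingested and handle only new arrivals on each run. It does not call Auto Loader or any Databricks API. The `IncrementalLoader` class and the file names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class IncrementalLoader:
    """Toy model of incremental ingestion: each file is processed once."""

    # Names of files already ingested (stands in for a durable checkpoint).
    seen: Set[str] = field(default_factory=set)

    def run(self, available_files: List[str]) -> List[str]:
        """Return only the files that no earlier run has ingested."""
        new_files = [f for f in sorted(set(available_files)) if f not in self.seen]
        self.seen.update(new_files)
        return new_files


loader = IncrementalLoader()
first_run = loader.run(["ratings_001.csv", "ratings_002.csv"])
second_run = loader.run(["ratings_001.csv", "ratings_002.csv", "ratings_003.csv"])
print(first_run)
print(second_run)
```

The second run returns only the file that arrived after the first run. That is the behavior the streaming examples rely on, instead of re-reading the whole directory each time.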

Quick Start

Sign up for the Databricks Community Edition (free)

Clone this repository:

git clone https://github.com/alfredodeza/databricks-data-engineering.git

Upload example files from examples/ to your Databricks workspace
Create a Delta Live Tables pipeline and point it to the uploaded file
Run the pipeline

Important: All example files reference /Volumes/workspace/default/ for data paths. You must update these paths to match your own workspace volume configuration.
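
One way to keep that path update in a single place is to build every data path from one helper, sketched below. The helper name `volume_path`, the volume name `my_volume`, and the file name `wine_ratings.csv` are assumptions for illustration; the example files in this repository may organize their paths differently.

```python
from pathlib import PurePosixPath


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build a /Volumes/<catalog>/<schema>/<volume>/... path as a string."""
    return str(PurePosixPath("/Volumes", catalog, schema, volume, *parts))


# Hypothetical values: replace them with your own catalog, schema, and volume.
RAW_RATINGS = volume_path("workspace", "default", "my_volume", "wine_ratings.csv")
print(RAW_RATINGS)
```

With this layout, moving to another workspace means changing the arguments once rather than editing each pipeline file.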

Usage

Run Examples

Upload pipeline files to your Databricks workspace and create Delta Live Tables (DLT) pipelines from them:

| Example | Files | Description |
|---|---|---|
| DLT Basics | `examples/dlt-basics/my_transformation.sql` | SQL Bronze-Silver-Gold pipeline |
| Streaming | `examples/streaming/streaming_transformation.py` | Batch vs streaming with Auto Loader |
| Simple Pipeline | `examples/simple-pipeline/*.py` | Wine ratings with full medallion layers |
| Inventory System | `examples/wine-pricing-inventory/*.py` | End-to-end pipeline with CDC |

Complete the Labs

| Lab | Topic | Examples |
|---|---|---|
| Lab 1 | DLT Foundations | `dlt-basics/` |
| Lab 2 | Data Quality with Expectations | `dlt-basics/`, `simple-pipeline/` |
| Lab 3 | Streaming with DLT | `streaming/` |
| Lab 4 | Bronze Layer Fundamentals | `simple-pipeline/` |
| Lab 5 | Silver and Gold Layers | `simple-pipeline/`, `wine-pricing-inventory/` |
| Lab 6 | End-to-End Application | `wine-pricing-inventory/` |

Capstone Project

After completing all labs, build your own production-style pipeline in the Capstone Project. Choose a dataset and implement a complete medallion architecture with expectations, streaming, and CDC.

Course Outline

Module 1: Delta Live Tables Fundamentals

DLT Foundations — Creating pipelines with SQL and Python
Data Quality with Expectations — Validation and constraint handling
Streaming with DLT — Auto Loader and incremental processing
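
The three ways an expectation can react to a bad record (keep and count it, drop it, or fail the run) can be sketched in plain Python. This is an illustration of the semantics only, not the DLT API. The function `apply_expectation`, the `on_violation` values, and the sample rows are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


class ExpectationFailed(Exception):
    """Raised when a rule in 'fail' mode meets an invalid row."""


def apply_expectation(
    rows: List[Row], rule: Callable[[Row], bool], on_violation: str = "warn"
) -> Tuple[List[Row], int]:
    """Apply one rule and return (kept_rows, violation_count)."""
    if on_violation not in {"warn", "drop", "fail"}:
        raise ValueError(f"unknown mode: {on_violation}")
    kept: List[Row] = []
    violations = 0
    for row in rows:
        if rule(row):
            kept.append(row)
            continue
        violations += 1
        if on_violation == "fail":
            raise ExpectationFailed(f"invalid row: {row}")
        if on_violation == "warn":
            kept.append(row)  # keep the row, only count the violation
        # In "drop" mode the invalid row is simply left out.
    return kept, violations


def has_rating(row: Row) -> bool:
    return row["rating"] is not None


ratings = [
    {"wine": "Rioja", "rating": 91},
    {"wine": "Malbec", "rating": None},
    {"wine": "Syrah", "rating": 88},
]

warned, warn_count = apply_expectation(ratings, has_rating, "warn")
dropped, drop_count = apply_expectation(ratings, has_rating, "drop")
print(len(warned), warn_count, len(dropped), drop_count)
```

In both the "warn" and "drop" modes the violation is counted, which is what makes data quality visible even when the pipeline keeps running.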

Module 2: Medallion Architecture

Bronze Layer — Raw data ingestion from volumes
Silver Layer — Data cleaning and normalization
Gold Layer — Business logic and aggregations
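
The division of work between the three layers can be sketched with plain Python lists and dictionaries. The sample records, column names, and the functions `to_silver` and `to_gold` are hypothetical and are not taken from the course examples; in the labs these steps are written as pipeline tables instead.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Bronze: raw records kept as ingested, including messy values.
bronze = [
    {"wine": " Rioja ", "region": "spain", "rating": "91"},
    {"wine": "Malbec", "region": "Argentina", "rating": "n/a"},
    {"wine": "Tempranillo", "region": "Spain", "rating": "87"},
]


def to_silver(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Clean and normalize: trim names, title-case regions, parse ratings."""
    silver = []
    for row in rows:
        if not row["rating"].isdigit():
            continue  # skip rows whose rating cannot be parsed
        silver.append(
            {
                "wine": row["wine"].strip(),
                "region": row["region"].strip().title(),
                "rating": int(row["rating"]),
            }
        )
    return silver


def to_gold(rows: List[Dict[str, object]]) -> Dict[str, float]:
    """Business aggregate: average rating per region."""
    by_region = defaultdict(list)
    for row in rows:
        by_region[row["region"]].append(row["rating"])
    return {region: mean(values) for region, values in by_region.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

Keeping the raw Bronze records unchanged means the Silver and Gold steps can be rerun with different cleaning rules without ingesting the data again.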

Module 3: End-to-End Application

Wine Pricing Inventory — Complete pipeline with CDC
Capstone Project — Build your own pipeline
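
The difference between the two SCD types used for CDC can be sketched in plain Python. This illustrates the semantics only and is not how apply_changes() is implemented. The function names, the `sku`, `price`, and `seq` fields, and the sample change feed are hypothetical.

```python
from typing import Dict, List

Change = Dict[str, object]


def apply_scd1(target: Dict[str, float], changes: List[Change]) -> Dict[str, float]:
    """Type 1: overwrite in place, keeping only the latest value per key."""
    for change in sorted(changes, key=lambda c: c["seq"]):
        target[change["sku"]] = change["price"]
    return target


def apply_scd2(history: List[Change], changes: List[Change]) -> List[Change]:
    """Type 2: close the current version of a key, then append the new one."""
    for change in sorted(changes, key=lambda c: c["seq"]):
        for version in history:
            if version["sku"] == change["sku"] and version["end_seq"] is None:
                version["end_seq"] = change["seq"]
        history.append(
            {
                "sku": change["sku"],
                "price": change["price"],
                "start_seq": change["seq"],
                "end_seq": None,  # None marks the currently active version
            }
        )
    return history


changes = [
    {"sku": "W-1", "price": 20.0, "seq": 1},
    {"sku": "W-1", "price": 22.5, "seq": 2},
    {"sku": "W-2", "price": 15.0, "seq": 1},
]

current = apply_scd1({}, changes)
history = apply_scd2([], changes)
print(current)
print(len(history))
```

Type 1 ends with one row per key, while Type 2 keeps every version and marks which one is active, so earlier prices remain queryable.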

See the full course outline in docs/ for detailed lesson breakdowns.

Project Structure

databricks-data-engineering/
├── examples/
│   ├── dlt-basics/              # SQL-based DLT pipeline
│   ├── streaming/               # Batch vs streaming ingestion
│   ├── simple-pipeline/         # Wine ratings Bronze-Silver-Gold
│   └── wine-pricing-inventory/  # End-to-end pipeline with CDC
├── labs/                        # Hands-on lab instructions
├── docs/                        # Course outline and capstone
└── tmp/                         # Original pipeline files (reference)

Resources

Related Courses:

Contributing

See CONTRIBUTING.md for guidelines.

Fork the repository
Create a feature branch
Submit a pull request

License

Apache License 2.0 — see LICENSE for details.

Made with care by Pragmatic AI Labs
