Learn data engineering with Databricks course with examples and labs

paiml/databricks-data-engineering

Data Engineering with Databricks


Python 3.10+ | License: Apache 2.0 | Databricks

Hands-on labs for building data pipelines with Delta Live Tables and the medallion architecture
Delta Live Tables | Expectations | Streaming | Medallion Architecture | CDC

FREE Databricks Edition: All examples and labs in this course work with the Databricks Community Edition (free). No paid account required.


Overview

Learn to build production data pipelines using Databricks. This course covers:

  • Delta Live Tables — Declarative ETL pipelines with SQL and Python
  • Data Quality — Expectations for validation, cleaning, and failure handling
  • Streaming — Auto Loader for incremental file processing
  • Medallion Architecture — Bronze, Silver, and Gold layers for organized data
  • Change Data Capture — SCD Type 1 and Type 2 with apply_changes()
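The "validation, cleaning, and failure handling" behaviors listed under Data Quality can be modeled in plain Python. The sketch below is not the Delta Live Tables API; it only imitates the three outcomes that the `@dlt.expect`, `@dlt.expect_or_drop`, and `@dlt.expect_or_fail` decorators provide inside a pipeline. The function name `apply_expectation`, the row fields, and the sample data are assumptions made for illustration.

```python
from collections import Counter

VALID_ACTIONS = ("warn", "drop", "fail")


def apply_expectation(rows, name, predicate, action, metrics):
    """Apply one named rule to rows and count violations in metrics.

    warn: keep the violating row; drop: remove it; fail: stop processing.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    kept = []
    for row in rows:
        if predicate(row):
            kept.append(row)
            continue
        metrics[name] += 1
        if action == "fail":
            raise RuntimeError(f"expectation {name!r} violated by {row!r}")
        if action == "warn":
            kept.append(row)
        # action == "drop": the row is simply left out
    return kept


# Hypothetical bronze rows; the field names are made up for this sketch.
bronze = [
    {"wine": "merlot", "rating": 88},
    {"wine": "rioja", "rating": None},
    {"wine": "merlot", "rating": 92},
    {"wine": "", "rating": 90},
]

metrics = Counter()
# Silver: drop rows without a rating, only warn about unnamed wines.
silver = apply_expectation(
    bronze, "rating_present", lambda r: r["rating"] is not None, "drop", metrics
)
silver = apply_expectation(
    silver, "wine_named", lambda r: bool(r["wine"]), "warn", metrics
)

# Gold: average rating per wine.
ratings_by_wine = {}
for row in silver:
    ratings_by_wine.setdefault(row["wine"], []).append(row["rating"])
gold = {wine: sum(vals) / len(vals) for wine, vals in ratings_by_wine.items()}
```

In a real pipeline the violation counts are recorded by the platform rather than in a `Counter`; the point here is only how the three actions differ in what reaches the next layer.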

Quick Start

  1. Sign up for the Databricks Community Edition (free)
  2. Clone this repository:
    git clone https://github.com/alfredodeza/databricks-data-engineering.git
  3. Upload example files from examples/ to your Databricks workspace
  4. Create a Delta Live Tables pipeline and point it to the uploaded file
  5. Run the pipeline

Important: All example files reference /Volumes/workspace/default/ for data paths. You must update these paths to match your own workspace volume configuration.
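One way to update the paths in bulk is a small local script run before uploading. This is a sketch under assumptions: the helper name `rewrite_volume_paths`, the target prefix `/Volumes/my_catalog/my_schema/`, and the sample file contents are hypothetical, and the demo runs against a temporary directory so no real files are touched.

```python
import tempfile
from pathlib import Path

DEFAULT_PREFIX = "/Volumes/workspace/default/"


def rewrite_volume_paths(root, new_prefix, suffixes=(".py", ".sql")):
    """Replace DEFAULT_PREFIX with new_prefix in matching files under root.

    Returns the list of files that were changed.
    """
    if not new_prefix.endswith("/"):
        new_prefix += "/"
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8")
        if DEFAULT_PREFIX in text:
            path.write_text(text.replace(DEFAULT_PREFIX, new_prefix), encoding="utf-8")
            changed.append(path)
    return changed


# Demo on a throwaway directory; the file name and data path are made up.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "bronze.py"
    sample.write_text(
        'SOURCE = "/Volumes/workspace/default/wine/ratings.csv"\n', encoding="utf-8"
    )
    changed = rewrite_volume_paths(tmp, "/Volumes/my_catalog/my_schema")
    updated_text = sample.read_text(encoding="utf-8")
```

To apply it to the repository, call `rewrite_volume_paths("examples", ...)` with your own catalog and schema, then review the diff with `git diff` before uploading.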


Usage

Run Examples

Upload pipeline files to your Databricks workspace and create DLT pipelines:

| Example | Files | Description |
|---|---|---|
| DLT Basics | examples/dlt-basics/my_transformation.sql | SQL Bronze-Silver-Gold pipeline |
| Streaming | examples/streaming/streaming_transformation.py | Batch vs streaming with Auto Loader |
| Simple Pipeline | examples/simple-pipeline/*.py | Wine ratings with full medallion layers |
| Inventory System | examples/wine-pricing-inventory/*.py | End-to-end pipeline with CDC |
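The "batch vs streaming" distinction in the streaming example comes down to which files are read on each run. The plain-Python sketch below models that idea only; it is not how Auto Loader is implemented, and the function names, the in-memory `checkpoint` set, and the sample CSV files are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def batch_read(folder):
    """Batch: every run re-reads all files in the folder."""
    return sorted(p.name for p in Path(folder).glob("*.csv"))


def incremental_read(folder, checkpoint):
    """Incremental: return only files not yet recorded in the checkpoint."""
    new_files = sorted(
        p.name for p in Path(folder).glob("*.csv") if p.name not in checkpoint
    )
    checkpoint.update(new_files)
    return new_files


with tempfile.TemporaryDirectory() as tmp:
    landing = Path(tmp)
    checkpoint = set()  # stands in for the state a streaming source persists

    (landing / "day1.csv").write_text("wine,rating\nmerlot,88\n", encoding="utf-8")
    first_batch = batch_read(landing)
    first_incremental = incremental_read(landing, checkpoint)

    (landing / "day2.csv").write_text("wine,rating\nrioja,91\n", encoding="utf-8")
    second_batch = batch_read(landing)  # sees both files again
    second_incremental = incremental_read(landing, checkpoint)  # sees only day2.csv
```

On the second run the batch read returns both files while the incremental read returns only the new one, which is the behavior the streaming example contrasts.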

Complete the Labs

| Lab | Topic | Examples |
|---|---|---|
| Lab 1 | DLT Foundations | dlt-basics/ |
| Lab 2 | Data Quality with Expectations | dlt-basics/, simple-pipeline/ |
| Lab 3 | Streaming with DLT | streaming/ |
| Lab 4 | Bronze Layer Fundamentals | simple-pipeline/ |
| Lab 5 | Silver and Gold Layers | simple-pipeline/, wine-pricing-inventory/ |
| Lab 6 | End-to-End Application | wine-pricing-inventory/ |

Capstone Project

After completing all labs, build your own production-style pipeline in the Capstone Project. Choose a dataset and implement a complete medallion architecture with expectations, streaming, and CDC.
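When planning the CDC part of the capstone, it helps to decide early whether a table should keep history. The sketch below models the difference between SCD Type 1 and Type 2 in plain Python. It does not call `apply_changes()` and does not reflect how DLT stores versions; the `Version` class, the `start_seq` and `end_seq` fields, and the sample price changes are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """One stored version of a row in a Type 2 table."""
    key: str
    price: float
    start_seq: int
    end_seq: Optional[int] = None  # None marks the current version


def apply_scd1(target, changes):
    """Type 1: overwrite in place, so only the latest value per key survives."""
    for key, price, _seq in sorted(changes, key=lambda c: c[2]):
        target[key] = price
    return target


def apply_scd2(history, changes):
    """Type 2: close the current version and append a new one, keeping history."""
    for key, price, seq in sorted(changes, key=lambda c: c[2]):
        current = next(
            (v for v in history if v.key == key and v.end_seq is None), None
        )
        if current is not None:
            if current.price == price:
                continue  # value unchanged, no new version needed
            current.end_seq = seq
        history.append(Version(key, price, seq))
    return history


# Each change is (key, new price, sequence number used for ordering).
changes = [("merlot", 12.0, 1), ("merlot", 15.0, 2), ("rioja", 9.5, 1)]
scd1 = apply_scd1({}, changes)
scd2 = apply_scd2([], changes)
```

After these changes, `scd1` holds one price per wine, while `scd2` holds both merlot prices with the older one closed at sequence 2.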


Course Outline

Module 1: Delta Live Tables Fundamentals

Module 2: Medallion Architecture

Module 3: End-to-End Application

See the full course outline for detailed lesson breakdowns.


Project Structure

databricks-data-engineering/
├── examples/
│   ├── dlt-basics/              # SQL-based DLT pipeline
│   ├── streaming/               # Batch vs streaming ingestion
│   ├── simple-pipeline/         # Wine ratings Bronze-Silver-Gold
│   └── wine-pricing-inventory/  # End-to-end pipeline with CDC
├── labs/                        # Hands-on lab instructions
├── docs/                        # Course outline and capstone
└── tmp/                         # Original pipeline files (reference)

Contributing

See CONTRIBUTING.md for guidelines.

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

License

Apache License 2.0 — see LICENSE for details.


Made with care by Pragmatic AI Labs
