GitHub - nathadriele/data-engineering-zoomcamp: The Data Engineering Zoomcamp covers essential skills in containerization, workflow orchestration, data warehousing, analytics engineering, batch, and streaming processing. It includes tools like Docker, Terraform, BigQuery, dbt, Spark, Kafka, Kestra, Postgres, Google Data Studio, and Metabase.

Data Engineering Zoomcamp

The Data Engineering Zoomcamp teaches the essential concepts, tools, and hands-on skills required for modern data engineering. It covers containerization, infrastructure as code, workflow orchestration, data warehousing, analytics engineering, and batch and streaming processing. The course takes a practical, project-based approach: participants learn the theory and then apply it by building real-world data pipelines.

Featured Tools and Technologies

Docker: Containerization platform for building, shipping, and running applications.
Terraform: Infrastructure as code tool for building, changing, and versioning infrastructure.
Google BigQuery: Serverless, highly scalable, and cost-effective multi-cloud data warehouse.
dbt (data build tool): Analytics engineering tool for transforming data inside the warehouse with SQL models.
Apache Spark: Open-source distributed computing system for big data processing.
Apache Kafka: Distributed event streaming platform for building real-time data pipelines and streaming applications.
Kestra: Flexible and scalable workflow orchestration and automation tool.
PostgreSQL: Powerful open-source relational database system.
Google Data Studio (now Looker Studio): Data visualization and reporting tool for turning data into dashboards and reports.
Metabase: Open-source business intelligence and analytics tool for easy data visualization and exploration.

Module 1: Containerization and Infrastructure as Code

GCP
Docker and docker-compose
Running Postgres locally with Docker
Setting up infrastructure on GCP with Terraform
Preparing the environment

Module 2: Workflow Orchestration

Data Lake
Workflow orchestration
Workflow orchestration with Kestra
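
The core idea behind workflow orchestration, running tasks in dependency order, can be sketched in a few lines. This is a minimal illustration using Python's standard `graphlib`, not Kestra's API (Kestra flows are declared in YAML). The task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract": set(),
    "load_to_lake": {"extract"},
    "load_to_warehouse": {"load_to_lake"},
    "report": {"load_to_warehouse"},
}


def run_pipeline(dependencies: dict[str, set[str]]) -> list[str]:
    """Visit tasks in dependency order and return the execution order."""
    executed = []
    for task in TopologicalSorter(dependencies).static_order():
        # A real orchestrator would run the task, retry on failure, and log here.
        executed.append(task)
    return executed
```

An orchestrator adds scheduling, retries, and monitoring on top of this ordering step.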

Workshop 1: Data Ingestion

Reading from APIs
Building scalable pipelines
Normalising data
Incremental loading
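
The four topics above can be illustrated with a minimal sketch. It assumes a hypothetical paginated API, simulated in memory by `fetch_page`, and a numeric `id` field used as the incremental cursor. None of these names come from the course material, and the sketch does not use dlt, the library named in the workshop folder.

```python
from typing import Iterator

# Hypothetical in-memory stand-in for a paginated HTTP API (assumption).
_FAKE_API = [
    [{"id": 1, "user": {"name": "ana", "city": "Recife"}},
     {"id": 2, "user": {"name": "bo", "city": "Oslo"}}],
    [{"id": 3, "user": {"name": "cy", "city": "Lima"}}],
]


def fetch_page(page: int) -> list[dict]:
    """Return one page of records, or an empty list past the last page."""
    return _FAKE_API[page] if page < len(_FAKE_API) else []


def read_all() -> Iterator[dict]:
    """Yield records page by page so memory use is bounded by the page size."""
    page = 0
    while True:
        rows = fetch_page(page)
        if not rows:
            return
        yield from rows
        page += 1


def normalise(record: dict, parent: str = "") -> dict:
    """Flatten nested dicts into a single level with '__'-joined keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}__{key}" if parent else key
        if isinstance(value, dict):
            flat.update(normalise(value, name))
        else:
            flat[name] = value
    return flat


def load_incrementally(last_seen_id: int) -> tuple[list[dict], int]:
    """Load only records newer than the stored cursor; return rows and new cursor."""
    new_rows = [normalise(r) for r in read_all() if r["id"] > last_seen_id]
    cursor = max((r["id"] for r in new_rows), default=last_seen_id)
    return new_rows, cursor
```

Calling `load_incrementally(0)` loads everything; calling it again with the returned cursor loads nothing until new records appear.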

Module 3: Data Warehouse

Data Warehouse
BigQuery
Partitioning and clustering
BigQuery best practices
Internals of BigQuery
BigQuery Machine Learning
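
Why partitioning and clustering reduce the amount of data scanned can be shown with a toy in-memory model. This is a sketch of the concept only, not how BigQuery stores data. The sample rows, `build_table`, and `total_fare` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data (assumption).
ROWS = [
    {"day": "2024-01-01", "vendor": "B", "fare": 12.0},
    {"day": "2024-01-01", "vendor": "A", "fare": 7.5},
    {"day": "2024-01-02", "vendor": "A", "fare": 20.0},
    {"day": "2024-01-02", "vendor": "B", "fare": 3.0},
    {"day": "2024-01-03", "vendor": "A", "fare": 9.0},
]


def build_table(rows, partition_key, cluster_key):
    """Group rows into partitions, then sort each partition by the cluster key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: r[cluster_key])
    return dict(partitions)


def total_fare(table, day):
    """Sum fares for one day; return the total and the number of rows scanned."""
    scanned = table.get(day, [])  # other partitions are never read (pruning)
    return sum(r["fare"] for r in scanned), len(scanned)
```

A query filtered on the partition column touches one partition instead of the whole table, which is the main reason partitioned tables can be cheaper to query.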

Module 4: Analytics engineering

Basics of analytics engineering
dbt (data build tool)
BigQuery and dbt
Postgres and dbt
dbt models
Testing and documenting
Deployment to the cloud and locally
Visualizing the data with Google Data Studio and Metabase
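
For the testing topic, dbt ships generic tests such as `unique` and `not_null`, which pass when no failing rows are found. The sketch below shows what those two checks assert, written in plain Python rather than dbt's YAML and SQL. The sample rows and function names are hypothetical.

```python
from collections import Counter

# Hypothetical model output (assumption).
ROWS = [
    {"trip_id": 1, "vendor": "A"},
    {"trip_id": 2, "vendor": None},
    {"trip_id": 2, "vendor": "B"},
]


def not_null_failures(rows, column):
    """Return rows where the column is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique_failures(rows, column):
    """Return values that occur more than once in the column."""
    counts = Counter(r[column] for r in rows)
    return sorted(v for v, n in counts.items() if n > 1)
```

An empty result means the check passes; any returned row or value is a failure to investigate.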

Module 5: Batch processing

Batch processing
What is Spark
Spark Dataframes
Spark SQL
Internals: GroupBy and joins
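
The GroupBy internals can be pictured as two stages separated by a shuffle. The sketch below simulates that in plain Python. It is not the PySpark API, and the partition contents, `crc32` routing, and function names are assumptions made for illustration.

```python
from collections import Counter
from zlib import crc32

# Hypothetical input, already split into two partitions (assumption).
INPUT_PARTITIONS = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
]


def partial_aggregate(partition):
    """Stage 1: sum values per key inside one input partition (no data movement)."""
    sums = Counter()
    for key, value in partition:
        sums[key] += value
    return sums


def shuffle(partials, num_reducers):
    """Route each key to one reducer by hash so equal keys meet in one place."""
    buckets = [[] for _ in range(num_reducers)]
    for sums in partials:
        for key, value in sums.items():
            buckets[crc32(key.encode()) % num_reducers].append((key, value))
    return buckets


def group_by_sum(partitions, num_reducers=2):
    """Stage 2: each reducer merges the partial sums it received."""
    partials = [partial_aggregate(p) for p in partitions]
    result = {}
    for bucket in shuffle(partials, num_reducers):
        for key, value in bucket:
            result[key] = result.get(key, 0) + value
    return result
```

Pre-aggregating before the shuffle means only one record per key per partition crosses the network, which is why this pattern matters for performance.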

Module 6: Streaming

Introduction to Kafka
Schemas (Avro)
Kafka Streams
Kafka Connect and KSQL
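
Two Kafka ideas from this module, key-based partitioning and consumer offsets, can be sketched with a toy in-memory topic. This is not a Kafka client and makes no network calls. The `Topic` class is hypothetical, and `crc32` is a stand-in for the hash used by Kafka's default partitioner.

```python
from zlib import crc32


class Topic:
    """Toy append-only topic split into partitions (not the Kafka client API)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> int:
        """Append a message; equal keys always land in the same partition."""
        # crc32 is a stand-in; Kafka's default partitioner uses a different hash.
        index = crc32(key.encode()) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index

    def consume(self, partition: int, offset: int) -> tuple[list, int]:
        """Read messages from an offset; return them with the next offset."""
        messages = self.partitions[partition][offset:]
        return messages, offset + len(messages)
```

Because messages with the same key share a partition, their relative order is preserved, and a consumer can resume from its last offset without rereading earlier messages.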

Project

Week 1 and 2: working on your project
Week 3: reviewing your peers

https://github.com/DataTalksClub/data-engineering-zoomcamp

01-docker-terraform
02-workflow-orchestration
03-data-warehouse
04-analytics-engineering
05-batch
06-streaming
workshop-01-ingestion-with-dlt
.gitignore
README.md
