Data ingestion is the cornerstone of Data Engineering — it’s where every data journey begins. In this hands-on workshop, you’ll learn how to move data from anywhere to anywhere using the open-source modern data stack.
We’ll focus on practical skills, using the Python library dlt (data load tool) to ingest data from a REST API and load it into DuckDB, a fast and lightweight in-process database. Whether you're just getting started with data pipelines or looking to modernize your current stack, this session will give you a solid foundation for building reliable, open-source ingestion workflows.
Come ready to write some code, get your hands dirty, and walk away with real-world ingestion superpowers.
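To give a flavour of what you will build, here is a minimal sketch of a dlt pipeline that pulls JSON from a public REST API and loads it into a local DuckDB file. The PokeAPI endpoint, the pipeline/dataset/table names, and the page size are illustrative placeholders, not the workshop's exact code:

```python
# Minimal sketch: REST API -> dlt -> DuckDB (endpoint and names are placeholders)
import dlt
import requests

# PokeAPI is used here only as a stand-in for the API covered in the workshop.
response = requests.get("https://pokeapi.co/api/v2/pokemon", params={"limit": 50})
response.raise_for_status()
records = response.json()["results"]  # list of dicts like {"name": ..., "url": ...}

pipeline = dlt.pipeline(
    pipeline_name="rest_api_demo",
    destination="duckdb",      # writes a local .duckdb file in the working directory
    dataset_name="poke_data",
)

load_info = pipeline.run(records, table_name="pokemon")
print(load_info)  # summary of what was loaded and where
```

By default the duckdb destination writes a `rest_api_demo.duckdb` file next to the script, which you can then inspect with any DuckDB client.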
Prerequisites:
- PyLadies Amsterdam uses uv for dependency management
- A Google account, if you want to use Google Colab
There are two ways of running this workshop:
- With Google Colab (recommended)
- On a local instance of Jupyter Notebook
You can open direct links to Colab:
Or you can add the .ipynb notebooks from this repository to your Google Drive in the following way:
- Visit Google Colab
- In the top left corner select "File" → "Open Notebook"
- Under "GitHub", enter the URL of the repo of this workshop
- Select one of the notebooks within the repo.
- At the top of the notebook, add a Code cell and run the following code:
```
!git clone <github-url-of-workshop-repo>
%cd <name-of-repo>
!pip install -r requirements.txt
```
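If you want to confirm the install worked, you can run a quick sanity check in another cell. This assumes dlt and duckdb are pulled in via requirements.txt; if they are not, the imports below will fail:

```python
# Quick sanity check that the key workshop dependencies are importable
import dlt
import duckdb

print("dlt", dlt.__version__)
print("duckdb", duckdb.__version__)
```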
To run the workshop on a local Jupyter Notebook instance, run the following commands:
```bash
git clone <github-url-of-workshop-repo>
cd <name-of-repo>
# create the venv and install dependencies
uv sync
```
And start Jupyter Notebook:

```bash
uv run jupyter notebook
```
Re-watch this YouTube stream
This workshop was set up by @pyladiesams and @VioletM
To ensure our code looks beautiful, PyLadies uses pre-commit hooks. You can enable them by running `pre-commit install`. You may have to install pre-commit first, using `uv sync`, `uv pip install pre-commit`, or `pip install pre-commit`.
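For example, one possible flow with uv (adjust the install step to whichever option above you prefer) looks like this:

```bash
uv sync                             # or: uv pip install pre-commit
uv run pre-commit install           # register the git hooks in this clone
uv run pre-commit run --all-files   # optionally check every file once
```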
Happy Coding :)