This starter helps you spin up a DuckLake catalog backed by a local DuckDB metadata DB and MinIO (S3-compatible) for Parquet data files.
It also includes a CLI to generate and load a TPC-H dataset into DuckLake using DuckDB's ducklake and tpch extensions (no manual data loads).
- `docker-compose.minio.yml` — launches a local MinIO server + console.
- `config.example.yaml` — minimal config for metadata path, MinIO, and dataset options.
- `bootstrap_ducklake.py` — Python CLI to attach a DuckLake catalog and load TPC-H tables.
- This `README.md`.
```bash
# One-time env setup (install dependencies, source environment, etc.)
./setup_env.sh

# Automate calling the main script: start up DuckLake and MinIO, load the TPC-H dataset into DuckLake
./run_ducklake.sh
```

At this point, you'll have a working DuckLake! If you want, you can validate the TPC-H dataset with the standard TPC-H queries:

```bash
./run_tpch_queries.py
```
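If you'd rather validate from Python directly, here is a minimal sketch of what such a pass can look like. It is not the actual `run_tpch_queries.py`; the catalog alias `my_ducklake`, the metadata path, and the MinIO credentials are assumptions, so adjust them to your setup.

```python
# Minimal validation sketch (not the real run_tpch_queries.py).
# Alias 'my_ducklake', metadata.ducklake, and the creds are assumptions.
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake; LOAD ducklake; INSTALL tpch; LOAD tpch;")
con.execute("""
    CREATE OR REPLACE SECRET minio (
        TYPE S3, KEY_ID 'minioadmin', SECRET 'minioadmin',
        ENDPOINT 'localhost:9000', URL_STYLE 'path', USE_SSL false
    );
""")
# Re-attaching an existing catalog: DuckLake already knows its DATA_PATH.
con.execute("ATTACH 'ducklake:metadata.ducklake' AS my_ducklake; USE my_ducklake;")

# tpch_queries() (from the tpch extension) yields the 22 standard query
# texts; running them against the DuckLake tables exercises the dataset.
for query_nr, query in con.execute("SELECT * FROM tpch_queries()").fetchall():
    rows = con.execute(query).fetchall()
    print(f"Q{query_nr:02d}: {len(rows)} rows")
```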
- Run MinIO (Docker required)

```bash
docker compose -f docker-compose.minio.yml up -d
# Console: http://localhost:9001 | S3 endpoint: http://localhost:9000
# Default creds in compose file: minioadmin / minioadmin
# Create a bucket, e.g. 'ducklake-data' (via the console)
```
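If you prefer scripting the bucket creation over clicking through the console, a short boto3 sketch like the one below works against MinIO's S3 API. Note that `boto3` is not among this project's listed dependencies, and the endpoint, credentials, and bucket name shown are the compose-file defaults; treat all of them as assumptions.

```python
# Hypothetical helper: create the data bucket on local MinIO.
# boto3 is an extra dependency (pip install boto3), not listed below.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",  # MinIO S3 endpoint from the compose file
    aws_access_key_id="minioadmin",        # default creds from the compose file
    aws_secret_access_key="minioadmin",
)
s3.create_bucket(Bucket="ducklake-data")   # same name used elsewhere in this README
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```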
- Copy config & edit

```bash
cp config.example.yaml config.yaml
# set: bucket, prefix, region, and (optionally) MinIO creds via env or config
```
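To give a feel for the shape of the config, here is a sketch that loads a YAML snippet with the same kinds of fields the CLI needs. The field names are illustrative assumptions, not the real schema; `config.example.yaml` in this repo is the source of truth.

```python
# Illustrative only: the keys below are assumptions, not the actual
# schema of config.example.yaml.
import yaml

EXAMPLE = """
metadata_path: metadata.ducklake   # local DuckDB file holding DuckLake metadata
s3:
  endpoint: localhost:9000
  region: us-east-1
  bucket: ducklake-data
  prefix: tpch
  # creds can also come from env instead of the file
  key_id: minioadmin
  secret: minioadmin
dataset:
  scale: 1
"""

cfg = yaml.safe_load(EXAMPLE)
print(cfg["s3"]["bucket"], cfg["dataset"]["scale"])
```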
- Use the CLI

```bash
python3 bootstrap_ducklake.py attach --config config.yaml
python3 bootstrap_ducklake.py load-tpch --config config.yaml --scale 1
# Re-run load-tpch with a different scale if you want (will CTAS into DuckLake)
```
- Extensible backends: The CLI is structured so you can add new metadata backends (e.g., Postgres) and storage backends (e.g., AWS S3) later.
- Pure DuckDB/DuckLake: We use only DuckDB SQL: `CREATE SECRET ...`, `ATTACH 'ducklake:...' (DATA_PATH ...)`, and `CALL dbgen(...)` followed by `CREATE TABLE ... AS SELECT ...` into DuckLake. A condensed sketch of that flow follows this list.
- No manual file writing: Parquet files are created by DuckLake within your object store path.
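For readers who want to see those statements end to end, below is a condensed sketch of the flow. It is not the actual `bootstrap_ducklake.py`; the secret name, the catalog alias `my_ducklake`, the metadata path, and the `s3://` data path are assumptions based on the defaults above.

```python
# Condensed sketch of the SQL flow; not the real bootstrap_ducklake.py.
# 'minio', 'my_ducklake', metadata.ducklake, and the s3:// path are assumptions.
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake; LOAD ducklake; INSTALL tpch; LOAD tpch;")

# CREATE SECRET: point DuckDB's S3 layer at local MinIO (compose-file creds).
con.execute("""
    CREATE OR REPLACE SECRET minio (
        TYPE S3, KEY_ID 'minioadmin', SECRET 'minioadmin',
        ENDPOINT 'localhost:9000', URL_STYLE 'path', USE_SSL false
    );
""")

# ATTACH: metadata lives in the local DuckDB file, data files under DATA_PATH.
con.execute("""
    ATTACH 'ducklake:metadata.ducklake' AS my_ducklake
        (DATA_PATH 's3://ducklake-data/tpch/');
""")

# dbgen + CTAS: generate TPC-H in the in-memory database, then copy each
# table into DuckLake, which writes the Parquet files itself.
con.execute("CALL dbgen(sf = 1);")
tables = con.execute(
    "SELECT table_name FROM duckdb_tables() WHERE database_name = 'memory'"
).fetchall()
for (tbl,) in tables:
    con.execute(f"CREATE TABLE my_ducklake.{tbl} AS SELECT * FROM memory.{tbl};")
```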
- Python 3.9+
```bash
pip install duckdb pyyaml
```

- Docker (for running MinIO).