3 changes: 3 additions & 0 deletions .gitignore
@@ -41,6 +41,9 @@ bin/
 .mypy_cache/
 htmlcov
 
+# Jupyter notebook checkpoints
+.ipynb_checkpoints/
+
 pyiceberg/avro/decoder_fast.c
 pyiceberg/avro/*.html
 pyiceberg/avro/*.so
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -32,6 +32,11 @@ repos:
       - id: ruff
         args: [ --fix, --exit-non-zero-on-fix ]
       - id: ruff-format
+  - repo: https://github.com/nbQA-dev/nbQA
+    rev: 1.9.1
+    hooks:
+      - id: nbqa-ruff
+        args: [ --fix, --exit-non-zero-on-fix ]
   - repo: https://github.com/pre-commit/mirrors-mypy
     rev: v1.18.2
     hooks:
19 changes: 18 additions & 1 deletion Makefile
@@ -97,7 +97,7 @@ test: ## Run all unit tests (excluding integration)
 
 test-integration: test-integration-setup test-integration-exec test-integration-cleanup ## Run integration tests
 
-test-integration-setup: ## Start Docker services for integration tests
+test-integration-setup: install ## Start Docker services for integration tests
> **Contributor Author:** Adding the `make install` prerequisite here because otherwise `dev/provision.py` will fail.

 	docker compose -f dev/docker-compose-integration.yml kill
 	docker compose -f dev/docker-compose-integration.yml rm -f
 	docker compose -f dev/docker-compose-integration.yml up -d --wait
@@ -153,6 +153,21 @@ docs-serve: ## Serve local docs preview (hot reload)
 docs-build: ## Build the static documentation site
 	uv run $(PYTHON_ARG) mkdocs build -f mkdocs/mkdocs.yml --strict
 
+# ========================
+# Experimentation
+# ========================
+
+##@ Experimentation
+
+notebook-install: ## Install notebook dependencies
+	uv sync $(PYTHON_ARG) --all-extras --group notebook
+
+notebook: notebook-install ## Launch notebook for experimentation
+	uv run jupyter lab --notebook-dir=notebooks
+
+notebook-infra: notebook-install test-integration-setup ## Launch notebook with integration test infra (Spark, Iceberg Rest Catalog, object storage, etc.)
+	uv run jupyter lab --notebook-dir=notebooks
+
 # ===================
 # Project Maintenance
 # ===================
@@ -167,4 +182,6 @@ clean: ## Remove build artifacts and caches
 	@find . -name "__pycache__" -exec echo Deleting {} \; -exec rm -rf {} +
 	@find . -name "*.pyd" -exec echo Deleting {} \; -delete
 	@find . -name "*.pyo" -exec echo Deleting {} \; -delete
+	@echo "Cleaning up Jupyter notebook checkpoints..."
+	@find . -name ".ipynb_checkpoints" -exec echo Deleting {} \; -exec rm -rf {} +
 	@echo "Cleanup complete."
1 change: 1 addition & 0 deletions dev/.rat-excludes
@@ -5,3 +5,4 @@ build
 .gitignore
 uv.lock
 mkdocs/*
+notebooks/*
> **Contributor:** We can do this, but then we have to make sure that they are not bundled in the release. The notebooks do contain code.

> **Contributor Author:** Yeah, agreed. I double-checked the artifacts; the new `notebooks/` dir is not included, similar to how the `mkdocs/` dir is not included.
>
> This feels like a potential footgun: a folder could be bundled in the artifact while the RAT check ignores it via `.rat-excludes`. I think we can add a CI check to prevent this. I'll track this as a separate issue.
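
A hypothetical sketch of what such a CI guard could look like (not part of this PR; the script name and matching logic are illustrative, and it assumes the sdist has already been built into `dist/`):

```python
# check_rat_excludes.py -- hypothetical CI guard, not part of this PR.
# Fails if a directory listed in dev/.rat-excludes also ships in the sdist.
import sys
import tarfile
from pathlib import Path

# Treat entries like "mkdocs/*" or "notebooks/*" as directory names.
excluded = [
    line.strip().rstrip("/*")
    for line in Path("dev/.rat-excludes").read_text().splitlines()
    if line.strip()
]

sdist = next(Path("dist").glob("*.tar.gz"))  # assumes the sdist was built first
with tarfile.open(sdist) as tar:
    members = tar.getnames()

# A member like "pyiceberg-0.x.0/notebooks/foo.ipynb" means the dir is bundled.
offenders = sorted({d for d in excluded for m in members if f"/{d}/" in m})
if offenders:
    sys.exit(f"RAT-excluded paths bundled in {sdist.name}: {offenders}")
print(f"OK: no RAT-excluded directories found in {sdist.name}")
```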

35 changes: 35 additions & 0 deletions mkdocs/docs/contributing.md
@@ -211,6 +211,41 @@
export PYICEBERG_CATALOG__TEST_CATALOG__ACCESS_KEY_ID=username
export PYICEBERG_CATALOG__TEST_CATALOG__SECRET_ACCESS_KEY=password
```

## Notebooks for Experimentation

PyIceberg provides Jupyter notebooks for quick experimentation and learning. Two Make commands are available depending on your needs:

### PyIceberg Examples (`make notebook`)

For basic PyIceberg experimentation without additional infrastructure:

```bash
make notebook
```

This will install notebook dependencies and launch Jupyter Lab in the `notebooks/` directory.

**PyIceberg Example Notebook** (`notebooks/pyiceberg_example.ipynb`) is based on the [Getting Started with PyIceberg](https://py.iceberg.apache.org/#getting-started-with-pyiceberg) page. It demonstrates basic PyIceberg operations like creating catalogs, schemas, and querying tables without requiring any external services.
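
For reference, a condensed sketch of the kind of operations that notebook covers, following the Getting Started page (the SQLite catalog URI and `/tmp/warehouse` path here are illustrative defaults, not values mandated by the notebook):

```python
from pathlib import Path

from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import DoubleType, NestedField, StringType

# A local SQL catalog backed by SQLite -- no external services required.
Path("/tmp/warehouse").mkdir(parents=True, exist_ok=True)
catalog = load_catalog(
    "default",
    type="sql",
    uri="sqlite:////tmp/warehouse/pyiceberg_catalog.db",
    warehouse="file:///tmp/warehouse",
)

schema = Schema(
    NestedField(field_id=1, name="city", field_type=StringType(), required=False),
    NestedField(field_id=2, name="lat", field_type=DoubleType(), required=False),
    NestedField(field_id=3, name="long", field_type=DoubleType(), required=False),
)

catalog.create_namespace_if_not_exists("default")
table = catalog.create_table_if_not_exists("default.cities", schema=schema)
print(table.scan().to_arrow())  # empty table, but exercises the read path
```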

### Spark Integration Examples (`make notebook-infra`)

For working with PyIceberg alongside Spark, use the infrastructure-enabled notebook environment:

```bash
make notebook-infra
```

This command spins up the full integration test infrastructure via Docker Compose, including:

- **Spark** (with Spark Connect)
- **Iceberg REST Catalog** (using the [`apache/iceberg-rest-fixture`](https://hub.docker.com/r/apache/iceberg-rest-fixture) image)
- **Hive Metastore**
- **S3-compatible object storage** (Minio)

**Spark Example Notebook** (`notebooks/spark_integration_example.ipynb`) is based on the [Spark Getting Started](https://iceberg.apache.org/docs/nightly/spark-getting-started/) guide. This notebook demonstrates how to work with PyIceberg alongside Spark, leveraging the Docker-based testing setup for a complete local development environment.

After running `make notebook-infra`, open `spark_integration_example.ipynb` in the Jupyter Lab interface to explore Spark integration capabilities.
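
As a quick orientation, here is a minimal sketch of connecting to that infrastructure from Python. The endpoints and credentials below (REST catalog on `localhost:8181`, Minio on `localhost:9000` with `admin`/`password`, Spark Connect on the default `sc://localhost:15002`) are typical of PyIceberg's integration setup, but verify them against `dev/docker-compose-integration.yml`:

```python
from pyiceberg.catalog import load_catalog
from pyspark.sql import SparkSession

# REST catalog started by `make notebook-infra` (ports/credentials assumed;
# check dev/docker-compose-integration.yml for the authoritative values).
catalog = load_catalog(
    "integration",
    **{
        "type": "rest",
        "uri": "http://localhost:8181",
        "s3.endpoint": "http://localhost:9000",
        "s3.access-key-id": "admin",
        "s3.secret-access-key": "password",
    },
)
print(catalog.list_namespaces())

# Same cluster over Spark Connect; 15002 is Spark's default Connect port.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
spark.sql("SHOW NAMESPACES").show()
```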

## Code standards

Below are the formalized conventions that we adhere to in the PyIceberg project. The goal is to have a common agreement on how to evolve the codebase, and to serve as guidelines for newcomers to the project.
4 changes: 4 additions & 0 deletions mkdocs/docs/index.md
@@ -198,6 +198,10 @@ Since the catalog was configured to use the local filesystem, we can explore how
find /tmp/warehouse/
```

## Try it yourself with Jupyter Notebooks
> **Contributor Author:** Added to the frontpage, https://py.iceberg.apache.org/
>
> [Screenshot: the new "Try it yourself with Jupyter Notebooks" section rendered on the frontpage, 2026-01-06]

PyIceberg provides Jupyter notebooks for hands-on experimentation with the examples above and more. Check out the [Notebooks for Experimentation](contributing.md#notebooks-for-experimentation) guide.

## More details

For the details, please check the [CLI](cli.md) or [Python API](api.md) page.