Skip to content

duckdb/duckdb-python

Repository files navigation

DuckDB logo

discord PyPi Latest Release


DuckDB.org | User Guide (Python) - API Docs (Python)

DuckDB: A Fast, In-Process, Portable, Open Source, Analytical Database System

  • Simple: DuckDB is easy to install and deploy. It has zero external dependencies and runs in-process in its host application or as a single binary.
  • Portable: DuckDB runs on Linux, macOS, Windows, Android, iOS and all popular hardware architectures. It has idiomatic client APIs for major programming languages.
  • Feature-rich: DuckDB offers a rich SQL dialect. It can read and write file formats such as CSV, Parquet, and JSON, to and from the local file system and remote endpoints such as S3 buckets.
  • Fast: DuckDB runs analytical queries at blazing speed thanks to its columnar engine, which supports parallel execution and can process larger-than-memory workloads.
  • Extensible: DuckDB is extensible by third-party features such as new data types, functions, file formats and new SQL syntax. User contributions are available as community extensions.
  • Free: DuckDB and its core extensions are open-source under the permissive MIT License. The intellectual property of the project is held by the DuckDB Foundation.

Installation

Install the latest release of DuckDB directly from PyPi:

pip install duckdb

Install with all optional dependencies:

pip install 'duckdb[all]'

Development

Building wheels and sdists

To build a wheel and sdist for your system and the default Python version:

uv build

To build a wheel for a different Python version:

# E.g. for Python 3.9
uv build -p 3.9

Editable installs (general)

It's good to be aware of the following when creating an editable install:

  • uv sync or uv run [tool] create editable installs by default, however, it work the way you expect. We have configured the project so that scikit-build-core will use a persistent build-dir, but since the build itself happens in an isolated, ephemeral environment, cmake's paths will point to non-existing directories. CMake itself will be missing.
  • You should install all development dependencies, and then build the project without build isolation, in two separate steps. After this you can happily keep building and running, as long as you don't forget to pass in the --no-build-isolation flag.
# install all dev dependencies without building the project (needed once)
uv sync -p 3.9 --no-install-project
# build and install without build isolation
uv sync  --no-build-isolation

Editable installs (IDEs)

If you're using an IDE then life is a little simpler. You install build dependencies and the project in the two steps outlined above, and from that point on you can rely on e.g. CLion's cmake capabilities to do incremental compilation and editable rebuilds. This will skip scikit-build-core's build backend and all of uv's dependency management, so for "real" builds you better revert to the CLI. However, this should work fine for coding and debugging.

Running tests

Run all pytests:

uv run --no-build-isolation pytest ./tests --verbose

Exclude the test/slow directory:

uv run --no-build-isolation pytest ./tests --verbose --ignore=./tests/slow

Test coverage

Run with coverage (during development you probably want to specify which tests to run):

COVERAGE=1 uv run --no-build-isolation coverage run -m pytest ./tests --verbose

The COVERAGE env var will compile the extension with --coverage, allowing us to collect coverage stats of C++ code as well as Python code.

Check coverage for Python code:

uvx coverage html -d htmlcov-python
uvx coverage report --format=markdown

Check coverage for C++ code (note: this will clutter your project dir with html files, consider saving them in some other place):

uvx gcovr \
  --gcov-ignore-errors all \
  --root "$PWD" \
  --filter "${PWD}/src/duckdb_py" \
  --exclude '.*/\.cache/.*' \
  --gcov-exclude '.*/\.cache/.*' \
  --gcov-exclude '.*/external/.*' \
  --gcov-exclude '.*/site-packages/.*' \
  --exclude-unreachable-branches \
  --exclude-throw-branches \
  --html --html-details -o coverage-cpp.html \
  build/coverage/src/duckdb_py \
  --print-summary

Typechecking and linting

  • We're not running any mypy typechecking tests at the moment
  • We're not running any ruff / linting / formatting at the moment

Cibuildwheel

You can run cibuildwheel locally for linux. E.g. limited to Python 3.9:

CIBW_BUILD='cp39-*' uvx cibuildwheel --platform linux .

Code conventions

Tooling

This codebase is developed with the following tools:

  • Astral UV - for dependency management across all platforms we provide wheels for, and for Python environment management. It will be hard to work on this codebase without having UV installed.
  • Scikit-build-core - the build backend for building the extension. On the background, scikit-build-core uses cmake and ninja for compilation.
  • pybind11 - a bridge between C++ and Python.
  • CMake - the build system for both DuckDB itself and the DuckDB Python module.
  • Cibuildwheel

Merging changes to pythonpkg from duckdb main

  1. Checkout main 2Identify the merge commits that brought in tags to main:
git log --graph --oneline --decorate main --simplify-by-decoration
  1. Get the log of commits
git log --oneline 71c5c07cdd..c9254ecff2 -- tools/pythonpkg/
  1. Checkout v1.3-ossivalis
  2. Get the log of commits
git log --oneline v1.3.0..v1.3.1 -- tools/pythonpkg/

git diff --name-status 71c5c07cdd c9254ecff2 -- tools/pythonpkg/

git log --oneline 71c5c07cdd..c9254ecff2 -- tools/pythonpkg/
git diff --name-status <HASH_A> <HASH_B> -- tools/pythonpkg/

Versioning and Releases

The DuckDB Python package versioning and release scheme follows that of DuckDB itself. This means that a X.Y.Z[. postN] release of the Python package ships the DuckDB stable release X.Y.Z. The optional .postN releases ship the same stable release of DuckDB as their predecessors plus Python package-specific fixes and / or features.

Types DuckDB Version Resulting Python Extension Version
Stable release: DuckDB stable release 1.3.1 1.3.1
Stable post release: DuckDB stable release + Python fixes and features 1.3.1 1.3.1.postX
Nightly micro: DuckDB next micro nightly + Python next micro nightly 1.3.2.devM 1.3.2.devN
Nightly minor: DuckDB next minor nightly + Python next minor nightly 1.4.0.devM 1.4.0.devN

Note that we do not ship nightly post releases (e.g. we don't ship 1.3.1.post2.dev3).

Branch and Tag Strategy

We cut releases as follows:

Type Tag How
Stable minor release vX.Y.0 Adding a tag on main
Stable micro release vX.Y.Z Adding a tag on a minor release branch (e.g. v1.3-ossivalis)
Stable post release vX.Y.Z-postN Adding a tag on a post release branch (e.g. v1.3.1-post)
Nightly micro not tagged Combining HEAD of the micro release branches of DuckDB and the Python package
Nightly minor not tagged Combining HEAD of the minor release branches of DuckDB and the Python package

Release Runbooks

We cut a new stable minor release with the following steps:

  1. Create a PR on main to pin the DuckDB submodule to the tag of its current release.
  2. Iff all tests pass in CI, merge the PR.
  3. Manually start the release workflow with the hash of this commit, and the tag name.
  4. Iff all goes well, create a new PR to let the submodule track DuckDB main.

We cut a new stable micro release with the following steps:

  1. Create a PR on the minor release branch to pin the DuckDB submodule to the tag of its current release.
  2. Iff all tests pass in CI, merge the PR.
  3. Manually start the release workflow with the hash of this commit, and the tag name.
  4. Iff all goes well, create a new PR to let the submodule track DuckDB's minor release branch.

We cut a new stable post release with the following steps:

  1. Create a PR on the post release branch to pin the DuckDB submodule to the tag of its current release.
  2. Iff all tests pass in CI, merge the PR.
  3. Manually start the release workflow with the hash of this commit, and the tag name.
  4. Iff all goes well, create a new PR to let the submodule track DuckDB's minor release branch.

Dynamic Versioning Integration

The package uses setuptools_scm with scikit-build for automatic version determination, and implements a custom versioning scheme.

  • pyproject.toml configuration:

    [tool.scikit-build]
    metadata.version.provider = "scikit_build_core.metadata.setuptools_scm"
    
    [tool.setuptools_scm]
    version_scheme = "duckdb_packaging._setuptools_scm_version:version_scheme"
  • Environment variables:

    • MAIN_BRANCH_VERSIONING=0: Use release branch versioning (patch increments)
    • MAIN_BRANCH_VERSIONING=1: Use main branch versioning (minor increments)
    • OVERRIDE_GIT_DESCRIBE: Override version detection

About

The DuckDB Python package

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages