duckdb_engine

Very very very basic sqlalchemy driver for duckdb

Usage

$ pip install duckdb-engine

Once you've installed this package, you should be able to just use it, as sqlalchemy does a python path search

from sqlalchemy import Column, Integer, Sequence, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import Session

Base = declarative_base()


class FakeModel(Base):  # type: ignore
    __tablename__ = "fake"

    id = Column(Integer, Sequence("fakemodel_id_sequence"), primary_key=True)
    name = Column(String)


eng = create_engine("duckdb:///:memory:")
Base.metadata.create_all(eng)
session = Session(bind=eng)

session.add(FakeModel(name="Frank"))
session.commit()

frank = session.query(FakeModel).one()

assert frank.name == "Frank"

How to register a pandas DataFrame

eng = create_engine("duckdb:///:memory:")
eng.execute("register", ("dataframe_name", pd.DataFrame(...)))

eng.execute("select * from dataframe_name")

Things to keep in mind

Duckdb's SQL parser is based on the PostgreSQL parser, but not all features in PostgreSQL are supported in duckdb. Because the duckdb_engine dialect is derived from the postgresql dialect, sqlalchemy may try to use PostgreSQL-only features. Below are some caveats to look out for.

Auto-incrementing ID columns

When defining an Integer column as a primary key, sqlalchemy uses the SERIAL datatype for PostgreSQL. Duckdb does not yet support this datatype because it's a non-standard PostgreSQL legacy type, so a workaround is to use the sqlalchemy.Sequence() object to auto-increment the key. For more information on sequences, you can find the sqlalchemy Sequence documentation here.

The following example demonstrates how to create an auto-incrementing ID column for a simple table:

>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> metadata = sqlalchemy.MetaData(engine)
>>> user_id_seq = sqlalchemy.Sequence('user_id_seq')
>>> users_table = sqlalchemy.Table(
...     'users',
...     metadata,
...     sqlalchemy.Column(
...         'id',
...         sqlalchemy.Integer,
...         user_id_seq,
...         server_default=user_id_seq.next_value(),
...         primary_key=True,
...     ),
... )
>>> metadata.create_all(bind=engine)

Pandas `read_sql()` chunksize

The pandas.read_sql() method can read tables from duckdb_engine into DataFrames, but the sqlalchemy.engine.result.ResultProxy trips up when fetchmany() is called. Therefore, for now chunksize=None (default) is necessary when reading duckdb tables into DataFrames. For example:

>>> import pandas as pd
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> df = pd.read_sql('users', engine)                ### Works as expected
>>> df = pd.read_sql('users', engine, chunksize=25)  ### Throws an exception

The name

Yes, I'm aware this package should be named duckdb-driver or something, I wasn't thinking when I named it and it's too hard to change the name now

Name		Name	Last commit message	Last commit date
Latest commit History 526 Commits
.github		.github
.hypothesis		.hypothesis
docs		docs
duckdb_engine		duckdb_engine
.gitignore		.gitignore
.kodiak.toml		.kodiak.toml
.pre-commit-config.yaml		.pre-commit-config.yaml
.readthedocs.yaml		.readthedocs.yaml
.whitesource		.whitesource
History.md		History.md
LICENSE.txt		LICENSE.txt
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
renovate.json		renovate.json
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

duckdb_engine

Usage

How to register a pandas DataFrame

Things to keep in mind

Auto-incrementing ID columns

Pandas `read_sql()` chunksize

The name

About

Releases

Packages

Languages

License

gforsyth/duckdb_engine

Folders and files

Latest commit

History

Repository files navigation

duckdb_engine

Usage

How to register a pandas DataFrame

Things to keep in mind

Auto-incrementing ID columns

Pandas read_sql() chunksize

The name

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Pandas `read_sql()` chunksize

Packages