Very very very basic sqlalchemy driver for duckdb
$ pip install duckdb-engine
Once you've installed this package, you should be able to just use it, as sqlalchemy does a python path search
from sqlalchemy import Column, Integer, Sequence, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import Session
Base = declarative_base()
class FakeModel(Base): # type: ignore
__tablename__ = "fake"
id = Column(Integer, Sequence("fakemodel_id_sequence"), primary_key=True)
name = Column(String)
eng = create_engine("duckdb:///:memory:")
Base.metadata.create_all(eng)
session = Session(bind=eng)
session.add(FakeModel(name="Frank"))
session.commit()
frank = session.query(FakeModel).one()
assert frank.name == "Frank"
eng = create_engine("duckdb:///:memory:")
eng.execute("register", ("dataframe_name", pd.DataFrame(...)))
eng.execute("select * from dataframe_name")
Duckdb's SQL parser is based on the PostgreSQL parser, but not all features in PostgreSQL are supported in duckdb. Because the duckdb_engine
dialect is derived from the postgresql
dialect, sqlalchemy
may try to use PostgreSQL-only features. Below are some caveats to look out for.
When defining an Integer column as a primary key, sqlalchemy
uses the SERIAL
datatype for PostgreSQL. Duckdb does not yet support this datatype because it's a non-standard PostgreSQL legacy type, so a workaround is to use the sqlalchemy.Sequence()
object to auto-increment the key. For more information on sequences, you can find the sqlalchemy Sequence
documentation here.
The following example demonstrates how to create an auto-incrementing ID column for a simple table:
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> metadata = sqlalchemy.MetaData(engine)
>>> user_id_seq = sqlalchemy.Sequence('user_id_seq')
>>> users_table = sqlalchemy.Table(
... 'users',
... metadata,
... sqlalchemy.Column(
... 'id',
... sqlalchemy.Integer,
... user_id_seq,
... server_default=user_id_seq.next_value(),
... primary_key=True,
... ),
... )
>>> metadata.create_all(bind=engine)
The pandas.read_sql()
method can read tables from duckdb_engine
into DataFrames, but the sqlalchemy.engine.result.ResultProxy
trips up when fetchmany()
is called. Therefore, for now chunksize=None
(default) is necessary when reading duckdb tables into DataFrames. For example:
>>> import pandas as pd
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> df = pd.read_sql('users', engine) ### Works as expected
>>> df = pd.read_sql('users', engine, chunksize=25) ### Throws an exception
Yes, I'm aware this package should be named duckdb-driver
or something, I wasn't thinking when I named it and it's too hard to change the name now