Ever wondered if MotherDuck offers a REST API? This project steps in to bridge that gap, enabling you to interact with DuckDB databases through a RESTful interface without writing a single line of backend code. Designed for deployment at the edge, it's a cost-effective solution for hobbyists and small projects.
The DuckDB Data API facilitates direct access to your DuckDB database, allowing for CRUD operations via HTTP.
- Simplicity and Rapid Prototyping: Offers a dynamic, RESTful interface to cater to various project requirements with minimal or no coding.
- Adaptation to Serverless and Edge Computing: Aims to provide a fitting database solution for applications operating at the internet's edge.
- Leveraging DuckDB’s Speed: While DuckDB excels in performance, integrating it directly into web applications posed challenges.
- Filling the Gap Left by MotherDuck: Although MotherDuck allows DuckDB to run serverlessly, it lacked a REST interface. This project aims to bridge that gap.
- Inspiration from MongoDB's Atlas Data Proxy: The convenience and developer-friendly nature of MongoDB's Atlas Data Proxy influenced the development of this project, promoting quick backend support for rapid prototyping.
- Cost-Effective Deployment on Vercel: The project prioritizes a deployment strategy that incurs minimal to no costs on platforms like Vercel.
- Adopting the ActiveRecord Pattern: Mimics the ActiveRecord design pattern to offer an intuitive way of interacting with database records.
- Automated CRUD Operations: Streamlines Create, Read, Update, and Delete operations to save development time and effort.
- Flexibility and Dynamic Nature: Ensures the proxy can adapt to varying project needs without rigid constraints.
- Single-File Implementation: Strives for a lightweight solution, ideally encapsulated within a single file for ease of use and integration.
-
Create a Virtual Environment:
python3 -m venv env source env/bin/activate pip install -r requirements.txt
-
Create a
.env
file at the root of your project to configure the environment settings:
# .env file
DUCKDB_DATABASE_URL=duckdb:///tickit.duckdb
DUCKDB_SCHEMA_NAME=main
QUERY_BLACKLIST=DELETE,DROP,TRUNCATE,ALTER
DUCKDB_DATABASE_URL
: Specifies the connection URL to your local DuckDB database file or to a Motherduck connectionstring. Alternatively, you can set this to:memory:
to use an in-memory database.DUCKDB_SCHEMA_NAME
: Sets the default schema for database operations within the data api. If left unset, it defaults to the "main" schema.QUERY_BLACKLIST
: Defines a comma-separated list of SQL keywords that the data api will reject to prevent potentially destructive operations. If this list is empty or not set, no commands will be blocked, and all types of queries will be permitted.
-
Install Packages
pip install -r requirements.txt
-
Run the project:
uvicorn main:app --reload
Physical DuckDB file:
DUCKDB_DATABASE_URL=duckdb:///path/to/your/database.duckdb
In-memory DuckDB instance:
DUCKDB_DATABASE_URL=duckdb:///:memory:
Motherduck:
DUCKDB_DATABASE_URL=duckdb:///md:[motherduck-token]@[db-name]
Interact with your DuckDB database through the following RESTful routes by replacing entity
with your table name:
Method | Route | Description | Query Parameter Examples |
---|---|---|---|
GET | /entity |
List entities | ?limit=10&skip=20?select=field1..?order=field1 asc?field1.eq=value |
POST | /entity |
Create a new entity | N/A |
GET | /entity/:id |
Get a single entity by ID | N/A |
PUT | /entity/:id |
Replace an entity by ID (full update) | N/A |
PATCH | /entity/:id |
Update an entity by ID (partial update) | N/A |
DELETE | /entity/:id |
Delete an entity by ID | N/A |
- Filtering:
?field1.eq=value
filters the list byfield1
equal tovalue
. - Sorting:
?order=field1 asc
sorts the list byfield1
in ascending order. - Pagination:
?limit=10&skip=20
limits the list to 10 entities, skipping the first 20. - Selecting Fields:
?select=field1,field2
selects onlyfield1
andfield2
to be returned in each entity in the list.
The DuckDB Data Proxy supports a range of filter operators for querying data, allowing for precise data retrieval based on specific criteria:
.eq
: Equals.neq
: Not equals.gt
: Greater than.gte
: Greater than or equal to.lt
: Less than.lte
: Less than or equal to.like
: Like (for pattern matching).ilike
: Case-insensitive pattern matching
These operators can be used in query parameters to filter the data retrieved from the database. For example, ?name.like=%john%
would filter records where the name
field contains "john".
It would be more organized to place execute/sql
in the "Additional Endpoints" section if it serves a different or more specific purpose than the standard CRUD operations. It's common to separate utility or administrative endpoints from the main CRUD operations to clarify their use cases. Here's how you could mention it:
In addition to the core RESTful routes, the DuckDB Data Proxy provides several utility endpoints for diagnostics, metadata, and system health checks:
Method | Route | Description | Query Parameter Examples |
---|---|---|---|
GET | / |
Root endpoint returning a welcome message. | N/A |
GET | /health |
Health check endpoint. | N/A |
GET | /debug/connection |
Tests database connection. | N/A |
POST | /execute/sql |
Execute a custom SQL query (SELECT or DDL statement). | N/A |
The POST /execute/sql
endpoint is for advanced users who need to execute custom SQL queries or DDL statements that are not covered by the standard CRUD operations. Please use this endpoint with caution, as improper use can affect database integrity and security.
Method | Route | Description |
---|---|---|
GET | /metadata/databases |
Lists all databases in the current instance. |
GET | /metadata/schemas |
Lists all schemas in the current database. |
GET | /metadata/tables |
Lists all tables in the current schema. |
GET | /metadata/columns |
Lists all columns in the current schema. |
GET | /metadata/views |
Lists all views in the current schema. |
GET | /metadata/constraints |
Lists all constraints in the current schema. |
Path Format | Description |
---|---|
/metadata/{database} |
Lists all schemas in the specified database. |
/metadata/{database}/{schema} |
Lists all tables in the specified schema of a database. |
/metadata/{database}/{schema}/{table} |
Lists all columns in the specified table of a schema. |
/metadata/{database}/{schema}/{table}/{column} |
Fetches metadata for the specific column in the specified table. |
Method | Route | Description |
---|---|---|
GET | /metadata/{catalog}/{schema}/{table}/summarize |
Fetch statistical summaries for all columns in the table. |
GET | /metadata/{catalog}/{schema}/{table}/column/{column}/summarize |
Fetch statistical summaries for a specific column in a table. |
GET | /profile |
Profiles a table or a specific column. Use object=db.schema.table or object=db.schema.table.column . |
GET | /describe |
Fetches metadata for a specific object using object=db.schema.table . |
Interact with the following tables from tickit db: sale
, event
, data
, category
, user
, listing
, venue
GET https://duckdb-data-api.vercel.app/user?state.eq=NL&liketheatre=true&limit=10&offset=5&order=firstname%20asc - List, filter, sort and paginate users
POST https://duckdb-data-api.vercel.app/user - Create a new user
GET https://duckdb-data-api.vercel.app/user/{id} - Get a single user by ID
PUT https://duckdb-data-api.vercel.app/user/{id} - Replace a user by ID (full update)
PATCH https://duckdb-data-api.vercel.app/user/{id} - Update a user by ID (partial update)
DELETE https://duckdb-data-api.vercel.app/user/{id} - Delete a user by ID
- Performance Considerations: Because of the way we jump from the edge to MotherDuck and back, and how we fetch data, especially when counting items for pagination, there might be a slight delay.
- Primary Key Expectations: Right now, we expect the primary key in your tables to be named "id". We know that’s not always the case, so we’re thinking of ways to work around this in future updates.
- FastAPI:
- DuckDB and MotherDuck:
- Python 3.9:
- SQLAlchemy:
flowchart LR
Client -->|REST API Call| FastAPI
FastAPI -->|SQLAlchemy ORM| DuckDB
DuckDB -->|Process Query| MotherDuck
MotherDuck -->|Return Results| DuckDB
DuckDB -->|ORM| FastAPI
FastAPI -->|JSON Response| Client
style FastAPI fill:#f9f,stroke:#333,stroke-width:2px
style DuckDB fill:#bbf,stroke:#333,stroke-width:2px
style MotherDuck fill:#fbf,stroke:#333,stroke-width:2px
style Client fill:#dfd,stroke:#333,stroke-width:2px
Deploying your application to Vercel can significantly simplify the process, thanks to its support for serverless functions. For a detailed guide on deploying FastAPI applications to Vercel, check out this insightful blog post. It walks you through the steps to ensure your application runs smoothly in a serverless environment, making your DuckDB Data Proxy accessible from anywhere.
Before diving into this Python project, I launched Goduck, a similar initiative built with Golang. It aimed to provide REST API interaction with DuckDB and MotherDuck, much like what we're doing here but in the Go ecosystem.
While trying to deploy Goduck across various environments, including serverless platforms, I faced hurdles due to the C-go dependency of the Go duckDB driver, which made the build process tricky for different Linux systems. This challenge highlighted the benefits of Python's straightforwardness and the extensive support from its community. Here are the main insights:
- Simpler Python Driver: Python's approach to DuckDB felt more straightforward and developer-friendly.
- Larger Python Community: The vast Python community meant more potential users and contributors for this project.
- Inspiration from MongoDB: MongoDB's Atlas Data Proxy, which simplifies database operations, inspired me to offer a similar experience for DuckDB users, facilitating quick backend setups for rapid prototyping.