Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How open SQLite database? #62

Closed
chyzwar opened this issue Dec 13, 2024 · 6 comments
Closed

How open SQLite database? #62

chyzwar opened this issue Dec 13, 2024 · 6 comments
Assignees

Comments

@chyzwar
Copy link

chyzwar commented Dec 13, 2024

How to install extensions?

I am trying to open SQLite database.

import { DuckDBInstance } from '@duckdb/node-api';

const db = await DuckDBInstance.create("../data/database.db");
const connection = db.connect();

I get the following error:

node:internal/modules/run_main:122
    triggerUncaughtException(
    ^

[Error: IO Error: Extension "/home/raziel/.duckdb/extensions/v1.1.3/linux_amd64_gcc4/sqlite_scanner.duckdb_extension" not found.
Extension "sqlite" is an existing extension.

Install it first using "INSTALL sqlite".]

Node.js v23.3.0
@jraymakers
Copy link
Contributor

A couple options to try:

  • Don't pass the SQLite database to DuckDBInstance.create. Instead, after creating the instance and a connection, run install sqlite followed by attach '../data/database.db'.
  • Install the SQLite extension (using install sqlite) in an earlier run, and then try passing the SQLite database to DuckDBInstance.create in a later run. I'm less sure whether this will work, but it might.

@jraymakers jraymakers self-assigned this Dec 16, 2024
@chyzwar
Copy link
Author

chyzwar commented Dec 17, 2024

// Install SQLite extension
const install = await DuckDBInstance.create();
const installConnection = await install.connect();
await installConnection.run("INSTALL sqlite;");

// Open database and create connection
const path = "../data/10m_urban_areas.mbtiles"
const db = await DuckDBInstance.create(path, {
  threads: 4,
});
const connection = await db.connect();

Thanks, I managed to fix it, but queries are really slow.

0,0,0: 242.801ms
1,0,0: 235.962ms
1,0,1: 285.763ms
1,1,0: 244.896ms
1,1,1: 237.353ms
2,0,0: 234.29ms
2,0,1: 239.029ms
2,0,2: 235.906ms
2,0,3: 240.602ms
2,1,0: 238.652ms
2,1,1: 238.997ms

Where on SQLite queries are 0.021ms in average. Duck db is, 10000x slower for these queries. Is this expected for SQLite extension? Or this specific query is not a good fit for duckdb?

See reproduction: https://github.com/mapka-dev/mbtiles-benchmark/blob/master/duckdb-node-js/default-seq.js

@jraymakers
Copy link
Contributor

I don't have much experience with the sqlite DuckDB extension; for the best answers I'd recommend DuckDB's Discord server, probably the #extensions channel.

However, looking at the queries you're running, it seems you're doing point lookups. That is, you're using a WHERE clause to find and return a single, specific item of data, rather than returning many rows or doing some sort aggregation. SQLite, as an OLTP database, is optimized for the former, while DuckDB, as an OLAP database, is optimized for the latter.

Also, I don't know if it's relevant in this case, but I'd be curious why you're setting threads to 4 explicitly, rather than letting DuckDB select a good default based on your machine's specifications (e.g. number of cores).

@chyzwar
Copy link
Author

chyzwar commented Dec 18, 2024

I am running this do benchmark and to evaluate different alternatives for database backend for mbtiles. In fact, 4 threads in this case gives DuckDB unfair advantage. I still need to implement thread pool for SQLite.

From system monitor, CPU, and disk utilization is very low, I think that threads setting have very little effect. I am running this on AMD Ryzen 7 5800X with 16 logical cores.

With threads 1:
2,3,0: 319.497ms
With threads auto
3,0,3: 231.374ms
With threads 4
2,3,2: 240.347ms

My guest is that there is bottleneck in node event loop or bindings that prevent from higher utilization. There is reason why better-sqlite decided to use sync interface instead of async/promise based one. Maybe for duck async interface is better in context of typical queries. WiseLibs/better-sqlite3#32

I am aware that duckdb is OLAP database, but I wanted to try for this case.

@jraymakers
Copy link
Contributor

If I modify your benchmark script to just make a simple SELECT statement (e.g. SELECT $1, $2, $3) instead of using the sqlite extension or your tile data, then I get times around 0.1ms.

One possible explanation could be the tile_data being returned. I'm assuming this is binary data? About how big? I could see how returning lots of big BLOBs with individual point lookups could result in this kind of latency.

@jraymakers
Copy link
Contributor

Since there's no reason to suspect that the observed performance is specific to Node, I'm going to close this issue. If we do discover something specific to Node, we can open a more specific issue.

I'm happy to continue the discussion, to help understand what's going on. But I'm no expert on DuckDB performance, nor the sqlite extension, so you may get better information on the DuckDB Discord.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants