Query Cancellation and query lifecycle clarification #74
The best way to cancel query execution is to call `interrupt` on the connection. Each connection can only execute one query at a time. (Executing queries in parallel is one of the main reasons to create multiple connections.) If you run another query on a connection with an active query, the old one will be interrupted. That's another way to cancel, though it's not as explicit as calling `interrupt`. I believe if you run a query "normally" (i.e. in non-streaming mode, using, say, `run`), the result is fully materialized before it is returned. You can have even more fine-grained control over query execution by using the pending result API.
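To make the pending-result pattern above concrete, here is a minimal, self-contained sketch. `FakePendingResult` is a hypothetical stand-in, not a real @duckdb/node-api class: it "finishes" after a fixed number of `runTask()` calls. The point it illustrates is that because execution advances one task at a time, the caller can stop polling (or interrupt) between tasks to abandon a query.

```javascript
// Hypothetical stand-in for a pending query result: NOT the real @duckdb/node-api class.
const RESULT_READY = 'RESULT_READY';
const RESULT_NOT_READY = 'RESULT_NOT_READY';

class FakePendingResult {
  constructor(tasksNeeded) {
    this.tasksDone = 0;
    this.tasksNeeded = tasksNeeded;
  }
  runTask() {
    this.tasksDone += 1;
    return this.tasksDone >= this.tasksNeeded ? RESULT_READY : RESULT_NOT_READY;
  }
}

// Poll until ready, but bail out if a cancel flag is set between tasks.
function driveToCompletion(pending, shouldCancel) {
  while (pending.runTask() !== RESULT_READY) {
    if (shouldCancel()) return { cancelled: true, tasksDone: pending.tasksDone };
  }
  return { cancelled: false, tasksDone: pending.tasksDone };
}

// A query that runs to completion:
const done = driveToCompletion(new FakePendingResult(5), () => false);
console.log(done); // { cancelled: false, tasksDone: 5 }

// A query cancelled after 2 tasks; the remaining work is never performed:
let calls = 0;
const stopped = driveToCompletion(new FakePendingResult(5), () => ++calls >= 2);
console.log(stopped); // { cancelled: true, tasksDone: 2 }
```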
(Closing because this is a discussion, not an issue, but I'm happy to continue discussing.)
Ah thank you, that's very helpful. The lazy evaluation of streaming results makes sense. I'm writing some tests to explore these edges. Are queries evaluated lazily in streaming mode even for large results? Additionally, accessing query progress while a query runs is something I'd like to understand better.
Interesting. It seems either something changed, or I was mistaken: I thought streaming results were supported. Unfortunately I don't see a non-deprecated way to create a streaming result through the C API. Let me ask around and see what the best path forward is.
Thank you for looking into it. Streaming results are a very desirable feature for our use case.

For some context, I'm working on a low-latency, real-time signal processing and control UI for hardware. It has an incremental compute and rendering engine. I'm considering DuckDB as the primary save-file format (to extend the use case beyond just real-time). Appender support gets us into the territory of write performance that I'm looking for. Streaming results are a good fit for the incremental compute and rendering engine. Additionally, the ability to offload more traditional aggregations of historical data to DuckDB is a big win.
I also just experimented with this and saw the same behavior. Definitely understand the value of streaming results. I've used them in other contexts, using the C++ API directly, so I know they work. I'm pretty sure it used to work with Node Neo (a while ago), so I think the behavior of the C API may have changed at some point after I wrote the relevant part of the bindings. I'm inquiring as to the best way to make it work again.
Finally got something to work regarding query progress. The following script prints some meaningful progress numbers:

```js
import { DuckDBInstance, DuckDBPendingResultState } from '@duckdb/node-api';

async function sleep(ms) {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}

const instance = await DuckDBInstance.create();
const connection = await instance.connect();

await connection.run('pragma enable_progress_bar');
await connection.run('create table tbl as select range a, mod(range,10) b from range(10000000)');
await connection.run('create table tbl_2 as select range a from range(10000000)');

const query = 'select count(*) from tbl inner join tbl_2 on (tbl.a = tbl_2.a)';
const prepared = await connection.prepare(query);
const pending = prepared.start();

while (pending.runTask() !== DuckDBPendingResultState.RESULT_READY) {
  console.log('not ready', connection.progress);
  await sleep(10);
}
console.log('ready', connection.progress);

await pending.getResult();
console.log('got result', connection.progress);
```

I got those queries from a test case in the duckdb repo. It seems the query plan matters for query progress. Probably only some operators report progress.
Regarding streaming results, it seems that, while parts of the relevant C API are deprecated, there is still a viable path to supporting them in the bindings.
Filed #76 to track implementing streaming results. |
You may have noticed already, but I published a new version (1.1.3-alpha.8) that provides access to streaming results. Details are in the README. Let me know how it works for you! |
Thank you! It seems to work really well. I'm able to run streaming queries and consume their results incrementally. I've written some benchmarks for the various streaming API combinations to explore the state-space, and a benchmark for query parameterisation as well. The results are pretty much as expected. Happy holidays!
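For readers following along, the consumption pattern under discussion can be sketched without the library. The mock below is hypothetical (it is not the real @duckdb/node-api result object): it only mirrors the shape of a pull-based `fetchChunk()` that yields chunks until the stream is exhausted.

```javascript
// Hypothetical mock of a streaming result: NOT the real @duckdb/node-api object.
// Chunks are handed out lazily, one per fetchChunk() call, until exhausted.
function makeMockStreamingResult(chunks) {
  let i = 0;
  return {
    async fetchChunk() {
      return i < chunks.length ? chunks[i++] : null; // null signals end of stream
    },
  };
}

// Pull chunks one at a time until the stream reports it is done.
async function consumeAll(result) {
  const rows = [];
  for (;;) {
    const chunk = await result.fetchChunk();
    if (chunk === null || chunk.length === 0) break;
    rows.push(...chunk);
  }
  return rows;
}

const result = makeMockStreamingResult([[1, 2], [3], [4, 5]]);
consumeAll(result).then((rows) => {
  console.log(rows); // [1, 2, 3, 4, 5]
});
```

Stopping the loop early and discarding the result is the "cancellation by abandonment" discussed earlier in the thread.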
I'm unfamiliar with the specific lifecycle of a query within DuckDB.
Is it possible to cancel an ongoing, streaming query (perhaps by not calling `await result.fetchChunk()` when there are chunks left, and by letting the result be garbage collected / by explicitly destroying the result)? If so, could that explicit destruction be considered as an addition to #55?

Is 'backpressure' applied by not calling `fetchChunk`? If, say, many rows were selected in a query, when exactly are they 'materialised' in relation to the `fetchChunk` call? If the database is remote, for example, would a 'cancelled' query prevent later HTTP calls?
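The backpressure intuition behind this question can be demonstrated with a plain async generator (a stand-in, not DuckDB code): in a pull-based model, nothing past the current chunk is produced until the consumer asks for it. The counter below shows that production only advances when the consumer pulls.

```javascript
// Stand-in chunk source: produces chunks lazily and records how many were built.
function makeChunkSource(totalChunks) {
  const state = { produced: 0 };
  async function* chunks() {
    for (let i = 0; i < totalChunks; i++) {
      state.produced += 1; // work happens only when the consumer pulls
      yield `chunk-${i}`;
    }
  }
  return { state, iterator: chunks() };
}

async function main() {
  const { state, iterator } = makeChunkSource(100);
  // Pull only three chunks, then stop (analogous to not calling fetchChunk again).
  await iterator.next();
  await iterator.next();
  await iterator.next();
  console.log(state.produced); // 3 — the remaining 97 chunks were never materialised
  await iterator.return(); // explicit teardown, analogous to destroying the result
  return state.produced;
}

main().then((n) => console.log('produced:', n));
```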