How can I speed up the MySQL query with the DuckDB MySQL extension? #55
Comments
Thanks for the report! This extension does not push down aggregations into MySQL, meaning the table is fully loaded into DuckDB before the aggregation is processed in DuckDB. The new `mysql_query` function can be used to run the aggregation directly in MySQL instead.
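For illustration, here is a minimal sketch (not from the thread) of pushing the aggregation down by hand with the extension's `mysql_query` table function through the DuckDB JDBC driver. The attached database name and the `order_info`/`amt` names come from the report below; the connection string is elided exactly as in the report, and everything else is assumed:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PushdownByHand {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
             Statement stmt = conn.createStatement()) {
            stmt.execute("INSTALL mysql");
            stmt.execute("LOAD mysql");
            // Connection string elided, as in the original report.
            stmt.execute("ATTACH '......' AS mysqldb (TYPE mysql)");
            // mysql_query() sends the inner SQL to MySQL verbatim (it runs in
            // the database given in the ATTACH string), so only the single
            // aggregated row travels back to DuckDB instead of the whole table.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT * FROM mysql_query('mysqldb', "
                    + "'SELECT SUM(amt) FROM order_info')")) {
                while (rs.next()) {
                    System.out.println(rs.getObject(1));
                }
            }
        }
    }
}
```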
Thanks for the reply. I'm Chinese and my English is poor. In my case the data is tiny (3 rows) and the query is a single line, so I cannot understand why preparing the statement takes 350 ms (just the prepare, not the execute) and loading 3 rows of data takes another 400 ms. (I just want to build a remote JDBC server on top of DuckDB. `mysql_query` is just a plain function; it cannot bind parameters in the query string, like: `select * from tb where a = ? and b = ?; stat.setString(1, "jr"); stat.setInt(2, 100);`)
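As a side note, parameter binding along those lines does work when querying the attached table through the DuckDB JDBC driver itself rather than through `mysql_query`. A hedged sketch; the table and column names (`tb`, `a`, `b`) come from the comment above, and it assumes the MySQL database is already attached as `mysqldb`:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ParamQuery {
    // Assumes `conn` is a DuckDB JDBC connection on which the MySQL
    // database has already been attached as `mysqldb`.
    static void query(Connection conn) throws Exception {
        try (PreparedStatement stat = conn.prepareStatement(
                "SELECT * FROM mysqldb.tb WHERE a = ? AND b = ?")) {
            stat.setString(1, "jr");
            stat.setInt(2, 100);
            try (ResultSet rs = stat.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("a"));
                }
            }
        }
    }
}
```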
On the first query, the catalog information is loaded, which could be where the time is going. Try running the query a second time perhaps?
I ran the query many times; it takes the same 350 ms + 400 ms every time.
Hm, that's a bit excessive, yes. Could you try this:
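(The suggested snippet wasn't preserved in this thread. Judging from the follow-up below, the suggestion was presumably to run both queries inside a single transaction so the catalog is only loaded once. A minimal sketch of that idea, assuming a DuckDB JDBC connection `conn` with the MySQL database attached, as in the sketches above:)

```java
try (Statement stmt = conn.createStatement()) {
    stmt.execute("BEGIN TRANSACTION");
    // Both queries now run against the same catalog snapshot, so the
    // MySQL metadata should only be fetched for the first one.
    try (ResultSet rs = stmt.executeQuery(
            "SELECT sum(amt) FROM mysqldb.order.order_info")) {
        rs.next();
    }
    try (ResultSet rs = stmt.executeQuery(
            "SELECT sum(amt) FROM mysqldb.order.order_info")) {
        rs.next();
    }
    stmt.execute("COMMIT");
}
```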
I tried running two queries in a transaction: the first query is still slow (350 ms + 400 ms), but the second one is fast (2 ms to prepare, 20 ms to run).
@Mytherin Hello, can you give me some advice? Or will this problem be fixed in the next duckdb_mysql version? Then I will close this issue.
I encountered the same performance degradation on MySQL. TPC-H (sf=1) Query 17 costs 2.15 s with PG + DuckDB, while with MySQL + DuckDB it costs 30.37 s! I sampled both scenarios and checked their EXPLAIN output, and I noticed the plans differ. (The MySQL + DuckDB and PG + DuckDB EXPLAIN outputs are not preserved in this thread.)
I also set debug_show_queries in both setups and found that the MySQL scanner does not implement the parallel scan that the Postgres scanner uses.
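For anyone reproducing this, enabling that query log looks roughly like the following; I'm assuming the setting is named `mysql_debug_show_queries`, mirroring `pg_debug_show_queries` in the Postgres scanner, which makes it easy to see whether a scan is split into parallel ranges:

```java
// Prints every SQL statement the MySQL scanner sends to the server.
try (Statement stmt = conn.createStatement()) {
    stmt.execute("SET mysql_debug_show_queries = true");
}
```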
Hi, is this issue still relevant for MySQL? Meaning, as per the comments above, the table is fully loaded in a sequential scan. postgres_scanner has had a parallel ctid scan since Postgres 14: https://github.com/duckdb/postgres_scanner/blame/03eaed75f0ec5500609b7a97aa05468493b229d1/src/postgres_scanner.cpp#L111 If we have a massive MySQL table, say 10M tuples and 100 columns, what would be the preferred way to COPY its contents from a SELECT clause? Shall we use a JDBC ResultSet instead of the C libmysql client and stream into a DuckDB Appender (see the sketch below)? ty
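A hedged sketch of that ResultSet-to-Appender idea, using MySQL Connector/J's streaming mode on the source side; the connection details, table, and column names are placeholders, not from this thread:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.duckdb.DuckDBAppender;
import org.duckdb.DuckDBConnection;

public class MysqlToDuckdb {
    public static void main(String[] args) throws Exception {
        try (Connection mysql = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "pass");
             DuckDBConnection duck = (DuckDBConnection)
                DriverManager.getConnection("jdbc:duckdb:")) {

            try (Statement ddl = duck.createStatement()) {
                ddl.execute("CREATE TABLE order_info (id BIGINT, amt DOUBLE)");
            }

            // Integer.MIN_VALUE puts Connector/J into row-by-row streaming,
            // so the source result set is never fully materialized in memory.
            try (Statement src = mysql.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
                src.setFetchSize(Integer.MIN_VALUE);
                try (ResultSet rs = src.executeQuery(
                        "SELECT id, amt FROM order_info");
                     DuckDBAppender app = duck.createAppender(
                        DuckDBConnection.DEFAULT_SCHEMA, "order_info")) {
                    while (rs.next()) {
                        app.beginRow();
                        app.append(rs.getLong(1));
                        app.append(rs.getDouble(2));
                        app.endRow();
                    }
                }
            }
        }
    }
}
```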
According to my tests and observations, this MySQL extension does have an obvious performance problem.

First, compared with JDBC clients connecting to MySQL, it costs at least 4x more on most big queries: if a query runs in 100 ms on some MySQL client, it takes at least 400 ms with the DuckDB MySQL extension.

Second, the extension does not seem to make good use of streaming reads. Instead, it loads all the data from the MySQL database before anything takes effect, so big queries need a huge amount of memory and you cannot get a streaming response when you need one.

Third, you cannot count it as good as the Postgres extension for DuckDB, which has binary reading and parallel reading, resulting in extraordinary read performance that is even much better than all kinds of PG JDBC reading.

Finally, in summary: as long as this extension performs as well as a common JDBC client and reads in a streaming way like the Postgres extension, it will satisfy most users. Reference: issue "DuckDB MySQL extension took extremely longer time to access data than other components".
What happens?
I want to use DuckDB to speed up MySQL queries against a MySQL database, e.g. `select sum(amt) from mysqldb.order.order_info` (3 rows, for testing). The query prepare takes 350 ms and the execution takes 400 ms. I think this is too slow compared with using a MySQL JDBC client directly (it runs in 2 ms). I expect the MySQL extension to run in less than 100 ms on small data (fewer than 1,000 rows) and to be 10x faster than MySQL JDBC on big data (more than 100,000 rows). How can I achieve this?
To Reproduce
```sql
install mysql;
load mysql;
attach "......";
use mysqldb;
select sum(amt) from mysqldb.order.order_info;
```
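The attach string is elided in the report; for context, the extension's connection string uses `key=value` pairs. A hypothetical example through JDBC (all parameter values below are placeholders, not from the report):

```java
stmt.execute("ATTACH 'host=localhost user=root port=3306 database=mydb' "
        + "AS mysqldb (TYPE mysql)");
```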
OS:
mac
MySQL Version:
8.1
DuckDB Version:
0.10.1
DuckDB Client:
java
Full Name:
何俊然
Affiliation:
有信科技 (Youxin), China
Have you tried this on the latest `main` branch?
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?