Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how can I speed up the mysql query with duckdb mysql extenstion? #55

Open
2 tasks done
junranhe opened this issue Mar 20, 2024 · 11 comments
Open
2 tasks done

how can I speed up the mysql query with duckdb mysql extenstion? #55

junranhe opened this issue Mar 20, 2024 · 11 comments

Comments

@junranhe
Copy link

What happens?

I want to use duckdb to speed up the mysql query in mysql db,just like: select sum(amt) from mysqldb.order.order_info (3rows for test), the query prepare run 350 ms, and execute query with 400ms, I think it is too slow , compare with direct use mysql jdbc client ( run with 2ms), I expect the mysql extension can run less than 100ms in small data(less than 1000rows), and speed up 10x in bigdata(more than 100000 rows),compare with mysql jdbc, how can I do for this?

To Reproduce

install mysql;
load mysql;

attack "......"
use mysqldb

select sum(amt) from mysqldb.order.order_info

OS:

mac

MySQL Version:

8.1

DuckDB Version:

0.10.1

DuckDB Client:

java

Full Name:

何俊然

Affiliation:

有信科技 youxin china

Have you tried this on the latest main branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
@Mytherin
Copy link
Contributor

Thanks for the report! This extension does not push down aggregations into MySQL, meaning the table is fully loaded into DuckDB before processing the aggregation in DuckDB. The new mysql_query functionality allows you to directly run a query within MySQL, see #50. This is effectively what other connectors (e.g. the JDBC connector) are doing.

@junranhe
Copy link
Author

junranhe commented Mar 20, 2024

thanks for reply, I m chinese, my English is so poor, In my case, my data is so small ( 3 rows) and my query is so simple( 1 line), so I can not understand why my query preparestatment run 350ms (just prepare not execute), and load 3 rows data run 400ms? (I just want to build a remote server jdbc with duckdb, mysql_query just a simple function , can not setParameter in the query string, like: select * from tb where a = ? and b= ?; stat.setString(1, 'jr' ); stat.setInt(2, 100);)

@Mytherin
Copy link
Contributor

On running the first query the catalog information is loaded which could be where the time is going. Try running the query a second time perhaps?

@junranhe
Copy link
Author

I run the query many time, it execute 350ms + 400ms the same,

@Mytherin Mytherin reopened this Mar 20, 2024
@Mytherin
Copy link
Contributor

Hm, that's a bit excessive yes. Could you try this:

BEGIN TRANSACTION;
.timer on
select sum(amt) from mysqldb.order.order_info;
select sum(amt) from mysqldb.order.order_info;

@junranhe
Copy link
Author

I try 2 query in the TRANSACTION, the first query is slow: 350ms + 400ms, the second query is fast (prepare with 2ms, run with 20ms)

@junranhe
Copy link
Author

@Mytherin hello,can give me some advice? or this problem will fix in next duckdb_mysql version? then I close this issuses

@qsliu2017
Copy link
Contributor

qsliu2017 commented Apr 15, 2024

I encounter the same performance decrease issue on MySQL. TPC-H(sf=1) Query 17 in PG+DuckDB costs 2.15s while MySQL costs 30.37s! And filter_pushdown does not help.

I sample both scenarios and check their explain analyze, as the following attaches.

I notice that mysql_scan nodes have wrong EC=1, causing a different join order. What's more, postgres flamegraph has a PerfectHashJoinExecutor while mysql one does not. Might they be related?

MySQL + DuckDB Flamegraph

mysql+duckdb-perf

PG + DuckDB Flamegraph

pg+duckdb-perf

MySQL + DuckDB EXPLAIN
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││    Query Profiling Information    ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
explain analyze SELECT     sum(l_extendedprice) / 7.0 AS avg_yearly FROM     lineitem,     part WHERE     p_partkey = l_partkey     AND p_brand = 'Brand#23'     AND p_container = 'MED BOX'     AND l_quantity < (         SELECT             0.2 * avg(l_quantity)         FROM             lineitem         WHERE             l_partkey = p_partkey);
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││         Total Time: 30.37s        ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌───────────────────────────┐
│      RESULT_COLLECTOR     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             0             │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             0             │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│         avg_yearly        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             1             │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          sum(#0)          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             1             │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│      L_EXTENDEDPRICE      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            587            │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             #0            │
│             #6            │
│             #7            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            587            │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│(CAST(L_QUANTITY AS DOUBLE)│
│         < SUBQUERY)       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 1           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            587            │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      RIGHT_DELIM_JOIN     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           RIGHT           │
│ P_PARTKEY IS NOT DISTINCT │
│       FROM P_PARTKEY      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ├───────────────────────────────────────────┐──────────────────────────────────────────────────────────────────────────────────────┐
│           EC: 1           │                                           │                                                                                      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                           │                                                                                      │
│             0             │                                           │                                                                                      │
│          (0.00s)          │                                           │                                                                                      │
└─────────────┬─────────────┘                                           │                                                                                      │
┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐                                                          ┌─────────────┴─────────────┐
│         HASH_JOIN         │                             │         HASH_JOIN         │                                                          │       HASH_GROUP_BY       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           INNER           │                             │           RIGHT           │                                                          │             #0            │
│   P_PARTKEY = L_PARTKEY   │                             │ P_PARTKEY IS NOT DISTINCT │                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             │       FROM P_PARTKEY      │                                                          │            204            │
│           EC: 1           ├──────────────┐              │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ├───────────────────────────────────────────┐              │          (0.00s)          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │              │           EC: 1           │                                           │              │                           │
│            6088           │              │              │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                           │              │                           │
│          (0.30s)          │              │              │            6088           │                                           │              │                           │
│                           │              │              │          (0.01s)          │                                           │              │                           │
└─────────────┬─────────────┘              │              └─────────────┬─────────────┘                                           │              └───────────────────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
│           FILTER          ││        MYSQL_SCAN         ││         PROJECTION        │                             │         DUMMY_SCAN        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│((P_BRAND = 'Brand#23') AND││          lineitem         ││  (0.2 * avg(L_QUANTITY))  │                             │             0             │
│ (P_CONTAINER = 'MED BOX'))││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││         P_PARTKEY         │                             │          (0.00s)          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││         L_PARTKEY         ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             │                           │
│           EC: 1           ││         L_QUANTITY        ││            204            │                             │                           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││      L_EXTENDEDPRICE      ││          (0.00s)          │                             │                           │
│            204            ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           │                             │                           │
│          (0.00s)          ││           EC: 1           ││                           │                             │                           │
│                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           │                             │                           │
│                           ││          6001215          ││                           │                             │                           │
│                           ││          (1.28s)          ││                           │                             │                           │
└─────────────┬─────────────┘└───────────────────────────┘└─────────────┬─────────────┘                             └───────────────────────────┘
┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
│        MYSQL_SCAN         │                             │       HASH_GROUP_BY       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            part           │                             │             #0            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             │          avg(#1)          │
│         P_PARTKEY         │                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          P_BRAND          │                             │            204            │
│        P_CONTAINER        │                             │          (0.00s)          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             │                           │
│           EC: 1           │                             │                           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             │                           │
│           200000          │                             │                           │
│          (0.03s)          │                             │                           │
└───────────────────────────┘                             └─────────────┬─────────────┘
                                                          ┌─────────────┴─────────────┐
                                                          │         PROJECTION        │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                                                          │         P_PARTKEY         │
                                                          │         L_QUANTITY        │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                                                          │            6088           │
                                                          │          (0.00s)          │
                                                          └─────────────┬─────────────┘
                                                          ┌─────────────┴─────────────┐
                                                          │         HASH_JOIN         │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                                                          │           INNER           │
                                                          │   L_PARTKEY = P_PARTKEY   │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ├──────────────┐
                                                          │           EC: 1           │              │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │
                                                          │            6088           │              │
                                                          │          (0.09s)          │              │
                                                          └─────────────┬─────────────┘              │
                                                          ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
                                                          │        MYSQL_SCAN         ││         DELIM_SCAN        │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                                                          │          lineitem         ││             0             │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││          (0.00s)          │
                                                          │         L_PARTKEY         ││                           │
                                                          │         L_QUANTITY        ││                           │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           │
                                                          │           EC: 1           ││                           │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           │
                                                          │          6001215          ││                           │
                                                          │          (0.82s)          ││                           │
                                                          └───────────────────────────┘└───────────────────────────┘
PG + DuckDB EXPLAIN
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││    Query Profiling Information    ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
explain analyze SELECT     sum(l_extendedprice) / 7.0 AS avg_yearly FROM     lineitem,     part WHERE     p_partkey = l_partkey     AND p_brand = 'Brand#23'     AND p_container = 'MED BOX'     AND l_quantity < (         SELECT             0.2 * avg(l_quantity)         FROM             lineitem         WHERE             l_partkey = p_partkey);
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││         Total Time: 2.15s         ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌───────────────────────────┐
│      RESULT_COLLECTOR     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             0             │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             0             │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│         avg_yearly        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             1             │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          sum(#0)          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             1             │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│      l_extendedprice      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            587            │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             #0            │
│             #3            │
│             #4            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            587            │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│(CAST(l_quantity AS DOUBLE)│
│         < SUBQUERY)       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        EC: 2531736        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            587            │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      RIGHT_DELIM_JOIN     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           RIGHT           │
│ p_partkey IS NOT DISTINCT │
│       FROM p_partkey      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ├───────────────────────────────────────────┐──────────────────────────────────────────────────────────────────────────────────────┐
│        EC: 2531736        │                                           │                                                                                      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                           │                                                                                      │
│             0             │                                           │                                                                                      │
│          (0.10s)          │                                           │                                                                                      │
└─────────────┬─────────────┘                                           │                                                                                      │
┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐                                                          ┌─────────────┴─────────────┐
│         HASH_JOIN         │                             │         HASH_JOIN         │                                                          │       HASH_GROUP_BY       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           INNER           │                             │           RIGHT           │                                                          │             #0            │
│   l_partkey = p_partkey   │                             │ p_partkey IS NOT DISTINCT │                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             │       FROM p_partkey      │                                                          │            204            │
│        EC: 1012694        ├──────────────┐              │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ├───────────────────────────────────────────┐              │          (0.01s)          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │              │        EC: 2531736        │                                           │              │                           │
│            6088           │              │              │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                           │              │                           │
│          (0.23s)          │              │              │            6088           │                                           │              │                           │
│                           │              │              │          (0.03s)          │                                           │              │                           │
└─────────────┬─────────────┘              │              └─────────────┬─────────────┘                                           │              └───────────────────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
│       POSTGRES_SCAN       ││           FILTER          ││         PROJECTION        │                             │         DUMMY_SCAN        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          lineitem         ││((p_brand = 'Brand#23') AND││  (0.2 * avg(l_quantity))  │                             │             0             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││ (p_container = 'MED BOX'))││         p_partkey         │                             │          (0.00s)          │
│         l_partkey         ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             │                           │
│         l_quantity        ││         EC: 30073         ││            204            │                             │                           │
│      l_extendedprice      ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││          (0.00s)          │                             │                           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││            204            ││                           │                             │                           │
│        EC: 5063472        ││          (0.00s)          ││                           │                             │                           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           ││                           │                             │                           │
│          6001215          ││                           ││                           │                             │                           │
│          (18.13s)         ││                           ││                           │                             │                           │
└───────────────────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘                             └───────────────────────────┘
                             ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
                             │       POSTGRES_SCAN       ││       HASH_GROUP_BY       │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │            part           ││             #0            │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││          avg(#1)          │
                             │         p_partkey         ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │          p_brand          ││            204            │
                             │        p_container        ││          (0.05s)          │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           │
                             │         EC: 30073         ││                           │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           │
                             │           200000          ││                           │
                             │          (0.00s)          ││                           │
                             └───────────────────────────┘└─────────────┬─────────────┘
                                                          ┌─────────────┴─────────────┐
                                                          │         PROJECTION        │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                                                          │         p_partkey         │
                                                          │         l_quantity        │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                                                          │            6088           │
                                                          │          (0.00s)          │
                                                          └─────────────┬─────────────┘
                                                          ┌─────────────┴─────────────┐
                                                          │         HASH_JOIN         │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                                                          │           INNER           │
                                                          │   l_partkey = p_partkey   │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ├──────────────┐
                                                          │        EC: 5063472        │              │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │
                                                          │            6088           │              │
                                                          │          (0.11s)          │              │
                                                          └─────────────┬─────────────┘              │
                                                          ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
                                                          │       POSTGRES_SCAN       ││         DELIM_SCAN        │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                                                          │          lineitem         ││             0             │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││          (0.00s)          │
                                                          │         l_partkey         ││                           │
                                                          │         l_quantity        ││                           │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           │
                                                          │        EC: 5063472        ││                           │
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           │
                                                          │          6001215          ││                           │
                                                          │          (4.83s)          ││                           │
                                                          └───────────────────────────┘└───────────────────────────┘

@qsliu2017
Copy link
Contributor

I also set debug_show_queries to both and find that mysql scanner does not implement the parallel scan used in postgres

@laurentperez
Copy link

Hi

Is this issue still relevant for mysql ? meaning as per above comments, the table is fully loaded in a sequential scan.

postgres_scanner has a parallel ctid scan since postgres 14 https://github.com/duckdb/postgres_scanner/blame/03eaed75f0ec5500609b7a97aa05468493b229d1/src/postgres_scanner.cpp#L111

If we have a massive mysql table, say, 10M tuples 100 columns, what would be the preferred way to COPY its contents from a SELECT clause ? Shall we use a JDBC ResultSet instead of the C libmysql client and stream to a DuckDB Appender ?

ty

@dawsondong
Copy link

dawsondong commented Dec 16, 2024

According to my test and observation, this mysql extension did have obvious performance problem. First, comparing with JDBC clients to connection to mysql, it took at least 4 times more cost on most of big queries. It means if you can execute with 100ms on some mysql client, you have to take 400ms at least with duckdb mysql extension. Second, this extension seems not be able to take good use of streaming reading capability. Instead, it loads all of data from mysql database and then takes effect. So, it needs to take huge amount of memories for big data query, and you cannot get streaming response in case you need. Third, you cannot not count it as good as pg extension of duckdb, which owns binary reading and parallel reading, and result to an extraordinary performance of pg reading which even much better than all kinds of PG JDBC reading. Finally, as summary, I hope as long as this extension can act as good as common JDBC client, and behavior as streaming read way as pg extension, it will satisfy for most of users. Reference to issue Duckdb mysql extension took extreme longer time to access data than other components

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants